Hi Brandi!
I think this is because you must enable X11 forwarding on the remote node.
Go to the /etc/ssh directory and edit the ssh_config file. First, check
whether it contains the following lines:
    Host *
        ForwardX11 yes
If these lines are missing, add them at the end of the ssh_config file,
restart the sshd daemon, and then connect with SSH again.
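If you want to check the setting programmatically, here is a minimal sketch in Python. It assumes the file is /etc/ssh/ssh_config as above, and the helper name forward_x11_enabled is hypothetical. It only reports whether "ForwardX11 yes" applies to every host, either through a "Host *" block or through the global section before the first Host line. It does not follow Include directives and treats Match blocks as not global.

```python
def forward_x11_enabled(config_text):
    """Return True if 'ForwardX11 yes' applies to all hosts in ssh_config text.

    Simplified sketch (hypothetical helper): lines before the first 'Host'
    line are treated as global, 'Include' is ignored, and any 'Match' block
    is treated as not applying to every host.
    """
    applies_to_all = True  # options before the first Host line are global
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # ssh_config accepts both "Keyword value" and "Keyword=value".
        parts = line.replace("=", " ", 1).split()
        keyword = parts[0].lower()
        values = [v.lower() for v in parts[1:]]
        if keyword == "host":
            applies_to_all = "*" in values  # exact "*" pattern only
        elif keyword == "match":
            applies_to_all = False
        elif keyword == "forwardx11" and applies_to_all:
            # ssh uses the first value it obtains for each option.
            return values[:1] == ["yes"]
    return False
```

Usage: `print(forward_x11_enabled(open("/etc/ssh/ssh_config").read()))`.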
Cheers,
Jose
On Fri, 2004-09-17 at 23:27, Brandi Winfrey wrote:
> I found an email in the archives that had a similar problem, but could not
> find the answer to the problem. I had a cluster of 9 computers that worked
> fine. I just added 6 more nodes to the cluster. All nodes were added
> successfully, the networking went fine, and even the "Complete the Cluster
> Setup" section passed.
>
> When I started the "Test Cluster Setup", it first had issues with SSH
> having to do with passwords and "man-in-the-middle" attacks. I know this is
> because I used ssh from one node to another before I should have, and the
> keys didn't match. I wasn't exactly sure how to fix this, so I just deleted
> the known_hosts file in the .ssh directory and ran
> /opt/opium/bin/sync_users --force to try to reset all of the passwords. I
> was also getting prompted for a password during the Test Cluster Setup even
> though I didn't set a password.
>
> After doing this, I can now pass the SSH pingtest, the SSH server->node
> test, and the SSH node->server test. I can't go any further than this
> without failing: there is an error that there aren't enough free nodes.
>
> I quit the test and tried a few things...
>
> I CAN ssh to and from all of the nodes, but I get the following warning:
> "Warning: No xauth data; using fake authentication data for X11 forwarding."
> My /etc/hosts file looks fine:
>
> /etc/hosts:---------------------------------------------------------------------------------
>
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 10.0.0.100 oscar oscar.oscardomain oscar_server nfs_oscar pbs_oscar
> 127.0.0.1 localhost localhost.oscardomain localhost
> 129.162.79.58 oscar oscar.geophysics.swri.edu
>
> # These entries are managed by SIS, please don't modify them.
> 10.0.0.1 oscarnode1.oscardomain oscarnode1
> 10.0.0.10 oscarnode10.oscardomain oscarnode10
> 10.0.0.11 oscarnode11.oscardomain oscarnode11
> 10.0.0.12 oscarnode12.oscardomain oscarnode12
> 10.0.0.13 oscarnode13.oscardomain oscarnode13
> 10.0.0.14 oscarnode14.oscardomain oscarnode14
> 10.0.0.2 oscarnode2.oscardomain oscarnode2
> 10.0.0.3 oscarnode3.oscardomain oscarnode3
> 10.0.0.4 oscarnode4.oscardomain oscarnode4
> 10.0.0.5 oscarnode5.oscardomain oscarnode5
> 10.0.0.6 oscarnode6.oscardomain oscarnode6
> 10.0.0.7 oscarnode7.oscardomain oscarnode7
> 10.0.0.8 oscarnode8.oscardomain oscarnode8
> 10.0.0.9 oscarnode9.oscardomain oscarnode9
> -------------------------------------------------------------------------------------------------
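A quick way to confirm that a hosts file really names every node is to parse it. The sketch below is a minimal, hypothetical helper: it assumes the oscarnode1 to oscarnode14 naming shown in the listing and takes the file contents (without the "> " quote prefix) as a string. Since name lookups happen on whichever machine runs them, it is probably worth running it against each node's own /etc/hosts, not only the master's.

```python
def missing_nodes(hosts_text, count, prefix="oscarnode"):
    """Return the names prefix1..prefixN that no line of an /etc/hosts text maps."""
    names = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        fields = line.split()
        if len(fields) >= 2:  # IP address followed by one or more names
            names.update(fields[1:])
    wanted = [f"{prefix}{i}" for i in range(1, count + 1)]
    return [name for name in wanted if name not in names]
```

Usage on a node: `print(missing_nodes(open("/etc/hosts").read(), 14))`; an empty list means every node name is present.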
>
> If I run pbsnodes -a, all of the nodes show up with the following output
> (substitute the correct node number for 13):
> oscarnode13.oscardomain
> state = job-exclusive
> np = 1
> properties = all
> ntype = cluster
> jobs = 0/2.oscar
>
> Oh, the job-exclusive and 0/2.oscar values above are probably because I
> have a run currently executing on nodes 1-8, which still work correctly.
> The only nodes that I can't get to cluster are the new nodes 9-14.
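Because the test fails with "not enough free nodes", it can help to list which nodes PBS actually reports as free. This is a minimal sketch with a hypothetical helper name, assuming the pbsnodes -a layout shown above: a node-name line followed by "attribute = value" lines. It does not depend on indentation, since the quoted output lost it.

```python
def parse_pbsnodes(output):
    """Parse `pbsnodes -a` text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" not in line:
            # A line without "=" starts a new node entry.
            current = line
            nodes[current] = {}
        elif current is not None:
            # "attribute = value"; split only on the first "=".
            key, value = line.split("=", 1)
            nodes[current][key.strip()] = value.strip()
    return nodes
```

For example, `[n for n, a in parse_pbsnodes(text).items() if a.get("state") == "free"]` gives the nodes PBS considers free.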
>
> If I execute ifconfig -a, all of the Ethernet cards are UP.
>
> The problem seems to be with MPI. When I run an MPI job, I get the following
> errors (only with the new nodes):
>
> rm_905: p4_error: Could not gethostbyname for host oscarnode9; may be invalid name: 61
> bm_list_1264: (81.330047) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330446) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330694) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330941) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.331366) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.331619) wakeup_slave: unable to interrupt slave 0 pid 1263
> p9_762: (68.906620) net_recv failed for fd = 3
> p9_762: p4_error: net_recv read, errno = : 104
> p13_691: (61.793863) net_recv failed for fd = 3
> p13_691: p4_error: net_recv read, errno = : 104
> p14_691: (60.231699) net_recv failed for fd = 3
> p14_691: p4_error: net_recv read, errno = : 104
> p11_691: (65.034232) net_recv failed for fd = 3
> p11_691: p4_error: net_recv read, errno = : 104
> p12_691: (63.368020) net_recv failed for fd = 3
> p12_691: p4_error: net_recv read, errno = : 104
> p10_694: (66.745358) net_recv failed for fd = 3
> p10_694: p4_error: net_recv read, errno = : 104
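The first line of that output reports a failed gethostbyname() lookup for oscarnode9. A minimal sketch that repeats the same kind of lookup for every node is below; the helper name resolve_all is hypothetical, and the node list assumes the oscarnode1 to oscarnode14 names from /etc/hosts. Python's socket.gethostbyname goes through the system resolver, so running this on each node shows which names that node cannot resolve.

```python
import socket


def resolve_all(names):
    """Map each host name to its IPv4 address, or None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Usage on a node: `resolve_all([f"oscarnode{i}" for i in range(1, 15)])`; any entry mapped to None is a name that node cannot look up.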
>
>
> I looked at the file <eth_module>.o in
> /lib/modules/<kernel-version>/kernel/drivers/net on the master and on the
> nodes. They are not the same, but when I look at this file on the first 8
> nodes, which are working correctly, it is also not the same. Despite this,
> I took some advice from the archives, copied the master's <eth_module>.o
> file to the nodes (only the 6 new nodes), and rebooted them. This seems to
> have done nothing. Since I saved the original files, I'll probably just put
> them back the way they were.
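Whether two copies of the module are "the same" can be settled by comparing checksums instead of inspecting the files. A minimal sketch follows; the helper names are hypothetical, and the <kernel-version> and <eth_module> placeholders from the message must be filled in with the real values on each machine.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have byte-identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

Copying the digest printed on each node into one list makes it easy to see which nodes share a module build.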
>
> I looked at the /etc/exports and /etc/fstab files. I didn't see anything
> wrong there.
>
> /etc/fstab (nodes):-------------------------------------------------------------------
>
> /dev/hda6 / ext2 defaults 1 2
> /dev/hda5 swap swap defaults 0 0
> /dev/hda1 /boot ext2 defaults 1 2
> /dev/fd0 /mnt/floppy auto noauto,owner 0 0
> none /dev/pts devpts defaults 0 0
> none /proc proc defaults 0 0
> nfs_oscar:/home /home nfs rw 0 0
>
> /etc/fstab (master):---------------------------------------------------------------------
>
> LABEL=/ / ext3 defaults 1 1
> LABEL=/boot /boot ext3 defaults 1 2
> none /dev/pts devpts gid=5,mode=620 0 0
> none /proc proc defaults 0 0
> none /dev/shm tmpfs defaults 0 0
> /dev/hda3 swap swap defaults 0 0
> /dev/cdrom /mnt/cdrom udf,iso9660 noauto,owner,kudzu,ro 0 0
> /dev/hdd4 /mnt/zip auto noauto,owner,kudzu 0 0
> /dev/fd0 /mnt/floppy auto noauto,owner,kudzu 0 0
>
> /etc/exports (master; the nodes don't have one):----------------
>
> /home 10.0.0.100/255.255.255.0(async,rw,no_root_squash)
>
> ---------------------------------------------------------------------------------------
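Whether that export line covers the new nodes is simple arithmetic: 10.0.0.100 with netmask 255.255.255.0 denotes the network 10.0.0.0/24, which contains every node address from 10.0.0.1 to 10.0.0.14 in /etc/hosts. A minimal sketch of that check with Python's ipaddress module (the helper name is hypothetical):

```python
import ipaddress


def hosts_outside_export(export_spec, host_ips):
    """Return the IPs not covered by an 'address/netmask' client spec."""
    # strict=False accepts a host address with a netmask and normalizes it
    # to the enclosing network, e.g. 10.0.0.100/255.255.255.0 -> 10.0.0.0/24.
    network = ipaddress.ip_network(export_spec, strict=False)
    return [ip for ip in host_ips if ipaddress.ip_address(ip) not in network]
```

With the node addresses from the hosts file, `hosts_outside_export("10.0.0.100/255.255.255.0", [f"10.0.0.{i}" for i in range(1, 15)])` returns an empty list.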
>
> Do you have any suggestions on how to fix this?
>
> Thank you,
> Brandi
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> Project Admins to receive an Apple iPod Mini FREE for your judgement on
> who ports your project to Linux PPC the best. Sponsored by IBM.
> Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users