Hi Brandi!

I think this happens because X11 forwarding must be enabled on the remote node.
Go to the /etc/ssh directory and edit the ssh_config file. First, check
whether it contains the following lines:

Host *
        ForwardX11 yes

If these lines are missing, add them at the end of the ssh_config
file, restart the sshd daemon, and finally try SSH again.
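
The check described above can also be scripted. The following is only a minimal sketch in Python, not an OSCAR or OpenSSH tool: the function name forwards_x11_for_all_hosts is made up for illustration, and the parsing is deliberately simplified (it does not expand Include files or evaluate Match conditions). It reports whether ssh_config-style text sets ForwardX11 to yes, either globally or inside a "Host *" block.

```python
def forwards_x11_for_all_hosts(config_text):
    """Return True if ssh_config-style text enables ForwardX11 for all hosts.

    Hypothetical helper for illustration; simplified parsing only.
    """
    # Options that appear before the first Host line apply to every host.
    applies_to_all = True
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Keyword and value may be separated by whitespace or a single '='.
        parts = line.replace("=", " ", 1).split()
        if not parts:
            continue
        keyword, args = parts[0].lower(), parts[1:]  # keywords are case-insensitive
        if keyword == "host":
            applies_to_all = "*" in args
        elif keyword == "match":
            applies_to_all = False  # Match blocks are not evaluated in this sketch
        elif keyword == "forwardx11" and applies_to_all:
            # The first value found for an option is the one ssh uses.
            return bool(args) and args[0].lower() == "yes"
    return False
```

To use it, read the contents of /etc/ssh/ssh_config on the node into a string and pass that string to the function. A result of False suggests the two lines shown above still need to be added.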

Cheers,

Jose

On Fri, 2004-09-17 at 23:27, Brandi Winfrey wrote:
> I found an email in the archives that described a similar problem, but could
> not find the answer to it.  I had a cluster of 9 computers that worked fine.
> I just added 6 more nodes to the cluster.  All nodes were added successfully,
> the networking went fine, and even the "Complete the Cluster Setup" section
> passed.  When I started the "Test Cluster Setup", it first had issues with
> SSH that had to do with passwords and "man-in-the-middle" attacks.  I know
> this is because I used ssh from one node to another before I should have and
> the keys didn't match.  I wasn't exactly sure how to fix this, so I just
> deleted the known_hosts file in the .ssh directory and ran
> /opt/opium/bin/sync_users --force to try to reset all of the passwords.  I
> was also getting prompted for a password on the Test Cluster Setup even
> though I didn't set a password.
> 
> After doing this, I can now pass the SSH pingtest, the SSH server->node, and
> the SSH node->server tests.  I can't go any further than this without
> failing.  There is an error that there aren't enough free nodes.
> 
> I quit the test, and try a few things...
> 
> I CAN ssh to and from all of the nodes, but I get the following warning:
>     "Warning: No xauth data; using fake authentication data for X11 forwarding."
> My /etc/hosts file looks fine.
> 
> /etc/hosts:---------------------------------------------------------------------------------
> 
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 10.0.0.100 oscar oscar.oscardomain oscar_server nfs_oscar pbs_oscar
> 127.0.0.1       localhost       localhost.oscardomain   localhost
> 129.162.79.58   oscar   oscar.geophysics.swri.edu
> 
> # These entries are managed by SIS, please don't modify them.
> 10.0.0.1             oscarnode1.oscardomain     oscarnode1
> 10.0.0.10            oscarnode10.oscardomain    oscarnode10
> 10.0.0.11            oscarnode11.oscardomain    oscarnode11
> 10.0.0.12            oscarnode12.oscardomain    oscarnode12
> 10.0.0.13            oscarnode13.oscardomain    oscarnode13
> 10.0.0.14            oscarnode14.oscardomain    oscarnode14
> 10.0.0.2             oscarnode2.oscardomain     oscarnode2
> 10.0.0.3             oscarnode3.oscardomain     oscarnode3
> 10.0.0.4             oscarnode4.oscardomain     oscarnode4
> 10.0.0.5             oscarnode5.oscardomain     oscarnode5
> 10.0.0.6             oscarnode6.oscardomain     oscarnode6
> 10.0.0.7             oscarnode7.oscardomain     oscarnode7
> 10.0.0.8             oscarnode8.oscardomain     oscarnode8
> 10.0.0.9             oscarnode9.oscardomain     oscarnode9
> -------------------------------------------------------------------------------------------------
> 
> If I run pbsnodes -a, all of the nodes show up with the following
> (substitute the correct node number where the 13 is):
>      oscarnode13.oscardomain
>      state = job-exclusive
>      np = 1
>      properties = all
>      ntype = cluster
>      jobs = 0/2.oscar
> 
> Oh, the job-exclusive and 0/2.oscar entries above are probably because I
> have a run currently executing on nodes 1-8, which still work correctly.
> The only nodes that I can't get to cluster are the new nodes 9-14.
> 
> If I execute ifconfig -a, all of the ethernet cards are UP
> 
> The problem seems to be with MPI.  When I run MPI I get the following error
> (only with the new nodes):
> 
> rm_905:  p4_error: Could not gethostbyname for host oscarnode9; may be invalid name
> : 61
> bm_list_1264: (81.330047) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330446) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330694) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.330941) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.331366) wakeup_slave: unable to interrupt slave 0 pid 1263
> bm_list_1264: (81.331619) wakeup_slave: unable to interrupt slave 0 pid 1263
> p9_762: (68.906620) net_recv failed for fd = 3
> p9_762:  p4_error: net_recv read, errno = : 104
> p13_691: (61.793863) net_recv failed for fd = 3
> p13_691:  p4_error: net_recv read, errno = : 104
> p14_691: (60.231699) net_recv failed for fd = 3
> p14_691:  p4_error: net_recv read, errno = : 104
> p11_691: (65.034232) net_recv failed for fd = 3
> p11_691:  p4_error: net_recv read, errno = : 104
> p12_691: (63.368020) net_recv failed for fd = 3
> p12_691:  p4_error: net_recv read, errno = : 104
> p10_694: (66.745358) net_recv failed for fd = 3
> p10_694:  p4_error: net_recv read, errno = : 104
> 
> 
> I looked at the file <eth_module>.o in
> /lib/modules/<kernel-version>/kernel/drivers/net on the master and on the
> nodes.  They are not the same, but when I look at this file for the first 8
> nodes that are working correctly, they are also not the same.  Despite this,
> I took some advice from the archives and copied the master's <eth_module>.o
> file to the nodes (only the 6 new nodes) and rebooted the nodes.  This seems
> to have done nothing.  Since I saved the original files, I'll probably just
> put it back the way it was.
> 
> I looked at the /etc/exports and /etc/fstab files.  I didn't see anything
> wrong there.
> 
> /etc/fstab (nodes):-------------------------------------------------------------------
> 
> /dev/hda6       /       ext2    defaults        1       2
> /dev/hda5       swap    swap    defaults        0       0
> /dev/hda1       /boot   ext2    defaults        1       2
> /dev/fd0        /mnt/floppy     auto    noauto,owner    0       0
> none    /dev/pts        devpts  defaults        0       0
> none    /proc   proc    defaults        0       0
> nfs_oscar:/home /home   nfs     rw      0       0
> 
> /etc/fstab (master):---------------------------------------------------------------------
> 
> LABEL=/                 /                       ext3    defaults        1 1
> LABEL=/boot             /boot                   ext3    defaults        1 2
> none                    /dev/pts                devpts  gid=5,mode=620  0 0
> none                    /proc                   proc    defaults        0 0
> none                    /dev/shm                tmpfs   defaults        0 0
> /dev/hda3               swap                    swap    defaults        0 0
> /dev/cdrom              /mnt/cdrom              udf,iso9660 noauto,owner,kudzu,ro 0 0
> /dev/hdd4               /mnt/zip                auto    noauto,owner,kudzu 0 0
> /dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0
> 
> /etc/exports (master -- the nodes don't have one):----------------
> 
> /home 10.0.0.100/255.255.255.0(async,rw,no_root_squash)
> 
> ---------------------------------------------------------------------------------------
> 
> Do you have any suggestions on how to fix this?
> 
> Thank you,
> Brandi
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by: YOU BE THE JUDGE. Be one of 170
> Project Admins to receive an Apple iPod Mini FREE for your judgement on
> who ports your project to Linux PPC the best. Sponsored by IBM.
> Deadline: Sept. 24. Go here: http://sf.net/ppc_contest.php
> _______________________________________________
> Oscar-users mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/oscar-users

