Hi!
I am not blaming OSCAR in any way. I just don't know where this error could
be originating from and I am trying to eliminate the options.
I have a small cluster (OSCAR 5.1rc18047, Fedora 8) and I was able to run
some application software on it. Then lightning struck very close to the
building. Fortunately I had unplugged all the power cables (because the
cluster has not yet been moved to where the power lines are protected), but
it seems that the institution didn't have any protection on their intranet
cables, and so the whole building's public network cards are damaged. A
costly lesson.
Anyway, when I try to run the application software in parallel across the
cluster (using the private network, which is unscathed), I get the following
error message:
*bind: Cannot assign requested address.*
I contacted the application software's help department as I thought I had
perhaps forgotten to set something, but according to them it is a normal
network problem. They gave some suggestions as to what the problem might be,
but I have checked them and none of them cures the problem.
I have included them here so that you don't waste time by suggesting the same
things.
Quote:
Check the /etc/hosts file and make sure that the nodes all have a
single definition and you don't have lines like
127.0.0.1 localhost normnode3
and that normnode3 has the same address both on the master and on the
node.
You can try
ping normnode3
from the master and see what address comes back
64 bytes from 164.190.57.105: icmp_seq=1 ttl=64 time=0.306 ms
or is it 127.0.0.1. Then do the reverse.
Also double check that you can ssh between nodes without password
but I would expect a different error then.
The command "hostname" returns gnlserv01, which is the name assigned to the
public NIC.
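As a way to test this directly, here is a minimal sketch, assuming Python 3 is available on the master node (the function names can_bind and check_hostname are made up for this illustration and are not part of OSCAR or the application software). It resolves the name that "hostname" returns and then tries to bind a socket to the resulting address. The kernel refuses a bind with "Cannot assign requested address" when no configured interface carries that address, so this may help show whether the hostname still points at the dead public NIC.

```python
import socket


def can_bind(address):
    """Try to bind a UDP socket to `address` on an ephemeral port.

    Returns (True, "") on success, or (False, reason) when the kernel
    refuses, e.g. "Cannot assign requested address" when no local
    interface carries that address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, 0))  # port 0 lets the kernel pick a free port
        return True, ""
    except OSError as exc:
        return False, exc.strerror or str(exc)
    finally:
        sock.close()


def check_hostname():
    """Resolve this machine's hostname and report whether it is bindable."""
    name = socket.gethostname()
    try:
        address = socket.gethostbyname(name)
    except OSError as exc:
        return f"{name}: cannot resolve ({exc})"
    ok, reason = can_bind(address)
    status = "bind OK" if ok else f"bind failed: {reason}"
    return f"{name} -> {address}: {status}"


if __name__ == "__main__":
    print(check_hostname())
```

Calling can_bind("192.168.1.254") on the master would test the private address in the same way.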
After the lightning I had trouble getting the nodes to communicate
"automatically" with each other, but that was cured by starting the xinetd
service (so that the nodes can boot across the network; otherwise I get tftp
errors) and disabling the firewall on the master node (it's not too
dangerous since I don't have a public interface at present and since I'm
sitting behind the institution's firewall as well).
Is there a service that I need to start, or some port that I need to open?
Ganglia also doesn't work (doesn't show any stats) after the lightning. But
I guess it's because it was configured as https://gnlserv01/ganglia and the
public NIC is dead...?
Here is what my /etc/hosts file looks like:
Code:
# Do not remove the following line, or various programs
# that require network functionality will fail.
# These entries are managed by SIS, please don't modify them.
127.0.0.1 localhost.localdomain localhost
192.168.1.254 snode0.oscardomain.za snode0 oscar_server nfs_oscar pbs_oscar
abc.xyz.104.218 gnlserv01.ab.cx.yz gnlserv01
192.168.1.1 normnode1.ab.cx.yz normnode1
192.168.1.2 normnode2.ab.cx.yz normnode2
192.168.1.3 normnode3.ab.cx.yz normnode3
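The check suggested in the quote above (each node name defined once, and no node name on the 127.0.0.1 line) could be automated along these lines. This is only a sketch, assuming Python 3; parse_hosts, find_problems and SAMPLE are hypothetical names chosen for the illustration, and the sample text reproduces the bad case from the quote rather than my actual file.

```python
from collections import defaultdict

# Sample text reproducing the problem case described in the quoted advice.
SAMPLE = """\
127.0.0.1 localhost normnode3
192.168.1.3 normnode3.ab.cx.yz normnode3
"""


def parse_hosts(text):
    """Map each hostname or alias to the set of addresses it is listed under."""
    names = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names[name].add(address)
    return names


def find_problems(text):
    """Return (duplicates, loopback_names) for the given hosts-file text."""
    names = parse_hosts(text)
    # Names that appear under more than one address.
    duplicates = {n: sorted(a) for n, a in names.items() if len(a) > 1}
    # Names other than localhost* that sit on a 127.x.x.x address.
    loopback_names = sorted(
        n for n, a in names.items()
        if any(x.startswith("127.") for x in a) and not n.startswith("localhost")
    )
    return duplicates, loopback_names


if __name__ == "__main__":
    dups, loop = find_problems(SAMPLE)
    print("names with more than one address:", dups)
    print("non-localhost names on a loopback address:", loop)
```

To check a real file, pass its contents instead of SAMPLE, for example find_problems(open("/etc/hosts").read()), on the master and on each node.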
Here is the output of ifconfig -a:
Code:
[compc...@gnlserv01 /root]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:1C:C0:AF:10:18
inet addr:192.168.1.254 Bcast:192.168.255.255 Mask:255.255.0.0
inet6 addr: fe80::21c:c0ff:feaf:1018/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2587 errors:0 dropped:0 overruns:0 frame:0
TX packets:3109 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:332943 (325.1 KiB) TX bytes:409521 (399.9 KiB)
Base address:0x20c0 Memory:e0300000-e0320000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:5009 errors:0 dropped:0 overruns:0 frame:0
TX packets:5009 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3813184 (3.6 MiB) TX bytes:3813184 (3.6 MiB)
I'm really clueless. I'm a chemist and I got this cluster to run somehow,
but it wasn't because I knew what I was doing.
I would greatly appreciate any suggestions and comments!
;-)
Rion
--
"For the Lord will not cast off forever, but, though He cause grief, He will
have compassion according to the abundance of His steadfast love; for He
does not willingly afflict or grieve the children of men." - Lamentations
3:31-33