Wow, this list is alive? I joined 5 or 6 months ago and this post is
the first traffic I got! Sorry, I am a noob here so can't help you.
Just wondering if anyone else is out there or if there's a better list
to join re OSCAR and HPC?
Thanx,
A. Jorge Garcia
Applied Math & CS
http://shadowfaxrant.blogspot.com
Sent from my iPod
On Jun 18, 2010, at 9:25 AM, Marion Randall <ranmar...@gmail.com> wrote:
Hi!
I am not blaming OSCAR in any way. I just don't know where this
error could be originating from and I am trying to eliminate the
options.
I have a small cluster (OSCAR 5.1rc18047, Fedora 8) and I was able
to run some application software on it. Then lightning struck very
close to the building. Fortunately I had unplugged all the power
cables (because the cluster has not yet been moved to where the
power lines are protected), but it seems that the institution didn't
have any protection on their intranet cables, and so the whole
building's public network cards are damaged. A costly lesson.
Anyway, when I try to run the application software in parallel
across the cluster (using the private network, which is unscathed), I
get the following error message:
bind: Cannot assign requested address.
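For reference, this message is the text of the errno EADDRNOTAVAIL,
which bind() typically returns when a program asks for a local address
that no interface on the machine currently holds. Below is a minimal
Python sketch of that check. The helper name try_bind and the example
address 192.0.2.1 (a reserved documentation address) are illustrative
assumptions and have nothing to do with the application software itself.

```python
import errno
import socket


def try_bind(address, port=0):
    """Try to bind a TCP socket to (address, port).

    Returns None on success, or the symbolic errno name (for example
    "EADDRNOTAVAIL") if bind() fails. Port 0 lets the kernel pick a
    free port, so a failure points at the address, not the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return None
    except OSError as exc:
        # errno.errorcode maps the numeric errno to its symbolic name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # Loopback is normally bindable on any machine.
    print("127.0.0.1 ->", try_bind("127.0.0.1"))
    # 192.0.2.1 should not be configured locally, so on a typical
    # Linux host this is expected to report EADDRNOTAVAIL.
    print("192.0.2.1 ->", try_bind("192.0.2.1"))
```

Calling try_bind() with the address that a node's own hostname
resolves to would show whether that address can be bound at all.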
I contacted the application software's help department as I thought
I had perhaps forgotten to set something, but according to them it
is a normal network problem. They gave some suggestions as to what
the problem may be, but I have checked them and they don't cure the
problem.
I have included them here so that you don't waste time by suggesting
the same things.
Quote:
Check the /etc/hosts file and make sure that the nodes all have a
single definition and you don't have lines like
127.0.0.1 localhost normnode3
and that normnode3 has the same address both on the master and on the
node.
You can try
ping normnode3
from the master and see what address comes back
64 bytes from 164.190.57.105: icmp_seq=1 ttl=64 time=0.306 ms
or is it 127.0.0.1. Then do the reverse.
Also double check that you can ssh between nodes without a password,
but I would expect a different error then.
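The /etc/hosts check suggested above can also be scripted. The
following is only a sketch in Python: the function names parse_hosts,
is_loopback and suspicious_entries are made up for illustration, and
the sample text is built from the example line in the quote, not from
a real file. It flags any name that is listed with more than one
address or with a loopback address.

```python
from collections import defaultdict
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its addresses."""
    names = defaultdict(set)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # skip blank lines
        address, *hostnames = line.split()
        for name in hostnames:
            names[name].add(address)
    return names


def is_loopback(address):
    """True if address is a valid IP literal in a loopback range."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False  # not a valid IP literal


def suspicious_entries(text, ignore=("localhost", "localhost.localdomain")):
    """Return {name: sorted addresses} for names that have several
    addresses or a loopback address, skipping the localhost names."""
    problems = {}
    for name, addresses in parse_hosts(text).items():
        if name in ignore:
            continue
        if len(addresses) > 1 or any(is_loopback(a) for a in addresses):
            problems[name] = sorted(addresses)
    return problems


if __name__ == "__main__":
    sample = """\
127.0.0.1 localhost normnode3
192.168.1.3 normnode3.ab.cx.yz normnode3
192.168.1.2 normnode2.ab.cx.yz normnode2
"""
    print(suspicious_entries(sample))
    # prints {'normnode3': ['127.0.0.1', '192.168.1.3']}
```

Running it against the real file on the master and on each node (for
example with open("/etc/hosts").read()) would cover the "single
definition" part of the advice; the ping test is still needed to see
what actually comes back.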
The command "hostname" returns gnlserv01, which is the public NIC.
After the lightning I had trouble getting the nodes to communicate
"automatically" with each other, but it can be cured by starting the
xinetd service (so that the nodes can boot across the network,
otherwise I get tftp errors) and disabling the firewall on the
master node (it's not too dangerous since I don't have a public
interface at present and since I'm sitting behind the institution's
firewall as well.)
Is there a service that I need to start, or some port that I need to
open?
Ganglia also doesn't work (doesn't show any stats) after the
lightning. But I guess that's because it was configured as
https://gnlserv01/ganglia and the public NIC is dead...?
Here is a copy of my /etc/hosts file:
Code:
# Do not remove the following line, or various programs
# that require network functionality will fail.
# These entries are managed by SIS, please don't modify them.
127.0.0.1 localhost.localdomain localhost
192.168.1.254 snode0.oscardomain.za snode0 oscar_server nfs_oscar pbs_oscar
abc.xyz.104.218 gnlserv01.ab.cx.yz gnlserv01
192.168.1.1 normnode1.ab.cx.yz normnode1
192.168.1.2 normnode2.ab.cx.yz normnode2
192.168.1.3 normnode3.ab.cx.yz normnode3
Here is the output of ifconfig -a:
Code:
[compc...@gnlserv01 /root]$ ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:1C:C0:AF:10:18
          inet addr:192.168.1.254  Bcast:192.168.255.255  Mask:255.255.0.0
          inet6 addr: fe80::21c:c0ff:feaf:1018/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2587 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3109 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:332943 (325.1 KiB)  TX bytes:409521 (399.9 KiB)
          Base address:0x20c0 Memory:e0300000-e0320000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:5009 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5009 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3813184 (3.6 MiB)  TX bytes:3813184 (3.6 MiB)
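One check that follows from the output above: "hostname" returns
gnlserv01, /etc/hosts maps that name to the public address, and
ifconfig lists only 192.168.1.254 and 127.0.0.1. Whether that mismatch
is behind the bind error is not established here, but it is easy to
test. The Python sketch below extracts the IPv4 addresses from
ifconfig-style text so they can be compared with whatever the hostname
resolves to. The names INET_ADDR, configured_addresses and
address_is_configured are illustrative, and the pattern assumes the
older net-tools "inet addr:" output format shown above.

```python
import re

# Matches the "inet addr:a.b.c.d" fields of classic net-tools ifconfig.
# It does not match "inet6 addr:" lines.
INET_ADDR = re.compile(r"inet addr:\s*(\S+)")


def configured_addresses(ifconfig_output):
    """Return the set of IPv4 addresses found in ifconfig -a style text."""
    return set(INET_ADDR.findall(ifconfig_output))


def address_is_configured(address, ifconfig_output):
    """True if address is assigned to some interface in the output."""
    return address in configured_addresses(ifconfig_output)


if __name__ == "__main__":
    sample = """\
eth0 Link encap:Ethernet HWaddr 00:1C:C0:AF:10:18
inet addr:192.168.1.254 Bcast:192.168.255.255 Mask:255.255.0.0
inet6 addr: fe80::21c:c0ff:feaf:1018/64 Scope:Link
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
"""
    print(sorted(configured_addresses(sample)))
    # prints ['127.0.0.1', '192.168.1.254']
```

If the address that gnlserv01 resolves to is not in that set, a
program that binds to the machine's own hostname would be asking for
an address the kernel cannot assign.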
I'm really clueless. I'm a chemist and I got this cluster to run
somehow, but it wasn't because I knew what I was doing.
I would greatly appreciate any suggestions and comments!
;-)
Rion
--
"For the Lord will not cast off forever, but, though He cause grief,
He will have compassion according to the abundance of His steadfast
love; for He does not willingly afflict or grieve the children of
men." - Lamentations 3:31-33
---------------------------------------------------------------------
ThinkGeek and WIRED's GeekDad team up for the Ultimate
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the
lucky parental unit. See the prize list and enter to win:
http://p.sf.net/sfu/thinkgeek-promo
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users