Hi Costel,
Don't worry about the disable service opkg. If iptables is disabled, then it is OK.
If I'm correct, your nodes are on a private network connected to eth1 on your
head (and eth0 is on the public network).
If this is the case, and if I remember my old cluster correctly (it had the same
architecture), the pbs_oscar entry in /etc/hosts should point to the eth1 IP.
Check (on the head *and* on the nodes) that /etc/torque/server_name contains a
hostname that can be resolved by all nodes and points to the eth1 IP. Check
that /etc/hosts in the image, on the nodes and on the head has the correct entry
for pbs_oscar (or the host that is in /etc/torque/server_name).
Then restart all pbs_mom, trqauthd and pbs_server services.
If that doesn't fix the issue, as a last resort, check the output of the
hostname command on the nodes and try to use that in
/var/lib/torque/server_priv/nodes. If the hostnames are not correct, fix that in
/etc/sysconfig/network.
Beyond that I don't have any more ideas.
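The server_name check above can be scripted. Here is a minimal sketch in Python,
assuming you paste in the contents of /etc/torque/server_name and /etc/hosts
from the head or a node. The function names, the host names other than
pbs_oscar, and the 10.0.0.1 address are made up for illustration; replace them
with your real eth1 IP.

```python
def parse_hosts(text):
    """Map every name/alias in /etc/hosts-style text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry found for a name
    return table


def check_server_name(server_name_text, hosts_text, expected_ip):
    """Return (ok, resolved_ip) for the name stored in server_name."""
    name = server_name_text.strip()
    resolved = parse_hosts(hosts_text).get(name)
    return resolved == expected_ip, resolved


if __name__ == "__main__":
    # Example data only: 10.0.0.1 stands in for the eth1 address (assumption).
    hosts = """
    127.0.0.1   localhost
    10.0.0.1    head-internal pbs_oscar    # eth1, private network
    192.0.2.10  head.example.com head      # eth0, public network
    """
    print(check_server_name("pbs_oscar\n", hosts, "10.0.0.1"))
```

It only looks at the text of the hosts file, so it will not catch a name that
is resolved differently through DNS; run it against the files from the image,
a node and the head to compare them.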
Best regards,
PS: Why did you have to manually edit the nodes file? Did step 7 fail to
set that up correctly? I almost completely rewrote the torque setup post install
and handled many previously unhandled error situations... It seems that I missed some :(
(If you can send me the log of the torque post install, it may help me.)
Olivier.
--
Olivier LAHAYE
CEA DRT/LIST/DCSI/DIR
________________________________
From: Costel Seitan [csei...@slb.com]
Sent: Wednesday, March 13, 2013 16:08
To: oscar-users@lists.sourceforge.net
Cc: LAHAYE Olivier
Subject: RE: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in
all situations:-) and new oscar-utils package.
Olivier,
I am not sure I selected the disable service opkg... I do not really remember.
I checked line by line:
/var/lib/torque/server_priv/nodes : I created it myself and added the hostnames
of all present and future nodes, one per line.
/etc/torque/server_name: contains "pbs_oscar" on all the nodes and the master.
I ran cexec iptables -L and it seems to be disabled. I even did telnet masternode 15001
and it looks OK.
I restarted pbs_mom on nodes and pbs_server several times. I also restarted
trqauthd processes.
munge is running fine on all nodes and the server.
I changed the log level and the messages are more complete now. It looks like a
host resolution problem:
03/13/2013 15:51:28;0004;PBS_Server.4105;Svr;authenticate_user;Hosts do not
match: Requested host <eth0_hostname>: credential host: <eth1_hostname>
Where
eth0_hostname is the first name appearing in the /etc/hosts file for the
master (on the same line as pbs_server)
and
eth1_hostname is the FQDN (the DNS hostname) of the master as seen from
outside the cluster.
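To compare the two names across several such log lines, a small parser can
help. This is only a sketch: it assumes the message has exactly the shape shown
above, and the function name parse_mismatch and the host names in the example
are made up for illustration.

```python
import re

# Matches the "Hosts do not match" message as it appears in the log line above.
HOST_MISMATCH = re.compile(
    r"Hosts do not match: Requested host (?P<requested>\S+?): "
    r"credential host: (?P<credential>\S+)"
)


def parse_mismatch(line):
    """Return (requested_host, credential_host), or None if the line differs."""
    line = " ".join(line.split())  # undo any wrapping added by the mail client
    match = HOST_MISMATCH.search(line)
    if match is None:
        return None
    return match.group("requested"), match.group("credential")


if __name__ == "__main__":
    # Hypothetical host names standing in for the real ones.
    sample = (
        "03/13/2013 15:51:28;0004;PBS_Server.4105;Svr;authenticate_user;"
        "Hosts do not\nmatch: Requested host head-a: "
        "credential host: head.example.com"
    )
    print(parse_mismatch(sample))
```

The two names returned are the ones to look up in /etc/hosts and in
/etc/torque/server_name on the master and on the nodes.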
Kind Regards,
Costel
From: LAHAYE Olivier [mailto:olivier.lah...@cea.fr]
Sent: Wednesday, March 13, 2013 2:27 PM
To: Costel Seitan
Cc: oscar-users@lists.sourceforge.net
Subject: [Oscar-users] RE : RE : OSCAR unstable News: yume finaly WORKS in all
situations:-) and new oscar-utils package.
Did you select the disable service opkg? I don't remember if I recommended it.
It'll disable iptables if my memory is correct.
Can you check /var/lib/torque/server_priv/nodes?
Can you check /etc/torque/server_name?
Anyway, can you check that iptables is disabled on the nodes?
Can you restart pbs_mom on the nodes and pbs_server on the head?
Can you check that munge is running on the head and the nodes?
What does /opt/pbs/bin/pbsnodes report?
Note that it is recommended to avoid running step 7 unless all nodes are up
and running. I've fixed many post install scripts so they can be run multiple
times, but some things can only be run once. Example: cexec will
automatically disable nodes that are in /etc/c3.conf and that fail to respond.
There is no command to automatically re-enable dead nodes (I've asked for the
feature upstream and received positive feedback, but no timeframe for its
availability).
Best regards,
Olivier.
PS: I forgot to reply to oscar-users the first time, but I think it may be of
use to other OSCAR users, so I am posting my answer to the list again. Please accept
my apologies for that.
--
Olivier LAHAYE
CEA DRT/LIST/DCSI/DIR