Okay this sounds like a bug then - we need to make sure we
are appending the pbs_oscar host alias to the correct
entry/ip...
Can you file a bug in sourceforge?
(http://www.sf.net/projects/oscar)
Thanks,
Bernard
Hi Bernard,
The address 10.0.1.10 was for my intranet
network card (eth1), and I used ./install_cluster eth1 for the
installation.
BTW, I'd like to clarify what I meant by
"the hostname for the network should be the same in the PBSserver file".
My understanding is that the hostname (which one can get from % echo $HOSTNAME)
should be on the same line as the name of the PBS server (pbs_oscar) in the
file /etc/hosts. In the example below, my $HOSTNAME was
abc.ntu.edu.tw.
Thanks for your suggestions as well.
Shiang-Tai
Bernard Li wrote:
Hi Shiang-Tai:
Good work! This may be a potential bug that we might have to look into.
One question though, assuming that 10.0.1.10 is the address of your eth1
interface, have you always been running:
./install_cluster eth1
? Have you ever run install_cluster on eth0?
Thanks,
Bernard
-----Original Message-----
From: Shiang-Tai Lin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 11, 2005 20:53
To: Yu Chen
Cc: Bernard Li; [email protected]
Subject: Re: [Oscar-users] PBS configuration failure during
post_install (OSCAR4+FC2)
Hi,
I finally figured out the problem with the Torque post_install
failure.
The problem (at least for me) was that
the hostname for the network was not the same as in the PBSserver file.
That is, the name of the PBS server,
pbs_oscar (defined in /var/spool/pbs/server_name), was not
defined on the same /etc/hosts line as the server's hostname.
The two should always match if the server has only one network
card, but they may be set up incorrectly if there are additional
network cards on the server node. To be clearer, I list my
/etc/hosts file here
(the xx are numbers I do not wish to disclose for security reasons):
# Do not remove the following line, or various programs
# that require network functionality will fail.
10.0.1.10 abc.ntu.edu.tw abc oscar_server nfs_oscar pbs_oscar
140.112.xx.xx def.ntu.edu.tw def
# These entries are managed by SIS, please don't modify them.
10.0.1.1 node1.abc.ntu.edu.tw node1
10.0.1.2 node2.abc.ntu.edu.tw node2
Originally I had "10.0.1.10 def.ntu.edu.tw def oscar_server
nfs_oscar pbs_oscar" in the /etc/hosts
so the post_install always failed.
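As a rough illustration of the check described above, the following shell sketch tests whether a hosts file has a line that carries both the PBS server name and the machine's own hostname. The function name check_pbs_alias and its argument order are hypothetical; this is not part of OSCAR or Torque, only a sketch under the assumption that the hosts file uses the usual "IP name alias..." layout.

```shell
#!/bin/sh
# check_pbs_alias HOSTS_FILE SERVER_NAME HOST_NAME   (hypothetical helper)
# Exits 0 when some non-comment line of HOSTS_FILE lists both SERVER_NAME
# and HOST_NAME among its names; exits 1 otherwise.
check_pbs_alias() {
    awk -v srv="$2" -v host="$3" '
        /^[ \t]*#/ { next }               # skip comment lines
        {
            has_srv = 0; has_host = 0
            for (i = 2; i <= NF; i++) {   # field 1 is the IP address
                if ($i == srv)  has_srv = 1
                if ($i == host) has_host = 1
            }
            if (has_srv && has_host) { found = 1; exit }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Possible usage on the server node (paths as mentioned in this thread):
#   check_pbs_alias /etc/hosts "$(cat /var/spool/pbs/server_name)" "$(hostname)" \
#       && echo "names share a line" || echo "names are on different lines"
```

With my original /etc/hosts, a check like this would have reported a mismatch, because pbs_oscar was on the def.ntu.edu.tw line rather than the abc.ntu.edu.tw line.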
I also want to note that it is not necessary to edit the
nodes file /var/spool/pbs/server_priv/nodes. The post_install
script can still find the nodes without this file.
Finally, I'd like to point out that I figured this out after
reading the following paragraph I found in
http://www.mail-archive.com/[email protected]/msg01387.html
"If the primary name on the interface is not the name in the
PBSserver file, you will get an "Unauthorized Request"
error when you attempt to configure the server with qmgr."
Thanks to all, especially Bernard, who tried to help me out.
Shiang-Tai
Yu Chen wrote:
Hello,
I can confirm Shiang-Tai's finding. The same thing happened to me,
although on a different system: I am using RH-EL-AS-3 update 3 on i386.
The error messages are the same. I thought it was the
"/opt/pbs/bin/pbsnodes: Server has no node list" problem, so I created
the /var/spool/pbs/server_priv/nodes file manually, restarted
pbs_server, and then ran "pbs_postinstall". Now there is no
"/opt/pbs/bin/pbsnodes: Server has no node list" message, but I still
get tons of "qmgr obj=node2.cl.hhmi.umbc.edu svr=default: Unauthorized
Request" messages, one for each node.
This is my first time playing with PBS. Does anyone have any ideas on
this? Maybe something has to be done on the nodes? BTW, I can ssh to
any node without a password and without problems.
Chen
On Fri, 7 Jan 2005, Bernard Li wrote:
Hi Shiang-Tai:
You should not need to select anything in Step 1, since Torque should
be selected by default. If you need to select it manually, then
something is wrong.
Can you run the following command and paste the output here? Run it
at least 2 times:
% cd /opt/oscar/packages/torque/scripts
% ./post_install
also:
% qmgr -c "print server"
You might also want to check the Torque logs to see what
is going on:
/var/spool/pbs/server_logs/pbs_server.log
Cheers,
Bernard
===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County 1000 Hilltop Circle
Baltimore, MD 21250
phone: (410)455-6347 (primary)
(410)455-2718 (secondary)
fax: (410)455-1174
email: [EMAIL PROTECTED]
===========================================