Hi,
I finally figured out the problem with the Torque post_install failure. The problem (at least for me) was that
the hostname for the network did not match the name in the PBSserver file. That is, the name of the PBSserver,
pbs_oscar (defined in /var/spool/pbs/server_name), was not correctly defined in the file /etc/hosts. It should
always be correct if only one network card is installed, but it may be set up incorrectly if there are additional
network cards on the server node. To be clearer, I list my /etc/hosts file here (the xx are numbers I do not
wish to disclose for security reasons):


# Do not remove the following line, or various programs
# that require network functionality will fail.
10.0.1.10      abc.ntu.edu.tw  abc oscar_server nfs_oscar pbs_oscar
140.112.xx.xx def.ntu.edu.tw def

# These entries are managed by SIS, please don't modify them.
10.0.1.1        node1.abc.ntu.edu.tw    node1
10.0.1.2        node2.abc.ntu.edu.tw    node2

Originally I had "10.0.1.10 def.ntu.edu.tw def oscar_server nfs_oscar pbs_oscar" in /etc/hosts,
so post_install always failed.
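
For anyone who wants to check this quickly, here is a rough Python sketch (my own illustration, not part of OSCAR or Torque) that parses hosts-file text and reports which address and primary name an alias such as pbs_oscar lands on. The helper names parse_hosts and lookup are made up for this example; the two sample entries are the working and the original lines quoted above.

```python
def parse_hosts(text):
    """Return a list of (ip, primary_name, aliases) tuples from hosts-file text."""
    entries = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        entries.append((fields[0], fields[1], fields[2:]))
    return entries


def lookup(entries, name):
    """Return (ip, primary_name) for the first entry that mentions name, else None."""
    for ip, primary, aliases in entries:
        if name == primary or name in aliases:
            return ip, primary
    return None


# The working line and the original (failing) line from this message.
WORKING = "10.0.1.10      abc.ntu.edu.tw  abc oscar_server nfs_oscar pbs_oscar\n"
BROKEN = "10.0.1.10 def.ntu.edu.tw def oscar_server nfs_oscar pbs_oscar\n"

for label, text in (("working", WORKING), ("broken", BROKEN)):
    print(label, lookup(parse_hosts(text), "pbs_oscar"))
```

In both cases pbs_oscar maps to 10.0.1.10; what differs is the primary name on that line (abc.ntu.edu.tw versus def.ntu.edu.tw), which is the part that mattered in my case.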


I also want to note that it is not necessary to edit the nodes file /var/spool/pbs/server_priv/nodes. The
post_install script can still find the nodes without this file.


Finally, I'd like to point out that I figured this out after reading the following paragraph I found in
http://www.mail-archive.com/[email protected]/msg01387.html


"If the primary name on the interface is not the name in the PBSserver file, 
you will get an "Unauthorized Request" error when you attempt to configure the server with
qmgr."
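
The condition in that paragraph can also be checked against the system resolver. The sketch below is only an illustration under my own assumptions: primary_name_matches is a hypothetical helper, and the expected primary name has to be supplied by the administrator. It relies on Python's socket.gethostbyname_ex, which returns the primary name, the alias list, and the address list for a host name.

```python
import socket


def primary_name_matches(server_name, expected_primary,
                         resolve=socket.gethostbyname_ex):
    """Resolve server_name and compare its primary name with expected_primary.

    Returns (matches, primary_name); primary_name is None if resolution fails.
    The resolve parameter exists only so the check can be tried without
    touching the real resolver.
    """
    try:
        primary, _aliases, _addresses = resolve(server_name)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False, None
    return primary == expected_primary, primary


def fake_resolve(name):
    """Stand-in resolver that mimics the original (failing) hosts entry."""
    if name in ("def.ntu.edu.tw", "def", "oscar_server", "nfs_oscar", "pbs_oscar"):
        return ("def.ntu.edu.tw", ["def", "oscar_server", "nfs_oscar", "pbs_oscar"],
                ["10.0.1.10"])
    raise OSError("unknown host: " + name)


print(primary_name_matches("pbs_oscar", "abc.ntu.edu.tw", resolve=fake_resolve))
```

On a real server node one would call primary_name_matches("pbs_oscar", "abc.ntu.edu.tw") without the resolve argument, with the names replaced by the local ones.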


Thanks to all, especially Bernard, who tried to help me out.

Shiang-Tai

Yu Chen wrote:

Hello,

I can confirm Shiang-Tai's finding; the same thing happened to me, although on a different system. I am using RH-EL-AS-3 update 3 on i386.

The error messages are the same. I thought it was the
"/opt/pbs/bin/pbsnodes: Server has no node list" problem,
so I created the /var/spool/pbs/server_priv/nodes file manually, restarted pbs_server, and then ran "pbs_postinstall". Now there is no
"/opt/pbs/bin/pbsnodes: Server has no node list" message, but there are still tons of "qmgr obj=node2.cl.hhmi.umbc.edu svr=default: Unauthorized Request" messages from each node.


This is my first time playing with PBS, so does anyone have any ideas on this? Maybe something has to be done on the nodes? BTW, I can ssh to any node without a password.
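
To tell the two failures described above apart in a long post_install run, a small tally can help. This is just a sketch: tally_errors is a hypothetical helper, and the only message fragments it knows are the two quoted in this thread.

```python
from collections import Counter

# The two message fragments quoted in this thread.
KNOWN_ERRORS = (
    "Server has no node list",
    "Unauthorized Request",
)


def tally_errors(output):
    """Count how many lines of output contain each known error fragment."""
    counts = Counter()
    for line in output.splitlines():
        for fragment in KNOWN_ERRORS:
            if fragment in line:
                counts[fragment] += 1
    return counts


# Sample lines copied from the messages quoted above.
sample = "\n".join([
    "/opt/pbs/bin/pbsnodes: Server has no node list",
    "qmgr obj=node2.cl.hhmi.umbc.edu svr=default: Unauthorized Request",
])
print(tally_errors(sample))
```

If only the "Unauthorized Request" count is nonzero after the nodes file exists, the hostname mismatch described by Shiang-Tai is the more likely suspect.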

Chen


On Fri, 7 Jan 2005, Bernard Li wrote:

Hi Shiang-Tai:

You should not need to select anything in Step 1, since Torque should be selected by default. If you need to select it manually, then something is wrong.

Can you run the following command and paste the output here? Run it 2 times at least:

% cd /opt/oscar/packages/torque/scripts
% ./post_install

also:

% qmgr -c "print server"

You might also want to check the Torque logs to see what is going on:

/var/spool/pbs/server_logs/pbs_server.log
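
If it is easier to collect everything in one go, the commands above could be wrapped roughly like this. It is only a sketch: diagnostic_commands and run_diagnostics are hypothetical helper names, and the paths are the ones given in this message, so adjust them if your install differs.

```python
import subprocess

# Directory given above for the Torque package scripts.
SCRIPTS_DIR = "/opt/oscar/packages/torque/scripts"


def diagnostic_commands(repeat=2):
    """Return (argv, working_directory) pairs: post_install `repeat` times, then qmgr."""
    commands = [(["./post_install"], SCRIPTS_DIR) for _ in range(repeat)]
    commands.append((["qmgr", "-c", "print server"], None))
    return commands


def run_diagnostics(repeat=2):
    """Run each command and return (argv, return_code, combined_output) tuples.

    If a command or directory is missing, return_code is None and the
    output field holds the error text instead.
    """
    results = []
    for argv, cwd in diagnostic_commands(repeat):
        try:
            proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
            results.append((argv, proc.returncode, proc.stdout + proc.stderr))
        except OSError as exc:
            results.append((argv, None, str(exc)))
    return results
```

Calling run_diagnostics() on the server node and pasting the returned output into a reply would cover the commands requested above; the log file still has to be read separately.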

Cheers,

Bernard



===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone:     (410)455-6347 (primary)
           (410)455-2718 (secondary)
fax:       (410)455-1174
email:     [EMAIL PROTECTED]
===========================================



_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
