Hi Bernard,

The address 10.0.1.10 is for my intranet network card (eth1), and I used ./install_cluster eth1 for the installation.

BTW, I'd like to clarify what I meant by "the hostname for the network should be the same in the
PBSserver file". My understanding is that the hostname (which one can get from % echo $HOSTNAME) should be on the
same line as the name of the PBSserver (pbs_oscar) in the file /etc/hosts. In the example below, my $HOSTNAME
was abc.ntu.edu.tw.

Thanks for your suggestions as well.

Shiang-Tai


Bernard Li wrote:
Hi Shiang-Tai:

Good work!  This may be a potential bug that we might have to look into.

One question though, assuming that 10.0.1.10 is the address of your eth1
interface, have you always been running:

./install_cluster eth1

?  Have you ever run install_cluster on eth0?

Thanks,

Bernard 

  
-----Original Message-----
From: Shiang-Tai Lin [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, January 11, 2005 20:53
To: Yu Chen
Cc: Bernard Li; [email protected]
Subject: Re: [Oscar-users] PBS configuration failure during 
post_install (OSCAR4+FC2)

Hi,
I finally figured out the problem with the Torque post_install
failure. The problem (at least for me) was that the hostname for
the network was not the same as in the PBSserver file. That is, the
name of the PBSserver, pbs_oscar (defined in
/var/spool/pbs/server_name), was not defined on the hostname's line
in the file /etc/hosts. The two should always match if only one
network card is installed, but they may be set up incorrectly if
there are additional network cards on the server node. To be clear,
I list my /etc/hosts file here (the xx are numbers I do not wish to
disclose for security reasons):

# Do not remove the following line, or various programs
# that require network functionality will fail.
10.0.1.10      abc.ntu.edu.tw  abc oscar_server nfs_oscar pbs_oscar
140.112.xx.xx def.ntu.edu.tw def

# These entries are managed by SIS, please don't modify them.
10.0.1.1        node1.abc.ntu.edu.tw    node1
10.0.1.2        node2.abc.ntu.edu.tw    node2

Originally I had "10.0.1.10      def.ntu.edu.tw  def oscar_server
nfs_oscar pbs_oscar" in /etc/hosts, so pbs_oscar was not on the
line of my $HOSTNAME and the post_install always failed.
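
For anyone who wants to check this before rerunning post_install, here is a rough sketch of the test described above: it succeeds only if a hostname and a server name both appear as names on the same line of a hosts-format file. The function name same_hosts_line is made up for illustration and is not part of OSCAR or Torque.

```shell
# Hypothetical helper (not part of OSCAR or Torque): succeed if HOST and
# SERVER both appear as names on one non-comment line of a hosts file.
same_hosts_line() {
    hosts_file=$1
    host=$2
    server=$3
    awk -v h="$host" -v s="$server" '
        { sub(/#.*/, "") }                 # strip comments
        NF < 2 { next }                    # skip blank or address-only lines
        {
            found_h = 0; found_s = 0
            for (i = 2; i <= NF; i++) {    # field 1 is the IP address
                if ($i == h) found_h = 1
                if ($i == s) found_s = 1
            }
            if (found_h && found_s) { ok = 1; exit }
        }
        END { exit ok ? 0 : 1 }
    ' "$hosts_file"
}
```

On the server node this could be called as same_hosts_line /etc/hosts "$HOSTNAME" "$(cat /var/spool/pbs/server_name)", assuming server_name contains only the server name as in my setup. A non-zero exit status would point to the mismatch described above.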

I also want to note that it is not necessary to edit the 
nodes file /var/spool/pbs/server_priv/nodes. The post_install 
script can still find the nodes without this file.

Finally, I'd like to point out that I figured this out after
reading the following paragraph, which I found at
http://www.mail-archive.com/[email protected]/msg01387.html

"If the primary name on the interface is not the name in the
PBSserver file, you will get an "Unauthorized Request"
error when you attempt to configure the server with qmgr."
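
To see which nodes are affected, something like the following sketch may help. It assumes the messages look like the ones quoted further down in this thread ("qmgr obj=<node> svr=default: Unauthorized Request"); the function name unauthorized_nodes is made up for illustration.

```shell
# Hypothetical filter: read post_install output on stdin and print each
# node name that appears in an "Unauthorized Request" message, once.
unauthorized_nodes() {
    sed -n 's/.*qmgr obj=\([^ ]*\) svr=[^:]*: Unauthorized Request.*/\1/p' |
        sort -u
}
```

It could be used as ./post_install 2>&1 | unauthorized_nodes. An empty result would suggest the qmgr requests are no longer being rejected.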


Thanks to all, especially Bernard, who tried to help me out.

Shiang-Tai

Yu Chen wrote:

    
Hello,

I can confirm Shiang-Tai's finding. The same thing happened to me,
although on a different system: I am using RH-EL-AS-3 update 3 on i386.

The error messages are the same. I thought it was the
"/opt/pbs/bin/pbsnodes: Server has no node list" problem, so I created
the /var/spool/pbs/server_priv/nodes file manually, restarted
pbs_server, and then ran "pbs_postinstall". Now there is no
"/opt/pbs/bin/pbsnodes: Server has no node list" message, but there are
still tons of "qmgr obj=node2.cl.hhmi.umbc.edu svr=default: Unauthorized
Request" messages from each node.

This is my first time playing with PBS. Does anyone have any ideas on
this? Maybe something has to be done on the nodes? BTW, I can ssh to
any node without a password without problem.

Chen


On Fri, 7 Jan 2005, Bernard Li wrote:

      
Hi Shiang-Tai:

You should not need to select anything in Step 1, since Torque should
be selected by default.  If you need to select it manually, then
something is wrong.

Can you run the following command and paste the output here?  Run it
2 times at least:

% cd /opt/oscar/packages/torque/scripts
% ./post_install

also:

% qmgr -c "print server"

You might also want to check the Torque logs to see what is going on:

/var/spool/pbs/server_logs/pbs_server.log

Cheers,

Bernard

        
===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone:     (410)455-6347 (primary)
    (410)455-2718 (secondary)
fax:     (410)455-1174
email:     [EMAIL PROTECTED]
===========================================

      

  
