Hmmm, the hosts file was definitely wrong.

On Wednesday 21 December 2005 23:47, Johnston Michael J Contr AFRL/DES wrote:
> Okay, I've checked all the items you suggested and what I have looks
> correct. 
> 
> "pbsnodes -a" reports that all my nodes are available.

Can you post the output for one host, please?

> My /var/spool/pbs/server_name doesn't have an IP in it.  It does have
> pbs_oscar and since I removed the nfs_oscar and pbs_oscar from the end of
> the line for each node in the /etc/hosts file, they all ping the master
> node.

That should be fine.


> My /etc/hosts file looks correct.  The server name is not on the same line
> as the localhost line.
> 
> The /var/spool/pbs/mom_priv file looks fine on all my nodes.

Is the output of the hostname command the string which you have in the
/var/spool/pbs/mom_priv/config?
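
If you want to check that on every node without opening the file by hand, here
is a minimal sketch. It assumes the mom config names the server on a
"$clienthost" or "$pbsserver" line (which of the two you have depends on your
PBS flavour), and the function name is made up for illustration. Run it on a
compute node and compare what it prints with the output of "hostname" on the
master.

```python
def server_names_in_mom_config(text):
    """Return the host names found on $clienthost / $pbsserver lines.

    Assumption: the server is named by one of these two directives;
    other lines (comments, $logevent, ...) are ignored.
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("$clienthost", "$pbsserver"):
            names.append(fields[1])
    return names


if __name__ == "__main__":
    # Path as used in this thread; adjust if your spool directory differs.
    path = "/var/spool/pbs/mom_priv/config"
    try:
        with open(path) as fh:
            print("server name(s) in mom config:",
                  server_names_in_mom_config(fh.read()))
    except OSError as exc:
        print("cannot read", path, "-", exc)
```

Any name printed here that differs from the master's hostname is worth a
closer look.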

> I've been flip-flopping with pbs_sched and maui just to see if I can get one
> to work.  In the process of that and while pbs_sched was on I noticed a line
> in one of my logs.
> 
> #######################
> Pbs_sched;Job;35.domain.com;Not enough of the right type of nodes available.
> #######################

This looks bad. The job seems to be named "35.domain.com". This sounds like
your master node had the hostname domain.com when it started the
pbs_server. I doubt that's the name you want to use...
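
To make the reasoning explicit: I am assuming the usual job ID layout of
"<sequence number>.<server name>", so everything after the first dot is the
name the server believed it had. A small sketch (the helper name is
hypothetical) that pulls the two parts apart:

```python
def split_job_id(job_id):
    """Split '<sequence>.<server>' into (sequence number, server name).

    Assumption: the job ID uses the usual '<number>.<server name>' layout.
    """
    seq, sep, server = job_id.partition(".")
    if not sep or not seq.isdigit() or not server:
        raise ValueError("unexpected job ID format: %r" % job_id)
    return int(seq), server


seq, server = split_job_id("35.domain.com")
print(seq, server)  # prints: 35 domain.com
```

If the server part is not the name you expect the master to have inside the
cluster, the hostname setup is the first thing to look at.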

Once again: the ethernet interface internal to the cluster must carry the name
which your master node reports as its hostname. It is also best to have that
name resolve to the internal address first, i.e. list it before the external
one in /etc/hosts. For example:

10.10.100.100 master.domain.com master oscar_server pbs_server nfs_server
1.2.3.4       your.external.ip

The master node should print "master.domain.com" when you type
"hostname".
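
A rough way to test the ordering, as a sketch only: it looks at /etc/hosts-style
text alone and ignores DNS and any other lookup source, so it is not a full
picture of how your resolver behaves. The function names are made up, and the
localhost line in the sample is my addition to the example above.

```python
def first_address_for(name, hosts_text):
    """Return the address on the first hosts line that lists `name`."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None


def is_loopback(address):
    """True for 127.x.x.x and ::1, which the server name should not map to."""
    return address.startswith("127.") or address == "::1"


# Sample text: the two example lines plus an assumed localhost line.
sample = """\
127.0.0.1     localhost.localdomain localhost
10.10.100.100 master.domain.com master oscar_server pbs_server nfs_server
1.2.3.4       your.external.ip
"""

addr = first_address_for("master.domain.com", sample)
print(addr, is_loopback(addr))  # prints: 10.10.100.100 False
```

To run it against the real file, pass the contents of /etc/hosts and the
output of "hostname" instead of the sample values.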

> The command I'm running is:
> "qsub -l nodes=1:ppn:1,walltime=30:00:00 /path/to/job.pbs"

What exact job ID does the job have?
Could you try with only "-lnodes=1" to make sure that it's not some resource
issue?

> If I turn off pbs_serv and start maui the job will submit but won't do
> anything.  I get the connection refused errors.

Check on the master which addresses/ports the daemons are bound to:

netstat -pant | egrep 'pbs|maui'
(as root)

I hope that helps...

Regards,
Erich



_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users