Hi,
after failing to run parallel jobs with the PBS installed by OSCAR (on a Fedora
Core 3 system), I removed everything PBS-related from the master and from a node,
and reinstalled it from scratch. I installed Torque 1.2.p04 on the master and on
the node; it compiled without problems, I configured it and started all the
daemons. Now both nodes (000 and 005) are marked as "free". The problem is that
when I launch a one-processor job, I get only the error file, which contains:
-bash: line 1: /usr/spool/PBS/mom_priv/jobs/15.medusa.d.SC: No such file or directory
What does it mean?
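Reading the path in that error: the name 15.medusa.d.SC looks like the job ID 15.medusa.dicea.unifi.it cut to its first 11 characters plus a .SC suffix, i.e. presumably the job script that pbs_mom should have written into its spool directory before starting the job. A rough way to see whether that file ever shows up is a small script run on the execution node while the job is (supposedly) running. The 11-character truncation and the spool path are assumptions read off the message above, not taken from Torque documentation, and the script name check_sc.py is hypothetical.

```python
import os
import sys

# Assumption: spool path copied from the error message above.
MOM_JOBS_DIR = "/usr/spool/PBS/mom_priv/jobs"
# Assumption: job files use the job ID truncated to 11 characters,
# matching "15.medusa.d.SC" for job "15.medusa.dicea.unifi.it".
JOBBASE_LEN = 11


def job_script_path(job_id: str, jobs_dir: str = MOM_JOBS_DIR) -> str:
    """Path where the job script is expected to be on the execution node."""
    return os.path.join(jobs_dir, job_id[:JOBBASE_LEN] + ".SC")


def check_job_script(job_id: str, jobs_dir: str = MOM_JOBS_DIR) -> bool:
    """Report whether the spool directory and the job script exist on this host."""
    if not os.path.isdir(jobs_dir):
        print(f"{jobs_dir}: directory does not exist on this host")
        return False
    path = job_script_path(job_id, jobs_dir)
    present = os.path.isfile(path)
    print(f"{path}: {'present' if present else 'MISSING'}")
    return present


if __name__ == "__main__":
    # Hypothetical usage, on the execution node:
    #   python3 check_sc.py 15.medusa.dicea.unifi.it
    for jid in sys.argv[1:]:
        check_job_script(jid)
```

If the directory itself is missing on the node, the mom there is probably using a different spool location than the one the job expects; if only the .SC file is missing, the script never arrived on that host.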
Moreover, if I try to launch a two-processor job (with qsub -l nodes=2
./hello), it stays in the queue indefinitely, and qstat -f gives the following
result:
Job Id: 14.medusa.dicea.unifi.it
    Job_Name = hello
    Job_Owner = [EMAIL PROTECTED]
    job_state = Q
    queue = batch
    server = medusa.dicea.unifi.it
    Checkpoint = u
    ctime = Fri Aug 12 20:35:05 2005
    Error_Path = medusa.dicea.unifi.it:/home/lcampo/hello.e14
    exec_host = medusa005.dicea.unifi.it/0+medusa000.dicea.unifi.it/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Aug 12 20:35:09 2005
    Output_Path = medusa.dicea.unifi.it:/home/lcampo/hello.o14
    Priority = 0
    qtime = Fri Aug 12 20:35:05 2005
    Rerunable = True
    Resource_List.nodect = 2
    Resource_List.nodes = 2
    Resource_List.walltime = 01:00:00
    Variable_List = PBS_O_HOME=/home/lcampo,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=lcampo,
        PBS_O_PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/b
        in:/home/intel_fc_80/lib:/home/intel_fc_80/bin:/opt/kernel_picker/bin:/
        opt/env-switcher/bin:/opt/mpich-1.2.5.10-ch_p4-gcc/bin:/opt/pvm3/lib:/o
        pt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/op
        t/pbs/bin:/opt/pbs/lib/xpbs/bin:/home/lcampo/bin,
        PBS_O_MAIL=/var/spool/mail/lcampo,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=medusa.dicea.unifi.it,PBS_O_WORKDIR=/home/lcampo,
        PBS_O_QUEUE=batch
    comment = Not Running - PBS Error: Resource temporarily unavailable REJHOST
        =medusa005.dicea.unifi.it MSG=could not contact host
    etime = Fri Aug 12 20:35:05 2005
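Because qstat -f wraps long values across lines (the comment above is split right in front of "=medusa005..."), it can help to re-join the attributes before reading them. A minimal sketch, assuming the "name = value" layout shown above, that parses the output and pulls out the host named after REJHOST; the file names in the usage comment are hypothetical.

```python
import re
import sys


def parse_qstat_f(text: str) -> dict:
    """Parse one job's `qstat -f` output into {attribute: value}.

    A line of the form "name = value" starts a new attribute; any other
    non-empty line is appended to the previous value. Pieces are joined
    without a separator because qstat wraps long values mid-token
    (e.g. "REJHOST" / "=medusa005...", "/usr/X11R6/b" / "in:...").
    """
    attrs = {}
    last = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Job Id:"):
            attrs["Job Id"] = line.split(":", 1)[1].strip()
            last = None
            continue
        m = re.match(r"([A-Za-z_][\w.]*) = (.*)$", line)
        if m:
            last = m.group(1)
            attrs[last] = m.group(2)
        elif last is not None:
            attrs[last] += line
    return attrs


def rejected_host(attrs: dict):
    """Return the host named after REJHOST= in the job comment, if any."""
    m = re.search(r"REJHOST=(\S+)", attrs.get("comment", ""))
    return m.group(1) if m else None


if __name__ == "__main__":
    # Hypothetical usage: qstat -f 14 > job14.txt; python3 qstat_comment.py job14.txt
    for path in sys.argv[1:]:
        with open(path) as fh:
            info = parse_qstat_f(fh.read())
        print("state:      ", info.get("job_state"))
        print("exec_host:  ", info.get("exec_host"))
        print("rejected by:", rejected_host(info))
```

For this job it points at medusa005, the first entry in exec_host, which is the node the server apparently failed to talk to.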
I searched forums, the PBS manuals, etc. for the message "Not Running - PBS
Error: Resource temporarily unavailable REJHOST=medusa005.dicea.unifi.it
MSG=could not contact host", but I didn't find anything. I can connect to node
005 without problems: ssh works and iptables is inactive. Any ideas? I'm using
pbs_sched as the scheduler.
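Working ssh only shows that port 22 on medusa005 answers; the "could not contact host" in the job comment presumably comes from pbs_server trying to reach pbs_mom on that node over Torque's own ports. A minimal sketch to test that path from the master is a plain TCP connect to the mom port, assuming Torque's default pbs_mom port 15002. A successful connect only shows that the name resolves and the port is open, not that pbs_mom will accept the server's requests. The script name is hypothetical.

```python
import socket
import sys

# Assumption: Torque's default pbs_mom port; change it if your
# installation (e.g. /etc/services) uses a different value.
DEFAULT_MOM_PORT = 15002


def can_reach(host: str, port: int = DEFAULT_MOM_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:  # refused, timed out, or name resolution failure
        print(f"{host}:{port} -> {exc}")
        return False


if __name__ == "__main__":
    # Hypothetical usage, run on the master:
    #   python3 mom_port_check.py medusa005.dicea.unifi.it medusa000.dicea.unifi.it
    for host in sys.argv[1:]:
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}:{DEFAULT_MOM_PORT} {status}")
```

Running it against both medusa000 and medusa005 makes it easy to compare the working node with the rejected one.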
Thank you
Lorenzo Campo
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users