Hi,
after failing to run parallel jobs with the PBS installed by OSCAR (on a Fedora
Core 3 system), I removed everything PBS-related from the master and from one
node and reinstalled it from scratch. I installed Torque 1.2.p04 on the master
and on the node; it compiled with no problems, I configured it, and I started
all the daemons. Now both nodes (000 and 005) are marked as "free". The problem
is that when I launch a one-processor job, I get only the error file, which
contains:


-bash: line 1: /usr/spool/PBS/mom_priv/jobs/15.medusa.d.SC: No such file or directory:


What does it mean?
Moreover, if I try to launch a two-processor job (with qsub -l nodes=2
./hello), it stays queued indefinitely, and qstat -f gives the following
output:


Job Id: 14.medusa.dicea.unifi.it
    Job_Name = hello
    Job_Owner = [EMAIL PROTECTED]
    job_state = Q
    queue = batch
    server = medusa.dicea.unifi.it
    Checkpoint = u
    ctime = Fri Aug 12 20:35:05 2005
    Error_Path = medusa.dicea.unifi.it:/home/lcampo/hello.e14
    exec_host = medusa005.dicea.unifi.it/0+medusa000.dicea.unifi.it/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Aug 12 20:35:09 2005
    Output_Path = medusa.dicea.unifi.it:/home/lcampo/hello.o14
    Priority = 0
    qtime = Fri Aug 12 20:35:05 2005
    Rerunable = True
    Resource_List.nodect = 2
    Resource_List.nodes = 2
    Resource_List.walltime = 01:00:00
    Variable_List = PBS_O_HOME=/home/lcampo,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=lcampo,
        PBS_O_PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/b
        in:/home/intel_fc_80/lib:/home/intel_fc_80/bin:/opt/kernel_picker/bin:/
        opt/env-switcher/bin:/opt/mpich-1.2.5.10-ch_p4-gcc/bin:/opt/pvm3/lib:/o
        pt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/op
        t/pbs/bin:/opt/pbs/lib/xpbs/bin:/home/lcampo/bin,
        PBS_O_MAIL=/var/spool/mail/lcampo,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=medusa.dicea.unifi.it,PBS_O_WORKDIR=/home/lcampo,
        PBS_O_QUEUE=batch
    comment = Not Running - PBS Error: Resource temporarily unavailable REJHOST
        =medusa005.dicea.unifi.it MSG=could not contact host
    etime = Fri Aug 12 20:35:05 2005
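
For anyone who wants to pull the rejection reason out of output like this automatically, here is a minimal Python sketch. It assumes that attribute lines are indented by four spaces and that wrapped continuation lines are indented more deeply, as in the listing above; the function names parse_qstat_full and rejection_reason are my own and are not part of PBS or Torque.

```python
import re

# Shortened excerpt of the qstat -f output above, used as sample input.
SAMPLE = """Job Id: 14.medusa.dicea.unifi.it
    Job_Name = hello
    job_state = Q
    comment = Not Running - PBS Error: Resource temporarily unavailable REJHOST
        =medusa005.dicea.unifi.it MSG=could not contact host
    etime = Fri Aug 12 20:35:05 2005
"""


def parse_qstat_full(text):
    """Parse one job record from `qstat -f` text into a dict of attributes."""
    attrs = {}
    key = None
    for raw in text.splitlines():
        # Treat a leading tab like the 8-space indent shown in the listing.
        line = raw.expandtabs(8).rstrip()
        if not line.strip():
            continue
        if line.startswith("Job Id:"):
            attrs["Job Id"] = line.split(":", 1)[1].strip()
            key = None
            continue
        indent = len(line) - len(line.lstrip())
        if indent <= 4 and " = " in line:
            # Attribute line: "    name = value"
            name, value = line.strip().split(" = ", 1)
            attrs[name] = value
            key = name
        elif key is not None:
            # Continuation line: long values are wrapped, sometimes mid-token,
            # so the pieces are joined without adding a separator.
            attrs[key] += line.strip()
    return attrs


def rejection_reason(comment):
    """Return (host, message) from a 'REJHOST=... MSG=...' comment, or None."""
    match = re.search(r"REJHOST=(\S+)\s+MSG=(.*)", comment)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


if __name__ == "__main__":
    job = parse_qstat_full(SAMPLE)
    print(job["Job Id"], job["job_state"])
    print(rejection_reason(job["comment"]))
```

The continuation handling matters here because the comment is split right after "REJHOST", so the host name only becomes visible once the two lines are joined.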



I searched for the string "Not Running - PBS Error: Resource temporarily
unavailable REJHOST=medusa005.dicea.unifi.it MSG=could not contact host" in
forums, PBS manuals, etc., but found nothing. I can connect to node 005 with
no problems: ssh works and iptables is inactive. Any ideas? I'm using
pbs_sched as the scheduler.
Thank you
Lorenzo Campo 




_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
