-------- Original Message --------
Subject: Re: [Oscar-users] problem killing a running job
From: "David Gutierrez" <[EMAIL PROTECTED]>
Date: Fri, July 11, 2003 5:00 pm
To: <[EMAIL PROTECTED]>

I finally got rid of the job in qstat, but now, after rebooting
everything, I am getting the following messages:


PBS Job Id: 109.oscarnode1.oscardomain
Job Name:   PBSMPIcpiTEST
Aborted by PBS Server
Job cannot be executed
See Administrator for help

I looked at the MOM logs in /var/spool/pbs/.... and there is nothing unusual,
but in the server log I found the entries in the attached log file.





david



>I had the same problem and I had to go manually and clear the queue of
> the PBS system.
>
> If I remember correctly, I
>
> 1. stopped pbs on the master
> 2. went to the spool directory on the master and on all the slave nodes
> (/var/spool/pbs/spool) and deleted everything left over. You may want
> to do a recursive grep in /var/spool/pbs for "91" to make sure there
> is nothing left there.
> 3. I then rebooted everything too :-)
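The recursive search suggested in step 2 can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `find_job_leftovers` is hypothetical, and searching for the fuller string "91.oscarnode1" instead of the bare "91" is an assumption made here to cut down on unrelated matches. It only lists candidate files; it deletes nothing.

```python
import os


def find_job_leftovers(root, job_id):
    """Return sorted paths under root whose file name or contents mention job_id."""
    needle = job_id.encode()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if job_id in name:
                hits.append(path)
                continue
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                # Unreadable files (permissions, special files) are skipped.
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Directory taken from the message above; job id string is an assumption.
    for path in find_job_leftovers("/var/spool/pbs", "91.oscarnode1"):
        print(path)
```

Run it on the master and on each slave node with PBS stopped, and review the list by hand before removing anything.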
>
> Good luck, Yannis
>
> On Fri, 11 Jul 2003, David Gutierrez wrote:
>
>> I have tried a lot of things but still nothing. Every time I do a qdel
>> JOBID I get a message like this:
>>
>> PBS JOB ID: JOBID
>> Job Name:JOB NAME
>> Job deleted at request of [EMAIL PROTECTED]
>>
>> I tried qsig on the job, then pbsnodes -o, then rebooted the nodes,
>> and then pbsnodes -c, but nothing changed.
>>
>> When I do qstat I see:
>>
>> Job id           Name             User             Time Use S Queue
>> ---------------- ---------------- ---------------- -------- - -----
>> 91.oscarnode1    PIPI             oscartst                0 R workq
>>
>> and when I do pbsnodes -a I get:
>>
>> oscarnode2.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>      jobs = 0/91.oscarnode1.oscardomain
>>
>>
>> oscarnode3.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>      jobs = 0/91.oscarnode1.oscardomain
>>
>>
>> oscarnode4.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>      jobs = 0/91.oscarnode1.oscardomain
>>
>>
>> oscarnode5.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>
>> oscarnode6.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>
>>
>> oscarnode7.oscardomain
>>      state = free
>>      np = 2
>>      properties = all
>>      ntype = cluster
>>
>>
>> oscarnode8.oscardomain
>>      state = state-unknown,down
>>      np = 2
>>      properties = all
>>      ntype = cluster
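A listing like the one above can be checked mechanically for nodes that still reference the stuck job or are marked down. The following is a minimal sketch, assuming the output keeps the layout shown (an unindented node name followed by indented `key = value` lines); the function names are hypothetical, and `SAMPLE` is a shortened copy of the listing, not fresh output.

```python
SAMPLE = """\
oscarnode2.oscardomain
     state = free
     np = 2
     properties = all
     ntype = cluster
     jobs = 0/91.oscarnode1.oscardomain

oscarnode5.oscardomain
     state = free
     np = 2
     properties = all
     ntype = cluster

oscarnode8.oscardomain
     state = state-unknown,down
     np = 2
     properties = all
     ntype = cluster
"""


def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_holding_job(nodes, job_id):
    """Names of nodes whose `jobs` attribute lists job_id (entries look like 0/ID)."""
    held = []
    for name, attrs in nodes.items():
        entries = [e.strip() for e in attrs.get("jobs", "").split(",") if e.strip()]
        if job_id in {e.split("/", 1)[-1] for e in entries}:
            held.append(name)
    return sorted(held)


def nodes_down(nodes):
    """Names of nodes whose comma-separated state includes `down`."""
    return sorted(
        name for name, attrs in nodes.items()
        if "down" in attrs.get("state", "").split(",")
    )


if __name__ == "__main__":
    parsed = parse_pbsnodes(SAMPLE)
    print(nodes_holding_job(parsed, "91.oscarnode1.oscardomain"))  # ['oscarnode2.oscardomain']
    print(nodes_down(parsed))  # ['oscarnode8.oscardomain']
```

In practice the text would come from saving the command's output to a file and reading it in, instead of the embedded sample.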
>>
>> I am unable to execute any other job in the cluster because of this.
>>
>> What can i do?
>>
>> David
>>
>>
>> > Check to see if any of the nodes is down.
>> >
>> > --- David Gutierrez <[EMAIL PROTECTED]> wrote:
>> >> I tried qdel JOBID, and I received a message that says:
>> >>
>> >> PBS Job Id: 91.oscarnode1.oscardomain
>> >> Job Name:   PIPI
>> >> Job deleted at request of [EMAIL PROTECTED]
>> >>
>> >> but the problem is that I still see this job as running every time I run
>> >> qstat, and the other jobs in the queue do not go to a run state.
>> >>
>> >> david
>> >>
>> >>
>> >> > qdel JOBID?
>> >> >
>> >> >          Jeremy
>> >> >
>> >> > At 10:26 PM 7/10/2003 -0400, David Gutierrez wrote:
>> >> >>hi:
>> >> >>
>> >> >>i have a problem killing a running job.
>> >> >>
>> >> >>I have tried qsig jobid, but nothing happens. The job always appears as
>> >> >>running every time I do qstat.
>> >> >>
>> >> >>any idea?
>> >> >>
>> >> >>david
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>-------------------------------------------------------
>> >> >>This SF.Net email sponsored by: Parasoft
>> >> >>Error proof Web apps, automate testing & more.
>> >> >>Download & eval WebKing and get a free book.
>> >> >>www.parasoft.com/bulletproofapps1
>> >> >>_______________________________________________
>> >> >>Oscar-users mailing list
>> >> >>[EMAIL PROTECTED]
>> >> >>https://lists.sourceforge.net/lists/listinfo/oscar-users
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >> --
>> >> Ing.David Gutierrez Diaz
>> >
>> >
>>
>>
>> --
>> Ing.David Gutierrez Diaz
>>
>
>
>
> ---------------------
> "Ama et fac quod vis"


-- 
Ing.David Gutierrez Diaz



Attachment: 20030712
Description: Binary data
