On 28.08.2013 at 16:39, Guillermo Marco Puche wrote:

> On 08/28/2013 04:33 PM, Reuti wrote:
>> On 28.08.2013 at 11:00, Guillermo Marco Puche wrote:
>> 
>>> On 08/28/2013 10:57 AM, Reuti wrote:
>>> 
>>>> Hi,
>>>> 
>>>> On 28.08.2013 at 10:40, Guillermo Marco Puche wrote:
>>>> 
>>>>> I've been experiencing some weird behavior with Picard tools (a Java-based 
>>>>> bioinformatics tool).
>>>>> 
>>>>>   • Job starts running
>>>>>   • Job goes into T status (threshold)
>>>>> 
>>>> Do you mean the process state "T" or the SGE job state "T"?
>>>> 
>>> T in SGE.
>>> 
>> OK, so the threshold was triggered because the load was too high?
>> 
> Indeed, that's exactly what happened.
>>>> 
>>>>>   • Job comes back to R status.
>>>>> 
>>>> Do you use a checkpointing interface to restart the job? If so, it 
>>>> should show "Rr" in `qstat` instead of a plain "R" for the SGE job state.
>>>> 
>>> No, I don't use any checkpointing interface.
>>> 
>> Then the state should be "r".
>> 
>> To sum up: the processes are suspended in the correct way (and also reach 
>> state "T" in `ps -e f`), but after the `kill -cont ...` to wake them up they 
>> end up sleeping?
>> 
> Correct, and that's what I don't understand. Other processes that were put in 
> T state because the load was too high resume correctly and finish.
> 
> But it seems that Java doesn't like being in a non-running state. Or at 
> least that's my point of view.

This depends on the particular program. In principle it's possible to stop and 
continue a Java application.
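To illustrate the point, here is a minimal Python sketch (Linux only; it assumes /proc is available) of a stop/continue cycle done by hand with the same signals as `kill -STOP` and the `kill -cont` mentioned earlier in the thread. It suspends a child process, checks that /proc/<pid>/stat reports state "T", continues it and reads the state again. The sleeping child is a hypothetical stand-in, not Picard or an actual SGE job, and the sketch does not reproduce how SGE itself delivers the signals.

```python
import os
import signal
import subprocess
import sys
import time


def proc_state(pid: int) -> str:
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 is the command name in parentheses and may contain spaces,
    # so split after the last ')' and take the next field (the state).
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid: int, wanted: set, timeout: float = 5.0) -> str:
    """Poll until the process is in one of the wanted states or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state not in wanted and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(pid)
    return state


def stop_and_continue(pid: int) -> tuple:
    """Send SIGSTOP (like `kill -STOP`), then SIGCONT (like `kill -CONT`).

    Returns the process state observed after each signal.
    """
    os.kill(pid, signal.SIGSTOP)
    stopped = wait_for_state(pid, {"T"})
    os.kill(pid, signal.SIGCONT)
    resumed = wait_for_state(pid, {"R", "S"})
    return stopped, resumed


if __name__ == "__main__":
    # Hypothetical stand-in for a job: a child interpreter that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        time.sleep(0.5)  # give the child time to start up
        stopped, resumed = stop_and_continue(child.pid)
        print("after SIGSTOP:", stopped)
        print("after SIGCONT:", resumed)
    finally:
        child.kill()
        child.wait()
```

A process that is merely waiting (on a timer, disk or network I/O) shows "S" again after SIGCONT just like a healthy one, so "S" in the process list alone may not tell whether the resume failed; what the Java process is blocked on is probably more telling.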

-- Reuti


> Best regards,
> Guillermo.
>> -- Reuti
>> 
>>>>>   • Job stays in R status forever. The processes remain on the compute node 
>>>>> without using any resources.
>>>>> 
>>>> In the process list I see only "S" (sleeping) states.
>>>> 
>>>> -- Reuti
>>>> 
>>>> NB: It might help to run these "suspend-sensitive" jobs with a nice value 
>>>> of 19 ("priority 19" in the queue configuration), and normal jobs as usual 
>>>> at 0.
>>>> 
>>>>> You can see the actual processes on the compute node in the following 
>>>>> picture. As I said, they seem to do nothing; they just stay there.
>>>>> 
>>>>> http://imm.io/1gmXT
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Best regards,
>>>>> Guillermo.
>>>>> On 07/29/2013 01:35 PM, Reuti wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> On 29.07.2013 at 13:07, Guillermo Marco Puche wrote:
>>>>>> 
>>>>>>> I have set up some subscription rules so that the load on the cluster 
>>>>>>> compute nodes is balanced. This way, Grid Engine puts some jobs into T 
>>>>>>> state when a compute node exceeds the load rule.
>>>>>>> 
>>>>>>> The problem is that some of my Perl scripts, which use a MySQL connection, 
>>>>>>> die after resuming from T state because they lose the connection to MySQL. 
>>>>>>> 
>>>>>>> My question is: is there any way to exclude a job, by name, from this 
>>>>>>> rule, so that it never enters T status and dies after resuming?
>>>>>>> 
>>>>>> Unfortunately, no.
>>>>>> 
>>>>>> Nevertheless, I also saw the need for some kind of "suspensible y/n" flag 
>>>>>> for a submitted job:
>>>>>> 
>>>>>> https://arc.liv.ac.uk/trac/SGE/ticket/735
>>>>>> 
>>>>>> In your situation it could help to have a dedicated queue, used only for 
>>>>>> these Perl scripts, which never gets suspended.
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Guillermo.
>>>>>>> -- 
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> [email protected]
>>>>>>> https://gridengine.org/mailman/listinfo/users
>>> -- 
>>> Guillermo Marco Puche
>>> 
>>> Bioinformatician, Computer Science Engineer.
>>> Sistemas Genómicos S.L.
>>> Phone: +34 902 364 669
>>> Fax: +34 902 364 670
>>> 
>>> www.sistemasgenomicos.com
>>> 
> 
> 

