[gridengine users] Fwd: Re: Exclude job from rule.

Guillermo Marco Puche Wed, 28 Aug 2013 07:42:17 -0700


On 08/28/2013 04:33 PM, Reuti wrote:

On 28.08.2013 at 11:00, Guillermo Marco Puche wrote:

On 08/28/2013 10:57 AM, Reuti wrote:

Hi,


On 28.08.2013 at 10:40, Guillermo Marco Puche wrote:

I've been experiencing some weird behavior with Picard tools (a Java-based bioinformatics tool).

	• Job starts running
	• Job gets to T status (threshold)

Do you mean the process state "T" or the SGE job state "T"?

T in SGE.

Ok, the Threshold was triggered by the load being too high?

Indeed, that's exactly what happened.

	• Job comes back to R status.

Do you use any checkpointing interface, to restart the job? If so, it should output "Rr" in `qstat` instead of a plain "R" for the SGE job state.

No, I don't use any checkpointing interface.

Then the state should be "r".

In total: the processes are suspended correctly (and also reach state "T" in `ps -e f`), but after the `kill -cont ...` that should wake them up they stay sleeping?

Correct. That's why I don't understand it. Other processes that were put in the T state because the load was too high resume correctly and finish.

But it seems that Java doesn't like being in a non-running state. Or at least that's my point of view.
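
The suspend/resume cycle discussed here can be reproduced outside Grid Engine. The following is a minimal Python sketch, assuming a Linux node with /proc and assuming the queue suspends with a plain SIGSTOP and resumes with SIGCONT. The helper names are illustrative and not part of SGE. It stops a child process, confirms that it shows "T", continues it, and reads the state again.

```python
import os
import signal
import subprocess
import sys
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (R, S, T, ...)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so split after the last ")" before reading the state field.
    return stat.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process is in one of the wanted states or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = process_state(pid)
        if state in wanted:
            return state
        time.sleep(0.05)
    return process_state(pid)


def suspend_resume_demo():
    """Stop and continue a sleeping child; return its state at each step."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        before = wait_for_state(child.pid, "RS")
        os.kill(child.pid, signal.SIGSTOP)   # what a suspend is assumed to send
        stopped = wait_for_state(child.pid, "T")
        os.kill(child.pid, signal.SIGCONT)   # what a resume is assumed to send
        resumed = wait_for_state(child.pid, "RS")
        return before, stopped, resumed
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print(suspend_resume_demo())
```

A resumed process that is merely waiting (in a sleep, or on I/O) also shows "S", so the "S" state alone does not tell a healthy waiting process from a stuck one. Watching whether its CPU time grows over a few seconds is likely a better indicator.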

Best regards,
Guillermo.

-- Reuti

	• Job stays in R status forever. The processes stay on the compute node without using resources.

In the list I see only "S" states.

-- Reuti

NB: Maybe it could help to run these "suspend-sensitive" jobs with a nice value of 19 ("priority 19" in the queue configuration), and normal jobs as usual at 0.
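
To illustrate what a nice value of 19 means at the process level, here is a minimal Python sketch. It is not SGE code: `run_niced` is a hypothetical helper, and in a real setup the queue's "priority" entry would set the value instead. The sketch starts a child at the lowest scheduling priority and reads the value back.

```python
import os
import subprocess
import sys


def run_niced(cmd, niceness=19):
    """Start cmd with the given nice value (19 is the lowest priority)."""
    def set_niceness():
        # Runs in the child just before exec; raising the nice value
        # needs no special privileges.
        os.setpriority(os.PRIO_PROCESS, 0, niceness)

    return subprocess.Popen(cmd, preexec_fn=set_niceness)


if __name__ == "__main__":
    child = run_niced([sys.executable, "-c", "import time; time.sleep(2)"])
    print("nice value:", os.getpriority(os.PRIO_PROCESS, child.pid))
    child.wait()
```

A process at nice 19 only yields CPU time to others when there is competition, so on an otherwise idle node it still runs at full speed.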

You can see the real processes on the compute node in the following picture. As I said, they seem to do nothing; they just stay there.


http://imm.io/1gmXT


Thank you.

Best regards,
Guillermo.

On 07/29/2013 01:35 PM, Reuti wrote:

Hi,

On 29.07.2013 at 13:07, Guillermo Marco Puche wrote:

I have set up some subscribing rules so that the load on the cluster's compute nodes is balanced. As a result, Grid Engine puts some jobs into the T state when a compute node exceeds a load rule.

The problem is that I have some Perl scripts that use a MySQL connection, and after resuming from the T state they die because they have lost the connection to MySQL.

The question is: is there any way to exclude a job by name from this rule, so that it never enters the T state and dies after resuming?

Unfortunately no.

Nevertheless I saw the need for some kind of "suspensible y/n" flag for a submitted job too:



https://arc.liv.ac.uk/trac/SGE/ticket/735



For your situation it could help to have a dedicated queue for these Perl scripts only, which will never get suspended.
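
Where a dedicated queue is not an option, the script itself could be made tolerant of a lost connection. The following is a minimal sketch in Python rather than Perl, under stated assumptions: `ConnectionLost` stands in for whatever error the database driver raises when the server has dropped the session, and `connect` and `work` are hypothetical callables supplied by the script. It is not taken from the scripts discussed in this thread.

```python
import time


class ConnectionLost(Exception):
    """Stand-in for the driver-specific 'connection lost' error."""


def run_with_reconnect(connect, work, retries=3, delay=0.0):
    """Run work(conn); on a lost connection, reconnect and try again.

    connect: callable returning a fresh connection object.
    work:    callable taking the connection and returning a result.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return work(conn)
        except ConnectionLost:
            if attempt == retries:
                raise  # give up after the last allowed retry
            time.sleep(delay)
            conn = connect()  # open a new session, e.g. after a resume


if __name__ == "__main__":
    opened = []

    def connect():
        opened.append(len(opened) + 1)
        return opened[-1]

    def work(conn):
        # Pretend the first session was dropped while the job was suspended.
        if conn == 1:
            raise ConnectionLost()
        return f"ok via connection {conn}"

    print(run_with_reconnect(connect, work))
```

This only helps if the unit of work is safe to repeat from the start. A transaction that was half finished when the job was suspended needs its own handling.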

-- Reuti

Thank you.

Best regards,
Guillermo.
-- 
_______________________________________________
users mailing list


[email protected]
https://gridengine.org/mailman/listinfo/users

-- 
Guillermo Marco Puche

Bioinformatician, Computer Science Engineer.
Sistemas Genómicos S.L.
Phone: +34 902 364 669
Fax: +34 902 364 670
www.sistemasgenomicos.com
