On 03.03.2011 at 12:38, Chris Jewell wrote:

> On 2 Mar 2011, at 19:32, Reuti wrote:
>>> Would it be possible to write to the stderr with an epilogue script, 
>>> harvesting the same line from the messages file, I wonder?
>> 
>> My experience is: no. At the time the epilog runs, the entry in the messages 
>> file has not been written yet, so I decided to put it in the mail wrapper, 
>> which is used to send the email after the job has left the node.
>> 
>> It's of course possible to "abuse" the mail-wrapper to also append the line 
>> to the stderr of the job. But the setup is a little bit convoluted, as you 
>> have to save the name of the stderr file in a persistent file which will 
>> survive the end of the job - after the job you can't retrieve its name any 
>> longer. This can be done in a job prolog, and we use it to include some 
>> entries of the job context in the email later on. Let me know if you need 
>> some directions to set it up.
> 
> 
> Okay, thanks for the info.  I understand how to set it up.  IMHO, we should 
> probably have this issue addressed in GE, as it can save a lot of debugging 
> time to quickly know why your job was killed when it was -- if this 
> information is available to write to the messages file, then I don't see why 
> it shouldn't be possible for the shepherd to append it to the job stderr.

There is already an RFE for making job abortions easier to spot. Personally, 
though, I wouldn't like it in the error file (as it's not an error of the 
started application itself); I'd rather have it go to the email by default. 
What about a third output stream, &3, which could be directed to any file?

A proper entry in the accounting file would also help. This could then be done 
by the qmaster, since for a parallel job across several nodes the memory limit 
could be exceeded on any of them. It should then be noted in the accounting 
entry of the job (with the node being mentioned there) and in the one of the 
`qrsh -inherit ...` too (in case accounting_summary is set to FALSE).
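
For the prolog/mail-wrapper setup described in the quoted text above, a rough sketch could look like the following. It is an illustration under assumptions, not a tested configuration: STATE_DIR and both function names are made up for this example, STATE_DIR is assumed to be a directory that both the prolog and the mail wrapper can reach, and the prolog is assumed to see the job's JOB_ID and SGE_STDERR_PATH variables. How the mail wrapper obtains the job id and the line from the messages file is left out.

```shell
#!/bin/sh
# Sketch only. STATE_DIR, save_stderr_path and append_to_job_stderr are
# hypothetical names; JOB_ID and SGE_STDERR_PATH are assumed to be set
# by Grid Engine in the prolog's environment.

STATE_DIR="${STATE_DIR:-/var/tmp/sge_stderr_paths}"

# Call from the job prolog: remember where this job's stderr goes,
# because the name can no longer be retrieved once the job has ended.
save_stderr_path() {
    mkdir -p "$STATE_DIR" || return 1
    printf '%s\n' "$SGE_STDERR_PATH" > "$STATE_DIR/$JOB_ID"
}

# Call from the mail wrapper after the job has left the node.
#   $1 = job id
#   $2 = line harvested from the messages file
append_to_job_stderr() {
    state_file="$STATE_DIR/$1"
    # No saved entry means the prolog did not run for this job.
    [ -r "$state_file" ] || return 1
    stderr_file=$(cat "$state_file")
    printf '%s\n' "$2" >> "$stderr_file"
    # Remove the entry so the directory does not fill up over time.
    rm -f "$state_file"
}
```

The prolog would source this file and call save_stderr_path; the mail wrapper would call append_to_job_stderr with the job id and the harvested line before sending the mail. The stderr file itself must of course be writable from wherever the mail wrapper runs.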

-- Reuti
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users