For some time now we have been using a slurmctld prolog for this.
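
Roughly how ours is wired up (the script path and the log location below are
just examples, not our exact setup):

  # slurm.conf on the controller
  PrologSlurmctld=/etc/slurm/prolog_slurmctld.sh

  # /etc/slurm/prolog_slurmctld.sh
  #!/bin/bash
  # slurmctld runs this when the job is allocated and exports SLURM_JOB_ID
  scontrol show job "$SLURM_JOB_ID" >> /var/log/slurm/job_records.log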

2017-10-16 16:36 GMT+02:00 Ryan Richholt <ryanrichh...@gmail.com>:

> Thanks, that sounds like a good idea. A prolog script could also handle
> this, right? That way, if the node crashes while the job is running, the job
> record would still be saved.
>
> On Mon, Oct 16, 2017 at 3:20 AM Merlin Hartley <
> merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
>> You could also use a simple epilog script to save the output of ‘scontrol
>> show job’ to a file/database.
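>>
>> A minimal sketch of what I mean (untested; the paths are just examples, and
>> you may prefer EpilogSlurmctld so the records end up on the controller):
>>
>>   #!/bin/bash
>>   # The epilog runs as the job completes; SLURM_JOB_ID is in its environment.
>>   scontrol show job "$SLURM_JOB_ID" > "/var/log/slurm/jobs/${SLURM_JOB_ID}.txt"
>>
>> with the script registered in slurm.conf as Epilog=/etc/slurm/epilog.sh.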
>>
>> M
>>
>>
>> --
>> Merlin Hartley
>> Computer Officer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 15 Oct 2017, at 20:49, Ryan Richholt <ryanrichh...@gmail.com> wrote:
>>
>> Is there any way to get the job command with sacct?
>>
>> For example, if I submit a job like this:
>>
>> $ sbatch testArgs.sh hey there
>>
>> I can get the full command from "scontrol show job":
>>
>>   ...
>>   Command=/home/rrichholt/scripts/testArgs.sh hey there
>>   ...
>>
>> But that information is not available long-term through sacct.
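>>
>> For example, I can still pull fields like these (the job id here is just a
>> placeholder):
>>
>>   sacct -j 1234567 --format=JobID,JobName,Submit,State,ExitCode
>>
>> but as far as I can tell, none of the available format fields carry the
>> original command line or its arguments.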
>>
>> To explain why I would like this:
>>
>> I'm dealing with a workflow that submits lots of jobs for different
>> projects. Each job runs the same script, but the first argument points to a
>> different project directory. When jobs fail, it's very hard to tell which
>> project they were working on, because the "scontrol show job" record only
>> survives for about 300 seconds after the job finishes (the MinJobAge
>> default). Sometimes jobs fail at night and I don't find out until the next
>> morning.
>>
>>
>>
