[slurm-dev] Re: Finding job command after fails

2017-10-18 Thread Marcin Stolarek
Some time ago we used a slurmctld prolog (PrologSlurmctld) for this.

2017-10-16 16:36 GMT+02:00 Ryan Richholt:

> Thanks, that sounds like a good idea. A prolog script could also handle
> this, right? That way, if the node crashes while the job is running, the
> record would still be saved.
>
> On Mon, Oct 16, 2017 at 3:20 AM Merlin Hartley <
> merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
>> You could also use a simple epilog script to save the output of ‘scontrol
>> show job’ to a file/database.
>>
>> M
>>
>>
>> --
>> Merlin Hartley
>> Computer Officer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 15 Oct 2017, at 20:49, Ryan Richholt wrote:
>>
>> Is there any way to get the job command with sacct?
>>
>> For example, if I submit a job like this:
>>
>> $ sbatch testArgs.sh hey there
>>
>> I can get the full command from "scontrol show job":
>>
>>   ...
>>   Command=/home/rrichholt/scripts/testArgs.sh hey there
>>   ...
>>
>> But, that information is not available long-term with sacct.
>>
>> To explain why I would like this:
>>
>> I'm dealing with a workflow that submits lots of jobs for different
>> projects. Each submits the same script, but the first argument points to a
>> different project directory. When jobs fail, it's very hard to tell which
>> project they were working on, because "scontrol show job" only lasts for
>> 300 seconds. Sometimes they fail at night and I don't know until the next
>> morning.
>>
>>
>>


[slurm-dev] Re: Finding job command after fails

2017-10-16 Thread Ryan Richholt
Thanks, that sounds like a good idea. A prolog script could also handle
this, right? That way, if the node crashes while the job is running, the
record would still be saved.
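
Whether the record is written by a prolog or an epilog, finding the project
afterwards comes down to reading the Command= line back out of the saved text.
A minimal sketch in Python, assuming the default multi-line "scontrol show job"
output where Command= sits on its own line, as in the original question. The
function name extract_command is hypothetical, and splitting on whitespace
means arguments that themselves contain spaces cannot be recovered reliably.

```python
def extract_command(scontrol_text):
    """Return (script, args) from `scontrol show job` text, or None if absent."""
    for line in scontrol_text.splitlines():
        line = line.strip()
        if line.startswith("Command="):
            # Everything after "Command=" is the script path plus its arguments.
            parts = line[len("Command="):].split()
            if not parts:
                return None
            return parts[0], parts[1:]
    return None


# Sample taken from the output quoted in the original question.
sample = """
   ...
   Command=/home/rrichholt/scripts/testArgs.sh hey there
   ...
"""
script, args = extract_command(sample)
print(script)  # /home/rrichholt/scripts/testArgs.sh
print(args)    # ['hey', 'there']
```

In the workflow described, args[0] would be the project directory.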



[slurm-dev] Re: Finding job command after fails

2017-10-16 Thread Merlin Hartley
You could also use a simple epilog script to save the output of ‘scontrol show 
job’ to a file/database.
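
A minimal sketch of such an epilog, written in Python so that the file-writing
step can be exercised without a cluster. It assumes Slurm sets SLURM_JOB_ID in
the epilog environment and that scontrol is on the PATH. The directory
/var/log/slurm/job_records and the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Epilog sketch: save `scontrol show job` output to one file per job."""
import os
import subprocess
import sys
from pathlib import Path

# Hypothetical location; pick a directory the epilog user can write to.
RECORD_DIR = Path("/var/log/slurm/job_records")


def run_scontrol(job_id):
    """Return the text printed by `scontrol show job <job_id>`."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def save_job_record(job_id, record_dir=RECORD_DIR, fetch=run_scontrol):
    """Write the job description to <record_dir>/<job_id>.txt; return the path."""
    record_dir = Path(record_dir)
    record_dir.mkdir(parents=True, exist_ok=True)
    path = record_dir / f"{job_id}.txt"
    path.write_text(fetch(job_id))
    return path


def main():
    job_id = os.environ.get("SLURM_JOB_ID")
    if job_id is None:
        return  # not running under Slurm; nothing to record
    try:
        save_job_record(job_id)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Report and fall through: a failing epilog can leave the node drained.
        print(f"job record not saved: {exc}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

The fetch parameter exists only so the writer can be tried with canned text
instead of a live scontrol call.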

M


--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom




[slurm-dev] Re: Finding job command after fails

2017-10-15 Thread Douglas Jacobsen
We use a job completion plugin to store that data. Ours is custom, but it
is loosely based on the Elasticsearch job completion plugin
(jobcomp/elasticsearch), which may be a good starting point.
