Also worth looking at AcctGatherProfileType/HDF5 for more detailed job profiling
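
For anyone wanting to try this, here is a rough sketch of how the HDF5 profiling plugin is typically switched on and used. The settings in the comments and the script name `job.sh` are assumptions for illustration, not details from this thread, so please check them against the documentation for your Slurm version.

```shell
# Assumed configuration (illustrative, not taken from this thread):
#   slurm.conf:        AcctGatherProfileType=acct_gather_profile/hdf5
#   acct_gather.conf:  ProfileHDF5Dir=/path/to/shared/profile/dir   (placeholder path)

# Submit a batch script with task-level profiling enabled.
# "$1" is the path to the job script (hypothetical, e.g. job.sh).
profile_job() {
    sbatch --profile=task "$1"
}

# Merge the per-node HDF5 files of a finished job into a single file.
# "$1" is the job ID.
merge_profile() {
    sh5util -j "$1"
}

# Example usage (placeholders):
#   profile_job job.sh
#   merge_profile <jobid>
```

The two helper functions are only thin wrappers so the commands can be reused; running `sbatch --profile=task` and `sh5util -j` directly does the same thing.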
Cheers,
Roshan
________________________________________
From: Christopher Samuel <[email protected]>
Sent: 01 December 2014 22:38
To: slurm-dev
Subject: [slurm-dev] Re: Job Resource Report

On 02/12/14 06:45, Will French wrote:

> Does anyone know of a Slurm equivalent to the Torque command tracejob
> (http://docs.adaptivecomputing.com/torque/4-1-7/Content/topics/11-troubleshooting/usingTracejobToLocateFailures.htm)?
> This command allows you to easily compare requested resources to actual
> usage, and is useful for troubleshooting when a user's job dies.

I think sacct (if you've set up accounting) will give you a lot of that.

Here's an example from a trivial job of mine that just does a sleep 60
and exit 1. Apologies for the very long lines!

[samuel@barcoo BARCOO]$ sacct -j 2633455 -l
       JobID    JobName  Partition  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks  AllocCPUS    Elapsed      State ExitCode AveCPUFreq ReqCPUFreq     ReqMem ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask    AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask   AveDiskWrite
------------ ---------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- ---------- ---------- ---------- -------------- ------------ --------------- --------------- -------------- ------------ ---------------- ---------------- --------------
2633455      failjob.sh       main                                                                                                                                                                                                               1   00:01:00     FAILED      1:0                              2Gc
2633455.bat+      batch               134884K      barcoo001              0    106056K       316K  barcoo001          0       316K        0    barcoo001              0          0   00:00:00  barcoo001          0   00:00:00        1          1   00:01:00     FAILED      1:0      2.70G          0        2Gc              0        0.01M       barcoo001               0          0.01M        0.00M        barcoo001                0          0.00M

--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected]    Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/    http://twitter.com/vlsci
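
As a footnote to the `sacct -l` output above: the lines can be kept much shorter by asking only for the fields that matter when comparing requested and used resources. The sketch below uses field names that all appear in the long listing; which ones to include is an illustrative choice, and the wrapper function name `job_report` is made up for this example.

```shell
# Print a narrow requested-vs-used summary for one job.
# "$1" is the job ID. The field selection is an illustrative assumption;
# `sacct --helpformat` lists every field available on your installation.
job_report() {
    sacct -j "$1" \
        --format=JobID,JobName,ReqMem,MaxRSS,MaxVMSize,AllocCPUS,Elapsed,State,ExitCode
}

# Example usage with the job ID from the message above:
#   job_report 2633455
```

Here ReqMem is the requested memory, while MaxRSS and MaxVMSize show what the job steps actually used, which is roughly the comparison tracejob was being used for.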
