Hi all,

I saw a post earlier today (or yesterday) about jobs in a dependency chain starting while the prior job's epilogue is still running. I have a related, but more general, case of this.
I've been using a test configuration of Slurm on a Cray XC30 in hybrid mode. I've seen that the end-of-reservation nodehealthcheck (a Cray thing) will often run at the same time as, or before, a SPANK plugin epilogue. This creates a race between the two, which is especially a problem because I use the nodehealthcheck to validate that the epilogue properly cleaned up after the job.

Is it feasible to run the job/SPANK epilogues *before* releasing the resources? Or is this already the behavior, and I'm misdiagnosing the problem?

Thanks,
Doug

----
Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center
<http://www.nersc.gov>
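For illustration only, the ordering being asked about (epilogue finishes, then the node is released, then the health check runs) can be modeled with a small Python sketch. It assumes a single node with some leftover per-job state. The names `run_epilogue`, `release_node`, and `node_health_check` are hypothetical and are not Slurm, SPANK, or Cray APIs. The point is only that gating the release on epilogue completion means the health check can never see a half-cleaned node.

```python
import threading

# Hypothetical model of one compute node; not a Slurm/Cray API.
class Node:
    def __init__(self):
        self.job_files = {"job123/tmpfile"}  # leftover state the epilogue must clean
        self.released = threading.Event()    # set when resources are released
        self.log = []
        self.lock = threading.Lock()

    def record(self, msg):
        with self.lock:
            self.log.append(msg)

def run_epilogue(node):
    # Hypothetical epilogue: remove per-job state from the node.
    node.job_files.clear()
    node.record("epilogue done")

def release_node(node):
    node.record("released")
    node.released.set()

def node_health_check(node):
    # Hypothetical health check: only runs once the node has been released,
    # then verifies that no job state was left behind.
    node.released.wait()
    clean = not node.job_files
    node.record(f"health check: {'clean' if clean else 'DIRTY'}")
    return clean

def end_job(node):
    # Desired ordering: the epilogue completes *before* the release,
    # so the health check cannot race with the cleanup.
    run_epilogue(node)
    release_node(node)

node = Node()
results = []
checker = threading.Thread(target=lambda: results.append(node_health_check(node)))
checker.start()
end_job(node)
checker.join()
print(node.log)  # ['epilogue done', 'released', 'health check: clean']
```

If `release_node` were called before `run_epilogue` finished, the health check could wake up while `job_files` is still populated and report the node as dirty. That is the race described above.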
