Thank you for the quick answer! I still have several questions, though; please
see inline.


On Fri, May 31, 2013 at 6:47 PM, Brian Gilmer <[email protected]> wrote:

>  I addressed a similar problem with _exit(<value>).
>
[Victor Kocheganov] Where can I find it? I cannot find any clue in the archive
of the slurm-dev list
(http://dir.gmane.org/gmane.comp.distributed.slurm.devel).

> Slurm will kill off the rest of the pe in a job step if one exits with a
> non-zero code.
>
[Victor Kocheganov] Unfortunately, as far as I know, this depends on the Slurm
configuration (whether the '-K' flag is set or not; it could be set
implicitly), so I cannot rely on this behavior.

> The exit() function doesn't work under mx shmem because the exit() function
> is overridden and does not propagate the exit code.  PMI_Abort(exit_code)
> uses exit() so in our case it always returns an exit code of 9 regardless
> of the value of exit_code.
>
[Victor Kocheganov] This is interesting, because I see that SLURM always
returns zero to the system when some process invokes PMI_Abort(0,NULL),
except when the process with rank zero (the PMI daemon, I suspect) invokes
it. So I still have a little hope that I can make PMI_Abort work for me
(that is, always return zero for PMI_Abort(0,NULL)).

But you are saying that there is no hope for PMI_Abort(), do I understand
correctly? Do you know any other way to make SLURM (with or without PMI)
terminate all the processes when one of them requests it (propagating the
requested exit status, of course)?
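
To make the _exit(<value>) idea above concrete, here is a minimal sketch in
plain C. It is only an illustration under stated assumptions:
globalexit_sketch() and observed_exit_status() are hypothetical names, not
part of SHMEM, PMI, or SLURM, and the fork()/waitpid() pair merely stands in
for the launcher that collects a task's exit status. The sketch shows that
_exit() hands the status straight to the parent, skipping atexit() handlers
and any library wrapper around exit().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (assumption, not a SHMEM/PMI/SLURM routine):
 * end the calling process immediately with the given status.
 * _exit() does not run atexit() handlers and does not go through exit(). */
void globalexit_sketch(int status)
{
    fflush(NULL);   /* _exit() does not flush stdio buffers, so do it here */
    _exit(status);
}

/* Stand-in for the launcher: fork a child that calls globalexit_sketch()
 * and return the exit status seen by the parent, or -1 on error. */
int observed_exit_status(int status)
{
    int wstatus = 0;
    pid_t pid;

    fflush(NULL);   /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        globalexit_sketch(status);   /* child: never returns */

    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Note that this only covers passing the status of one process to its parent.
Whether the remaining tasks of the step are then killed still depends on the
kill-on-bad-exit setting discussed above ('-K' / '--kill-on-bad-exit' of
srun), and that mechanism reacts only to a non-zero status, so the zero-status
case remains open.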

>
>
> On Fri, May 31, 2013 at 5:23 AM, Victor Kocheganov <
> [email protected]> wrote:
>
>>  Hello,
>>
>> I am a SHMEM library developer and I am looking for a way to terminate
>> the whole slurm job with a specific exit status when one of the processes
>> initiates it. That is, the SHMEM library should have an API routine named
>> 'globalexit(int status);', which terminates the job, including all other
>> processes in it, with 'status' as the exit code.
>>
>> The only way I have found is to use PMI_Abort(status), but it does not
>> work for a zero status value when PMI_Abort is invoked by process zero
>> (the daemon for PMI, as I understand it). Is this normal behavior or a bug?
>> Could you please help me find another approach if this one is not
>> appropriate for slurm?
>>
>> Thank you in advance,
>> Victor Kocheganov.
>>
>
>
>
> --
> Speak when you are angry--and you will make the best speech you'll ever
> regret.
>   - Laurence J. Peter
>
