I'm leery of this patch - will discuss with other MPI folks as this could
cause problems for existing apps


> On Jun 3, 2013, at 5:17 AM, "Riebs, Andy" <[email protected]> wrote:
>
> Hi Victor,
>
> If the patch is straightforward, and the reason for it is clear, patches
> sent to this list tend to be adopted quickly. However, since this changes
> behavior that someone else may be counting on, it might get held for the
> next major release if it is accepted.
>
> Andy
>
> *From:* Victor Kocheganov [mailto:[email protected]]
> *Sent:* Monday, June 03, 2013 6:48 AM
> *To:* slurm-dev
> *Subject:* [slurm-dev] Re: PMI_Abort with zero value
>
> OK, I see.
>
> I've got the SLURM sources with PMI in them and found the reason for the
> "strange" behavior (I mean that the rank 0 process behaves differently
> from the others in PMI_Abort()).
>
> It seems straightforward to deal with. Is it a complex procedure to
> provide a minor fix to the community (via a patch)?
>
> On Mon, Jun 3, 2013 at 12:18 AM, Brian Gilmer <[email protected]> wrote:
>
> We are able to use _exit() so I did not go any further.  The behavior of
> PMI_Abort() and exit() were both odd so I thought that might save you some
> time.  I am interested if you find another solution.
>
> On Sun, Jun 2, 2013 at 3:39 AM, Victor Kocheganov <[email protected]> wrote:
>
> Thank you for the rapid answer! I still have several questions, though;
> please see inline.
>
> On Fri, May 31, 2013 at 6:47 PM, Brian Gilmer <[email protected]> wrote:
>
> I addressed a similar problem with _exit(<value>).
>
> [Victor Kocheganov] Where can I find it? I cannot find any clue in the
> archive of the slurm-dev list
> (http://dir.gmane.org/gmane.comp.distributed.slurm.devel).
>
> Slurm will kill off the rest of the PEs in a job step if one exits with a
> non-zero code.
>
> [Victor Kocheganov] Unfortunately, as far as I know, that depends on the
> slurm configuration (whether the '-K' flag is set or not; it could be set
> implicitly). So I cannot rely on such behavior...
>
> The exit() function doesn't work under mx shmem because the exit()
> function is overridden and does not propagate the exit code.
> PMI_Abort(exit_code) uses exit() so in our case it always returns an exit
> code of 9 regardless of the value of exit_code.
>
> [Victor Kocheganov] This is interesting, because I see that SLURM always
> returns zero to the system when PMI_Abort(0,NULL) is invoked by some
> process, except when the process with rank zero (the PMI daemon, I
> suspect) invokes it. So I still have a little hope that I can make
> PMI_Abort work for me (always return zero in the PMI_Abort(0,NULL) case).
>
> But you are saying that there is no hope for PMI_Abort(), do I understand
> correctly? Do you have any other way to make SLURM (using PMI or without
> it) terminate all the processes if one of them requests it (passing along
> the exit status, of course)?
>
> On Fri, May 31, 2013 at 5:23 AM, Victor Kocheganov <[email protected]> wrote:
>
> Hello,
>
> I am a SHMEM library developer and I am looking for a way to terminate
> the whole slurm job with a specific exit status when one of its processes
> initiates it. That is, the SHMEM library should have an API routine named
> 'globalexit(int status);', which terminates the job, including all its
> other processes, with status as the exit code.
>
> The only way I have found is to use PMI_Abort(status), but it does not
> work for a zero status value when PMI_Abort is invoked by the rank zero
> process (the daemon for PMI, as I understand). Is this normal behavior or
> a bug? Could you please help me find another approach if this one does
> not seem proper for slurm?
>
> Thank you in advance,
>
> Victor Kocheganov.
>
> --
> Speak when you are angry--and you will make the best speech you'll ever
> regret.
>   - Laurence J. Peter
>
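
As a footnote to Brian's remark in the thread that an overridden exit() always yields exit code 9 while _exit() passes the status through, here is a minimal C sketch of that difference. It is only an illustration under stated assumptions: the atexit() handler stands in for a runtime that replaces the exit status, and the names hijack_exit_status and observed_exit_code are hypothetical, not part of SLURM, PMI, or any SHMEM library.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a runtime that takes over normal exit
 * processing: whatever status exit() was given, the process ends with 9. */
static void hijack_exit_status(void)
{
    _exit(9);
}

/* Fork a child that terminates with `status`, via exit() when use_exit is
 * nonzero and via _exit() otherwise. Returns the exit code observed by the
 * parent, or -1 if the fork/wait failed or the child did not exit normally. */
int observed_exit_code(int status, int use_exit)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: install the handler, then terminate one way or the other. */
        atexit(hijack_exit_status);
        if (use_exit)
            exit(status);   /* runs the handler, which replaces the status */
        _exit(status);      /* skips atexit handlers entirely */
    }

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

With this stand-in handler, observed_exit_code(3, 1) reports 9 and observed_exit_code(3, 0) reports 3, because _exit() does not run the handlers that exit() runs. Whether the same holds for a given SHMEM or PMI runtime depends on how that runtime intercepts exit().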
