I'm leery of this patch - will discuss with other MPI folks, as this could cause problems for existing apps.

Sent from my iPhone

On Jun 3, 2013, at 5:17 AM, "Riebs, Andy" <[email protected]> wrote:

> Hi Victor,
>
> If the patch is straightforward, and the reason for it is clear, patches
> sent to this list tend to be adopted quickly. However, since this changes
> behavior that someone else may be counting on, it might get held for the
> next major release if it is accepted.
>
> Andy
>
> From: Victor Kocheganov [mailto:[email protected]]
> Sent: Monday, June 03, 2013 6:48 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: PMI_Abort with zero value
>
> OK, I see.
>
> I've got the SLURM sources with PMI in them and found the reason for the
> "strange" behavior (I mean that the rank 0 process behaves differently
> from the others in PMI_Abort()).
>
> It seems clear how to deal with it. Is it a complex procedure to provide a
> minor fix to the community (via a patch)?
>
> On Mon, Jun 3, 2013 at 12:18 AM, Brian Gilmer <[email protected]> wrote:
>
> We are able to use _exit() so I did not go any further. The behavior of
> PMI_Abort() and exit() were both odd, so I thought that might save you
> some time. I am interested if you find another solution.
>
> On Sun, Jun 2, 2013 at 3:39 AM, Victor Kocheganov <[email protected]> wrote:
>
> Thank you for the rapid answer! But I still have several questions, please
> see inline.
>
> On Fri, May 31, 2013 at 6:47 PM, Brian Gilmer <[email protected]> wrote:
>
> I addressed a similar problem with _exit(<value>).
>
> [Victor Kocheganov] Where can I find it? I cannot find any clue in the
> archive of the slurm-dev list
> (http://dir.gmane.org/gmane.comp.distributed.slurm.devel)
>
> Slurm will kill off the rest of the PEs in a job step if one exits with a
> non-zero code.
>
> [Victor Kocheganov] Unfortunately it depends on the slurm configuration as
> far as I know (whether the '-K' flag is set or not; it could be set
> implicitly). So I cannot rely on such behavior...
>
> The exit() function doesn't work under mx shmem because the exit()
> function is overridden and does not propagate the exit code.
> PMI_Abort(exit_code) uses exit(), so in our case it always returns an exit
> code of 9 regardless of the value of exit_code.
>
> [Victor Kocheganov] And this is interesting, because I see that SLURM
> always returns a zero value to the system when PMI_Abort(0,NULL) is
> invoked by some process, except for the case when the process with rank
> zero (the PMI daemon, as I suspect) invokes it. Therefore a little hope
> still exists in my mind that I can make PMI_Abort work for me (always
> return zero in the case of PMI_Abort(0,NULL)).
>
> But you are saying that there is no hope in PMI_Abort(), do I understand
> correctly? Do you have any other way to make SLURM (using PMI or without
> it) terminate all the processes if one of them requests it (with the exit
> status passed along, of course)?
>
> On Fri, May 31, 2013 at 5:23 AM, Victor Kocheganov <[email protected]> wrote:
>
> Hello,
>
> I am a SHMEM library developer and I am looking for a way to terminate
> the whole slurm job with a specific exit status when one of the processes
> initiates it. That is, the SHMEM library should have an API routine named
> 'globalexit(int status);', which terminates the job, including the other
> processes in it, with 'status' as the exit code.
>
> The only way I have found is to use PMI_Abort(status), but it does not
> work for a zero status value when PMI_Abort is invoked by process zero
> (the daemon for PMI, as I understand it). Is this normal behavior or a
> bug? Could you please help me find any other approaches, if this one does
> not seem proper for slurm?
>
> Thank you in advance,
> Victor Kocheganov.
>
> --
> Speak when you are angry--and you will make the best speech you'll ever
> regret.
> - Laurence J. Peter
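
For readers following the exit() versus _exit() point in the quoted thread, here is a minimal, self-contained sketch of the difference. It does not use PMI or any SHMEM library. The `clobber_status` hook is a hypothetical stand-in for a runtime that intercepts normal process termination, loosely modeled on the overridden exit() that Brian describes, and `observed_status` is an illustrative helper name, not part of any Slurm or PMI API. The only thing it shows is that exit() runs registered exit handlers (which can change the status the parent sees), while _exit() skips them and hands the status straight to the parent.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a runtime hook that replaces the exit status.
 * The value 9 mirrors the code reported in the thread; it is an assumption
 * for illustration, not taken from any real library. */
static void clobber_status(void)
{
    _exit(9);
}

/* Fork a child that terminates with `status`, either through exit()
 * (exit handlers run) or through _exit() (exit handlers are skipped).
 * Returns the exit status the parent observes, or -1 on error. */
int observed_status(int status, int use_plain_exit)
{
    fflush(NULL);                 /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        atexit(clobber_status);   /* simulate the intercepting runtime */
        if (use_plain_exit)
            exit(status);         /* handler runs and overrides the status */
        _exit(status);            /* handler is bypassed, status is kept */
    }

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Under these assumptions, `observed_status(3, 1)` reports the hook's status and `observed_status(3, 0)` reports 3. Whether the remaining processes of a job step are then terminated still depends on the Slurm configuration mentioned in the thread (the '-K' behavior), which this sketch does not model.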
