Hey George,

Thanks, of course, this fully explains it. I simply assumed it was a problem in
the child process.
In that case, negative values are not a problem either, as long as one takes the
value modulo 256 into account.

BR Alex

From: George Bosilca <bosi...@icl.utk.edu>
Sent: Wednesday, July 19, 2023 4:45 PM
To: Alexander Stadik <alexander.sta...@essteyr.com>
Cc: Open MPI Users <users@lists.open-mpi.org>
Subject: [EXT] Re: [EXT] Re: [OMPI users] Error handling


Alex,

`exit(status)` does not make the full value of status available to the parent's
wait; instead, it makes only the low 8 bits available to the parent, as an
unsigned value. This explains why small positive values seem to work correctly
while negative values do not (negative 32-bit integers are stored in two's
complement, so their low 8 bits read as a large unsigned number).

  George.


On Wed, Jul 19, 2023 at 12:45 AM Alexander Stadik
<alexander.sta...@essteyr.com> wrote:
Hey George,

I said "random" only because I do not see the logic behind it. Exactly as you
describe, I do an Allreduce with MIN and return the resulting negative number,
and I usually get 248, 253, 11 or 6 as the exit code. That suggests the number
comes purely from the MPI side.

The problem with MPI_Abort is that it shows the correct number in its output in
the log file, but it neither communicates the value to the other processes nor
forwards it as the exit code. So here, too, one always sees these "random" values.

When using positive numbers within range it seems to work, so my question was
how this works and how one should do it. Is there a way to let MPI_Abort
communicate the value as the exit code?
Why do negative numbers not work, or does one simply have to always use
positive numbers? I would prefer MPI_Abort because it seems safer.

BR Alex

________________________________
From: George Bosilca <bosi...@icl.utk.edu>
Sent: Tuesday, July 18, 2023 6:47 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Alexander Stadik <alexander.sta...@essteyr.com>
Subject: [EXT] Re: [OMPI users] Error handling


Alex,

How are your values "random" if you provide correct values? Even for negative
values, you could use MIN to pick one value and return it. What is the problem
with `MPI_Abort`? It does seem to do what you want.

  George.


On Tue, Jul 18, 2023 at 4:38 AM Alexander Stadik via users
<users@lists.open-mpi.org> wrote:
Hey everyone,

I have been working with CUDA-aware Open MPI for a while now, and some time ago
I developed a small exception-handling framework covering both MPI and CUDA
exceptions.
Currently I use MPI_Abort with custom error numbers to terminate everything
cleanly, which works well: in case of a crash, we just read the log file.

Now I was wondering how to handle return/exit codes properly across processes,
since we would like to filter non-zero exits by their return code.

One way (in my case) is a simple Allreduce followed by exit instead of
MPI_Abort. The problem, however, is that the values always seem "random" (I was
using negative codes); only with MPI error codes does it seem to work correctly,
and their use is limited.

Any suggestions on how to do this so that it works properly?

BR Alex



DI Alexander Stadik

Head of Large Scale Solutions
Research & Development | Large Scale Solutions


Phone:          +4372522044622
Company:     +43725220446

Mail: alexander.sta...@essteyr.com<mailto:alexander.sta...@essteyr.com>


Register of Firms No.: FN 427703 a
Commercial Court: District Court Steyr
UID: ATU69213102


ESS Engineering Software Steyr GmbH • Berggasse 35 • 4400 • Steyr • Austria



