Re: [OMPI users] Hybrid OpenMPI/OpenMP leading to deadlocks?

2014-10-16 Thread Ralph Castain
If you only have one thread doing MPI calls, then single and funneled are indeed the same. If this is only happening after long run times, I'd suspect resource exhaustion. You might check your memory footprint to see if you are running into leak issues (could be in our library as well as your
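
One way to follow Ralph's suggestion and watch a rank's memory footprint over a long run is to sample its resident set size periodically and check whether it keeps growing. This is a minimal sketch and assumes Linux, because it reads the `VmRSS` field of `/proc/self/status`. The helper names `rss_kib` and `looks_like_leak` are hypothetical, and the 10 MiB growth threshold is an arbitrary illustration, not a recommendation.

```python
def rss_kib(pid="self"):
    """Current resident set size in KiB, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     12345 kB"
                return int(line.split()[1])
    raise RuntimeError("no VmRSS field found; is /proc available?")


def looks_like_leak(samples_kib, min_growth_kib=10 * 1024):
    """Heuristic: True if RSS never shrinks between samples and grows by at least
    min_growth_kib overall. The 10 MiB default is an arbitrary illustration."""
    if len(samples_kib) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kib, samples_kib[1:]))
    return never_shrinks and samples_kib[-1] - samples_kib[0] >= min_growth_kib
```

Calling `rss_kib()` every few hundred time steps on each rank and logging the values is enough to tell a steady climb, which suggests a leak, apart from a flat profile, which points elsewhere.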

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-16 Thread Ralph Castain
On Oct 15, 2014, at 11:46 AM, Gus Correa wrote:
> Thank you Ralph and Jeff for the help!
>
> Glad to hear the segmentation fault is reproducible and will be fixed.
>
> In any case, one can just avoid the old parameter name
> (rmaps_base_schedule_policy),
> and use
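
An `openmpi-mca-params.conf` file consists of `name = value` lines with `#` comments, so it is straightforward to scan one for parameter names you plan to retire, such as the old `rmaps_base_schedule_policy` mentioned in this thread. This is a minimal sketch. The function names are hypothetical, the caller supplies the set of deprecated names (only `rmaps_base_schedule_policy` comes from this thread), and the parser assumes `#` never appears inside a value.

```python
def parse_mca_params(text):
    """Parse openmpi-mca-params.conf-style text into (lineno, name, value) tuples.
    Assumes '#' always starts a comment and never appears inside a value."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'name = value', got {raw!r}")
        entries.append((lineno, name.strip(), value.strip()))
    return entries


def find_deprecated(text, deprecated_names):
    """Return (lineno, name) for every entry whose name is in deprecated_names."""
    return [(n, name) for n, name, _ in parse_mca_params(text) if name in deprecated_names]
```

Running `find_deprecated(open(path).read(), {"rmaps_base_schedule_policy"})` over a site-wide file lists the lines to update before moving to a new release series.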

[OMPI users] Open MPI on Cray xc30 and getpwuid

2014-10-16 Thread Aurélien Bouteiller
I am building trunk on the Cray xc30. I get the following warning during link (static link):

../../../orte/.libs/libopen-rte.a(session_dir.o): In function `orte_session_dir_get_name':
session_dir.c:(.text+0x226): warning: Using 'getpwuid' in statically linked applications requires at runtime

Re: [OMPI users] Open MPI on Cray xc30 and getpwuid

2014-10-16 Thread Ralph Castain
Add --disable-getpwuid to configure

On Oct 16, 2014, at 12:36 AM, Aurélien Bouteiller wrote:
> I am building trunk on the Cray xc30.
> I get the following warning during link (static link)
> ../../../orte/.libs/libopen-rte.a(session_dir.o): In function
>

Re: [OMPI users] Hybrid OpenMPI/OpenMP leading to deadlocks?

2014-10-16 Thread McGrattan, Kevin B. Dr.
The individual MPI processes appear to be using a few percent of the system memory. I have created a loop containing repeated calls to MPI_TESTALL. When the process is in this loop for more than 10 s, it calls MPI_ABORT. So the only error message I see is related to all the processes being
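
The watchdog Kevin describes, which polls MPI_TESTALL in a loop and calls MPI_ABORT once more than 10 s have passed, can be sketched without tying it to any MPI binding. This is a minimal sketch with stated assumptions: `test_all` stands in for a call such as MPI_TESTALL that returns True once every request has completed, and `abort` stands in for MPI_ABORT. Both are passed in as hypothetical callables so the logic can be exercised without an MPI library.

```python
import time


def wait_with_watchdog(test_all, abort, timeout_s=10.0, poll_interval_s=0.0):
    """Poll test_all() until it returns True or timeout_s seconds elapse.

    test_all: callable returning True when all outstanding requests are done
              (stand-in for MPI_TESTALL).
    abort:    callable invoked once on timeout (stand-in for MPI_ABORT).
    Returns True on completion, False if the watchdog fired. A real MPI_ABORT
    does not return, so the False path only matters for stand-ins.
    """
    deadline = time.monotonic() + timeout_s  # monotonic clock ignores wall-clock jumps
    while not test_all():
        if time.monotonic() > deadline:
            abort()
            return False
        if poll_interval_s:
            time.sleep(poll_interval_s)  # optional back-off between polls
    return True
```

With mpi4py, for instance, `test_all` could be `lambda: MPI.Request.Testall(reqs)` and `abort` could be `lambda: comm.Abort(1)`. Those calls are given only for orientation and should be checked against the mpi4py documentation. Printing which requests are still pending just before aborting yields more diagnostic information than the abort message alone.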

Re: [OMPI users] Hybrid OpenMPI/OpenMP leading to deadlocks?

2014-10-16 Thread Gus Correa
Hi Kevin

Wouldn't it be possible to make your code restartable, by saving the appropriate fluid configuration/phase space variables, and splitting your long run into smaller pieces? That is a very common strategy for large PDE integrations. Time invested in programming the restart features may
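
Gus's suggestion to split a long run into restartable pieces can be sketched as a loop that periodically writes its state to disk and, on startup, resumes from the last checkpoint if one exists. This is a minimal sketch. It assumes the state fits in a JSON-serializable dict, and the file names, the `step` counter, and the `advance` function are hypothetical placeholders for a real solver's fields and time step. Writing to a temporary file and then calling `os.replace` keeps the previous checkpoint intact if the job dies mid-write.

```python
import json
import os


def load_checkpoint(path, initial_state):
    """Resume from the checkpoint at `path` if it exists, else start from initial_state."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return dict(initial_state)


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it over `path`."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the rename
    os.replace(tmp, path)  # atomic on POSIX file systems: old or new file, never half


def advance(state):
    """Hypothetical time step standing in for one solver iteration."""
    return {"step": state["step"] + 1, "u": 0.5 * state["u"] + 1.0}


def run(path, total_steps, max_steps_this_job, every=10):
    """Advance at most max_steps_this_job steps toward total_steps,
    checkpointing every `every` steps and once more before returning."""
    state = load_checkpoint(path, {"step": 0, "u": 0.0})
    for _ in range(max_steps_this_job):
        if state["step"] >= total_steps:
            break
        state = advance(state)
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

In a batch setting, each job calls `run` with a step budget that fits its wall-clock limit, and resubmitting the same job continues from wherever the previous one stopped. For an MPI code, each rank would typically write its own piece of the state.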

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-16 Thread Gus Correa
Hi Ralph

Yes, I know the process placement features are powerful. They were already very good in 1.6, even in 1.4, and I just tried the new 1.8 "-map-by l2cache" (works nicely on Opteron 6300). Unfortunately I couldn't keep track, test, and use the 1.7 series. I did that in the previous

Re: [OMPI users] Hybrid OpenMPI/OpenMP leading to deadlocks?

2014-10-16 Thread McGrattan, Kevin B. Dr.
Yes, the code is restartable, and our users often do this. We have users in countries with unreliable power supplies. However, we still try to make the code as robust as possible. Usually, if I do something improper in my MPI coding, failure occurs right away. But I've run out of ideas as to

Re: [OMPI users] Open MPI 1.8.3 openmpi-mca-params.conf: old and new parameters

2014-10-16 Thread Ralph Castain
On Oct 16, 2014, at 9:43 AM, Gus Correa wrote:
> Hi Ralph
>
> Yes, I know the process placement features are powerful.
> They were already very good in 1.6, even in 1.4,
> and I just tried the new 1.8
> "-map-by l2cache" (works nicely on Opteron 6300).
>
>

[OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
Dear Open MPI developers

Well, I just can't keep my promises for too long ... So, here I am pestering you again, although this time it is not a request for more documentation. Hopefully it is something more legit.

I am having trouble using knem with Open MPI 1.8.3, and need your help. I

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Aurélien Bouteiller
Are you sure you are not using the vader BTL?

Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem initialization info.

The Linux CMA (Cross Memory Attach) mechanism, which ships with most 3.1x Linux kernels, has similar features and is also supported in sm.

Aurelien
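
Verbosity parameters like the ones Aurélien mentions can be set on the mpirun command line (`-mca name value`) or through environment variables named `OMPI_MCA_<name>`. This is a minimal sketch that builds both forms from a dict. The helper names are hypothetical, and the parameter name `btl_base_verbose` and the level 100 used in the checks are assumptions that should be confirmed against the `ompi_info` output for your build.

```python
def mca_env(params):
    """Map {"name": value} to environment variables of the form OMPI_MCA_name=value."""
    return {f"OMPI_MCA_{name}": str(value) for name, value in params.items()}


def mca_args(params):
    """Map {"name": value} to mpirun arguments: -mca name value ..."""
    args = []
    for name, value in params.items():
        args += ["-mca", name, str(value)]
    return args
```

The environment form fits batch scripts that call mpirun indirectly, for example through `os.environ.update(mca_env(...))`. The argument form keeps the settings visible in the job's own command line.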

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Ralph Castain
FWIW: vader is the default in 1.8

On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote:
> Are you sure you are not using the vader BTL ?
>
> Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem
> initialization info.
>
> The CMA linux system

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
Thank you, Aurelien!

Aha, "vader btl", that is new to me! I thought Vader was that man dressed in black in Star Wars, Obi-Wan Kenobi's nemesis. That was a while ago, my kids were children, and Alec Guinness younger than Harrison Ford is today. Oh, how nostalgic code developers can get when it

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Nathan Hjelm
And it doesn't support knem at this time. Probably never will because of the existence of CMA.

-Nathan

On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
> FWIW: vader is the default in 1.8
>
> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller wrote:
>
> > Are

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Nathan Hjelm
On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
> Thank you, Aurelien!
>
> Aha, "vader btl", that is new to me!
> I thought Vader was that man dressed in black in Star Wars,
> Obi-Wan Kenobi's nemesis.
> That was a while ago, my kids were children,
> and Alec Guinness younger than

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
On 10/16/2014 04:49 PM, Ralph Castain wrote:
> FWIW: vader is the default in 1.8

Yes, Ralph, thank you, I just noticed it in my job's stderr, after Aurelien pointed out that the new "vader" thing existed. What a quick promotion: from nonexistent to default btl! But what is "vader" after all? Any

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
On 10/16/2014 05:28 PM, Nathan Hjelm wrote:
> And it doesn't support knem at this time. Probably never will because of the existence of CMA.
> -Nathan

Thanks, Nathan

But for the benefit of mere mortals like me who don't share the dark or the bright side of the force, and just need to keep

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
Hi All

Back to the original issue of knem in Open MPI 1.8.3. It really seems to be broken. I launched the Intel MPI Benchmarks (IMB) job both with '-mca btl ^vader,tcp' and with '-mca btl sm,self,openib'. Both syntaxes seem to have turned off vader (along with tcp), as shown in stderr by
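
The two command lines Gus compares depend on how a component list such as the `btl` parameter is interpreted. A leading `^` excludes the listed components from whatever is available, while a plain list includes only the components it names. This is a minimal sketch of that selection rule. It models the behavior for illustration and is not Open MPI's own code, and the list of available BTLs in the checks is a hypothetical example.

```python
def select_components(available, spec):
    """Resolve an MCA-style component list (e.g. the btl parameter) against
    the components that are available.

    "^a,b" -> every available component except a and b (exclusive form)
    "a,b"  -> exactly a and b, in the order given (inclusive form)
    """
    spec = spec.strip()
    exclude = spec.startswith("^")
    body = spec[1:] if exclude else spec
    names = [n.strip() for n in body.split(",") if n.strip()]
    if any(n.startswith("^") for n in names):
        # In this sketch '^' negates the whole list, so it may only appear first.
        raise ValueError("'^' may appear only once, at the start of the list")
    if exclude:
        return [c for c in available if c not in names]
    missing = [n for n in names if n not in available]
    if missing:
        raise ValueError(f"requested components not available: {missing}")
    return names
```

Both forms leave vader out. The difference is that the exclusive form keeps any other available BTL, whereas the inclusive form pins the exact set.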

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Ralph Castain
You probably have this somewhere below, but what OS are you running? I have CentOS 6, and vader works fine for me and is much faster than the sm btl. I can certainly ask to see if someone has time to fix the knem support - if they do, we would definitely include the fix in the 1.8 series.

On

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
On 10/16/2014 05:38 PM, Nathan Hjelm wrote:
> On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
>> Thank you, Aurelien!
>>
>> Aha, "vader btl", that is new to me!
>> I thought Vader was that man dressed in black in Star Wars,
>> Obi-Wan Kenobi's nemesis.
>> That was a while ago, my kids were children,

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Jeff Squyres (jsquyres)
Gus --

Can you send the output of configure and your config.log?

On Oct 16, 2014, at 4:24 PM, Gus Correa wrote:
> On 10/16/2014 05:38 PM, Nathan Hjelm wrote:
>> On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
>>> Thank you, Aurelien!
>>>
>>> Aha, "vader

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
Hi Ralph

I have clusters with CentOS 6.4, 6.5, and 5.5.

OK, completing my table (ran on CentOS 6.4):

#bytes  #repetitions  t[usec]  Mbytes/sec
262144           160    48.04     5203.93  :OMPI 1.6.5+knem
262144           160    63.72     3923.30  :OMPI 1.8.3+vader
262144           160
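
The Mbytes/sec column can be derived from the other two: it is #bytes divided by t[usec], converted to megabytes per second. This is a minimal sketch and assumes IMB's convention that 1 Mbyte = 2^20 bytes. That assumption reproduces the 5203.93 and 3923.30 Mbytes/sec figures quoted for the 262144-byte rows to within the rounding of t[usec]. The helper names are hypothetical.

```python
MIB = 2 ** 20  # assumed IMB convention: 1 Mbyte = 1,048,576 bytes


def imb_mbytes_per_sec(nbytes, t_usec):
    """Bandwidth in Mbytes/sec from message size (bytes) and time per transfer (usec)."""
    return nbytes / (t_usec * 1e-6) / MIB


def speedup(bw_a, bw_b):
    """How many times faster configuration a is than configuration b."""
    return bw_a / bw_b
```

For the 262144-byte rows, `imb_mbytes_per_sec(262144, 48.04)` and `imb_mbytes_per_sec(262144, 63.72)` give about 5204 and 3923. In this measurement, the knem build therefore moves 256 KiB messages roughly 1.33 times faster than the vader build.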

Re: [OMPI users] knem in Open MPI 1.8.3

2014-10-16 Thread Gus Correa
On 10/16/2014 07:32 PM, Jeff Squyres (jsquyres) wrote:
> Gus --
> Can you send the output of configure and your config.log?

Hi Jeff.

Sure. This is for the OMPI 1.8.3 build with Intel compilers that I've been using to compile and run IMB. The config.log is attached. The configure command and