Re: [OMPI users] MPI_Comm_spawn and exported variables

2013-12-19 Thread Tim Miller
Hi Ralph, That's correct. All of the original processes see the -x values, but spawned ones do not. Regards, Tim On Thu, Dec 19, 2013 at 6:09 PM, Ralph Castain wrote: > > On Dec 19, 2013, at 2:57 PM, Tim Miller wrote: > > > Hi All, > > > > I have a

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-19 Thread Ralph Castain
Actually, it looks like it would happen with hetero-nodes set - only required that at least two nodes have the same architecture. So you might want to give the trunk a shot as it may well now be fixed. On Dec 19, 2013, at 8:35 AM, Ralph Castain wrote: > Hmmm...not having

Re: [OMPI users] MPI_Comm_spawn and exported variables

2013-12-19 Thread Ralph Castain
On Dec 19, 2013, at 2:57 PM, Tim Miller wrote: > Hi All, > > I have a question similar (but not identical) to the one asked by Tom Fogal a > week or so back... > > I have a code that uses MPI_Comm_spawn to launch different processes. The > executables for these use

[OMPI users] MPI_Comm_spawn and exported variables

2013-12-19 Thread Tim Miller
Hi All, I have a question similar (but not identical) to the one asked by Tom Fogal a week or so back... I have a code that uses MPI_Comm_spawn to launch different processes. The executables for these use libraries in non-standard locations, so what I've done is add the directories containing
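
For context, a minimal sketch of the pattern under discussion: a parent that launches children with MPI_Comm_spawn. The executable name ./worker and the spawn count are placeholders, not from the original message:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        char *val;

        MPI_Init(&argc, &argv);

        /* Spawn 4 copies of the child executable. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

        /* The parent itself does see variables exported with "mpirun -x";
           per this thread, the spawned children reportedly do not. */
        val = getenv("LD_LIBRARY_PATH");
        printf("parent sees LD_LIBRARY_PATH: %s\n", val ? val : "(unset)");

        MPI_Finalize();
        return 0;
    }

Launched as, e.g., mpirun -x LD_LIBRARY_PATH -np 1 ./parent.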

Re: [OMPI users] environment variables and MPI_Comm_spawn

2013-12-19 Thread Ralph Castain
In trunk, cmr'd for 1.7.4 - copied you on ticket Thanks! Ralph On Dec 19, 2013, at 12:37 PM, tom fogal wrote: > Okay, no worries on the delay, and thanks! -tom > > On 12/19/2013 04:32 PM, Ralph Castain wrote: >> Sorry for delay - buried in my "day job". Adding values to

Re: [OMPI users] Error: Unable to create the sub-directory (/tmp/openmpi etc...)

2013-12-19 Thread Brandon Turner
Thanks a lot! Indeed, it was an issue of permissions. I did not realize the difference in the /tmp directories, and it seems that the /tmp directory for the node in question was "read-only". This has since been switched, and presumably everything else will run smoothly now. My fingers are crossed.
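
For anyone hitting the same error: Open MPI must be able to create its session directory under /tmp, so a quick sanity check on each node is whether /tmp is mounted read-write and world-writable with the sticky bit (the usual drwxrwxrwt). A minimal check, with the remaining output fields elided:

    $ ls -ld /tmp
    drwxrwxrwt ... /tmp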

Re: [OMPI users] environment variables and MPI_Comm_spawn

2013-12-19 Thread tom fogal
Okay, no worries on the delay, and thanks! -tom On 12/19/2013 04:32 PM, Ralph Castain wrote: Sorry for delay - buried in my "day job". Adding values to the env array is fine, but this isn't how we would normally do it. I've got it noted on my "to-do" list and will try to get to it in time

Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-12-19 Thread Ralph Castain
Hmmm...not having any luck tracking this down yet. If anything, based on what I saw in the code, I would have expected it to fail when hetero-nodes was false, not the other way around. I'll keep poking around - just wanted to provide an update. On Dec 19, 2013, at 12:54 AM,

Re: [OMPI users] environment variables and MPI_Comm_spawn

2013-12-19 Thread Ralph Castain
Sorry for delay - buried in my "day job". Adding values to the env array is fine, but this isn't how we would normally do it. I've got it noted on my "to-do" list and will try to get to it in time for 1.7.5 Thanks Ralph On Dec 13, 2013, at 4:42 PM, Jeff Squyres (jsquyres)
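
A hedged sketch of passing environment values to spawned processes through MPI_Info, which is the direction this thread points at. The "env" key is an Open MPI extension, not part of the MPI standard, and which releases honor it is an assumption to verify against the MPI_Comm_spawn man page; the path and executable name are placeholders:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm child;
        MPI_Info info;

        MPI_Init(&argc, &argv);

        /* ASSUMPTION: the "env" info key (an Open MPI extension)
           forwards these variables to the spawned processes. */
        MPI_Info_create(&info);
        MPI_Info_set(info, "env", "LD_LIBRARY_PATH=/opt/mylibs/lib");

        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }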

Re: [OMPI users] EXTERNAL: Re: What's the status of OpenMPI and thread safety?

2013-12-19 Thread Blosch, Edwin L
Thanks Ralph, We are attempting to use 1.6.4 with an application that requires multi-threading, and it is hanging most of the time; it is using openib. They steered us to try Intel MPI for now. If you lack drivers/testers for improved thread safety on openib, let me know and I'll encourage

Re: [OMPI users] [EXTERNAL] Re: What's the status of OpenMPI and thread safety?

2013-12-19 Thread Barrett, Brian W
Pablo - As Ralph mentioned, it will be different, possibly not for the better, in 1.7. This is an area of active work, so any help would be appreciated. However, the one issue you brought up is going to be problematic, even with threads. Our design essentially makes it such that blocking MPI

Re: [OMPI users] What's the status of OpenMPI and thread safety?

2013-12-19 Thread Ralph Castain
Just answered a similar question yesterday: This was, in fact, a primary point of discussion at last week's OMPI developer's conference. Bottom line is that we are only a little further along than we used to be, but are focusing on improving it. You'll find good thread support for some

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Noam Bernstein
On Dec 18, 2013, at 5:19 PM, Martin Siegert wrote: > > Thanks for figuring this out. Does this work for 1.6.x as well? > The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity > covers versions 1.2.x to 1.5.x. > Does 1.6.x support mpi_paffinity_alone = 1 ? > I set
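
For reference, an MCA parameter such as mpi_paffinity_alone is normally set either on the command line or in a params file; whether 1.6.x still honors this particular parameter is exactly the open question here (the executable name is a placeholder):

    $ mpirun --mca mpi_paffinity_alone 1 -np 16 ./a.out

or, equivalently, one line in $HOME/.openmpi/mca-params.conf:

    mpi_paffinity_alone = 1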

Re: [OMPI users] What's the status of OpenMPI and thread safety?

2013-12-19 Thread Pablo Barrio
Hi all, this is the first time I post to the list (although I have read it for a while now). I hope this helps. I'm heavily using MPI_THREAD_MULTIPLE on multicores (sm BTL) and my programs work fine from a CORRECTNESS point of view. I use OpenMPI 1.6 (SVN rev. 26429) and pthreads on Linux.
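
For readers following along, this is the standard way to request full thread support and verify what the library actually grants; a generic sketch, not Pablo's code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Ask for the highest thread level... */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* ...and check what the library actually granted; a build
           without full thread support may return something lower. */
        if (provided < MPI_THREAD_MULTIPLE)
            printf("warning: only thread level %d provided\n", provided);

        MPI_Finalize();
        return 0;
    }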

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Brice Goglin writes: > hwloc-ps (and lstopo --top) are better at showing process binding but > they lack a nice pseudographical interface with dynamic refresh. That seems like an advantage when you want to check on a cluster! > htop uses hwloc internally iirc, so there's
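
One way to approximate the missing dynamic refresh is to wrap the hwloc tools in watch(1); a workflow suggestion, not a feature of hwloc itself:

    $ watch -n 1 hwloc-ps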

Re: [OMPI users] slowdown with infiniband and latest CentOS kernel

2013-12-19 Thread Dave Love
Noam Bernstein writes: > On Dec 18, 2013, at 10:32 AM, Dave Love wrote: > >> Noam Bernstein writes: >> >>> We specifically switched to 1.7.3 because of a bug in 1.6.4 (lock up in some collective

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I can wait until it's fixed in 1.7.5 or later, because putting "-bind-to numa" and "-map-by numa" on the command line at the same time works as a workaround. Thanks, Tetsuya Mishima > Yeah, it will impact everything that uses hwloc topology maps, I fear. > > One side note: you'll need to add --hetero-nodes to your
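
Spelled out, the workaround from this message, with the --hetero-nodes flag Ralph asks for below; the process count and executable are placeholders:

    $ mpirun --hetero-nodes -map-by numa -bind-to numa -np 32 ./a.out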

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Yeah, it will impact everything that uses hwloc topology maps, I fear. One side note: you'll need to add --hetero-nodes to your cmd line. If we don't see that, we assume that all the node topologies are identical - which clearly isn't true here. I'll try to resolve the hier inversion over the

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
I think it's normal for AMD Opterons with 8/16 cores, such as Magny-Cours or Interlagos. Because they usually have 2 NUMA nodes per CPU (socket), a NUMA node cannot include a socket. This type of hierarchy would be natural. (node03 is a Dell PowerEdge R815 and probably quite common, I guess) By the

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread Ralph Castain
Ick - yeah, that would be a problem. I haven't seen that type of hierarchical inversion before - is node03 a different type of chip? Might take a while for me to adjust the code to handle hier inversion... :-( On Dec 18, 2013, at 9:05 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, > >

Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-19 Thread tmishima
Hi Ralph, I found the reason. I attached the main part of the output for the 32-core node (node03) and the 8-core node (node05) at the bottom. From this information, the socket on node03 includes the numa-node. On the other hand, the numa-node on node05 includes the socket. The direction of the object tree is opposite.
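
In other words, the two nodes report opposite nestings; a simplified sketch of the hierarchy inversion described above:

    node03 (Magny-Cours): Machine > Socket > NUMANode > Cores
    node05:               Machine > NUMANode > Socket > Cores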