Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
On May 7, 2014, at 6:15 PM, Christopher Samuel wrote: > Hi all, > > Apologies for having dropped out of the thread, night intervened here. ;-) > > On 08/05/14 00:45, Ralph Castain wrote: > >> Okay, then we'll just

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
On May 7, 2014, at 6:51 PM, Christopher Samuel wrote: > On 07/05/14 18:00, Ralph Castain wrote: > >> Interesting - how many nodes were involved? As I said, the bad >> scaling becomes more evident at a fairly high node

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
That is interesting. I think I will reproduce your experiments on my system when I test the PMI selection logic. Given your resource counts, I can do that. I will post my results to the list. 2014-05-08 8:51 GMT+07:00 Christopher Samuel : >

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Hi Chris. The current design is to provide a runtime parameter for PMI version selection. It would be even more flexible than configuration-time selection and (with my current understanding) not very hard to achieve. 2014-05-08 8:15 GMT+07:00 Christopher Samuel : >

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel
On 07/05/14 18:00, Ralph Castain wrote: > Interesting - how many nodes were involved? As I said, the bad > scaling becomes more evident at a fairly high node count. Our x86-64 systems are low node counts (we've got BG/Q for capacity), the cluster

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel
Hi all, Apologies for having dropped out of the thread, night intervened here. ;-) On 08/05/14 00:45, Ralph Castain wrote: > Okay, then we'll just have to develop a workaround for all those > Slurm releases where PMI-2 is borked :-( Do you know

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
2014-05-08 7:15 GMT+07:00 Ralph Castain : > Take a look in opal/mca/common/pmi - we already do a bunch of #if PMI2 > stuff in there. All we are talking about doing here is: > > * making those selections be runtime based on an MCA param, compiling if > PMI2 is available but

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Take a look in opal/mca/common/pmi - we already do a bunch of #if PMI2 stuff in there. All we are talking about doing here is: * making those selections be runtime based on an MCA param, compiling if PMI2 is available but selected at runtime * moving some additional functions into that code

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
I like #2 too. But my question was slightly different. Can we encapsulate the PMI logic that OMPI uses in common/pmi, as #2 suggests, but have 2 different implementations of this component, say common/pmi and common/pmi2? I am asking because I have concerns that this kind of component is not supposed to

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
The desired solution is to have the ability to select pmi-1 vs pmi-2 at runtime. This can be done in two ways: 1. you could have separate pmi1 and pmi2 components in each framework. You'd want to define only one common MCA param to direct the selection, however. 2. you could have a single pmi
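The idea of one common parameter directing the PMI choice for every framework can be sketched as below. This is only an illustration in Python, not Open MPI code: the function names, the `"pmi1"`/`"pmi2"` strings, the framework tuple (taken from the grpcomm, DB and ESS frameworks named elsewhere in this thread), and the PMI-1 default (taken from the RFC title) are all assumptions made for the example.

```python
from typing import Dict, Optional, Set

# Hypothetical list of frameworks that distinguish PMI-1 from PMI-2.
FRAMEWORKS = ("grpcomm", "db", "ess")


def select_pmi(requested: Optional[str], available: Set[str],
               default: str = "pmi1") -> str:
    """Resolve the single (hypothetical) selection parameter to one PMI version.

    `requested` is the value the user passed, or None if unset.
    `available` is the set of PMI versions found at build time.
    """
    choice = requested if requested is not None else default
    if choice not in ("pmi1", "pmi2"):
        raise ValueError(f"unknown PMI version: {choice!r}")
    if choice not in available:
        # A readable error instead of silently falling back.
        raise RuntimeError(f"{choice} was selected but is not available")
    return choice


def components_for(requested: Optional[str],
                   available: Set[str]) -> Dict[str, str]:
    """Apply the same resolved version to every framework."""
    version = select_pmi(requested, available)
    return {framework: version for framework in FRAMEWORKS}
```

Resolving the parameter once and fanning the result out keeps the frameworks from disagreeing about which PMI version is in use, which is the point of having only one common parameter.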

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Just reread your suggestions in our off-list discussion and found that I misunderstood them. So no parallel PMI! Take all possible code into opal/mca/common/pmi. To additionally clarify what the preferred way is: 1. to create one joined PMI module having switches to decide what functionality

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
2014-05-08 5:54 GMT+07:00 Ralph Castain : > Ummm... no, I don't think that's right. I believe we decided to instead > create the separate components, default to PMI-2 if available, print nice > error message if not, otherwise use PMI-1. > > I don't want to initialize both PMIs

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Ummm... no, I don't think that's right. I believe we decided to instead create the separate components, default to PMI-2 if available, print nice error message if not, otherwise use PMI-1. I don't want to initialize both PMIs in parallel as most installations won't support it. On May 7,

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Ralph and I discussed Joshua's concerns and decided to try automatic PMI2 correctness checking first, as was initially intended. Here is my idea. The universal way to decide if PMI2 is correct is to compare PMI_Init(.., , , ...) and PMI2_Init(.., , , ...). Size and rank should be equal. In this case we
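The comparison described above (size and rank reported through PMI-1 should equal those reported through PMI-2) can be sketched as follows. This is an illustrative Python model only: `JobView` and `pmi2_looks_consistent` are hypothetical names, and the values are passed in rather than obtained from any real PMI library call.

```python
from typing import NamedTuple


class JobView(NamedTuple):
    """Size and rank as reported by one PMI version (hypothetical container)."""
    size: int
    rank: int


def pmi2_looks_consistent(pmi1_view: JobView, pmi2_view: JobView) -> bool:
    """Return True when both PMI versions agree on job size and rank."""
    return (pmi1_view.size == pmi2_view.size
            and pmi1_view.rank == pmi2_view.rank)
```

For example, `pmi2_looks_consistent(JobView(64, 3), JobView(64, 3))` holds, while a PMI-2 view reporting a different size or rank does not. What to do on a mismatch is left to the caller.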

Re: [OMPI devel] scif btl side effects

2014-05-07 Thread Hjelm, Nathan T
On Wednesday, May 07, 2014 5:23 AM, devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet [gilles.gouaillar...@iferc.org] wrote: > To: Open MPI Developers > Subject: [OMPI devel] scif btl side effects > > Dear OpenMPI Folks, > > i noticed some crashes when running OpenMPI (both

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
I tried this. However, 23 bytes is too small so I added the 23 to the 56 (79) required for the PML header. I do not get the error. mpirun -host host0,host1 -np 2 --mca btl self,tcp --mca btl_tcp_flags 3 --mca btl_tcp_rndv_eager_limit 23 --mca btl_tcp_eager_limit 23 --mca

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread George Bosilca
Strange. The outcome and the timing of this issue seem to highlight a link with the other datatype-related issue you reported earlier, and as suggested by Ralph with Gilles' scif+vader issue. Generally speaking, the mechanism used to split the data in the case of multiple BTLs is identical to

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Ralph Castain
I wonder if that might also explain the issue reported by Gilles regarding the scif BTL? In his example, the problem only occurred if the message was split across scif and vader. If so, then it might be that splitting messages in general is broken. On May 7, 2014, at 10:11 AM, Rolf vandeVaart

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
OK. So, I investigated a little more. I only see the issue when I am running with multiple ports enabled such that I have two openib BTLs instantiated. In addition, large message RDMA has to be enabled. If those conditions are not met, then I do not see the problem. For example: FAILS:

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Yeah, we'll want to move some of it into common - but a lot of that was already done, so I think it won't be that hard. Will explore On May 7, 2014, at 9:00 AM, Joshua Ladd wrote: > +1 Sounds like a good idea - but decoupling the two and adding all the right > selection

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
+1 Sounds like a good idea - but decoupling the two and adding all the right selection mojo might be a bit of a pain. There are several places in OMPI where the distinction between PMI1 and PMI2 is made, not only in grpcomm. DB and ESS frameworks off the top of my head. Josh On Wed, May 7, 2014

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Artem Polyakov
Good idea :)! On Wednesday, May 7, 2014, Ralph Castain wrote: > Jeff actually had a useful suggestion (gasp!). He proposed that we separate > the PMI-1 and PMI-2 codes into separate components so you could select them > at runtime. Thus, we would build both (assuming both PMI-1 and 2

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Jeff actually had a useful suggestion (gasp!). He proposed that we separate the PMI-1 and PMI-2 codes into separate components so you could select them at runtime. Thus, we would build both (assuming both PMI-1 and 2 libs are found), default to PMI-1, but users could select to try PMI-2. If the

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Moody, Adam T.
Hi Josh, Are your changes to OMPI or SLURM's PMI2 implementation? Do you plan to push those changes back upstream? -Adam From: devel [devel-boun...@open-mpi.org] on behalf of Joshua Ladd [jladd.m...@gmail.com] Sent: Wednesday, May 07, 2014 7:56 AM To: Open MPI

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Moody, Adam T.
Thanks, Chris. -Adam From: devel [devel-boun...@open-mpi.org] on behalf of Christopher Samuel [sam...@unimelb.edu.au] Sent: Wednesday, May 07, 2014 12:07 AM To: de...@open-mpi.org Subject: Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
On May 7, 2014, at 7:56 AM, Joshua Ladd wrote: > Ah, I see. Sorry for the reactionary comment - but this feature falls > squarely within my "jurisdiction", and we've invested a lot in improving OMPI > jobstart under srun. > > That being said (now that I've taken some

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
Ah, I see. Sorry for the reactionary comment - but this feature falls squarely within my "jurisdiction", and we've invested a lot in improving OMPI jobstart under srun. That being said (now that I've taken some deep breaths and carefully read your original email :)), what you're proposing isn't a

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Joshua Ladd
Rolf, This was run on a Sandy Bridge system with ConnectX-3 cards. Josh On Wed, May 7, 2014 at 10:46 AM, Joshua Ladd wrote: > Elena, can you run your reproducer on the trunk, please, and see if the > problem persists? > > Josh > > > On Wed, May 7, 2014 at 10:26 AM, Jeff

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Joshua Ladd
Elena, can you run your reproducer on the trunk, please, and see if the problem persists? Josh On Wed, May 7, 2014 at 10:26 AM, Jeff Squyres (jsquyres) wrote: > On May 7, 2014, at 10:03 AM, Elena Elkina wrote: > > > Yes, this commit is also in

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Okay, then we'll just have to develop a workaround for all those Slurm releases where PMI-2 is borked :-( FWIW: I think people misunderstood my statement. I specifically did *not* propose to *lose* PMI-2 support. I suggested that we change it to "on-by-request" instead of the current

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Joshua Ladd
Just saw this thread, and I second Chris' observations: at scale we are seeing huge gains in jobstart performance with PMI2 over PMI1. We *CANNOT* lose this functionality. For competitive reasons, I cannot provide exact numbers, but let's say the difference is in the ballpark of a full

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Jeff Squyres (jsquyres)
On May 7, 2014, at 10:03 AM, Elena Elkina wrote: > Yes, this commit is also in the trunk. Yes, I understand that -- my question is: is this same *behavior* happening on the trunk. I.e., is there some other effect on the trunk that is causing the bad behavior to not

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Rolf vandeVaart
This seems similar to what I reported on a different thread. http://www.open-mpi.org/community/lists/devel/2014/05/14688.php I need to try and reproduce again. Elena, what kind of cluster were you running on? Rolf From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Elena Elkina

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
Yes, this commit is also in the trunk. Best, Elena On Wed, May 7, 2014 at 5:45 PM, Jeff Squyres (jsquyres) wrote: > Is this also happening on the trunk? > > > Sent from my phone. No type good. > > On May 7, 2014, at 9:44 AM, "Elena Elkina" >

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Jeff Squyres (jsquyres)
Is this also happening on the trunk? Sent from my phone. No type good. On May 7, 2014, at 9:44 AM, "Elena Elkina" > wrote: Sorry, Fixes #4501: Datatype unpack code produces incorrect results in some case

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
Sorry, Fixes #4501: Datatype unpack code produces incorrect results in some case ---svn-pre-commit-ignore-below--- r31370 [[BR]] Reshape all the packing/unpacking functions to use the same skeleton. Rewrite the generic_unpacking to take advantage of the same capabilities. r31380 [[BR]] Remove

Re: [OMPI devel] regression with derived datatypes

2014-05-07 Thread Jeff Squyres (jsquyres)
Can you cite the branch and SVN r number? Sent from my phone. No type good. > On May 7, 2014, at 9:24 AM, "Elena Elkina" wrote: > > b531973419a056696e6f88d813769aa4f1f1aee6

[OMPI devel] regression with derived datatypes

2014-05-07 Thread Elena Elkina
Hi, I've found that commit b531973419a056696e6f88d813769aa4f1f1aee6 doesn't work Author: Jeff Squyres Date: Tue Apr 22 19:48:56 2014 + caused new failures with derived datatypes. Collectives return incorrect

[OMPI devel] scif btl side effects

2014-05-07 Thread Gilles Gouaillardet
Dear OpenMPI Folks, I noticed some crashes when running OpenMPI (both latest v1.8 and trunk from svn) on a single Linux system where a MIC is available. /* strictly speaking, MIC hardware is not needed: libscif.so, mic kernel module and accessible /dev/mic/* are enough */ the attached test_scif

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Interesting - how many nodes were involved? As I said, the bad scaling becomes more evident at a fairly high node count. On May 7, 2014, at 12:07 AM, Christopher Samuel wrote: > Hiya Ralph, > > On 07/05/14 14:49,

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel
Hiya Ralph, On 07/05/14 14:49, Ralph Castain wrote: > I should have looked closer to see the numbers you posted, Chris - > those include time for MPI wireup. So what you are seeing is that > mpirun is much more efficient at exchanging the MPI

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
I should have looked closer to see the numbers you posted, Chris - those include time for MPI wireup. So what you are seeing is that mpirun is much more efficient at exchanging the MPI endpoint info than PMI. I suspect that PMI2 is not much better as the primary reason for the difference is

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
Ah, interesting - my comments were in respect to startup time (specifically, MPI wireup) On May 6, 2014, at 8:49 PM, Christopher Samuel wrote: > On 07/05/14 13:37, Moody, Adam T. wrote: > >> Hi Chris, > > Hi Adam, >

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Christopher Samuel
On 07/05/14 13:37, Moody, Adam T. wrote: > Hi Chris, Hi Adam, > I'm interested in SLURM / OpenMPI startup numbers, but I haven't > done this testing myself. We're stuck with an older version of > SLURM for various internal reasons, and I'm

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

2014-05-07 Thread Ralph Castain
FWIW: we see varying reports about the scalability of Slurm, especially at large cluster sizes. Last I saw/tested, there is a quadratic term that begins to dominate above 2k nodes. Others swear it is better. Guess I'd be cautious and definitely test things before investing in a move - I'm not