Re: [OMPI users] Qlogic & openmpi
On Mon, Dec 5, 2011 at 16:12, Ralph Castain wrote:
> Sounds like we should be setting this value when starting the process - yes?
> If so, what is the "good" value, and how do we compute it?

I've also been looking at this for the past few days. What I came up with is a small script, psm_shctx, which sets the envvar and then execs the MPI binary; it is inserted between mpirun and the MPI binary:

  mpirun psm_shctx my_mpi_app

Of course the same effect could be obtained if the orted set the envvar before starting the process.

There is, however, a problem: deciding how many contexts to use. For maximum performance, one should use a ratio of 1:1 between MPI ranks and contexts; the highest ratio possible (but with the lowest performance) is 4 MPI ranks per context; a further restriction is that each job needs at least 1 context.

E.g., on AMD cluster nodes with 4 CPUs of 12 cores each (48 cores total) one gets 16 contexts; assigning all 16 contexts to 48 ranks would mean a ratio of 1:3, but this only works out if cores are allocated in multiples of 4 - with a less advantageous allocation strategy, more contexts are lost to rounding up. At the extreme, if there's only one rank per job, there can be at most 16 jobs - using all 16 contexts - and the remaining 32 cores have to stay idle or be used for other jobs that don't communicate over InfiniPath.

There is a further issue though: MPI-2 dynamic process creation. If it's not known in advance how many ranks there will be, I guess one should use the highest context-sharing ratio (1:4) to be on the safe side.

I've found a mention of this envvar being handled in the changelog for MVAPICH2 1.4.1 - maybe that can serve as a source of inspiration? (But I haven't looked at it...)

Hope this helps,
Bogdan
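The psm_shctx script itself wasn't posted, so the following is only a sketch of the approach described above, assuming Open MPI exports OMPI_COMM_WORLD_LOCAL_SIZE into each launched process's environment and taking the maximum sharing ratio of 4 ranks per context:

```shell
#!/bin/sh
# Sketch of a psm_shctx-style wrapper (hypothetical; the original script
# wasn't posted).  Inserted between mpirun and the MPI binary:
#   mpirun psm_shctx my_mpi_app
# It caps PSM's context usage at ceil(local ranks / 4) and execs the app.

nlocal=${OMPI_COMM_WORLD_LOCAL_SIZE:-1}  # ranks mpirun placed on this node
ratio=4                                  # max MPI ranks per hardware context

# Integer ceiling division: ceil(nlocal / ratio)
PSM_SHAREDCONTEXTS_MAX=$(( (nlocal + ratio - 1) / ratio ))
export PSM_SHAREDCONTEXTS_MAX

exec "$@"
```

With 48 local ranks and the 1:4 limit this sets PSM_SHAREDCONTEXTS_MAX=12; a single-rank job gets 1, matching the "each job needs at least 1 context" restriction.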
Re: [OMPI users] Qlogic & openmpi
- Arnaud HERITIER
Meteo France International
+33 561432940
arnaud.herit...@mfi.fr
--

On Mon, Dec 5, 2011 at 6:12 PM, Ralph Castain wrote:

> Sounds like we should be setting this value when starting the process -
> yes? If so, what is the "good" value, and how do we compute it?

The good value is roundup($OMPI_COMM_WORLD_LOCAL_SIZE / context sharing ratio); the ratio is at most 4 on my HCA.

QLogic provided me with a simple script to compute this value; I just changed my mpirun script to call the script, set PSM_SHAREDCONTEXTS_MAX to the returned value, and then call the MPI binary. Script attached.

Arnaud

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Qlogic & openmpi
On Dec 5, 2011, at 5:49 AM, arnaud Heritier wrote:

> Hello,
>
> I found the solution, thanks to Qlogic support.
>
> The "can't open /dev/ipath, network down (err=26)" message from the ipath
> driver is really misleading.
>
> Actually, this is a hardware context problem on the Qlogic PSM. PSM can't
> allocate any hardware context for the job because other MPI jobs have
> already used all available contexts. In order to avoid this problem, every
> MPI job has to set the PSM_SHAREDCONTEXTS_MAX variable to the good value,
> according to the number of processes that will run on the node. If we
> don't use this variable, PSM will "greedily" use all contexts with the
> first MPI job spawned on the node.

Sounds like we should be setting this value when starting the process - yes? If so, what is the "good" value, and how do we compute it?
Re: [OMPI users] Qlogic & openmpi
Hello,

I found the solution, thanks to Qlogic support.

The "can't open /dev/ipath, network down (err=26)" message from the ipath driver is really misleading.

Actually, this is a hardware context problem on the Qlogic PSM. PSM can't allocate any hardware context for the job because other MPI jobs have already used all available contexts. To avoid this problem, every MPI job has to set the PSM_SHAREDCONTEXTS_MAX variable to the good value, according to the number of processes that will run on the node. If we don't use this variable, PSM will "greedily" use all contexts with the first MPI job spawned on the node.

Regards,

Arnaud
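To make the failure mode concrete, here is a small sketch (hypothetical helper and job sizes; 16 hardware contexts per node, as with the HCA discussed in this thread) checking whether two jobs' minimum context needs fit on one node once each sets PSM_SHAREDCONTEXTS_MAX instead of grabbing everything:

```shell
#!/bin/sh
# Sanity-check whether a set of jobs fits on a node's hardware contexts:
# each job needs at least ceil(local ranks / 4) contexts, and the sum
# must not exceed the node's total.  (Hypothetical helper and numbers.)

contexts_needed() {
    # minimum contexts for $1 local ranks at sharing ratio $2
    n=$1; r=$2
    echo $(( (n + r - 1) / r ))
}

total=16                      # contexts on the node in this thread
job_a=$(contexts_needed 24 4) # a 24-rank job needs at least 6 contexts
job_b=$(contexts_needed 24 4)

if [ $(( job_a + job_b )) -le "$total" ]; then
    echo "fits: $job_a + $job_b of $total contexts"   # prints this branch
else
    echo "does not fit"
fi
```

Without the cap, the first 24-rank job would greedily take all 16 contexts and the second would fail with exactly the /dev/ipath error above.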
Re: [OMPI users] Qlogic & openmpi
On Nov 28, 2011, at 11:53 PM, arnaud Heritier wrote:

> I do have a contract and i tried to open a case, but their support is ..

What happens if you put a delay between the two jobs? E.g., if you just delay a few seconds before the 2nd job starts? Perhaps the ipath device just needs a little time before it will be available...? (That's a total guess.)

I suggest this because the PSM device will definitely give you better overall performance than the QLogic verbs support. Their verbs support basically barely works -- PSM is their primary device and the one that we always recommend.

> Anyway. I'm still working on the strange error message from mpirun saying
> it can't allocate memory when at the same time it also reports that the
> memory is unlimited ...
>
> Arnaud

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Qlogic & openmpi
I do have a contract and I tried to open a case, but their support is ..

Anyway, I'm still working on the strange error message from mpirun saying it can't allocate memory when at the same time it also reports that the memory is unlimited ...

Arnaud

On Tue, Nov 29, 2011 at 4:23 AM, Jeff Squyres wrote:

> I'm afraid we don't have any contacts left at QLogic to ask them any
> more... do you have a support contract, perchance?
[OMPI users] Qlogic & openmpi
Hello,

I run into a strange problem with QLogic OFED and Open MPI. When I submit (through SGE) 2 jobs on the same node, the second job ends up with:

  (ipath/PSM)[10292]: can't open /dev/ipath, network down (err=26)

I'm pretty sure the InfiniBand is working well, as the other job runs fine.

Here are the details of the configuration:

  Qlogic HCA: InfiniPath_QMH7342 (2 ports but only one connected to a switch)
  qlogic_ofed-1.5.3-7.0.0.0.35 (rocks cluster roll)
  openmpi 1.5.4 (./configure --with-psm --with-openib --with-sge)

-

In order to work around this problem I recompiled Open MPI without PSM support, but I faced another problem:

  The OpenFabrics (openib) BTL failed to initialize while trying to
  allocate some locked memory. This typically can indicate that the
  memlock limits are set too low. For most HPC installations, the
  memlock limits should be set to "unlimited". The failure occurred
  here:

    Local host:    compute-0-6.local
    OMPI source:   btl_openib.c:329
    Function:      ibv_create_srq()
    Device:        qib0
    Memlock limit: unlimited
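One quick diagnostic for the openib/memlock half of this (only a guess, but a common cause of "limit reported as unlimited, allocation still fails" confusion is that the limit seen inside the batch environment differs from the interactive one): print the limit from within an actual SGE job and compare it with a login shell.

```shell
#!/bin/sh
# Print the locked-memory limit as the job itself sees it.  Submit this
# through SGE (qsub) and also run it interactively - batch daemons often
# impose different limits than login shells do.

echo "host:    $(hostname)"
echo "memlock: $(ulimit -l)"   # "unlimited" is the usual HPC recommendation
```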