Re: [OMPI users] Error - BTLs attempted: self sm - on a cluster with IB and openib btl enabled

2013-08-13 Gus Correa
Hi Ralph, Thank you. I switched back to memlock unlimited, rebooted the nodes, and after that Open MPI is working correctly with InfiniBand. As for why the problem happened in the first place, I can only think that somehow the InfiniBand kernel modules and driver didn't like my reducing the memlock limit,

2013-08-12 Ralph Castain
It seems strange that it would have something to do with IB: it looks like alloc itself is failing, and at only 512 bytes, that doesn't seem like something IB would cause. If you write a little program that calls alloc (no MPI), does it also fail? On Aug 12, 2013, at 3:35 PM, Gus Correa

2013-08-12 Gus Correa
Hi Ralph, Sorry if this is more of an IB problem than an OMPI problem, but from where I stand it shows up as OMPI jobs failing. Yes, indeed, I was setting memlock to unlimited in limits.conf and in the pbs_mom, restarting everything, and relaunching the job. The error message changes, but it still fails on

2013-08-12 Ralph Castain
No, this has nothing to do with the registration limit. For some reason, the system is refusing to create a thread; i.e., it is pthread_create that is failing. I have no idea what would cause that. Try setting it to unlimited and see if that allows the thread to start, I guess.

2013-08-12 Gus Correa
Hi Ralph, all, I include more information below, after turning on btl_openib_verbose 30. As you can see, OMPI tries, and fails, to load openib. Last week I reduced the memlock limit from unlimited to ~12GB, as part of a general attempt to rein in memory use/abuse by jobs sharing a node. No

2013-08-12 Gus Correa
Thank you for the prompt help, Ralph! Yes, it is OMPI 1.4.3 built with openib support:
$ ompi_info | grep openib
MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.3)
There are three libraries in prefix/lib/openmpi, but no mca_btl_openib library.
$ ls $PREFIX/lib/openmpi/

2013-08-12 Ralph Castain
Check ompi_info: was it built with openib support? Then check that the mca_btl_openib library is present in the prefix/lib/openmpi directory. It sounds like it isn't finding the openib plugin. On Aug 12, 2013, at 11:57 AM, Gus Correa wrote: > Dear Open MPI pros > > On