Re: [OMPI devel] IBCM error

2008-08-03 Pavel Shamis (Pasha)
Thanks for the update.
Sean Hefty wrote: I've committed a patch to my libibcm git tree with the values
  IB_CM_ASSIGN_SERVICE_ID
  IB_CM_ASSIGN_SERVICE_ID_MASK
these will be in libibcm release 1.0.3, which will shortly... - Sean

Re: [OMPI devel] IBCM error

2008-07-31 Sean Hefty
I've committed a patch to my libibcm git tree with the values
  IB_CM_ASSIGN_SERVICE_ID
  IB_CM_ASSIGN_SERVICE_ID_MASK
these will be in libibcm release 1.0.3, which will shortly... - Sean
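
Until headers from libibcm 1.0.3 are widespread, code that wants a kernel-assigned ID still has to build against older infiniband/cm.h files that lack these constants. A minimal compile-time sketch, assuming both names are plain #defines (as the #define quoted later in this thread suggests) so that #if defined(...) can see them. SKETCH_HAVE_ASSIGN_SID and choose_listen_service_id are hypothetical names, not Open MPI or libibcm API.

```c
#include <assert.h>
#include <stdint.h>

/* Real code would #include <infiniband/cm.h> here. It is left out so this
 * sketch builds without libibcm; the #if below then takes the fallback path. */
#if defined(IB_CM_ASSIGN_SERVICE_ID) && defined(IB_CM_ASSIGN_SERVICE_ID_MASK)
#define SKETCH_HAVE_ASSIGN_SID 1 /* libibcm >= 1.0.3, per the thread */
#else
#define SKETCH_HAVE_ASSIGN_SID 0 /* older headers: constants not exported */
#endif

/* Hypothetical helper: pick the service ID to pass to ib_cm_listen().
 * With new headers, ask the kernel to assign one (it can be read back with
 * ib_cm_attr_id()); otherwise fall back to a caller-chosen ID. */
static uint64_t choose_listen_service_id(uint64_t fallback_sid)
{
    /* Per the thread, 0 does not request a kernel-assigned ID, so it is
     * rejected as a fallback value. */
    assert(fallback_sid != 0);
#if SKETCH_HAVE_ASSIGN_SID
    (void)fallback_sid;
    return IB_CM_ASSIGN_SERVICE_ID;
#else
    return fallback_sid;
#endif
}
```

With the header include omitted, the sketch always takes the fallback branch. The thread names both constants but does not spell out how the mask is meant to be paired with the ID, so the sketch only checks that both exist.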

Re: [OMPI devel] IBCM error

2008-07-17 Pavel Shamis (Pasha)
Sean Hefty wrote: It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)
Unfortunately, the value is defined in the kernel-level IBCM and is not exposed to user level. Can you please expose it to user level (infiniband/cm.h)?
Oops - good catch. I

Re: [OMPI devel] IBCM error

2008-07-17 Sean Hefty
>It is not zero, it should be:
>#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)
>
>Unfortunately, the value is defined in the kernel-level IBCM and is not
>exposed to user level.
>Can you please expose it to user level (infiniband/cm.h)?
Oops - good catch. I will add the assign ID

Re: [OMPI devel] IBCM error

2008-07-17 Pavel Shamis (Pasha)
Jeff Squyres wrote: On Jul 16, 2008, at 11:07 AM, Don Kerr wrote: Pasha added configure switches for this about a week ago:
  --en|disable-openib-ibcm
  --en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run time check for cases where Open MPI is built on a

Re: [OMPI devel] IBCM error

2008-07-17 Pavel Shamis (Pasha)
Sean Hefty wrote: If you don't care what the service ID is, you can specify 0, and the kernel will assign one. The assigned value can be retrieved by calling ib_cm_attr_id(). (I'm assuming that you communicate the IDs out of band somehow.)
It is not zero, it should be: #define
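
Sean's note assumes each process publishes its service ID out of band. With a kernel-assigned ID, that value is only known after the listen call, so peers must learn it at run time. A minimal sketch of the serialization step, assuming the ID travels as 8 raw bytes: it is packed most-significant byte first (network order) so that hosts of different endianness decode the same value. pack_service_id and unpack_service_id are hypothetical helpers, and the out-of-band channel that carries the bytes (in Open MPI, presumably the modex) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serialize a 64-bit service ID most-significant byte first (network order),
 * independent of the host's own byte order. */
static void pack_service_id(uint64_t sid, unsigned char out[8])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        out[i] = (unsigned char)(sid >> (56 - 8 * i));
    }
}

/* Inverse of pack_service_id: rebuild the ID from 8 network-order bytes. */
static uint64_t unpack_service_id(const unsigned char in[8])
{
    assert(in != NULL);
    uint64_t sid = 0;
    for (size_t i = 0; i < 8; i++) {
        sid = (sid << 8) | in[i];
    }
    return sid;
}
```

Writing the bytes explicitly avoids relying on a particular byte-swap helper being available on every platform the MPI job spans.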

Re: [OMPI devel] IBCM error

2008-07-16 Jeff Squyres
On Jul 16, 2008, at 11:07 AM, Don Kerr wrote: Pasha added configure switches for this about a week ago:
  --en|disable-openib-ibcm
  --en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run time check for cases where Open MPI is built on a system that has
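
One possible shape for the run time check Don asks about: configure only sees the build host, so the CPC could also probe, at startup, that the IBCM kernel interface is reachable on the node it actually runs on. A sketch, assuming the userspace CM device node is something like /dev/infiniband/ucm0 (the path is an assumption; check your OFED installation) and that a missing or inaccessible node means the IBCM CPC should decline. ibcm_runtime_available is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed path of the userspace CM device node; verify on your system. */
#define SKETCH_UCM_DEVICE "/dev/infiniband/ucm0"

/* Hypothetical run-time probe: true if the node exists and this process may
 * open it for reading and writing. Configure cannot know this, because the
 * build host and the compute nodes may differ. */
static bool ibcm_runtime_available(const char *dev_path)
{
    assert(dev_path != NULL);
    return access(dev_path, R_OK | W_OK) == 0;
}
```

A query function could call ibcm_runtime_available(SKETCH_UCM_DEVICE) and report "not supported" when it returns false, instead of failing later during connection setup.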

Re: [OMPI devel] IBCM error

2008-07-16 Don Kerr
Jeff Squyres wrote: On Jul 15, 2008, at 7:30 AM, Ralph Castain wrote: Minor clarification: we did not test RDMACM on RoadRunner. Just for further clarification - I did, and it wasn't a particularly good experience. Encountered several problems, none of them overwhelming, hence my

Re: [OMPI devel] IBCM error

2008-07-15 Pavel Shamis (Pasha)
Guess what - we don't always put them out there because - tada - we don't use them! What goes out on the backend is a stripped down version of libraries we require. Given the huge number of libraries people provide (looking at the bigger, beyond OMPI picture), it consumes a lot of limited disk

Re: [OMPI devel] IBCM error

2008-07-15 Ralph Castain
On 7/15/08 5:05 AM, "Jeff Squyres" wrote:
> On Jul 14, 2008, at 3:04 PM, Ralph H. Castain wrote:
>
>> I've been quietly following this discussion, but now feel a need to jump
>> in here. I really must disagree with the idea of building either IBCM or
>> RDMACM

Re: [OMPI devel] IBCM error

2008-07-15 Jeff Squyres
On Jul 14, 2008, at 3:04 PM, Ralph H. Castain wrote: I've been quietly following this discussion, but now feel a need to jump in here. I really must disagree with the idea of building either IBCM or RDMACM support by default. Neither of these has been proven to reliably work, or to be

Re: [OMPI devel] IBCM error

2008-07-15 Pavel Shamis (Pasha)
I need to check on this. You may want to look at section A3.2.3 of the spec. If you set the first byte (network order) to 0x00, and the 2nd byte to 0x01, then you hit a 'reserved' range that probably isn't being used currently. If you don't care what the service ID is, you can specify 0,

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
On Jul 14, 2008, at 5:48 PM, Sean Hefty wrote: Is there a service ID range that is guaranteed to be available for user apps? I need to check on this. You may want to look at section A3.2.3 of the spec. If you set the first byte (network order) to 0x00, and the 2nd byte to 0x01, then you

Re: [OMPI devel] IBCM error

2008-07-14 Sean Hefty
>Ah! I did not realize that there were other services on the machine
>that were using / reserving IBCM service ID's.
Intel MPI hit a similar problem a long, long time ago.
>Is there a service ID range that is guaranteed to be available for
>user apps?
I need to check on this. You may want to
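
Sean's pointer to section A3.2.3 (first byte 0x00, second byte 0x01 in network order) translates to service IDs whose two most significant bytes are 0x00 0x01. A minimal sketch, assuming that range really is free on the fabric (Sean only says it "probably isn't being used currently") and keeping the PID-based scheme described in this thread in the low 32 bits. make_service_id and in_sketch_range are hypothetical helpers. Values are in host order; convert them (for example with htobe64) if the consuming call expects network order.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>

/* Most significant bytes 0x00, 0x01 (network order), per the A3.2.3 hint.
 * Assumed, not verified, to be unused by other services. */
#define SKETCH_SID_PREFIX      0x0001000000000000ULL
#define SKETCH_SID_PREFIX_MASK 0xFFFF000000000000ULL

/* Hypothetical helper: host-order service ID in that range, with the
 * process ID in the low 32 bits. */
static uint64_t make_service_id(pid_t pid)
{
    assert(pid >= 0);
    return SKETCH_SID_PREFIX | (uint64_t)(uint32_t)pid;
}

/* Nonzero if a host-order ID carries the 0x00 0x01 prefix. */
static int in_sketch_range(uint64_t sid)
{
    return (sid & SKETCH_SID_PREFIX_MASK) == SKETCH_SID_PREFIX;
}
```

Keeping the prefix separate from the per-process part makes it easy to check, at listen time, that an ID still falls inside the range the application assumes it owns.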

Re: [OMPI devel] IBCM error

2008-07-14 Sean Hefty
>The service ID that it uses is its PID and the mask is always 0.
>There will only be one call to ib_cm_listen() per device per MPI
>process.
>
>Open MPI certainly could be buggy with IBCM, of course -- but it's
>fishy that the same exact "mpirun ..." command line works one time and
>fails the
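
Why could a PID-based ID with mask 0 work on one run and fail on the next? A simplified model, assuming that (as in the kernel CM) a mask of 0 is treated as an exact match, and that two listens on the same device clash when each ID, filtered through the other's mask, yields the same bits. It ignores private-data matching; struct listen_entry and listen_conflicts are hypothetical. Under this model two exact-match PIDs only collide if they are equal, but another service listening with a wide mask claims a whole block of IDs, so whether a run fails depends on which PIDs it happens to get.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One listen registration: service ID plus mask (host order for simplicity). */
struct listen_entry {
    uint64_t id;
    uint64_t mask;
};

/* Model assumption: a mask of 0 means every bit is significant. */
static uint64_t effective_mask(uint64_t mask)
{
    return mask ? mask : ~UINT64_C(0);
}

/* Simplified conflict test for two listens on the same device: they clash
 * when each ID, filtered through the other's mask, gives the same bits. */
static int listen_conflicts(const struct listen_entry *a,
                            const struct listen_entry *b)
{
    assert(a != NULL && b != NULL);
    uint64_t ma = effective_mask(a->mask);
    uint64_t mb = effective_mask(b->mask);
    return (mb & a->id) == (ma & b->id);
}
```

In the model, an exact-match PID such as 0x1234 collides with a listener registered as ID 0x1000 with mask 0xFFFFFFFFFFFFF000, while 0x2345 does not.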

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
On Jul 14, 2008, at 1:17 PM, Sean Hefty wrote: I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't

Re: [OMPI devel] IBCM error

2008-07-14 Ralph H. Castain
I've been quietly following this discussion, but now feel a need to jump in here. I really must disagree with the idea of building either IBCM or RDMACM support by default. Neither of these has been proven to reliably work, or to be advantageous. Our own experiences in testing them have been

Re: [OMPI devel] IBCM error

2008-07-14 Sean Hefty
>I talked to Sean Hefty about it, but we never figured out a definitive
>cause or solution. My best guess is that there is something wonky
>about multiple processes simultaneously interacting with the IBCM
>kernel driver from userspace; but I don't know jack about kernel
>stuff, so that's a total

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
On Jul 14, 2008, at 9:21 AM, Pavel Shamis (Pasha) wrote: Should we not even build support for it? I think IBCM CPC build should be enabled by default. The IBCM is supplied with OFED so it should not be any problem during install. Ok. But remember that there are at least some OS's where

Re: [OMPI devel] IBCM error

2008-07-14 Pavel Shamis (Pasha)
Should we not even build support for it? I think IBCM CPC build should be enabled by default. The IBCM is supplied with OFED so it should not be any problem during install. PRO: don't even allow the possibility of running with it, because we know that there are issues with the ibcm

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
On Jul 14, 2008, at 7:55 AM, Pavel Shamis (Pasha) wrote: I can add at the head of the query function something like:
  if (!mca_btl_openib_component.cpc_explicitly_defined)
      return OMPI_ERR_NOT_SUPPORTED;
That sounds reasonable until the ibcm userspace library issues can be sorted out. Then

Re: [OMPI devel] IBCM error

2008-07-14 Pavel Shamis (Pasha)
I can add at the head of the query function something like:
  if (!mca_btl_openib_component.cpc_explicitly_defined)
      return OMPI_ERR_NOT_SUPPORTED;
Jeff Squyres wrote: On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote: Seems to be fixed. Well, it's "fixed" in that Pasha turned off the error
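
Pasha's proposed guard can be fleshed out into a self-contained sketch. The struct, the error values, and ibcm_query are hypothetical stand-ins so the example compiles without Open MPI; in the real code the flag lives in mca_btl_openib_component and OMPI_ERR_NOT_SUPPORTED comes from Open MPI's headers. The effect is that the IBCM CPC is only considered when the user explicitly asks for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for Open MPI definitions; the numeric values are placeholders. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_SUPPORTED = -1 };

struct sketch_openib_component {
    /* True when the user named the CPC explicitly instead of relying on
     * automatic selection. */
    bool cpc_explicitly_defined;
};

/* Hypothetical IBCM CPC query: decline unless IBCM was explicitly requested. */
static int ibcm_query(const struct sketch_openib_component *comp)
{
    assert(comp != NULL);
    if (!comp->cpc_explicitly_defined) {
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    /* The usual per-device and per-port checks would follow here. */
    return SKETCH_SUCCESS;
}
```

Declining in the query function, rather than only silencing the error message, keeps automatic CPC selection away from IBCM while still letting someone opt in for testing.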

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote: Seems to be fixed. Well, it's "fixed" in that Pasha turned off the error message. But the same issue is undoubtedly happening. I was asking for something a little stronger: perhaps we should actually have IBCM not try to be used

Re: [OMPI devel] IBCM error

2008-07-14 Jeff Squyres
Right about when Brad and I discovered that issue, I ran out of time. This made IBCM more-or-less unusable for many installations -- we were kinda hoping for an OpenFabrics fix... On Jul 13, 2008, at 12:43 PM, Pavel Shamis (Pasha) wrote: Fixed in

Re: [OMPI devel] IBCM error

2008-07-14 Lenny Verkhovsky
Seems to be fixed.
On 7/14/08, Lenny Verkhovsky wrote:
>
> ../configure --with-memory-manager=ptmalloc2 --with-openib
>
> I guess not. I always use same configure line, and only recently I started
> to see this error.
>
> On 7/13/08, Jeff Squyres

Re: [OMPI devel] IBCM error

2008-07-14 Lenny Verkhovsky
../configure --with-memory-manager=ptmalloc2 --with-openib
I guess not. I always use same configure line, and only recently I started to see this error.
On 7/13/08, Jeff Squyres wrote:
>
> I think you said opposite things: Lenny's command line did not specifically
> ask for

Re: [OMPI devel] IBCM error

2008-07-13 Pavel Shamis (Pasha)
Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897
Are there any other known IBCM issues?
Regards,
Pasha
Jeff Squyres wrote: I think you said opposite things: Lenny's command line did not specifically ask for ibcm, but it was used anyway. Lenny -- did you explicitly request it somewhere

Re: [OMPI devel] IBCM error

2008-07-13 Lenny Verkhovsky
Pasha is right, I didn't disable it.
On 7/13/08, Pavel Shamis (Pasha) wrote:
>
> Jeff Squyres wrote:
>
>> Brad and I did some scale testing of IBCM and saw this error sometimes.
>> It seemed to happen with higher frequency when you increased the number of
>> processes

Re: [OMPI devel] IBCM error

2008-07-13 Pavel Shamis (Pasha)
Jeff Squyres wrote: Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node. I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My

Re: [OMPI devel] IBCM error

2008-07-13 Jeff Squyres
Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node. I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that

[OMPI devel] IBCM error

2008-07-13 Lenny Verkhovsky
Hi,
I am getting this error sometimes.
  /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello