Re: [OMPI devel] Segfault in 1.3 branch

2008-07-14 Thread Ralph Castain
It looks like a new issue to me, Pasha. Possibly a side consequence of the IOF change made by Jeff and me the other day. From what I can see, it looks like your app was a simple "hello" - correct? If you look at the error, the problem occurs when mpirun is trying to route a message. Since the app is

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 5:48 PM, Sean Hefty wrote: Is there a service ID range that is guaranteed to be available for user apps? I need to check on this. You may want to look at section A3.2.3 of the spec. If you set the first byte (network order) to 0x00, and the 2nd byte to 0x01, then you h
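The byte layout mentioned above (first byte 0x00 and second byte 0x01, in network order) can be sketched roughly as follows. This is only an illustration of the byte positions in a 64-bit service ID: the helper names are hypothetical, and filling the remaining 48 bits with a process-local value such as a PID is an assumption taken from elsewhere in this thread, not something the spec reference here confirms.

```c
#include <stdint.h>

/* Hypothetical helper: build a 64-bit service ID whose most significant
 * byte is 0x00 and whose second byte is 0x01. The low 48 bits carry a
 * caller-supplied value (assumed here to be something like a PID). */
uint64_t sketch_make_service_id(uint64_t local_id)
{
    const uint64_t prefix = ((uint64_t)0x00 << 56) | ((uint64_t)0x01 << 48);
    return prefix | (local_id & 0x0000FFFFFFFFFFFFULL);
}

/* Hypothetical helper: serialize the ID in network (big-endian) byte
 * order, so out[0] is the "first byte" and out[1] the "2nd byte". */
void sketch_service_id_to_wire(uint64_t id, uint8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(id >> (56 - 8 * i));
    }
}
```

Writing the bytes out explicitly avoids depending on the host's endianness when checking which byte ends up first on the wire.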

Re: [OMPI devel] IBCM error

2008-07-14 Thread Sean Hefty
> Ah! I did not realize that there were other services on the machine that were using / reserving IBCM service ID's.
Intel MPI hit a similar problem a long, long time ago.
> Is there a service ID range that is guaranteed to be available for user apps?
I need to check on this. You may want to l

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 5:18 PM, Sean Hefty wrote: Open MPI certainly could be buggy with IBCM, of course -- but it's fishy that the same exact "mpirun ..." command line works one time and fails the next (it's kinda random when the problem occurs). I just want to make sure that service ID colli

Re: [OMPI devel] IBCM error

2008-07-14 Thread Sean Hefty
> The service ID that it uses is its PID and the mask is always 0. There will only be one call to ib_cm_listen() per device per MPI process.
>
> Open MPI certainly could be buggy with IBCM, of course -- but it's fishy that the same exact "mpirun ..." command line works one time and fails the next

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 1:17 PM, Sean Hefty wrote: I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't k

Re: [OMPI devel] IBCM error

2008-07-14 Thread Ralph H. Castain
I've been quietly following this discussion, but now feel a need to jump in here. I really must disagree with the idea of building either IBCM or RDMACM support by default. Neither of these has been proven to reliably work, or to be advantageous. Our own experiences in testing them have been slight

Re: [OMPI devel] IBCM error

2008-07-14 Thread Sean Hefty
> I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 9:21 AM, Pavel Shamis (Pasha) wrote: Should we not even build support for it? I think IBCM CPC build should be enabled by default. The IBCM is supplied with OFED so it should not be any problem during install. Ok. But remember that there are at least some OS's where /dev

Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)
Should we not even build support for it? I think IBCM CPC build should be enabled by default. The IBCM is supplied with OFED so it should not be any problem during install. PRO: don't even allow the possibility of running with it, because we know that there are issues with the ibcm userspac

Re: [OMPI devel] SM latency regression

2008-07-14 Thread Terry Dontje
George Bosilca wrote: I've tracked the SM performance over the last couple of months and I didn't notice any major change on the performance side. I guess there is the architecture factor involved in this. My tests are performed on a PPC (Mac OS X) and on a dual core AMD. What is the architect

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 7:55 AM, Pavel Shamis (Pasha) wrote:
> I can add something like this at the head of the query function: if (!mca_btl_openib_component.cpc_explicitly_defined) return OMPI_ERR_NOT_SUPPORTED;
That sounds reasonable until the ibcm userspace library issues can be sorted out. Then perha
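The guard proposed in this exchange might look roughly like the sketch below. It is not the actual Open MPI code: the struct, the function name, and the return constants are stand-ins invented for illustration. Only the field name cpc_explicitly_defined and the idea of returning a "not supported" status come from the message itself.

```c
#include <stdbool.h>

/* Stand-in status codes; Open MPI defines its own OMPI_* constants. */
enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_NOT_SUPPORTED = -1
};

/* Stand-in for the relevant part of the openib component state. */
struct sketch_openib_component {
    bool cpc_explicitly_defined; /* true if the user explicitly asked for a CPC */
};

/* Hypothetical query function: decline to offer IBCM unless the user
 * explicitly selected a connection method. */
int sketch_ibcm_query(const struct sketch_openib_component *component)
{
    if (!component->cpc_explicitly_defined) {
        return SKETCH_ERR_NOT_SUPPORTED;
    }
    /* ... the rest of the real query logic would go here ... */
    return SKETCH_SUCCESS;
}
```

Putting the check at the head of the query means IBCM simply drops out of the selection unless requested, rather than failing later at connection time.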

Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)
I can add something like this at the head of the query function: if (!mca_btl_openib_component.cpc_explicitly_defined) return OMPI_ERR_NOT_SUPPORTED;
Jeff Squyres wrote:
> On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote:
>> Seems to be fixed.
> Well, it's "fixed" in that Pasha turned off the error me

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote: Seems to be fixed. Well, it's "fixed" in that Pasha turned off the error message. But the same issue is undoubtedly happening. I was asking for something a little stronger: perhaps we should actually have IBCM not try to be used unles

Re: [OMPI devel] IBCM error

2008-07-14 Thread Jeff Squyres
Right about when Brad and I discovered that issue, I ran out of time. This made IBCM more-or-less unusable for many installations -- we were kinda hoping for an OpenFabrics fix... On Jul 13, 2008, at 12:43 PM, Pavel Shamis (Pasha) wrote: Fixed in https://svn.open-mpi.org/trac/ompi/changes

Re: [OMPI devel] IBCM error

2008-07-14 Thread Lenny Verkhovsky
Seems to be fixed.
On 7/14/08, Lenny Verkhovsky wrote:
> ../configure --with-memory-manager=ptmalloc2 --with-openib
>
> I guess not. I always use the same configure line, and only recently I started to see this error.
>
> On 7/13/08, Jeff Squyres wrote:
>> I think you said opposite things: Le

[OMPI devel] Segfault in 1.3 branch

2008-07-14 Thread Pavel Shamis (Pasha)
Please see http://www.open-mpi.org/mtt/index.php?do_redir=764 The error is not consistent. It takes a lot of iterations to reproduce it. In my MTT testing I have seen it a few times. Is it a known issue? Regards, Pasha

Re: [OMPI devel] IBCM error

2008-07-14 Thread Lenny Verkhovsky
../configure --with-memory-manager=ptmalloc2 --with-openib
I guess not. I always use the same configure line, and only recently I started to see this error.
On 7/13/08, Jeff Squyres wrote:
> I think you said opposite things: Lenny's command line did not specifically ask for ibcm, but it was used