Thanks for the update.
Sean Hefty wrote:
I've committed a patch to my libibcm git tree with the values
IB_CM_ASSIGN_SERVICE_ID
IB_CM_ASSIGN_SERVICE_ID_MASK
these will be in libibcm release 1.0.3, which will shortly...
- Sean
Sean Hefty wrote:
>It is not zero, it should be:
>#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)
>
>Unfortunately the value is defined in the kernel-level IBCM and is not
>exposed to user level.
>Can you please expose it to user level (infiniband/cm.h)?
Oops - good catch. I will add the assign ID
Jeff Squyres wrote:
On Jul 16, 2008, at 11:07 AM, Don Kerr wrote:
Pasha added configure switches for this about a week ago:
--en|disable-openib-ibcm
--en|disable-openib-rdmacm
I like these flags but I thought there was going to be a run-time
check for cases where Open MPI is built on a system that has
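As a sketch of how those switches might be invoked (the flag names are as quoted above; everything else here is illustrative, not a tested build recipe):

```shell
# Build with the IBCM CPC disabled but RDMACM left enabled
./configure --disable-openib-ibcm --enable-openib-rdmacm
make all install
```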
Sean Hefty wrote:
If you don't care what the service ID is, you can specify 0, and the kernel will
assign one. The assigned value can be retrieved by calling ib_cm_attr_id().
(I'm assuming that you communicate the IDs out of band somehow.)
It is not zero, it should be:
#define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL)
Jeff Squyres wrote:
On Jul 15, 2008, at 7:30 AM, Ralph Castain wrote:
Minor clarification: we did not test RDMACM on RoadRunner.
Just for further clarification - I did, and it wasn't a particularly
good experience. Encountered several problems, none of them
overwhelming, hence my
Guess what - we don't always put them out there because - tada - we don't
use them! What goes out on the backend is a stripped-down version of the
libraries we require. Given the huge number of libraries people provide
(looking at the bigger, beyond-OMPI picture), it consumes a lot of
limited disk
On 7/15/08 5:05 AM, "Jeff Squyres" wrote:
On Jul 14, 2008, at 3:04 PM, Ralph H. Castain wrote:
I've been quietly following this discussion, but now feel a need to
jump in here. I really must disagree with the idea of building either
IBCM or RDMACM support by default. Neither of these has been proven to
reliably work, or to be
I need to check on this. You may want to look at section A3.2.3 of
the spec.
If you set the first byte (network order) to 0x00, and the 2nd byte
to 0x01, then you hit a 'reserved' range that probably isn't being
used currently.
If you don't care what the service ID is, you can specify 0,
On Jul 14, 2008, at 5:48 PM, Sean Hefty wrote:
Is there a service ID range that is guaranteed to be available for
user apps?
>Ah! I did not realize that there were other services on the machine
>that were using / reserving IBCM service ID's.
Intel MPI hit a similar problem a long, long time ago.
>Is there a service ID range that is guaranteed to be available for
>user apps?
I need to check on this. You may want to
>The service ID that it uses is its PID and the mask is always 0.
>There will only be one call to ib_cm_listen() per device per MPI
>process.
>
>Open MPI certainly could be buggy with IBCM, of course -- but it's
>fishy that the same exact "mpirun ..." command line works one time and
>fails the
On Jul 14, 2008, at 1:17 PM, Sean Hefty wrote:
I talked to Sean Hefty about it, but we never figured out a definitive
cause or solution. My best guess is that there is something wonky
about multiple processes simultaneously interacting with the IBCM
kernel driver from userspace; but I don't know jack about kernel
stuff, so that's a total
I've been quietly following this discussion, but now feel a need to jump
in here. I really must disagree with the idea of building either IBCM or
RDMACM support by default. Neither of these has been proven to reliably
work, or to be advantageous. Our own experiences in testing them have been
On Jul 14, 2008, at 9:21 AM, Pavel Shamis (Pasha) wrote:
Should we not even build support for it?
I think the IBCM CPC build should be enabled by default. IBCM is
supplied with OFED, so there should not be any problem during install.
Ok. But remember that there are at least some OS's where
PRO: don't even allow the possibility of running with it, because we
know that there are issues with the ibcm
On Jul 14, 2008, at 7:55 AM, Pavel Shamis (Pasha) wrote:
I can add at the head of the query function something like:
if (!mca_btl_openib_component.cpc_explicitly_defined)
return OMPI_ERR_NOT_SUPPORTED;
That sounds reasonable until the ibcm userspace library issues can be
sorted out. Then
Jeff Squyres wrote:
On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote:
Seems to be fixed.
Well, it's "fixed" in that Pasha turned off the error message. But
the same issue is undoubtedly happening.
I was asking for something a little stronger: perhaps we should
actually have IBCM not try to be used
Right about when Brad and I discovered that issue, I ran out of time.
This made IBCM more-or-less unusable for many installations -- we were
kinda hoping for an OpenFabrics fix...
On Jul 13, 2008, at 12:43 PM, Pavel Shamis (Pasha) wrote:
Fixed in
Seems to be fixed.
../configure --with-memory-manager=ptmalloc2 --with-openib
I guess not. I always use the same configure line, and only recently I
started to see this error.
On 7/13/08, Jeff Squyres wrote:
>
> I think you said opposite things: Lenny's command line did not specifically
> ask for
Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897
Are there any other known IBCM issues?
Regards,
Pasha
Jeff Squyres wrote:
I think you said opposite things: Lenny's command line did not
specifically ask for ibcm, but it was used anyway. Lenny -- did you
explicitly request it somewhere
Pasha is right, I didn't disable it.
On 7/13/08, Pavel Shamis (Pasha) wrote:
>
> Jeff Squyres wrote:
>
>> Brad and I did some scale testing of IBCM and saw this error sometimes.
>> It seemed to happen with higher frequency when you increased the number of
>> processes
Jeff Squyres wrote:
Brad and I did some scale testing of IBCM and saw this error
sometimes. It seemed to happen with higher frequency when you
increased the number of processes on a single node.
I talked to Sean Hefty about it, but we never figured out a definitive
cause or solution. My best guess is that
Hi,
I am getting this error sometimes.
/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello