Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)
Please see below. When using XRC queues, Open MPI is indeed creating only one XRC queue per node (instead of per-host). The problem is that the number of send elements in this queue is multiplied by the number of processes on the remote host. So, what are we getting from this ? Not much,
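The arithmetic behind that remark can be sketched as follows. This is a minimal illustration, not Open MPI code: the function names are hypothetical, and the comparison case (one queue per remote process, each with the same number of send elements) is an assumption made only to show why the total may not shrink.

```c
#include <stddef.h>

/* Hypothetical helper: send elements held by the single XRC queue toward one
 * remote host, following the description above (the per-queue count is
 * multiplied by the number of processes on the remote host). */
size_t xrc_send_elements(size_t per_queue_send_elems, size_t remote_procs)
{
    return per_queue_send_elems * remote_procs;
}

/* Assumed comparison case: one queue per remote process, each holding
 * per_queue_send_elems send elements. */
size_t per_process_send_elements(size_t per_queue_send_elems, size_t remote_procs)
{
    size_t total = 0;
    for (size_t i = 0; i < remote_procs; i++) {
        total += per_queue_send_elems;
    }
    return total;
}
```

Under these assumptions both functions return the same total, which is one way to read the "Not much" in the message.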

Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-17 Thread Pavel Shamis (Pasha)
Sylvain Jeaugey wrote: The XRC protocol seems to create shared receive queues, which is a good thing. However, comparing memory used by an "X" queue versus an "S" queue, we can see a large difference. Digging a bit into the code, we found some [...] So, do you see that X consumes more than S?

Re: [OMPI devel] Infiniband memory usage with XRC

2010-05-23 Thread Pavel Shamis (Pasha)
Sylvain Jeaugey wrote: On Mon, 17 May 2010, Pavel Shamis (Pasha) wrote: Sylvain Jeaugey wrote: The XRC protocol seems to create shared receive queues, which is a good thing. However, comparing memory used by an "X" queue versus an "S" queue, we can see a large difference

Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
- ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- -- Pavel Shamis (Pasha) Mellanox Technologies LTD. Index: ompi/mca/btl/openib/btl_openib_component.c ===

Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
Matt Burgess wrote: Pasha, Thanks for the patch. Unfortunately, it doesn't seem like that fixed the problem. I realized earlier I didn't mention what version of OpenMPI I was trying - it's 1.2.6. Should I be trying 1.2.7 with this patch? Thanks, Matt 2008/10/

[OMPI devel] OpenIB BTL - removing btl_openib_ib_pkey_ix parameter

2008-10-07 Thread Pavel Shamis (Pasha)
it. Comments ? -- Pavel Shamis (Pasha) Mellanox Technologies LTD.

Re: [OMPI devel] Fwd: [OMPI users] OpenMPI with openib partitions

2008-10-07 Thread Pavel Shamis (Pasha)
and parameter suggestion, it works! So to be clear btl_openib_ib_pkey_val is for 1.2.6 and btl_openib_of_pkey_val is for 1.2.7? Thanks again, Matt On Tue, Oct 7, 2008 at 12:24 PM, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote: Matt,

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-05 Thread Pavel Shamis (Pasha)
I tried to build the latest OFED with the new ompi rc4, but it looks like the vtune code is broken again? gcc -DHAVE_CONFIG_H -I. -I.. -I../tools/opari/lib -I../extlib/otf/otflib -I../extlib/otf/otflib -D_GNU_SOURCE -DBINDIR=\"/usr/local/mpi/gcc/openmpi-1.3.1rc4/bin\"

Re: [OMPI devel] VT compile error: Fwd: [ofa-general] OFED 1.4.1 (rc1) is available

2009-03-05 Thread Pavel Shamis (Pasha)
Adding pointer to OFED bugzilla ticket for more information: https://bugs.openfabrics.org/show_bug.cgi?id=1537 Jeff Squyres wrote: VT guys -- It looks like we still have a compile bug in OMPI 1.3.1rc4... See below. Do you think you can get a fix ASAP for OMPI 1.3.1final? Begin forwarded

Re: [OMPI devel] Meta Question -- Open MPI: Is it a dessert topping or is it a floor wax?

2009-03-13 Thread Pavel Shamis (Pasha)
I would like to add my 0.02 to this discussion. "Code stability" is not something that should stop the project from developing and progressing. During the life of the MPI project we have already seen some pretty critical changes (pml/btl/etc...), and as a result we have a more stable and more optimized MPI.

Re: [OMPI devel] [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)
I think your problem is related to this bug: https://svn.open-mpi.org/trac/ompi/ticket/1823 And it is resolved on the ompi-trunk. Pasha. Steve Wise wrote: When this happens, that node logs this type of message also in /var/log/messages: IMB-MPI1[8859]: segfault at 0018 rip

Re: [OMPI devel] ***SPAM*** Re: [ewg] Seg fault running OpenMPI-1.3.1rc4

2009-03-30 Thread Pavel Shamis (Pasha)
lt. So I think this is a new bug. Lemme pull the trunk and try that. Pavel Shamis (Pasha) wrote: I think your problem is related to this bug: https://svn.open-mpi.org/trac/ompi/ticket/1823 And it is resolved on the ompi-trunk. Pasha. Steve Wise wrote: When this happens, that node logs

Re: [OMPI devel] Open MPI mirrors list

2009-04-26 Thread Pavel Shamis (Pasha)
Thanks Jeff ! Jeff Squyres wrote: We just added a mirror web site in Israel. That one was fun because they requested to have their web tagline be in Hebrew. Fun fact: with this addition, Open MPI now has 14 mirror web sites around the world. http://www.open-mpi.org/community/mirrors/

Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)
Open MPI currently needs to have connected fabrics, but maybe that's something we will like to change in the future, having two separate rails. (Btw Pasha, will your current work enable this ?) I do not completely understand what you mean here by two separate rails ... Already today you

Re: [OMPI devel] Multi-rail on openib

2009-06-14 Thread Pavel Shamis (Pasha)
Nifty Tom Mitchell wrote: On Tue, Jun 09, 2009 at 04:33:51PM +0300, Pavel Shamis (Pasha) wrote: Open MPI currently needs to have connected fabrics, but maybe that's something we will like to change in the future, having two separate rails. (Btw Pasha, will your current work enable

Re: [OMPI devel] selectively bind MPI to one HCA out of available ones

2009-07-16 Thread Pavel Shamis (Pasha)
Hi, You can select the IB device used by the openib BTL with the following parameters: MCA btl: parameter "btl_openib_if_include" (current value: , data source: default value) Comma-delimited list of devices/ports to be used (e.g. "mthca0,mthca1:2"; empty value means to

Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Pavel Shamis (Pasha)
Rolf, Did you compare latency/bw for failover-enabled code VS trunk ? Pasha. Rolf Vandevaart wrote: Hi folks: As some of you know, I have also been looking into implementing failover as well. I took a different approach as I am solving the problem within the openib BTL itself. This of

[OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)
On my systems I see the following error: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -O3 -DNDEBUG -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes

Re: [OMPI devel] Trunk is brokem ?

2009-10-21 Thread Pavel Shamis (Pasha)
It was broken :-( I fixed it - r22119 Pasha Pavel Shamis (Pasha) wrote: On my systems I see the following error: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../.. -O3

Re: [OMPI devel] Adding support for RDMAoE devices.

2009-11-01 Thread Pavel Shamis (Pasha)
Jeff, Can you please check that we do not break iWARP device functionality? Pasha Vasily Philipov wrote: The attached patch adds support for RDMAoE (RDMA over Ethernet) devices to Openib BTL. The code changes are very minimal, actually we only modified the RDMACM code to provide better

Re: [OMPI devel] [Fwd: strong ordering for data registered memory]

2009-11-13 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: On Nov 11, 2009, at 8:13 AM, Terry Dontje wrote: Sun's IB group has asked me to forward the following email to see if anyone has any comments on this email. Tastes great / less filling. :-) I think (assume) we'll be happy to implement changes like this that come from

Re: [OMPI devel] [PATCH] Not optimal SRQ resource allocation

2009-12-06 Thread Pavel Shamis (Pasha)
Jeff, During the original code review we found that by default we allocate the SRQ with size "rd_num + sd_max", but on the SRQ we post only rd_num receive entries. It means that we do not fill the queue completely. Looks like a bug. Pasha Jeff Squyres wrote: SRQ hardware vendors -- please
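The mismatch described in this message can be made concrete with a small sketch. Only rd_num and sd_max come from the message; the struct and function names are hypothetical and do not correspond to Open MPI's actual data structures.

```c
#include <stddef.h>

/* Illustrative only: not Open MPI's real SRQ bookkeeping. */
struct srq_sizing {
    size_t rd_num;  /* receive entries actually posted on the SRQ */
    size_t sd_max;  /* extra amount added to the SRQ allocation */
};

/* Size the SRQ is allocated with, per the message: rd_num + sd_max. */
size_t srq_allocated(const struct srq_sizing *s)
{
    return s->rd_num + s->sd_max;
}

/* Number of receive entries posted, per the message: rd_num only. */
size_t srq_posted(const struct srq_sizing *s)
{
    return s->rd_num;
}

/* Slots that are allocated but never filled. */
size_t srq_unused(const struct srq_sizing *s)
{
    return srq_allocated(s) - srq_posted(s);
}
```

With example values rd_num = 256 and sd_max = 32 (chosen here for illustration), srq_unused() reports 32 slots that are never filled.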

[OMPI devel] Question about ompi_proc_t

2009-12-07 Thread Pavel Shamis (Pasha)
In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep a pointer to proc_pml - "struct mca_pml_base_endpoint_t* proc_pml". I tried to find the definition of "struct mca_pml_base_endpoint_t", but I failed. Does somebody know where it is defined? Regards, Pasha

Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)
9/12/7 Timothy Hayes <haye...@tcd.ie> Is it not a forward definition and then defined in the PML components individually based on their own requirements? 2009/12/7 Pavel Shamis (Pasha) <pash...@gmail.com> In the ompi_proc_t structure (ompi/proc/proc.h:54) we keep pointer to proc_pml -

Re: [OMPI devel] Question about ompi_proc_t

2009-12-08 Thread Pavel Shamis (Pasha)
ry, I think I read your question too quickly. Ignore me. :-) 2009/12/7 Timothy Hayes <haye...@tcd.ie> Is it not a forward definition and then defined in the PML components individually based on their own requirements? 2009/12/7 Pavel Shamis (Pasha) <pash...@gmail.com> In the ompi_proc_t

Re: [OMPI devel] #2163: 1.5 blocker

2010-01-14 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Note that I just moved #2163 into "blocker" state because the bug breaks any non-SRQ-capable OpenFabrics device (e.g., both Chelsio and Neteffect RNICs). The bug came in recently with the "resize SRQ" patches. Mellanox / IBM: we are branching for 1.5 tomorrow. Can this

Re: [OMPI devel] v1.4 broken

2010-02-17 Thread Pavel Shamis (Pasha)
I'm checking this issue. Pasha Joshua Hursey wrote: I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB BTL last night. The error was: - btl_openib_component.c: In function 'init_one_device': btl_openib_component.c:2089: error:

Re: [OMPI devel] openib coord teleconf

2007-06-13 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: On Jun 13, 2007, at 2:41 PM, Gleb Natapov wrote: Pasha tells me that the best times for Ishai and him are: - 2000-2030 Israel time - 1300-1330 US Eastern - 1100-1130 US Mountain - 2230-2300 India (Bangalore) Although they could also do the preceding half hour as well.

Re: [OMPI devel] problems with openib finalize

2007-07-19 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Background: Pasha added a call in the openib BTL finalize function that will only succeed if all registered memory has been released (ibv_dealloc_pd()). Since the test app didn't call MPI_FREE_MEM, there was some memory that was still registered, and therefore the

Re: [OMPI devel] problems with openib finalize

2007-07-23 Thread Pavel Shamis (Pasha)
Just committed r15557 that adds finalize flow to mpool. So now openib should be able to release all resources in normal way. Pasha Pavel Shamis (Pasha) wrote: Jeff Squyres wrote: Background: Pasha added a call in the openib BTL finalize function that will only succeed if all registered

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Pavel Shamis (Pasha)
George Bosilca wrote: On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote: Jeff Squyres wrote: I guess reading the graph that Pasha sent is difficult; Pasha -- can you send the actual numbers? Ok here is the numbers on my machines: 0 bytes mvapich with header caching: 1.56 mvapich

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Pavel Shamis (Pasha)
Brian Barrett wrote: On Aug 13, 2007, at 9:33 AM, George Bosilca wrote: On Aug 13, 2007, at 11:28 AM, Pavel Shamis (Pasha) wrote: Jeff Squyres wrote: I guess reading the graph that Pasha sent is difficult; Pasha -- can you send the actual numbers? Ok here is the numbers on my machines: 0

Re: [OMPI devel] tmp XRC branches

2007-12-02 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Are any of the XRC tmp SVN branches still relevant? Or have they now been integrated into the trunk? I ask because I see 4 XRC-related branches out there under /tmp and /tmp-public. I removed my XRC stuff from tmp repositories (3 of 4) Pasha

Re: [OMPI devel] [PATCH] openib btl: remove excess ompi_btl_openib_connect_base_open call

2007-12-06 Thread Pavel Shamis (Pasha)
:-) Nice catch. Please commit the fix. Pasha. Jeff Squyres wrote: Hah! Sweet; good catch -- feel free to delete that extra call. On Dec 5, 2007, at 6:42 PM, Jon Mason wrote: There is a double call to ompi_btl_openib_connect_base_open in mca_btl_openib_mca_setup_qps(). It looks like

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-12 Thread Pavel Shamis (Pasha)
Gleb Natapov wrote: On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote: Gleb Natapov wrote: On Tue, Dec 11, 2007 at 08:16:07PM -0500, Jeff Squyres wrote: Isn't there a better way somehow? Perhaps we should have "select" call *all* the functions

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r16959

2007-12-16 Thread Pavel Shamis (Pasha)
I will try it. Jeff Squyres wrote: This commit does what we previously discussed: it only compiles the XOOB openib CPC if XRC support is actually present (vs. having a stub XOOB when XRC is not present). This is on the /tmp-public/openib-cpc branch. I have some hermon hca's, but due to

Re: [OMPI devel] Fwd: [OMPI svn-full] svn:open-mpi r16959

2007-12-16 Thread Pavel Shamis (Pasha)
Tested. XRC works. Pasha Pavel Shamis (Pasha) wrote: I will try it. Jeff Squyres wrote: This commit does what we previously discussed: it only compiles the XOOB openib CPC if XRC support is actually present (vs. having a stub XOOB when XRC is not present). This is on the /tmp-public

Re: [OMPI devel] openib xrc CPC minor nit

2007-12-23 Thread Pavel Shamis (Pasha)
there. It's up to the CPC's if they want to use that info or not. Any objections to me removing this #if on the openib-cpc branch? (and eventual merge back up to the trunk) I have no objections. We should remove it. -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-10 Thread Pavel Shamis (Pasha)
Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] [PATCH] openib btl: extensable cpc selection enablement

2008-01-14 Thread Pavel Shamis (Pasha)
Jon Mason wrote: I have a few machines with ConnectX and I will try to run MTT on Sunday. Awesome! I appreciate it. After fixing the compilation problem in the XRC part of the code I was able to run MTT. Most of the tests passed and one test failed: mpi2c++_dynamics_test. The test pass

Re: [OMPI devel] Open IB BTL development question

2008-01-17 Thread Pavel Shamis (Pasha)
. Thanks -DON -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] open ib btl and xrc

2008-01-17 Thread Pavel Shamis (Pasha)
of xrc, maybe point me to a paper, or both? TIA -DON -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] open ib btl and xrc

2008-01-20 Thread Pavel Shamis (Pasha)
to another paper http://www.cs.sandia.gov/~rbbrigh/papers/ompi-ib-pvmmpi07.pdf that talks about more efficient usage of receive buffers. Do you have any numbers showing the actual memory footprint savings when using xrc? I don't have. Pasha. -DON Pavel Shamis (Pasha) wrote: Here is paper from

Re: [OMPI devel] 1.3 Release schedule and contents

2008-02-21 Thread Pavel Shamis (Pasha)
--Brad Brad Benton IBM -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] Setting CQ depth

2008-02-26 Thread Pavel Shamis (Pasha)
_compat(hca->ib_dev_context, cq_size, -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)
- you have to manually mknod and chmod a device in /dev/infiniband for every HCA in the host). Thanks. -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Pavel Shamis (Pasha)
it -- even trivial apps will work or not work. Thanks. On Apr 24, 2008, at 6:24 AM, Gleb Natapov wrote: On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote: Jeff, All my tests fail. XRC disabled tests failed with: mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so

Re: [OMPI devel] heterogeneous OpenFabrics adapters

2008-05-13 Thread Pavel Shamis (Pasha)
-- Jeff Squyres Cisco Systems -- Pavel Shamis (Pasha) Mellanox Technologies

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)
1. When CM progress thread completes an incoming connection, it sends a command down a pipe to the main thread indicating that a new endpoint is ready to use. The pipe message will be noticed by opal_progress() in the main thread and will run a function to do all necessary housekeeping
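The pipe handoff described in this message might look roughly like the sketch below. It is a minimal illustration under stated assumptions: struct cm_cmd and both function names are hypothetical, and a zero-timeout poll() stands in for whatever opal_progress() actually does to notice the pipe message.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical command passed from the connection thread to the main thread. */
struct cm_cmd {
    int endpoint_id;  /* identifies the endpoint that is now ready to use */
};

/* Called from the connection-manager thread once a connection completes.
 * A write this small is atomic on a POSIX pipe. Returns 0 on success. */
int cm_notify_ready(int write_fd, int endpoint_id)
{
    struct cm_cmd cmd = { endpoint_id };
    return write(write_fd, &cmd, sizeof(cmd)) == (ssize_t)sizeof(cmd) ? 0 : -1;
}

/* Called from the main thread's progress loop. Never blocks.
 * Returns 1 if a command was read into *out, 0 if none is pending,
 * and -1 on error. */
int main_poll_cmd(int read_fd, struct cm_cmd *out)
{
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0 || !(pfd.revents & POLLIN)) {
        return 0;
    }
    return read(read_fd, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 1 : -1;
}
```

In use, the two ends of one pipe() pair would be split between the threads: the connection thread keeps the write end, and the main thread calls main_poll_cmd() on the read end and does the endpoint housekeeping whenever it returns 1.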

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Pavel Shamis (Pasha)
What about moving posting of receive buffers into the main thread? With SRQ it is easy: don't post anything in CPC thread. Main thread will prepost buffers automatically after first fragment received on the endpoint (in btl_openib_handle_incoming()). It still doesn't guarantee

Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Pavel Shamis (Pasha)
Is it possible to have sane SRQ implementation without HW flow control? It seems pretty unlikely if the only available HW flow control is to terminate the connection. ;-) Even if we can get the iWARP semantics to work, this feels kinda icky. Perhaps I'm overreacting and this

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-21 Thread Pavel Shamis (Pasha)
As far as I know, only the openib kernel drivers are installed by default with the distribution. The user-level libibverbs and other openib stuff are not installed by default. The user needs to go to the package manager and explicitly select libibverbs. So if the user decided to install libibverbs he had reasons for

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-22 Thread Pavel Shamis (Pasha)
Brian and I chatted a bit about this off-list, and I think we're in agreement now: - do not change the default value or meaning of btl_base_warn_component_unused. - major point of confusion: the openib BTL is actually fairly unique in that it can (and does) tell the difference between

Re: [OMPI devel] Memory hooks stuff

2008-05-25 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Who would be interested in discussing this stuff? (me, Brian, ? someone from Sun?, ...?) Me.

Re: [OMPI devel] New HCA vendor part ID

2008-05-26 Thread Pavel Shamis (Pasha)
it seems that Open MPI doesn't have HCA parameters for the following InfiniBand adapter, which is used in the compute nodes of our SGI Altix ICE 8200EX: Mellanox Technologies MT26418 ConnectX I assume that the vendor part ID 26418 should be listed in the section "Mellanox Hermon" of the HCA

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-29 Thread Pavel Shamis (Pasha)
I got some more feedback from Roland off-list explaining that if /sys/class/infiniband does exist and is non-empty and /sys/class/infiniband_verbs/abi_version does not exist, then this is definitely a case where we want to warn because it implies that config is screwed up -- RDMA devices
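The condition in this message can be sketched as a small check. This is an illustration rather than the code that went into Open MPI: the function names are hypothetical, and the two paths are passed in as parameters so the logic can be exercised without real RDMA hardware.

```c
#include <dirent.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if path is a readable directory with at least one entry
 * other than "." and "..", and 0 otherwise. */
static int dir_is_nonempty(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    int found = 0;

    if (d == NULL) {
        return 0;
    }
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}

/* Returns 1 when the device class directory exists and is non-empty but the
 * ABI version file is missing, which is the case the message says should
 * produce a warning. */
int should_warn_config(const char *ib_class_dir, const char *abi_version_file)
{
    return dir_is_nonempty(ib_class_dir) && access(abi_version_file, F_OK) != 0;
}
```

A caller following the message would pass "/sys/class/infiniband" and "/sys/class/infiniband_verbs/abi_version" as the two arguments.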

[OMPI devel] r18551 - brakes ob1 compilation on Sles10

2008-06-02 Thread Pavel Shamis (Pasha)
r18551 breaks ompi compilation on SLES10 gcc 4.1.0. I got the following error on my systems (http://www.open-mpi.org/mtt/index.php?do_redir=672 ): make[2]: Entering directory `/.autodirect/hpc/work/pasha/tmp/mtt-8/installs/5VHm/src/openmpi-1.3a1r18553/ompi/mca/pml/ob1' /bin/sh ../../../../libtool

Re: [OMPI devel] r18551 - brakes ob1 compilation on SLES10

2008-06-02 Thread Pavel Shamis (Pasha)
Ralf Wildenhues wrote: * Pavel Shamis (Pasha) wrote on Mon, Jun 02, 2008 at 02:25:13PM CEST: r18551 breaks ompi compilation on SLES10 gcc 4.1.0. I got the following error on my systems (http://www.open-mpi.org/mtt/index.php?do_redir=672 ): [...] /usr/lib64/gcc/x86_64-suse-linux/4.1.0

Re: [OMPI devel] r18551 - brakes ob1 compilation on SLES10

2008-06-03 Thread Pavel Shamis (Pasha)
The compilation passed on SLES 10 SP1. So SP1 resolves the gcc/binutils issue. We need to add a notice somewhere that "ompi 1.3 cannot be compiled on SLES10, please update ..." Regards, Pasha Pavel Shamis (Pasha) wrote: Ralf Wildenhues wrote: * Pavel Shamis (Pasha) wrote on M

[OMPI devel] Pallas fails

2008-06-04 Thread Pavel Shamis (Pasha)
On the last conf. call Jeff mentioned that he sees some collectives failures. In my MTT testing I also see that Pallas collectives failed - http://www.open-mpi.org/mtt/index.php?do_redir=682 Alltoall # # Benchmarking Alltoall #

Re: [OMPI devel] Memory hooks change testing

2008-06-11 Thread Pavel Shamis (Pasha)
Brian Barrett wrote: Did anyone get a chance to test (or think about testing) this? I'd like to commit the changes on Friday evening, if I haven't heard any negative feedback. I will run it tomorrow on my cluster. Brian On Jun 9, 2008, at 8:56 PM, Brian Barrett wrote: Hi all -

Re: [OMPI devel] Pallas fails

2008-06-12 Thread Pavel Shamis (Pasha)
(__assert_fail+0xf6) [0x2aba5e651256] [sw216:32013] [ 4] Pavel Shamis (Pasha) wrote: On the last conf. call Jeff mentioned that he sees some collectives failures. In my MTT testing I also see that Pallas collectives failed - http://www.open-mpi.org/mtt/index.php?do_redir=682 Alltoall

Re: [OMPI devel] Memory hooks change testing

2008-06-12 Thread Pavel Shamis (Pasha)
In my MTT testing it looks ok, too. Brad Benton wrote: Brian, This is working smoothly now: both the configuration/build and execution. So, from my standpoint, it looks good for inclusion into the trunk. --brad On Wed, Jun 11, 2008 at 4:50 PM, Brian W. Barrett

[OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
I tried to run trunk on my machines and I got the following error: [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451 [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
You'll have to tell us something more than that, Pasha. What kind of environment, what rev level were you at, etc. Ahh, sorry :) I run on Linux x86_64 Sles10 sp1. (Open MPI) 1.3a1r18682M , OFED 1.3.1 Pasha. So far as I know, the trunk is fine. On 6/19/08 12:01 PM, "Pavel Shamis (

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
., here is command line: ./bin/mpirun -np 2 -H sw214,sw214 -mca btl openib,sm,self ./osu_benchmarks-3.0/osu_latency Meantime, I'm building on RoadRunner and will test there (TM enviro). On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote: You'll

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
oblem. I'll see what I can do. For now, though, just use the default grpcomm and it will work fine. On 6/19/08 1:18 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote: You'll have to tell us something more than that, Pasha. What kind of environment, what rev level wer

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
or build and that the component is either missing or not getting built. On 6/19/08 1:37 PM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote: Ralph H Castain wrote: I can't find anything wrong so far. I'm waiting in a queue on Odin to try there since Jeff indicat

Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Do you need configury to disable building ibcm / rdmacm support? The more I think about it, the more I think that these would be good features to have for v1.3... I had a similar issue recently. It would be nice to have an option to disable/enable *CM via config flags. On Jul

Re: [OMPI devel] open ib dependency question

2008-07-03 Thread Pavel Shamis (Pasha)
Ok: https://svn.open-mpi.org/trac/ompi/ticket/1375 I think any of us could do this -- it's pretty straightforward. No guarantees on when I can get to it; my 1.3 list is already pretty long... No problem. I will take this one. Pasha. On Jul 3, 2008, at 6:20 AM, Pavel Shamis (Pasha

Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)
explicit selection Maybe just hide the print? Bogdan Costescu wrote: On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote: I had a similar issue recently. It would be nice to have an option to disable/enable *CM via config flags. Not sure if this is related... I am looking at using a nightly 1.3 snapshot

Re: [OMPI devel] open ib dependency question

2008-07-06 Thread Pavel Shamis (Pasha)
:20 AM, Pavel Shamis (Pasha) wrote: Jeff Squyres wrote: Do you need configury to disable building ibcm / rdmacm support? The more I think about it, the more I think that these would be good features to have for v1.3... I had similar issue recently. It will be nice to have option to disable

Re: [OMPI devel] open ib dependency question

2008-07-08 Thread Pavel Shamis (Pasha)
Pasha -- do you want to open a ticket? Done. https://svn.open-mpi.org/trac/ompi/ticket/1376 Pasha.

Re: [OMPI devel] open ib dependency question

2008-07-10 Thread Pavel Shamis (Pasha)
FYI the issue was resolved - https://svn.open-mpi.org/trac/ompi/ticket/1376 Regards, Pasha Bogdan Costescu wrote: On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote: I had similar issue recently. It will be nice to have option to disable/enable *CM via config flags. Not sure if this is related

Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node. I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My

Re: [OMPI devel] IBCM error

2008-07-13 Thread Pavel Shamis (Pasha)
:56 AM, Lenny Verkhovsky wrote: Pasha is right, I didn't disabled it. On 7/13/08, Pavel Shamis (Pasha) <pa...@dev.mellanox.co.il> wrote: Jeff Squyres wrote: Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you inc

[OMPI devel] Segfault in 1.3 branch

2008-07-14 Thread Pavel Shamis (Pasha)
Please see http://www.open-mpi.org/mtt/index.php?do_redir=764 The error is not consistent. It takes a lot of iterations to reproduce it. In my MTT testing I have seen it a few times. Is it a known issue? Regards, Pasha

Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)
I can add at the head of the query function something like: if (!mca_btl_openib_component.cpc_explicitly_defined) return OMPI_ERR_NOT_SUPPORTED; Jeff Squyres wrote: On Jul 14, 2008, at 3:59 AM, Lenny Verkhovsky wrote: Seems to be fixed. Well, it's "fixed" in that Pasha turned off the error

Re: [OMPI devel] IBCM error

2008-07-14 Thread Pavel Shamis (Pasha)
Should we not even build support for it? I think the IBCM CPC build should be enabled by default. IBCM is supplied with OFED, so there should not be any problem during install. PRO: don't even allow the possibility of running with it, because we know that there are issues with the ibcm

Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)
bug to leave in the code than the one we just fixed (duplicated stdin)... It is not such a rare issue, 19 failures in my MTT run (http://www.open-mpi.org/mtt/index.php?do_redir=765). Pasha Ralph On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote: Pl

Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)
I need to check on this. You may want to look at section A3.2.3 of the spec. If you set the first byte (network order) to 0x00, and the 2nd byte to 0x01, then you hit a 'reserved' range that probably isn't being used currently. If you don't care what the service ID is, you can specify 0,
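The byte layout mentioned in this message can be sketched as below. This is only an illustration of "first byte 0x00, second byte 0x01 in network order": the function name is hypothetical, and placing a caller-chosen value in the remaining six bytes is an assumption, not something the message or the spec excerpt states.

```c
#include <stdint.h>

/* Fill an 8-byte service ID in network (big-endian) byte order so that the
 * first byte is 0x00 and the second byte is 0x01, as described above.
 * Assumption: the remaining 48 bits carry a caller-chosen local value. */
void make_service_id(uint64_t local_value, uint8_t out[8])
{
    out[0] = 0x00;
    out[1] = 0x01;
    for (int i = 0; i < 6; i++) {
        /* Most significant of the 48 low bits goes first. */
        out[2 + i] = (uint8_t)(local_value >> (8 * (5 - i)));
    }
}
```

Writing the bytes into an array rather than a uint64_t keeps the result independent of host endianness.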

Re: [OMPI devel] IBCM error

2008-07-15 Thread Pavel Shamis (Pasha)
Guess what - we don't always put them out there because - tada - we don't use them! What goes out on the backend is a stripped down version of libraries we require. Given the huge number of libraries people provide (looking at the bigger, beyond OMPI picture), it consumes a lot of limited disk

Re: [OMPI devel] Segfault in 1.3 branch

2008-07-15 Thread Pavel Shamis (Pasha)
l, given that this hits rarely, it probably is a more acceptable bug to leave in the code than the one we just fixed (duplicated stdin)... Ralph On 7/14/08 1:11 AM, "Pavel Shamis (Pasha)" <pa...@dev.mellanox.co.il> wrote: Please see http://www.open-mpi.org/mtt/index.php?do_redir=76

Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)
Sean Hefty wrote: If you don't care what the service ID is, you can specify 0, and the kernel will assign one. The assigned value can be retrieved by calling ib_cm_attr_id(). (I'm assuming that you communicate the IDs out of band somehow.) It is not zero, it should be: #define

Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: On Jul 16, 2008, at 11:07 AM, Don Kerr wrote: Pasha added configure switches for this about a week ago: --en|disable-openib-ibcm --en|disable-openib-rdmacm I like these flags but I thought there was going to be a run time check for cases where Open MPI is built on a

Re: [OMPI devel] IBCM error

2008-07-17 Thread Pavel Shamis (Pasha)
Sean Hefty wrote: It is not zero, it should be: #define IB_CM_ASSIGN_SERVICE_ID __cpu_to_be64(0x0200ULL) Unfortunately the value is defined in the kernel-level IBCM and is not exposed to user level. Can you please expose it to user level (infiniband/cm.h)? Oops - good catch. I

Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: This used to be true, but I think we changed it a while ago (Pasha: do you remember?) because Mellanox HCAs are capable of send-to-self (process) and there were no code changes necessary to enable it. So it allowed a slightly simpler command line. This was quite a while

Re: [OMPI devel] IBCM error

2008-08-03 Thread Pavel Shamis (Pasha)
Thanks for update. Sean Hefty wrote: I've committed a patch to my libibcm git tree with the values IB_CM_ASSIGN_SERVICE_ID IB_CM_ASSIGN_SERVICE_ID_MASK these will be in libibcm release 1.0.3, which will shortly... - Sean

Re: [OMPI devel] openib component lock

2008-08-06 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: In working on https://svn.open-mpi.org/trac/ompi/ticket/1434, I see fairly inconsistent use of the mca_openib_btl_component.ib_lock, such as within btl_openib_proc.c. In fixing #1434, do I need to be mindful doing all the locking properly, or is it already so broken that