[OMPI devel] SM BTL NUMA awareness patches

2008-05-28 Thread Gleb Natapov
always welcome. -- Gleb. commit 883db5e1ce8c3b49cc1376e6acf9c2d5d0d77983 Author: Gleb Natapov Date: Tue May 27 14:55:11 2008 +0300 Add functions to maffinity. diff --git a/opal/mca/maffinity/base/base.h b/opal/mca/maffinity/base

Re: [OMPI devel] Open MPI session directory location

2008-05-27 Thread Gleb Natapov
On Tue, May 27, 2008 at 08:27:49AM -0600, Ralph H Castain wrote: > -mca orte_tmpdir_base foo Thanks! It works. But this parameter is not reported by ompi_info :( > > > > On 5/27/08 8:24 AM, "Gleb Natapov" wrote: > > > Hi, > > > > Is there

[OMPI devel] Open MPI session directory location

2008-05-27 Thread Gleb Natapov
Hi, Is there a way to change where Open MPI creates its session directory? I can't find an MCA parameter that specifies this. -- Gleb.

Re: [OMPI devel] Memory hooks stuff

2008-05-26 Thread Gleb Natapov
On Sun, May 25, 2008 at 10:54:23AM -0400, Patrick Geoffray wrote: > Jeff Squyres wrote: > > That would also be great. I don't know anything about these mmu > > notifiers (I'm not much of a kernel guy), but anything that allows us > > It's what Quadrics used for years in Tru64. Instead of try

Re: [OMPI devel] Memory hooks stuff

2008-05-23 Thread Gleb Natapov
On Fri, May 23, 2008 at 07:19:01AM -0400, Jeff Squyres wrote: > Brian and I were chatting the other day about random OMPI stuff and > the topic of the memory hooks came up again. Brian was wondering if > we should [finally] revisit this topic -- there's a few things that > could be done to m

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 08:30:52PM +, Dirk Eddelbuettel wrote: > > > Also, if this test depends on the Debian kernel packages, then we're > > > back to square one as some folks (like myself) run binary kernels, > > > other may just hand-compile and this test may not work as we may miss > > > th

Re: [OMPI devel] RFC: Linuxes shipping libibverbs

2008-05-23 Thread Gleb Natapov
On Thu, May 22, 2008 at 04:19:05PM -0400, Jeff Squyres wrote: > On May 22, 2008, at 4:07 PM, Dirk Eddelbuettel wrote: > > > Is there a test I could run for you? > > Can you see if /dev/infiniband exists? If it does, the OpenFabrics > kernel drivers are running. If not, they aren't. Either tha

Re: [OMPI devel] Threaded progress for CPCs

2008-05-20 Thread Gleb Natapov
On Mon, May 19, 2008 at 01:38:53PM -0400, Jeff Squyres wrote: > >> 5. ...? > > What about moving posting of receive buffers into main thread. With > > SRQ it is easy: don't post anything in CPC thread. Main thread will > > prepost buffers automatically after first fragment received on the > > endpo

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 01:52:22PM -0500, Jon Mason wrote: > On Mon, May 19, 2008 at 05:17:57PM +0300, Gleb Natapov wrote: > > On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > > > >> 5. ...? > > > >> > > > > What about

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 07:39:13PM +0300, Pavel Shamis (Pasha) wrote: So this solution will cost 1 buffer on each srq ... sounds acceptable to me. But I don't see too much difference compared to #1, as I understand it we will need the pipe anyway for communication with mai

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Mon, May 19, 2008 at 05:08:17PM +0300, Pavel Shamis (Pasha) wrote: > >> 5. ...? > >> > > What about moving posting of receive buffers into main thread. With > > SRQ it is easy: don't post anything in CPC thread. Main thread will > > prepost buffers automatically after first fragment receive

Re: [OMPI devel] Threaded progress for CPCs

2008-05-19 Thread Gleb Natapov
On Sun, May 18, 2008 at 11:38:36AM -0400, Jeff Squyres wrote: > ==> Remember that the goal for this work was to have a separate > progress thread *without* all the heavyweight OMPI thread locks. > Specifically: make it work in a build without --enable-progress-threads or --enable-mpi-threa

Re: [OMPI devel] openib btl code review

2008-05-18 Thread Gleb Natapov
> that Nysal found. > > Please see the most recent patch on the ticket. Looks good to me. > > > > On May 15, 2008, at 11:01 AM, Jeff Squyres wrote: > > > On May 15, 2008, at 8:46 AM, Gleb Natapov wrote: > > > >>> Any other reviewe

Re: [OMPI devel] openib btl code review

2008-05-15 Thread Gleb Natapov
On Thu, May 15, 2008 at 08:14:29AM -0400, Jeff Squyres wrote: > Pasha tells me he'll be able to review the patch next week, so I'll > wait to commit until then. I added the patch to the ticket, just so > that it doesn't get lost. > > Any other reviewers would be welcome... :-) I'll look at i

Re: [OMPI devel] Unbelievable situation BUG

2008-04-27 Thread Gleb Natapov
On Sun, Apr 27, 2008 at 07:00:57PM +0300, Lenny Verkhovsky wrote: > Hi, all > > I faced the "Unbelievable situation" The situation is believable, but commit r18274, which adds this output, is not: it doesn't take sequence number wraparound into account. > > during running IMB benchmark. > >
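A minimal sketch of a wraparound-safe comparison, assuming 16-bit per-peer sequence numbers (the snippet does not state the field width, so treat that as an assumption). `seq_distance()`, `naive_is_ahead()` and `wrap_safe_is_ahead()` are hypothetical helpers, not the code added in r18274. Reading the difference modulo 2^16 as a signed value makes fragment 0 count as one step after 65535, where a plain `>` comparison sees it as far behind.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed distance from `expected` to `received`, computed modulo 2^16.
 * Positive: `received` is ahead; zero: it is the expected one;
 * negative: it is behind (stale or duplicate). The final conversion to
 * int16_t assumes two's-complement wrapping, as on GCC and Clang. */
static int16_t seq_distance(uint16_t expected, uint16_t received)
{
    return (int16_t)(uint16_t)(received - expected);
}

/* Naive check: gives the wrong answer when the counter wraps from 65535 to 0. */
static bool naive_is_ahead(uint16_t expected, uint16_t received)
{
    return received > expected;
}

/* Wraparound-safe check built on seq_distance(). */
static bool wrap_safe_is_ahead(uint16_t expected, uint16_t received)
{
    return seq_distance(expected, received) > 0;
}
```

The same idea works for any counter width, as long as the number of fragments in flight stays below half the sequence space.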

Re: [OMPI devel] Merging in the CPC work

2008-04-24 Thread Gleb Natapov
On Thu, Apr 24, 2008 at 11:50:10AM +0300, Pavel Shamis (Pasha) wrote: > Jeff, > All my tests fail. > XRC disabled tests failed with: > mtt/installs/Zq_9/install/lib/openmpi/mca_btl_openib.so: undefined > symbol: rdma_create_event_channel > XRC enabled failed with segfault, I will take a look late

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Gleb Natapov
On Mon, Apr 07, 2008 at 07:54:38AM -0600, Ralph H Castain wrote: > > > > On 4/7/08 7:45 AM, "Gleb Natapov" wrote: > > > On Mon, Apr 07, 2008 at 07:28:07AM -0600, Ralph H Castain wrote: > >>> Also can you explain how > >>> allgather is i

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Gleb Natapov
On Mon, Apr 07, 2008 at 07:28:07AM -0600, Ralph H Castain wrote: > > Also can you explain how > > allgather is implemented in orte (sorry if you already explained this once > > and I missed it). > > The default method is for each proc to send its modex data to its local > daemon. The local daemon

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Gleb Natapov
On Mon, Apr 07, 2008 at 07:07:38AM -0600, Ralph H Castain wrote: > > > > On 4/7/08 7:04 AM, "Gleb Natapov" wrote: > > > On Fri, Apr 04, 2008 at 10:52:38AM -0600, Ralph H Castain wrote: > >> With compression "on", you will get output telling

Re: [OMPI devel] Affect of compression on modex and launch messages

2008-04-07 Thread Gleb Natapov
On Fri, Apr 04, 2008 at 10:52:38AM -0600, Ralph H Castain wrote: > With compression "on", you will get output telling you the original size of > the message and its compressed size so you can see what was done. > I see this output: uncompressed allgather msg orig size 67521 compressed size 4162.

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Gleb Natapov
On Thu, Apr 03, 2008 at 07:05:28AM -0600, Ralph H Castain wrote: > H...since I have no control nor involvement in what gets sent, perhaps I > can be a disinterested third party. ;-) > > Could you perhaps explain this comment: > > > BTW I looked at how we do modex now on the trunk. For OOB cas

Re: [OMPI devel] RFC: changes to modex

2008-04-03 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 08:41:14PM -0400, Jeff Squyres wrote: > >> that it's the same for all procs on all hosts. I guess there's a few > >> cases: > >> > >> 1. homogeneous include/exclude, no carto: send all in node info; no > >> proc info > >> 2. homogeneous include/exclude, carto is used: send

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 03:45:20PM -0400, Jeff Squyres wrote: > On Apr 2, 2008, at 1:58 PM, Gleb Natapov wrote: > >> No, I think it would be fine to only send the output after > >> btl_openib_if_in|exclude is applied. Perhaps we need an MCA param to > >> say "

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 12:08:47PM -0400, Jeff Squyres wrote: > On Apr 2, 2008, at 11:13 AM, Gleb Natapov wrote: > > On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: > >> If we use carto to limit hcas/ports are used on a given host on a > >> per- > >

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:35:03AM -0400, Jeff Squyres wrote: > If we use carto to limit hcas/ports are used on a given host on a per-proc basis, then we can include some proc_send data to say "this proc > only uses indexes X,Y,Z from the node data". The indexes can be > either uint8_ts, o

Re: [OMPI devel] RFC: changes to modex

2008-04-02 Thread Gleb Natapov
On Wed, Apr 02, 2008 at 10:21:12AM -0400, Jeff Squyres wrote: > * int ompi_modex_proc_send(...): send modex data that is specific to > this process. It is just about exactly the same as the current API > call (ompi_modex_send). > [skip] > > * int ompi_modex_node_send(...): send modex dat

Re: [OMPI devel] Switching away from SVN?

2008-03-24 Thread Gleb Natapov
On Fri, Mar 21, 2008 at 08:52:03AM -0400, Jeff Squyres wrote: > Cool -- thanks Roland! > > For anyone who wants to play with the entire history of OMPI in git > (as of last night or so -- this git repository is *not* being kept in > sync with SVN), I cloned the tree that Roland created and put

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-11 Thread Gleb Natapov
On Mon, Mar 10, 2008 at 01:52:22PM -0500, Steve Wise wrote: > >Does OMPI do lazy dereg to maintain a cache of registered user buffers? Not by default. You'll have to use -mca mpi_leave_pinned 1 to enable lazy dereg. -- Gleb.

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Gleb Natapov
On Mon, Mar 10, 2008 at 09:50:13AM -0500, Steve Wise wrote: > > I personally don't like the idea to add another layer of complexity to > > openib > > BTL code just to work around HW that doesn't follow spec. If work around > > is simple that is OK, but in this case it is not so simple and will add

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-09 Thread Gleb Natapov
On Sun, Mar 09, 2008 at 02:48:09PM -0500, Jon Mason wrote: > Issue (as described by Steve Wise): > > Currently OMPI uses qp 0 for all credit updates (by design). This breaks > when running over the chelsio rnic due to a race condition between > advertising the availability of a buffer using qp0 w

Re: [OMPI devel] orte can't launch process

2008-03-06 Thread Gleb Natapov
On Thu, Mar 06, 2008 at 07:49:13AM -0500, Tim Prins wrote: > Sorry about that. I removed a field in a structure, then 'svn up' seems > to have added it back, so we were using a field that should not even > exist in a couple places. > > Should be fixed in r17757 Works again. Thanks --

[OMPI devel] orte can't launch process

2008-03-06 Thread Gleb Natapov
Something is broken in the trunk. # mpirun -np 2 -H host1,host2 ./osu_latency -- Some of the requested hosts are not included in the current allocation. The requested hosts were specified with --host as: host1,host2 Please

Re: [OMPI devel] RDMA pipeline

2008-02-21 Thread Gleb Natapov
ays enable pipeline. We may not even expose it to users, and instead set it automatically if message logging is enabled. > Thanks, > george. > > On Feb 20, 2008, at 4:29 AM, Gleb Natapov wrote: > >> On Tue, Feb 19, 2008 at 10:40:46PM -0500, George Bosilca wrote: >>

Re: [OMPI devel] RDMA pipeline

2008-02-20 Thread Gleb Natapov
On Tue, Feb 19, 2008 at 10:40:46PM -0500, George Bosilca wrote: > Actually, it restores the original behavior. The RDMA operations were > pipelined before the r15247 commit, independent of the fact that they > had mpool or not. We were actively using this behavior in the message > logging fra

Re: [OMPI devel] RDMA pipeline

2008-02-19 Thread Gleb Natapov
On Tue, Feb 19, 2008 at 02:13:30PM -0500, George Bosilca wrote: > Few days ago during some testing I realize that the RDMA pipeline was > disabled for MX and Elan (I didn't check for the others). A quick look > into the source code, pinpointed the problem into the pml_ob1_rdma.c > file, and i

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Gleb Natapov
> (right now > it says RNR happened, and goes into detail into what that means -- but > that's not the real problem). > Good point. > I'll do that as well. Thanks! > > > On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote: > > > On Tue, Feb 12, 2008 at

Re: [OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-13 Thread Gleb Natapov
> The much better question is "Why are they necessary?", because if there is no good answer to this question then they should be removed, since they are harmful: they cause uncontrollable recursive calls. > > On Feb 12, 2008, at 5:27 AM, Gleb Natapov wrote: > >

Re: [OMPI devel] btl_openib_rnr_retry MCA param

2008-02-13 Thread Gleb Natapov
On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote: > I see that in the OOB CPC for the openib BTL, when setting up the send > side of the QP, we set the rnr_retry value depending on whether the > remote receive queue is a per-peer or SRQ: > > - SRQ: btl_openib_rnr_retry MCA param va

Re: [OMPI devel] Something wrong with vt?

2008-02-12 Thread Gleb Natapov
ke distclean, configure, make' ? > Which version of the autotools are you using? > > > Matthias > > On Mo, 2008-02-11 at 11:42 +0200, Gleb Natapov wrote: > > > I get the following error while "make install": > > > > make[2]: Entering director

[OMPI devel] [RFC] Remove explicit call to progress() from ob1.

2008-02-12 Thread Gleb Natapov
Hi, I am planning to commit the following patch. Those two progress() calls are responsible for most of our deep recursion troubles. And I also think they are completely unnecessary. diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c index 5899243..641176e 10064

[OMPI devel] Something wrong with vt?

2008-02-11 Thread Gleb Natapov
I get the following error while "make install": make[2]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt' Making install in vt make[3]: Entering directory `/home_local/glebn/build_dbg/ompi/contrib/vt/vt' make[3]: *** No rule to make target `install'. Stop. make[3]: Leaving directo

Re: [OMPI devel] 32 bit udapl warnings

2008-01-31 Thread Gleb Natapov
On Thu, Jan 31, 2008 at 08:45:54AM -0500, Don Kerr wrote: > This was brought to my attention once before but I don't see this > message so I just plain forgot about it. :-( > uDAPL defines its pointers as uint64, "typedef DAT_UINT64 DAT_VADDR", > and pval is a "void *" which is why the message co

Re: [OMPI devel] open ib btl and xrc

2008-01-20 Thread Gleb Natapov
On Fri, Jan 18, 2008 at 11:43:03AM -0500, Jeff Squyres wrote: > I think the main savings is that mellanox hardware works better when > fewer qp's are open. I.e., it's a resource issue on the HCA, not > necessarily a savings in posting buffers to the qp. Interesting. I hear this justification o

Re: [OMPI devel] [PATCH] openib btl: extensable cpcselection enablement

2008-01-14 Thread Gleb Natapov
On Mon, Jan 14, 2008 at 08:15:23AM -0500, Jeff Squyres (jsquyres) wrote: > Any obj to bringing this stuff to the trunk? The modex string opt stuff can > be done directly on the trunk imo. Go ahead. -- Gleb.

[OMPI devel] ptmalloc and pin down cache problems again

2008-01-07 Thread Gleb Natapov
Hi Brian, I encountered a problem with ptmalloc and the registration cache. I see that you (I think it was you) disabled shrinking of heap memory allocated by sbrk by setting MORECORE_CANNOT_TRIM to 1. The comment explains that this should be done because freeing of small objects is not reentrant, so if

Re: [OMPI devel] Common initialization code for IB.

2008-01-07 Thread Gleb Natapov
On Thu, Jan 03, 2008 at 09:27:14AM -0500, Jeff Squyres wrote: > > Another > > problem is how multicast collective knows that all processes in a > > communicator are reachable via the same network, do we have a > > mechanism > > in ompi to check this? > > > Good question. > > Perhaps the common

[OMPI devel] Common initialization code for IB.

2008-01-03 Thread Gleb Natapov
Hi, In Paris we talked about moving HCA discovery and initialization code out of the openib BTL so that other components that want to use IB can share common code, data, and the registration cache. The other components I am thinking about are ofud and multicast collectives. I started to look a

Re: [OMPI devel] [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

2007-12-25 Thread Gleb Natapov
On Mon, Dec 24, 2007 at 11:49:37PM +, Tang, Changqing wrote: > > > > -Original Message- > > From: Pavel Shamis (Pasha) [mailto:pa...@dev.mellanox.co.il] > > Sent: Monday, December 24, 2007 8:03 AM > > To: Tang, Changqing > > Cc: Jack Morgenstein; Roland Dreier; > > gene...@lists.openf

Re: [OMPI devel] openib xrc CPC minor nit

2007-12-21 Thread Gleb Natapov
On Thu, Dec 20, 2007 at 05:39:36PM -0500, Jeff Squyres wrote: > Pasha -- > > I notice in the port info struct that you have a member for the lid, > but only #if HAVE_XRC. Per a comment in the code, this is supposed to > save bytes when we're using OOB (because we don't need this value in >

Re: [OMPI devel] matching code rewrite in OB1

2007-12-18 Thread Gleb Natapov
rios. > > What do you think? I think that the coverage testing I did is enough for this code. > Rich > > > On 12/17/07 8:32 AM, "Gleb Natapov" wrote: > > > On Thu, Dec 13, 2007 at 08:04:21PM -0500, Richard Graham wrote: > >> > Yes, should be a bit more

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16969

2007-12-17 Thread Gleb Natapov
On Mon, Dec 17, 2007 at 10:53:26AM -0500, Jeff Squyres wrote: > Gleb - > > Is this picture of the v1.3 long message params accurate? (see attached) Yes. -- Gleb.

Re: [OMPI devel] matching code rewrite in OB1

2007-12-17 Thread Gleb Natapov
On Thu, Dec 13, 2007 at 08:04:21PM -0500, Richard Graham wrote: > Yes, should be a bit more clear. Need an independent way to verify that > data is matched > in the correct order; sending this information as payload is one way to do > this. So, > sending unique data in every message, and makin
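Rich's suggestion, carrying verification data in the payload, can be sketched as a receiver-side checker. This assumes each sender writes a per-(source, tag) counter into every message and that receives name a specific source and tag (no wildcards); `struct order_checker`, `check_in_order()` and the size limits are hypothetical. A failed check means a later message was matched before an earlier one from the same sender with the same tag, which MPI's non-overtaking rule forbids.

```c
#include <stdbool.h>

#define MAX_SRC 64   /* hypothetical limits for the sketch */
#define MAX_TAG 16

/* Next payload value expected from each (source, tag) pair.
 * A zero-initialized checker expects 0 from every pair. */
struct order_checker {
    unsigned next[MAX_SRC][MAX_TAG];
};

/* Record a received message whose payload is the sender's per-tag
 * counter. Returns false if the message was matched out of order
 * (or if source/tag fall outside the sketch's limits). */
static bool check_in_order(struct order_checker *c, int src, int tag,
                           unsigned payload)
{
    if (src < 0 || src >= MAX_SRC || tag < 0 || tag >= MAX_TAG)
        return false;
    if (payload != c->next[src][tag])
        return false;
    c->next[src][tag]++;
    return true;
}
```

Driving this with many unexpected messages queued before the matching receives are posted exercises exactly the path Rich and George are concerned about.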

Re: [OMPI devel] rb rcache component

2007-12-15 Thread Gleb Natapov
On Sat, Dec 15, 2007 at 08:27:29AM -0500, Jeff Squyres wrote: > It doesn't look like this component is used anymore > (it's .ompi_ignore'd). > > Anyone object to svn rm'ing it on the trunk? > Not me. -- Gleb.

Re: [OMPI devel] New BTL parameter

2007-12-14 Thread Gleb Natapov
If there is no objection I will commit this to the trunk next week. On Sun, Dec 09, 2007 at 05:34:30PM +0200, Gleb Natapov wrote: > Hi, > > Currently BTL has parameter btl_min_send_size that is no longer used. > I want to change it to be btl_rndv_eager_limit. This new pa

Re: [OMPI devel] matching code rewrite in OB1

2007-12-14 Thread Gleb Natapov
verification, this would > also be good - actually better than hoping that one will hit out-of-order > situations. > > Rich > > > On 12/14/07 2:20 AM, "Gleb Natapov" wrote: > > > On Thu, Dec 13, 2007 at 06:16:49PM -0500, Richard Graham wrote: > >

Re: [OMPI devel] matching code rewrite in OB1

2007-12-14 Thread Gleb Natapov
On Thu, Dec 13, 2007 at 06:16:49PM -0500, Richard Graham wrote: > The situation that needs to be triggered, just as George has mentions, is > where we have a lot of unexpected messages, to make sure that when one that > we can match against comes in, all the unexpected messages that can be > matche

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-13 Thread Gleb Natapov
On Thu, Dec 13, 2007 at 10:49:45AM +0200, Pavel Shamis (Pasha) wrote: >> Because we want to support mixed setups and create XRC between nodes that >> support it and RC between all other nodes. >> > Ok, sounds reasonable for me. Just need make sure that the parameters name > will be user friendl

Re: [OMPI devel] matching code rewrite in OB1

2007-12-13 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 03:10:10PM -0600, Brian W. Barrett wrote: > On Wed, 12 Dec 2007, Gleb Natapov wrote: > > > On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote: > >> This is better than nothing, but really not very helpful for looking at the > >>

Re: [OMPI devel] New BTL parameter

2007-12-13 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 01:18:10PM -0800, Paul H. Hargrove wrote: > Gleb Natapov wrote: > > On Wed, Dec 12, 2007 at 02:03:02PM -0500, Jeff Squyres wrote: > > > >> On Dec 9, 2007, at 10:34 AM, Gleb Natapov wrote: > >> > >> > >>> Cu

Re: [OMPI devel] matching code rewrite in OB1

2007-12-12 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 03:52:17PM -0500, Jeff Squyres wrote: > On Dec 12, 2007, at 3:20 PM, Gleb Natapov wrote: > > >> How about making a tarball with this patch in it that can be thrown > >> at > >> everyone's MTT? (we can put the tarball on www.open-mpi.

Re: [OMPI devel] matching code rewrite in OB1

2007-12-12 Thread Gleb Natapov
ange one of them to do so. > Rich > > > On 12/12/07 3:20 PM, "Gleb Natapov" wrote: > > > On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote: > >> Gleb -- > >> > >> How about making a tarball with this patch in it that can be thrown at

Re: [OMPI devel] matching code rewrite in OB1

2007-12-12 Thread Gleb Natapov
out changing it w/o some very > > strong > > reasons. Not opposed, just very cautious. > > > > Rich > > > > > > On 12/11/07 11:47 AM, "Gleb Natapov" wrote: > > > >> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:

Re: [OMPI devel] New BTL parameter

2007-12-12 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 02:03:02PM -0500, Jeff Squyres wrote: > On Dec 9, 2007, at 10:34 AM, Gleb Natapov wrote: > > > Currently BTL has parameter btl_min_send_size that is no longer used. > > I want to change it to be btl_rndv_eager_limit. This new parameter > > will

Re: [OMPI devel] SCTP BTL exclusivity value problem

2007-12-12 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 10:31:37AM -0500, Jeff Squyres wrote: > I'd be in favor of setting the TCP exclusivity to LOW+100 and setting > SCTP exclusivity to LOW. Fine with me. > > > On Dec 12, 2007, at 10:07 AM, Gleb Natapov wrote: > > > On Wed, Dec 12, 2007 at 1

Re: [OMPI devel] SCTP BTL exclusivity value problem

2007-12-12 Thread Gleb Natapov
the SCTP BTL is being built? What kind of > environment is it? Red Hat Enterprise Linux AS release 4 (Nahant Update 5) # rpm -qa | grep sctp lksctp-tools-devel-1.0.2-6.4E.1 lksctp-tools-doc-1.0.2-6.4E.1 lksctp-tools-1.0.2-6.4E.1 > > > > On Dec 12, 2007, at 9:38 AM, Gleb Natapov

[OMPI devel] SCTP BTL exclusivity value problem

2007-12-12 Thread Gleb Natapov
Hi, The SCTP BTL sets its exclusivity value to MCA_BTL_EXCLUSIVITY_LOW - 1, but MCA_BTL_EXCLUSIVITY_LOW is zero, so the subtraction wraps around and the value actually ends up as the maximum exclusivity possible. Can somebody fix this, please? Maybe we should not define MCA_BTL_EXCLUSIVITY_LOW as zero? -- Gleb.
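The wraparound can be reproduced in isolation. This sketch assumes the exclusivity field is a 32-bit unsigned integer and uses hypothetical stand-ins (`EXCLUSIVITY_LOW` and the helper functions), not the real Open MPI macros; the "fixed" values follow Jeff Squyres's suggestion in this thread of TCP at LOW+100 and SCTP at LOW.

```c
#include <stdint.h>

/* Hypothetical stand-in for MCA_BTL_EXCLUSIVITY_LOW. */
#define EXCLUSIVITY_LOW ((uint32_t)0)

/* Unsigned arithmetic is modulo 2^32, so "LOW - 1" does not produce
 * a value below LOW: it wraps around to UINT32_MAX, the highest
 * exclusivity possible. */
static uint32_t sctp_exclusivity_buggy(void)
{
    return EXCLUSIVITY_LOW - 1u;
}

/* Suggested alternative: keep SCTP at LOW and lift TCP above it,
 * so TCP still outranks SCTP without any subtraction below zero. */
static uint32_t sctp_exclusivity_fixed(void) { return EXCLUSIVITY_LOW; }
static uint32_t tcp_exclusivity_fixed(void)  { return EXCLUSIVITY_LOW + 100u; }
```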

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-12 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 04:08:31PM +0200, Pavel Shamis (Pasha) wrote: > Gleb Natapov wrote: >> On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote: >> >>> Gleb Natapov wrote: >>> >>>> On Tue, Dec 11, 2007 at 08:16:07PM -0500,

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-12 Thread Gleb Natapov
On Wed, Dec 12, 2007 at 03:37:26PM +0200, Pavel Shamis (Pasha) wrote: > Gleb Natapov wrote: > > On Tue, Dec 11, 2007 at 08:16:07PM -0500, Jeff Squyres wrote: > > > >> Isn't there a better way somehow? Perhaps we should have "select" > >> cal

Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-12 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 08:16:07PM -0500, Jeff Squyres wrote: > Isn't there a better way somehow? Perhaps we should have "select" > call *all* the functions and accept back a priority. The one with the > highest priority then wins. This is quite similar to much of the > other selection log
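Jeff's suggestion, that selection query every connection pseudo-component (CPC) and pick the one reporting the highest priority, could look roughly like this. The `cpc_query_fn` type, the example CPC names and priorities, and the convention that a negative priority means "unusable" are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical query callback: returns a priority, or a negative
 * value if the CPC cannot be used with this port. */
typedef int (*cpc_query_fn)(void);

struct cpc {
    const char  *name;
    cpc_query_fn query;
};

/* Call every CPC's query function and return the one reporting the
 * highest non-negative priority, or NULL if none is usable. */
static const struct cpc *select_cpc(const struct cpc *cpcs, size_t n)
{
    const struct cpc *best = NULL;
    int best_prio = -1;

    for (size_t i = 0; i < n; ++i) {
        int prio = cpcs[i].query();
        if (prio > best_prio) {
            best_prio = prio;
            best = &cpcs[i];
        }
    }
    return best;
}

/* Example query functions with made-up priorities. */
static int oob_query(void)    { return 10; }
static int xrc_query(void)    { return -1; } /* e.g. HW lacks XRC */
static int rdmacm_query(void) { return 20; }
```

Because every CPC is asked, adding a new one only means adding a table entry; no CPC has to know about the others.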

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
> > > Rich > > > > > > On 12/11/07 10:54 AM, "Gleb Natapov" wrote: > > > >> Hi, > >> > >> I did a rewrite of matching code in OB1. I made it much simpler and 2 > >> times smaller (which is good, less co

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
Currently upper layers of Open MPI may call the BTL progress function recursively. I hope this will change some day. > > Andrew > > Gleb Natapov wrote: > > On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote: > >> Try UD, frags are reordered at a very high rate so

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
s checked in this be tested on a system > > that has N-way network parallelism, where N is as large as you can find. > > This is a key bit of code for MPI correctness, and out-of-order operations > > will break it, so you want to maximize the chance for such operations. >

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 10:00:08AM -0600, Brian W. Barrett wrote: > On Tue, 11 Dec 2007, Gleb Natapov wrote: > > > I did a rewrite of matching code in OB1. I made it much simpler and 2 > > times smaller (which is good, less code - less bugs). I also got rid > > of huge

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 11:00:51AM -0500, Richard Graham wrote: > Gleb, > I would suggest that before this is checked in this be tested on a system > that has N-way network parallelism, where N is as large as you can find. > This is a key bit of code for MPI correctness, and out-of-order operatio

[OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
Hi, I rewrote the matching code in OB1. It is now much simpler and half the size (which is good: less code, fewer bugs). I also got rid of the huge macros, which is very helpful if you need to debug something. There is no performance degradation; in fact I even see a very small performance improvemen

Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 10:27:55AM -0500, Tim Prins wrote: > My understanding was that this behavior was not right, but upon further > inspection of the pthreads documentation this behavior seems to be > allowable. > I think that Open MPI does not implement a condition variable in the strict sense
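For reference, POSIX permits `pthread_cond_wait()` to return spuriously, so portable code re-checks a predicate in a loop instead of relying on exactly one wakeup per signal. A minimal sketch with plain pthreads (not `opal_condition_t`); `struct waiter` and its `ready` flag are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;   /* the predicate, protected by `lock` */
};

/* Block until `ready` becomes true. Using `while` instead of `if`
 * keeps this correct under spurious wakeups and when one broadcast
 * wakes several waiters. */
static void wait_ready(struct waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->ready)
        pthread_cond_wait(&w->cond, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

/* Set the predicate under the lock, then wake all waiters. */
static void set_ready(struct waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->ready = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}
```

With the predicate loop in place, whether a given wakeup "consumed" a signal stops mattering for correctness, which is the property the signaled-count discussion is circling around.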

[OMPI devel] New BTL parameter

2007-12-09 Thread Gleb Natapov
Hi, Currently the BTL has a parameter, btl_min_send_size, that is no longer used. I want to replace it with btl_rndv_eager_limit. This new parameter will determine the size of the first fragment of the rendezvous protocol; currently we use btl_eager_limit to set its size. btl_rndv_eager_limit will have to be smaller
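One way to picture how the two limits would interact on the send path, assuming btl_rndv_eager_limit must not exceed btl_eager_limit (the sentence is cut off here) and ignoring header overhead. `struct btl_limits` and `first_fragment_size()` are hypothetical; only the parameter names come from the proposal, and the numbers in the test are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-BTL limits; field names mirror the MCA parameters. */
struct btl_limits {
    size_t eager_limit;       /* btl_eager_limit */
    size_t rndv_eager_limit;  /* btl_rndv_eager_limit */
};

/* Bytes carried by the first fragment of a `msg_len`-byte send.
 * Messages that fit within the eager limit go out whole; larger ones
 * use the rendezvous protocol, whose first fragment is now capped by
 * rndv_eager_limit instead of eager_limit. */
static size_t first_fragment_size(const struct btl_limits *l, size_t msg_len)
{
    assert(l->rndv_eager_limit <= l->eager_limit); /* assumed invariant */

    if (msg_len <= l->eager_limit)
        return msg_len;
    return l->rndv_eager_limit;
}
```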

[OMPI devel] Changes to all BTLs.

2007-12-09 Thread Gleb Natapov
Hi everybody, I committed changes to the BTL interface. Descriptor allocation now takes two new parameters: endpoint and flags. I did my best to update all in-tree BTLs, but I can't compile all of them, so compilation problems are possible. Can everybody test that the BTLs they care about s

Re: [OMPI devel] 32-bit openib is broken on the trunk as of Nov 27th, r16799

2007-12-09 Thread Gleb Natapov
On Wed, Dec 05, 2007 at 02:45:17PM -0500, Tim Mattox wrote: > Hello, > It appears that sometime after r16777, and by r16799, that something > was broken on the trunk's openib support for 32-bit builds. > The 64-bit tests all seem normal, as well as the 32-bit & 64-bit tests on > the 1.2 branch on t

Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Gleb Natapov
On Thu, Dec 06, 2007 at 09:46:45AM -0500, Tim Prins wrote: > Also, when we are using threads, there is a case where we do not > decrement the signaled count, in condition.h:84. Gleb put this in in > r9451, however the change does not make sense to me. I think that the > signal count should alway

Re: [OMPI devel] tmp XRC branches

2007-11-30 Thread Gleb Natapov
On Fri, Nov 30, 2007 at 02:06:02PM -0500, Jeff Squyres wrote: > Are any of the XRC tmp SVN branches still relevant? Or have they now > been integrated into the trunk? > > I ask because I see 4 XRC-related branches out there under /tmp and /tmp-public. They are not relevant any more. I'll re

Re: [OMPI devel] THREAD_MULTIPLE

2007-11-28 Thread Gleb Natapov
On Wed, Nov 28, 2007 at 01:46:53PM -0500, George Bosilca wrote: > Yes, "us" means UTK. Our math folks are pushing hard for this. I'll gladly > accept any help, even if it's only for testing. For development, I dispose > of some of my time and a 100% of a post-doc for few months. I already worked

Re: [OMPI devel] IB/OpenFabrics pow wow

2007-11-19 Thread Gleb Natapov
On Fri, Nov 16, 2007 at 11:36:39AM -0800, Jeff Squyres wrote: > 1. Mon, 26 Nov, 10am US East, 7am US Pacific, 5pm Israel > 2. Mon, 26 Nov, 11am US East, 8am US Pacific, 6pm Israel > 3. Thu, 29 Nov, 10am US East, 7am US Pacific, 5pm Israel > 4. Thu, 29 Nov, 11am US East, 8am US Pacific, 6pm Israel >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r16723

2007-11-14 Thread Gleb Natapov
On Wed, Nov 14, 2007 at 06:44:06AM -0800, Tim Prins wrote: > Hi, > > The following files bother me about this commit: > trunk/ompi/mca/btl/sctp/sctp_writev.c > trunk/ompi/mca/btl/sctp/sctp_writev.h > > They bother me for 2 reasons: > 1. Their naming does not follow the prefix rule > 2.

Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-14 Thread Gleb Natapov
Sorry, I missed the mail with the question. On Mon, Nov 12, 2007 at 06:03:07AM -0500, Jeff Squyres wrote: > On Nov 9, 2007, at 1:24 PM, Don Kerr wrote: > > > both, I was thinking of listing what I think are multi-rail > > requirements > > but wanted to understand what the current state of things a

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 11:25:43PM -0500, Patrick Geoffray wrote: > Richard Graham wrote: > > The real problem, as you and others have pointed out is the lack of > > predictable time slices for the progress engine to do its work, when relying > > on the ULP to make calls into the library... > > Th

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 01:16:04PM -0500, George Bosilca wrote: > > On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote: > >>> The same callback is called in both cases. In the case that you >>> described, the callback is called just a little bit deeper into the >>> recursion, when in the "normal case"

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 09:07:23PM -0700, Brian Barrett wrote: > Personally, I'd rather just not mark MPI completion until a local > completion callback from the BTL. But others don't like that idea, so > we came up with a way for back pressure from the BTL to say "it's not > on the wire yet

Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-01 Thread Gleb Natapov
On Thu, Nov 01, 2007 at 11:15:21AM -0400, Don Kerr wrote: > How would the openib btl handle the following scenario: > Two nodes, each with two ports, all ports are on the same subnet and switch. > > Would striping occur over 4 connections or 2? Only two connections will be created. > > If 2 is i

[OMPI devel] bml_btl->btl_alloc() instead of mca_bml_base_alloc() in OSC

2007-10-28 Thread Gleb Natapov
Hi Brian, Is there a special reason why you call btl functions directly instead of using bml wrappers? What about applying this patch? diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c index 2d0dc06..302dd9e 100644 --- a/ompi/mca/osc/rdma/osc_rdma_co

Re: [OMPI devel] RFC: Add "connect" field to openib BTL INI file

2007-10-25 Thread Gleb Natapov
On Thu, Oct 25, 2007 at 10:55:25AM -0400, Jeff Squyres wrote: > On Oct 25, 2007, at 10:35 AM, Gleb Natapov wrote: > > > I don't think xrc should be used by default even if HW supports it. > > Only if > > special config option is set xrc should be attempted.

Re: [OMPI devel] RFC: Add "connect" field to openib BTL INI file

2007-10-25 Thread Gleb Natapov
On Wed, Oct 24, 2007 at 08:01:44PM -0400, Jeff Squyres wrote: > My proposal is that the "connect" field can be added to the INI file > and take a comma-delimited list of values of acceptable CPCs for a > given device. For example, the ConnectX HCA can take the following > value: > > co
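Checking whether a CPC appears in such a comma-delimited value is straightforward. In this sketch `cpc_in_list()` is hypothetical, the CPC names in the test are examples only, and tolerating spaces around entries is an assumption about the INI format.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole entry in the comma-delimited
 * list `list` (e.g. the value of the proposed "connect" field).
 * Spaces around entries are ignored; partial matches do not count. */
static bool cpc_in_list(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        while (*p == ' ')                 /* skip leading spaces */
            ++p;
        size_t len = strcspn(p, ",");     /* length up to next comma */
        size_t trimmed = len;
        while (trimmed > 0 && p[trimmed - 1] == ' ')
            --trimmed;                    /* drop trailing spaces */
        if (trimmed == name_len && strncmp(p, name, name_len) == 0)
            return true;
        p += len;
        if (*p == ',')
            ++p;                          /* step past the separator */
    }
    return false;
}
```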

Re: [OMPI devel] collective problems

2007-10-23 Thread Gleb Natapov
on problem the fix to the problem will be a couple of lines of code. > > - Galen > > > > On 10/11/07 11:26 AM, "Gleb Natapov" wrote: > > > On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote: > >> David -- > >> > >

Re: [OMPI devel] putting common request completion waiting code into separate inline function

2007-10-18 Thread Gleb Natapov
laces :) > > > On Oct 15, 2007, at 10:27 AM, Gleb Natapov wrote: > > > Hi, > > > >Each time a someone needs to wait for request completion he > > implements the same piece of code. Why not put this code into > > inline function

[OMPI devel] putting common request completion waiting code into separate inline function

2007-10-15 Thread Gleb Natapov
Hi, Each time someone needs to wait for request completion, they implement the same piece of code. Why not put this code into an inline function and use it instead? Look at the included patch: it moves the common code into the ompi_request_wait_completion() function. Does somebody have any objection
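A simplified, single-threaded sketch of the kind of helper being proposed. It is not the patch itself: `struct request`, `progress_fn`, `request_wait_completion()` and the toy progress engine are hypothetical stand-ins, and a real implementation presumably also has to handle the threaded case (e.g. waiting on a condition variable) that this busy-wait omits.

```c
#include <stdbool.h>

/* Hypothetical, simplified request; the real ompi_request_t has many
 * more fields. */
struct request {
    volatile bool complete;
};

/* Hypothetical progress hook: drives communication and may mark
 * requests complete. */
typedef void (*progress_fn)(void *ctx);

/* The shared waiting loop that each caller would otherwise open-code:
 * keep calling progress until the request is marked complete. */
static inline void request_wait_completion(struct request *req,
                                           progress_fn progress, void *ctx)
{
    while (!req->complete)
        progress(ctx);
}

/* Toy progress engine for trying the helper out: it completes the
 * request on its `steps`-th call. */
struct toy_engine {
    struct request *req;
    int steps;
    int calls;
};

static void toy_progress(void *ctx)
{
    struct toy_engine *e = ctx;
    if (++e->calls >= e->steps)
        e->req->complete = true;
}
```

Keeping the loop in one inline function means a later change, such as switching to a condition wait when threads are enabled, is made in one place instead of in every caller.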

Re: [OMPI devel] collective problems

2007-10-11 Thread Gleb Natapov
On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote: > David -- > > Gleb and I just actively re-looked at this problem yesterday; we > think it's related to https://svn.open-mpi.org/trac/ompi/ticket/1015. We previously thought this ticket was a different problem, but > our analys

Re: [OMPI devel] osu_bibw failing for message sizes 2097152 and larger

2007-09-19 Thread Gleb Natapov
On Wed, Sep 19, 2007 at 10:26:15AM -0400, Dan Lacher wrote: > In doing some runs with the osu_bibw test on a single node, we have > found that it hangs when using the trunk for message sizes 2097152 or > larger unless the mpool_sm_min_size is set to a number larger than the > message size. We a

Re: [OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
t ); >> >> This collective is executed on old communicator after setup of a new >> cid. Is this not enough to solve the problem? Some ranks may leave >> this collective call earlier than others, but none can leave it before >> all ranks enter it and at this stage new c

Re: [OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
and at this stage the new communicator already exists in all of them. Am I missing something? > > george. > > On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote: > >> George, >> >> In the comment you are saying that "a message for a not yet existing >

[OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
George, In the comment you are saying that "a message for a not yet existing communicator can happen". Can you explain in what situation it can happen? Thanks, -- Gleb.
