Re: [OMPI devel] [Open MPI] #334: Building with Libtool 2.1a fails to run OpenIB BTL

2006-09-06 Thread Gleb Natapov
This error usually happens when libibverbs is dlopened without the RTLD_GLOBAL flag. On Wed, Sep 06, 2006 at 03:05:39PM +0200, Ralf Wildenhues wrote: > Hello, > > * Open MPI wrote on Wed, Sep 06, 2006 at 01:00:00PM CEST: > > #334: Building with Libtool 2.1a fails to run OpenIB BTL > > >Are yo

Re: [OMPI devel] [OMPI bugs] [Open MPI] #366: OpenIB on IA64: we should disable small message RDMA

2006-09-14 Thread Gleb Natapov
On Wed, Sep 13, 2006 at 07:50:02PM +0530, Sunil Patil wrote: > Hi, > > This is a somewhat irrelevant question. It was said that in-order delivery is > a feature of the Mellanox HCA and not part of the IB spec. Is this true for UD-type > QPs also, for which the IB spec says that "in order delivery" is not guarant

Re: [OMPI devel] MCA_BTL_DES_FLAGS_PRIORITY usage

2006-10-31 Thread Gleb Natapov
On Tue, Oct 31, 2006 at 09:35:05AM -0500, Donald Kerr wrote: > Can someone explain to me the intended use of the following flag and/or > all that it implies when set: MCA_BTL_DES_FLAGS_PRIORITY > I can explain how this flag is treated in the openib BTL. We have two QPs, high and low priority. High pri

Re: [OMPI devel] [PATCH] opal/class/opal_object: fix double-check locking for class initialization

2007-03-06 Thread Gleb Natapov
On Tue, Mar 06, 2007 at 10:10:44AM +0100, Bert Wesarg wrote: > Fix the double-check locking[1] by defining the cls_initialized member to > volatile. > > Greetings > > Bert Wesarg > > [1]: http://en.wikipedia.org/wiki/Double-checked_locking Can you explain how the Java example from this page appl

Re: [OMPI devel] [PATCH] opal/class/opal_object: fix double-check locking for class initialization

2007-03-06 Thread Gleb Natapov
On Tue, Mar 06, 2007 at 10:44:53AM +0100, Bert Wesarg wrote: > > > Gleb Natapov wrote: > > On Tue, Mar 06, 2007 at 10:10:44AM +0100, Bert Wesarg wrote: > >> Fix the double-check locking[1] by defining the cls_initialized member to > >> volatile. > >

Re: [OMPI devel] [PATCH] opal/class/opal_object: fix double-check locking for class initialization

2007-03-06 Thread Gleb Natapov
On Tue, Mar 06, 2007 at 11:24:06AM +0100, Bert Wesarg wrote: > Hello, > > Gleb Natapov wrote: > > If it does this after opal_atomic_lock() (which is explicit memory > > barrier) then it is broken. > Than, gcc 4.1.1 on the amd64 architecture is broken: And can you repeat t

Re: [OMPI devel] [PATCH] opal/class/opal_object: fix double-check locking for class initialization

2007-03-06 Thread Gleb Natapov
On Tue, Mar 06, 2007 at 12:13:16PM +0100, Bert Wesarg wrote: > Gleb Natapov wrote: > > On Tue, Mar 06, 2007 at 11:24:06AM +0100, Bert Wesarg wrote: > >> Hello, > >> > >> Gleb Natapov wrote: > >>> If it does this after opal_atomic_lock() (which is

Re: [OMPI devel] [Patch] make ompi recognize new ib (connectx/mlx4)

2007-05-10 Thread Gleb Natapov
On Thu, May 10, 2007 at 08:22:41AM -0400, Jeff Squyres wrote: > (FWIW: the internal Mellanox code name for ConnectX is Hermon, > another mountain in Israel, just like Sinai, Arbel, ...etc.). > Yes, but Hermon is the highest one, so theoretically Mellanox can only go downhill from there :) --

Re: [OMPI devel] [ewg] Re: [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-10 Thread Gleb Natapov
On Thu, May 10, 2007 at 04:30:27PM +0300, Or Gerlitz wrote: > Jeff Squyres wrote: > >On May 10, 2007, at 9:02 AM, Or Gerlitz wrote: > > >>To start with, my hope here is at least to be able play defensive > >>here, that is convince you that the disadvantages are minor, where > >>only if this fail

Re: [OMPI devel] [ofa-general] Re: [ewg] Re: Re: OMPI over ofed udapl - bugs opened

2007-05-10 Thread Gleb Natapov
On Thu, May 10, 2007 at 05:56:13PM +0300, Michael S. Tsirkin wrote: > > Quoting Jeff Squyres : > > Subject: Re: [ewg] Re: [OMPI devel] Re: OMPI over ofed?udapl -?bugs?opened > > > > On May 10, 2007, at 10:28 AM, Michael S. Tsirkin wrote: > > > > >>What is the advantage of this approach? > > > > >

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-18 Thread Gleb Natapov
> random OMPI users who use system(). So if there's zero impact on > > performance and it doesn't make the code [more] incredibly horrible > > [than it already is], I'm in favor of this change. > > > > > > > > On May 17, 2007, at 7:00 AM, Gleb N

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-18 Thread Gleb Natapov
s the type of things that a > MPI implementation should not care about. At least not in the (common) > protocol layer. That's why the BTL-level abstraction is a bad one, > device-specific problems bubble up instead of staying hidden in > device-specific code. I am glad I provid

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-18 Thread Gleb Natapov
On Thu, May 17, 2007 at 02:35:02PM -0400, Patrick Geoffray wrote: > Brian Barrett wrote: > > On the other hand, since the MPI standard explicitly says you're not > > allowed to call fork() or system() during the MPI application and > > Does it ? The MPI spec says that you should not access buf

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-18 Thread Gleb Natapov
On Thu, May 17, 2007 at 02:57:22PM -0400, Patrick Geoffray wrote: > gshipman wrote: > >> The fork() problem is due to memory registration aggravated by > >> registration cache. Memory registration in itself is a hack from > >> the OS > >> point of view, and you already know a lot about the variou

Re: [OMPI devel] [RFC] Send data from the end of a buffer during pipeline proto

2007-05-20 Thread Gleb Natapov
On Fri, May 18, 2007 at 06:04:07PM -0400, Patrick Geoffray wrote: > Hi Gleb, > > Gleb Natapov wrote: > > a new madvise flag was implemented that allows userspace to mark certain > > memory to not be copied to a child process. This memory is not mapped in > > a child at a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Fri, May 25, 2007 at 09:31:33PM -0600, Galen Shipman wrote: > > On May 24, 2007, at 2:48 PM, George Bosilca wrote: > > > I see the problem this patch try to solve, but I fail to correctly > > understand the implementation. The patch affect all PML and BTL in > > the code base by adding one

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: > > > With current code this is not the case. Order tag is set during a > > fragment > > allocation. It seems wrong according to your description. Attached > > patch fixes > > this. If no specific ordering tag is provided to alloca

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:23:23AM -0600, Galen Shipman wrote: > >> > >> > >> The problem is that MCA_BTL_DES_FLAGS_PRIORITY was meant to indicate > >> that the fragment was higher priority, but the fragment isn't higher > >> priority. It simply needs to be ordered w.r.t. a previous fragment, > >>

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14780

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:32:26AM -0600, Galen Shipman wrote: > Can we get rid of mca_pml_ob1_send_fin_btl and just have > mca_pml_ob1_send_fin? It seems we should just always send the fin > over the same btl and this would clean up the code a bit. Yes. It should be possible. I'll do that. >

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14782

2007-05-27 Thread Gleb Natapov
On Sun, May 27, 2007 at 10:34:33AM -0600, Galen Shipman wrote: > Actually, we still need MCA_BTL_FLAGS_FAKE_RDMA, it can be used as > a hint for components such as one-sided. What is the purpose of the hint if it has to be set for each interconnect? Just assume that it is set and behave accordi

[OMPI devel] Fix for deadlock in OB1 RDMA protocol

2007-05-29 Thread Gleb Natapov
Hi, Attached are two patches. The first one implements a new function mca_pml_ob1_send_requst_copy_in_out(req, offset, len) that sends a given range of the request by copying data in/out of internal buffers. It also changes the behaviour of the pipeline protocol to send data from the end of a user buffer. Th

Re: [OMPI devel] Multi-NIC support

2007-06-05 Thread Gleb Natapov
OK. I wanted to post my patch later this week, but you beat me to it, so here it is attached. But my approach is completely different and may coexist with yours. On Tue, Jun 05, 2007 at 12:03:55PM -0400, George Bosilca wrote: > The multi-NIC support was broken for a while. This patch corrects it

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
Hi Galen, On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote: > > > With current code this is not the case. Order tag is set during a > > fragment > > allocation. It seems wrong according to your description. Attached > > patch fixes > > this. If no specific ordering tag is provide

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: > ) I expect you to revise the patch in order to propose a generic > solution or I'll trigger a vote against the patch. I vote to be > backed out of the trunk as it export way to much knowledge from the > Open IB BTL into the PML

Re: [OMPI devel] [OMPI svn] svn:open-mpi r14768

2007-06-07 Thread Gleb Natapov
On Thu, Jun 07, 2007 at 02:38:51PM -0400, George Bosilca wrote: > > On Jun 7, 2007, at 1:28 PM, Gleb Natapov wrote: > > >On Thu, Jun 07, 2007 at 11:11:12AM -0400, George Bosilca wrote: > >>) I expect you to revise the patch in order to propose a generic > >&g

[OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
Hello everyone, I encountered a problem with the openib on-demand connection code. Basically it works only by pure luck if you have more than one endpoint for the same proc and sometimes breaks in mysterious ways. The algo works like this: A wants to connect to B, so it creates a QP and sends it to B.

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r15041

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 11:03:09AM -0400, Jeff Squyres wrote: > Hey Gleb -- > > Can you explain the rationale for this change? Is there a reason why > the bandwidths reported by the IBV API are not sufficient? Are you > trying to do creative things with multi-LID scenarios (perhaps QOS- > l

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
K with everyone who cares, then I want this change to go into the 1.2 branch. I don't care how this change will get to the trunk. I can use a patched version for a while. If your branch is in a working state right now I can merge this change into it tomorrow. > > Thanks, > > Galen > >

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 12:45:01PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 12:08 PM, Gleb Natapov wrote: > > > I am not committing this yet. I want people to review my logic and the > > patch. If the change is OK with everyone who cares then I want this > > chan

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
use this commit will conflict quite a bit with > >> what I am working on, I can always merge it by hand but it may make > >> sense for us to get this all done in one area and then bring it all > >> over? > >> > >> Thanks, > >> > >&

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r15041

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 12:35:55PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 12:03 PM, George Bosilca wrote: > > >> I think the "hidden" MCA parameters are a different issue; they were > >> created for a different purpose (users are not supposed to see/set > >> them). These variable parame

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 02:05:00PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote: > > > With today's trunk, I still see the problem: > > Same thing happens on v1.2 branch. I'll re-open #548. > I am sure it was never tested with multiple subnets. I'll try to get su

Re: [OMPI devel] openib coord teleconf (was: Problem with openib on demand connection bring up)

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 02:23:37PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:40 PM, Gleb Natapov wrote: > > >>> [snip] > >>> coordination kind of teleconference. If people think this is a good > >>> idea, I can setup the call. > >> >

Re: [OMPI devel] openib coord teleconf (was: Problem with openib on demand connection bring up)

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 02:48:02PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 2:41 PM, Gleb Natapov wrote: > > >> Pasha tells me that the best times for Ishai and him are: > >> > >> - 2000-2030 Israel time > >> - 1300-1300 US Eastern > >

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 01:54:28PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote: > > >> I have 2 hosts: one with 3 active ports and one with 2 active ports. > >> If I run an MPI job between them, the openib BTL wireup got badly and > &

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 07:08:51PM +0300, Gleb Natapov wrote: > On Wed, Jun 13, 2007 at 09:38:21AM -0600, Galen Shipman wrote: > > Hi Gleb, > > > > As we have discussed before I am working on adding support for > > multiple QPs with either per peer resources or shared

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r15041

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 10:01:20PM -0400, Patrick Geoffray wrote: > Jeff Squyres wrote: > > Let's take a step back and see exactly what we *want*. Then we can > > talk about how to have an interface for it. > > I must be missing something but why is the bandwidth/latency passed by > the user (

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r15041

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 09:43:03PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:48 PM, Gleb Natapov wrote: > > >> 3. Use a file to convey this information, because it's better suited > >> to what we're trying to do (vs. MCA parameters). > >> >

[OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-25 Thread Gleb Natapov
Hello, The attached patch improves the OB1 scheduling algorithm between multiple links. The current algorithm performs very poorly if interconnects with very different bandwidth values are used. For big message sizes it always divides traffic equally between all available interconnects. The attached patch changes t

Re: [OMPI devel] PML/BTL MCA params review

2007-06-26 Thread Gleb Natapov
On Fri, Jun 22, 2007 at 04:52:45PM -0400, Jeff Squyres wrote: > On Jun 20, 2007, at 8:29 AM, Jeff Squyres wrote: > > >1. btl_*_min_send_size is used to decide when to stop striping a > >message across multiple BTL's. Is there a reason that we don't > >just use eager_limit for this value? It

Re: [OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-27 Thread Gleb Natapov
e multi-BTL > stuff I think I have another idea. How about merging the ack with the > next pipeline fragment for RDMA (except for the last fragment) ? Can you elaborate? If you are talking about ACK from receiver on match then we already merge it with first PUT message if possible. >

Re: [OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-27 Thread Gleb Natapov
On Tue, Jun 26, 2007 at 05:42:05PM -0400, George Bosilca wrote: > Gleb, > > Simplifying the code and getting better performance is always a good > approach (at least from my perspective). However, your patch still > dispatch the messages over the BTLs in a round robin fashion, which > doesn'

Re: [OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-27 Thread Gleb Natapov
On Wed, Jun 27, 2007 at 02:27:34PM -0400, George Bosilca wrote: > > On Jun 27, 2007, at 10:06 AM, Gleb Natapov wrote: > > >> > >>Btw, did you compare my patch with yours on your multi-NIC system ? > >>With my patch on our system with 3 networks (2*1Gbs and

Re: [OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-28 Thread Gleb Natapov
Nobody except George has commented on or complained about this patch, so I assume everybody except George is OK with it. And from George's mails I don't understand whether he is OK with me applying it to the trunk and simply thinks that further work should be done in this area. So I'll ask him directly:

Re: [OMPI devel] Improve OB1 performance when multiple NICs are available

2007-06-28 Thread Gleb Natapov
> > On Jun 28, 2007, at 10:06 AM, Gleb Natapov wrote: > > >Nobody except George has commented/complained about this patch, > >so I > >assume everybody except George is OK with it. And from George's mails I > >don't understand if he is OK with me appl

Re: [OMPI devel] Ob1 segfault

2007-07-08 Thread Gleb Natapov
On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote: > While looking into another problem I ran into an issue which made ob1 > segfault > on me. Using gm, and running the test test_dan1 in the onesided test suite, > if I limit the gm freelist by too much, I get a segfault. That is, > > mp

Re: [OMPI devel] Ob1 segfault

2007-07-09 Thread Gleb Natapov
On Sun, Jul 08, 2007 at 12:41:58PM -0400, Tim Prins wrote: > On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote: > > On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wrote: > > > While looking into another problem I ran into an issue which made ob1 > > > segfault o

Re: [OMPI devel] Ob1 segfault

2007-07-09 Thread Gleb Natapov
On Mon, Jul 09, 2007 at 10:41:52AM -0400, Tim Prins wrote: > Gleb Natapov wrote: > > On Sun, Jul 08, 2007 at 12:41:58PM -0400, Tim Prins wrote: > >> On Sunday 08 July 2007 08:32:27 am Gleb Natapov wrote: > >>> On Fri, Jul 06, 2007 at 06:36:13PM -0400, Tim Prins wr

Re: [OMPI devel] patch for btl_sm.c fixing segmentation fault

2007-07-11 Thread Gleb Natapov
On Wed, Jul 11, 2007 at 01:17:02PM +0200, Christoph Niethammer wrote: > Hello, > > > For some time now I've been testing Open MPI at the HRLS. My main topic there is the > thread support of Open MPI. > > Some time ago I found a segmentation fault when running the svn trunk > version. > Thanks to the

Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Gleb Natapov
On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote: > As always, any thoughts/suggestions are welcomed. > I hope Sharon's work on process affinity will be merged into the trunk before this work begins and functionality will be preserved during the work. -- Gl

Re: [OMPI devel] [devel-core] Major reduction in ORTE

2007-07-13 Thread Gleb Natapov
. > Great. Thanks Ralph. > > On 7/13/07 12:41 AM, "Gleb Natapov" wrote: > > > On Thu, Jul 12, 2007 at 03:04:01PM -0600, Ralph H Castain wrote: > >> As always, any thoughts/suggestions are welcomed. > >> > > I hope Sharon's work on process a

Re: [OMPI devel] Build failures of 1.2.3 on Debian hppa, mips, mipsel, s390, m68k

2007-07-14 Thread Gleb Natapov
On Sat, Jul 14, 2007 at 01:16:42PM -0400, George Bosilca wrote: > Instead of failing at configure time, we might want to disable the > threading features and the shared memory device if we detect that we > don't have support for atomics on a specified platform. In a non > threaded build, the

[OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
Hi, With the current trunk, LD_LIBRARY_PATH is not set for ranks that are launched on the head node. This worked previously. -- Gleb.

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 04:27:15PM +0300, Gleb Natapov wrote: > Hi, > > With current trunk LD_LIBRARY_PATH is not set for ranks that are > launched on the head node. This worked previously. > Some more info. I use the rsh pls. elfit1# /home/glebn/openmpi/bin/mpirun -np 1 -H el

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 07:48:17AM -0600, Ralph H Castain wrote: > I believe that was fixed in r15405 - are you at that rev level? I am on the latest revision. > > > On 7/18/07 7:27 AM, "Gleb Natapov" wrote: > > > Hi, > > > > With current trunk LD

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote: > But this will lockup: > > pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep > LD > > The reason is that the hostname in this last command doesn't match the > hostname I get when I query my interfaces, so m

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Gleb Natapov
On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote: > On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote: > > But this will lockup: > > > > pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep > > LD > > > >

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-19 Thread Gleb Natapov
u provide a different hostname? Right, I don't have LD_LIBRARY_PATH set in my environment, but I expect that mpirun will provide a working environment for all ranks, not just remote ones. This is how it worked before. Perhaps that was a bug, but it was a useful bug :) > > > On 7/19/07 7:45 A

Re: [OMPI devel] LD_LIBRARY_PATH and process launch on a head node

2007-07-22 Thread Gleb Natapov
pointing to your > >>>> openmpi > >>>> installation - it says it did it right here in your debug output: > >>>> > >>>>>>> [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/ > >>>>>>> openmpi/lib

Re: [OMPI devel] Fwd: [Open MPI] #1101: MPI_ALLOC_MEM with 0 size must be valid

2007-07-24 Thread Gleb Natapov
On Tue, Jul 24, 2007 at 11:20:11AM -0300, Lisandro Dalcin wrote: > On 7/23/07, Jeff Squyres wrote: > > Does anyone have any opinions on this? If not, I'll go implement > > option #1. > > Sorry, Jeff... just reading this. I think your option #1 is the > better. However, I want to warn you about t

Re: [OMPI devel] openib credits problem

2007-07-26 Thread Gleb Natapov
On Thu, Jul 26, 2007 at 09:12:26AM -0400, Jeff Squyres wrote: > I got a problem in MTT runs last night with the openib BTL w.r.t. > credits: > > [...lots of IMB output...] > #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] > Mbytes/sec > 0 1000 36

Re: [OMPI devel] openib credits problem

2007-07-26 Thread Gleb Natapov
On Thu, Jul 26, 2007 at 04:29:40PM +0300, Gleb Natapov wrote: > On Thu, Jul 26, 2007 at 09:12:26AM -0400, Jeff Squyres wrote: > > I got a problem in MTT runs last night with the openib BTL w.r.t. > > credits: > > > > [...lots of IMB output...] > > #bytes

Re: [OMPI devel] problem with system() call and openib - blocks send/recv

2007-08-06 Thread Gleb Natapov
On Mon, Aug 06, 2007 at 09:53:20AM -0400, Bill Wichser wrote: > We have run across an issue, probably more related to openib than to > openmpi but don't know how to resolve. > > Linux kernel - 2.6.9-55.0.2.ELsmp x86_64 fork (and thus system()) is not supported by openib in this kernel. To get sys

Re: [OMPI devel] openib btl header caching

2007-08-12 Thread Gleb Natapov
On Sat, Aug 11, 2007 at 09:55:18AM -0700, Jeff Squyres wrote: > With Mellanox's new HCA (ConnectX), extremely low latencies are > possible for short messages between two MPI processes. Currently, > OMPI's latency is around 1.9us while all other MPI's (HP MPI, Intel > MPI, MVAPICH[2], etc.) a

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2007 at 11:06:00AM +0300, Pavel Shamis (Pasha) wrote: > > > > >> Any objections? We can discuss what approaches we want to take > >> (there's going to be some complications because of the PML driver, > >> etc.); perhaps in the Tuesday Mellanox teleconf...? > >> > >> >

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2007 at 10:36:19AM -0400, Jeff Squyres wrote: > On Aug 13, 2007, at 6:36 AM, Gleb Natapov wrote: > > >> Pallas, Presta (as i know) also use static rank. So lets start to fix > >> all "bogus" benchmarks :-) ? > >> > > All benchmar

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Gleb Natapov
EST_INIT_FULL > > so the first macro just sets up the convertor, the second populates > all the rest of the request state in the case that we will need it > later because the fragment doesn't hit the wire. > +++ > We all agreed. > --

Re: [OMPI devel] Problem in mpool rdma finalize

2007-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2007 at 05:00:37PM +0300, Pavel Shamis (Pasha) wrote: > Jeff Squyres wrote: > > FWIW: we fixed this recently in the openib BTL by ensuring that all > > registered memory is freed during the BTL finalize (vs. the mpool > > finalize). > > > > This is a new issue because the mpool

Re: [OMPI devel] openib btl header caching

2007-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2007 at 03:59:28PM -0400, Richard Graham wrote: > > > > On 8/13/07 3:52 PM, "Gleb Natapov" wrote: > > > On Mon, Aug 13, 2007 at 09:12:33AM -0600, Galen Shipman wrote: > > Here are the > > items we have identified: > > > All

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Gleb Natapov
Is this trunk or 1.2? On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote: > I have a program that does a simple bucket brigade of sends and receives > where rank 0 is the start and repeatedly sends to rank 1 until a certain > amount of time has passed and then it sends and all done

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote: > Is this trunk or 1.2? Oops. I should read more carefully :) This is trunk. > > On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote: > > I have a program that does a simple bucket brigade of sends and rece

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote: > Gleb, > Are you looking at this ? Not today. And I need the code to reproduce the bug. Is this possible? > > Rich > > > On 8/29/07 9:56 AM, "Gleb Natapov" wrote: > > > On Wed, Aug 29

Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Gleb Natapov
On Wed, Aug 29, 2007 at 11:01:14AM -0400, Richard Graham wrote: > If you are going to look at it, I will not bother with this. I need the code to reproduce the problem. Otherwise I have nothing to look at. > > Rich > > > On 8/29/07 10:47 AM, "Gleb Natapov" wrote:

[OMPI devel] opal_atomic_lifo is not really atomic.

2007-09-05 Thread Gleb Natapov
Hi, The opal_atomic_lifo implementation suffers from the ABA problem. Here is the code for opal_atomic_lifo_pop: 1 do { 2 item = lifo->opal_lifo_head; 3 if( opal_atomic_cmpset_ptr( &(lifo->opal_lifo_head), 4 item, 5

Re: [OMPI devel] [devel-core] [RFC] Exit without finalize

2007-09-06 Thread Gleb Natapov
On Thu, Sep 06, 2007 at 06:50:43AM -0600, Ralph H Castain wrote: > WHAT: Decide upon how to handle MPI applications where one or more > processes exit without calling MPI_Finalize > > WHY: Some applications can abort via an exit call instead of > calling MPI_Abort when a libra

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: > Gleb, > > This patch is not correct. The code preventing the registration of the same > communicator twice is later in the code (same file in the function > ompi_comm_register_cid line 326). Once the function ompi_comm_register_cid

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
ink now I get the idea behind this test. I'll restore it and leave ompi_comm_unregister_cid() fix in place. Is this OK? > > george. > > On Sep 11, 2007, at 10:34 AM, Gleb Natapov wrote: > >> On Tue, Sep 11, 2007 at 10:14:30AM -0400, George Bosilca wrote: >>>

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
e in allreduce on > the same communicator, it won't work. Correct, but this is not what happens with mt_coll test. mt_coll calls commdup on the same communicator in different threads concurrently, but we handle this case inside ompi_comm_nextcid(). > > > Gleb Nata

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r16088

2007-09-11 Thread Gleb Natapov
On Tue, Sep 11, 2007 at 11:30:53AM -0400, George Bosilca wrote: > > On Sep 11, 2007, at 11:05 AM, Gleb Natapov wrote: > >> On Tue, Sep 11, 2007 at 10:54:25AM -0400, George Bosilca wrote: >>> We don't want to prevent two thread from entering the code is same time

[OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
George, In the comment you are saying that "a message for a not yet existing communicator can happen". Can you explain in what situation it can happen? Thanks, -- Gleb.

Re: [OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
and at this stage the new communicator already exists in all of them. Am I missing something? > > george. > > On Sep 18, 2007, at 9:06 AM, Gleb Natapov wrote: >> George, >> >> In the comment you are saying that "a message for a not yet existing >

Re: [OMPI devel] Commit r16105

2007-09-18 Thread Gleb Natapov
t ); >> >> This collective is executed on old communicator after setup of a new >> cid. Is this not enough to solve the problem? Some ranks may leave >> this collective call earlier than others, but none can leave it before >> all ranks enter it and at this stage new c

Re: [OMPI devel] osu_bibw failing for message sizes 2097152 and larger

2007-09-19 Thread Gleb Natapov
On Wed, Sep 19, 2007 at 10:26:15AM -0400, Dan Lacher wrote: > In doing some runs with the osu_bibw test on a single node, we have > found that it hangs when using the trunk for message sizes 2097152 or > larger unless the mpool_sm_min_size is set to a number larger than the > message size. We a

Re: [OMPI devel] collective problems

2007-10-11 Thread Gleb Natapov
On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote: > David -- > > Gleb and I just actively re-looked at this problem yesterday; we > think it's related to https://svn.open-mpi.org/trac/ompi/ticket/ > 1015. We previously thought this ticket was a different problem, but > our analys

[OMPI devel] putting common request completion waiting code into separate inline function

2007-10-15 Thread Gleb Natapov
Hi, Each time someone needs to wait for request completion, the same piece of code gets reimplemented. Why not put this code into an inline function and use it instead? Look at the included patch; it moves the common code into an ompi_request_wait_completion() function. Does somebody have any objection

Re: [OMPI devel] putting common request completion waiting code into separate inline function

2007-10-18 Thread Gleb Natapov
laces :) > > > On Oct 15, 2007, at 10:27 AM, Gleb Natapov wrote: > > > Hi, > > > >Each time a someone needs to wait for request completion he > > implements the same piece of code. Why not put this code into > > inline function

Re: [OMPI devel] collective problems

2007-10-23 Thread Gleb Natapov
on problem the fix to the problem will be a couple of lines of code. > > - Galen > > > > On 10/11/07 11:26 AM, "Gleb Natapov" wrote: > > > On Fri, Oct 05, 2007 at 09:43:44AM +0200, Jeff Squyres wrote: > >> David -- > >> > >

Re: [OMPI devel] RFC: Add "connect" field to openib BTL INI file

2007-10-25 Thread Gleb Natapov
On Wed, Oct 24, 2007 at 08:01:44PM -0400, Jeff Squyres wrote: > My proposal is that the "connect" field can be added to the INI file > and take a comma-delimited list of values of acceptable CPCs for a > given device. For example, the ConnectX HCA can take the following > value: > > co

Re: [OMPI devel] RFC: Add "connect" field to openib BTL INI file

2007-10-25 Thread Gleb Natapov
On Thu, Oct 25, 2007 at 10:55:25AM -0400, Jeff Squyres wrote: > On Oct 25, 2007, at 10:35 AM, Gleb Natapov wrote: > > > I don't think xrc should be used by default even if HW supports it. > > Only if > > special config option is set xrc should be attempted. &g

[OMPI devel] bml_btl->btl_alloc() instead of mca_bml_base_alloc() in OSC

2007-10-28 Thread Gleb Natapov
Hi Brian, Is there a special reason why you call btl functions directly instead of using bml wrappers? What about applying this patch? diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c index 2d0dc06..302dd9e 100644 --- a/ompi/mca/osc/rdma/osc_rdma_co

Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-01 Thread Gleb Natapov
On Thu, Nov 01, 2007 at 11:15:21AM -0400, Don Kerr wrote: > How would the openib btl handle the following scenario: > Two nodes, each with two ports, all ports are on the same subnet and switch. > > Would striping occur over 4 connections or 2? Only two connections will be created. > > If 2 is i

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 09:07:23PM -0700, Brian Barrett wrote: > Personally, I'd rather just not mark MPI completion until a local > completion callback from the BTL. But others don't like that idea, so > we came up with a way for back pressure from the BTL to say "it's not > on the wire yet

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 01:16:04PM -0500, George Bosilca wrote: > > On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote: > >>> The same callback is called in both cases. In the case that you >>> described, the callback is called just a little bit deeper into the >>> recursion, when in the "normal case"

Re: [OMPI devel] collective problems

2007-11-08 Thread Gleb Natapov
On Wed, Nov 07, 2007 at 11:25:43PM -0500, Patrick Geoffray wrote: > Richard Graham wrote: > > The real problem, as you and others have pointed out is the lack of > > predictable time slices for the progress engine to do its work, when relying > > on the ULP to make calls into the library... > > Th

Re: [OMPI devel] Multi-Rail and Open IB BTL

2007-11-14 Thread Gleb Natapov
Sorry I missed a mail with the question. On Mon, Nov 12, 2007 at 06:03:07AM -0500, Jeff Squyres wrote: > On Nov 9, 2007, at 1:24 PM, Don Kerr wrote: > > > both, I was thinking of listing what I think are multi-rail > > requirements > > but wanted to understand what the current state of things a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r16723

2007-11-14 Thread Gleb Natapov
On Wed, Nov 14, 2007 at 06:44:06AM -0800, Tim Prins wrote: > Hi, > > The following files bother me about this commit: > trunk/ompi/mca/btl/sctp/sctp_writev.c > trunk/ompi/mca/btl/sctp/sctp_writev.h > > They bother me for 2 reasons: > 1. Their naming does not follow the prefix rule > 2.

Re: [OMPI devel] IB/OpenFabrics pow wow

2007-11-19 Thread Gleb Natapov
On Fri, Nov 16, 2007 at 11:36:39AM -0800, Jeff Squyres wrote: > 1. Mon, 26 Nov, 10am US East, 7am US Pacific, 5pm Israel > 2. Mon, 26 Nov, 11am US East, 8am US Pacific, 6pm Israel > 3. Thu, 29 Nov, 10am US East, 7am US Pacific, 5pm Israel > 4. Thu, 29 Nov, 11am US East, 8am US Pacific, 6pm Israel >

Re: [OMPI devel] THREAD_MULTIPLE

2007-11-28 Thread Gleb Natapov
On Wed, Nov 28, 2007 at 01:46:53PM -0500, George Bosilca wrote: > Yes, "us" means UTK. Our math folks are pushing hard for this. I'll gladly > accept any help, even if it's only for testing. For development, I dispose > of some of my time and a 100% of a post-doc for few months. I already worked

Re: [OMPI devel] tmp XRC branches

2007-11-30 Thread Gleb Natapov
On Fri, Nov 30, 2007 at 02:06:02PM -0500, Jeff Squyres wrote: > Are any of the XRC tmp SVN branches still relevant? Or have they now > been integrated into the trunk? > > I ask because I see 4 XRC-related branches out there under /tmp and / > tmp-public. They are not relevant any more. I'll re

Re: [OMPI devel] opal_condition_wait

2007-12-06 Thread Gleb Natapov
On Thu, Dec 06, 2007 at 09:46:45AM -0500, Tim Prins wrote: > Also, when we are using threads, there is a case where we do not > decrement the signaled count, in condition.h:84. Gleb put this in in > r9451, however the change does not make sense to me. I think that the > signal count should alway
