Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph Castain
On 6/19/08 3:31 PM, "Jeff Squyres" wrote: > Yo Ralph -- > > Is the "bad" grpcomm component both new and the default? Further, is > the old "basic" grpcomm component now the non-default / testing > component? Yes to both > > If so, I wonder if what happened was that Pasha did an "svn up", bu

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Jeff Squyres
Yo Ralph -- Is the "bad" grpcomm component both new and the default? Further, is the old "basic" grpcomm component now the non-default / testing component? If so, I wonder if what happened was that Pasha did an "svn up", but without re-running autogen/configure, he wouldn't have seen the

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
I did a fresh checkout and everything works well. So it looks like some "svn up" screwed up my svn. Ralph, thanks for the help! Ralph H Castain wrote: Hmmm...something isn't right, Pasha. There is simply no way you should be encountering this error. You are picking up the wrong grpcomm module. I went ahead a

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
Hmmm...something isn't right, Pasha. There is simply no way you should be encountering this error. You are picking up the wrong grpcomm module. I went ahead and fixed the grpcomm/basic module, but as I note in the commit message, that is now an experimental area. The grpcomm/bad module is the defa

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
Ralph H Castain wrote: Ha! I found it - you left out one very important detail. You are specifying the use of the grpcomm basic module instead of the default "bad" one. Hmm, I did not specify any "grpcomm" module. I just checked and that module is indeed showing a problem. I'll see what I

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
Ralph H Castain wrote: I can't find anything wrong so far. I'm waiting in a queue on Odin to try there since Jeff indicated you are using rsh as a launcher, and that's the only access I have to such an environment. Guess Odin is being pounded because the queue isn't going anywhere. I use ssh.

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
Ha! I found it - you left out one very important detail. You are specifying the use of the grpcomm basic module instead of the default "bad" one. I just checked and that module is indeed showing a problem. I'll see what I can do. For now, though, just use the default grpcomm and it will work fine

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
I can't find anything wrong so far. I'm waiting in a queue on Odin to try there since Jeff indicated you are using rsh as a launcher, and that's the only access I have to such an environment. Guess Odin is being pounded because the queue isn't going anywhere. Meantime, I'm building on RoadRunner a

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
You'll have to tell us something more than that, Pasha. What kind of environment, what rev level were you at, etc. Ahh, sorry :) I am running Linux x86_64, SLES 10 SP1, (Open MPI) 1.3a1r18682M, OFED 1.3.1. Pasha. So far as I know, the trunk is fine. On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)"

Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
Okay, I've traced this down. The problem is that a DSS-internal function has been exposed via the API, so now people can mistakenly call the wrong one. You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer. Those were supposed to be internal to the DSS only, and will definitely

Re: [OMPI devel] Is trunk broken ?

2008-06-19 Thread Ralph H Castain
You'll have to tell us something more than that, Pasha. What kind of environment, what rev level were you at, etc. So far as I know, the trunk is fine. On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)" wrote: > I tried to run trunk on my machines and I got the following error: > > [sw214:04367] [[16563,1]

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18677

2008-06-19 Thread Ralph H Castain
I would argue that this behavior is in fact consistent - the returned state is that all required connections have been opened and is independent of the selected routed module. How that is done is irrelevant to the caller. Each routed module knows precisely what connections are used for its operati

[OMPI devel] Is trunk broken ?

2008-06-19 Thread Pavel Shamis (Pasha)
I tried to run trunk on my machines and I got the following error: [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/grpcomm_base_modex.c at line 451 [sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file grpcomm_b

Re: [OMPI devel] autogen error

2008-06-19 Thread Jeff Squyres
Will do. And with some off-list mails to Leonardo, it seems that the env variable GREP_COLORS was the culprit. On Jun 19, 2008, at 12:01 PM, Ralf Wildenhues wrote: * Jeff Squyres wrote on Thu, Jun 19, 2008 at 05:50:43PM CEST: Ralf: if it's more correct to also quote the m4_define first a

Re: [OMPI devel] [OMPI svn] svn:open-mpi r18677

2008-06-19 Thread George Bosilca
Ralph, I don't necessarily agree with this statement. There is a generic method to do the correct wireup, and this method works independent of the selected routed algorithms. One can use the routed to ask for the next hop for each of the destinations, make a unique list out of these first
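The first steps George describes (ask the routed layer for the next hop toward each destination, then reduce those to a unique list) can be sketched as below. This is only an illustration under stated assumptions: `first_hops` and `toy_next_hop` are hypothetical names, not ORTE functions, and the toy routing rule is invented for the example. The message is cut off after this point, so nothing beyond building the list is shown.

```python
def first_hops(destinations, next_hop):
    """Return the unique first hops needed to reach all destinations.

    next_hop is a hypothetical callable standing in for whatever the
    selected routed module would answer for a given destination.
    Order of first appearance is preserved.
    """
    seen = set()
    hops = []
    for dest in destinations:
        hop = next_hop(dest)
        if hop not in seen:
            seen.add(hop)
            hops.append(hop)
    return hops


def toy_next_hop(dest):
    # Invented routing rule for illustration only: every group of four
    # ranks is reached through one relay, numbered dest // 4.
    return dest // 4


# Ten destinations collapse to three distinct first hops.
unique_hops = first_hops(range(10), toy_next_hop)
```

The caller never looks inside `next_hop`, which is the point of the argument: the same loop works whatever routing scheme answers the question.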

Re: [OMPI devel] autogen error

2008-06-19 Thread Ralf Wildenhues
* Jeff Squyres wrote on Thu, Jun 19, 2008 at 05:50:43PM CEST: > Ralf: if it's more correct to also quote the m4_define first argument, > I'll do that, too. Yes, please. Several instances in autogen.sh.

Re: [OMPI devel] autogen error

2008-06-19 Thread Jeff Squyres
Ah! Looks like your "ls" must be aliased to include colors or somesuch. So I think the real culprit here is that we need to make sure we use an unaliased "ls" when getting the list of components. I can fix up autogen to do this. Ralf: if it's more correct to also quote the m4_define first ar
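A small sketch of why colored output could upset a generated m4 file. The thread only says that colors (an aliased "ls", later GREP_COLORS) were involved; the mechanism shown here, ANSI color escapes adding unmatched "[" characters to text that m4 then treats as quoted strings, is an assumption and not confirmed in these messages. The component names and escape codes are made up for the example.

```python
import re

# Matches ANSI escape sequences such as "\x1b[01;34m" or "\x1b[0m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def bracket_balance(text):
    """Count of '[' minus count of ']' (0 means balanced)."""
    return text.count("[") - text.count("]")


# Hypothetical component list, as plain text and as colorized output.
plain = "basic bad"
colored = "\x1b[01;34mbasic\x1b[0m \x1b[01;34mbad\x1b[0m"

# Each escape sequence contributes one '[' with no matching ']'.
extra_open_brackets = bracket_balance(colored)

# Stripping the escapes recovers the plain list.
cleaned = ANSI_ESCAPE.sub("", colored)
```

If this assumption holds, both fixes discussed in the thread address it from different ends: getting the list without colors removes the stray brackets, and quoting the m4_define arguments makes the generated file more robust to odd input.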

Re: [OMPI devel] autogen error

2008-06-19 Thread Leonardo Fialho
Hi Ralf, $ aclocal -I config /usr/local/bin/m4:config/mca_no_configure_components.m4:9: ERROR: end of file in string autom4te: /usr/local/bin/m4 failed with exit status: 1 aclocal: autom4te failed with exit status: 1 $ My line 9 has some more characters (I'm not an m4 expert...): m4_define(mca_

Re: [OMPI devel] autogen error

2008-06-19 Thread Jeff Squyres
Interesting! I'm happy to make the change, but can you guess as to why this is only biting Leonardo, and only now (after literally years of being underquoted)? On Jun 19, 2008, at 11:29 AM, Ralf Wildenhues wrote: Hello Leonardo, * Leonardo Fialho wrote on Thu, Jun 19, 2008 at 04:29:30PM

Re: [OMPI devel] autogen error

2008-06-19 Thread Ralf Wildenhues
Hello Leonardo, * Leonardo Fialho wrote on Thu, Jun 19, 2008 at 04:29:30PM CEST: > [Running] aclocal -I config > /usr/local/bin/m4:config/mca_no_configure_components.m4:9: ERROR: end of > file in string > autom4te: /usr/local/bin/m4 failed with exit status: 1 > aclocal: autom4te failed with exit

Re: [OMPI devel] autogen error

2008-06-19 Thread Leonardo Fialho
These are the versions that I'm using: $ aclocal --version aclocal (GNU automake) 1.10.1 ... $ autoheader --version autoheader (GNU Autoconf) 2.62 ... $ autoconf --version autoconf (GNU Autoconf) 2.62 ... $ autom4te --version autom4te (GNU Autoconf) 2.62 ... $ libtoolize --version libtoolize (GNU l

Re: [OMPI devel] autogen error

2008-06-19 Thread Leonardo Fialho
Hi Jeff, Yes, with a fresh checkout... well, it could be some error in my aclocal files; I just updated them today, but I think I did it correctly. Leonardo Jeff Squyres wrote: That's a weird one -- that file (mca_no_configure_components.m4) is automatically generated by autogen.sh. I can't

Re: [OMPI devel] autogen error

2008-06-19 Thread Jeff Squyres
That's a weird one -- that file (mca_no_configure_components.m4) is automatically generated by autogen.sh. I can't think offhand of how it could be bogus. If you have a fresh tree checkout and run autogen, is the problem repeatable? On Jun 19, 2008, at 10:29 AM, Leonardo Fialho wrote:

Re: [OMPI devel] RML Send

2008-06-19 Thread Ralph H Castain
WOW! Somebody really screwed up the DSS by adding some new APIs I'd never heard of before, but that really can cause the system to break! I'm going to have to straighten this mess out - it is a total disaster. There needs to be just ONE way of packing and unpacking, not two totally incompatible method

Re: [OMPI devel] BW benchmark hangs after r 18551

2008-06-19 Thread Lenny Verkhovsky
Sorry, I checked it without sm. Please ignore this mail. On Thu, Jun 19, 2008 at 4:32 PM, Lenny Verkhovsky < lenny.verkhov...@gmail.com> wrote: > Hi, > I found what caused the problem in both cases. > > --- ompi/mca/btl/sm/btl_sm.c(revision 18675) > +++ ompi/mca/btl/sm/btl_sm.c(working co

Re: [OMPI devel] RML Send

2008-06-19 Thread Leonardo Fialho
Hi Ralph, My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid. I had success using pack_buffer()/unpack_buffer() and the OPAL_BYTE type; something strange occurred when I was using pack()/unpack(). The value of num_bytes increases. For example: I tried to read num_bytes=5, and after an unpack this var

[OMPI devel] autogen error

2008-06-19 Thread Leonardo Fialho
Hi All, Does anybody know what this error is? Yes, I think that I'm using the latest versions of M4, autoconf, automake and libtool, I think... *** Running GNU tools [Running] autom4te --language=m4sh ompi_get_version.m4sh -o ompi_get_version.sh [Running] libtoolize --automake --copy --ltdl ** Adjusti

Re: [OMPI devel] MPI_Iprobe and mca_btl_sm_component_progress

2008-06-19 Thread Brian W. Barrett
On Thu, 19 Jun 2008, Terry Dontje wrote: But my concern is not the raw performance of MPI_Iprobe in this case but more of an interaction between MPI and an application. The concern is if it takes 2 MPI_Iprobes to get to the real message (instead of one) then could this induce a synchronizatio

Re: [OMPI devel] MPI_Iprobe and mca_btl_sm_component_progress

2008-06-19 Thread Terry Dontje
George Bosilca wrote: Terry, We had a discussion about this a few weeks ago. I have a version that modifies this behavior (SM progress will not return as long as there are pending acks). There was no benefit from doing so (even if one might think that fewer calls to opal_progress might improve the

Re: [OMPI devel] BW benchmark hangs after r 18551

2008-06-19 Thread Lenny Verkhovsky
Hi, I found what caused the problem in both cases. --- ompi/mca/btl/sm/btl_sm.c(revision 18675) +++ ompi/mca/btl/sm/btl_sm.c(working copy) @@ -812,7 +812,7 @@ */ MCA_BTL_SM_FIFO_WRITE(endpoint, endpoint->my_smp_rank, endpoint->peer_smp_rank, frag->hdr,

Re: [OMPI devel] MPI_Iprobe and mca_btl_sm_component_progress

2008-06-19 Thread George Bosilca
Terry, We had a discussion about this a few weeks ago. I have a version that modifies this behavior (SM progress will not return as long as there are pending acks). There was no benefit from doing so (even if one might think that fewer calls to opal_progress might improve performance). In

[OMPI devel] MPI_Iprobe and mca_btl_sm_component_progress

2008-06-19 Thread Terry Dontje
Galen, George and others that might have SM BTL interest. In my quest of looking at MPI_Iprobe performance I found what I think is an issue. If you have an application that is using the SM BTL and does a small message send <=256 followed by an MPI_Iprobe the mca_btl_sm_component function that