On 6/19/08 3:31 PM, "Jeff Squyres" wrote:
> Yo Ralph --
>
> Is the "bad" grpcomm component both new and the default? Further, is
> the old "basic" grpcomm component now the non-default / testing
> component?
Yes to both
>
> If so, I wonder if what happened was that Pasha did an "svn up", bu
Yo Ralph --
Is the "bad" grpcomm component both new and the default? Further, is
the old "basic" grpcomm component now the non-default / testing
component?
If so, I wonder if what happened was that Pasha did an "svn up", but
without re-running autogen/configure, he wouldn't have seen the
I did fresh check out and everything works well.
So it looks like some svn up screwed up my svn checkout.
Ralph, thanks for the help!
Ralph H Castain wrote:
Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.
I went ahead a
Hmmm...something isn't right, Pasha. There is simply no way you should be
encountering this error. You are picking up the wrong grpcomm module.
I went ahead and fixed the grpcomm/basic module, but as I note in the commit
message, that is now an experimental area. The grpcomm/bad module is the
defa
Ralph H Castain wrote:
Ha! I found it - you left out one very important detail. You are specifying
the use of the grpcomm basic module instead of the default "bad" one.
Hmm, I did not specify any "grpcomm" module.
I just checked and that module is indeed showing a problem. I'll see what I
Ralph H Castain wrote:
I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
I use ssh.
Ha! I found it - you left out one very important detail. You are specifying
the use of the grpcomm basic module instead of the default "bad" one.
I just checked and that module is indeed showing a problem. I'll see what I
can do.
For now, though, just use the default grpcomm and it will work fine
I can't find anything wrong so far. I'm waiting in a queue on Odin to try
there since Jeff indicated you are using rsh as a launcher, and that's the
only access I have to such an environment. Guess Odin is being pounded
because the queue isn't going anywhere.
Meantime, I'm building on RoadRunner a
You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
Ahh, sorry :) I run on Linux x86_64 SLES10 SP1, (Open MPI) 1.3a1r18682M,
OFED 1.3.1
Pasha.
So far as I know, the trunk is fine.
On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)"
Okay, I've traced this down. The problem is that a DSS-internal function has
been exposed via the API, so now people can mistakenly call the wrong one.
You should -never- be using opal_dss.pack_buffer or opal_dss.unpack_buffer.
Those were supposed to be internal to the DSS only, and will definitely
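To illustrate why mixing the two call families can produce a "Data unpack would read past end of buffer" error, here is a minimal toy model in Python. It is not the OPAL DSS code or its wire format: the one-byte type tag, the four-byte count, and the function bodies are all assumptions made for illustration. Only the names pack/unpack and pack_buffer come from the thread.

```python
import struct

INT32_TAG = 1                    # hypothetical type tag for this toy model
HEADER = struct.Struct("!BI")    # assumed header: 1-byte tag + 4-byte count


def pack_buffer(buf: bytearray, values: list[int]) -> None:
    """Raw pack: append only the payload, with no tag or count."""
    for value in values:
        buf.extend(struct.pack("!i", value))


def pack(buf: bytearray, values: list[int]) -> None:
    """Described pack: append the header, then the raw payload."""
    buf.extend(HEADER.pack(INT32_TAG, len(values)))
    pack_buffer(buf, values)


def unpack(buf: bytes, offset: int = 0) -> tuple[list[int], int]:
    """Described unpack: expects the header that pack() wrote."""
    if offset + HEADER.size > len(buf):
        raise ValueError("unpack would read past end of buffer")
    tag, count = HEADER.unpack_from(buf, offset)
    offset += HEADER.size
    if tag != INT32_TAG:
        raise ValueError("type mismatch")
    if offset + 4 * count > len(buf):
        raise ValueError("unpack would read past end of buffer")
    values = list(struct.unpack_from(f"!{count}i", buf, offset))
    return values, offset + 4 * count
```

In this model a buffer written with pack() round-trips through unpack(), while a buffer written with pack_buffer() has no header, so unpack() misreads the payload bytes as a header and fails. That is the kind of mismatch that arises when a sender and receiver use different call families.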
You'll have to tell us something more than that, Pasha. What kind of
environment, what rev level were you at, etc.
So far as I know, the trunk is fine.
On 6/19/08 12:01 PM, "Pavel Shamis (Pasha)"
wrote:
> I tried to run trunk on my machines and I got the following error:
>
> [sw214:04367] [[16563,1]
I would argue that this behavior is in fact consistent - the returned state
is that all required connections have been opened and is independent of the
selected routed module. How that is done is irrelevant to the caller.
Each routed module knows precisely what connections are used for its
operati
I tried to run trunk on my machines and I got the following error:
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file base/grpcomm_base_modex.c at line 451
[sw214:04367] [[16563,1],1] ORTE_ERROR_LOG: Data unpack would read past
end of buffer in file grpcomm_b
Will do.
And with some off-list mails to Leonardo, it seems that the env
variable GREP_COLORS was the culprit.
On Jun 19, 2008, at 12:01 PM, Ralf Wildenhues wrote:
* Jeff Squyres wrote on Thu, Jun 19, 2008 at 05:50:43PM CEST:
Ralf: if it's more correct to also quote the m4_define first
a
Ralph,
I don't necessarily agree with this statement. There is a generic
method to do the correct wireup, and this method works independent of
the selected routed algorithms.
One can use the routed to ask for the next hop for each of the
destinations, make a unique list out of these first
* Jeff Squyres wrote on Thu, Jun 19, 2008 at 05:50:43PM CEST:
> Ralf: if it's more correct to also quote the m4_define first argument,
> I'll do that, too.
Yes, please. Several instances in autogen.sh.
Ah! Looks like your "ls" must be aliased to include colors or somesuch.
So I think the real culprit here is that we need to ensure to use an
unaliased "ls" when getting the list of components. I can fix up
autogen to do this.
Ralf: if it's more correct to also quote the m4_define first ar
Hi Ralf,
$ aclocal -I config
/usr/local/bin/m4:config/mca_no_configure_components.m4:9: ERROR: end of
file in string
autom4te: /usr/local/bin/m4 failed with exit status: 1
aclocal: autom4te failed with exit status: 1
$
My line 9 has some more characters (I'm not an m4 expert...):
m4_define(mca_
Interesting!
I'm happy to make the change, but can you guess as to why this is only
biting Leonardo, and only now (after literally years of being
underquoted)?
On Jun 19, 2008, at 11:29 AM, Ralf Wildenhues wrote:
Hello Leonardo,
* Leonardo Fialho wrote on Thu, Jun 19, 2008 at 04:29:30PM
Hello Leonardo,
* Leonardo Fialho wrote on Thu, Jun 19, 2008 at 04:29:30PM CEST:
> [Running] aclocal -I config
> /usr/local/bin/m4:config/mca_no_configure_components.m4:9: ERROR: end of
> file in string
> autom4te: /usr/local/bin/m4 failed with exit status: 1
> aclocal: autom4te failed with exit
These are the versions that I'm using:
$ aclocal --version
aclocal (GNU automake) 1.10.1
...
$ autoheader --version
autoheader (GNU Autoconf) 2.62
...
$ autoconf --version
autoconf (GNU Autoconf) 2.62
...
$ autom4te --version
autom4te (GNU Autoconf) 2.62
...
$ libtoolize --version
libtoolize (GNU l
Hi Jeff,
Yes, with a fresh checkout... well, it could be some error in my aclocal
files; I just updated them today, but I think I did it correctly.
Leonardo
Jeff Squyres wrote:
That's a weird one -- that file (mca_no_configure_components.m4) is
automatically generated by autogen.sh. I can't
That's a weird one -- that file (mca_no_configure_components.m4) is
automatically generated by autogen.sh. I can't think offhand of how
it could be bogus.
If you have a fresh tree checkout and run autogen, is the problem
repeatable?
On Jun 19, 2008, at 10:29 AM, Leonardo Fialho wrote:
WOW! Somebody really screwed up the DSS by adding some new APIs I'd never
heard of before, which really can cause the system to break!
I'm going to have to straighten this mess out - it is a total disaster.
There needs to be just ONE way of packing and unpacking, not two totally
incompatible method
Sorry,
I checked it without sm.
Please ignore this mail.
On Thu, Jun 19, 2008 at 4:32 PM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:
> Hi,
> I found what caused the problem in both cases.
>
> --- ompi/mca/btl/sm/btl_sm.c (revision 18675)
> +++ ompi/mca/btl/sm/btl_sm.c(working co
Hi Ralph,
My mistake, I'm really using ORTE_PROC_MY_DAEMON->jobid.
I had success using pack_buffer()/unpack_buffer() with the OPAL_BYTE type;
something strange occurred when I was using pack()/unpack(). The value of
num_bytes increases, for example:
I tried to read num_bytes=5, and after an unpack this var
Hi All,
Does anybody know what this error is?
Yes, I think that I'm using the latest versions of M4, autoconf, automake and
libtool, I think...
*** Running GNU tools
[Running] autom4te --language=m4sh ompi_get_version.m4sh -o
ompi_get_version.sh
[Running] libtoolize --automake --copy --ltdl
** Adjusti
On Thu, 19 Jun 2008, Terry Dontje wrote:
But my concern is not the raw performance of MPI_Iprobe in this case but more
of an interaction between MPI and an application. The concern is if it takes
2 MPI_Iprobes to get to the real message (instead of one) then could this
induce a synchronizatio
George Bosilca wrote:
Terry,
We had a discussion about this a few weeks ago. I have a version that
modifies this behavior (SM progress will not return as long as there are
pending acks). There was no benefit from doing so (even if one might
think that fewer calls to opal_progress might improve the
Hi,
I found what caused the problem in both cases.
--- ompi/mca/btl/sm/btl_sm.c (revision 18675)
+++ ompi/mca/btl/sm/btl_sm.c (working copy)
@@ -812,7 +812,7 @@
*/
MCA_BTL_SM_FIFO_WRITE(endpoint, endpoint->my_smp_rank,
endpoint->peer_smp_rank, frag->hdr,
Terry,
We had a discussion about this a few weeks ago. I have a version that
modifies this behavior (SM progress will not return as long as there are
pending acks). There was no benefit from doing so (even if one might
think that fewer calls to opal_progress might improve performance).
In
Galen, George and others that might have SM BTL interest.
In my quest of looking at MPI_Iprobe performance I found what I think is
an issue. If you have an application that is using the SM BTL and does
a small message send <=256 followed by an MPI_Iprobe the
mca_btl_sm_component function that