wrote:
> On Apr 30, 2014, at 6:35 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Puzzling. We survived so far without such a requirement.
>
> Ralph tells me that this is a requirement. So I figured we should check for
> it.
>
>> In the BTLs wher
that there be some way to reduce
>> to 64-bits when accessing the common data.
>>
>> I think the usnic BTL may have an issue with that approach, so maybe some
>> way of "unhashing" will be required?
>>
>>
>> On Apr 30, 2014, at 3:42 PM, Jeff
branch, and seems to work pretty well so far.
George.
On Apr 30, 2014, at 22:01 , George Bosilca <bosi...@icl.utk.edu> wrote:
> On Apr 30, 2014, at 20:04 , Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
>
>> All we need in usnic is the ompi_process_name_t
shift the process identifier down to the opal layer?
>>>> If we define opal_identifier_t to include the required jobid/vpid, perhaps
>>>> adding a void* so someone can put whatever they want in it?
>>>>
>>>> Note that I'm not wild about extending the
Any update on this? Can it be used in the RMA part?
George.
On Wed, Apr 23, 2014 at 1:58 AM, Gilles Gouaillardet
wrote:
> my bad :-(
>
> this has just been fixed
>
> Gilles
>
> On 2014/04/23 14:55, Nathan Hjelm wrote:
>> The ompi_datatype_flatten.c file appears
Strange. The outcome and the timing of this issue seem to highlight a link
with the other datatype-related issue you reported earlier and, as suggested by
Ralph, with Gilles' scif+vader issue.
Generally speaking, the mechanism used to split the data in the case of
multiple BTLs is identical to
Nathan, or anybody with access to the target hardware,
If you can provide a minimalistic output of the applications with and
without the above-mentioned patch and with mpi_ddt_unpack_debug and
mpi_ddt_pack_debug, and mpi_ddt_position_debug set to 1, I would try
to help.
George.
On Thu, May
I heard multiple references to pthread_cancel being known to have bad
side effects. Can somebody educate me on this topic please?
Thanks,
George.
On Tue, May 13, 2014 at 10:25 PM, Ralph Castain wrote:
> It could be a bug in the software stack, though I wouldn't count
Good catch. I fixed the TCP BTL (r31753). It is the only BTL I can
test so that's the most I can do here.
However, I never get OPAL_ERR_DATA_VALUE_NOT_FOUND out of the modex
call when the key doesn't exist. I looked in dstore and the correct
value one should look for is OPAL_ERR_NOT_FOUND. I
resources,
> including (but not limited to) file descriptors and malloc()ed memory.
> Even if Open MPI is written very carefully, one cannot assume that all the
> libraries it calls (and their dependencies, etc.) are written to properly
> deal with cancellation.
>
> -Paul
>
>
There seems to be a consensus on the fact that closing an fd should trigger the
return from poll. Unfortunately this assumption is wrong, and not condoned by
any documentation available online.
To be clearer, all the documentation I know tends to point in the opposite
direction: it is unwise to
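As an illustration only, and not something proposed in this thread, one widely used way to interrupt a blocking poll() without relying on close() is to poll a dedicated wake-up pipe as well. The sketch below assumes a POSIX system; the function name wake_poll_with_pipe is hypothetical, and the write that a second thread would normally perform is done inline to keep the example single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wake poll() by writing one byte to a dedicated pipe instead of closing
 * a descriptor that is being polled.
 * Returns 1 if poll() reported the wake-up byte, 0 on timeout, -1 on error.
 */
int wake_poll_with_pipe(int timeout_ms)
{
    int wake[2];
    char token = 'w';
    int result = -1;

    assert(timeout_ms >= 0);
    if (pipe(wake) != 0) {
        return -1;
    }

    /* In a threaded program, another thread would perform this write. */
    if (write(wake[1], &token, 1) == 1) {
        struct pollfd pfd;
        pfd.fd = wake[0];
        pfd.events = POLLIN;
        pfd.revents = 0;

        int ready = poll(&pfd, 1, timeout_ms);
        if (ready == 1 && (pfd.revents & POLLIN)) {
            /* Drain the byte so the next poll() blocks again. */
            char drained;
            result = (read(wake[0], &drained, 1) == 1) ? 1 : -1;
        } else if (ready == 0) {
            result = 0;
        }
    }

    close(wake[0]);
    close(wake[1]);
    return result;
}
```

A real progress thread would include the read end of the pipe in the same pollfd array as its sockets and treat readiness on it as a request to re-check its shutdown flag.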
The solution you propose here is definitely not OK. It 1) is ugly and 2)
breaks the separation barrier that we hold dear.
Regarding your other suggestion I don’t see any reasons not to call the
delete_proc on MPI_COMM_WORLD as the last action we do before tearing down
everything else.
In order to cope with the dynamic case I think we will need to remove
the check for a single registration and instead do an ompi_proc ref
count.
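A minimal sketch of what reference-counted registration could look like, under loud assumptions: struct proc_entry, proc_register and proc_unregister are hypothetical names, not the real ompi_proc_t or any Open MPI API, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a process descriptor; NOT the real ompi_proc_t. */
struct proc_entry {
    int refcount;   /* number of active registrations */
    int torn_down;  /* set once the last registration is dropped */
};

/* Each registration takes one reference instead of failing on a duplicate. */
void proc_register(struct proc_entry *p)
{
    assert(p != NULL);
    p->refcount++;
}

/* Returns 1 when this call released the last reference, 0 otherwise. */
int proc_unregister(struct proc_entry *p)
{
    assert(p != NULL && p->refcount > 0);
    if (--p->refcount > 0) {
        return 0;
    }
    /* Real code would run its per-process teardown here. */
    p->torn_down = 1;
    return 1;
}
```

With this shape, a process reached through both MPI_COMM_WORLD and a dynamically created communicator is only torn down when the second of the two registrations is dropped.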
George.
On Tue, May 20, 2014 at 6:58 AM, Open MPI wrote:
> #4645: Move r31786, 31829, r31830, r31833, r31834, r31835 to v1.8
From a practical perspective, I don't think there is a need for a
phone call. Ralph made his point, and we all took notice of it.
However, the proposed changes are in a single independent component,
with no impact on the rest of the code base. Therefore, there is
absolutely no valid reason not to
On Tue, May 27, 2014 at 5:09 PM, Ralph Castain wrote:
>> That being said, I agree with Ralph on the fact that accepting them in
>> the trunk doesn't automatically qualify it for inclusion in any
>> further stable release. However, if ORNL setup nightly builds to
>> validate
Calling MPI_Comm_free is not enough from MPI perspective to clean up
all knowledge about remote processes, nor to sever the links between
the local and remote groups. One MUST call MPI_Comm_disconnect in
order to achieve this.
Look at the code in ompi/mpi/c and see the difference between
_position(...) is
>> invoked and does not set the pConvertor->pStack
>> as expected by r31496
>>
>> i will run some more tests from now
>>
>> Gilles
>>
>> On 2014/05/08 2:23, George Bosilca wrote:
>>> Strange. The outcome and the timing of
I think I like d the most but it is not a perfect solution. With d all
real8 types in a common will be badly aligned and the Open MPI
internal datatype will be incorrect. So I will vote for a combo: b +
d.
George.
On Fri, May 30, 2014 at 4:57 AM, Gilles Gouaillardet
If the scif BTL registered its own memory registration function, I
would have expected it to deregister it upon finalize. Without
this we run into circular dependencies that are not solvable at the
library level.
George.
On Mon, Jun 2, 2014 at 12:39 AM, Gilles Gouaillardet
WHAT: Open our low-level communication infrastructure by moving all
necessary components (btl/rcache/allocator/mpool) down in OPAL
WHY: All the components required for inter-process communications are
currently deeply integrated in the OMPI
layer. Several
Thanks for the patch. I applied it to the trunk in r32190, and filed a CMR
(#4780) for the next release, 1.8.2.
George.
On Thu, Jul 10, 2014 at 3:09 AM, Kawashima, Takahiro <
t-kawash...@jp.fujitsu.com> wrote:
> Hi,
>
> The attached patch corrects trivial typos in man files and
> FUNC_NAME
Nathan,
Fixing the classes to correctly tear down everything was a two-line patch.
However, this doesn’t fix the bigger issue, which is related to the fact that
not all frameworks are correctly torn down, and when they are they leave
behind char* parameters not set to NULL, and that the
be good
> because at least there will be no conflicts with usnic BTL concurrent
> development. :-)
>
>
>
>
> On Jul 10, 2014, at 2:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> > George: any update on when this will happen?
> >
> >
> >
I'm also looking into it.
George.
On Tue, Jul 15, 2014 at 10:50 AM, Nathan Hjelm wrote:
> On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote:
> >r32236 is a suspect
> >
> >i am afk
> >
> >I just read the code and a class is initialized with
>
r32248 should be the fix for this issue. I was overly optimistic about the
cleanup of the classes. It turns out this is not possible without deep
rearrangement of the class infrastructure. More info on the commit log.
Sorry for the mess,
George.
On Tue, Jul 15, 2014 at 11:38 AM, George
are unloaded.
George.
On July 15, 2014 at 1:17:26 AM, George Bosilca (bosi...@icl.utk.edu) wrote:
> Nathan,
>
> Fixing the classes to correctly tear down everything was a two lines patch.
> However,
> this doesn’t fix the bigger issue, which is related to the fact that not al
Enforcing the portability of this sounds like a huge [almost impossible]
mess, without a clean portable solution (more about this below). However, a
few things should be considered:
- Except for reinit, Open MPI works without it! If we provide such a
capability it will be more a convenience
Are these also called for shared libraries?
George.
On Wed, Jul 16, 2014 at 3:36 PM, Paul Hargrove wrote:
>
> On Wed, Jul 16, 2014 at 7:36 AM, Nathan Hjelm wrote:
>
>> Correction. xlc does support the destructor function attribute. The odd
>> one out
I think Case #1 is only a partial solution, as it only solves the example
attached to the ticket. Based on my reading of the tool chapter, calling
MPI_T_init after MPI_Finalize is legit, and this case is not covered by the
patch. But this is not the major issue I have with this patch. From a
discuss the init-after-finalize issue, and he intends to
> raise it with the Forum as it doesn't seem a logical thing to do. So that
> issue may go away. Still leaves us pondering the right solution, and
> hopefully coming up with something better than either of the ones we have
> so far.
>
There was a long thread of discussion on why we must use an rte_barrier and
not an mpi_barrier during the finalize. Basically, as long as we have
connectionless unreliable BTLs we need an external mechanism to ensure
complete tear-down of the entire infrastructure. Thus, we need to rely on
an
m rankB, while rankB is still doing MPI work. In this case rankB will
> not be able to communicate with rankA any more, while it still has work to
> do.
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *George
> Bosilca
> *Sent:* Monday, July 21, 2014 9:1
information about peer processes in the
>>> usnic BTL to include the peer's VPID, which is the MCW rank. I'll be sad
>>> if that goes away...
>>>
>>>
>>> On Jul 15, 2014, at 2:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>
Terry,
We use the feature defined by POSIX mmap where the area should be zero-
filled when the file length is extended. What OS are you using when you
see such problems?
Just in case, here is a patch that sets the beginning of the mmapped
region to zero, in case this is not done
is a gnu-ism. We should probably use memset instead.
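The idea discussed here (extend a file, map it, and zero the region explicitly with memset rather than trusting the OS) can be sketched as follows. This is an illustration under stated assumptions, not the patch from the thread: the function name map_zeroed_region and the /tmp template are hypothetical, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extend a fresh file to `len` bytes, map it shared, and optionally zero
 * it with memset. Returns 1 if every mapped byte is zero, 0 if not,
 * -1 on error.
 */
int map_zeroed_region(size_t len, int force_memset)
{
    char path[] = "/tmp/mmap_zero_XXXXXX";   /* hypothetical location */
    unsigned char *seg;
    int all_zero = 1;
    int fd;
    size_t i;

    assert(len > 0);
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                            /* file vanishes on close */

    if (ftruncate(fd, (off_t)len) != 0) {    /* extend the file length */
        close(fd);
        return -1;
    }
    seg = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping stays valid */
    if (seg == MAP_FAILED) {
        return -1;
    }

    if (force_memset) {
        memset(seg, 0, len);                 /* do not rely on the OS */
    }
    for (i = 0; i < len; i++) {
        if (seg[i] != 0) {
            all_zero = 0;
            break;
        }
    }
    munmap(seg, len);
    return all_zero;
}
```

Calling it with force_memset set to 0 checks whether the platform zero-fills the extended area on its own; setting it to 1 gives the defensive behaviour regardless.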
On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:
Terry,
We use the feature defined by POSIX mmap where the area should be
zero-filled when the file length is extended. What OS you're using
when you see such problems ?
Just in case
Patrick,
I'm unable to reproduce the buffer overrun with the latest trunk. I
run valgrind (with the memchecker tool) on a regular basis on the
trunk, and I never noticed anything like that. Moreover, I went over
the code, and I cannot imagine how we can overrun the buffer in the
code you
Same here. The TRAC server seems to have some problems, but the svn
access is still working.
george.
On Sep 18, 2008, at 10:28 AM, Lenny Verkhovsky wrote:
Any problems with https://svn.open-mpi.org/trac/ompi/ ??
I can't open a new ticket :(
Internal Server Error
The server encountered an
I've been using these versions for some time, basically from the date
they get released. So far, no issues have been raised. However, I do
not see any benefit with these new versions (on Linux and Mac OS X).
george.
On Sep 19, 2008, at 9:56 AM, Tim Mattox wrote:
Just an FYI,
Last night
Ralph,
There is NO need to have this discussion again, it was painful enough
last time. From my perspective I do not understand why you are making
so much noise on this one. How can a 4-line change in some ALPS specific
files (Cray system very specific to ORNL) generate more than 3 A4
Ralph, in order to have the behavior you describe for the visibility
feature, just don't specify --enable-visibility. This will enable it if
the feature is supported and disable (plus a small warning) if not.
We decided a while ago that 1) we should have a consistent behavior
for similar
arly state that we are
unable to find any (or some specific) version of the valgrind libraries.
The behavior related to memchecker you described in your second email
seems like a deviation from this, so from my perspective it should be
considered as a bug.
george.
On Oct 3, 2008, at 7:18
There is a simple (and good) reason not to have it in the upcoming
release. For lack of time, I didn't manage to test it well enough. As
I'm not 100% confident that it will not create any problems, I
preferred to leave it out of the 1.3.
If you want to test it, please feel free to do so. And
I did investigate this issue for about 3 hours yesterday. Neither
valgrind nor efence report any errors on my cluster. I'm using debian
unstable with gcc-4.1.2. Adding printfs doesn't show the same output
as you, all addresses are in the correct range. I went over the code
manually, and
? If yes does it resolve
the issue ?
george.
On Oct 16, 2008, at 7:29 PM, Stephan Kramer wrote:
George Bosilca wrote:
I did investigate this issue for about 3 hours yesterday. Neither
valgrind nor efence report any errors on my cluster. I'm using
debian unstable with gcc-4.1.2. Adding
Stephan,
The fix was committed in the trunk (revision 19778). Fixes for the
1.2, as well as the 1.3 are pending.
Thanks for your help,
george.
On Oct 20, 2008, at 5:43 PM, George Bosilca wrote:
Stephan,
I think you're completely right, and that I had a wrong
understanding
Youpiii!
george.
On Oct 21, 2008, at 4:53 PM, Ralph Castain wrote:
Hello all
I am working on adding a new radix tree routed module and am
simultaneously doing a little streamlining to the overall routed-
related code for scalability. One thing that would help cleanup
several areas
What happens if the counter rolls over?
george.
On Oct 22, 2008, at 2:49 PM, Ralph Castain wrote:
There recently was activity on the mailing lists where someone was
attempting to call comm_spawn 100,000 times. Setting aside the
threading issues that were the focus of that
Ralph,
This problem was fixed long ago by some of the work Camille did. The
exact revision number is r15402 (https://svn.open-mpi.org/trac/ompi/changeset/15402).
I'm using this feature daily and so far I haven't had any problems with it.
To reuse your example here is what Camille came up with.
$
, Ralph Castain wrote:
Done...r19820
On Oct 28, 2008, at 8:37 AM, Ralph Castain wrote:
Yes, of course it does - the problem is in a sanity check I just
installed over the weekend.
Easily fixed...
On Oct 28, 2008, at 8:33 AM, George Bosilca wrote:
Ralph,
I ran into trouble with the new IO
Leonardo,
All events generated by the libevent are caught internally by the
ompi library, but are not propagated until the next call to
opal_progress. If you want to use alarms that trigger outside the
opal_progress you will have to deal directly with the libevent (and
not use
On Nov 7, 2008, at 11:41 AM, Timothy Hayes wrote:
http://macneill.cs.tcd.ie/~hayesti/ompi.jpg
This is unfortunately not available to the outside world.
N.B. The XEN component in the BTL layer represents what I'm trying
to make.
So far so good, the BTL is what you need in order to move
Apparently it was with 19845, so before the patch that is supposed to
fix this issue. Terry can you please test with a more recent version
(> 19929).
Thanks,
george.
On Nov 8, 2008, at 9:54 AM, Edgar Gabriel wrote:
Terry,
was this with the trunk or v1.3? If it was the trunk, was it
On Nov 15, 2008, at 10:30 , Jeff Squyres wrote:
I've reviewed and updated the entire README file, but have several
questions that need to be answered by others in the community. I
committed all my changes, but marked sections of the file with "***"
where there's still a question about
Are we still using STL? I was pretty sure that we removed this
dependency a while ago.
george.
On Nov 19, 2008, at 09:11 , Ethan Mallove wrote:
WHAT: Add patch-libtool-for-sun-studio.pl script
Shiqing,
Don't waste your time. While the idea behind cccl is nice, the
overhead is unbelievably expensive. As a comparison it took 2 hours to
compile Open MPI on Windows using cccl and makefile, while it takes
less than 4 minutes to compile exactly the same set of functionalities
using
Terry,
I'm involved [to some degree] in both efforts and I can confirm these
two efforts will not affect each other in any bad way.
george.
On Dec 3, 2008, at 11:42 , Terry Dontje wrote:
I don't have any *strong* objections. However, I know that Eugene
and George B have been working on
Brian,
You're right, the datatype is being too cautious with the boundaries
when detecting the overlap. There is no good solution to detect the
overlap except parsing the whole memory layout to check the status of
every predefined type. As one can imagine this is a very expensive
s to devel. Someone might want to reply to Dorian's e-mail
on users.
Brian
On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:
Brian,
You're right, the datatype is being too cautious with the
boundaries when detecting the overlap. There is no good solution to
detect the overlap exce
solution. However, the words "not it"
come to mind. Sorry, but I have way too much on my plate this
month. By the way, in case no one noticed, I had e-mailed my
findings to devel. Someone might want to reply to Dorian's e-mail
on users.
Brian
On Dec 11, 2008, at 2:31 PM, Geor
eone might want to reply to Dorian's e-mail
on users.
Brian
On Dec 11, 2008, at 2:31 PM, George Bosilca wrote:
Brian,
You're right, the datatype is being too cautious with the
boundaries when detecting the overlap. There is no good solution to
detect the overlap except parsing the wh
As Rich stated, the original design of the SM BTL included [some]
support for dynamic processes. Over the years, for lack of interest and
manpower, this support was more or less dropped. Some pieces of the
code were removed or disabled, but apparently not everything.
However, lately the
There might be one reason for the application to slow down quite a bit. If
the timer you're using interacts with libevent (the
library we're using to internally manage any kind of events), then we
might end up in the situation where we call poll for every
iteration in the event
Paul,
Thanks for noticing the Elan problem. It appears we are missing one patch in
the 1.3 (https://svn.open-mpi.org/trac/ompi/changeset/20122). I'll
file a CMR asap.
Thanks,
george.
On Jan 13, 2009, at 16:31 , Paul H. Hargrove wrote:
Since it looks like you guys are very close to
The simple answer is you can't. The mpool is loaded before the BTLs
and on Linux the loader uses the RTLD_NOW flag (i.e. all symbols have
to be defined or the dlopen call will fail).
Moreover, there is no way in Open MPI to exchange information between
components except a global variable or
This topic was raised on the mailing list quite a few times. There is
a major difference between the PSM and the MX support. For PSM there
is just an MTL, which makes everything a lot simpler. The problem with
MX is that we have an MTL and a BTL. In order to figure out which one
to use, we
rent behavior to match.
Brian
On Jan 13, 2009, at 6:27 PM, George Bosilca wrote:
This topic was raised on the mailing list quite a few times. There
is a major difference between the PSM and the MX support. For PSM
there is just an MTL, which makes everything a lot simpler. The
problem with
Unfortunately, this pinpoints the fact that we didn't sufficiently test the
mixing of collective modules. I went over the tuned collective
functions and changed all instances to use the correct module
information. It is now on the trunk, revision 20267. Simultaneously, I
checked that all other
Here we go by the book :)
https://svn.open-mpi.org/trac/ompi/ticket/1749
george.
On Jan 13, 2009, at 23:40 , Jeff Squyres wrote:
Let's debate tomorrow when people are around, but first you have to
file a CMR... :-)
On Jan 13, 2009, at 10:28 PM, George Bosilca wrote:
Unfortunately
:
http://www.open-mpi.org/mtt/index.php?do_redir=922
So, I'll vote for applying the CMR for 1.3 since it clearly improved
things, but there is still more to be done to get coll_hierarch ready for
regular use.
On Wed, Jan 14, 2009 at 12:15 AM, George Bosilca
<bosi...@eecs.utk.edu>
Absolutely! Why wait until the 1.4 when we can have that in the
1.3.1...
george.
On Jan 15, 2009, at 16:39 , Eugene Loh wrote:
I don't know what scope of changes require RFCs, but here's a
trivial change.
==
RFC: Eliminate
There are several reasons these calls are there. Please read further.
On Jan 26, 2009, at 02:19 , Brice Goglin wrote:
Hello,
I am testing OpenMPI 1.3 over Open-MX. OpenMPI 1.2 works well but 1.3
does not load. This is caused by OMPI MX components now using some MX
internal symbols
On Jan 26, 2009, at 15:31 , Brice Goglin wrote:
George Bosilca wrote:
Yes, the only thing we need is a unique identifier per cluster. We
use the last 6 digits from the mapper MAC address.
Ok, thanks for the details. We are going to implement all this in
Open-MX now.
Then, I guess
Seems more like a compiler problem. A static inline function defined
in the header file but never used is the source of the problem. It did
compile for me with the gcc from Leopard and 4.3.1 on Linux. I'll
commit the fix asap.
george.
On Jan 28, 2009, at 14:26 , Ralph Castain wrote:
In the current bitmap implementation every time we set or check a bit
we have to compute the index of the char where this bit is set and the
relative position from the beginning of char. This requires two _VERY_
expensive operations: a division and a modulo. Compared with the cost
of these
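To make the comparison concrete, here is a small sketch of the two ways of locating a bit in a char-based bitmap. It is illustrative only and is not the opal bitmap code; the function names are hypothetical and it assumes 8-bit chars. Note that many compilers already turn an unsigned division or modulo by a constant power of two into a shift or mask on their own.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Locate the bit with a division and a modulo. */
void bitmap_set_divmod(unsigned char *map, size_t bit)
{
    map[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
}

/*
 * Locate the same bit with a shift and a mask. This is only valid because
 * the number of bits per element (8) is a power of two.
 */
void bitmap_set_shift(unsigned char *map, size_t bit)
{
    assert(CHAR_BIT == 8);
    map[bit >> 3] |= (unsigned char)(1u << (bit & 7u));
}

/* Test a bit using the shift/mask form. */
int bitmap_is_set(const unsigned char *map, size_t bit)
{
    return (map[bit >> 3] >> (bit & 7u)) & 1u;
}
```

Bit 13, for example, lives in byte 1 (13 >> 3) at position 5 (13 & 7) with either form.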
it ...
george.
On Feb 3, 2009, at 15:30 , Jeff Squyres wrote:
On Feb 3, 2009, at 3:24 PM, George Bosilca wrote:
In the current bitmap implementation every time we set or check a
bit we have to compute the index of the char where this bit is set
and the relative position from the beginning of char
Christoph,
You're absolutely right. In addition to your comment about the
syntactically wrong line of code, even in the case when the fortran
and C integers have the same length, we modify the value pointed to by
the Fortran IN-only argument.
A patch is on the way.
Thanks,
george.
On
This functionality has as many chances to be called as any MPI 2
dynamics MPI functions. Every time the MPI universe is expanded, once
the modex of the new processes is known, add procs is called in order
to allow the PML and BTL to update their local view of the MPI universe.
The code is
These changes look fine to me. However, I would like to amend this
proposal to include the splitting of the config directory. Over the
last months, I know several projects that use OPAL, and they like to
use it as an independent part and not as a subset of ompi. Therefore,
I had to extract
I'm unable to replicate these errors with revision r20529. All tests
pass on my Linux cluster, TCP based not Myrinet. Let's see if other
contributors to the MTT tests trigger the same errors.
george.
On Feb 12, 2009, at 12:04 , Tim Mattox wrote:
Hello,
Last night's MTT runs show a
I can't confirm or deny. The only thing I can tell is that the same
test works fine over other BTLs, so this tends to point either to a
problem in the sm BTL or to a particular path in the PML (the one used
by the sm BTL). I'll have to dig a little bit more into it, but I was
hoping to do it
Josh,
Spending a few minutes to understand could have pointed you to the
real culprit: the tool itself!
The assert in the code states that on finalize there is still a
registered signal handler. A quick gdb session shows that this is for the
SIG_CHLD. Tracking the signal addition in the tool
Based on several man pages, free is capable of handling a NULL
argument. What is really puzzling is that on your system it doesn't ...
I tried on two systems, a 64-bit Debian and my Mac OS X, with all
memory allocator options on, and I'm unable to get such a warning :(
george.
On Feb
I guess that if the free function supports the NULL pointer we should
do the same...
george.
On Feb 17, 2009, at 07:35 , Jeff Squyres wrote:
On Feb 16, 2009, at 9:16 PM, George Bosilca wrote:
Based on several man pages, free is capable of handling a NULL
argument. What is really
17, 2009, at 11:18 AM, George Bosilca wrote:
I guess that if the free function supports the NULL pointer we
should do the same...
I'll agree with that if we know for sure that free(NULL) is
universally supported. You mentioned "a few man pages" -- how
universal is this support?
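For reference, ISO C (since C89) specifies that free() performs no action when passed a null pointer, so conforming C libraries accept it. A minimal sketch of a release helper that follows the same convention is shown below; release_buffer is a hypothetical name, not an Open MPI function.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: free a buffer and reset the caller's pointer.
 * Like free(), it accepts a pointer that is already NULL, so it can be
 * called twice on the same variable without harm.
 */
void release_buffer(char **buf)
{
    assert(buf != NULL);
    free(*buf);     /* free(NULL) is defined to do nothing */
    *buf = NULL;    /* prevents a later double free or stale use */
}
```

Resetting the pointer after the free is what makes a second call safe; without it the second call would be a double free rather than a free(NULL).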
Eugene,
It appears this is a sm BTL problem. The prepare_src function can be
called with any size. The BTL should check the size against the eager
and return a descriptor that matches the size requested.
george.
On Feb 17, 2009, at 20:14 , Eugene Loh wrote:
(Rich: same question as I
I fail to find anything about this in the MPI Standard. For me passing
the NULL error handle to any kind of set handler function should not
be an error. It should mean that you prefer to not have any error
handler triggered on the object.
george.
On Feb 19, 2009, at 09:34 , Lisandro
_REQUEST_NULL should be the empty status (defined
in the MPI standard) and not any kind of error (i.e. MPI_ERR_ARG).
george.
On Feb 19, 2009, at 11:43 , Jeff Squyres wrote:
On Feb 19, 2009, at 10:47 AM, George Bosilca wrote:
I fail to find anything about this on the MPI Standard.
MPI doesn
It doesn't sound reasonable to me. There is a reason for this, and I
think it's a good reason. The sendi function works for some devices as
a fast path for sending data, when the network is not flooded.
However, in the case sendi cannot do the job we expect, the fact that
it returns the
On Feb 23, 2009, at 12:14 , Eugene Loh wrote:
I'm a newbie and George is a veteran. So, this feels rather like
David and Goliath. (Hmm, David won and became king. Gee, I kinda
like that.) Anyhow...
That's an old story, we're living in modern times now ;)
George Bosilca wrote
Ken,
Your interpretation of the MPI standard is way too optimistic.
Unfortunately, there is no asynchronous progress (except on very few
devices) in most of the MPI libraries. So, you should not expect the
non blocking send to complete, without going in some MPI calls
(MPI_Test as an
Here is another way to write the code without having to pay the
expensive initialization of sendreq.
first_time = 0;
for ( btl = ... ) {
if ( SUCCESS == sendi() ) return SUCCESS;
if( 0 == first_time++) set_up_expensive_send_request();
if ( SUCCESS == send() ) return
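The loop outlined above can be made self-contained as in the sketch below. Everything except the control flow is an assumption added for illustration: struct fake_btl, fake_sendi, fake_send, try_send and the setup_calls counter are hypothetical stand-ins, not the PML or BTL interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SUCCESS = 0, FAILURE = -1 };

/* Hypothetical stand-in for a BTL: records whether each path succeeds. */
struct fake_btl {
    bool sendi_ok;  /* the immediate (fast path) send succeeds */
    bool send_ok;   /* the regular send succeeds */
};

static int setup_calls = 0;  /* counts expensive request initializations */

static int fake_sendi(const struct fake_btl *btl)
{
    return btl->sendi_ok ? SUCCESS : FAILURE;
}

static int fake_send(const struct fake_btl *btl)
{
    return btl->send_ok ? SUCCESS : FAILURE;
}

static void set_up_expensive_send_request(void)
{
    setup_calls++;
}

/* Try the fast path on each BTL; build the request only when first needed. */
int try_send(const struct fake_btl *btls, size_t count)
{
    int first_time = 0;
    size_t i;

    assert(btls != NULL);
    for (i = 0; i < count; i++) {
        if (SUCCESS == fake_sendi(&btls[i])) return SUCCESS;
        if (0 == first_time++) set_up_expensive_send_request();
        if (SUCCESS == fake_send(&btls[i])) return SUCCESS;
    }
    return FAILURE;
}
```

When the first sendi succeeds the request is never initialized, and when several BTLs have to be tried it is initialized exactly once.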
On Feb 24, 2009, at 18:08 , Eugene Loh wrote:
(Probably this message only for George, but I'll toss it out to the
alias/archive.)
I have a question about the sm sendi() function. What should happen
if the sendi() function attempts to write to the FIFO, but the FIFO
is full?
The write
Markus,
You're right, there was a problem in the code. I'll skip the gory
details of the why and how. The problem is now fixed by commit r20674.
It will be in the next release.
Thanks,
george.
On Mar 2, 2009, at 10:04 , Markus Blatt wrote:
Hi,
I already posted this accidentally
Right, this should be reinitialized at the beginning of each loop.
However, the current code works fine, it only calls
ompi_convertor_set_position twice if the condition is true. This
function checks if the current position matches the requested one, and
does nothing if that's the case.
Which solution seems to be working ?
This bug was fixed a while ago in the trunk
(https://svn.open-mpi.org/trac/ompi/changeset/20591) and in the 1.3 branch.
It even made it in the 1.3.2.
george.
On Mar 3, 2009, at 05:01 , Lenny Verkhovsky wrote:
Seems to be working.
George, can you
On Mar 4, 2009, at 14:44 , Eugene Loh wrote:
Let me try another thought here. Why do we have BTL sendi functions
at all? I'll make an assertion and would appreciate feedback: a
BTL sendi function contributes nothing to optimizing send latency.
To optimize send latency in the
On Mar 9, 2009, at 15:13 , Ralph Castain wrote:
Could you please clarify - what is going to happen on Mar 23 (your
timeout date)?
It also wasn't clear about your testing. Are you calling up into the
ONET layer to run it from the RTE? I believe this was the point of
concern regarding
The default values for the large message fragments are not optimized
for the new generation processors. This might be something to
investigate, in order to see if we can have the same bandwidth as they
do or not.
george.
On Mar 17, 2009, at 18:23 , Eugene Loh wrote:
A colleague of mine
It is a known problem. When the freelist is empty going in the
ompi_free_list_wait will block the process until at least one fragment
becomes available. As a fragment can become available only when
returned by the BTL, this can lead to deadlocks in some cases. The
workaround is to ban the
You are absolutely right, the peer should never be set to -1 on any of
the PERUSE callbacks. I checked the code this morning and figured out
what was the problem. We report the peer and the tag attached to a
request before setting the right values (some code moved around). I
submitted a