George Bosilca wrote:
The default values for the large message fragments are not optimized
for the new generation of processors. This might be worth
investigating, to see whether we can match their bandwidth.
Are you suggesting bumping up the btl_sm_max_send_size value?
Could we hold off on this until after 1.3.2 is out the door and has a
couple of days to stabilize? All these header file changes are making
it more difficult to cleanly apply patches to the 1.3 branch.
When we get past the next couple of weeks, the 1.3 branch should clear
out the backlog of pending patches.
I can't remember if I've forwarded this to the OMPI lists before;
pardon if you've already seen it. I have one of these books and find
it quite handy. IMHO, it's quite a steal at US$25 (~600 pages).
Begin forwarded message:
From: "Rolf Rabenseifner"
Date: March 18, 2009 10:21:31 AM
Something like this. We can play with the eager size too, maybe 4K is
too small.
george.
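For anyone who wants to experiment, both limits are run-time MCA
parameters, so no rebuild is needed. A hypothetical invocation, with
illustrative values rather than tuned recommendations (the program
name is a placeholder):

    mpirun -np 2 --mca btl_sm_eager_limit 8192 \
        --mca btl_sm_max_send_size 65536 ./bandwidth_test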
On Mar 18, 2009, at 06:43 , Terry Dontje wrote:
George Bosilca wrote:
The default values for the large message fragments are not
optimized for the new generation of processors. This might be worth
investigating, to see whether we can match their bandwidth.
George Bosilca wrote:
Something like this. We can play with the eager size too, maybe 4K is
too small.
george.
I'm curious why the larger buffer sizes work better. We ran into a
similar issue on one of our platforms, and it turned out to be the
non-temporal copy.
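For reference, here is a minimal sketch of a non-temporal copy using
SSE2 streaming stores. It illustrates the technique only; it is not
Open MPI's (or anyone's) actual code, and it assumes the destination
is 16-byte aligned and the length is a multiple of 16:

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>

    static void nt_copy(void *dst, const void *src, size_t n)
    {
        __m128i *d = (__m128i *) dst;             /* assumed 16-byte aligned */
        const __m128i *s = (const __m128i *) src;
        size_t i;

        for (i = 0; i < n / 16; i++) {
            __m128i v = _mm_loadu_si128(s + i);   /* source may be unaligned */
            _mm_stream_si128(d + i, v);           /* store bypasses the cache */
        }
        _mm_sfence();   /* make streaming stores visible before returning */
    }

Whether this beats an ordinary memcpy depends heavily on message size:
streaming stores avoid polluting the cache on large copies but tend to
lose on small ones, which would be consistent with larger fragments
behaving differently.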
God willing, this will be the last rc. Really.
Please test test test:
http://www.open-mpi.org/software/ompi/v1.3/
--
Jeff Squyres
Cisco Systems
I don't have access to the machine where my colleague ran his tests.
On other machines, it appears that playing with the eager size or frag
size doesn't change much... and, in any case, OMPI bandwidth is up
around memcpy bandwidth. So, maybe the first challenge is reproducing
what he saw and/or getting access to that machine.
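To make "reproducing what he saw" concrete, a minimal ping-pong of the
kind typically used for such numbers might look like the following;
the message size and iteration count are arbitrary, and this is a
sketch rather than the benchmark actually used:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, i, iters = 100, n = 4 * 1024 * 1024;  /* 4 MiB messages */
        char *buf;
        double t0, sec;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(n);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        sec = MPI_Wtime() - t0;
        if (rank == 0)   /* 2*n bytes move per round trip */
            printf("ping-pong bandwidth: %.1f MB/s\n",
                   2.0 * n * iters / sec / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }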
That might indicate the source of the bandwidth difference.
Open MPI uses the compiler-supplied memcpy, which may or
may not be particularly fast for a given machine/architecture.
Scali could very well be using its own tuned memcpy.
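One way to test that theory is to time the compiler-supplied memcpy
directly and compare it against the MPI numbers. A rough sketch
(buffer size and iteration count are arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        size_t n = 64 * 1024 * 1024;   /* 64 MiB per copy */
        int i, iters = 20;
        char *src = malloc(n), *dst = malloc(n);
        struct timespec t0, t1;
        double sec;

        memset(src, 1, n);             /* touch the pages before timing */
        memset(dst, 0, n);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++)
            memcpy(dst, src, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("memcpy bandwidth: %.1f MB/s\n",
               (double) n * iters / sec / 1e6);

        free(src);
        free(dst);
        return 0;
    }

If the tuned-memcpy theory is right, a vendor MPI could look faster
than this simple test simply by copying more cleverly, e.g. with the
non-temporal stores mentioned earlier in the thread.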
On the hulk and tank systems at IU (16-core Intel shared memory machines)