Jeff et al,
Thanks, exactly what I was looking for.
Pete
I think the information was scattered across a few posts, but taken together
it is correct:
- it depends on the benchmark
- yes, L1/L2/L3 cache sizes can have a huge effect; i.e., once the buffer size
gets bigger than the cache size, it takes more time to get the message from
main RAM (see the sketch below)
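A rough way to see that effect in isolation from MPI is to time plain copies
over a range of buffer sizes. This is only a sketch (the 64 MiB cap, the
repetition counts, and the use of memcpy are arbitrary choices), built with
something like gcc -O2 (older glibc may also need -lrt):

  /* Copy-bandwidth sweep: once the working set no longer fits in
   * L1/L2/L3, the measured rate drops toward main-memory bandwidth. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  int main(void)
  {
      const size_t max = 64 * 1024 * 1024;      /* 64 MiB, well past L3 */
      char *src = malloc(max), *dst = malloc(max);
      if (!src || !dst) return 1;
      memset(src, 1, max);
      memset(dst, 0, max);
      volatile char sink = 0;                   /* keep the copies "live" */

      for (size_t len = 1024; len <= max; len *= 2) {
          int reps = (int)(max / len);          /* same total bytes per step */
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (int i = 0; i < reps; i++)
              memcpy(dst, src, len);
          clock_gettime(CLOCK_MONOTONIC, &t1);
          sink += dst[len - 1];
          double sec = (t1.tv_sec - t0.tv_sec)
                     + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
          printf("%10zu bytes  %8.1f MB/s\n",
                 len, (double)len * reps / sec / 1e6);
      }
      (void)sink;
      free(src);
      free(dst);
      return 0;
  }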
On Thu, 10 Mar 2016, BRADLEY, PETER C PW wrote:
This is an academic exercise, obviously. The curve shown comes from one pair
of ranks running on the same node alternating between MPI_Send and MPI_Recv.
The most likely suspect is a cache effect, but rather than assuming, I was
curious if there might be any other aspects of the [...]
Pete,
How did you measure the bandwidth?
IIRC, the IMB benchmark does not reuse send and recv buffers, so the results
could be different.
Also, you might want to use a logarithmic scale for the message size, so the
information for small messages is easier to read.
Cheers,
Gilles
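For concreteness, a minimal self-contained pingpong along the lines being
discussed, doubling the message size each step (so a log-scale axis falls out
naturally) and reusing one buffer every iteration. The 16 MiB cap and the
iteration count are arbitrary, and it assumes exactly two ranks:

  /* Minimal shared-memory pingpong bandwidth sketch (run with 2 ranks).
   * The message size doubles each step; the same buffer is reused every
   * iteration, so small messages stay cache-resident between sends. */
  #include <mpi.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
      const size_t max_len = 16 * 1024 * 1024;   /* 16 MiB cap, arbitrary */
      const int iters = 100;                     /* arbitrary */
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      char *buf = malloc(max_len);
      if (!buf) MPI_Abort(MPI_COMM_WORLD, 1);
      for (size_t i = 0; i < max_len; i++)       /* touch every page up front */
          buf[i] = (char)i;

      for (size_t len = 1; len <= max_len; len *= 2) {
          MPI_Barrier(MPI_COMM_WORLD);
          double t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              if (rank == 0) {
                  MPI_Send(buf, (int)len, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(buf, (int)len, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
              } else {
                  MPI_Recv(buf, (int)len, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                           MPI_STATUS_IGNORE);
                  MPI_Send(buf, (int)len, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
              }
          }
          double t = MPI_Wtime() - t0;
          if (rank == 0)   /* two messages of len bytes per round trip */
              printf("%10zu bytes  %10.1f MB/s\n",
                     len, 2.0 * len * iters / t / 1e6);
      }

      free(buf);
      MPI_Finalize();
      return 0;
  }

A benchmark that cycles through different buffers instead of reusing one
touches colder memory each iteration, which shifts where the cache effect
shows up in the curve.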
FWIW, you might want to try comparing sm and vader:
mpirun --mca btl self,sm ...
And with and without knem
(modprobe knem should do the trick)
Cheers,
Gilles
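For example, assuming the benchmark binary is called ./pingpong (a placeholder
name), the two shared-memory BTLs can be selected explicitly:

  mpirun -np 2 --mca btl self,sm    ./pingpong
  mpirun -np 2 --mca btl self,vader ./pingpong

and each run can be repeated with the knem module loaded or unloaded.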
Vincent Diepeveen wrote:
You're using absurdly huge message sizes; with messages that large you're
really testing the memory bandwidth of your system in this manner.
As soon as a message gets larger than your CPU's caching system, it falls
outside the CPU's L2 or L3 cache and has to be copied several times via
your RAM [...]
I'm curious what causes the hump in the pingpong bandwidth curve when running
on shared memory. Here's an example running on a fairly antiquated
single-socket 4-core laptop with Linux (2.6.32 kernel). Is this a cache
effect? Something in OpenMPI itself, or a combination?
[attached: pingpong bandwidth vs. message size plot]
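One quick cross-check is to compare the message size at which the curve
changes shape against the machine's cache sizes; a minimal sketch using
glibc's Linux-specific sysconf() names (the same numbers are also available
under /sys/devices/system/cpu/cpu0/cache/):

  /* Print cache sizes (Linux/glibc extension) to line up against the
   * message sizes where the pingpong bandwidth curve bends. */
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      printf("L1d: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
      printf("L2:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
      printf("L3:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
      return 0;
  }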