Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc 1.10.1-14-gffd4fa8
Start time: Wed Mar 11 21:03:13 EDT 2015
End time: Wed Mar 11 21:04:46 EDT 2015
Your friendly daemon,
Cyrador
Creating nightly hwloc snapshot git tarball was a success.
Snapshot: hwloc dev-459-g4208a31
Start time: Wed Mar 11 21:01:02 EDT 2015
End time: Wed Mar 11 21:02:59 EDT 2015
Your friendly daemon,
Cyrador
We have some new Power8 nodes with dual-port FDR HCAs. I have not tested
same-node Verbs throughput. Using Linux’s Cross Memory Attach (CMA), I can get
30 GB/s for 2 MB messages between two cores and then it drops off to ~12 GB/s.
The PCIe Gen3 x16 slots should max at ~15 GB/s. I agree that
In that case we should find a way to eliminate this behavior. I will
take a look later this week and see if there is a workable solution.
-Nathan
On Wed, Mar 11, 2015 at 11:41:00AM -0600, Howard Pritchard wrote:
>My experience with DMA engines located on the other side of a PCIe Gen3
>x16 bus from the CPUs is that for a couple of ranks doing large
>transfers between each other on a node, using the DMA engine looks good.
>But once there are multiple ranks exchanging data (like up to 32 ranks
>on a dual socket
Definitely a side-effect, though it could be beneficial in some cases, as
the RDMA engine in the HCA may be faster than using memcpy (above a
certain size). I don't know how best to fix this, as I need all
RDMA-capable BTLs to be listed for RMA. I thought about adding another
list to track BTLs
Fixed.
-Nathan
On Thu, Feb 26, 2015 at 04:57:22PM -0800, Paul Hargrove wrote:
>The warning below comes from pgi-14.7 on the latest master tarball (output
>from "make V=1").
>-Paul
>libtool: compile: pgcc -DHAVE_CONFIG_H -I.