The problem is that you need to write the file with an optimal stripe
count/size in the first place. An unaware user who just uses something
like cp will end up with the default stripe count, which is usually 1.
For large files, you should just set the stripe count to the number of
OSTs. Your results seem to support this.
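
A rough sketch of setting the layout before the file is written, in case it is useful. On Lustre, `lfs setstripe` applied to a directory makes newly created files there inherit that layout, so even a plain cp into the directory ends up striped. The helper name and the directory path below are made up for illustration; `-c -1` asks for all available OSTs and `-S` sets the stripe size.

```python
import shlex


def setstripe_command(target, stripe_count=-1, stripe_size="4M"):
    """Build an `lfs setstripe` command line (hypothetical helper).

    stripe_count=-1 means "stripe over all available OSTs";
    stripe_size is passed through as-is, e.g. "4M" for 4 MiB.
    """
    return ["lfs", "setstripe", "-c", str(stripe_count), "-S", stripe_size, target]


# Hypothetical directory; files created in it afterwards inherit the layout.
cmd = setstripe_command("/scratch/meshes")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine with a Lustre client; the sketch only prints the command so that it runs anywhere.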
For the small mesh and 64 nodes, you are reading just 2 MiB per process.
With reads that small, I think that collective I/O should give you a
significant improvement.
Also, it would be interesting to know what performance you get from a
single process reading from a single OST. I think you should be able to
get 0.5-2.5 GiB/s, which is what you are getting from 36 OSTs (~70 MiB/s
per OST).
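
The per-OST figure can be checked with a couple of lines. This assumes that the upper value, 2.5 GiB/s, is the aggregate over all 36 OSTs, which is my reading of the numbers rather than something measured here; the function name is just for illustration.

```python
MIB_PER_GIB = 1024


def per_ost_mib_s(aggregate_gib_s, n_osts):
    """Average bandwidth per OST in MiB/s for a given aggregate in GiB/s."""
    return aggregate_gib_s * MIB_PER_GIB / n_osts


# Assumption: 2.5 GiB/s is the aggregate read rate across 36 OSTs.
rate = per_ost_mib_s(2.5, 36)
print(f"{rate:.1f} MiB/s per OST")  # prints "71.1 MiB/s per OST"
```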
BTW, since you also used Salomon for testing, I found some old tests I
did there with pure MPI I/O, and I was able to get 18.5 GiB/s read for a
1 GiB file on 108 processes / 54 nodes, with 54 OSTs and a 4 MiB stripe.
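
To make that layout concrete, here is a small sketch of how a 1 GiB file with a 4 MiB stripe size spreads over 54 OSTs, assuming plain round-robin (RAID-0 style) striping. The indices are positions within the file's own layout, not real OST IDs, and the function names are made up.

```python
from collections import Counter

MIB = 1024 * 1024


def ost_index(offset, stripe_size, stripe_count):
    """Layout-relative OST index holding the byte at `offset` (round-robin)."""
    return (offset // stripe_size) % stripe_count


def stripes_per_ost(file_size, stripe_size, stripe_count):
    """Count how many stripe-sized chunks of the file land on each OST index."""
    n_stripes = -(-file_size // stripe_size)  # ceiling division
    return Counter(
        ost_index(i * stripe_size, stripe_size, stripe_count)
        for i in range(n_stripes)
    )


counts = stripes_per_ost(1024 * MIB, 4 * MIB, 54)
# 256 chunks over 54 OSTs: every OST is used, each holding 4 or 5 chunks.
print(len(counts), min(counts.values()), max(counts.values()))  # prints "54 4 5"
```

So under these assumptions every OST holds only 16-20 MiB of the file, and all of them can serve reads at the same time.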
Best,
Jakub
On 6/14/19 12:31 PM, Hapla Vaclav via petsc-dev wrote:
I take back one thing I mentioned in my talk in Atlanta. I think I
said that Lustre striping does not really influence the read
performance. With my latest results in hand, I must point out this is
not true. I might have been confused by some former Piz Daint Lustre
performance issues and/or HDF5 library issues I mentioned.
Here are my latest slides from PASC19.
https://polybox.ethz.ch/index.php/s/PPZLSyZOKo3UXPS
On slide 18, there is a comparison of different stripe settings. I
can now see a speed-up of ~4x going from 1 to 12 stripes (12 is also
the number of cores per node) for the mesh with 128M elements. The
times are very similar for 8 and 64 compute nodes.
Toby, could you maybe forward this message to the meeting attendees? I
don't want to leave anybody confused.
Thanks,
Vaclav