The problem is that the file has to be written with a suitable stripe count and size in the first place. A user who is unaware of this and simply uses something like cp will end up with the default stripe count, which is usually 1.

For large files, you should just set the stripe count to the number of OSTs. Your results seem to support this.
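As a rough sketch of what "writing the file with the right striping in the first place" can look like: the layout is set with `lfs setstripe` before any data is written, and the file is filled afterwards. The helper names, the example path and the 4 MiB stripe size below are illustrative assumptions, not something from your setup; a stripe count of -1 asks Lustre to stripe over all available OSTs.

```python
import subprocess


def setstripe_cmd(path, stripe_count=-1, stripe_size="4M"):
    """Build an `lfs setstripe` command line (hypothetical helper).

    stripe_count=-1 means "use all available OSTs".
    stripe_size is passed through to -S, e.g. "4M" for 4 MiB.
    """
    return ["lfs", "setstripe", "-c", str(stripe_count), "-S", stripe_size, path]


def prestripe(path, stripe_count=-1, stripe_size="4M"):
    """Run `lfs setstripe` on `path`; requires a Lustre client with `lfs` on PATH.

    `path` should be a file that does not exist yet (it is created empty
    with the requested layout) or a directory (new files inherit the layout).
    """
    subprocess.run(setstripe_cmd(path, stripe_count, stripe_size), check=True)


# Only builds and prints the command; nothing is executed here.
print(" ".join(setstripe_cmd("mesh.h5")))
```

After `prestripe("mesh.h5")`, a plain cp into that file, or an application writing to it, keeps the layout instead of falling back to the default.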

For the small mesh and 64 nodes, you are reading just 2 MiB per process. I think that collective I/O should give you a significant improvement.

Also, it would be interesting to know what performance you get from a single process reading from a single OST. I think you should be able to get 0.5-2.5 GiB/s, which is what you are currently getting from all 36 OSTs together (~70 MiB/s per OST).

BTW, since you also used Salomon for testing, I found some old tests I did there with pure MPI I/O, and I was able to get 18.5 GiB/s read for a 1 GiB file on 108 processes / 54 nodes, with 54 OSTs and a 4 MiB stripe size.
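To put the two sets of numbers side by side, here is a back-of-the-envelope sketch of the per-OST rates they imply. It assumes 1 GiB = 1024 MiB and takes the upper end (2.5 GiB/s) of the range quoted for the 36-OST case; the function name is just for illustration.

```python
MIB_PER_GIB = 1024


def per_ost_mib_s(aggregate_gib_s, n_osts):
    """Average read rate per OST in MiB/s for a given aggregate rate in GiB/s."""
    return aggregate_gib_s * MIB_PER_GIB / n_osts


# 2.5 GiB/s spread over 36 OSTs
print(round(per_ost_mib_s(2.5, 36)))
# 18.5 GiB/s spread over 54 OSTs (the Salomon MPI I/O test)
print(round(per_ost_mib_s(18.5, 54)))
```

The first figure comes out at roughly 70 MiB/s per OST and the second at roughly 350 MiB/s per OST, i.e. about a factor of five apart.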

Best,

Jakub


On 6/14/19 12:31 PM, Hapla Vaclav via petsc-dev wrote:
I take back one thing I mentioned in my talk in Atlanta. I think I said that Lustre striping does not really influence the read performance. With my latest results in hand, I must point out this is not true. I might have been confused by some former Piz Daint Lustre performance issues and/or HDF5 library issues I mentioned.

Here are my latest slides from PASC19.
https://polybox.ethz.ch/index.php/s/PPZLSyZOKo3UXPS

On slide 18, there is a comparison of different stripe settings. I can now see a speed-up of ~4x for 12 stripes vs 1 stripe (12 is actually the number of cores per node) for the mesh with 128M elements. The times are very similar for 8 and 64 compute nodes.

Toby, could you maybe forward this message to the meeting attendees? I don't want to leave anybody confused.

Thanks,
Vaclav
