Juha Jäykkä <[email protected]> writes:

> On Sunday 18 August 2013 08:10:19 Jed Brown wrote:
>> Output uses a collective write, so the granularity of the IO node is
>> probably more relevant for writing (e.g., BG/Q would have one IO node
>> per 128 compute nodes), but almost any chunk size should perform
>> similarly.  It would make a lot more difference for something like
>
> I ran into this on a Cray XK30 and it's certainly not the case there that any
> chunk size performs even close to similarly: I can get IO throughput from
> roughly 50 MB/s to 16 GB/s depending on the chunk sizes and number of ranks
> participating in the MPI IO operation (underneath H5Dwrite()).
What range of chunk sizes are you using?  For each fixed number of ranks,
how does the performance vary when varying chunk size from, say, 5 MB to
500 MB?

> Yes, this certainly needs to be considered, too. I guess huge chunks are bad
> here?

Likely, but it depends on what you are looking at.

>> Chunk size needs to be collective.  We could compute an average size
>> from each subdomain, but can't just use the subdomain size.
>
> Why not use the size of the local part of the DA/Vec? That would guarantee

That's fine, but the chunk size needs to be *collective*, so we need to do
a reduction or otherwise compute the "average size".

>> I think the chunk size (or maximum chunk size) should be settable by the
>> user.
>
> I agree, that would be the best solution.
>
> Is the granularity (number of ranks actually doing disc IO) settable on the
> HDF5 side or does that need to be set in MPI-IO?

I'm not sure what you mean.  On a system like BG, the compute nodes are not
connected to disks and instead have to send the data to IO nodes.  The
distribution of IO nodes is part of the machine design.  The ranks
participating in IO are just rearranging data before sending it to the IO
nodes.

> Any idea which version of PETSc this fix might get into? I currently keep my
> own patched version of gr2.c around, which uses local-Vec-size chunks and it
> works ok, but I'd like to be able to use vanilla PETSc again.

Send a patch (or submit a pull request) against 'maint' and we'll consider
it.  As long as the change doesn't break any existing uses, it could be
merged to 'maint' (thus v3.4.k for k>=3) after testing.
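For concreteness, here is a minimal sketch (not the actual gr2.c code) of
what "collective" means here: every rank contributes its local Vec length,
the values are averaged with an allreduce so that all ranks agree on the
same chunk dimensions, and the result is capped by a hypothetical
user-settable maximum before being passed to H5Pset_chunk().  The function
name and the max_chunk parameter are illustrative assumptions, not an
existing PETSc interface.

#include <mpi.h>
#include <hdf5.h>

static herr_t set_collective_chunk(MPI_Comm comm, hsize_t local_len,
                                   hsize_t max_chunk, hid_t dcpl)
{
  unsigned long long sum = (unsigned long long)local_len, avg;
  int                size;

  MPI_Comm_size(comm, &size);
  /* Reduction so every rank ends up with the same value: HDF5 requires the
   * chunk dimensions to be identical (collective) across all ranks. */
  MPI_Allreduce(MPI_IN_PLACE, &sum, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
  avg = sum / (unsigned long long)size;
  if (avg > max_chunk) avg = max_chunk;   /* hypothetical user-imposed ceiling */
  if (avg == 0) avg = 1;                  /* chunks must be nonempty */

  hsize_t chunk_dims[1] = {(hsize_t)avg};
  return H5Pset_chunk(dcpl, 1, chunk_dims);
}

Using only the local Vec size (as in the patched gr2.c mentioned above)
would give each rank a different value, which is why some reduction like
this is needed before the dataset creation property list is set.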
