On 25. nov. 2014 22:40, Matthew Knepley wrote:
On Tue, Nov 25, 2014 at 2:34 PM, Håkon Strandenes <[email protected]> wrote:

(...)

First, this is great debugging.

Thanks.


Second, my reading of the HDF5 document you linked to says that either
selection should be valid:

   "For non-regular hyperslab selection, parallel HDF5 uses independent
IO internally for this option."

so it ought to fall back to the INDEPENDENT model if it can't do
collective calls correctly. However,
it appears that the collective call has bugs.

My conclusion: Since you have determined that changing the setting to
INDEPENDENT produces
correct input/output in all the test cases, and since my understanding
of the HDF5 documentation is
that we should always be able to use COLLECTIVE as an option, this is an
HDF5 or MPT bug.
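
(For reference, the switch in question lives on the HDF5 dataset transfer
property list. Below is a small standalone toy in the same spirit that
exercises the toggle; the file name, dataset name and 1D block layout are
placeholders and not what PETSc actually writes.)

/* Each rank writes a contiguous slice of a 1D dataset; flip the
 * H5Pset_dxpl_mpio() line to switch between collective and independent
 * transfer.  File/dataset names are placeholders. */
#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Block distribution of N points: the slices become unequal whenever
   * size does not divide N (e.g. N=250 on 4 ranks). */
  const hsize_t N = 250;
  hsize_t local  = N / size + (rank < (int)(N % size) ? 1 : 0);
  hsize_t offset = rank * (N / size) + (rank < (int)(N % size) ? (hsize_t)rank : N % size);

  double *buf = malloc(local * sizeof(double));
  for (hsize_t i = 0; i < local; i++) buf[i] = (double)(offset + i);

  /* Open the file with the MPI-IO file driver. */
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
  hid_t file = H5Fcreate("toggle_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

  hid_t filespace = H5Screate_simple(1, &N, NULL);
  hid_t dset = H5Dcreate(file, "x", H5T_NATIVE_DOUBLE, filespace,
                         H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  hid_t memspace = H5Screate_simple(1, &local, NULL);
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &local, NULL);

  /* The dataset transfer property list is where the switch lives. */
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);        /* what PETSc requests  */
  /* H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); */ /* the working fallback */

  H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

  H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
  H5Sclose(filespace); H5Fclose(file); H5Pclose(fapl);
  free(buf);
  MPI_Finalize();
  return 0;
}

(Comparing COLLECTIVE and INDEPENDENT runs of something this small on an
unequal split may also be a convenient way to package the problem for the
MPI-IO people, per remedy 2 below.)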

I have conducted yet another test:
My example (ex10) that I previously posted to the mailing list was set up with 250 grid points along each axis. When the topic of chunking was brought up, I realized that 250 is not evenly divisible by four. The example failed on 64 processes, that is four processes along each direction (the division is 62 + 62 + 63 + 63 = 250).

So I have recompiled "my ex10" with 256 grid points in each direction. It turns out that this does indeed run successfully on 64 processes. Great! It also runs on 128 processes, that is an 8x4x4 decomposition. However, it does not run on 125 processes, that is a 5x5x5 decomposition.

The same pattern appears if I run my example with 250^3 grid points. It does not run on process counts like 64 and 128, but it does run successfully on 125 processes, again only when the sub-domains are of exactly equal size (in this case the domain is divided 5x5x5).
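
The arithmetic behind these splits is just a block distribution, so the pattern is easy to make explicit with a throwaway snippet (PETSc may hand out the remainder in a different order, but the sizes are the same):

#include <stdio.h>

/* Per-direction local sizes when N grid points are split over p
 * processes: N/p each, with the remainder spread over the first
 * N%p ranks. */
static void split(int N, int p)
{
  printf("N=%d over p=%d:", N, p);
  for (int r = 0; r < p; r++) printf(" %d", N / p + (r < N % p ? 1 : 0));
  printf("\n");
}

int main(void)
{
  split(250, 4);  /* 63 63 62 62    -> unequal slabs; these are the runs that fail */
  split(256, 4);  /* 64 64 64 64    -> equal slabs; these runs succeed             */
  split(250, 5);  /* 50 50 50 50 50 -> equal slabs; these runs succeed             */
  return 0;
}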

However, I still believe that there are bugs. I did my "roundtrip" test of loading a dataset and immediately writing the same dataset to a different file, this time with a 250^3 dataset on 125 processes. It did not "pass" this test, i.e. the written dataset was just garbage. I have not yet identified whether the garbling is introduced in the reading or in the writing of the dataset.
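
For reference, the roundtrip itself is nothing more than the sketch below; the file names and the dataset name "phi" are placeholders, and the Vec is assumed to already have the correct parallel layout:

#include <petscvec.h>
#include <petscviewerhdf5.h>

/* Roundtrip sketch: read a named Vec from one HDF5 file and write it
 * unchanged to another, so that the two files can be compared with
 * e.g. h5diff.  File names and the dataset name are placeholders. */
static PetscErrorCode Roundtrip(Vec x)
{
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscObjectSetName((PetscObject)x, "phi");CHKERRQ(ierr);

  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "input.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = VecLoad(x, viewer);CHKERRQ(ierr);              /* collective read  */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "output.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = VecView(x, viewer);CHKERRQ(ierr);              /* collective write */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  return 0;
}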


Does anyone else read the HDF5 documentation differently? Also, it really
looks to me like HDF5 messed up the MPI data type in the COLLECTIVE picture
below, since it appears to be sliced incorrectly.

Possible Remedies:

   1) We can allow you to turn off H5Pset_dxpl_mpio()

   2) Send this test case to the MPI/IO people at ANL

If you think 1) is what you want, we can do it. If you can package this
work for 2), it would be really valuable.

I will be fine editing gr2.c manually each time this file is changed (I use the sources from Git). But *if* this is not a bug in MPT, but a bug in PETSc or HDF5, it should be fixed, because it is the kind of bug that is extremely annoying and a real pain to track down.
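
If remedy 1) is implemented, I imagine it would end up looking something like the sketch below inside gr2.c; the option name is only a suggestion, and the surrounding code is paraphrased from memory rather than copied from the file:

/* Sketch of remedy 1): guard the collective transfer mode behind a
 * runtime option instead of hard-coding it.  "-viewer_hdf5_collective"
 * is a made-up option name, and the configure guard may be named
 * differently in the real source. */
PetscBool collective = PETSC_TRUE;
ierr = PetscOptionsGetBool(NULL, "-viewer_hdf5_collective", &collective, NULL);CHKERRQ(ierr);

plist_id = H5Pcreate(H5P_DATASET_XFER);
#if defined(PETSC_HAVE_H5PSET_FAPL_MPIO)
if (collective) {
  status = H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);CHKERRQ(status);
} else {
  status = H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);CHKERRQ(status);
}
#endif

That way nobody has to patch the source just to get readable files out of a particular MPI installation.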

Perhaps the HDF5 mailing list could contribute to this issue?


   Thanks,

     Matt

    Thanks for your time.

    Best regards,
    Håkon Strandenes



Again thanks for your time.

Regards,
Håkon





--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
