> On Nov 26, 2014, at 6:26 AM, Håkon Strandenes <[email protected]> wrote:
>
> My local HPC group has found a solution to this problem:
> On MPT it is possible to set an environment variable MPI_TYPE_DEPTH with
> default value 8. The MPI_TYPE_DEPTH variable limits the maximum depth of
> derived datatypes that an application can create.
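
If I read that right, the "depth" being limited is the number of levels of
nesting in derived datatypes. A toy illustration (untested, and not specific
to MPT) of what such a limit would trip on:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
    MPI_Datatype inner = MPI_DOUBLE, outer;
    int          depth;

    MPI_Init(&argc, &argv);
    /* Each MPI_Type_contiguous() wraps the previous type one level deeper,
       so with MPI_TYPE_DEPTH=8 this should fail once depth exceeds 8 */
    for (depth = 1; depth <= 16; depth++) {
      MPI_Type_contiguous(2, inner, &outer);
      if (inner != MPI_DOUBLE) MPI_Type_free(&inner);
      inner = outer;
      printf("built a derived datatype nested %d levels deep\n", depth);
    }
    MPI_Type_free(&inner);
    MPI_Finalize();
    return 0;
  }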
Is the variable MPI_TYPE_DEPTH actually set in the environment to 8 (by
default) or does it use the value of 8 if the variable is not found?
We can use getenv("MPI_TYPE_DEPTH"); in PETSc when the HDF5 viewer is
created to make sure the value is sane and otherwise produce a useful error
message telling the user exactly what to do. BUT we need to somehow limit this
test to machines where it matters. So, for example:
PETSC_EXTERN PetscErrorCode PetscViewerCreate_HDF5(PetscViewer v)
{
  PetscViewer_HDF5 *hdf5;
  PetscErrorCode   ierr;
  const char       *typedepth;
  int              itypedepth = 0;

  PetscFunctionBegin;
#if defined(PETSC_HAVE_HDF5_REQUIRE_LARGE_MPI_TYPE_DEPTH)
  /* getenv() returns NULL if the variable is not set at all */
  typedepth = getenv("MPI_TYPE_DEPTH");
  if (typedepth) sscanf(typedepth,"%d",&itypedepth);
  if (itypedepth < 100) SETERRQ(PetscObjectComm((PetscObject)v),PETSC_ERR_LIB,
    "This system requires you do \"export MPI_TYPE_DEPTH=100\" before submitting jobs when using HDF5");
#endif
But we need a configure test that determines whether this is such a system. Can
you tell us a "system command" we could run in our configure to detect these
SGI MPT systems?
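
One guess at a probe (untested): if MPT is new enough to provide the MPI-3
MPI_Get_library_version(), and assuming its version string actually mentions
"MPT", configure could compile and run something like

  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int  len;

    MPI_Init(&argc, &argv);
    /* MPI-3: fills in a human-readable library identification string */
    MPI_Get_library_version(version, &len);
    if (strstr(version, "MPT")) printf("SGI MPT detected: %s\n", version);
    MPI_Finalize();
    return 0;
  }

and grep the output, but you would know better what MPT actually reports.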
Thanks
Barry
A big FAT error message is always better than a FAQ when possible.
>
> I have found that setting this to at least 32 will make my examples run
> perfectly on up to 256 processes. No error messages whatsoever, and in my
> simple load-and-write roundtrip h5diff compares the two datasets and
> finds them identical. I also notice that Leibniz Rechenzentrum recommends
> setting this variable to 100 (or some other suitably large value) when using
> NetCDF together with MPT (https://www.lrz.de/services/software/io/netcdf/).
>
> This bug has been a pain in the (***)... Perhaps it is worth a FAQ entry?
>
> Thanks for your time and effort.
>
> Regards,
> Håkon Strandenes
>
>
> On 26 Nov 2014 08:01, Håkon Strandenes wrote:
>>
>>
>> On 25 Nov 2014 22:40, Matthew Knepley wrote:
>>> On Tue, Nov 25, 2014 at 2:34 PM, Håkon Strandenes <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> (...)
>>>
>>> First, this is great debugging.
>>
>> Thanks.
>>
>>>
>>> Second, my reading of the HDF5 document you linked to says that either
>>> selection should be valid:
>>>
>>> "For non-regular hyperslab selection, parallel HDF5 uses independent
>>> IO internally for this option."
>>>
>>> so it ought to fall back to the INDEPENDENT model if it can't do
>>> collective calls correctly. However,
>>> it appears that the collective call has bugs.
>>>
>>> My conclusion: Since you have determined that changing the setting to
>>> INDEPENDENT produces
>>> correct input/output in all the test cases, and since my understanding
>>> of the HDF5 documentation is
>>> that we should always be able to use COLLECTIVE as an option, this is an
>>> HDF5 or MPT bug.
>>
>> I have conducted yet another test:
>> My example (ex10) that I previously posted to the mailing list was set
>> up with 250 grid points along each axis. When the topic on chunking was
>> brought to the table, I realized that 250 is not evenly dividable on
>> four. The example failed on 64 processes, that is four processes along
>> each direction (the division is 62 + 62 + 63 + 63 = 250).
>>
>> So I have recompiled "my ex10" with 256 grid points in each direction. It
>> turns out that this does indeed run successfully on 64 processes. Great! It
>> also runs on 128 processes, that is, an 8x4x4 decomposition. However, it
>> does not run on 125 processes, that is, a 5x5x5 decomposition.
>>
>> The same pattern is clear if I run my example with 250^3 grid points. It
>> does not run on process counts like 64 and 128, but does run successfully on
>> 125 processes, again only when the sub-domains are of exactly equal size
>> (in this case the domain is divided as 5x5x5).
>>
>> However, I still believe that there are bugs. I did my "roundtrip" by
>> loading a dataset and immediately writing the same dataset to a
>> different file, this time a 250^3 dataset on 125 processes. It did not
>> "pass" this test, i.e. the written dataset was just garbage. I have not
>> yet identified whether the garbling is introduced in the reading or the
>> writing of the dataset.
>>
>>>
>>> Does anyone else read the HDF5 documentation differently? Also, it really
>>> looks to me like HDF5 messed up the MPI data type in the COLLECTIVE
>>> picture below, since it appears to be sliced incorrectly.
>>>
>>> Possible Remedies:
>>>
>>> 1) We can allow you to turn off H5Pset_dxpl_mpio()
>>>
>>> 2) Send this test case to the MPI/IO people at ANL
>>>
>>> If you think 1) is what you want, we can do it. If you can package this
>>> work for 2), it would be really valuable.
>>
>> I will be fine editing gr2.c manually each time this file is changed (I
>> use the sources from Git). But *if* this is not a bug in MPT, but a bug in
>> PETSc or HDF5, it should be fixed... because it is the kind of bug that
>> is extremely annoying and a real pain to track down.
>>
>> Perhaps the HDF5 mailing list could help with this issue?
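
For reference, Matt's remedy 1) amounts to changing the data-transfer
property list we set in gr2.c. A minimal sketch of the toggle (the
use_independent flag is hypothetical; error checking omitted; requires
parallel HDF5):

  #include <hdf5.h>

  /* Build the property list passed as the dxpl argument of
     H5Dread()/H5Dwrite(), choosing independent instead of collective
     MPI-IO transfers when requested */
  static hid_t create_xfer_plist(int use_independent)
  {
    hid_t plist = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(plist, use_independent ? H5FD_MPIO_INDEPENDENT
                                            : H5FD_MPIO_COLLECTIVE);
    return plist;
  }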
>>
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> Thanks for your time.
>>>
>>> Best regards,
>>> Håkon Strandenes
>>>
>>>
>>
>> Again thanks for your time.
>>
>> Regards,
>> Håkon
>>
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener