> On Nov 26, 2014, at 6:26 AM, Håkon Strandenes <[email protected]> wrote:
> 
> My local HPC group have found a solution to this problem:
> On MPT it is possible to set an environment variable MPI_TYPE_DEPTH with 
> default value 8. The MPI_TYPE_DEPTH variable limits the maximum depth of 
> derived datatypes that an application can create.

   Is the variable MPI_TYPE_DEPTH actually set in the environment to 8 (by 
default), or does MPT use the value 8 if the variable is not found? 

   We can use getenv("MPI_TYPE_DEPTH"); in PETSc when the HDF5 viewer is 
created to make sure the value is sane and otherwise produce a useful error 
message telling the user exactly what to do. BUT we need to somehow limit this 
test to machines where it matters. So for example

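PETSC_EXTERN PetscErrorCode PetscViewerCreate_HDF5(PetscViewer v)
{
  PetscViewer_HDF5 *hdf5;
  PetscErrorCode   ierr;
#if defined(PETSC_HAVE_HDF5_REQUIRE_LARGE_MPI_TYPE_DEPTH)
  const char       *typedepth;
  int              itypedepth = 0;
#endif

  PetscFunctionBegin;
#if defined(PETSC_HAVE_HDF5_REQUIRE_LARGE_MPI_TYPE_DEPTH)
  /* An unset or non-numeric MPI_TYPE_DEPTH is treated as too small */
  typedepth = getenv("MPI_TYPE_DEPTH");
  if (!typedepth || sscanf(typedepth,"%d",&itypedepth) != 1 || itypedepth < 100) {
    SETERRQ(PetscObjectComm((PetscObject)v),PETSC_ERR_SUP_SYS,"This system requires you to do \"export MPI_TYPE_DEPTH=100\" before submitting jobs when using HDF5");
  }
#endif
  /* ... remainder of the viewer creation ... */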

  but we need a configure test that determines if this is such a system. Can 
you tell us a "system command" we could run in our configure to detect these 
SGI MPT systems?

  Thanks

   Barry

A big FAT error message is always better than a FAQ when possible.
  

> 
> I have found that setting this to at least 32 makes my examples run 
> perfectly on up to 256 processes. There are no error messages whatsoever, and 
> in my simple load-and-write dataset roundtrip, h5diff compares the two 
> datasets and finds them identical. I also notice that Leibniz Rechenzentrum 
> recommends setting this variable to 100 (or some other suitably large value) 
> when using NetCDF together with MPT (https://www.lrz.de/services/software/io/netcdf/).
> 
> This bug has been a pain in the (***)... Perhaps it is worthy of a FAQ entry?
> 
> Thanks for your time and effort.
> 
> Regards,
> Håkon Strandenes
> 
> 
> On 26. nov. 2014 08:01, Håkon Strandenes wrote:
>> 
>> 
>> On 25. nov. 2014 22:40, Matthew Knepley wrote:
>>> On Tue, Nov 25, 2014 at 2:34 PM, Håkon Strandenes <[email protected]
>>> <mailto:[email protected]>> wrote:
>>> 
>>> (...)
>>> 
>>> First, this is great debugging.
>> 
>> Thanks.
>> 
>>> 
>>> Second, my reading of the HDF5 document you linked to says that either
>>> selection should be valid:
>>> 
>>>   "For non-regular hyperslab selection, parallel HDF5 uses independent
>>> IO internally for this option."
>>> 
>>> so it ought to fall back to the INDEPENDENT model if it can't do
>>> collective calls correctly. However,
>>> it appears that the collective call has bugs.
>>> 
>>> My conclusion: Since you have determined that changing the setting to
>>> INDEPENDENT produces
>>> correct input/output in all the test cases, and since my understanding
>>> of the HDF5 documentation is
>>> that we should always be able to use COLLECTIVE as an option, this is an
>>> HDF5 or MPT bug.
>> 
>> I have conducted yet another test:
>> My example (ex10) that I previously posted to the mailing list was set
>> up with 250 grid points along each axis. When the topic on chunking was
>> brought to the table, I realized that 250 is not evenly divisible by
>> four. The example failed on 64 processes, that is, four processes along
>> each direction (the division is 62 + 62 + 63 + 63 = 250).
>> 
>> So I have recompiled "my ex10" with 256 gridpoints in each direction. It
>> turns out that this does indeed run successfully on 64 nodes. Great! It
>> also runs on 128 processes, that is, an 8x4x4 decomposition. However, it
>> does not run on 125 processes, that is a 5x5x5 decomposition.
>> 
>> The same pattern is clear if I run my example with 250^3 grid points. It
>> does not run on numbers like 64 and 128, but does run successfully on
>> 125 processes, again only when the sub-domains are of exactly equal size
>> (in this case the domain is divided as 5x5x5).
>> 
>> However, I still believe that there are bugs. I did my "roundtrip" by
>> loading a dataset and immediately writing the same dataset to a
>> different file, this time a 250^3 dataset on 125 processes. It did not
>> "pass" this test, i.e. the written dataset was just garbage. I have not
>> yet identified if the garbling is introduced in the reading or writing
>> of the dataset.
>> 
>>> 
>>> Does anyone else see the HDF5 differently? Also, it really looks to me
>>> like HDF5 messed up the MPI
>>> data type in the COLLECTIVE picture below, since it appears to be sliced
>>> incorrectly.
>>> 
>>> Possible Remedies:
>>> 
>>>   1) We can allow you to turn off H5Pset_dxpl_mpio()
>>> 
>>>   2) Send this test case to the MPI/IO people at ANL
>>> 
>>> If you think 1) is what you want, we can do it. If you can package this
>>> work for 2), it would be really valuable.
>> 
>> I will be fine editing gr2.c manually each time this file is changed (I
>> use the sources from Git). But *if* this is not a bug in MPT, but a bug in
>> PETSc or HDF5, it should be fixed, because it is the kind of bug that
>> is extremely annoying and a real pain to track down.
>> 
>> Perhaps the HDF5 mailing list could contribute in this issue?
>> 
>>> 
>>>   Thanks,
>>> 
>>>     Matt
>>> 
>>>    Thanks for your time.
>>> 
>>>    Best regards,
>>>    Håkon Strandenes
>>> 
>>> 
>> 
>> Again thanks for your time.
>> 
>> Regards,
>> Håkon
>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
