On Sun, May 14, 2006 at 08:30:09PM +0200, Dries Kimpe wrote:
> Because you couldn't reproduce the same errors,
> I've done some major testing... (Doubt sneaked in, and I was in need of
> some reassurance ;-)
Sorry for not keeping in touch with you better. I was able to
reproduce your problem after all: testphdf5 runs to completion as a
single-process job (mpiexec -np 1) but fails as a multi-process job
(any -np greater than 1).
> On all systems, the PVFS2 server was running locally
> (1 metadata, 1 data); The database was erased and recreated (-f)
> for every test. I've also verified that I had no old libraries/other
> installs of hdf5/mpich/... .
I went straight for the multiple-server setup, but will use
a single server from now on. Thanks for further minimizing the test
case.
> The bad news: I've been able to reproduce the errors I got.
> The slightly less bad news: I've been able to reproduce the error you
> got. I'm not really sure whether the two are related.
The error I got
Testing -- compressed dataset collective read (cmpdsetr)
Proc 0: *** PHDF5 ERROR ***
Assertion (H5Fcreate succeeded) failed at line 2134 in ../../testpar/t_dset.c
aborting MPI process
Error encountered before initializing MPICH
is easy to explain: HDF5 is calling open(2) directly with the
ROMIO-style file name (I verified this with strace). So we just need
to find the spot in the code path that calls open(2) instead of
MPI_File_open() and we can take care of that one. Clearly they know
to call MPI_File_open() most of the time; I wonder how they missed
this spot.
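To make that concrete, here is a minimal standalone sketch (my own
illustration, not code from the HDF5 source; the pvfs2: path is just
an assumed mount point) contrasting the two calls on a ROMIO-style
name:

  /* open(2) vs. MPI_File_open() on a ROMIO-style file name.
   * "pvfs2:/mnt/pvfs2/testfile" is a hypothetical path. */
  #include <stdio.h>
  #include <fcntl.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      const char *name = "pvfs2:/mnt/pvfs2/testfile";
      MPI_File fh;
      int fd;

      MPI_Init(&argc, &argv);

      /* What the failing code path apparently does: the kernel has no
       * idea what a "pvfs2:" prefix means, so this fails outright. */
      fd = open(name, O_CREAT | O_RDWR, 0600);
      if (fd < 0)
          perror("open(2) on ROMIO-style name");

      /* What it should do: ROMIO parses the prefix itself and routes
       * the request to its PVFS2 driver. */
      if (MPI_File_open(MPI_COMM_WORLD, (char *) name,
                        MPI_MODE_CREATE | MPI_MODE_RDWR,
                        MPI_INFO_NULL, &fh) == MPI_SUCCESS)
          MPI_File_close(&fh);

      MPI_Finalize();
      return 0;
  }

Run under mpiexec against a PVFS2 volume, the open(2) call should fail
with ENOENT while the MPI_File_open() call succeeds on every process.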
Thanks for the additional information. I'm going to further reduce
testphdf5 and narrow down where the failure lies.
==rob
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B