Thanks, Frank. I've looked through the combined STDOUT & STDERR file from the run that generated the 122000 & 124000 checkpoints, and it looks totally normal at that time; no complaints.
As I have N separate checkpoint files for a time step (works faster than one large one), the h5ls takes a bit of time. I'm running it now ...

This machine has been pretty flaky with its nobackup filesystem in recent weeks. I'd have been pretty unlucky to have been affected by it both at 122000 *and* 124000 (a few hours later), but it's not impossible.

Beany

On 1 February 2016 at 15:56, Frank Loeffler <[email protected]> wrote:
> Hi,
>
> On Mon, Feb 01, 2016 at 03:39:25PM -0500, Bernard Kelly wrote:
>> In each case, the error output in the STDERR consists of multiple
>> instances of the message below.
>>
>> * Is this likely due to file corruption?
>
> I would think so. A way to see would be to use another HDF5 tool to look
> at the file.
>
>> * What's the best way to check CarpetIOHDF5 files for corruption?
>
> I don't know of a 'best' way, but I would first try 'h5ls' and see if
> already that has problems. If this succeeds, 'h5dump' might be a
> quick-and-dirty solution. Dump the complete file to /dev/null and see if
> h5dump complains about something.
>
>> * Can I do anything about this particular run, apart from start
>> (again) from the "good" 30000 checkpoint?
>
> Assuming it is hdf5 file corruption, most likely not. Depending on how
> desperate you are you could try to see which parts are affected, and if
> the remaining 'good' parts are sufficient to restart your particular
> simulation. I wouldn't have high hopes though.
>
> Something else that I would like to know: do you still have stdout/err
> of the run producing these files? Did Carpet complain during checkpoint
> write? If so, it might be fine continuing (as it apparently did), but
> shouldn't delete the last-good checkpoint file - maybe even by deleting
> the known-to-be-bad attempt.
>
> Frank
>
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
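For anyone wanting to repeat the h5ls-then-h5dump check discussed in this thread over many per-process checkpoint files, here is a rough sketch of how it could be scripted. It only assumes that the standard HDF5 command-line tools `h5ls` and `h5dump` are on the PATH. The function names (`check_file`, `check_files`) are made up for illustration, and treating any STDERR output as a sign of trouble is my own assumption, not something the tools guarantee.

```python
import subprocess

# Assumed default checks, following the suggestion in the thread: list the
# file recursively with h5ls first, then dump the whole file with h5dump.
DEFAULT_CHECKS = (("h5ls", "-r"), ("h5dump",))


def check_file(path, checks=DEFAULT_CHECKS):
    """Run each check command on `path`; return a list of failure messages.

    An empty list means no check complained. Checks run in order and stop
    at the first failure (no point running h5dump if h5ls already fails).
    """
    failures = []
    for argv in checks:
        try:
            result = subprocess.run(
                [*argv, path],
                stdout=subprocess.DEVNULL,  # discard the listing/dump itself
                stderr=subprocess.PIPE,     # keep only the complaints
                text=True,
                errors="replace",           # tolerate odd bytes in messages
            )
        except FileNotFoundError:
            failures.append(f"{argv[0]}: tool not found on PATH")
            break
        # Assumption: a nonzero exit status OR anything on STDERR is suspect.
        if result.returncode != 0 or result.stderr.strip():
            lines = result.stderr.strip().splitlines()
            detail = lines[-1] if lines else f"exit status {result.returncode}"
            failures.append(f"{argv[0]}: {detail}")
            break
    return failures


def check_files(paths, checks=DEFAULT_CHECKS):
    """Check several files; return a dict mapping each path to its failures."""
    return {path: check_file(path, checks) for path in paths}
```

To use it, pass the list of checkpoint files for one iteration (for example from `glob.glob` with whatever pattern matches your own file names) to `check_files` and print the entries whose failure list is non-empty. A full `h5dump` reads every dataset, so expect this to take a while on large checkpoints.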
