On Friday 23 April 2010 01:22:17, Anthony Foglia wrote:
> Is it possible to have one process reading an HDF5 file while another
> one is writing to it?  We currently have some code written in C++ that
> writes a dataset regularly, and another process in Python (and sometimes
> third processes in C++) that reads from it.  This often leads to the
> reader saying the node written by the writer does not exist.  Here's an
> example done at a pair of IPython terminals (sessions interleaved).
>
> -------------- 1
> In [4]: fyle = tables.openFile("spam.h5", mode="a")
>
> -------------- 2
> In [2]: fyle = tables.openFile("spam.h5")
> IOError: file ``spam.h5`` exists but it is not an HDF5 file
>
> In [6]: fyle.flush()
>
> In [3]: fyle = tables.openFile("spam.h5")
>
> In [4]: fyle.root._v_children.keys()
> Out[4]: []
>
> In [7]: fyle.createArray("/", "test01", numpy.arange(10))
> Out[7]:
> /test01 (Array(10,)) ''
>   atom := Int64Atom(shape=(), dflt=0)
>   maindim := 0
>   flavor := 'numpy'
>   byteorder := 'little'
>   chunkshape := None
>
> In [5]: fyle.root._v_children.keys()
> Out[5]: []
>
> In [8]: fyle.flush()
>
> In [6]: fyle.root._v_children.keys()
> Out[6]: []
>
> In [7]: fyle.root._f_getChild("test01")
> ERROR: An unexpected error occurred while tokenizing input
> The following traceback may be corrupted or invalid
> The error message is: ('EOF in multi-line statement', (132, 0))
>
> ---------------------------------------------------------------------------
> NoSuchNodeError                        Traceback (most recent call last)
> [traceback omitted...]
> NoSuchNodeError: group ``/`` does not have a child named ``test01``
>
> ----------------------------------
>
> Is there a way to get a File object to refresh itself from disk?  Or do
> I need to close and re-open the file?  Is this caused by the underlying
> HDF5 libraries, or a caching issue in PyTables itself?
Actually, both PyTables and HDF5 have caches, and this is almost certainly the problem.  You can disable the cache on the PyTables side by setting the NODE_CACHE_SLOTS parameter to 0.  You may also try to disable the HDF5 metadata cache by setting METADATA_CACHE_SIZE to 0 (only if you are using HDF5 1.8.x), but I don't think this scenario is fully supported in HDF5 yet.  More info about these parameters in:

http://www.pytables.org/docs/manual/apc.html#id356485

I think the HDF crew is working on a safe way to allow a single writer process with many concurrent readers, but it is not there yet.  Meanwhile, you will have to explicitly close and reopen the file in the reader so as to force the HDF5 caches to refresh.

-- 
Francesc Alted
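PS: a minimal sketch of what I mean for the reader process.  It assumes PyTables 2.1 or later (where openFile() accepts parameter overrides as keyword arguments) and it is not a guarantee of safe concurrent access, just a way to avoid stale caches:

import tables

# Reader side: reopen the file each time fresh data is needed, with the
# PyTables node cache disabled.  Keyword arguments to openFile() override
# the defaults in tables/parameters.py.
fyle = tables.openFile("spam.h5", mode="r",
                       NODE_CACHE_SLOTS=0,      # no PyTables node cache
                       METADATA_CACHE_SIZE=0)   # HDF5 1.8.x metadata cache only
try:
    # May still miss nodes the writer has not flushed to disk yet.
    print fyle.root._v_children.keys()
finally:
    # Close and reopen on the next read to force HDF5 to re-read metadata.
    fyle.close()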