On Friday 23 April 2010 01:22:17 Anthony Foglia wrote:
> Is it possible to have one process reading an HDF5 file while another
> one is writing to it?  We currently have some code written in C++ that
> writes a dataset regularly, and another process in Python (and sometimes
> third processes in C++) that reads from it.  This often leads to the
> reader saying the node written by the writer does not exist.  Here's an
> example done at a pair of IPython terminals (sessions interleaved).
> 
> --------------1
>           In [4]: fyle = tables.openFile("spam.h5",mode="a")
> 
> ---2
> In [2]: fyle = tables.openFile("spam.h5")
> IOError: file ``spam.h5`` exists but it is not an HDF5 file
> 
>           In [6]: fyle.flush()
> 
> In [3]: fyle = tables.openFile("spam.h5")
> 
> In [4]: fyle.root._v_children.keys()
> Out[4]: []
> 
>           In [7]: fyle.createArray("/", "test01", numpy.arange(10))
>           Out[7]:
>           /test01 (Array(10,)) ''
>             atom := Int64Atom(shape=(), dflt=0)
>             maindim := 0
>             flavor := 'numpy'
>             byteorder := 'little'
>             chunkshape := None
> 
> In [5]: fyle.root._v_children.keys()
> Out[5]: []
> 
>           In [8]: fyle.flush()
> 
> In [6]: fyle.root._v_children.keys()
> Out[6]: []
> 
> In [7]: fyle.root._f_getChild("test01")
> ERROR: An unexpected error occurred while tokenizing input
> The following traceback may be corrupted or invalid
> The error message is: ('EOF in multi-line statement', (132, 0))
> 
> ---------------------------------------------------------------------------
> NoSuchNodeError                           Traceback (most recent call last)
> [traceback omitted...]
> NoSuchNodeError: group ``/`` does not have a child named ``test01``
> 
> ----------------------------------
> 
>       Is there a way to get a File object to refresh itself from disk?  Or do
> I need to close and re-open the file?  Is this caused by the underlying
> HDF5 libraries, or a caching issue in PyTables itself?

Actually both PyTables and HDF5 have caches, and this is almost certainly the 
problem.  You can disable the cache on the PyTables side by setting the 
NODE_CACHE_SLOTS parameter to 0.  You can also try disabling the HDF5 metadata 
cache by setting METADATA_CACHE_SIZE to 0 (only if you are using HDF5 1.8.x), 
but I don't think this scenario is fully supported in HDF5 yet.  More info 
about these parameters in:

http://www.pytables.org/docs/manual/apc.html#id356485
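For example, on the reader side something along these lines should work 
(untested sketch; if I remember correctly, parameter overrides can be passed 
as extra keyword arguments to openFile(), or set globally in 
tables/parameters.py):

    import tables

    # Reader: disable the PyTables node cache (NODE_CACHE_SLOTS=0) and
    # shrink the HDF5 metadata cache (METADATA_CACHE_SIZE=0; this one is
    # only meaningful with HDF5 1.8.x).
    fyle = tables.openFile("spam.h5", mode="r",
                           NODE_CACHE_SLOTS=0,
                           METADATA_CACHE_SIZE=0)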

I think the HDF Group is working on a safe way to allow a single writer and 
multiple readers on the same file, but it is not there yet.  Meanwhile, you 
will have to explicitly close and reopen the file in the reader so as to 
force the HDF5 caches to refresh.
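That is, something like this in the reading process (again, just a sketch 
based on your session):

    # Close and reopen before looking up freshly written nodes, so that
    # HDF5 reads the file metadata from disk again instead of serving it
    # from stale caches.
    fyle.close()
    fyle = tables.openFile("spam.h5", mode="r")
    print fyle.root._v_children.keys()   # "test01" should show up now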

-- 
Francesc Alted
