Hi Owen,

How many pools do you have?  Is this a random runtime failure?  What kind
of system is this one?  Is there some particular function in Python that
you are running?  (It seems to be openFile(), but I can't be sure...)  The
error is definitely happening down in the H5open() routine.  Now whether
this is HDF5's fault or ours, I am not yet sure.

Be Well
Anthony

On Sat, Oct 6, 2012 at 4:56 AM, Owen Mackwood
<owen.mackw...@bccn-berlin.de> wrote:

> Hi Anthony,
>
> I'm not trying to write in parallel. Each worker process has its own file
> to write to. After all tasks are completed, I collect the results in the
> master process. So the problem I'm seeing (a hang in the worker process)
> shouldn't have anything to do with parallel writes. Do you have any other
> suggestions?
>
> Regards,
> Owen
>
> On 5 October 2012 18:38, Anthony Scopatz <scop...@gmail.com> wrote:
>
>> Hello Owen,
>>
>> While you can use process pools to read from a file in parallel just
>> fine, writing is another story completely.  While HDF5 itself supports
>> parallel writing through MPI, this comes at the high cost of compression no
>> longer being available and a much more complicated code base.  So for the
>> time being, PyTables only supports the serial HDF5 library.
>>
>> Therefore, if you want to write to a file in parallel, you should adopt a
>> strategy where one process is responsible for all of the writing, and all
>> other processes send their data to this process instead of writing to the
>> file directly.  This is a very effective way
>> of accomplishing basically what you need.  In fact, we have an example to
>> do just that [1].  (As a side note: HDF5 may soon be adding an API for
>> exactly this pattern because it comes up so often.)
>>
>> So if I were you, I would look at [1] and adapt it to my use case.
>>
>> Be Well
>> Anthony
>>
>> 1.
>> https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py
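[Editor's note: a minimal sketch of the single-writer pattern described
above, using only the standard library.  All names here are illustrative;
in real code the writer would be the only process to call
tables.open_file(), and sink.append() stands in for appending rows to an
open table.  See the linked PyTables example for a complete version.]

```python
import multiprocessing as mp

def worker(task, queue):
    # Compute a result and hand it to the writer instead of writing it.
    queue.put((task, task * task))

def writer(queue, n_results, sink):
    # Sole owner of the output file; sink.append() stands in for
    # table.append(...) / h5file.flush() on an open PyTables file.
    for _ in range(n_results):
        sink.append(queue.get())

if __name__ == "__main__":
    q = mp.Queue()
    procs = [mp.Process(target=worker, args=(t, q)) for t in (1, 2, 3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    results = []
    writer(q, 3, results)
    print(sorted(results))  # [(1, 1), (2, 4), (3, 9)]
```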
>>
>> On Fri, Oct 5, 2012 at 9:55 AM, Owen Mackwood <
>> owen.mackw...@bccn-berlin.de> wrote:
>>
>>> Hello,
>>>
>>> I'm using a multiprocessing.Pool to parallelize a set of tasks which
>>> record their results into separate hdf5 files. Occasionally (less than 2%
>>> of the time) the worker process will hang. According to gdb, the problem
>>> occurs while opening the hdf5 file, when it attempts to obtain the
>>> associated mutex. Here's part of the backtrace:
>>>
>>> #0  0x00007fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /lib/libpthread.so.0
>>> #1  0x00007fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
>>> #2  0x00007fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
>>> #3  0x00007fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new
>>> (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>,
>>> __pyx_kwds=<value optimized out>)
>>>     at tables/hdf5Extension.c:2820
>>> #4  0x00000000004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value
>>> optimized out>) at Python/ceval.c:4331
>>>
>>> Nothing else is trying to open this file, so can someone suggest why
>>> this is occurring? This is a very annoying problem as there is no way to
>>> recover from this error, and consequently the worker process is permanently
>>> occupied, which effectively removes one of my processors from the pool.
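[Editor's note: the workflow described above can be sketched as follows.
Names are illustrative, and a plain open() stands in for
tables.open_file() so the sketch stays self-contained; the reported hang
occurs at the point marked below, inside H5open() while waiting on HDF5's
global API mutex.]

```python
import multiprocessing as mp
import os
import tempfile

OUTDIR = tempfile.mkdtemp()

def run_task(task_id):
    # Each worker records its result into its own separate file.  In the
    # real code this line would be tables.open_file(fname, 'w'), which is
    # where the hang (pthread_cond_wait under H5open) was observed.
    fname = os.path.join(OUTDIR, "result_%d.h5" % task_id)
    with open(fname, "w") as f:  # stand-in for tables.open_file(...)
        f.write("result of task %d\n" % task_id)
    return fname

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        files = pool.map(run_task, range(4))
    print(len(files))  # 4
```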
>>>
>>> Regards,
>>> Owen Mackwood
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Don't let slow site performance ruin your business. Deploy New Relic APM
>>> Deploy New Relic app performance management and know exactly
>>> what is happening inside your Ruby, Python, PHP, Java, and .NET app
>>> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
>>> http://p.sf.net/sfu/newrelic-dev2dev
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pytables-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>>
>>>
>>
>>
>
>
