On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois <duboismathieu_g...@yahoo.fr
> wrote:
> Hi Anthony,
>
> Thank you very much for your answer (it works). I will try to remodel my
> code around this trick, but I'm not sure it's possible because I use a
> framework that needs arrays.
>
I think that this method still works: in each subprocess you can pull the data
you need out of the file as a numpy array and send that array back to the main
process.
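
For example, each worker could open its own file handle, slice out the column
it needs, and return the resulting numpy array (a rough, untested sketch that
assumes the test.hdf5 layout from your script; get_column and the hard-coded
column indices are just placeholders):

import multiprocessing
import tables

def get_column(args):
    # Runs in the worker: open a private file handle and read one column.
    filename, column = args
    h5file = tables.openFile(filename, mode='r')
    data = h5file.root.X[:, column]  # plain numpy array, pickles fine
    h5file.close()
    return data

if __name__ == '__main__':
    p = multiprocessing.Pool(2)
    # Each worker sends back a numpy array that any array-based framework
    # can consume in the main process.
    columns = p.map(get_column, [('test.hdf5', 0), ('test.hdf5', 1)])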
> Can somebody explain what is going on? I was thinking that PyTables keeps a
> weakref to the file for lazy loading, but I'm not sure.
>
> In any case, the PyTables community is very helpful.
>
Glad to help!
Be Well
Anthony
>
> Thanks,
> Mathieu
>
> On 12/07/2013 00:44, Anthony Scopatz wrote:
>
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The
> following works for me on v3.0:
>
> import tables
> import random
> import multiprocessing
>
> # Reload the data
>
> # Use multiprocessing to perform a simple computation (column average)
>
> def f(filename):
>     h5file = tables.openFile(filename, mode='r')
>     name = multiprocessing.current_process().name
>     column = random.randint(0, 9)  # pick one of the 10 columns
>     print '%s use column %i' % (name, column)
>     rtn = h5file.root.X[:, column].mean()
>     h5file.close()
>     return rtn
>
> p = multiprocessing.Pool(2)
> col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
> Be well
> Anthony
>
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <
> duboismathieu_g...@yahoo.fr> wrote:
>
>> On 11/07/2013 21:56, Anthony Scopatz wrote:
>>
>>
>>
>>
>> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <
>> duboismathieu_g...@yahoo.fr> wrote:
>>
>>> Hello,
>>>
>>> I wanted to use PyTables in conjunction with multiprocessing for some
>>> embarrassingly parallel tasks.
>>>
>>> However, it seems that it is not possible. In the following (very
>>> stupid) example, X is a CArray of size (100, 10) stored in the file
>>> test.hdf5:
>>>
>>> import tables
>>> import random
>>> import multiprocessing
>>>
>>> # Reload the data
>>> h5file = tables.openFile('test.hdf5', mode='r')
>>> X = h5file.root.X
>>> n_features = 10  # number of columns in X (matches the generation script)
>>>
>>> # Use multiprocessing to perform a simple computation (column average)
>>>
>>> def f(X):
>>>     name = multiprocessing.current_process().name
>>>     column = random.randint(0, n_features - 1)
>>>     print '%s use column %i' % (name, column)
>>>     return X[:, column].mean()
>>>
>>> p = multiprocessing.Pool(2)
>>>
>>> col_mean = p.map(f, [X, X, X])
>>>
>>> When executing it, I get the following error:
>>>
>>> Exception in thread Thread-2:
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>>     self.run()
>>>   File "/usr/lib/python2.7/threading.py", line 504, in run
>>>     self.__target(*self.__args, **self.__kwargs)
>>>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>>     put(task)
>>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
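>>>
>>> Since Pool.map pickles its arguments, the same failure can presumably be
>>> reproduced without multiprocessing at all (an untested sketch):
>>>
>>> import pickle
>>> pickle.dumps(X)  # expected to raise the same PicklingError about weakref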
>>>
>>>
>>> I have googled for weakref and pickle but can't find a solution.
>>>
>>> Any help?
>>>
>>
>> Hello Mathieu,
>>
>> I have used multiprocessing and files opened in read mode many times so
>> I am not sure what is going on here.
>>
>> Thanks for your answer. Maybe you can point me to a working example?
>>
>>
>> Could you provide the test.hdf5 file so that we can try to reproduce
>> this?
>>
>> Here is the script that I have used to generate the data:
>>
>> import tables
>> import numpy
>>
>> # Create data & store it
>> n_features = 10
>> n_obs = 100
>> X = numpy.random.rand(n_obs, n_features)
>>
>> h5file = tables.openFile('test.hdf5', mode='w')
>> Xatom = tables.Atom.from_dtype(X.dtype)
>> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
>> Xhdf5[:] = X
>> h5file.close()
>>
>>
>> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
>> 12.04 (libhdf5 is 1.8.4patch1).
>>
>>
>>
>>
>>> By the way, I have noticed that by slicing a CArray, I get a numpy array
>>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>>> memory. Is there a way to avoid that?
>>>
>>
>> Only the slice that you ask for is brought into memory, and it is
>> returned as a non-view numpy array.
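>>
>> For instance (a small sketch, assuming the test.hdf5 file generated by your
>> script):
>>
>> h5file = tables.openFile('test.hdf5', mode='r')
>> X = h5file.root.X   # CArray: the data itself stays on disk
>> col = X[:, 3]       # only this column is read, as an in-memory ndarray copy
>> block = X[10:20]    # likewise, only these ten rows are loaded
>> h5file.close()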
>>
>> OK. I will be careful about that.
>>
>>
>>
>> Be Well
>> Anthony
>>
>>
>>>
>>> Mathieu
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users