Re: [Pytables-users] PyTables and Multiprocessing

Anthony Scopatz Thu, 11 Jul 2013 15:46:04 -0700

Hi Mathieu,

I think you should try opening a new file handle per process.  The
following works for me on v3.0:


import tables
import random
import multiprocessing

# Reload the data

# Use multiprocessing to perform a simple computation (column average)

def f(filename):
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 9)  # randint is inclusive; X has columns 0..9
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
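
For readers on Python 3 with a recent PyTables release, here is a minimal
sketch of the same one-handle-per-process pattern. It assumes the snake_case
API names introduced in PyTables 3.0 (tables.open_file, File.create_carray).
The helper names make_test_file and column_mean and the temporary file path
are made up for illustration and are not part of the original script.

```python
import multiprocessing
import os
import tempfile

import numpy as np
import tables


def make_test_file(path, n_obs=100, n_features=10):
    """Write a random (n_obs, n_features) CArray named X and return the data."""
    data = np.random.rand(n_obs, n_features)
    with tables.open_file(path, mode="w") as h5file:
        h5file.create_carray(h5file.root, "X", obj=data)
    return data


def column_mean(task):
    """Open a private read-only handle, average one column, close the handle."""
    filename, column = task  # plain str and int, both picklable
    with tables.open_file(filename, mode="r") as h5file:
        return float(h5file.root.X[:, column].mean())


def main():
    path = os.path.join(tempfile.mkdtemp(), "test.hdf5")
    data = make_test_file(path)
    tasks = [(path, column) for column in (0, 4, 9)]
    with multiprocessing.Pool(2) as pool:
        col_means = pool.map(column_mean, tasks)
    for (_, column), mean in zip(tasks, col_means):
        # Compare against the in-memory array the file was written from.
        print("column %d: %.6f (expected %.6f)"
              % (column, mean, data[:, column].mean()))


if __name__ == "__main__":
    main()
```

Each task carries only a filename and a column index, so no PyTables object
ever has to be pickled and sent to a worker.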

Be well
Anthony


On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <[email protected]> wrote:

>  On 11/07/2013 21:56, Anthony Scopatz wrote:
>
>
>
>
> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <
> [email protected]> wrote:
>
>> Hello,
>>
>> I wanted to use PyTables in conjunction with multiprocessing for some
>> embarrassingly parallel tasks.
>>
>> However, it seems that it is not possible. In the following (very
>> stupid) example, X is a CArray of shape (100, 10) stored in the file
>> test.hdf5:
>>
>> import random
>> import tables
>> import multiprocessing
>>
>> n_features = 10
>>
>> # Reload the data
>> h5file = tables.openFile('test.hdf5', mode='r')
>> X = h5file.root.X
>>
>> # Use multiprocessing to perform a simple computation (column average)
>> def f(X):
>>      name = multiprocessing.current_process().name
>>      column = random.randint(0, n_features - 1)  # randint is inclusive
>>      print '%s use column %i' % (name, column)
>>      return X[:, column].mean()
>>
>> p = multiprocessing.Pool(2)
>> col_mean = p.map(f, [X, X, X])
>>
>> When I execute it, I get the following error:
>>
>> Exception in thread Thread-2:
>> Traceback (most recent call last):
>>    File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>>      self.run()
>>    File "/usr/lib/python2.7/threading.py", line 504, in run
>>      self.__target(*self.__args, **self.__kwargs)
>>    File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
>>      put(task)
>> PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed
>>
>>
>> I have googled for weakref and pickle but can't find a solution.
>>
>> Any help?
>>
>
>  Hello Mathieu,
>
>  I have used multiprocessing and files opened in read mode many times so
> I am not sure what is going on here.
>
> Thanks for your answer. Maybe you can point me to a working example?
>
>
>   Could you provide the test.hdf5 file so that we could try to reproduce
> this?
>
> Here is the script that I have used to generate the data:
>
> import tables
> import numpy
>
> # Create data & store it
> n_features = 10
> n_obs      = 100
> X = numpy.random.rand(n_obs, n_features)
>
> h5file = tables.openFile('test.hdf5', mode='w')
> Xatom = tables.Atom.from_dtype(X.dtype)
> Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
> Xhdf5[:] = X
> h5file.close()
>
>
> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
> 12.04 (libhdf5 is 1.8.4patch1).
>
>
>
>
>> By the way, I have noticed that slicing a CArray returns a numpy array
>> (I created the HDF5 file with numpy). Therefore, everything is copied to
>> memory. Is there a way to avoid that?
>>
>
>  Only the slice that you ask for is brought into memory, and it is returned
> as a new numpy array (a copy, not a view).
>
> OK. I will need to be careful about that.
>
>
>
>  Be Well
> Anthony
>
>
>>
>> Mathieu
>>
>>
>> ------------------------------------------------------------------------------
>> See everything from the browser to the database with AppDynamics
>> Get end-to-end visibility with application monitoring from AppDynamics
>> Isolate bottlenecks and diagnose root cause in seconds.
>> Start your free trial of AppDynamics Pro today!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Pytables-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>
>
>

