Hi Anthony,

Thank you very much for your answer (it works). I will try to restructure my code around this trick, but I'm not sure it's possible because I use a framework that needs arrays.

Can somebody explain what is going on? I thought that PyTables keeps a weakref to the file for lazy loading, but I'm not sure.
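
A minimal sketch of the likely mechanism, using only the standard library. multiprocessing pickles every argument it sends to a worker, and pickle refuses weak references, so any object that stores one fails in the way the traceback shows. The Owner and Node classes below are hypothetical stand-ins, not PyTables classes, and the assumption that a PyTables node holds a weakref somewhere in its object graph comes only from the error message.

```python
import pickle
import weakref


class Owner(object):
    """Hypothetical stand-in for an open file object."""


class Node(object):
    """Hypothetical stand-in for a node that weakly references its owner."""

    def __init__(self, owner):
        # The weak reference ends up in the instance state that pickle copies.
        self._owner = weakref.ref(owner)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


owner = Owner()
node = Node(owner)
print(can_pickle("test.hdf5"))  # a plain filename
print(can_pickle(node))         # an object carrying a weakref
```

The filename pickles and the node does not, which is why passing the filename to each worker and opening the file there avoids the error.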

How

In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:
Hi Mathieu,

I think you should try opening a new file handle per process. The following works for me on v3.0:

import tables
import random
import multiprocessing

# Reload the data

# Use multiprocessing to perform a simple computation (column average)

def f(filename):
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 9)  # randint is inclusive; X has 10 columns
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony
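
For readers on a current setup, the same idea can be sketched for Python 3 and the snake_case PyTables API (open_file, create_carray). This is an adaptation under stated assumptions, not the code from the message above: the helper names make_test_file and column_mean are hypothetical, and the file layout (a CArray named X of shape (100, 10)) follows the generation script quoted further down the thread.

```python
import multiprocessing
import random

import numpy
import tables


def make_test_file(filename, n_obs=100, n_features=10):
    """Write a random (n_obs, n_features) CArray named X and return the data."""
    data = numpy.random.rand(n_obs, n_features)
    with tables.open_file(filename, mode="w") as h5file:
        atom = tables.Atom.from_dtype(data.dtype)
        node = h5file.create_carray(h5file.root, "X", atom, data.shape)
        node[:] = data
    return data


def column_mean(filename):
    """Open a private read-only handle, average one random column, close it."""
    name = multiprocessing.current_process().name
    with tables.open_file(filename, mode="r") as h5file:
        X = h5file.root.X
        column = random.randrange(X.shape[1])  # valid column indices only
        print("%s uses column %i" % (name, column))
        return float(X[:, column].mean())


if __name__ == "__main__":
    make_test_file("test.hdf5")
    with multiprocessing.Pool(2) as pool:
        # Only the filename (a plain string) is pickled and sent to workers.
        col_mean = pool.map(column_mean, ["test.hdf5"] * 3)
    print(col_mean)
```

Each call opens and closes its own handle, so no PyTables object ever crosses a process boundary.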


On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <duboismathieu_g...@yahoo.fr <mailto:duboismathieu_g...@yahoo.fr>> wrote:

    On 11/07/2013 21:56, Anthony Scopatz wrote:



    On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois
    <duboismathieu_g...@yahoo.fr
    <mailto:duboismathieu_g...@yahoo.fr>> wrote:

        Hello,

        I wanted to use PyTables in conjunction with multiprocessing for
        some embarrassingly parallel tasks.

        However, it seems that it is not possible. In the following (very
        stupid) example, X is a CArray of size (100, 10) stored in the file
        test.hdf5:

        import random
        import tables
        import multiprocessing

        # Reload the data
        h5file = tables.openFile('test.hdf5', mode='r')
        X = h5file.root.X
        n_features = X.shape[1]

        # Use multiprocessing to perform a simple computation (column average)
        def f(X):
            name = multiprocessing.current_process().name
            # randint is inclusive at both ends, so stop at n_features - 1
            column = random.randint(0, n_features - 1)
            print '%s use column %i' % (name, column)
            return X[:, column].mean()

        p = multiprocessing.Pool(2)
        # Fails: the CArray node X cannot be pickled to send to the workers
        col_mean = p.map(f, [X, X, X])

        Executing it produces the following error:

        Exception in thread Thread-2:
        Traceback (most recent call last):
          File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
            self.run()
          File "/usr/lib/python2.7/threading.py", line 504, in run
            self.__target(*self.__args, **self.__kwargs)
          File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
            put(task)
        PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed


        I have googled for weakref and pickle but can't find a solution.

        Any help?


    Hello Mathieu,

    I have used multiprocessing and files opened in read mode many
    times so I am not sure what is going on here.
    Thanks for your answer. Maybe you can point me to a working example?


    Could you provide the test.hdf5 file so that we can try to
    reproduce this?
    Here is the script that I have used to generate the data:

    import tables
    import numpy

    # Create data & store it
    n_features = 10
    n_obs      = 100
    X = numpy.random.rand(n_obs, n_features)

    h5file = tables.openFile('test.hdf5', mode='w')
    Xatom = tables.Atom.from_dtype(X.dtype)
    Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
    Xhdf5[:] = X
    h5file.close()

    I hope it's not a stupid mistake. I am using PyTables 2.3.1 on
    Ubuntu 12.04 (libhdf5 is 1.8.4patch1).


        By the way, I have noticed that by slicing a CArray, I get a numpy
        array (I created the HDF5 file with numpy). Therefore, everything
        is copied to memory. Is there a way to avoid that?


    Only the slice that you ask for is brought into memory, and it is
    returned as a non-view numpy array.
    OK. I will need to be careful about that.



    Be Well
    Anthony


        Mathieu

        
------------------------------------------------------------------------------
        See everything from the browser to the database with AppDynamics
        Get end-to-end visibility with application monitoring from
        AppDynamics
        Isolate bottlenecks and diagnose root cause in seconds.
        Start your free trial of AppDynamics Pro today!
        
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
        _______________________________________________
        Pytables-users mailing list
        Pytables-users@lists.sourceforge.net
        <mailto:Pytables-users@lists.sourceforge.net>
        https://lists.sourceforge.net/lists/listinfo/pytables-users




    


    





