Re: [Pytables-users] PyTables and Multiprocessing
Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my code around this trick, but I'm not sure it's possible because I use a framework that needs arrays.

Can somebody explain what is going on? I was thinking that PyTables keeps a weakref to the file for lazy loading, but I'm not sure.

In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:

Hi Mathieu,

I think you should try opening a new file handle per process. The following works for me on v3.0:

    import tables
    import random
    import multiprocessing

    # Reload the data

    # Use multiprocessing to perform a simple computation (column average)
    def f(filename):
        h5file = tables.openFile(filename, mode='r')
        name = multiprocessing.current_process().name
        column = random.randint(0, 9)  # randint includes both endpoints; X has 10 columns
        print '%s use column %i' % (name, column)
        rtn = h5file.root.X[:, column].mean()
        h5file.close()
        return rtn

    p = multiprocessing.Pool(2)
    col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony

___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
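On the question of what is going on: `Pool.map` pickles each argument before handing it to a worker, and the traceback in this thread shows that pickling ran into a weakref object. The following is a minimal Python 3 sketch of that mechanism only. `FakeFile`, `FakeNode`, and `can_pickle` are hypothetical names invented for illustration, not PyTables classes, and the sketch does not establish why PyTables holds the weak reference.

```python
import pickle
import weakref


class FakeFile:
    """Hypothetical stand-in for an open file object."""

    def __init__(self, filename):
        self.filename = filename


class FakeNode:
    """Hypothetical stand-in for a node that weakly references its file."""

    def __init__(self, owner):
        self._owner = weakref.ref(owner)


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


fake_file = FakeFile("test.hdf5")
node = FakeNode(fake_file)

print(can_pickle(node))                # False: the weakref cannot be pickled
print(can_pickle(fake_file.filename))  # True: a plain string can
```

This is consistent with the workaround above: a filename is a plain string, so it crosses the process boundary without trouble, and each worker builds its own handle from it.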
Re: [Pytables-users] PyTables and Multiprocessing
On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois duboismathieu_g...@yahoo.fr wrote:

Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my code around this trick, but I'm not sure it's possible because I use a framework that needs arrays.

I think that this method still works. You can always pull a numpy array out in a subprocess and send it back to the main process.

Can somebody explain what is going on? I was thinking that PyTables keeps a weakref to the file for lazy loading, but I'm not sure.

In any case, the PyTables community is very helpful.

Glad to help!

Be Well
Anthony

Thanks,
Mathieu
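The suggestion to send a numpy array back to the main process relies on the fact that a worker's return value is pickled on its way to the parent, and numpy arrays survive that round trip. A small Python 3 sketch, assuming only that numpy is installed; `roundtrip` is a hypothetical helper name, and the sketch uses `pickle` directly rather than a real pool.

```python
import pickle

import numpy as np


def roundtrip(obj):
    """Serialize and deserialize obj, as happens to a worker's return value."""
    return pickle.loads(pickle.dumps(obj))


column = np.arange(5, dtype=float)
received = roundtrip(column)

print(np.array_equal(column, received))  # True: same values
print(received is column)                # False: the parent gets its own copy
```

So a framework that needs arrays can still be fed from workers, as long as the workers return the arrays themselves and not the HDF5 node they came from.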
Re: [Pytables-users] PyTables and Multiprocessing
On 11/07/2013 21:56, Anthony Scopatz wrote:

On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois duboismathieu_g...@yahoo.fr wrote:

Hello,

I wanted to use PyTables in conjunction with multiprocessing for some embarrassingly parallel tasks. However, it seems that it is not possible.

In the following (very stupid) example, X is a CArray of size (100, 10) stored in the file test.hdf5:

    import random
    import tables
    import multiprocessing

    # Reload the data
    h5file = tables.openFile('test.hdf5', mode='r')
    X = h5file.root.X
    n_features = 10  # number of columns in X

    # Use multiprocessing to perform a simple computation (column average)
    def f(X):
        name = multiprocessing.current_process().name
        column = random.randint(0, n_features - 1)  # randint includes both endpoints
        print '%s use column %i' % (name, column)
        return X[:, column].mean()

    p = multiprocessing.Pool(2)
    col_mean = p.map(f, [X, X, X])

When executing it, I get the following error:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed

I have googled for weakref and pickle but can't find a solution.

Any help?

Hello Mathieu,

I have used multiprocessing and files opened in read mode many times, so I am not sure what is going on here.

Thanks for your answer. Maybe you can point me to a working example?

Could you provide the test.hdf5 file so that we could try to reproduce this?

Here is the script that I have used to generate the data:

    import tables
    import numpy

    # Create data and store it
    n_features = 10
    n_obs = 100
    X = numpy.random.rand(n_obs, n_features)
    h5file = tables.openFile('test.hdf5', mode='w')
    Xatom = tables.Atom.from_dtype(X.dtype)
    Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
    Xhdf5[:] = X
    h5file.close()

I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu 12.04 (libhdf5 is 1.8.4patch1).

By the way, I have noticed that by slicing a CArray, I get a numpy array (I created the HDF5 file with numpy). Therefore, everything is copied to memory. Is there a way to avoid that?

Only the slice that you ask for is brought into memory, and it is returned as a non-view numpy array.

OK. I should be careful about that.

Be Well
Anthony

Mathieu
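The remark about slicing can be checked with a small experiment. This is a sketch in Python 3 that assumes a PyTables 3.x install, where `openFile` and `createCArray` are spelled `open_file` and `create_carray`; the temporary path and the choice of column 3 are arbitrary. It writes to the array returned by a slice and then re-reads the same element from the file.

```python
import os
import tempfile

import numpy as np
import tables

# Build a small file comparable to the one generated in this thread.
path = os.path.join(tempfile.mkdtemp(), "test.hdf5")
data = np.random.rand(100, 10)
with tables.open_file(path, mode="w") as h5file:
    h5file.create_carray(h5file.root, "X", obj=data)

with tables.open_file(path, mode="r") as h5file:
    X = h5file.root.X    # node object; array data is read when it is sliced
    column = X[:, 3]     # one column, returned as a numpy array
    column[0] = -1.0     # modifies the in-memory array only
    on_disk = X[0, 3]    # re-read the same element from the file

print(type(column).__name__, column.shape)
print(on_disk == data[0, 3])
```

If the slice were a view onto the file, the write to `column[0]` would either fail on a read-only file or show up in `on_disk`; here the stored value is expected to stay as written.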
Re: [Pytables-users] PyTables and Multiprocessing
Hi Mathieu,

I think you should try opening a new file handle per process. The following works for me on v3.0:

    import tables
    import random
    import multiprocessing

    # Reload the data

    # Use multiprocessing to perform a simple computation (column average)
    def f(filename):
        h5file = tables.openFile(filename, mode='r')
        name = multiprocessing.current_process().name
        column = random.randint(0, 9)  # randint includes both endpoints; X has 10 columns
        print '%s use column %i' % (name, column)
        rtn = h5file.root.X[:, column].mean()
        h5file.close()
        return rtn

    p = multiprocessing.Pool(2)
    col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony

On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois duboismathieu_g...@yahoo.fr wrote:

[...]
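For readers on Python 3 and a later PyTables release, the same idea (pass the filename, open the file inside each worker) might look like the sketch below. It assumes PyTables 3.x, where `openFile` is spelled `open_file`, and a file `test.hdf5` with a 2-D array at `/X` as generated earlier in the thread. `column_mean` is a hypothetical name, and `randrange` replaces `randint` so the column index always stays in range.

```python
import multiprocessing
import random

import tables


def column_mean(filename):
    """Open the file inside the worker, average one random column, close it."""
    with tables.open_file(filename, mode="r") as h5file:
        X = h5file.root.X
        column = random.randrange(X.shape[1])  # valid indices: 0 .. ncols - 1
        return float(X[:, column].mean())


if __name__ == "__main__":
    # Assumes test.hdf5 already exists and holds a 2-D array at /X.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(column_mean, ["test.hdf5"] * 3))
```

Only strings go into the pool and only floats come back, so nothing that holds a file handle has to be pickled.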