Re: [Pytables-users] PyTables and Multiprocessing

2013-07-12 Thread Mathieu Dubois

Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my 
code around this trick, but I'm not sure it's possible because I use a 
framework that needs arrays.


Can somebody explain what is going on? I was thinking that PyTables keeps 
a weakref to the file for lazy loading, but I'm not sure.
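
What appears to be going on, in brief: multiprocessing.Pool.map pickles
every argument before handing it to a worker process, and a PyTables
node cannot be pickled because its internal state includes weakrefs,
which is exactly what the traceback quoted below complains about.
Assuming the same test.hdf5 file, the failure can be reproduced as a
minimal sketch without multiprocessing at all:

    import pickle
    import tables

    h5file = tables.openFile('test.hdf5', mode='r')
    try:
        # Pool.map does the equivalent of this to every argument it sends.
        pickle.dumps(h5file.root.X)
    except Exception, e:  # PicklingError, as in the traceback below
        print 'pickling failed: %s' % e
    h5file.close()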


In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:

Hi Mathieu,

I think you should try opening a new file handle per process.  The 
following works for me on v3.0:


import tables
import random
import multiprocessing

# Use multiprocessing to perform a simple computation (column average)
def f(filename):
    # Each worker opens its own file handle; only the filename string
    # needs to be pickled.
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 9)  # X has 10 columns (0-9)
    print '%s uses column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
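
As a small hardening sketch (not in the original message): the
multiprocessing docs recommend guarding pool creation so that worker
processes can safely re-import the module on platforms without fork:

    if __name__ == '__main__':
        p = multiprocessing.Pool(2)
        col_mean = p.map(f, ['test.hdf5'] * 3)
        print col_mean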

Be well
Anthony


On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois 
<duboismathieu_g...@yahoo.fr> wrote:


On 11/07/2013 21:56, Anthony Scopatz wrote:




On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois
<duboismathieu_g...@yahoo.fr> wrote:

Hello,

I wanted to use PyTables in conjunction with multiprocessing
for some embarrassingly parallel tasks.

However, it seems that it is not possible. In the following (very
stupid) example, X is a CArray of size (100, 10) stored in the file
test.hdf5:

import tables
import random
import multiprocessing

# Reload the data
h5file = tables.openFile('test.hdf5', mode='r')
X = h5file.root.X
n_features = X.shape[1]

# Use multiprocessing to perform a simple computation (column average)
def f(X):
    name = multiprocessing.current_process().name
    column = random.randint(0, n_features - 1)
    print '%s uses column %i' % (name, column)
    return X[:, column].mean()

p = multiprocessing.Pool(2)
col_mean = p.map(f, [X, X, X])  # fails: X (a PyTables node) must be pickled

When executing it, I get the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed


I have googled for weakref and pickle but can't find a solution.

Any help?


Hello Mathieu,

I have used multiprocessing and files opened in read mode many
times so I am not sure what is going on here.

Thanks for your answer. Maybe you can point me to a working example?



Could you provide the test.hdf5 file so that we could try to
reproduce this?

Here is the script that I have used to generate the data:

import tables
import numpy

# Create data & store it
n_features = 10
n_obs = 100
X = numpy.random.rand(n_obs, n_features)

h5file = tables.openFile('test.hdf5', mode='w')
Xatom = tables.Atom.from_dtype(X.dtype)
Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
Xhdf5[:] = X
h5file.close()
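
As a quick sanity check (a sketch, using the same PyTables 2.x API),
the file can be read back to confirm the array's shape:

    h5file = tables.openFile('test.hdf5', mode='r')
    print h5file.root.X.shape  # (100, 10)
    h5file.close()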

I hope it's not a stupid mistake. I am using PyTables 2.3.1 on
Ubuntu 12.04 (libhdf5 is 1.8.4patch1).



By the way, I have noticed that by slicing a CArray, I get a numpy
array (I created the HDF5 file with numpy). Therefore, everything
is copied to memory. Is there a way to avoid that?


Only the slice that you ask for is brought into memory, and it is
returned as a non-view numpy array.
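
A minimal sketch of that behaviour, assuming the test.hdf5 file from above:

    h5file = tables.openFile('test.hdf5', mode='r')
    col = h5file.root.X[:, 3]  # only this slice is read into memory
    print type(col)            # <type 'numpy.ndarray'>, a copy, not a view
    h5file.close()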

OK. I will be careful about that.




Be Well
Anthony


Mathieu








Re: [Pytables-users] PyTables and Multiprocessing

2013-07-12 Thread Anthony Scopatz
On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois
<duboismathieu_g...@yahoo.fr> wrote:

  Hi Anthony,

 Thank you very much for your answer (it works). I will try to remodel my
 code around this trick, but I'm not sure it's possible because I use a
 framework that needs arrays.


I think that this method still works. You can always pull a numpy array
out of the file in a subprocess and send it back to the main process.
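
For example, a sketch of that pattern (the get_column helper and the
(filename, column) pairs are illustrative, not from the original thread):
each worker opens its own file handle, slices one column into a plain
numpy array, and the parent process reassembles the pieces:

    import numpy
    import multiprocessing
    import tables

    def get_column(args):
        # Each worker opens (and closes) its own file handle.
        filename, column = args
        h5file = tables.openFile(filename, mode='r')
        col = h5file.root.X[:, column]  # a plain numpy array, safe to pickle
        h5file.close()
        return col

    if __name__ == '__main__':
        p = multiprocessing.Pool(2)
        cols = p.map(get_column, [('test.hdf5', i) for i in range(3)])
        result = numpy.column_stack(cols)  # ordinary (100, 3) in-memory array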


 Can somebody explain what is going on? I was thinking that PyTables keeps
 a weakref to the file for lazy loading, but I'm not sure.


 In any case, the PyTables community is very helpful.


Glad to help!

Be Well
Anthony




