[Pytables-users] Storing large images in PyTables

2013-07-04 Thread Mathieu Dubois
Hello,

I'm a beginner with PyTables.

I wanted to store a database in an HDF5 file using PyTables. The DB is 
made up of a CSV file (which contains the subject information) and a lot of 
images (I work on MRI, so the images are 3-dimensional float32 arrays of 
shape (121, 145, 121)). The relation is very simple: there are 3 
images per subject.

My first idea was to create a Subject class like this:
class Subject(tables.IsDescription):
 # Subject information
 Id   = tables.UInt16Col()
 ...
 Image= tables.Float32Col(shape=IMAGE_SIZE)

And then proceed as in the tutorial (open a file, create a group and a 
table associated with the Subject class, and then append data to this table).

Unfortunately I got an error when creating the table (even before 
inserting data):
HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
   #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to 
create dataset
 major: Dataset
 minor: Unable to initialize object
   #001: ../../../src/H5Dint.c line 428 in H5D_create_named(): unable to 
create and link to dataset
 major: Dataset
 minor: Unable to initialize object
   #002: ../../../src/H5L.c line 1639 in H5L_link_object(): unable to 
create new link to object
 major: Links
 minor: Unable to initialize object
   #003: ../../../src/H5L.c line 1862 in H5L_create_real(): can't insert 
link
 major: Symbol table
 minor: Unable to insert object
   #004: ../../../src/H5Gtraverse.c line 877 in H5G_traverse(): internal 
path traversal failed
 major: Symbol table
 minor: Object not found
   #005: ../../../src/H5Gtraverse.c line 703 in H5G_traverse_real(): 
traversal operator failed
 major: Symbol table
 minor: Callback failed
   #006: ../../../src/H5L.c line 1685 in H5L_link_cb(): unable to create 
object
 major: Object header
 minor: Unable to initialize object
   #007: ../../../src/H5O.c line 2677 in H5O_obj_create(): unable to 
open object
 major: Object header
 minor: Can't open object
   #008: ../../../src/H5Doh.c line 296 in H5O_dset_create(): unable to 
create dataset
 major: Dataset
 minor: Unable to initialize object
   #009: ../../../src/H5Dint.c line 1034 in H5D_create(): can't update 
the metadata cache
 major: Dataset
 minor: Unable to initialize object
   #010: ../../../src/H5Dint.c line 799 in H5D_update_oh_info(): unable 
to update new fill value header message
 major: Dataset
 minor: Unable to initialize object
   #011: ../../../src/H5Omessage.c line 188 in H5O_msg_append_oh(): 
unable to create new message in header
 major: Attribute
 minor: Unable to insert object
   #012: ../../../src/H5Omessage.c line 228 in H5O_msg_append_real(): 
unable to create new message
 major: Object header
 minor: No space available for allocation
   #013: ../../../src/H5Omessage.c line 1940 in H5O_msg_alloc(): unable 
to allocate space for message
 major: Object header
 minor: Unable to initialize object
   #014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header 
message is too large
 major: Object header
 minor: Unable to initialize object
Traceback (most recent call last):
   File "00_build_dataset.tmp.py", line 52, in 
 dump_in_hdf5(**vars(args))
   File "00_build_dataset.tmp.py", line 32, in dump_in_hdf5
 data_api.Subject)
   File "/usr/lib/python2.7/dist-packages/tables/file.py", line 770, in 
createTable
 chunkshape=chunkshape, byteorder=byteorder)
   File "/usr/lib/python2.7/dist-packages/tables/table.py", line 832, in 
__init__
 byteorder, _log)
   File "/usr/lib/python2.7/dist-packages/tables/leaf.py", line 291, in 
__init__
 super(Leaf, self).__init__(parentNode, name, _log)
   File "/usr/lib/python2.7/dist-packages/tables/node.py", line 296, in 
__init__
 self._v_objectID = self._g_create()
   File "/usr/lib/python2.7/dist-packages/tables/table.py", line 983, in 
_g_create
 self._v_new_title, self.filters.complib or '', obversion )
   File "tableExtension.pyx", line 195, in 
tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
tables.exceptions.HDF5ExtError: Problems creating the table

I think that the size of the column is too large (if I remove the Image 
field, everything works perfectly).
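For scale, here is a back-of-the-envelope numpy check of the row size (the dtype below mirrors the Subject class; the header-limit reading is one plausible interpretation of the "#014: object header message is too large" line, since HDF5 1.8.x stores the fill value for a whole compound row in a single object-header message):

```python
import numpy as np

# Mirror of the Subject description as a numpy compound dtype.
row = np.dtype([('Id', np.uint16),
                ('Image', np.float32, (121, 145, 121))])

print(row.itemsize)                    # 8491782 bytes per row
print(round(row.itemsize / 2**20, 1))  # ~8.1 MiB per row
```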

Therefore, what is the best way to store the images (while keeping the 
relation)? I have read various posts about this subject on the web but 
could not find a definitive answer (the most helpful was 
http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables).

I was thinking of creating an extensible array and storing each image in 
the same order as the subjects. However, I would feel more comfortable if 
the subject Id could be inserted too (to join the tables).

Any help?

Mathieu

--
This SF.net email is sponsored by Windows:

Build for Win

Re: [Pytables-users] Storing large images in PyTables

2013-07-04 Thread Mathieu Dubois


Re: [Pytables-users] Storing large images in PyTables

2013-07-05 Thread Mathieu Dubois

Hi,

Sorry for the late response.

First of all, I have managed to achieve what I wanted to do differently.

The code Francesc sent works well (I had to adapt it because I use 
version 2.3.1 under Ubuntu 12.04).


I was able to reproduce something similar with a class like this (copied 
& pasted from the tutorial):


import tables as tb
import numpy as np

class Subject(tb.IsDescription):
 # Subject information
 Id   = tb.UInt16Col()
 Image= tb.Float32Col(shape=(121, 145, 121))

h5file = tb.openFile("tutorial1.h5", mode="w", title="Test file")
group = h5file.createGroup("/", 'subject', 'Subject information')
table = h5file.createTable(group, 'readout', Subject, "Readout example")
subject = table.row
for i in xrange(10):
 subject['Id'] = i
 subject['Image'] = np.ones((121, 145, 121))
 subject.append()
table.flush()  # write the appended rows to disk

This code works well too.

So I don't really know why nothing was working yesterday: it was the 
same class and a very similar program. I will try to investigate this 
later.


Thanks for everything,
Mathieu

On 05/07/2013 16:54, Anthony Scopatz wrote:




On Fri, Jul 5, 2013 at 8:40 AM, Francesc Alted wrote:


On 7/5/13 1:33 AM, Mathieu Dubois wrote:
> tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
>>
>> tables.exceptions.HDF5ExtError: Problems creating the table
>>
>> I think that the size of the column is too large (if I remove the
>> Image field, everything works perfectly).
>>
>> Hi Mathieu,
>>
>> This shouldn't be the case.  What is the value of IMAGE_SIZE?
>
> IMAGE_SIZE is a tuple containing (121, 145, 121).

This is a bit large for a row in the Table object.  My recommendation
for these cases is to use an associated EArray with shape
(0, 121, 145, 121) and then append the images there.  You can always
refer to an image by issuing a __getitem__() operation on the EArray
object with the index of the row in the table.  Easy as pie, and you
will allow the compression library (in case you are using compression)
to work much more efficiently for the table.
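A minimal sketch of this layout (using the PyTables 3.x method names rather than the 2.x camelCase used elsewhere in the thread; the file name, node names, and the small image shape are made up for the example):

```python
import numpy as np
import tables as tb

IMG_SHAPE = (12, 14, 12)  # stand-in for (121, 145, 121)

class Subject(tb.IsDescription):
    Id = tb.UInt16Col()   # scalar subject fields live in the Table

with tb.open_file('subjects.h5', 'w') as f:
    table = f.create_table('/', 'subjects', Subject)
    # EArray with a growable first dimension, one slot per subject row.
    images = f.create_earray('/', 'images', atom=tb.Float32Atom(),
                             shape=(0,) + IMG_SHAPE,
                             filters=tb.Filters(complevel=5, complib='zlib'))
    row = table.row
    for i in range(3):
        row['Id'] = i
        row.append()
        images.append(np.full((1,) + IMG_SHAPE, i, dtype=np.float32))
    table.flush()

with tb.open_file('subjects.h5', 'r') as f:
    # The row index in the table doubles as the index into the EArray.
    idx = f.root.subjects.get_where_list('Id == 2')[0]
    img = f.root.images[idx]
    print(img.shape, img[0, 0, 0])  # (12, 14, 12) 2.0
```

Looking a subject up then costs one table query plus one EArray read, and compression applies chunk-by-chunk to the images rather than to one enormous compound row.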



Hi Francesc,

I disagree that this shape is too large for a table.  Here is a 
minimal example that works for me:


import tables as tb
import numpy as np

images = np.ones(100, dtype=[('id', np.uint16),
                             ('image', np.float32, (121, 145, 121))])

with tb.open_file('temp.h5', 'w') as f:
    f.create_table('/', 'images', images)

I think that there is something else going on with the initialization 
but Mathieu hasn't given us enough information to figure it out =/.  A 
minimal failing script would be super helpful here!


(BTW Mathieu, Tables can also take advantage of compression.  Though 
Francesc's solution is nicer for a lot of reasons too.)


Be Well
Anthony


HTH,

-- Francesc Alted


--
This SF.net email is sponsored by Windows:

Build for Windows Store.

http://p.sf.net/sfu/windows-dev2dev
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
<mailto:Pytables-users@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/pytables-users








[Pytables-users] PyTables and Multiprocessing

2013-07-11 Thread Mathieu Dubois
Hello,

I wanted to use PyTables in conjunction with multiprocessing for some 
embarrassingly parallel tasks.

However, it seems that it is not possible. In the following (very 
stupid) example, X is a CArray of size (100, 10) stored in the file 
test.hdf5:

import tables
import random
import multiprocessing

n_features = 10

# Reload the data
h5file = tables.openFile('test.hdf5', mode='r')
X = h5file.root.X

# Use multiprocessing to perform a simple computation (column average)
def f(X):
    name = multiprocessing.current_process().name
    column = random.randint(0, n_features - 1)
    print '%s use column %i' % (name, column)
    return X[:, column].mean()

p = multiprocessing.Pool(2)
col_mean = p.map(f, [X, X, X])

When executing it, I get the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
     self.run()
   File "/usr/lib/python2.7/threading.py", line 504, in run
     self.__target(*self.__args, **self.__kwargs)
   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
     put(task)
PicklingError: Can't pickle <type 'weakref'>: attribute lookup 
__builtin__.weakref failed


I have googled for weakref and pickle but can't find a solution.

Any help?

By the way, I have noticed that by slicing a CArray, I get a numpy array 
(I created the HDF5 file with numpy). Therefore, everything is copied to 
memory. Is there a way to avoid that?

Mathieu

--
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] PyTables and Multiprocessing

2013-07-11 Thread Mathieu Dubois

On 11/07/2013 21:56, Anthony Scopatz wrote:




On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois 
<duboismathieu_g...@yahoo.fr> wrote:


Hello,

I wanted to use PyTables in conjunction with multiprocessing for some
embarrassingly parallel tasks.

However, it seems that it is not possible. In the following (very
stupid) example, X is a Carray of size (100, 10) stored in the file
test.hdf5:

import tables
import multiprocessing

# Reload the data
h5file = tables.openFile('test.hdf5', mode='r')
X = h5file.root.X

# Use multiprocessing to perform a simple computation (column average)
def f(X):
    name = multiprocessing.current_process().name
    column = random.randint(0, n_features)
    print '%s use column %i' % (name, column)
    return X[:, column].mean()

p = multiprocessing.Pool(2)
col_mean = p.map(f, [X, X, X])

When executing it, I get the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
     self.run()
   File "/usr/lib/python2.7/threading.py", line 504, in run
     self.__target(*self.__args, **self.__kwargs)
   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
     put(task)
PicklingError: Can't pickle <type 'weakref'>: attribute lookup
__builtin__.weakref failed


I have googled for weakref and pickle but can't find a solution.

Any help?


Hello Mathieu,

I have used multiprocessing and files opened in read mode many times 
so I am not sure what is going on here.

Thanks for your answer. Maybe you can point me to a working example?

Could you provide the test.hdf5 file so that we can try to reproduce 
this?

Here is the script that I have used to generate the data:

import tables
import numpy

# Create data & store it
n_features = 10
n_obs = 100
X = numpy.random.rand(n_obs, n_features)

h5file = tables.openFile('test.hdf5', mode='w')
Xatom = tables.Atom.from_dtype(X.dtype)
Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
Xhdf5[:] = X
h5file.close()

I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu 
12.04 (libhdf5 is 1.8.4patch1).



By the way, I have noticed that by slicing a CArray, I get a numpy
array (I created the HDF5 file with numpy). Therefore, everything is
copied to memory. Is there a way to avoid that?


Only the slice that you ask for is brought into memory, and it is 
returned as a non-view numpy array.

OK. I will be careful about that.
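The view/copy distinction Anthony mentions can be illustrated with plain numpy (a stand-alone sketch, no HDF5 involved): slicing an in-memory numpy array yields a view on the same buffer, whereas reading a slice of a PyTables leaf behaves like the explicit .copy() below.

```python
import numpy as np

X = np.zeros((100, 10))
view = X[:, 3]           # plain numpy slicing: a view into X's buffer
copy = X[:, 3].copy()    # what a PyTables read resembles: its own buffer

view[0] = 1.0
print(X[0, 3])   # 1.0 -- writing through the view changed X
copy[0] = 2.0
print(X[0, 3])   # still 1.0 -- the copy is independent of X
```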



Be Well
Anthony


Mathieu










Re: [Pytables-users] PyTables and Multiprocessing

2013-07-11 Thread Mathieu Dubois

Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my 
code around this trick, but I'm not sure it's possible because I use a 
framework that needs arrays.


Can somebody explain what is going on? I was thinking that PyTables keeps 
weakrefs to the file for lazy loading, but I'm not sure.



In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:

Hi Mathieu,

I think you should try opening a new file handle per process.  The 
following works for me on v3.0:


import tables
import random
import multiprocessing

# Reload the data

# Use multiprocessing to perform a simple computation (column average)

def f(filename):
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 10)
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony

