[Pytables-users] Storing large images in PyTables
Hello,

I'm a beginner with PyTables. I wanted to store a database in an HDF5 file using PyTables. The DB is made of a CSV file (which contains the subject information) and a lot of images (I work on MRI, so the images are 3-dimensional float32 arrays of shape (121, 145, 121)). The relation is very simple: there are 3 images per subject.

My first idea was to create a Subject class like this:

    class Subject(tables.IsDescription):
        # Subject information
        Id = tables.UInt16Col()
        ...
        Image = tables.Float32Col(shape=IMAGE_SIZE)

and then proceed as in the tutorial (open a file, create a group and a table associated with the Subject class, then append data to this table). Unfortunately, I got an error when creating the table (even before inserting data):

    HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
      #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to create dataset
        major: Dataset
        minor: Unable to initialize object
      #001: ../../../src/H5Dint.c line 428 in H5D_create_named(): unable to create and link to dataset
        major: Dataset
        minor: Unable to initialize object
      #002: ../../../src/H5L.c line 1639 in H5L_link_object(): unable to create new link to object
        major: Links
        minor: Unable to initialize object
      #003: ../../../src/H5L.c line 1862 in H5L_create_real(): can't insert link
        major: Symbol table
        minor: Unable to insert object
      #004: ../../../src/H5Gtraverse.c line 877 in H5G_traverse(): internal path traversal failed
        major: Symbol table
        minor: Object not found
      #005: ../../../src/H5Gtraverse.c line 703 in H5G_traverse_real(): traversal operator failed
        major: Symbol table
        minor: Callback failed
      #006: ../../../src/H5L.c line 1685 in H5L_link_cb(): unable to create object
        major: Object header
        minor: Unable to initialize object
      #007: ../../../src/H5O.c line 2677 in H5O_obj_create(): unable to open object
        major: Object header
        minor: Can't open object
      #008: ../../../src/H5Doh.c line 296 in H5O_dset_create(): unable to create dataset
        major: Dataset
        minor: Unable to initialize object
      #009: ../../../src/H5Dint.c line 1034 in H5D_create(): can't update the metadata cache
        major: Dataset
        minor: Unable to initialize object
      #010: ../../../src/H5Dint.c line 799 in H5D_update_oh_info(): unable to update new fill value header message
        major: Dataset
        minor: Unable to initialize object
      #011: ../../../src/H5Omessage.c line 188 in H5O_msg_append_oh(): unable to create new message in header
        major: Attribute
        minor: Unable to insert object
      #012: ../../../src/H5Omessage.c line 228 in H5O_msg_append_real(): unable to create new message
        major: Object header
        minor: No space available for allocation
      #013: ../../../src/H5Omessage.c line 1940 in H5O_msg_alloc(): unable to allocate space for message
        major: Object header
        minor: Unable to initialize object
      #014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header message is too large
        major: Object header
        minor: Unable to initialize object

    Traceback (most recent call last):
      File "00_build_dataset.tmp.py", line 52, in <module>
        dump_in_hdf5(**vars(args))
      File "00_build_dataset.tmp.py", line 32, in dump_in_hdf5
        data_api.Subject)
      File "/usr/lib/python2.7/dist-packages/tables/file.py", line 770, in createTable
        chunkshape=chunkshape, byteorder=byteorder)
      File "/usr/lib/python2.7/dist-packages/tables/table.py", line 832, in __init__
        byteorder, _log)
      File "/usr/lib/python2.7/dist-packages/tables/leaf.py", line 291, in __init__
        super(Leaf, self).__init__(parentNode, name, _log)
      File "/usr/lib/python2.7/dist-packages/tables/node.py", line 296, in __init__
        self._v_objectID = self._g_create()
      File "/usr/lib/python2.7/dist-packages/tables/table.py", line 983, in _g_create
        self._v_new_title, self.filters.complib or '', obversion )
      File "tableExtension.pyx", line 195, in tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
    tables.exceptions.HDF5ExtError: Problems creating the table

I think that the size of the column is too large (if I remove the Image field, everything works perfectly). So what is the best way to store the images (while keeping the relation)? I have read various posts about this subject on the web but could not find a definitive answer (the most helpful was http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables).

I was thinking of creating an extendable array and storing each image in the same order as the subjects. However, I would feel more comfortable if the subject Id could be stored too (to join the tables).

Any help?

Mathieu
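[Editorial sketch, not part of the thread: a minimal illustration of the extendable-array layout described above, with the images in an EArray and the subject information in a small table, joined implicitly by row index. It uses the PyTables 2.x API that appears later in the thread; the file name, the Info description, and the toy data are illustrative assumptions.]

    import numpy as np
    import tables

    IMAGE_SHAPE = (121, 145, 121)

    class Info(tables.IsDescription):
        # one row per image; row index i corresponds to images[i] below
        Id = tables.UInt16Col()

    h5file = tables.openFile("subjects.h5", mode="w")        # hypothetical file name
    info = h5file.createTable(h5file.root, "info", Info, "Subject information")
    images = h5file.createEArray(h5file.root, "images",
                                 tables.Float32Atom(), (0,) + IMAGE_SHAPE,
                                 "MRI images", expectedrows=300)

    row = info.row
    for subject_id in range(3):                              # toy data: 3 images
        row["Id"] = subject_id
        row.append()
        images.append(np.ones((1,) + IMAGE_SHAPE, dtype=np.float32))
    info.flush()

    # the image belonging to table row i is retrieved with images[i]
    first_image = images[0]
    print first_image.shape                                  # (121, 145, 121)

    h5file.close()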
Re: [Pytables-users] Storing large images in PyTables

On 05/07/2013 00:31, Anthony Scopatz wrote:
> On Thu, Jul 4, 2013 at 4:13 PM, Mathieu Dubois wrote:
>> [...]
>>
>> I think that the size of the column is too large (if I remove the Image field, everything works perfectly).
>
> Hi Mathieu,
>
> This shouldn't be the case. What is the value of IMAGE_SIZE?

IMAGE_SIZE is a tuple containing (121, 145, 121).
Re: [Pytables-users] Storing large images in PyTables
Hi,

Sorry for the late response. First of all, I have managed to achieve what I wanted to do in a different way. Also, the code Francesc sent works well (I had to adapt it because I use version 2.3.1 under Ubuntu 12.04).

I was able to reproduce something similar with a class like this (copied & pasted from the tutorial):

    import tables as tb
    import numpy as np

    class Subject(tb.IsDescription):
        # Subject information
        Id = tb.UInt16Col()
        Image = tb.Float32Col(shape=(121, 145, 121))

    h5file = tb.openFile("tutorial1.h5", mode="w", title="Test file")
    group = h5file.createGroup("/", 'subject', 'Subject information')
    table = h5file.createTable(group, 'readout', Subject, "Readout example")
    subject = table.row
    for i in xrange(10):
        subject['Id'] = i
        subject['Image'] = np.ones((121, 145, 121))
        subject.append()

This code works well too, so I don't really know why nothing was working yesterday: it was the same class and a very similar program. I will try to investigate this later.

Thanks for everything,
Mathieu

On 05/07/2013 16:54, Anthony Scopatz wrote:
> On Fri, Jul 5, 2013 at 8:40 AM, Francesc Alted wrote:
>> On 7/5/13 1:33 AM, Mathieu Dubois wrote:
>>>>> tables.exceptions.HDF5ExtError: Problems creating the table
>>>>>
>>>>> I think that the size of the column is too large (if I remove the Image field, everything works perfectly).
>>>>
>>>> Hi Mathieu,
>>>>
>>>> This shouldn't be the case. What is the value of IMAGE_SIZE?
>>>
>>> IMAGE_SIZE is a tuple containing (121, 145, 121).
>>
>> This is a bit large for a row in the Table object. My recommendation for these cases is to use an associated EArray with shape (0, 121, 145, 121) and then append the images there. You can always refer to an image by issuing a __getitem__() operation on the EArray object with the index of the row in the table. Easy as pie, and you will allow the compression library (in case you are using compression) to work much more efficiently for the table.
>
> Hi Francesc,
>
> I disagree that this shape is too large for a table. Here is a minimal example that works for me:
>
>     import tables as tb
>     import numpy as np
>
>     images = np.ones(100, dtype=[('id', np.uint16),
>                                  ('image', np.float32, (121, 145, 121))])
>
>     with tb.open_file('temp.h5', 'w') as f:
>         f.create_table('/', 'images', images)
>
> I think that there is something else going on with the initialization, but Mathieu hasn't given us enough information to figure it out =/. A minimal failing script would be super helpful here!
>
> (BTW Mathieu, Tables can also take advantage of compression, though Francesc's solution is nicer for a lot of reasons too.)
>
> Be Well
> Anthony
>
>> HTH,
>>
>> -- Francesc Alted
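[Editorial sketch, not part of the thread, expanding on Anthony's compression remark: a hedged illustration of how a Filters instance can be attached both to an EArray (Francesc's suggestion) and to a Table, using the PyTables 2.x API. The complevel/complib settings and the file name are illustrative assumptions; zlib is always available in PyTables, while other compressors may depend on the installation.]

    import numpy as np
    import tables as tb

    filters = tb.Filters(complevel=5, complib='zlib')    # illustrative settings

    f = tb.openFile('compressed.h5', mode='w')           # hypothetical file name

    # Francesc's suggestion: images in a compressed, extendable array
    images = f.createEArray(f.root, 'images', tb.Float32Atom(),
                            (0, 121, 145, 121), filters=filters)
    images.append(np.zeros((1, 121, 145, 121), dtype=np.float32))

    # Anthony's remark: the same Filters object can also be passed to a Table
    class Subject(tb.IsDescription):
        Id = tb.UInt16Col()
        Image = tb.Float32Col(shape=(121, 145, 121))

    table = f.createTable(f.root, 'subjects', Subject, "Compressed table",
                          filters=filters)
    row = table.row
    row['Id'] = 0
    row['Image'] = np.zeros((121, 145, 121), dtype=np.float32)
    row.append()
    table.flush()

    f.close()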
[Pytables-users] PyTables and Multiprocessing
Hello,

I wanted to use PyTables in conjunction with multiprocessing for some embarrassingly parallel tasks. However, it seems that it is not possible. In the following (very stupid) example, X is a CArray of shape (100, 10) stored in the file test.hdf5:

    import random
    import multiprocessing

    import tables

    n_features = 10  # number of columns of X

    # Reload the data
    h5file = tables.openFile('test.hdf5', mode='r')
    X = h5file.root.X

    # Use multiprocessing to perform a simple computation (column average)
    def f(X):
        name = multiprocessing.current_process().name
        column = random.randint(0, n_features)
        print '%s use column %i' % (name, column)
        return X[:, column].mean()

    p = multiprocessing.Pool(2)
    col_mean = p.map(f, [X, X, X])

When executing it, I get the following error:

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed

I have googled for weakref and pickle but can't find a solution. Any help?

By the way, I have noticed that by slicing a CArray, I get a numpy array (I created the HDF5 file with numpy). Therefore, everything is copied to memory. Is there a way to avoid that?

Mathieu
Re: [Pytables-users] PyTables and Multiprocessing
On 11/07/2013 21:56, Anthony Scopatz wrote:
> On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois wrote:
>> [...]
>
> Hello Mathieu,
>
> I have used multiprocessing and files opened in read mode many times, so I am not sure what is going on here.

Thanks for your answer. Maybe you can point me to a working example?

> Could you provide the test.hdf5 file so that we could try to reproduce this?

Here is the script that I have used to generate the data:

    import tables
    import numpy

    # Create data & store it
    n_features = 10
    n_obs = 100
    X = numpy.random.rand(n_obs, n_features)

    h5file = tables.openFile('test.hdf5', mode='w')
    Xatom = tables.Atom.from_dtype(X.dtype)
    Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
    Xhdf5[:] = X
    h5file.close()

I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu 12.04 (libhdf5 is 1.8.4-patch1).

>> By the way, I have noticed that by slicing a CArray, I get a numpy array (I created the HDF5 file with numpy). Therefore, everything is copied to memory. Is there a way to avoid that?
>
> Only the slice that you ask for is brought into memory, and it is returned as a non-view numpy array.

OK, I will be careful about that.

> Be Well
> Anthony

Mathieu
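[Editorial sketch, not part of the thread, illustrating Anthony's point about slicing: indexing the CArray with a slice reads only the requested part from disk and returns it as an ordinary numpy array, so reading X[:] can be avoided when only a column or a block is needed. File and node names follow the generation script above.]

    import tables

    h5file = tables.openFile('test.hdf5', mode='r')
    X = h5file.root.X            # CArray of shape (100, 10); nothing is read yet

    col = X[:, 3]                # reads only column 3 (a (100,) numpy array, not a view)
    block = X[10:20, :]          # reads only rows 10..19
    print col.mean(), block.shape

    h5file.close()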
Re: [Pytables-users] PyTables and Multiprocessing
Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my code around this trick, but I'm not sure it's possible because I use a framework that needs arrays.

Can somebody explain what is going on? I was thinking that PyTables keeps a weakref to the file for lazy loading, but I'm not sure how. In any case, the PyTables community is very helpful.

Thanks,
Mathieu

On 12/07/2013 00:44, Anthony Scopatz wrote:
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The following works for me on v3.0:
>
>     import tables
>     import random
>     import multiprocessing
>
>     # Use multiprocessing to perform a simple computation (column average)
>     def f(filename):
>         h5file = tables.openFile(filename, mode='r')
>         name = multiprocessing.current_process().name
>         column = random.randint(0, 10)
>         print '%s use column %i' % (name, column)
>         rtn = h5file.root.X[:, column].mean()
>         h5file.close()
>         return rtn
>
>     p = multiprocessing.Pool(2)
>     col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])
>
> Be well
> Anthony
>
> On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois wrote:
>> [...]
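[Editorial note on Mathieu's open question, plausible but not confirmed in the thread: multiprocessing pickles every argument it sends to a worker, and a PyTables node holds a reference back to its open File, whose object graph contains weakref objects (PyTables tracks live nodes with weak references); pickle refuses to serialize weakrefs, hence the PicklingError. A minimal check (an assumption-laden sketch reusing test.hdf5 from the earlier script) reproduces the failure without multiprocessing:]

    import pickle
    import tables

    h5file = tables.openFile('test.hdf5', mode='r')
    X = h5file.root.X          # a CArray node, tied to the open file

    try:
        pickle.dumps(X)        # multiprocessing does this implicitly for pool arguments
    except Exception as e:     # typically a pickling error mentioning weakref
        print 'cannot pickle the CArray node:', e

    h5file.close()

Passing only picklable things to the workers (the filename, plus whatever indices each worker needs) and opening the file inside each process, as in Anthony's snippet, sidesteps the problem.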