On Friday 25 March 2011 21:12:50 Adriano Vilela Barbosa wrote:
> > Probably not, but as I said before, trying to pack binary data as
> > strings is asking for problems. Please use a bytes array instead.
> > If what you are after is performance, then I'd say that
> > Blosc/VLArray is the way to go.
>
> I understand. As I said before, I was using strings because that's
> what the OpenCV Python bindings use to represent image data (though
> they've been moving towards numpy in their latest releases).
> Actually, representing byte streams as strings seems to be the
> standard in Python 2.x, which was kind of surprising to me when I
> first started programming in Python.

Exactly, and this is why the Python developers introduced the
bytearray object in Python 2.6. For more information, see:

http://docs.python.org/whatsnew/2.6.html#pep-3112-byte-literals

> > Could you send a self-contained example reproducing your problem?
>
> Please, see the code below.

Okay. The problem was twofold. First, a bug in the way PyTables
handles defaults caused the MemoryError (this has been fixed in
trunk). Second, due to HDF5 limitations, you cannot use atoms that
are larger than 64 KB. The canonical way to handle this is to add
more dimensions to the datasets in HDF5 and then use the slice
selection capabilities to retrieve the images. Look at this:

import tables
import numpy
from time import time

# ----- Writing data to file ----- #

# Open the output file for writing
fid = tables.openFile("carray_error.hdf", "w")

# Create a table group
fid.createGroup("/", 'table', 'Flow table')

# The number of rows and columns in a frame, and the number of frames
n_rows = 480
n_cols = 720
n_frames = 2

# Create a numpy matrix (one frame) to be stored in the CArray
matrix = numpy.random.randn(n_rows, n_cols)

# The CArray shape: frames are the leading dimension
array_shape = (n_frames, n_rows, n_cols)

# The CArray atom: a scalar Int16, well below the 64 KB atom limit
array_atom = tables.Int16Atom()

# Create a CArray for holding horizontal flow values
fid.createCArray(fid.root.table, 'flow_x', array_atom, array_shape)

# Create a CArray for holding vertical flow values. This is where we
# get an error; working with smaller values of n_rows and n_cols works
# fine though.
fid.createCArray(fid.root.table, 'flow_y', array_atom, array_shape)

t0 = time()
for m in range(n_frames):
    # Write the matrix into frame m (values are stored as Int16)
    fid.root.table.flow_x[m] = matrix
    fid.root.table.flow_y[m] = matrix
print "time to save a couple of matrices:", round(time()-t0, 3)

# ----- Reading data from file ----- #

# Slice selection retrieves a single frame
print "flow_x:", fid.root.table.flow_x[0]
print "flow_y:", fid.root.table.flow_y[0]

# Close the output file
fid.close()

And the output:

time to save a couple of matrices: 0.004
flow_x: [[ 0  0  0 ...,  0  1  0]
 [ 1  0  0 ...,  0  0  0]
 [ 1  0  0 ...,  0  0  0]
 ...,
 [ 1  2 -1 ..., -1  0  1]
 [ 2  0 -1 ...,  0  0 -1]
 [-1  1  0 ..., -1  0  0]]
flow_y: [[ 0  0  0 ...,  0  1  0]
 [ 1  0  0 ...,  0  0  0]
 [ 1  0  0 ...,  0  0  0]
 ...,
 [ 1  2 -1 ..., -1  0  1]
 [ 2  0 -1 ...,  0  0 -1]
 [-1  1  0 ..., -1  0  0]]

Hope this helps,

-- 
Francesc Alted

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
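
As a quick check of the 64 KB atom limit mentioned in the reply, the sketch below does the size arithmetic in plain Python (it does not call PyTables or HDF5). It assumes a hypothetical layout in which one whole 480x720 Int16 frame is packed into a single atom, and compares that with the scalar Int16 atom plus extra dimensions used in the example. The names ATOM_LIMIT_BYTES and fits_in_atom are made up for illustration and are not part of any library.

```python
# Size arithmetic only; a rough sketch, not PyTables code.

# The atom size limit quoted in the message (64 KB), in bytes.
ATOM_LIMIT_BYTES = 64 * 1024

n_rows, n_cols, n_frames = 480, 720, 2
int16_bytes = 2  # one Int16 element occupies 2 bytes


def fits_in_atom(nbytes, limit=ATOM_LIMIT_BYTES):
    """Hypothetical helper: True if an atom of nbytes is within the limit."""
    return nbytes <= limit


# Hypothetical layout A: a whole frame packed into one atom.
frame_atom_bytes = n_rows * n_cols * int16_bytes

# Layout B (as in the example): scalar atom, frames as an extra dimension.
scalar_atom_bytes = int16_bytes
array_shape = (n_frames, n_rows, n_cols)
total_bytes = n_frames * n_rows * n_cols * int16_bytes

print("frame-sized atom: %d bytes, fits: %s"
      % (frame_atom_bytes, fits_in_atom(frame_atom_bytes)))
print("scalar atom: %d bytes, fits: %s"
      % (scalar_atom_bytes, fits_in_atom(scalar_atom_bytes)))
print("dataset shape %s holds %d bytes in total"
      % (array_shape, total_bytes))
```

Under these assumptions a single frame is 691200 bytes (675 KB), roughly ten times the limit, while the scalar atom is only 2 bytes. The dataset as a whole can still be large, because the limit applies to the atom and not to the array built from it.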