Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?

Francesc Alted Sat, 26 Mar 2011 07:01:16 -0700

On Friday 25 March 2011 21:12:50 Adriano Vilela Barbosa wrote:
> > Probably not, but as I said before, trying to pack binary data as
> > strings is asking for problems.  Please use a bytes array  instead.
> >  If what you are after is performance, then I'd say that
> > Blosc/VLArray is the way to go.
> 
> I understand. As I said before, I was using strings because that's
> what the OpenCV Python bindings use to represent image data (though
> they've been moving towards numpy in their latest releases).
> Actually, representing byte streams as strings seems to be the
> standard in Python 2.x, which was kind of surprising to me when I
> first started programming in Python.


Exactly, and this is why the Python crew introduced the bytearray 
object in Python 2.6.  There is more info on this in:

http://docs.python.org/whatsnew/2.6.html#pep-3112-byte-literals
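
As a small illustration of the difference (only a sketch; the variable
names are made up for this example): a bytearray holds raw byte values
as integers and can be modified in place, so binary data does not get
mixed up with text strings.

```python
# Hypothetical raw pixel data: four bytes, including a NUL and 0xFF
raw = bytearray([0, 255, 16, 32])

# Elements are plain integers in range(256), not 1-character strings
first = raw[0]
last_two = list(raw[2:])

# Unlike str, a bytearray can be modified in place
raw[1] = 128

print("%d bytes, first=%d, second=%d" % (len(raw), first, raw[1]))
```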

> > Could you send a self-contained example reproducing your  problem?
> 
> Please, see the code below.

Okay.  The problem was twofold.  First of all, a bug in the way 
PyTables deals with the defaults caused the MemoryError (this has been 
fixed in trunk).  Secondly, due to HDF5 limitations, you cannot use 
atoms that are larger than 64 KB.  The canonical way to handle this is 
to add more dimensions to the datasets in HDF5 and then use the slice 
selection capabilities to retrieve the images.  Look at this:

import tables
import numpy
from time import time

# ----- Writing data to file ----- #

# Open the output file for writing
fid = tables.openFile("carray_error.hdf","w")

# Create a table group
fid.createGroup("/", 'table', 'Flow table')

# The number of rows and columns in a frame, and the number of frames
n_rows = 480
n_cols = 720
n_frames = 2

# Create a numpy matrix (one frame) to be stored in the CArray.
# randn() returns floats; they are cast to int16 when stored.
matrix = numpy.random.randn(n_rows,n_cols)

# The CArray shape
array_shape = (n_frames, n_rows, n_cols)

# The CArray atom
array_atom = tables.Int16Atom()

# Create a CArray for holding horizontal flow values
fid.createCArray(fid.root.table,'flow_x',array_atom,array_shape)

# Create a CArray for holding vertical flow values.  This is the call
# that failed in the original script; with a scalar atom and the frame
# dimensions moved into the shape, it works.
fid.createCArray(fid.root.table,'flow_y',array_atom,array_shape)

t0 = time()
for m in range(n_frames):
    # Write each frame into its own slot along the first dimension
    fid.root.table.flow_x[m] = matrix
    fid.root.table.flow_y[m] = matrix
print "time to save a couple of matrices:", round(time()-t0, 3)

# ----- Reading data from file ----- #

print "flow_x:", fid.root.table.flow_x[0]
print "flow_y:", fid.root.table.flow_y[0]

# Close the output file
fid.close()

And the output:

time to save a couple of matrices: 0.004
flow_x: [[ 0  0  0 ...,  0  1  0]
 [ 1  0  0 ...,  0  0  0]
 [ 1  0  0 ...,  0  0  0]
 ..., 
 [ 1  2 -1 ..., -1  0  1]
 [ 2  0 -1 ...,  0  0 -1]
 [-1  1  0 ..., -1  0  0]]
flow_y: [[ 0  0  0 ...,  0  1  0]
 [ 1  0  0 ...,  0  0  0]
 [ 1  0  0 ...,  0  0  0]
 ..., 
 [ 1  2 -1 ..., -1  0  1]
 [ 2  0 -1 ...,  0  0 -1]
 [-1  1  0 ..., -1  0  0]]
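
For the record, here is the arithmetic behind the 64 KB limit, plus the
kind of slice selection I mean.  This is only a sketch: a plain
in-memory numpy array stands in for the CArray (basic integer and slice
indexing on a CArray uses the same syntax), and the names frame_bytes,
atom_limit, frames and roi are made up for the illustration.

```python
import numpy

n_frames, n_rows, n_cols = 2, 480, 720

# One whole frame stored as a single atom would need this many bytes
frame_bytes = n_rows * n_cols * numpy.dtype('int16').itemsize
atom_limit = 64 * 1024  # the HDF5 atom size limit discussed above
print("frame: %d bytes, limit: %d bytes" % (frame_bytes, atom_limit))

# In-memory stand-in for a CArray of shape (n_frames, n_rows, n_cols)
frames = numpy.zeros((n_frames, n_rows, n_cols), dtype='int16')
frames[1] = 7  # fill the second frame with a constant value

# Retrieve a whole frame, or just a region of interest of one frame
second = frames[1]
roi = frames[1, 100:110, 200:220]
print("shapes: %s %s" % (second.shape, roi.shape))
```

With the frame dimensions in the shape, each element stays a 2-byte
scalar, and a selection like the last one only touches the requested
region of the dataset.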

Hope this helps,

-- 
Francesc Alted
