----- Original Message ----
> From: FrancescAlted <[email protected]>
> To: Discussion list for PyTables <[email protected]>
> Sent: Sat, March 26, 2011 7:00:44 AM
> Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?
>
> On Friday 25 March 2011 21:12:50, Adriano Vilela Barbosa wrote:
> > > Probably not, but as I said before, trying to pack binary data as
> > > strings is asking for problems. Please use a bytes array instead.
> > > If what you are after is performance, then I'd say that
> > > Blosc/VLArray is the way to go.
> >
> > I understand. As I said before, I was using strings because that's
> > what the OpenCV Python bindings use to represent image data (though
> > they've been moving towards numpy in their latest releases).
> > Actually, representing byte streams as strings seems to be the
> > standard in Python 2.x, which was kind of surprising to me when I
> > first started programming in Python.
>
> Exactly, and this is why the Python developers introduced the bytearray
> object in Python 2.6. See more on this in:
>
> http://docs.python.org/whatsnew/2.6.html#pep-3112-byte-literals
Yes. I had read a little about bytearrays in Python 2.6. Thanks for the link
anyway.
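Francesc's suggestion to keep binary image data in a bytes array rather than a string can be sketched as follows. This is a minimal illustration written for Python 3 (where `bytes` and `bytearray` are the binary types), using only the standard library's `array` module; the helper names `frame_to_bytes` and `bytes_to_frame` and the tiny 2x3 frame are hypothetical, chosen for illustration only:

```python
from array import array


def frame_to_bytes(pixels):
    """Pack a flat sequence of signed 16-bit values into a mutable bytearray."""
    return bytearray(array("h", pixels).tobytes())


def bytes_to_frame(buf):
    """Unpack a bytes-like buffer back into a list of signed 16-bit values."""
    values = array("h")
    values.frombytes(bytes(buf))
    return values.tolist()


# A tiny hypothetical 2x3 "frame", flattened row by row.
pixels = [0, 1, -1, 2, 0, -1]
buf = frame_to_bytes(pixels)
print(len(buf))                       # 12: 6 values x 2 bytes each
print(bytes_to_frame(buf) == pixels)  # True: the round trip is lossless
```

The point is that the buffer is explicitly binary, so its length and element type are unambiguous, unlike a `str` that merely happens to hold raw bytes.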
>
> > > Could you send a self-contained example reproducing your problem?
> >
> > Please, see the code below.
>
> Okay. The problem was twofold. First, a bug in the way PyTables handles
> default values caused the MemoryError (this has been fixed in trunk).
> Second, due to HDF5 limitations, you cannot use atoms larger than 64 KB.
> The canonical way to handle this is to add more dimensions to the
> datasets in HDF5 and then use slice selection to retrieve the images.
> Look at this:
Actually, what you did below was the first thing I tried when moving away from
strings. However, it made my code run dozens of times slower and my HDF files
considerably larger. That's why I tried using bigger atoms (one atom per optical
flow frame), to see if this would run faster and/or produce smaller files, and
that's when I ran into the error I reported.
However, I later noticed that the shape of your array is
    array_shape = (n_frames, n_rows, n_cols)
whereas I had tried
    array_shape = (n_rows, n_cols, n_frames)
This makes a huge difference. With shape (n_frames, n_rows, n_cols), the code
runs only about 15% slower and produces a file only about 10% bigger than when
using strings. This is much better than what I was getting with shape
(n_rows, n_cols, n_frames). I guess this has to do with the way the data is
laid out on disk?
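The disk-layout guess can be illustrated with a row-major (C-order) offset calculation, which is NumPy's default memory layout. This sketch ignores chunking, which PyTables CArrays use and which also shapes the on-disk layout, so treat it as intuition rather than an exact model of HDF5 storage; `flat_offset` is a hypothetical helper:

```python
def flat_offset(index, shape):
    """Return the row-major (C-order) flat position of `index` within `shape`."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


n_rows, n_cols, n_frames = 480, 720, 2

frames_first = (n_frames, n_rows, n_cols)  # frame index is the slowest axis
frames_last = (n_rows, n_cols, n_frames)   # frame index is the fastest axis

# Distance between two horizontally adjacent pixels of frame 0.
step_first = flat_offset((0, 0, 1), frames_first) - flat_offset((0, 0, 0), frames_first)
step_last = flat_offset((0, 1, 0), frames_last) - flat_offset((0, 0, 0), frames_last)
print(step_first)  # 1: pixels of a frame are adjacent
print(step_last)   # 2: pixels of a frame are n_frames elements apart

# Position of the last pixel of frame 0 in each layout.
last_first = flat_offset((0, n_rows - 1, n_cols - 1), frames_first)
last_last = flat_offset((n_rows - 1, n_cols - 1, 0), frames_last)
print(last_first)  # 345599: frame 0 is one contiguous block (first half)
print(last_last)   # 691198: frame 0 is interleaved across the whole array
```

Writing one frame at a time therefore touches a single contiguous region in the first layout, but scattered positions across the whole dataset in the second.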
As for the atom size limit (64 kB), I guess it doesn't apply to string atoms?
When using strings, I construct the atom as follows:
    array_atom = tables.StringAtom(len(matrix.tostring()))
where len(matrix.tostring()) = 691200 bytes = 675 kB.
That is, the size of the string atom is well above the 64 kB limit, and yet it
doesn't produce any errors.
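The sizes involved can be checked with a little arithmetic. This sketch assumes one frame of 16-bit values (2 bytes each, matching `Int16Atom` and the 691200 bytes reported above) and takes 64 KB to mean 65536 bytes; whether the limit also applies to string atoms is the open question here, and the sketch does not answer it:

```python
n_rows, n_cols = 480, 720
bytes_per_value = 2          # int16
atom_limit = 64 * 1024       # the 64 KB limit mentioned above, in bytes

frame_bytes = n_rows * n_cols * bytes_per_value
print(frame_bytes)                    # 691200
print(frame_bytes / 1024)             # 675.0 (KiB)
print(frame_bytes // atom_limit)      # 10: a frame is over ten times the limit
print(atom_limit // bytes_per_value)  # 32768 int16 values fit in 64 KB
```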
Thanks a lot for your help.
Adriano
>
> import tables
> import numpy
> from time import time
>
> # ----- Writing data to file ----- #
>
> # Open the output file for writing
> fid = tables.openFile("carray_error.hdf","w")
>
> # Create a table group
> fid.createGroup("/", 'table', 'Flow table')
>
> # The number of rows and columns in a frame, and the number of frames
> n_rows = 480
> n_cols = 720
> n_frames = 2
>
> # Create a NumPy matrix (one frame) to be stored in the CArrays; its
> # float64 values are converted to the atom's int16 type when written
> matrix = numpy.random.randn(n_rows,n_cols)
>
> # The CArray shape
> array_shape = (n_frames, n_rows, n_cols)
>
> # The CArray atom
> array_atom = tables.Int16Atom()
>
> # Create a CArray for holding horizontal flow values
> fid.createCArray(fid.root.table,'flow_x',array_atom,array_shape)
>
> # Create a CArray for holding vertical flow values. In the original
> # script (one large atom per frame), this is where the error occurred;
> # smaller values of n_rows and n_cols worked fine.
> fid.createCArray(fid.root.table,'flow_y',array_atom,array_shape)
>
> t0 = time()
> for m in range(n_frames):
>     # Write the test matrix into frame slot m via slice selection
>     fid.root.table.flow_x[m] = matrix
>     fid.root.table.flow_y[m] = matrix
> print "time to save a couple of matrices:", round(time()-t0, 3)
>
> # ----- Reading data from file ----- #
>
> print "flow_x:", fid.root.table.flow_x[0]
> print "flow_y:", fid.root.table.flow_y[0]
>
> # Close the output file
> fid.close()
>
> And the output:
>
> time to save a couple of matrices: 0.004
> flow_x: [[ 0 0 0 ..., 0 1 0]
> [ 1 0 0 ..., 0 0 0]
> [ 1 0 0 ..., 0 0 0]
> ...,
> [ 1 2 -1 ..., -1 0 1]
> [ 2 0 -1 ..., 0 0 -1]
> [-1 1 0 ..., -1 0 0]]
> flow_y: [[ 0 0 0 ..., 0 1 0]
> [ 1 0 0 ..., 0 0 0]
> [ 1 0 0 ..., 0 0 0]
> ...,
> [ 1 2 -1 ..., -1 0 1]
> [ 2 0 -1 ..., 0 0 -1]
> [-1 1 0 ..., -1 0 0]]
>
> Hope this helps,
>
> --
> Francesc Alted
>
> _______________________________________________
> Pytables-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>