----- Original Message ----
> From: FrancescAlted <fal...@pytables.org>
> To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> Sent: Sat, March 26, 2011 7:00:44 AM
> Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this 
>be a bug?
> 
> On Friday 25 March 2011 21:12:50 Adriano Vilela Barbosa wrote:
> > > Probably not, but as I said before, trying to pack binary data as
> > > strings is asking for problems.  Please use a bytes array instead.
> > > If what you are after is performance, then I'd say that
> > > Blosc/VLArray is the way to go.
> > 
> > I understand.  As I said before, I was using strings because that's
> > what the OpenCV Python bindings use to represent image data (though
> > they've been moving towards numpy in their latest releases).
> > Actually, representing byte streams as strings seems to be the
> > standard in Python 2.x, which was kind of surprising to me when I
> > first started programming in Python.
> 
> Exactly, and this is why the Python crew introduced the bytearray
> object in Python 2.6.  See more info on this in:
> 
> http://docs.python.org/whatsnew/2.6.html#pep-3112-byte-literals

Yes. I had read a little bit about bytearrays in Python 2.6. Thanks for the
link anyway.
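
For the record, here is a minimal sketch of what moving from strings to a
bytearray could look like for image data. The tiny 2x3 "image" and the names
pixels, raw, buf and restored are made up for illustration; this is not what
the OpenCV bindings do internally.

```python
import numpy as np

# A tiny made-up "image": 2 rows x 3 columns of unsigned bytes.
pixels = np.arange(6, dtype=np.uint8).reshape(2, 3)

# Immutable byte string (a plain str in Python 2, bytes in Python 3).
raw = pixels.tobytes()

# A bytearray is a mutable copy of the same bytes.
buf = bytearray(raw)
buf[0] = 255  # in-place edit, which an immutable string does not allow

# View the bytearray as a numpy array again, without another copy.
restored = np.frombuffer(buf, dtype=np.uint8).reshape(2, 3)
print(restored)
```

The original pixels array is left untouched, because bytearray(raw) copies the
data.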

> 
> > > Could you send a self-contained example reproducing your problem?
> > 
> > Please, see the code below.
> 
> Okay.  The problem was twofold.  First of all, a bug in the way
> PyTables deals with the defaults caused the MemoryError (this has been
> fixed in trunk).  Secondly, and due to HDF5 limitations, you cannot use
> atoms that are larger than 64 KB.  The canonical way to handle this is
> to add more dimensions to the datasets in HDF5 and then use the slice
> selection capabilities to retrieve the images.  Look at this:

Actually, what you did below was the first thing I tried when moving away from
strings. However, it resulted in my code running dozens of times slower and my
HDF files being considerably larger. That's why I tried using bigger atoms (one
atom per optical flow frame), to see if this would run faster and/or produce
smaller files, and then I ran into the error I reported.

However, I later noticed that the shape of your array is

array_shape = (n_frames, n_rows, n_cols)

whereas I had tried

array_shape = (n_rows, n_cols, n_frames)

This makes a huge difference. Using a shape (n_frames, n_rows, n_cols) for the
CArray results in the code running only about 15% slower and producing a file
only about 10% bigger when compared to using strings. This is much better than
the results I was getting when using a shape (n_rows, n_cols, n_frames). I
guess this has to do with the way the data is laid out on disk?
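
To make my guess concrete, here is a minimal in-memory NumPy sketch of the two
layouts. It only illustrates row-major (C) ordering and does not touch PyTables
or HDF5; the real on-disk behaviour also depends on how the CArray is chunked,
which I am not modelling here. The names frames_first and frames_last are mine.

```python
import numpy as np

n_frames, n_rows, n_cols = 2, 480, 720

# Layout A: the frame index comes first, so each frame is one solid block.
frames_first = np.zeros((n_frames, n_rows, n_cols), dtype=np.int16)

# Layout B: the frame index comes last, so frames are interleaved per pixel.
frames_last = np.zeros((n_rows, n_cols, n_frames), dtype=np.int16)

# Selecting one whole frame in each layout.
one_frame_a = frames_first[0]
one_frame_b = frames_last[:, :, 0]

# Strides are the byte steps taken along each axis.
print(frames_first.strides)
print(frames_last.strides)

# Only layout A yields a frame that is contiguous in memory.
print(one_frame_a.flags['C_CONTIGUOUS'])
print(one_frame_b.flags['C_CONTIGUOUS'])
```

In layout A the bytes of a frame sit next to each other, while in layout B
every pixel of a frame is separated by the same pixel of the other frames. If
something similar happens on disk, that might explain the difference I saw.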

As for the atom size limit (64 kB), I guess that doesn't apply to string atoms?
When using strings, I construct the atom in the following way

array_atom = tables.StringAtom(len(matrix.tostring()))

where len(matrix.tostring()) = 691200 bytes = 675 kB.

I mean, the size of the string atom is well above the 64 kB limit and yet it
doesn't produce any errors.
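
The arithmetic behind those numbers can be checked with a short sketch. I am
assuming a 480x720 matrix of 16-bit integers here, which is what the byte count
implies (480 * 720 * 2 = 691200). I use tobytes(), the current name for
tostring() in NumPy, and the variable names atom_size and limit are mine.

```python
import numpy as np

n_rows, n_cols = 480, 720

# Assumption: one frame holds 16-bit integers (2 bytes per value).
matrix = np.zeros((n_rows, n_cols), dtype=np.int16)

# Size in bytes of the string that would become a single atom.
atom_size = len(matrix.tobytes())

# The 64 KB atom limit mentioned above, in bytes.
limit = 64 * 1024

print(atom_size)           # size in bytes
print(atom_size / 1024.0)  # size in kB
print(atom_size > limit)   # does one frame exceed the limit?
```

So a single frame is more than ten times the 64 KB figure.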

Thanks a lot for your help.

Adriano


> 
> import tables
> import numpy
> from time import time
> 
> # ----- Writing data to file ----- #
> 
> # Open the output file for writing
> fid = tables.openFile("carray_error.hdf","w")
> 
> # Create a table group
> fid.createGroup("/", 'table', 'Flow table')
> 
> # The number of rows and columns in a frame, and the number of frames
> n_rows = 480
> n_cols = 720
> n_frames = 2
> 
> # Create a numpy matrix (one frame) to be stored in the CArray
> matrix = numpy.random.randn(n_rows,n_cols)
> 
> # The CArray shape: the frame index comes first
> array_shape = (n_frames, n_rows, n_cols)
> 
> # The CArray atom
> array_atom = tables.Int16Atom()
> 
> # Create a CArray for holding horizontal flow values
> fid.createCArray(fid.root.table,'flow_x',array_atom,array_shape)
> 
> # Create a CArray for holding vertical flow values.  This is where we
> # get an error; working with smaller values of n_rows and n_cols works
> # fine though.
> fid.createCArray(fid.root.table,'flow_y',array_atom,array_shape)
> 
> t0 = time()
> for m in range(n_frames):
>     # Write frame m (the float values are cast to int16 on assignment)
>     fid.root.table.flow_x[m] = matrix
>     fid.root.table.flow_y[m] = matrix
> print "time to save a couple of matrices:", round(time()-t0, 3)
> 
> # ----- Reading data from file ----- #
> 
> print "flow_x:", fid.root.table.flow_x[0]
> print "flow_y:", fid.root.table.flow_y[0]
> 
> # Close the output file
> fid.close()
> 
> And the output:
> 
> time to save a couple of matrices: 0.004
> flow_x: [[ 0  0  0 ...,  0  1  0]
>  [ 1  0  0 ...,  0  0  0]
>  [ 1  0  0 ...,  0  0  0]
>  ..., 
>  [ 1  2 -1 ..., -1  0  1]
>  [ 2  0 -1 ...,  0  0 -1]
>  [-1  1  0 ..., -1  0  0]]
> flow_y: [[ 0  0  0 ...,  0  1  0]
>  [ 1  0  0 ...,  0  0  0]
>  [ 1  0  0 ...,  0  0  0]
>  ..., 
>  [ 1  2 -1 ..., -1  0  1]
>  [ 2  0 -1 ...,  0  0 -1]
>  [-1  1  0 ..., -1  0  0]]
> 
> Hope this helps,
> 
> -- 
> Francesc Alted
> 
> _______________________________________________
> Pytables-users  mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>

