----- Original Message ----
> From: FrancescAlted <fal...@pytables.org>
> To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> Sent: Thu, March 24, 2011 1:24:55 AM
> Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this
> be a bug?
>
> On Wednesday 23 March 2011 19:53:43 Adriano Vilela Barbosa wrote:
> > >From: FrancescAlted <fal...@pytables.org>
> > >To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> > >Sent: Wed, March 23, 2011 10:57:06 AM
> > >Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could
> > >this be a bug?
> > >
> > >
> > >2011/3/23 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
> > >
> > >This is not a bug, but rather a feature of NumPy. Look at this:
> > >>>>> import numpy as np
> > >>>>> a = np.array(['aa\x00\x00'])
> > >>>>> a[0]
> > >>
> > >>'aa' # hey! where have my trailing 0's gone?
> > >>
> > >>>>> a.data[:]
> > >>
> > >>'aa\x00\x00' # yeah, they still are in the data area of the array
> > >>
> > >>I'd recommend using the byte ('i1') type to achieve what you want:
> > >>>>> a.view('i1')
> > >>
> > >>array([97, 97, 0, 0], dtype=int8)
> > >>
> > >>Thank you very much for your explanation, but I still don't get it.
> > >>
> > >>Let's forget numpy for a moment and just say I want to store the
> > >>string 'aa\x00\x00' in a CArray. Each element of the CArray is a 4
> > >>element string.
> > >>
> > >>First, I create the CArray:
> > >>>>> import tables
> > >>>>> fid = tables.openFile("carray_test.hdf","w")
> > >>>>>
> > >>>>> fid.createGroup("/", 'table', 'Binary table')
> > >>>>> array_atom = tables.StringAtom(itemsize=4)
> > >>>>> array_shape = (1,)
> > >>>>>
> > >>>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> > >>
> > >>Now, I store the string 'aa\x00\x00' in the first row (which is the
> > >>only row in this example) of the CArray:
> > >>>>> fid.root.table.bin_table[0] = 'aa\x00\x00'
> > >>
> > >>Now, I do
> > >>
> > >>>>> fid.root.table.bin_table[0].data[:]
> > >>
> > >>'aa'
> > >>
> > >>So, it looks to me that the trailing \x00 elements of the string
> > >>are not being stored in the CArray. From my side, there's no numpy
> > >>involved; I'm just trying to store a string. What am I missing?
> >
> > You cannot avoid NumPy because PyTables uses NumPy behind the scenes
> > as an intermediate buffer area. What you are seeing is probably a
> > side effect of the 'feature' I mentioned before. Any
> > reason why you don't want to use a byte type instead of a string?
> >
> > >FrancescAlted
> >
> > Hi again,
> >
> > I'm happy to use bytes instead of strings. The reason I was using
> > strings is that, as someone new to Python and numpy, I thought
> > strings were the only way of dealing with individual bytes. Also,
> > because of this problem I'm having with strings, I tried storing the
> > numpy arrays directly into the HDF file, but the performance was
> > much worse and the file size much larger.
> >
> > So, going back to my previous example, I guess the only things I need
> > to change are the Atom object used to construct the CArray and the
> > use of the numpy array's view() method instead of tostring().
> >
> > >>> import numpy
> > >>> import tables
> > >>> fid = tables.openFile("carray_test.hdf","w")
> > >>> fid.createGroup("/", 'table', 'Binary table')
> > >>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
> > >>> array_shape = (1,)
> > >>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> > >>> a = numpy.array(['aa\x00\x00'])
> > >>> fid.root.table.bin_table[0] = a.view('b')
> > >>> fid.root.table.bin_table[0].data[:]
> >
> > 'aa\x00\x00'
>
> Ah! Now I see where the problem was (the assignment). Thanks for
> pointing it out.
Yes, the problem occurs when assigning a string (any string, not only one
obtained from a numpy array) to the CArray. The trailing '\x00' items are simply
lost. In the examples you gave before with numpy arrays, you could still see
that the trailing '\x00' elements were there by doing a.data[:]; however, after
assigning the string to the CArray, even if I do
    fid.root.table.bin_table[0].data[:]
I can't see them. Is this really the way this is supposed to work?
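The NumPy behaviour discussed above can be reproduced without PyTables. The following is only a minimal sketch, assuming Python 3 and a recent NumPy (the thread itself uses Python 2); it shows that element access on a fixed-width string array drops trailing NUL bytes while the buffer keeps them, and that a byte view exposes all four values. It says nothing about what PyTables writes to the file.

```python
import numpy as np

# Fixed-width byte-string array: each element occupies exactly 4 bytes.
a = np.array([b"aa\x00\x00"], dtype="S4")

elem = a[0]                 # element access strips the trailing NUL bytes
raw = a.tobytes()           # the underlying buffer still holds all 4 bytes
as_bytes = a.view(np.int8)  # reinterpret the same memory as signed bytes

print(len(elem))            # length of the element as NumPy returns it
print(len(raw))             # length of the raw buffer
print(as_bytes.tolist())    # individual byte values, zeros included

# A byte-typed array built from the same data keeps every value on access.
b = np.frombuffer(b"aa\x00\x00", dtype=np.uint8)
round_trip = b.tobytes()
```

Working with a byte dtype (as suggested earlier in the thread) avoids the string semantics entirely, so zeros are treated as ordinary values rather than padding.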
>
> > Is this right, or is there a more efficient way of doing it?
>
> Well, I don't fully understand why you are converting to strings prior
> to saving the data into the CArray, because the CArray should support
> compression and be pretty fast too. Are you really getting a significant
> speed-up by converting to strings?
>
> If you want to continue down the conversion path, I'd also try a VLArray
> where the elements have been previously compressed using the blosc
> package (https://github.com/FrancescAlted/python-blosc).
In my previous, long (sorry about that) email I told you the reason I'm using
strings: because of OpenCV. However, I converted my OpenCV images (actually,
optical flow frames) to numpy arrays and I'm trying to store them in a CArray.
The data can be seen as a (n_rows, n_cols, n_frames) array, where n_rows and
n_cols are the number of rows and columns in each frame, respectively, and
n_frames is the number of frames. The optical flow values are represented as
int16. Initially, I did
    array_shape = (n_rows,n_cols,n_frames)
    array_atom = tables.Int16Atom()
and that works fine, although it is much slower and results in much bigger
files (compared to the string approach). Next, I did
    array_shape = (n_frames,)
    array_atom = tables.Int16Atom((n_rows,n_cols))
in the hope that this would be faster and compress better. However,
when creating the second CArray (I need two of them, for the horizontal and
vertical pixel displacements) I get the following error:
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/tables/file.py", line 781, in createCArray
    chunkshape=chunkshape, byteorder=byteorder)
  File "/usr/lib/python2.6/dist-packages/tables/carray.py", line 220, in __init__
    byteorder, _log)
  File "/usr/lib/python2.6/dist-packages/tables/leaf.py", line 290, in __init__
    super(Leaf, self).__init__(parentNode, name, _log)
  File "/usr/lib/python2.6/dist-packages/tables/node.py", line 296, in __init__
    self._v_objectID = self._g_create()
  File "/usr/lib/python2.6/dist-packages/tables/carray.py", line 230, in _g_create
    return self._g_create_common(self.nrows)
  File "/usr/lib/python2.6/dist-packages/tables/carray.py", line 253, in _g_create_common
    self._v_objectID = self._createCArray(self._v_new_title)
  File "hdf5Extension.pyx", line 877, in tables.hdf5Extension.Array._createCArray
MemoryError
This is the memory error I mentioned before. Any ideas why this happens?
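To make the difference between the two layouts concrete, here is a small arithmetic sketch of the sizes involved. The frame dimensions are hypothetical (the thread does not state the real ones), and the sketch does not diagnose the MemoryError; it only shows that both layouts hold the same total amount of data while the size of a single array element differs greatly.

```python
import struct

# Hypothetical dimensions, chosen only for illustration.
n_rows, n_cols, n_frames = 480, 640, 100

itemsize = struct.calcsize("=h")  # bytes in one int16 value

# Layout 1: shape (n_rows, n_cols, n_frames) with a scalar Int16Atom.
# Each array element is a single int16.
bytes_per_element_scalar = itemsize

# Layout 2: shape (n_frames,) with Int16Atom((n_rows, n_cols)).
# Each array element is a whole frame.
bytes_per_element_frame = n_rows * n_cols * itemsize

# Total payload is identical for both layouts.
total_bytes = n_rows * n_cols * n_frames * itemsize

print(bytes_per_element_scalar, bytes_per_element_frame, total_bytes)
```

With per-frame elements this large, it may be worth checking which chunkshape is being chosen for the second layout, although whether that is related to the error is not established here.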
>
> Good luck!
>
> --
> FrancescAlted
Thank you very much,
Adriano