On Wednesday 23 March 2011 19:53:43 Adriano Vilela Barbosa wrote:
> >From: FrancescAlted <fal...@pytables.org>
> >To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
> >Sent: Wed, March 23, 2011 10:57:06 AM
> >Subject: Re: [Pytables-users] Problem writing strings to a CArray.
> >Could this be a bug?
> >
> >
> >2011/3/23 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
> >
> >This is not a bug, but rather a feature of NumPy.  Look at this:
> >>>>> import numpy as np
> >>>>> a = np.array(['aa\x00\x00'])
> >>>>> a[0]
> >>
> >>'aa'          # hey! where have my trailing 0's gone?
> >>
> >>>>> a.data[:]
> >>
> >>'aa\x00\x00'  # yeah, they are still in the data area of the array
> >>
> >>I'd recommend using the byte ('i1') type to achieve what you want:
> >>>>> a.view('i1')
> >>
> >>array([97, 97,  0,  0], dtype=int8)
> >>
> >>Thank you very much for your explanation, but I still don't get it.
> >>
> >>Let's forget numpy for a moment and just say I want to store the
> >>string 'aa\x00\x00' in a CArray. Each element of the CArray is a 4
> >>element string.
> >>
> >>First, I create the CArray:
> >>>>> import tables
> >>>>> fid = tables.openFile("carray_test.hdf","w")
> >>>>> 
> >>>>> fid.createGroup("/", 'table', 'Binary table')
> >>>>> array_atom = tables.StringAtom(itemsize=4)
> >>>>> array_shape = (1,)
> >>>>> 
> >>>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> >>
> >>Now, I store the string 'aa\x00\x00' in the first row (which is the
> >>only row in this example) of the CArray:
> >>>>> fid.root.table.bin_table[0] = 'aa\x00\x00'
> >>
> >>Now, I do
> >>
> >>>>> fid.root.table.bin_table[0].data[:]
> >>
> >>'aa'
> >>
> >>So, it looks to me that the trailing \x00 elements of the string
> >>are not being stored in the CArray. From my side, there's no numpy
> >>involved; I'm just trying to store a string. What am I missing?
> 
> You cannot avoid NumPy because PyTables uses NumPy behind the scenes
> as an intermediate buffer area.  What you are seeing is probably a
> secondary effect caused by the 'feature' I mentioned before.  Any
> reason why you don't want to use a byte type instead of a string?
> 
> >FrancescAlted
> 
> Hi again,
> 
> I'm happy to use bytes instead of strings. The reason I was using
> strings is that, as someone new to Python and numpy, I thought
> strings were the only way of dealing with individual bytes. Also,
> because of this problem I'm having with strings, I tried storing the
> numpy arrays directly in the HDF file, but performance was
> considerably worse and the file considerably larger.
> 
> So, going back to my previous example, I guess the only things I need
> to change are the Atom object used to construct the CArray and the
> numpy array method I call: view() instead of tostring().
> 
> >>> import numpy
> >>> import tables
> >>> fid = tables.openFile("carray_test.hdf","w")
> >>> fid.createGroup("/", 'table', 'Binary table')
> >>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
> >>> array_shape = (1,)
> >>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
> >>> a = numpy.array(['aa\x00\x00'])
> >>> fid.root.table.bin_table[0] = a.view('b')
> >>> fid.root.table.bin_table[0].data[:]
> 
> 'aa\x00\x00'

Ah!  I see where the problem was now (the assignment).  Thanks for 
pointing it out.
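
For anyone reading this in the archive, the NumPy behaviour quoted above
can be reproduced with the short sketch below.  It assumes Python 3, so
the strings are written as bytes literals, and it passes the dtype
explicitly; neither detail comes from the original session.

```python
import numpy as np

# Fixed-width byte-string array: one element, 4 bytes wide.
# (The explicit dtype is an assumption added for clarity.)
a = np.array([b'aa\x00\x00'], dtype='S4')

# Indexing returns a scalar with the trailing NUL bytes stripped.
stripped = a[0]

# The underlying buffer still holds all four bytes.
raw = a.tobytes()

# Viewing the same buffer as signed bytes exposes every byte, NULs included.
as_bytes = a.view('i1')

print(repr(stripped))
print(repr(raw))
print(as_bytes)
```

The view shares memory with the original array, so no copy is made when
switching from the string type to the byte type.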
 
> Is this right, or is there a more efficient way of doing it?

Well, I don't fully understand why you are converting to strings before 
saving the data into the CArray, since a CArray should support compression 
and be pretty fast too.  Are you really getting a significant speed-up 
by converting to strings?
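
As a rough illustration of storing the numeric data directly, here is a
minimal sketch of a compressed CArray.  It assumes the snake_case method
names of PyTables 3.x (open_file, create_carray) rather than the camelCase
names used earlier in this thread, and the array shape, file name and
zlib level are made up for the example.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical data: 250 rows of 4 signed bytes each.
data = np.zeros((250, 4), dtype='i1')
data[0] = [97, 97, 0, 0]          # the bytes of 'aa\x00\x00'

# Temporary file name chosen for this example only.
path = os.path.join(tempfile.mkdtemp(), "carray_test.h5")

# Compression settings are an assumption; tune them for real data.
filters = tables.Filters(complevel=5, complib="zlib")

with tables.open_file(path, "w") as fid:
    carr = fid.create_carray(fid.root, "bin_table",
                             atom=tables.Int8Atom(),
                             shape=data.shape,
                             filters=filters)
    carr[:] = data                # written as bytes, no string conversion

with tables.open_file(path, "r") as fid:
    restored = fid.root.bin_table[:]

print(restored[0])
```

Whether this beats the string route depends on the data and on the chunk
shape, so it is worth timing both on a realistic sample.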

If you want to stay with the conversion approach, I'd also try a VLArray 
whose elements have been compressed beforehand with the blosc 
package (https://github.com/FrancescAlted/python-blosc).
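
A sketch of that idea follows.  To keep it runnable without extra
packages it uses the standard-library zlib as a stand-in for blosc; with
python-blosc installed, blosc.compress() and blosc.decompress() would take
the place of the zlib calls.  The payloads, the file name and the PyTables
3.x method names are assumptions made for the example.

```python
import os
import tempfile
import zlib

import numpy as np
import tables

# Hypothetical variable-length payloads standing in for the converted records.
payloads = [b"aa\x00\x00" * 100, b"bb\x00\x00" * 250]

# Temporary file name chosen for this example only.
path = os.path.join(tempfile.mkdtemp(), "vlarray_test.h5")

with tables.open_file(path, "w") as fid:
    # Each row of the VLArray holds one compressed buffer of arbitrary length.
    vla = fid.create_vlarray(fid.root, "packed", atom=tables.UInt8Atom())
    for payload in payloads:
        packed = zlib.compress(payload)      # stand-in for blosc.compress
        vla.append(np.frombuffer(packed, dtype="u1"))

with tables.open_file(path, "r") as fid:
    # read() returns one uint8 array per row; decompress each back to bytes.
    restored = [zlib.decompress(row.tobytes())   # stand-in for blosc.decompress
                for row in fid.root.packed.read()]

print([len(r) for r in restored])
```

Because the rows are compressed before they reach PyTables, no Filters
object is set on the VLArray itself.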

Good luck!

-- 
Francesc Alted
