>
>From: FrancescAlted <fal...@pytables.org>
>To: Discussion list for PyTables <pytables-users@lists.sourceforge.net>
>Sent: Wed, March 23, 2011 10:57:06 AM
>Subject: Re: [Pytables-users] Problem writing strings to a CArray. Could this be a bug?
>
>
>2011/3/23 Adriano Vilela Barbosa <adriano.vil...@yahoo.com>
>
>This is not a bug, but rather a feature of NumPy.  Look at this:
>>
>>>>> import numpy as np
>>>>> a = np.array(['aa\x00\x00'])
>>>>> a[0]
>>'aa'          # hey! where have my trailing 0's gone?
>>>>> a.data[:]
>>'aa\x00\x00'  # yeah, they still are in the data area of the array
>>
>>I'd recommend using the byte ('i1') type to achieve what you want:
>>
>>>>> a.view('i1')
>>array([97, 97,  0,  0], dtype=int8)
>>
>>Thank you very much for your explanation, but I still don't get it.
>>
>>Let's forget numpy for a moment and just say I want to store the string
>>'aa\x00\x00' in a CArray. Each element of the CArray is a 4 element string.
>>First, I create the CArray:
>>
>>>>> import tables
>>>>> fid = tables.openFile("carray_test.hdf","w")
>>
>>>>> fid.createGroup("/", 'table', 'Binary table')
>>>>> array_atom = tables.StringAtom(itemsize=4)
>>>>> array_shape = (1,)
>>
>>>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>
>>Now, I store the string 'aa\x00\x00' in the first row (which is the only row
>>in this example) of the CArray:
>>
>>>>> fid.root.table.bin_table[0] = 'aa\x00\x00'
>>
>>Now, I do
>>
>>>>> fid.root.table.bin_table[0].data[:]
>>'aa'
>>
>>So, it looks to me that the trailing \x00 elements of the string are not being
>>stored in the CArray. From my side, there's no numpy involved; I'm just trying
>>to store a string. What am I missing?
>>

You cannot avoid NumPy because PyTables uses NumPy behind the scenes as an
intermediate buffer area.  What you are seeing is probably a side effect of
the 'feature' I mentioned before.  Any reason why you don't want to use a
byte type instead of a string?
 -- 
>FrancescAlted
>
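
The NumPy behaviour described above can be checked in isolation, without
PyTables. The following is a minimal sketch, assuming Python 3 and a recent
NumPy (the sessions in this thread are Python 2, where a plain string literal
is already a byte string, so a bytes literal is used here instead). The
variable names are illustrative only.

```python
import numpy as np

# One fixed-width item of 4 bytes; the last two bytes are NULs.
# Assumption: a bytes literal, so NumPy picks the byte-string dtype 'S4'.
a = np.array([b"aa\x00\x00"])

as_scalar = a[0]          # indexing returns the item with trailing NULs stripped
raw = a.tobytes()         # the underlying buffer still holds all 4 bytes
as_int8 = a.view("i1")    # same buffer reinterpreted as 4 signed bytes

print(a.dtype, repr(as_scalar), repr(raw), as_int8)
```

The point of the sketch is that the stripping happens only when an item is
read back as a string; the int8 view exposes every byte, NULs included.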

Hi again,

I'm happy to use bytes instead of strings. The reason I was using strings is
that, as someone new to Python and numpy, I thought strings were the only way
of dealing with individual bytes. Also, because of this problem with strings,
I tried storing the numpy arrays directly in the HDF file, but performance was
considerably worse and the file considerably larger.

So, going back to my previous example, I guess the only things I need to
change are the Atom object used to construct the CArray and the numpy array
method I call: view() instead of tostring().

>>> import numpy
>>> import tables
>>> fid = tables.openFile("carray_test.hdf","w")
>>> fid.createGroup("/", 'table', 'Binary table')
>>> array_atom = tables.Atom.from_dtype(numpy.dtype((numpy.byte, (4,))))
>>> array_shape = (1,)
>>> fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)
>>> a = numpy.array(['aa\x00\x00'])
>>> fid.root.table.bin_table[0] = a.view('b')
>>> fid.root.table.bin_table[0].data[:]
'aa\x00\x00'

Is this right, or is there a more efficient way of doing it?

Thank you very much. Your help is greatly appreciated.

Adriano
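
For reference, the byte-based approach discussed in this message can be tried
with NumPy alone. This is only a sketch under stated assumptions: Python 3, a
recent NumPy, and two hypothetical helpers (pack_records, unpack_records) that
are not part of NumPy or PyTables. It builds an int8 array with 4 bytes per
row, the same layout as the 4-byte int8 atom used in the session above, and
shows that trailing NULs survive the round trip.

```python
import numpy as np

RECORD_SIZE = 4  # bytes per record, matching the 4-byte items in this thread


def pack_records(records):
    """Hypothetical helper: equal-length byte strings -> (n, RECORD_SIZE) int8 array."""
    for r in records:
        if len(r) != RECORD_SIZE:
            raise ValueError("each record must be exactly %d bytes" % RECORD_SIZE)
    # Interpret the concatenated bytes as signed 8-bit integers, one row per record.
    flat = np.frombuffer(b"".join(records), dtype=np.int8)
    return flat.reshape(len(records), RECORD_SIZE)


def unpack_records(arr):
    """Hypothetical helper: recover the byte strings, trailing NULs included."""
    return [row.tobytes() for row in arr]


packed = pack_records([b"aa\x00\x00", b"\x00\x01\x02\x03"])
restored = unpack_records(packed)
print(packed.shape, restored)
```

Because the data never passes through a NumPy string type, nothing is
stripped when a row is read back.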


