Hello,

I'm having a problem with PyTables that's driving me nuts. I'm probably doing 
something wrong, but I wonder if it could be a bug.

Basically, I need to write a series of numpy arrays to an HDF file. Because of 
performance and compression requirements, I decided to convert each numpy array 
to a binary string and then write the resulting string to a CArray in the 
output HDF file. This works fine, as long as there are no trailing \x00 
elements in the string. For some reason, trailing \x00 elements are NOT copied 
to the CArray.

As an example, I attach a python script where I create a numpy array with two 
int16 elements [1, 2]. The string representation of this array is 
'\x01\x00\x02\x00'. When writing this string to the CArray, only the 
'\x01\x00\x02' part is copied over. Later, when trying to read this string from 
the CArray into a two int16 element numpy array, I run into an error, as the 
string only has 3 bytes (instead of four).

Does anyone know why this happens? Any help with this is greatly appreciated.

Thank you,

Adriano


---------------------------------------------------------------------


import tables
import numpy


# ----- Writing data to file ----- #

# The name of the output file
file_name = "carray_test.hdf"

# Open the output file for writing
fid = tables.openFile(file_name,"w")

# Create a table group
fid.createGroup("/", 'table', 'Binary table')

# Create a numpy vector to be stored in the CArray
matrix = numpy.array([1, 2],dtype='int16')

# The CArray shape
n_rows = 1
array_shape = (n_rows,)

# The CArray atom
array_atom = tables.StringAtom(itemsize=len(matrix.tostring()))

# Create the CArray
fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)

# Write 'matrix' to the CArray in the output file. For some reason,
# trailing \x00 elements in the binary string representation of the
# numpy array are not copied to the CArray. Why???
fid.root.table.bin_table[0] = matrix.tostring()

# Close the output file
fid.close()



# ----- Reading data from file ----- #

# Re-open the file, this time in reading mode
fid = tables.openFile(file_name,"r")

# Read the binary string from the CArray
matrix_str = fid.root.table.bin_table[0]

# Convert the binary string into a numpy array. Here is where we get the error
# "ValueError: string size must be a multiple of element size"
# because the trailing \x00 have been dropped from the binary string when
# writing it to the CArray
matrix_2 = numpy.fromstring(matrix_str,dtype='int16')
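
The truncation can be reproduced without PyTables, which may help narrow down the cause. The following is a minimal sketch, not a confirmed diagnosis: it assumes that a StringAtom CArray hands its elements back as NumPy fixed-width string scalars, and NumPy strips trailing NUL bytes when such an element is read. The names holder, padded and restored are only for illustration. If that assumption holds, the bytes are still in the stored buffer and only the string read-back is shortened.

```python
import numpy as np

# Same data as in the script above: two int16 values, 4 bytes in total.
# '<i2' pins little-endian int16 so the bytes are '\x01\x00\x02\x00'.
matrix = np.array([1, 2], dtype='<i2')
raw = matrix.tobytes()  # tobytes() is the newer name for tostring()

# A fixed-width byte-string array, comparable to what a
# StringAtom(itemsize=4) holds (assumption, see the text above).
holder = np.zeros((1,), dtype='S%d' % len(raw))
holder[0] = raw

# The underlying buffer still contains all 4 bytes...
full_buffer = holder.tobytes()

# ...but reading the element back as a string drops trailing NUL bytes.
element = holder[0]
print(len(raw), len(full_buffer), len(element))

# Possible workaround 1: pad the string back to the known item size.
padded = bytes(element).ljust(holder.dtype.itemsize, b'\x00')
restored = np.frombuffer(padded, dtype='<i2')

# Possible workaround 2: reinterpret the buffer directly and never
# go through the string scalar at all.
restored_view = holder.view('<i2')
print(restored, restored_view)
```

If this is indeed the cause, padding matrix_str back to the atom's itemsize before calling numpy.fromstring should avoid the ValueError in the script above. Storing the raw bytes with an integer atom instead of a StringAtom might sidestep the issue entirely, but I have not verified that here.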


