Hello, I'm having a problem with PyTables that's driving me nuts. I'm probably doing something wrong, but I wonder if it could be a bug.
Basically, I need to write a series of numpy arrays to an HDF file. Because of performance and compression requirements, I decided to convert each numpy array to a binary string and then write the resulting string to a CArray in the output HDF file. This works fine as long as there are no trailing \x00 bytes in the string. For some reason, trailing \x00 bytes are NOT copied to the CArray.

As an example, I attach a Python script where I create a numpy array with two int16 elements [1, 2]. The string representation of this array is '\x01\x00\x02\x00'. When writing this string to the CArray, only the '\x01\x00\x02' part is copied over. Later, when trying to read this string from the CArray into a two-element int16 numpy array, I run into an error, as the string has only 3 bytes (instead of 4).

Does anyone know why this happens? Any help with this is greatly appreciated.

Thank you,
Adriano

---------------------------------------------------------------------
import tables
import numpy

# ----- Writing data to file ----- #

# The name of the output file
file_name = "carray_test.hdf"

# Open the output file for writing
fid = tables.openFile(file_name,"w")

# Create a table group
fid.createGroup("/", 'table', 'Binary table')

# Create a numpy vector to be stored in the CArray
matrix = numpy.array([1, 2],dtype='int16')

# The CArray shape
n_rows = 1
array_shape = (n_rows,)

# The CArray atom
array_atom = tables.StringAtom(itemsize=len(matrix.tostring()))

# Create the CArray
fid.createCArray(fid.root.table,'bin_table',array_atom,array_shape)

# Write 'matrix' to the CArray in the output file. For some reason,
# trailing \x00 elements in the binary string representation of the
# numpy array are not copied to the CArray. Why???
fid.root.table.bin_table[0] = matrix.tostring()

# Close the output file
fid.close()

# ----- Reading data from file ----- #

# Re-open the file, this time in reading mode
fid = tables.openFile(file_name,"r")

# Read the binary string from the CArray
matrix_str = fid.root.table.bin_table[0]

# Convert the binary string into a numpy array. Here is where we get the error
# "ValueError: string size must be a multiple of element size"
# because the trailing \x00 have been dropped from the binary string when writing
# it to the CArray
matrix_2 = numpy.fromstring(matrix_str,dtype='int16')
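
A possibly useful cross-check is to reproduce the round trip described above with NumPy alone, leaving PyTables out. The sketch below assumes that a StringAtom item ends up in a NumPy fixed-width byte-string array (dtype 'S4' here); that mapping is an assumption and has not been confirmed against the PyTables sources. The names holder, as_string and as_bytes are made up for illustration, and tobytes()/frombuffer() stand in for tostring()/fromstring(). If the same 3-byte result shows up here, the truncation may come from how NumPy returns string elements and not from what was written to the file.

```python
import numpy

# Same two-element int16 vector as in the script above; the byte order is
# pinned to little-endian so the bytes match the ones quoted in the post.
matrix = numpy.array([1, 2], dtype="<i2")
raw = matrix.tobytes()  # 4 bytes: 01 00 02 00

# Hypothetical stand-in for the CArray: a one-element array of 4-byte strings.
# Assumption: this mimics what StringAtom(itemsize=4) holds in memory.
holder = numpy.zeros((1,), dtype="S4")
holder[0] = raw

# Reading the element back as a string scalar drops trailing NUL bytes.
as_string = holder[0]

# The underlying buffer still holds all 4 bytes; a uint8 view exposes them.
as_bytes = holder.view(numpy.uint8).tobytes()

# Decoding from the full buffer recovers the original values.
matrix_2 = numpy.frombuffer(as_bytes, dtype="<i2")

print(len(raw), len(as_string), len(as_bytes))
print(matrix_2)
```

If this matches what PyTables does, storing the data with an integer atom (or reading the raw buffer through a uint8 view, as above) would sidestep the stripped bytes; whether that fits the compression requirements would need to be measured.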