Anand Patil (on 2007-10-31 at 17:53:17 -0700) said::

    > I have a file full of 32-bit floats, in binary format, compressed with zip.
    > I'd like to get it into a PyTables array, but this:
    >
    >     Z = ZipFile('data_file.zip')
    >     binary_data = Z.read('data_file')
    >     numpy_array = numpy.fromstring(data, dtype=float32)
    >     h5file.createArray('/', 'data', numpy_array)
    >
    > won't work because I don't have enough memory for the intermediate stages.
    > Is there an easy way to do this piece-by-piece or in a 'streaming' fashion?

First of all, I'd avoid using an ``Array`` object for storing such a big
array.  ``CArray`` or ``EArray`` objects are better suited for that: they are
chunked, so they are a lot more memory-efficient.  Both allow you to store
your data little by little, since disk space is only allocated for a chunk
when it is really needed.  A ``CArray`` has a fixed shape, while an ``EArray``
is enlargeable.

I guess the big obstacle would be extracting data from the zip file
incrementally.  Since the ``ZipFile`` interface doesn't allow this, you may
unzip ``data_file`` to disk, then open it and read chunks of data from it.
Something like this::

    import os
    import numpy
    import tables

    nptype = numpy.float32
    atom = tables.Atom.from_sctype(nptype)

    # Extract data_file from data_file.zip first (e.g. with subprocess).
    # Number of rows = size of data_file in bytes / size of one item.
    total_rows = os.path.getsize('data_file') // atom.itemsize

    array = h5file.createCArray(
        '/', 'data', atom, shape=(total_rows,))
    # or
    # array = h5file.createEArray(
    #     '/', 'data', atom, shape=(0,), expectedrows=total_rows)

    # We will be reading blocks as big as a chunk.
    rows_to_read = array.chunkshape[0]
    bytes_to_read = rows_to_read * atom.itemsize

    dfile = open('data_file', 'rb')  # binary mode
    data = dfile.read(bytes_to_read)
    base = 0  # only for CArray
    while data:
        arr = numpy.fromstring(data, dtype=nptype)
        # CArray case: write the block at its position.
        array[base:base+len(arr)] = arr
        base += len(arr)
        # EArray case: append the block instead.
        # array.append(arr)
        data = dfile.read(bytes_to_read)
    array.flush()
    dfile.close()

This is untested, but I hope you get the idea.

Cheers,

::

    Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
           Cárabos Coop. V.  V  V   Enjoy Data
                              ""
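
On the obstacle of reading the zip member incrementally: Python versions that provide ``ZipFile.open`` return a file-like object that decompresses as it is read, so the intermediate unzip to disk may not be needed there. The following is only a minimal sketch under that assumption (a recent Python 3 is assumed). ``iter_float32_blocks`` is a hypothetical helper name, it uses the standard ``array`` module rather than numpy to stay self-contained, and it assumes the floats are stored in the machine's native byte order.

```python
import zipfile
from array import array


def iter_float32_blocks(zip_path, member, rows_per_block):
    """Yield array('f') blocks of at most rows_per_block values.

    Hypothetical helper for illustration; not part of PyTables or zipfile.
    """
    itemsize = array('f').itemsize  # C float, assumed to be 4 bytes
    pending = b''  # bytes of an incomplete trailing item, if any
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as stream:  # decompresses lazily as we read
            while True:
                data = stream.read(rows_per_block * itemsize)
                if not data:
                    break
                data = pending + data
                # Only decode whole items; keep any remainder for later.
                usable = len(data) - len(data) % itemsize
                pending = data[usable:]
                if usable:
                    block = array('f')
                    block.frombytes(data[:usable])  # native byte order assumed
                    yield block
    if pending:
        raise ValueError('member size is not a multiple of the item size')
```

Each yielded block could then take the place of ``arr`` in the loop above, for example converted with ``numpy.frombuffer(block, dtype=numpy.float32)`` before the ``CArray`` slice assignment or the ``EArray`` append. With this approach the total number of rows would have to come from the member's uncompressed size (``ZipFile.getinfo(member).file_size``) rather than from the size of a file on disk. Like the code above, the PyTables side of this is untested.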
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users