Re: [Numpy-discussion] read not byte aligned records
For the archive: I tried bitarray instead of bitstring, and parsing the same file went from 180 ms to 60 ms. The code ended up shorter and simpler, although bitarray is harder to get started with (documentation). Performance is still far from fromstring or fromfile, which take about 5 ms for a byte-aligned file of similar size.

Aymeric

My code is below:

    # module-level imports used by the method
    from numpy import asarray, recarray

    def readBitarray(self, bita, channelList=None):
        """Read a stream of record bytes using the bitarray module.

        Needed for data that is not byte aligned.

        Parameters
        ----------
        bita : stream
            stream of bytes
        channelList : list of str, optional

        Returns
        -------
        rec : numpy recarray
            contains a matrix of raw data in a recarray
            (attributes corresponding to channel name)
        """
        from bitarray import bitarray
        B = bitarray(endian='little')  # little endian by default
        B.frombytes(bytes(bita))
        # initialise data structure
        if channelList is None:
            channelList = self.channelNames
        format = []
        for channel in self:
            if channel.name in channelList:
                format.append(channel.RecordFormat)
        buf = recarray(self.numberOfRecords, format)
        # read data
        for chan in range(len(self)):
            if self[chan].name in channelList:
                record_bit_size = self.CGrecordLength * 8
                # slice the bits of this channel out of every record
                temp = [B[self[chan].posBitBeg + record_bit_size * i:
                          self[chan].posBitEnd + record_bit_size * i]
                        for i in range(self.numberOfRecords)]
                nbytes = len(temp[0].tobytes())
                if nbytes != self[chan].nBytes and \
                        self[chan].signalDataType not in (6, 7, 8, 9, 10, 11, 12):
                    # not Ctype byte length
                    byte = 8 * (self[chan].nBytes - nbytes) * bitarray([False])
                    for i in range(self.numberOfRecords):
                        # extend data of bytes to match numpy requirement
                        # (extend, not append: append adds a single bit)
                        temp[i].extend(byte)
                temp = [self[chan].CFormat.unpack(temp[i].tobytes())[0]
                        for i in range(self.numberOfRecords)]
                buf[self[chan].name] = asarray(temp)
        return buf

On 05/05/15 15:39, Benjamin Root wrote:
> I have been very happy with the bitarray package. I don't know if it is faster than bitstring, but it is worth a mention.
> Just watch out for any hashing operations on its objects: it doesn't seem to do them right (set(), dict(), etc.), but comparison operations work just fine.
>
> Ben Root

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] read not byte aligned records
I have been very happy with the bitarray package. I don't know if it is faster than bitstring, but it is worth a mention. Just watch out for any hashing operations on its objects: it doesn't seem to do them right (set(), dict(), etc.), but comparison operations work just fine.

Ben Root
Re: [Numpy-discussion] read not byte aligned records
On Mon, May 4, 2015 at 10:21 PM, Jerome Kieffer jerome.kief...@esrf.fr wrote:
> Hi,
> If you want to play with 10 bits data-blocks, read 5 bytes and work with 4 entries at a time...

NumPy arrays don't have any support for sub-byte alignment. So if you want to handle such data, you either need to write some manual packing/unpacking code (using bitshift operators, or perhaps np.unpackbits, or whatever), or use another library designed for doing this. You may find Cython useful to write the core packing/unpacking, since bit-by-bit processing in a for loop is not something that CPython is super well suited to.

Good luck,
-n

--
Nathaniel J. Smith -- http://vorpus.org
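As an illustration of the np.unpackbits route mentioned above, here is a minimal sketch, not code from the thread. The function name extract_field and its parameters are hypothetical, and it assumes fixed-size records with bits counted MSB-first from the start of each record, which may not match the bit numbering of a given file format.

```python
import numpy as np

def extract_field(raw, record_nbytes, bit_start, bit_len):
    """Return one unsigned field per record as a uint64 array.

    Assumption: bit_start counts bits MSB-first from the start of the record.
    """
    # One row per record, one column per byte.
    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, record_nbytes)
    # Expand every byte into 8 columns of 0/1 (most significant bit first).
    bits = np.unpackbits(records, axis=1)
    field = bits[:, bit_start:bit_start + bit_len].astype(np.uint64)
    # Weight of each bit position: 2**(bit_len-1), ..., 2, 1.
    exponents = np.arange(bit_len, dtype=np.uint64)[::-1]
    weights = np.left_shift(np.uint64(1), exponents)
    return (field * weights).sum(axis=1)

# Two 2-byte records, each starting with a 10-bit value.
raw = bytes([0xB3, 0x40, 0xFF, 0xC0])
values = extract_field(raw, record_nbytes=2, bit_start=0, bit_len=10)
```

This keeps the per-bit work inside NumPy, at the cost of a temporary array eight times the size of the input. Recent NumPy versions also accept bitorder='little' in np.unpackbits if the data numbers bits from the least significant end.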
Re: [Numpy-discussion] read not byte aligned records
Hi,

To answer Jerome (I hope): the data is sometimes spread across bytes that are shared with other data in the record. 10 bits was only an example; it can be 24, 2, 8, 7 bits, etc., all combined, with some padding between them. I am not sure I understood your suggestion...

To Nathaniel: yes, indeed, I could read the records in big/long byte chunks and apply the right_shift and bitwise_and functions to extract each channel. I am a bit afraid of the performance, though.

I am currently using the bitstring module, which does exactly this kind of bit handling. It is implemented in both pure Python and Cython. I tried the pure Python version, and the slowdown compared to byte-aligned data is around 2-3x for similar file sizes.
- I will try bitstring's Cython implementation.
- I will also try the approach using right_shift and bitwise_and.

The best will win, but from your answers I am at least sure that I am not missing any trick or optimisation and that I am heading in the right direction.

Thanks!

Regards
Aymeric

On 5 May 2015 08:15, Nathaniel Smith n...@pobox.com wrote:
> On Mon, May 4, 2015 at 10:21 PM, Jerome Kieffer jerome.kief...@esrf.fr wrote:
>> Hi,
>> If you want to play with 10 bits data-blocks, read 5 bytes and work with 4 entries at a time...
>
> NumPy arrays don't have any support for sub-byte alignment. So if you want to handle such data, you either need to write some manual packing/unpacking code (using bitshift operators, or perhaps np.unpackbits, or whatever), or use another library designed for doing this. You may find Cython useful to write the core packing/unpacking, since bit-by-bit processing in a for loop is not something that CPython is super well suited to.
>
> Good luck,
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
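The right_shift / bitwise_and idea discussed above could look roughly like the sketch below. It is only an illustration under simplifying assumptions: every record is exactly one little-endian 32-bit word, and bit_offset counts from the least significant bit. The name extract_with_shift is hypothetical.

```python
import numpy as np

def extract_with_shift(raw, bit_offset, bit_len):
    """Extract a bit field from each 32-bit little-endian word in raw.

    Assumption: bit_offset is counted from the least significant bit.
    """
    words = np.frombuffer(raw, dtype="<u4")
    mask = np.uint32((1 << bit_len) - 1)
    # Move the field down to bit 0, then clear everything above it.
    shifted = np.right_shift(words, np.uint32(bit_offset))
    return np.bitwise_and(shifted, mask)

# Two words carrying a 10-bit value at bit offset 6, surrounded by other bits.
word0 = (0xABCD << 16) | (717 << 6) | 0x3F
word1 = 1023 << 6
raw = word0.to_bytes(4, "little") + word1.to_bytes(4, "little")
values = extract_with_shift(raw, bit_offset=6, bit_len=10)
```

Both operations are vectorised over all records, so no Python-level loop per record is needed. Real records that are wider than one machine word would first need to be split into chunks covering each channel.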
Re: [Numpy-discussion] read not byte aligned records
Hi,

If you want to play with 10 bits data-blocks, read 5 bytes and work with 4 entries at a time...

--
Jérôme Kieffer
Data analysis unit - ESRF
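A minimal sketch of this suggestion, assuming the stream contains nothing but consecutive 10-bit unsigned values packed MSB-first, so that every 5 bytes hold exactly 4 values. The function name unpack_10bit is hypothetical, and the packing order is an assumption rather than something stated in the thread.

```python
import numpy as np

def unpack_10bit(raw):
    """Unpack consecutive 10-bit values: every 5 bytes yield 4 values.

    Assumption: values are packed back to back, most significant bit first.
    """
    # One row per group of 5 bytes, widened so the shifts do not overflow.
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 5).astype(np.uint16)
    out = np.empty((b.shape[0], 4), dtype=np.uint16)
    out[:, 0] = (b[:, 0] << 2) | (b[:, 1] >> 6)
    out[:, 1] = ((b[:, 1] & 0x3F) << 4) | (b[:, 2] >> 4)
    out[:, 2] = ((b[:, 2] & 0x0F) << 6) | (b[:, 3] >> 2)
    out[:, 3] = ((b[:, 3] & 0x03) << 8) | b[:, 4]
    return out.ravel()

# Pack the values 1, 2, 3, 1023 into 40 bits (5 bytes) and unpack them again.
packed = ((1 << 30) | (2 << 20) | (3 << 10) | 1023).to_bytes(5, "big")
values = unpack_10bit(packed)
```

The same pattern generalises to other widths by grouping on the least common multiple of the field width and 8 bits, but it does not cover records that mix fields of different widths.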
[Numpy-discussion] read not byte aligned records
Hi,

I am developing code to read binary files (MDF, Measurement Data File). In the previous version 3 of the format, data was always byte aligned. I made wide use of the numpy.core.records module (fromstring, fromfile), which shows good performance for reading and unpacking data on the fly.

However, in the latest version 4, data that is not byte aligned is possible. This reduces the file size, especially when the raw data is not actually recorded on whole bytes, like 10 bits for an analog converter. For instance, a record structure could be: uint64, float32, uint8, uint10, padding 6 bits, uint9, padding 7 bits, uint24, uint24, uint24, etc.

I found a way to read these unaligned records using the bitstring module instead of numpy.core.records, but performance is much worse (around x10 in pure Python; I did not try the Cython implementation, though).

Would there be a pure numpy way to do this?

Regards
Aymeric
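For reference, the byte-aligned case described above can be sketched with a structured dtype. This is an illustration only: the field names time, speed and flag are hypothetical, and it uses np.frombuffer with a recarray view in place of the numpy.core.records functions named in the message.

```python
import struct
import numpy as np

# Hypothetical byte-aligned record: uint64, float32, uint8 (13 bytes, no padding).
record_dtype = np.dtype([("time", "<u8"), ("speed", "<f4"), ("flag", "u1")])

def read_aligned(raw):
    """Interpret raw bytes as an array of fixed-size, byte-aligned records."""
    return np.frombuffer(raw, dtype=record_dtype).view(np.recarray)

# Build two records by hand, then read every field of every record in one call.
raw = struct.pack("<QfB", 1, 1.5, 7) + struct.pack("<QfB", 2, 2.25, 9)
rec = read_aligned(raw)
```

Each field is then available as an attribute, for example rec.time or rec.speed. This only works because every field starts and ends on a byte boundary, which is exactly what fields such as uint10 or uint9 break.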