Re: [Numpy-discussion] read not byte aligned records

2015-05-10 Thread Gmail
For the archive: I tried bitarray instead of bitstring, and parsing the
same file went from 180 ms to 60 ms. The code ended up shorter and
simpler, but bitarray is less easy to get into (documentation).


Performance is still far from fromstring or fromfile, which take
about 5 ms for a byte-aligned file of similar size.
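
For comparison, the byte-aligned fast path mentioned above can be sketched as follows. This is only an illustration: the field names and the three-field layout are hypothetical, and np.frombuffer is used in place of fromstring, whose binary mode is deprecated in recent NumPy releases.

```python
import numpy as np

# Hypothetical byte-aligned record layout (little-endian, no padding).
record_dtype = np.dtype([
    ("time", "<u8"),
    ("speed", "<f4"),
    ("gear", "u1"),
])

# Build three records in memory to stand in for a file's data block.
records = np.zeros(3, dtype=record_dtype)
records["time"] = [10, 20, 30]
records["speed"] = [1.5, 2.5, 3.5]
records["gear"] = [1, 2, 3]
raw = records.tobytes()

# A single call decodes every record at once;
# np.fromfile(path, dtype=record_dtype) does the same from disk.
decoded = np.frombuffer(raw, dtype=record_dtype)
rec = decoded.view(np.recarray)  # attribute access: rec.time, rec.speed
```

Because the whole buffer is interpreted in one step without a Python-level loop, this path is hard to match once fields stop falling on byte boundaries.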

Aymeric


My code is below:

def readBitarray(self, bita, channelList=None):
    """Read a stream of record bytes using the bitarray module,
    needed for data that is not byte aligned.

    Parameters
    ----------
    bita : stream
        stream of bytes
    channelList : list of str, optional

    Returns
    -------
    rec : numpy recarray
        contains a matrix of raw data in a recarray (attributes
        corresponding to channel name)
    """
    from bitarray import bitarray
    from numpy import asarray, recarray
    B = bitarray(endian='little')  # little endian by default
    B.frombytes(bytes(bita))
    # initialise data structure
    if channelList is None:
        channelList = self.channelNames
    format = []
    for channel in self:
        if channel.name in channelList:
            format.append(channel.RecordFormat)
    buf = recarray(self.numberOfRecords, format)
    # read data
    for chan in range(len(self)):
        if self[chan].name in channelList:
            record_bit_size = self.CGrecordLength * 8
            # slice this channel's bits out of every record
            temp = [B[self[chan].posBitBeg + record_bit_size * i:
                      self[chan].posBitEnd + record_bit_size * i]
                    for i in range(self.numberOfRecords)]
            nbytes = len(temp[0].tobytes())
            if not nbytes == self[chan].nBytes and \
                    self[chan].signalDataType not in (6, 7, 8, 9,
                                                      10, 11, 12):
                # not Ctype byte length
                byte = (8 * (self[chan].nBytes - nbytes) *
                        bitarray([False]))
                for i in range(self.numberOfRecords):
                    # extend data of bytes to match numpy requirement
                    # (extend, not append: byte is itself a bitarray)
                    temp[i].extend(byte)
            temp = [self[chan].CFormat.unpack(temp[i].tobytes())[0]
                    for i in range(self.numberOfRecords)]
            buf[self[chan].name] = asarray(temp)
    return buf

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] read not byte aligned records

2015-05-05 Thread Benjamin Root
I have been very happy with the bitarray package. I don't know if it is
faster than bitstring, but it is worth a mention. Just watch out for any
hashing operations on its objects, it doesn't seem to do them right (set(),
dict(), etc...), but comparison operations work just fine.

Ben Root


Re: [Numpy-discussion] read not byte aligned records

2015-05-05 Thread Nathaniel Smith
On Mon, May 4, 2015 at 10:21 PM, Jerome Kieffer jerome.kief...@esrf.fr wrote:
 Hi,
 If you want to play with 10 bits data-blocks, read 5 bytes and work with 4 
 entries at a time...

NumPy arrays don't have any support for sub-byte alignment. So if you
want to handle such data, you either need to write some manual
packing/unpacking code (using bitshift operators, or perhaps
np.unpackbits, or whatever), or use another library designed for doing
this. You may find Cython useful to write the core packing/unpacking,
since bit-by-bit processing in a for loop is not something that
CPython is super well suited to.
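
The np.unpackbits route mentioned above could look roughly like the sketch below. The helper name extract_field, the 3-byte record size and the LSB-first bit numbering are assumptions made for illustration, not something taken from the file format; the bitorder argument needs NumPy 1.17 or later.

```python
import numpy as np

def extract_field(raw, record_size, bit_offset, bit_count):
    """Return one unsigned field per record from a packed byte buffer.

    Assumes bits are numbered LSB-first within each record.
    """
    data = np.frombuffer(raw, dtype=np.uint8).reshape(-1, record_size)
    # One row per record, one column per bit, least significant bit first.
    bits = np.unpackbits(data, axis=1, bitorder="little")
    field = bits[:, bit_offset:bit_offset + bit_count].astype(np.uint64)
    # Weight each bit by its power of two and sum along the record.
    weights = np.uint64(1) << np.arange(bit_count, dtype=np.uint64)
    return (field * weights).sum(axis=1, dtype=np.uint64)

# Two hypothetical 3-byte records, each holding a 10-bit value at bit 3.
raw = b"".join((v << 3).to_bytes(3, "little") for v in (677, 1023))
values = extract_field(raw, record_size=3, bit_offset=3, bit_count=10)
```

This trades memory for simplicity: every input byte expands into eight array elements, so it is easy to write but not necessarily the fastest option for large files.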

Good luck,
-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] read not byte aligned records

2015-05-05 Thread aymeric . rateau
Hi,
To answer Jerome (I hope): the data is sometimes spread over bytes shared with 
other data in the record. 10 bits was just an example; fields can be 24, 2, 8, 
7 bits, etc., all combined, with some padding between them. I am not sure I 
understood your suggestion...

To Nathaniel: yes indeed, I could read the records as big/long integers and 
apply the right_shift and bitwise_and functions to extract each channel. I am 
a bit worried about performance though.
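
A minimal sketch of that right_shift and bitwise_and idea, under some assumptions: bits are numbered LSB-first, a field touches at most 8 bytes, and the helper name extract_field_shift and the sample records are hypothetical.

```python
import numpy as np

def extract_field_shift(raw, record_size, bit_offset, bit_count):
    """Extract one unsigned field per record with shift and mask.

    Assumes LSB-first bit numbering and a field touching at most 8 bytes.
    """
    data = np.frombuffer(raw, dtype=np.uint8).reshape(-1, record_size)
    first_byte = bit_offset // 8
    last_byte = (bit_offset + bit_count - 1) // 8
    width = last_byte - first_byte + 1
    if width > 8:
        raise ValueError("field spans more than 8 bytes")
    # Copy the bytes holding the field into zero-padded 8-byte slots.
    padded = np.zeros((data.shape[0], 8), dtype=np.uint8)
    padded[:, :width] = data[:, first_byte:last_byte + 1]
    words = padded.view("<u8").ravel()
    # Drop the bits below the field, then keep only bit_count bits.
    shifted = np.right_shift(words, np.uint64(bit_offset % 8))
    mask = np.uint64((1 << bit_count) - 1)
    return np.bitwise_and(shifted, mask)

# Two hypothetical 3-byte records, each holding a 10-bit value at bit 3.
raw = b"".join((v << 3).to_bytes(3, "little") for v in (677, 1023))
channel = extract_field_shift(raw, record_size=3, bit_offset=3, bit_count=10)
```

All the work per channel is done by a handful of whole-array operations, so the cost grows with the number of channels rather than with the number of records times bits.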

I am currently using the bitstring module, which does exactly this bit 
handling. It is implemented in both pure Python and Cython.
I tried the pure Python version, and the performance penalty compared to 
byte-aligned data is around 2-3x for similar file sizes.
-- I will try bitstring's Cython implementation.
-- I will also try the right_shift and bitwise_and approach.
The best will win, but at least from your answers I am sure I am not missing 
any trick or optimisation and I am heading in the right direction.
Thanks !
Regards
Aymeric




Re: [Numpy-discussion] read not byte aligned records

2015-05-04 Thread Jerome Kieffer
Hi, 
If you want to play with 10 bits data-blocks, read 5 bytes and work with 4 
entries at a time...
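
One way to read this suggestion is sketched below: every group of 5 bytes holds exactly four 10-bit values, so the buffer can be reshaped into 5-byte rows and decoded with whole-array shifts. The function name unpack_10bit and the LSB-first packing order are assumptions; a format that packs MSB-first would need different shifts.

```python
import numpy as np

def unpack_10bit(raw):
    """Unpack consecutive 10-bit unsigned values (LSB-first packing assumed)."""
    groups = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 5).astype(np.uint16)
    b0, b1, b2, b3, b4 = groups.T
    out = np.empty((groups.shape[0], 4), dtype=np.uint16)
    out[:, 0] = b0 | ((b1 & 0x03) << 8)         # bits 0-9
    out[:, 1] = (b1 >> 2) | ((b2 & 0x0F) << 6)  # bits 10-19
    out[:, 2] = (b2 >> 4) | ((b3 & 0x3F) << 4)  # bits 20-29
    out[:, 3] = (b3 >> 6) | (b4 << 2)           # bits 30-39
    return out.ravel()

# Pack four sample values into 5 bytes, then decode them again.
values = [1, 1023, 512, 341]
packed = sum(v << (10 * k) for k, v in enumerate(values)).to_bytes(5, "little")
decoded = unpack_10bit(packed)
```

This only works when the data is a homogeneous run of 10-bit values; mixed-width records need a more general scheme.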

-- 
Jérôme Kieffer
Data analysis unit - ESRF


[Numpy-discussion] read not byte aligned records

2015-05-04 Thread Gmail
Hi,

I am developing code to read binary files (MDF, Measurement Data File).
In the previous version 3 of the format, data was always byte aligned. I
made wide use of the numpy.core.records module (fromstring, fromfile),
which showed good performance for reading and unpacking data on the fly.
However, the latest version 4 allows data that is not byte aligned. This
reduces file size, especially when the raw data is not actually recorded
on whole bytes, like 10 bits for an analog converter. For instance, a
record structure could be:
uint64, float32, uint8, uint10, padding 6 bits, uint9, padding 7 bits,
uint24, uint24, uint24, etc.
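
To make the example layout concrete, the bit position of each field can be computed as in the sketch below. The field names are hypothetical, and only the fields listed above are included (the "etc." is left out).

```python
# (name, width in bits); entries named "pad" are padding.
layout = [
    ("a", 64), ("b", 32), ("c", 8),
    ("d", 10), ("pad", 6),
    ("e", 9), ("pad", 7),
    ("f", 24), ("g", 24), ("h", 24),
]

def field_offsets(layout):
    """Map each non-padding field name to (bit_offset, bit_count)."""
    offsets = {}
    position = 0
    for name, bit_count in layout:
        if name != "pad":
            offsets[name] = (position, bit_count)
        position += bit_count
    return offsets, position

offsets, total_bits = field_offsets(layout)
record_bytes = total_bits // 8
```

With this layout the 10-bit and 9-bit fields do not fill whole bytes, and the 24-bit fields have no native NumPy integer type, which is why a plain structured dtype cannot describe the record directly.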

I found a way to read these unaligned records using the bitstring module
instead of numpy.core.records, but performance is much worse (around 10x
slower in pure Python; I did not try the Cython implementation).

Would there be a pure numpy way to do this?

Regards

Aymeric
