I posted this on SO, but… yeah…
I'm doing some serial protocol stuff and want to implement a basic byte
stuffing algorithm in python. Though really what this really generalizes to is
“what is the most pythonic way to transform one sequence of bytes where some
bytes are passed through 1:1, but others are transformed to longer subsequences
of bytes?” I’m pretty sure this rules out the use of transform() which expects
a 1:1 mapping.
So far, I've come with 5 different approaches, and each of them has something I
don't like about it:
1 Via Generator
def stuff1(bits):
for byte in bits:
if byte in _EscapeCodes:
yield PacketCode.Escape
yield byte ^ 0xFF
else:
yield byte
This may be my favorite, but maybe just because I'm kind of fascinated by yield
based generators. I worried that the generator would make it slow, but it's
actually the second fastest of the bunch.
2 Simply bytes()
def stuff2(bits):
result = bytes()
for byte in bits:
if byte in _EscapeCodes:
result += bytes([PacketCode.Escape, byte ^ 0xFF])
else:
result += bytes([byte])
return result
Constantly has to create single element arrays just to throw them out because
I'm not aware of any "copy with one additional element" operation. It ties for
the slowest of the bunch.
3 Use bytearray()
def stuff3(bits):
result = bytearray()
for byte in bits:
if byte in _EscapeCodes:
result.append(PacketCode.Escape)
result.append(byte ^ 0xFF)
else:
result.append(byte)
return result
Seems better than the direct bytes() approach. Actually slower than the yield
generator and can do one byte at a time (instead of needing intermediate 1
element collections). But it feels brutish. It's middle of the pack performance.
4 BytesIO()
def stuff4(bits):
bio = BytesIO()
for byte in bits:
if byte in _EscapeCodes:
bio.write(bytes([PacketCode.Escape, byte ^ 0xFF]))
else:
bio.write(bytes([byte]))
return bio.getbuffer()
I like the stream based approach here. But it is annoying that there doesn't
seem to be something like a write1() API that could just add 1 byte, so I have
to make those intermediate bytes again. If there was a "write single byte", I'd
like this one. It ties for slowest.
5 Use replace()
def stuff5(bits):
escapeStuffed = bytes(bits).replace(bytes([PacketCode.Escape]),
bytes([PacketCode.Escape, PacketCode.Escape ^ 0xFF]))
stopStuffed= escapeStuffed.replace(bytes([PacketCode.Stop]),
bytes([PacketCode.Escape, PacketCode.Stop ^ 0xFF]))
return stopStuffed.replace(bytes([PacketCode.Start]),
bytes([PacketCode.Escape, PacketCode.Start ^ 0xFF]))
This is the fastest. But I don't like the way the code reads and the
intermediate sweeps.
--
https://mail.python.org/mailman/listinfo/python-list