[Tim]
>> ...
>> better than implicit" kinds of reasons.  The only way now to know that
>> you're looking at a frame size is to keep a running count of bytes
>> processed and realize you've reached a byte offset where a frame size
>> "is expected".

[Antoine]
> That's integrated to the built-in buffering.

Well, obviously, because it wouldn't work at all unless the built-in
buffering knew all about it ;-)

> It's not really an additional constraint: the frame sizes simply
> dictate how buffering happens in practice. The main point of
> framing is to *simplify* the buffering logic (of course, the old
> buffering logic is still there for protocols <= 3, unfortunately).

And always will be - there are no pickle simplifications, because
everything always sticks around forever.  Over time, pickle just gets
more complicated.  That's in the nature of the beast.

> Note some drawbacks of frame opcodes:
> - the decoder has to sanity check the frame opcodes (what if a frame
>   opcode is encountered when already inside a frame?)
> - a pickle-mutating function such as pickletools.optimize() may naively
>   ignore the frame opcodes while rearranging the pickle stream, only to
>   emit a new pickle with invalid frame sizes

I suspect we have very different mental models here.  By "has an
opcode", I do NOT mean "must be visible to the opcode-decoding loop".
I just mean "has a unique byte assigned in the pickle opcode space".

I expect that in the CPython implementation of unpickling, the
buffering layer would _consume_ the FRAME opcode, along with the frame
size.  The opcode-decoding loop would never see it.

But if some _other_ implementation of unpickling didn't give a hoot
about framing, having an explicit opcode means that implementation
could ignore the whole scheme very easily:  just implement the FRAME
opcode in *its* opcode-decoding loop to consume the FRAME argument,
ignore it, and move on.  As-is, all other implementations _have_ to
know everything about the buffering scheme because it's all implicit
low-level magic.

So, then, to the 2 points you raised:

1. If the CPython decoder ever sees a FRAME opcode, I expect it to
   raise an exception.
   That's all - it's an invalid pickle (or bug in the code) if it
   contains a FRAME the buffering layer didn't consume.

2. pickletools.optimize() in the CPython implementation should never
   see a FRAME opcode either.

Initially, all I desperately ;-) want changed here is for the
_buffering layer_, on the writing end, to write 9 bytes instead of 8
(1 new one for a FRAME opcode), and on the reading end to consume 9
bytes instead of 8 (extra credit if it checked the first byte to verify
it really is a FRAME opcode - there's nothing wrong with sanity checks).

Then it becomes _possible_ to optimize "small pickles" later (in the
sense of not bothering to frame them at all).  So long as frames remain
implicit magic, that's impossible without moving to yet another new
protocol level.
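
To make the 9-bytes-instead-of-8 proposal concrete, here is a minimal
sketch of what the buffering layer might do on each end, plus what a
decoder that ignores framing could do with the same opcode.  Everything
named here is an assumption made for illustration, not CPython's actual
code: the opcode byte value, the little-endian layout of the 8-byte
size, and the function names write_frame, read_frame and
skip_frame_argument are all hypothetical.

```python
import io
import struct

FRAME = b"\x95"  # assumed opcode byte, chosen only for this sketch


def write_frame(out, payload):
    """Writing end: emit 9 header bytes (opcode + 8-byte size), then the body."""
    # Assumption: the size is an unsigned 64-bit little-endian integer.
    out.write(FRAME + struct.pack("<Q", len(payload)))
    out.write(payload)


def read_frame(inp):
    """Reading end: consume the 9 header bytes and return the frame body.

    The opcode-decoding loop only ever sees the returned body, so it
    never encounters FRAME itself.
    """
    header = inp.read(9)
    # The "extra credit" sanity check: the first byte really is FRAME.
    if len(header) != 9 or header[:1] != FRAME:
        raise ValueError("expected a FRAME opcode at the start of a frame")
    (size,) = struct.unpack("<Q", header[1:])
    body = inp.read(size)
    if len(body) != size:
        raise ValueError("truncated frame")
    return body


def skip_frame_argument(inp):
    """A decoder that doesn't care about framing: its opcode loop has
    just read FRAME, so it discards the 8-byte size and moves on."""
    inp.read(8)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frame(buf, b"opcodes go here")
    buf.seek(0)
    print(read_frame(buf))
```

With the opcode written explicitly, a reader can tell a framed stream
from an unframed one by looking at a single byte, which is what would
leave room to skip framing for small pickles later.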