Re: [Python-Dev] Which direction is UnTransform? / Unicode is different
My thought on this for the day, for what it's worth: Anything that doesn't have directions clearly identifiable as "encoding" and "decoding" maybe shouldn't be called a "codec"? -- Greg ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
[Martin v. Löwis]
> ...
> AFAICT, the real driving force is the desire to not read-ahead
> more than the pickle is long. This is what complicates the code.
> The easiest (and most space-efficient) solution to that problem
> would be to prefix the entire pickle with a data size field
> (possibly in a variable-length representation), i.e. to make a
> single frame.

In a bout of giddy optimism, I suggested that earlier in the thread. It would be sweet :-)

> If that was done, I would guess that Tim's concerns about brittleness
> would go away (as you couldn't have a length field in the middle of
> data). IMO, the PEP has nearly the same flaw as the HTTP chunked
> transfer, which also puts length fields in the middle of the payload
> (except that HTTP makes it worse by making them optional).
>
> Of course, a single length field has other drawbacks, such as having
> to pickle everything before sending out the first bytes.

And that's the killer. Pickle strings are generally produced incrementally, in smallish pieces. But that may go on for very many iterations, and there's no way to guess the final size in advance. I only see three ways to do it:

1. Hope the whole string fits in RAM.
2. Pickle twice, the first time just to get the final size (& throw the pickle pieces away on the first pass while summing their sizes).
3. Flush the pickle string to disk periodically, then after it's done read it up and copy it to the intended output stream.

All of those really suck :-(

BTW, I'm not a web guy: in what way is HTTP chunked transfer mode viewed as being flawed? Everything I ever read about it seemed to think it was A Good Idea.
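To make option 1 concrete, here is a minimal sketch of a single length-prefixed pickle. The helper names and the fixed 8-byte little-endian size field are assumptions made for illustration, not something taken from the PEP or from any patch in this thread. The point is only that the size cannot be written until the whole pickle has been built in memory.

```python
import io
import pickle
import struct


def dump_with_length_prefix(obj, stream):
    """Write an 8-byte size field, then the pickle (hypothetical helper)."""
    payload = pickle.dumps(obj, protocol=3)  # the whole pickle sits in RAM here
    stream.write(struct.pack("<Q", len(payload)))
    stream.write(payload)


def load_with_length_prefix(stream):
    """Read exactly one length-prefixed pickle, never reading past its end."""
    (size,) = struct.unpack("<Q", stream.read(8))
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("pickle is shorter than its declared size")
    return pickle.loads(payload)
```

The reader never reads ahead past the pickle, which is the property being asked for, but the writer pays for it by buffering everything first.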
Re: [Python-Dev] [RELEASED] Python 3.3.3 final
How else would you know that anyone is listening? :)

[yay, thanks for being our release monkey!]

-gps

On Tue, Nov 19, 2013 at 11:40 AM, Georg Brandl wrote:
> On 19.11.2013 17:14, Mark Lawrence wrote:
> > On 19/11/2013 06:59, Georg Brandl wrote:
> >>
> >> To download Python 3.3.3 rc2 visit:
> >>
> >> http://www.python.org/download/releases/3.3.3/
> >>
> >
> > Please make your mind up, final or rc2?
> >
> > Thanks everybody for your efforts, much appreciated :)
>
> It's my firm belief that every announcement should have a small error
> to appease the gods of regression. *ahem*
>
> :)
> Georg
Re: [Python-Dev] PEP 454 (tracemalloc) close to pronouncement
On 20 Nov 2013 10:48, "Victor Stinner" wrote:
>
> 2013/11/20 Guido van Rossum:
> > I'm guessing CF is the PEP-BDFL?
>
> Yes, you delegated him this PEP :-)
>
> > So he should approve it (the period for
> > final comments is long over) and then you can merge.
>
> I'm asking if the code *must* be merged before the beta1, or if it can
> be merged later. Charles-François told me that he would like to review
> the code after he reviewed the PEP.

The feature needs to be available and the API stable in beta 1, but implementation details are free to change until the first RC.

And yes, this does mean that beta 1 releases are often a bit "interesting" :)

Cheers,
Nick.

>
> Victor
>
> > On Tue, Nov 19, 2013 at 4:09 PM, Victor Stinner < victor.stin...@gmail.com>
> > wrote:
> >>
> >> 2013/11/11 Charles-François Natali:
> >> > After several exchanges with Victor, PEP 454 has reached a status
> >> > which I consider ready for pronouncement [1]: so if you have any last
> >> > minute comment, now is the time!
> >>
> >> So, what's going on? The deadline is Saturday, in 5 days.
> >>
> >> If the PEP is accepted (I hope so), should the code be merged before the
> >> beta1?
> >>
> >> Victor
> >
> > --
> > --Guido van Rossum (python.org/~guido)
[Python-Dev] Which direction is UnTransform? / Unicode is different
(Fri Nov 15 16:57:00 CET 2013) Stephen J. Turnbull wrote:

> Serhiy Storchaka wrote:
> > If the transform() method will be added, I prefer to have only
> > one transformation method and specify a direction by the
> > transformation name ("bzip2"/"unbzip2").

Me too. Until I consider special cases like "compress", or "lower", and realize that there are enough special cases to become a major wart if generic transforms ever became popular.

> People think about these transformations as "en- or de-coding", not
> "transforming", most of the time. Even for a transformation that is
> an involution (eg, rot13), people have a very clear idea of what's
> encoded and what's not, and they are going to prefer the names
> "encode" and "decode" for these (generic) operations in many cases.

I think this is one of the major stumbling blocks with unicode.

I originally disagreed strongly with what Stephen wrote -- but then I realized that all my counterexamples involved unicode text.

I can tell whether something is tarred or untarred, zipped or unzipped.

But an 8-bit (even Latin-1, let alone ASCII) bytestring really doesn't seem "encoded", and it doesn't make sense to "decode" a perfectly readable (ASCII) string into a sequence of "code units".

Nor does it help that http://www.unicode.org/glossary/#code_unit defines "code unit" as "The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)"

I have to read that very carefully to avoid mentally translating it into "Code Units are *en*coded, and there are lots of different complicated encodings that I wouldn't use unless I were doing special processing or interchange." If I'm not using the network, or if my "interchange format" already looks like readable ASCII, then unicode sure sounds like a complication. I *will* get confused over which direction is encoding and which is decoding. (Removing .decode() from the (unicode) str type in 3 does help a lot, if I have a Python 3 interpreter running to check against.)

I'm not sure exactly what implications the above has, but it certainly supports separating the Text Processing from the generic codecs, both in the documentation and in any potential new methods.

Instead of relying on introspection of .decodes_to and .encodes_to, it would be useful to have charsetcodecs and transformcodecs as entirely different modules, with their own separate registries. I will even note that the existing help(codecs) seems more appropriate for charsetcodecs than it does for the current conjoined module.

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
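For reference, a small sketch of how the two directions look in a Python 3 interpreter. It uses only the existing str/bytes methods and the codecs module functions; the sample strings and variable names are arbitrary, and nothing here implements the proposed transform() method or the split modules.

```python
import codecs

# Text codecs: str -> bytes is "encode", bytes -> str is "decode".
text = "caf\u00e9"
raw = text.encode("utf-8")
round_tripped = raw.decode("utf-8")

# Python 3 dropped the methods that point the "wrong" way.
str_has_decode = hasattr(str, "decode")      # False
bytes_has_encode = hasattr(bytes, "encode")  # False

# A bytes-to-bytes transform: which side is "encoded" is still obvious.
packed = codecs.encode(b"spam" * 100, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# rot13 is an involution: applying "encode" twice returns the original.
scrambled = codecs.encode("Hello", "rot_13")
restored = codecs.encode(scrambled, "rot_13")
```

Only the first pair goes through methods on str and bytes; the other two go through codecs.encode() and codecs.decode(), which is roughly the separation argued for above.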
Re: [Python-Dev] PEP 454 (tracemalloc) close to pronouncement
2013/11/20 Guido van Rossum:
> I'm guessing CF is the PEP-BDFL?

Yes, you delegated him this PEP :-)

> So he should approve it (the period for
> final comments is long over) and then you can merge.

I'm asking if the code *must* be merged before the beta1, or if it can be merged later. Charles-François told me that he would like to review the code after he reviewed the PEP.

Victor

> On Tue, Nov 19, 2013 at 4:09 PM, Victor Stinner
> wrote:
>>
>> 2013/11/11 Charles-François Natali:
>> > After several exchanges with Victor, PEP 454 has reached a status
>> > which I consider ready for pronouncement [1]: so if you have any last
>> > minute comment, now is the time!
>>
>> So, what's going on? The deadline is Saturday, in 5 days.
>>
>> If the PEP is accepted (I hope so), should the code be merged before the
>> beta1?
>>
>> Victor
>
> --
> --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 454 (tracemalloc) close to pronouncement
I'm guessing CF is the PEP-BDFL? So he should approve it (the period for final comments is long over) and then you can merge.

On Tue, Nov 19, 2013 at 4:09 PM, Victor Stinner wrote:
> 2013/11/11 Charles-François Natali:
> > After several exchanges with Victor, PEP 454 has reached a status
> > which I consider ready for pronouncement [1]: so if you have any last
> > minute comment, now is the time!
>
> So, what's going on? The deadline is Saturday, in 5 days.
>
> If the PEP is accepted (I hope so), should the code be merged before the
> beta1?
>
> Victor

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Wed, 20 Nov 2013 00:56:13 +0100 "Martin v. Löwis" wrote:
> AFAICT, the real driving force is the desire to not read-ahead
> more than the pickle is long. This is what complicates the code.
> The easiest (and most space-efficient) solution to that problem
> would be to prefix the entire pickle with a data size field
> (possibly in a variable-length representation), i.e. to make a
> single frame.

Pickling then becomes very problematic: you have to keep the entire pickle in memory until the end, when you finally can write the size at the beginning of the pickle.

> If that was done, I would guess that Tim's concerns about brittleness
> would go away (as you couldn't have a length field in the middle of
> data). IMO, the PEP has nearly the same flaw as the HTTP chunked
> transfer, which also puts length fields in the middle of the payload
> (except that HTTP makes it worse by making them optional).

Tim's concern is easily addressed with a FRAME opcode, without changing the overall scheme (as he lately proposed).

Regards

Antoine.
Re: [Python-Dev] PEP 454 (tracemalloc) close to pronouncement
2013/11/11 Charles-François Natali:
> After several exchanges with Victor, PEP 454 has reached a status
> which I consider ready for pronouncement [1]: so if you have any last
> minute comment, now is the time!

So, what's going on? The deadline is Saturday, in 5 days.

If the PEP is accepted (I hope so), should the code be merged before the beta1?

Victor
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On 19.11.13 23:50, Antoine Pitrou wrote:
> Ok, thanks. So now that I look at the patch I see the following problems
> with this idea:
>
> - "pickle + framing" becomes a different protocol than "pickle" alone,
>   which means we lose the benefit of protocol autodetection. It's as
>   though pickle.load() required you to give the protocol number,
>   instead of inferring it from the pickle bytestream.

Not necessarily. Framing becomes a different protocol, yes. But autodetection would still be possible (it actually is possible in my proposed definition).

> - it is less efficient than framing built inside pickle, since it
>   adds separate buffers and memory copies (while the point of framing
>   is to make buffering more efficient).

Correct. However, if the intent is to reduce the number of system calls, then this is still achieved.

> Your idea is morally similar to saying "we don't need to optimize the
> size of pickles, since you can gzip them anyway".

Not really. In the case of gzip, the size reduction from properly saving bytes in pickle might be even larger. Here, the wire representation and the number of system calls are actually (nearly) identical.

> However, the fact
> that the _pickle module currently goes to lengths to try to optimize
> buffering, implies to me that it's reasonable to also improve the
> pickle protocol so as to optimize buffering.

AFAICT, the real driving force is the desire to not read-ahead more than the pickle is long. This is what complicates the code. The easiest (and most space-efficient) solution to that problem would be to prefix the entire pickle with a data size field (possibly in a variable-length representation), i.e. to make a single frame.

If that was done, I would guess that Tim's concerns about brittleness would go away (as you couldn't have a length field in the middle of data). IMO, the PEP has nearly the same flaw as the HTTP chunked transfer, which also puts length fields in the middle of the payload (except that HTTP makes it worse by making them optional).

Of course, a single length field has other drawbacks, such as having to pickle everything before sending out the first bytes.

Regards,
Martin
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 23:05:07 +0100 "Martin v. Löwis" wrote:
> On 19.11.13 21:28, Antoine Pitrou wrote:
>
> > Well, unless you propose a patch before Saturday, I will happily ignore
> > your proposal.
>
> See
>
> http://bugs.python.org/file32709/framing.diff

Ok, thanks. So now that I look at the patch I see the following problems with this idea:

- "pickle + framing" becomes a different protocol than "pickle" alone, which means we lose the benefit of protocol autodetection. It's as though pickle.load() required you to give the protocol number, instead of inferring it from the pickle bytestream.

- it is less efficient than framing built inside pickle, since it adds separate buffers and memory copies (while the point of framing is to make buffering more efficient).

Your idea is morally similar to saying "we don't need to optimize the size of pickles, since you can gzip them anyway". However, the fact that the _pickle module currently goes to lengths to try to optimize buffering, implies to me that it's reasonable to also improve the pickle protocol so as to optimize buffering.

Regards

Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On 19.11.13 21:28, Antoine Pitrou wrote:
> Well, unless you propose a patch before Saturday, I will happily ignore
> your proposal.

See

http://bugs.python.org/file32709/framing.diff

Regards,
Martin
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 16:06:22 -0600 Tim Peters wrote:
> [Antoine]
> >>> Ahah, ok, I see where you're going. But how many other implementations
> >>> of unpickling are there?
>
> [Tim]
> >> That's something you should have researched when writing the PEP ;-)
> >> How many implementations of Python aren't CPython? That's probably
> >> the answer. I'm not an expert on that, but there's more than one.
>
> [Antoine]
> > But "how many of them use something else than Lib/pickle.py" is the
> > actual question.
>
> I don't know - and neither do you ;-)
>
> I do know that I'd like, e.g., a version of pickletools.dis() in
> CPython that _did_ show the framing bits, for debugging. That's a
> bare-bones "unpickler". I don't know how many other "partial
> unpicklers" exist in the wild either. But their lives would also be
> much easier if the framing stuff were explicit. "Mandatory
> optimization" should be an oxymoron ;-)

Well, I don't think it's a big deal to add a FRAME opcode if it doesn't change the current framing logic. I'd like to defer to Alexandre on this one, anyway.

Regards

Antoine.
Re: [Python-Dev] PEP 428 - pathlib - ready for approval
On Tue, 19 Nov 2013 17:02:15 -0500 Brett Cannon wrote:
> On Tue, Nov 19, 2013 at 4:04 PM, Antoine Pitrou wrote:
>
> >
> > Hello,
> >
> > Guido has told me that he was ready to approve PEP 428 (pathlib) in its
> > latest amended form. Here is the last call for any comments or
> > arguments against approval, before Guido marks the PEP accepted (or
> > changes his mind :-)).
> >
>
> Is 'ext' going to exist with 'suffix'? Seems redundant (I'm guessing the
> example is out-of-date and 'ext' was changed to suffix).
>
> And a very minor grammatical thing: "you have to lookup a dedicate
> attribute" -> "dedicated"

Thanks. Both errors are now fixed.

Regards

Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
[Antoine]
>>> Ahah, ok, I see where you're going. But how many other implementations
>>> of unpickling are there?

[Tim]
>> That's something you should have researched when writing the PEP ;-)
>> How many implementations of Python aren't CPython? That's probably
>> the answer. I'm not an expert on that, but there's more than one.

[Antoine]
> But "how many of them use something else than Lib/pickle.py" is the
> actual question.

I don't know - and neither do you ;-)

I do know that I'd like, e.g., a version of pickletools.dis() in CPython that _did_ show the framing bits, for debugging. That's a bare-bones "unpickler". I don't know how many other "partial unpicklers" exist in the wild either. But their lives would also be much easier if the framing stuff were explicit. "Mandatory optimization" should be an oxymoron ;-)

> ...
> The problem with "let's make the unpickler more lenient in a later
> version" is that then you have protocol 4 pickles that won't work with
> all protocol 4-accepting versions of the pickle module.

Yup. s/4/5/ would need to be part of a delayed optimization.
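For what it is worth, protocol 4 as released in Python 3.4 and later does carry an explicit FRAME opcode, so pickletools can see the framing. The sketch below lists the frame sizes in a pickle using pickletools.genops(); the helper name frame_sizes and the sample data are made up for illustration, and this shows the released behaviour rather than the draft being debated in this thread.

```python
import pickle
import pickletools


def frame_sizes(data):
    """Return the size argument of every FRAME opcode found in a pickle."""
    return [arg for opcode, arg, pos in pickletools.genops(data)
            if opcode.name == "FRAME"]


# A pickle big enough that the pickler bothers to frame it.
framed = pickle.dumps(list(range(1000)), protocol=4)
unframed = pickle.dumps(list(range(1000)), protocol=3)

sizes_v4 = frame_sizes(framed)    # at least one frame
sizes_v3 = frame_sizes(unframed)  # protocol 3 has no framing at all
```

This is the kind of "partial unpickler" mentioned above: it walks the opcodes without building any objects, and it only works because the frame header is a real opcode.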
Re: [Python-Dev] PEP 428 - pathlib - ready for approval
On Tue, Nov 19, 2013 at 4:04 PM, Antoine Pitrou wrote:
>
> Hello,
>
> Guido has told me that he was ready to approve PEP 428 (pathlib) in its
> latest amended form. Here is the last call for any comments or
> arguments against approval, before Guido marks the PEP accepted (or
> changes his mind :-)).
>

Is 'ext' going to exist with 'suffix'? Seems redundant (I'm guessing the example is out-of-date and 'ext' was changed to suffix).

And a very minor grammatical thing: "you have to lookup a dedicate attribute" -> "dedicated"
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 15:41:51 -0600 Tim Peters wrote:
> [Tim]
> >> ...
> >> But if some _other_ implementation of unpickling didn't give a hoot
> >> about framing, having an explicit opcode means that implementation
> >> could ignore the whole scheme very easily: just implement the FRAME
> >> opcode in *its* opcode-decoding loop to consume the FRAME argument,
> >> ignore it, and move on. As-is, all other implementations _have_ to
> >> know everything about the buffering scheme because it's all implicit
> >> low-level magic.
>
> [Antoine]
> > Ahah, ok, I see where you're going. But how many other implementations
> > of unpickling are there?
>
> That's something you should have researched when writing the PEP ;-)
> How many implementations of Python aren't CPython? That's probably
> the answer. I'm not an expert on that, but there's more than one.

But "how many of them use something else than Lib/pickle.py" is the actual question.

> > Otherwise the "later optimization" can't work.
>
> Right. _If_ reducing framing overhead to "nothing" for small pickles
> turns out to be sufficiently desirable, then the buffering layer would
> need to learn how to turn itself off in the absence of a FRAME opcode
> immediately following the current frame. Perhaps the opcode decoding
> loop would also need to learn how to turn the buffering layer back on
> again too (next time a FRAME opcode showed up). Sounds annoying, but
> not impossible.

The problem with "let's make the unpickler more lenient in a later version" is that then you have protocol 4 pickles that won't work with all protocol 4-accepting versions of the pickle module.

Regards

Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
[Tim]
>> ...
>> But if some _other_ implementation of unpickling didn't give a hoot
>> about framing, having an explicit opcode means that implementation
>> could ignore the whole scheme very easily: just implement the FRAME
>> opcode in *its* opcode-decoding loop to consume the FRAME argument,
>> ignore it, and move on. As-is, all other implementations _have_ to
>> know everything about the buffering scheme because it's all implicit
>> low-level magic.

[Antoine]
> Ahah, ok, I see where you're going. But how many other implementations
> of unpickling are there?

That's something you should have researched when writing the PEP ;-) How many implementations of Python aren't CPython? That's probably the answer. I'm not an expert on that, but there's more than one.

>> Initially, all I desperately ;-) want changed here is for the
>> _buffering layer_, on the writing end, to write 9 bytes instead of 8
>> (1 new one for a FRAME opcode), and on the reading end to consume 9
>> bytes instead of 8 (extra credit if it checked the first byte to
>> verify it really is a FRAME opcode - there's nothing wrong with sanity
>> checks).
>>
>> Then it becomes _possible_ to optimize "small pickles" later (in the
>> sense of not bothering to frame them at all).

> So the CPython unpickler must be able to work with and without framing
> by detecting the FRAME opcode?

Not at first, no. At first the buffering layer could raise an exception if there's no FRAME opcode when it expected one. Or just read up garbage bytes and assume it's a frame size, which is effectively what it's doing now anyway ;-)

> Otherwise the "later optimization" can't work.

Right. _If_ reducing framing overhead to "nothing" for small pickles turns out to be sufficiently desirable, then the buffering layer would need to learn how to turn itself off in the absence of a FRAME opcode immediately following the current frame. Perhaps the opcode decoding loop would also need to learn how to turn the buffering layer back on again too (next time a FRAME opcode showed up). Sounds annoying, but not impossible.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 15:17:06 -0600 Tim Peters wrote:
>
> > Note some drawbacks of frame opcodes:
> > - the decoder has to sanity check the frame opcodes (what if a frame
> >   opcode is encountered when already inside a frame?)
> > - a pickle-mutating function such as pickletools.optimize() may naively
> >   ignore the frame opcodes while rearranging the pickle stream, only to
> >   emit a new pickle with invalid frame sizes
>
> I suspect we have very different mental models here. By "has an
> opcode", I do NOT mean "must be visible to the opcode-decoding loop".
> I just mean "has a unique byte assigned in the pickle opcode space".
>
> I expect that in the CPython implementation of unpickling, the
> buffering layer would _consume_ the FRAME opcode, along with the frame
> size. The opcode-decoding loop would never see it.
>
> But if some _other_ implementation of unpickling didn't give a hoot
> about framing, having an explicit opcode means that implementation
> could ignore the whole scheme very easily: just implement the FRAME
> opcode in *its* opcode-decoding loop to consume the FRAME argument,
> ignore it, and move on. As-is, all other implementations _have_ to
> know everything about the buffering scheme because it's all implicit
> low-level magic.

Ahah, ok, I see where you're going. But how many other implementations of unpickling are there?

> Initially, all I desperately ;-) want changed here is for the
> _buffering layer_, on the writing end, to write 9 bytes instead of 8
> (1 new one for a FRAME opcode), and on the reading end to consume 9
> bytes instead of 8 (extra credit if it checked the first byte to
> verify it really is a FRAME opcode - there's nothing wrong with sanity
> checks).
>
> Then it becomes _possible_ to optimize "small pickles" later (in the
> sense of not bothering to frame them at all).

So the CPython unpickler must be able to work with and without framing by detecting the FRAME opcode? Otherwise the "later optimization" can't work.

Regards

Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
[Tim]
>> ...
>> better than implicit" kinds of reasons. The only way now to know that
>> you're looking at a frame size is to keep a running count of bytes
>> processed and realize you've reached a byte offset where a frame size
>> "is expected".

[Antoine]
> That's integrated to the built-in buffering.

Well, obviously, because it wouldn't work at all unless the built-in buffering knew all about it ;-)

> It's not really an additional constraint: the frame sizes simply
> dictate how buffering happens in practice. The main point of
> framing is to *simplify* the buffering logic (of course, the old
> buffering logic is still there for protocols <= 3, unfortunately).

And always will be - there are no pickle simplifications, because everything always sticks around forever. Over time, pickle just gets more complicated. That's in the nature of the beast.

> Note some drawbacks of frame opcodes:
> - the decoder has to sanity check the frame opcodes (what if a frame
>   opcode is encountered when already inside a frame?)
> - a pickle-mutating function such as pickletools.optimize() may naively
>   ignore the frame opcodes while rearranging the pickle stream, only to
>   emit a new pickle with invalid frame sizes

I suspect we have very different mental models here. By "has an opcode", I do NOT mean "must be visible to the opcode-decoding loop". I just mean "has a unique byte assigned in the pickle opcode space".

I expect that in the CPython implementation of unpickling, the buffering layer would _consume_ the FRAME opcode, along with the frame size. The opcode-decoding loop would never see it.

But if some _other_ implementation of unpickling didn't give a hoot about framing, having an explicit opcode means that implementation could ignore the whole scheme very easily: just implement the FRAME opcode in *its* opcode-decoding loop to consume the FRAME argument, ignore it, and move on. As-is, all other implementations _have_ to know everything about the buffering scheme because it's all implicit low-level magic.

So, then, to the 2 points you raised:

1. If the CPython decoder ever sees a FRAME opcode, I expect it to raise an exception. That's all - it's an invalid pickle (or bug in the code) if it contains a FRAME the buffering layer didn't consume.

2. pickletools.optimize() in the CPython implementation should never see a FRAME opcode either.

Initially, all I desperately ;-) want changed here is for the _buffering layer_, on the writing end, to write 9 bytes instead of 8 (1 new one for a FRAME opcode), and on the reading end to consume 9 bytes instead of 8 (extra credit if it checked the first byte to verify it really is a FRAME opcode - there's nothing wrong with sanity checks).

Then it becomes _possible_ to optimize "small pickles" later (in the sense of not bothering to frame them at all). So long as frames remain implicit magic, that's impossible without moving to yet another new protocol level.
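The "9 bytes instead of 8" change can be sketched as a pair of buffering-layer helpers. This is an illustration only: the function names are invented, the little-endian layout of the 8-byte size is an assumption, and the opcode byte is a placeholder (chosen to match the value pickle.FRAME has in released Python 3.4 and later; the thread itself does not fix one).

```python
import io
import struct

FRAME = b"\x95"  # placeholder opcode byte, see the note above


def write_frame(stream, payload):
    """Writing end: 1 opcode byte + 8-byte size (9 bytes), then the data."""
    stream.write(FRAME + struct.pack("<Q", len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Reading end: consume the 9-byte header, sanity-check it, return the data."""
    header = stream.read(9)
    if not header:
        return None  # clean end of stream
    if len(header) != 9 or header[:1] != FRAME:
        raise ValueError("expected a FRAME opcode followed by an 8-byte size")
    (size,) = struct.unpack("<Q", header[1:])
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("frame is shorter than its declared size")
    return payload
```

Because the header starts with a byte from the opcode space, a reader that does not care about framing can treat it like any other opcode: read the 8-byte argument, ignore it, and carry on.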
[Python-Dev] PEP 428 - pathlib - ready for approval
Hello,

Guido has told me that he was ready to approve PEP 428 (pathlib) in its latest amended form. Here is the last call for any comments or arguments against approval, before Guido marks the PEP accepted (or changes his mind :-)).

Regards

Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
19.11.13 21:59, Antoine Pitrou wrote:
> Note some drawbacks of frame opcodes:
> - the decoder has to sanity check the frame opcodes (what if a frame
>   opcode is encountered when already inside a frame?)

This is only one simple check when reading the frame opcode.

> - a pickle-mutating function such as pickletools.optimize() may naively
>   ignore the frame opcodes while rearranging the pickle stream, only to
>   emit a new pickle with invalid frame sizes

But with naked frame sizes without opcodes it has even more chance of producing an invalid pickle.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 21:25:34 +0100 "Martin v. Löwis" wrote: > On 19.11.13 20:59, Antoine Pitrou wrote: > > That's integrated to the built-in buffering. It's not really an > > additional constraint: the frame sizes simply dictate how buffering > > happens in practice. The main point of framing is to *simplify* the > > buffering logic (of course, the old buffering logic is still there for > > protocols <= 3, unfortunately). > > I wonder why this needs to be part of the pickle protocol at all, > if it really is "below" the opcodes. Anybody desiring framing could > just implement a framing version of the io.BufferedReader, which > could be used on top of a socket connection (say) to allow fetching > larger blocks from the network stack. This would then be transparent > to the pickle implementation; the framing reader would, of course, > provide the peek() operation to allow the unpickler to continue to use > buffering. > > Such a framing BufferedReader might even be included in the standard > library. Well, unless you propose a patch before Saturday, I will happily ignore your proposal. Regards Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On 19.11.13 20:59, Antoine Pitrou wrote: > That's integrated to the built-in buffering. It's not really an > additional constraint: the frame sizes simply dictate how buffering > happens in practice. The main point of framing is to *simplify* the > buffering logic (of course, the old buffering logic is still there for > protocols <= 3, unfortunately). I wonder why this needs to be part of the pickle protocol at all, if it really is "below" the opcodes. Anybody desiring framing could just implement a framing version of the io.BufferedReader, which could be used on top of a socket connection (say) to allow fetching larger blocks from the network stack. This would then be transparent to the pickle implementation; the framing reader would, of course, provide the peek() operation to allow the unpickler to continue to use buffering. Such a framing BufferedReader might even be included in the standard library. Regards, Martin
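One way to picture the "framing reader below the pickle layer" idea is a raw stream that strips length prefixes, wrapped in a stock io.BufferedReader so that read(), readline() and peek() come for free. This is a sketch only: the class name `DeframingRaw`, the helpers `frame` and `framed_reader`, and the 4-byte little-endian length prefix are all hypothetical choices made for the example, not part of any proposal in this thread.

```python
import io
import pickle
import struct


class DeframingRaw(io.RawIOBase):
    """Hypothetical raw stream that strips 4-byte length prefixes."""

    def __init__(self, raw):
        super().__init__()
        self._raw = raw      # underlying byte stream, e.g. sock.makefile("rb")
        self._frame = b""    # unread remainder of the current frame

    def readable(self):
        return True

    def _read_exact(self, n):
        # Loop because the underlying stream may return short reads.
        chunks = []
        while n > 0:
            chunk = self._raw.read(n)
            if not chunk:
                raise EOFError("stream ended inside a frame")
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)

    def readinto(self, b):
        # Fetch whole frames until one has data; skip empty frames.
        while not self._frame:
            header = self._raw.read(4)
            if not header:
                return 0  # clean end of stream between frames
            if len(header) < 4:
                header += self._read_exact(4 - len(header))
            (size,) = struct.unpack("<I", header)
            self._frame = self._read_exact(size)
        n = min(len(b), len(self._frame))
        b[:n] = self._frame[:n]
        self._frame = self._frame[n:]
        return n


def frame(payload):
    """Sender side: prefix a chunk with its length."""
    return struct.pack("<I", len(payload)) + payload


def framed_reader(raw):
    """BufferedReader supplies read(), readline() and peek() on top."""
    return io.BufferedReader(DeframingRaw(raw))


# The pickle itself is unchanged; the framing is invisible to the unpickler.
data = pickle.dumps({"a": [1, 2, 3]}, protocol=2)
wire = frame(data[:5]) + frame(data[5:])
loaded = pickle.load(framed_reader(io.BytesIO(wire)))
```

In this sketch the underlying stream is only ever asked for a 4-byte header or for the body of the frame that header announced, so no read goes past the end of the current frame.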
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 13:22:52 -0600 Tim Peters wrote: > [Guido] > > So using an opcode for framing is out? (Sorry, I've lost track of the > > back-and-forth.) > > It was never in ;-) I'd *prefer* one, but not enough to try to block > the PEP. As is, framing is done at a "lower level" than opcode > decoding. I fear this is brittle, for all the usual "explicit is > better than implicit" kinds of reasons. The only way now to know that > you're looking at a frame size is to keep a running count of bytes > processed and realize you've reached a byte offset where a frame size > "is expected".
That's integrated to the built-in buffering. It's not really an additional constraint: the frame sizes simply dictate how buffering happens in practice. The main point of framing is to *simplify* the buffering logic (of course, the old buffering logic is still there for protocols <= 3, unfortunately).
Note some drawbacks of frame opcodes:
- the decoder has to sanity check the frame opcodes (what if a frame opcode is encountered when already inside a frame?)
- a pickle-mutating function such as pickletools.optimize() may naively ignore the frame opcodes while rearranging the pickle stream, only to emit a new pickle with invalid frame sizes
Regards Antoine.
Re: [Python-Dev] [RELEASED] Python 3.3.3 final
On 19.11.2013 17:14, Mark Lawrence wrote:
> On 19/11/2013 06:59, Georg Brandl wrote:
>> To download Python 3.3.3 rc2 visit:
>> http://www.python.org/download/releases/3.3.3/
> Please make your mind up, final or rc2?
> Thanks everybody for your efforts, much appreciated :)
It's my firm belief that every announcement should have a small error to appease the gods of regression. *ahem* :)
Georg
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
[Guido] > So using an opcode for framing is out? (Sorry, I've lost track of the > back-and-forth.) It was never in ;-) I'd *prefer* one, but not enough to try to block the PEP. As is, framing is done at a "lower level" than opcode decoding. I fear this is brittle, for all the usual "explicit is better than implicit" kinds of reasons. The only way now to know that you're looking at a frame size is to keep a running count of bytes processed and realize you've reached a byte offset where a frame size "is expected". With an opcode, framing could also be optional (whenever desired), because frame sizes would be _explicitly_ marked in the byte stream. Then the framing overhead for small pickles could drop to 0 bytes (instead of the current 8, or 1 thru 9 under various other schemes). Ideal would be an explicit framing opcode combined with variable-length size encoding. That would never require more bytes than the current scheme, and "almost always" require fewer. But even I don't think it's of much value to chop a few bytes off every 64KB of pickle ;-)
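To make the byte counts concrete, here is a sketch of an explicit frame opcode followed by a variable-length size. The message does not say which variable-length encoding is meant; the base-128 scheme below (7 payload bits per byte, low-order group first), the opcode byte `b"\x95"`, and the function names are assumptions made for illustration.

```python
FRAME = b"\x95"  # assumed opcode byte, for illustration only


def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("size must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Return (value, next_pos) for a varint starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def frame_header(size):
    """Explicit opcode plus variable-length frame size."""
    return FRAME + encode_varint(size)
```

With this particular encoding a 64KB frame costs 4 header bytes and a 100-byte frame costs 2, against 8 for the fixed-width size. The header stays within 8 bytes only for frame sizes below 2**49; larger sizes would need a different encoding to stay within that bound.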
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
Well, both fixed 8-byte framing and variable-size framing introduce a new way of representing numbers in the stream, which means that everyone parsing and generating pickles must be able to support both styles. (But fixed is easier since the XXX8 opcodes use the same format.)
I'm thinking of how you correctly read a pickle from a non-buffering pipe with the minimum number of read() calls without ever reading beyond the end of a valid pickle. (That's a requirement, right?)
If you know it's protocol 4:
- with fixed framing: read 10 bytes, that's the magic word plus the first frame size; then you can start buffering
- with variable framing: read 3 bytes, then depending on the 3rd byte read some more to find the frame size; then you can start buffering
- with mandatory frame opcode: pretty much the same
- with optional frame opcode: pretty much the same (the 3rd byte must be a valid opcode, even if it isn't a frame opcode)
If you don't know the protocol number: read the first byte, then read the second byte (or not if it's not explicitly versioned), then you know the protocol and can do the rest as above.
On Tue, Nov 19, 2013 at 11:11 AM, Antoine Pitrou wrote:
> On Tue, 19 Nov 2013 11:05:45 -0800 Guido van Rossum wrote:
> > So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.)
> It doesn't seem to bring anything, and it makes the overhead worse for tiny pickles (since it will be two bytes at least, instead of one byte with the current variable length encoding proposal).
> If overhead doesn't matter, I'm fine with keeping a simple 8-bytes frame size :-)
> Regards
> Antoine.
-- --Guido van Rossum (python.org/~guido)
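The fixed-framing case in the message above can be sketched like this. It follows the layout as described there (a 2-byte protocol header followed directly by an 8-byte frame size); the little-endian byte order and the helper names `read_exact` and `read_first_frame` are assumptions for the example.

```python
import io
import struct

PROTO = b"\x80"   # protocol marker opcode, followed by one version byte
HEADER_SIZE = 10  # 2-byte protocol header + 8-byte frame size


def read_exact(pipe, n):
    """Read exactly n bytes, looping because a pipe may return short reads."""
    chunks = []
    while n > 0:
        chunk = pipe.read(n)
        if not chunk:
            raise EOFError("stream ended before the expected byte count")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def read_first_frame(pipe):
    """Return the first frame's bytes without reading past its end."""
    header = read_exact(pipe, HEADER_SIZE)
    if header[:1] != PROTO or header[1] != 4:
        raise ValueError("not a protocol 4 pickle")
    # Assumption: the size is an unsigned 8-byte little-endian integer.
    (size,) = struct.unpack("<Q", header[2:])
    return read_exact(pipe, size)


# Bytes after the frame stay unread in the stream.
stream = io.BytesIO(b"\x80\x04" + struct.pack("<Q", 3) + b"abc" + b"rest")
first = read_first_frame(stream)
leftover = stream.read()
```

On a well-behaved pipe this needs two read() calls for the first frame: one for the 10-byte header and one for the frame body, and it never asks for bytes beyond the announced size.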
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 11:05:45 -0800 Guido van Rossum wrote: > So using an opcode for framing is out? (Sorry, I've lost track of the > back-and-forth.) It doesn't seem to bring anything, and it makes the overhead worse for tiny pickles (since it will be two bytes at least, instead of one byte with the current variable length encoding proposal). If overhead doesn't matter, I'm fine with keeping a simple 8-bytes frame size :-) Regards Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
So using an opcode for framing is out? (Sorry, I've lost track of the back-and-forth.) On Tue, Nov 19, 2013 at 10:57 AM, Antoine Pitrou wrote: > On Tue, 19 Nov 2013 10:52:58 -0800 > Guido van Rossum wrote: > > So why is framing different? > > Because it doesn't use opcodes, so it can't use different opcodes to > differentiate between different frame size widths :-) > > Regards > > Antoine. > -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 10:52:58 -0800 Guido van Rossum wrote: > So why is framing different? Because it doesn't use opcodes, so it can't use different opcodes to differentiate between different frame size widths :-) Regards Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Tue, 19 Nov 2013 19:51:10 +0100 Antoine Pitrou wrote: > On Mon, 18 Nov 2013 16:48:05 -0800 > Guido van Rossum wrote: > > > > Food for thought: maybe we should have variable-encoding lengths for all > > opcodes, rather than the current cumbersome scheme? > > Well, it's not that cumbersome... If you look at CPU encodings, they > also tend to have different opcodes for different immediate lengths. > > In your case, I'd say it mostly leads to a bit of code duplication. But > the opcode space is far from exhausted right now :) Oops... Make that "in our case", of course. cheers Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
So why is framing different? On Tue, Nov 19, 2013 at 10:51 AM, Antoine Pitrou wrote: > On Mon, 18 Nov 2013 16:48:05 -0800 > Guido van Rossum wrote: > > > > Food for thought: maybe we should have variable-encoding lengths for all > > opcodes, rather than the current cumbersome scheme? > > Well, it's not that cumbersome... If you look at CPU encodings, they > also tend to have different opcodes for different immediate lengths. > > In your case, I'd say it mostly leads to a bit of code duplication. But > the opcode space is far from exhausted right now :) > > Regards > > Antoine. > -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
On Mon, 18 Nov 2013 16:48:05 -0800 Guido van Rossum wrote: > > Food for thought: maybe we should have variable-encoding lengths for all > opcodes, rather than the current cumbersome scheme? Well, it's not that cumbersome... If you look at CPU encodings, they also tend to have different opcodes for different immediate lengths. In your case, I'd say it mostly leads to a bit of code duplication. But the opcode space is far from exhausted right now :) Regards Antoine.
Re: [Python-Dev] Accepting PEP 3154 for 3.4?
Food for thought: maybe we should have variable-encoding lengths for all opcodes, rather than the current cumbersome scheme? Funny, it sounds like UTF-8 :-)
Re: [Python-Dev] Mixed up core/module source file locations in CPython
On Sun, Nov 17, 2013 at 6:20 AM, Brett Cannon wrote:
> On Nov 17, 2013 8:58 AM, "Eli Bendersky" wrote:
> > On Sat, Nov 16, 2013 at 3:44 PM, Brett Cannon wrote:
> >> On Sat, Nov 16, 2013 at 1:40 PM, Eric Snow wrote:
> >>> If you look at the Python and Modules directories in the cpython repo, you'll find modules in Python/ and core files (like python.c and main.c) in Modules/. (It's like parking on a driveway and driving on a parkway.) It's not that big a deal and not that hard to figure out (so I'm fine with the status quo), but it is a bit surprising. When I was first getting familiar with the code base a few years ago (as a C non-expert), it was a not insignificant but not major stumbling block.
> >>> The situation is mostly a consequence of history, if I understand correctly. The subject has come up before and I don't recall any objections to doing something about it. I haven't had the time to track down those earlier discussions, though I remember Benjamin having some comment about it.
> >>> Would it be too disruptive (churn, etc.) to clean this up in 3.5? I see it similarly to when I moved a light switch from outside my bathroom to inside. For a while, but not that long, I kept unconsciously reaching for the switch that was no longer there on the outside. Regardless I'm glad I did it. Likewise, moving the handful of files around is a relatively inconsequential change that would make the project just a little less surprising, particularly for new contributors.
> >>> -eric
> >>> p.s. Either way I'll probably take some time (it shouldn't take long) after the PEP 451 implementation is done to put together a patch that moves the files around, just to see what difference it makes.
> >> I personally think it would be a good idea to re-arrange the files to make things more beginner-friendly. I believe Nick was also talking about renaming directories, etc. at some point.
> > If we're concerned with the beginner-friendliness of our source layout, I'll have to mention that I have a full ASDL parser lying around that's written in Python 3.4 (enums!) without using Spark. So that's one less custom tool to carry around with Python, less files and less LOCs in general. Just sayin' ;-)
> Then stop saying and check it in (or at least open a bug for it). :)
http://bugs.python.org/issue19655
Eli
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
On 19 November 2013 22:24, Walter Dörwald wrote:
> On 15.11.13 00:02, Greg Ewing wrote:
>> Walter Dörwald wrote:
>>> Unfortunately the frame from the decorator shows up in the traceback.
>> Maybe the decorator could remove its own frame from the traceback?
> True, this could be done via either an additional attribute on the frame, or a special value for frame.f_annotation.
> Would we want to add frame annotations to every function call in the Python stdlib? Certainly not. So which functions would get annotations and which ones won't?
> When we have many annotations, doing it with a decorator might be a performance problem, as each function call goes through another stack level.
> Is there any other way to implement it?
Yep, you make the annotations a mapping and use module based naming as a convention to avoid conflicts: http://bugs.python.org/issue18861#msg202754
However, that issue also goes into why this is definitely a PEP level question - there's a bunch of things that are currently painful that a general frame annotation mechanism could help simplify. The challenge is to do it in a way that doesn't hurt performance in the normal case, that is acceptable to other interpreter implementations, and to show that it actually *does* make it possible to clean up at least the already noted issues:
- avoiding inadvertent suppression of the original context when another exception gets replaced or suppressed inside an exception handler
- more reliably hiding traceback frames involving the importlib machinery
- more reliably reporting the codec involved in an encoding or decoding error (for example, the exception chaining I added for 3.4 can't provide any context for failures in the bz2_codec input validation, because that throws a stateful OSError that the chaining system can't handle, and thus doesn't wrap)
Cheers, Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
On 15.11.13 00:02, Greg Ewing wrote:
> Walter Dörwald wrote:
>> Unfortunately the frame from the decorator shows up in the traceback.
> Maybe the decorator could remove its own frame from the traceback?
True, this could be done via either an additional attribute on the frame, or a special value for frame.f_annotation.
Would we want to add frame annotations to every function call in the Python stdlib? Certainly not. So which functions would get annotations and which ones won't?
When we have many annotations, doing it with a decorator might be a performance problem, as each function call goes through another stack level.
Is there any other way to implement it?
Servus, Walter