Re: [Python-Dev] peps: PEP 456: add some of the new implementation details to the PEP's text
On Wed, 13 Nov 2013 23:33:02 +0100 (CET) christian.heimes wrote: > > > +Small string optimization > += > + > +Hash functions like SipHash24 have a costly initialization and finalization > +code that can dominate speed of the algorithm for very short strings. On the > +other hand Python calculates the hash value of short strings quite often. A > +simple and fast function for especially for hashing of small strings can make > +a measurably impact on performance. For example these measurements were taken > +during a run of Python's regression tests. Additional measurements of other > +code have shown a similar distribution. Well, the text above talks about a "measurably (typo?) impact on performance", but you aren't giving any performance numbers, which doesn't help the reader of those lines. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522
On 14 Nov 2013 13:52, wrote: > > results for 784a02ec2a26 on branch "default" > > > test_codeccallbacks leaked [40, 40, 40] references, sum=120 > test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120 > test_codecs leaked [38, 38, 38] references, sum=114 > test_codecs leaked [24, 24, 24] memory blocks, sum=72 > test_email leaked [16, 16, 16] references, sum=48 > test_email leaked [16, 16, 16] memory blocks, sum=48 Hmm, it appears I have a reference leak somewhere. Cheers, Nick. > > > Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R', '3:3:/home/antoine/cpython/refleaks/reflogx2QIb_', '-x'] > ___ > Python-checkins mailing list > python-check...@python.org > https://mail.python.org/mailman/listinfo/python-checkins
Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522
On 14 Nov 2013 21:58, "Nick Coghlan" wrote: > > > On 14 Nov 2013 13:52, wrote: > > > > results for 784a02ec2a26 on branch "default" > > > > > > test_codeccallbacks leaked [40, 40, 40] references, sum=120 > > test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120 > > test_codecs leaked [38, 38, 38] references, sum=114 > > test_codecs leaked [24, 24, 24] memory blocks, sum=72 > > test_email leaked [16, 16, 16] references, sum=48 > > test_email leaked [16, 16, 16] memory blocks, sum=48 > > Hmm, it appears I have a reference leak somewhere. Ah, Benjamin fixed it already. Thanks! :) Cheers, Nick. > > Cheers, > Nick. > > > > > > > Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R', '3:3:/home/antoine/cpython/refleaks/reflogx2QIb_', '-x']
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
On 13.11.13 17:25, Nick Coghlan wrote: On 14 November 2013 02:12, Nick Coghlan wrote: On 14 November 2013 00:30, Walter Dörwald wrote: On 13.11.13 14:51, nick.coghlan wrote: http://hg.python.org/cpython/rev/854a2cea31b9 changeset: 87084:854a2cea31b9 user:Nick Coghlan date:Wed Nov 13 23:49:21 2013 +1000 summary: Close #17828: better handling of codec errors - output type errors now redirect users to the type-neutral convenience functions in the codecs module - stateless errors that occur during encoding and decoding will now be automatically wrapped in exceptions that give the name of the codec involved Wouldn't it be better to add an annotation API to the exceptions classes? This would allow to annotate all exceptions without having to replace the exception object. Hmm, it might be better to have the traceback machinery print the annotation information instead of BaseException.__str__, so we don't get any compatibility issues with custom __str__ implementations. There's a reason the C API for this is private - it's a band aid fix, because solving it properly is hard :) Note that the specific problem with just annotating the exception rather than a specific frame is that you lose the stack context for where the annotation occurred. The current chaining workaround doesn't just change the exception message, it also breaks the stack into two pieces (inside and outside the codec) that get displayed separately. Mostly though, it boils down to the fact that I'm far more comfortable changing codec exception stack trace details in some cases than I am proposing a new API for all exceptions this close to the Python 3.4 feature freeze. Sure, this is something that might go into 3.5, but not 3.4. A more elegant (and comprehensive) solution as a PEP for 3.5 would certainly be a nice thing to have, but I think this is still much better than the 3.3 status quo. 
Thinking further about this, I like your "frame annotation" suggestion. Tracebacks could then look like this:

   >>> b"hello".decode("uu_codec")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>: decoding with 'uu_codec' codec failed
   ValueError: Missing "begin" line in input data

In fact the traceback already lays out the chain of events. What is missing is simply a little additional information.

Could frame annotation be added via decorators, i.e. something like this:

   @annotate("while doing something with {param}")
   def func(param):
      do something

annotate() would catch the exception, call .format() on the annotation string with the local variables of the frame as keyword arguments, attach the result to a special attribute of the frame and reraise the exception. The traceback machinery would simply have to print this additional attribute.

Servus,
   Walter
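A rough sketch of what such a decorator might look like in pure Python follows. It is not the proposed implementation: frame objects have no attribute for notes, so this version stores the formatted text on the exception itself, under an invented attribute name (frame_notes), and it formats the template with the call arguments rather than with all local variables of the frame. Both choices are assumptions made only to keep the example runnable today.

```python
import functools
import inspect


def annotate(template):
    """Hypothetical decorator: record a formatted note on exceptions passing through."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                try:
                    # Simplification: use the call arguments, not the frame locals.
                    bound = signature.bind_partial(*args, **kwargs)
                    note = template.format(**bound.arguments)
                except Exception:
                    # Never let a formatting problem mask the real error.
                    note = template
                # 'frame_notes' is an invented attribute name, not a CPython API.
                exc.frame_notes = getattr(exc, "frame_notes", []) + [note]
                raise
        return wrapper
    return decorator


@annotate("while handling x={x!r}")
def handle(x):
    raise ValueError(42)
```

Catching the ValueError raised by handle("spam") and reading its frame_notes attribute gives the formatted note; anything that prints tracebacks would still need to be taught to display it.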
Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522
2013/11/14 Antoine Pitrou : > On Thu, 14 Nov 2013 22:01:32 +1000 > Nick Coghlan wrote: >> On 14 Nov 2013 21:58, "Nick Coghlan" wrote: >> > >> > >> > On 14 Nov 2013 13:52, wrote: >> > > >> > > results for 784a02ec2a26 on branch "default" >> > > >> > > >> > > test_codeccallbacks leaked [40, 40, 40] references, sum=120 >> > > test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120 >> > > test_codecs leaked [38, 38, 38] references, sum=114 >> > > test_codecs leaked [24, 24, 24] memory blocks, sum=72 >> > > test_email leaked [16, 16, 16] references, sum=48 >> > > test_email leaked [16, 16, 16] memory blocks, sum=48 >> > >> > Hmm, it appears I have a reference leak somewhere. >> >> Ah, Benjamin fixed it already. Thanks! :) > > The reference leak task has been running for quite some time on my > personal machine and I believe it has proven useful. I have no problem > continuing running it on the same machine (which is mostly sitting idle > anyway), but maybe it should rather be hosted on our CI infrastructure? > Any suggestions? Thank you very much for running that, btw. I'm sure we would have released a lot of horribly leaking stuff without it. > > (the script is quite rough with hardcoded stuff, but beating it into > better shape could be a nice target for first-time contributors) Perhaps someone can figure out how to run it on one of the buildbots? -- Regards, Benjamin
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
On 14.11.13 14:22, Walter Dörwald wrote: On 13.11.13 17:25, Nick Coghlan wrote: >> [...] A more elegant (and comprehensive) solution as a PEP for 3.5 would certainly be a nice thing to have, but I think this is still much better than the 3.3 status quo. Thinking further about this, I like your "frame annotation" suggestion Tracebacks could then look like this: >>> b"hello".decode("uu_codec") Traceback (most recent call last): File "<stdin>", line 1, in <module>: decoding with 'uu_codec' codec failed ValueError: Missing "begin" line in input data In fact the traceback already lays out the chain of events. What is missing is simply a little additional information. Could frame annotation be added via decorators, i.e. something like this: @annotate("while doing something with {param}") def func(param): do something annotate() would catch the exception, call .format() on the annotation string with the local variables of the frame as keyword arguments, attach the result to a special attribute of the frame and reraise the exception. The traceback machinery would simply have to print this additional attribute.

http://bugs.python.org/19585 is a patch that implements that. With the patch the following code:

   import traceback

   @traceback.annotate("while handling x={x!r}")
   def handle(x):
      raise ValueError(42)

   handle("spam")

will give the traceback:

   Traceback (most recent call last):
     File "spam.py", line 8, in <module>
       handle("spam")
     File "frame-annotation/Lib/traceback.py", line 322, in wrapped
       f(*args, **kwargs)
     File "spam.py", line 5, in handle: while handling x='spam'
       raise ValueError(42)
   ValueError: 42

Unfortunately the frame from the decorator shows up in the traceback.

Servus,
   Walter
[Python-Dev] unicode Exception messages in py2.7
Folks,

(note this is about 2.7 -- sorry, but a lot of us still use that! I can only assume that in 3.* this is a non-issue)

I just discovered an issue that's been around a long time:

If you create an Exception with a unicode object for the message, the message can be silently ignored if it cannot be encoded to ASCII (or, more properly, the default encoding).

In my use-case, I was parsing a text file (utf-8), and wanted a bit of that text to be part of the Exception message (an error reading the file; I wanted the user to know what the text was surrounding the ill-formatted part of the text file). What I got was a blank message, and it took a lot of poking at it to figure out why.

My solution was:

    msg = u"Problem with line %i: %s This is not a valid time slot"%(linenum, line)
    raise ValueError(msg.encode('ascii', 'ignore'))

which is really pretty painfully clunky.

This is an issue brought up in various tutorial and blog posts, and all the solutions I've seen involve some similar clunkiness.

I also found this issue in the issue tracker:

http://bugs.python.org/issue2517

Which was resolved years ago, but as far as I can tell, only solved the problem of being able to do:

    unicode(an_exception)

and get the proper unicode message object. But we still can't raise the darn thing and expect the user to see the message.

Why is this the case? I can print a unicode object to the terminal, why can't raising an Exception print a unicode object?

I can imagine that for backward compatibility, or maybe for non-unicode terminals, or ??? Exceptions do need to print as ascii. However, having a message simply get swallowed up and disappear seems like the wrong solution.

- auto-conversion to a default encoding is fraught with problems all over the board -- I know that. I also know that too much code would break too often if we didn't have auto-conversion.
- for the most part, the auto-conversion uses 'strict' mode -- I generally dislike this, as it means code crashes when odd stuff gets introduced after testing, but I can see why it is done.

- However, I can see why for raising Exceptions, the decision was made to swallow that error, so that the actual Exception intended is raised, rather than a new UnicodeEncodeError.

- But combining 'strict' with ignoring the encoding exception seems like the worst of both worlds.

So a proposal:

Use 'replace' mode for the encoding to the default, and at least the user would see SOMETHING of the message. In a common case, it would be a lot of ascii, and in the worst case it would be a lot of question marks -- still better than a totally blank message.

Another option would be to use the str(repr(the_message)) so the user would get the escaped version. Though I think that would be more ugly.

What am I missing? This seems so obvious, and easy to do (though maybe it's buried in the C implementation of Exceptions)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
chris.bar...@noaa.gov
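For comparison, the difference between the error handlers discussed above can be seen with a small helper. This is only an illustration of the standard 'ignore', 'replace' and 'backslashreplace' handlers of str.encode(); the helper name ascii_safe and the sample line are made up, and the sketch does not change how Exceptions themselves render their messages.

```python
def ascii_safe(message, errors="replace"):
    """Return an ASCII-only copy of message, using the given error handler."""
    return message.encode("ascii", errors).decode("ascii")


# A made-up line containing one non-ASCII character (e with acute accent).
line = u"caf\u00e9 at 10:00"

dropped = ascii_safe(line, "ignore")            # the character vanishes silently
marked = ascii_safe(line, "replace")            # the character becomes '?'
escaped = ascii_safe(line, "backslashreplace")  # the character becomes an escape
```

With 'replace' or 'backslashreplace' the reader at least sees that something was there, which is the point of the proposal.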
Re: [Python-Dev] The pysandbox project is broken
On Wed, Nov 13, 2013 at 10:27 AM, Brett Cannon wrote: > > > > On Wed, Nov 13, 2013 at 1:05 PM, Eli Bendersky wrote: > >> >> >> >> On Wed, Nov 13, 2013 at 6:58 AM, Brett Cannon wrote: >> >>> >>> >>> >>> On Wed, Nov 13, 2013 at 6:30 AM, Facundo Batista < >>> facundobati...@gmail.com> wrote: >>> On Wed, Nov 13, 2013 at 4:37 AM, Maciej Fijalkowski wrote: >> Do you think it would be productive to create an independent Python >> compiler, designed with sandboxing in mind from the beginning? > > PyPy sandbox does work FYI > > It might not do exactly what you want, but it both provides a full > python and security. If we have sandboxing using PyPy... what also we need to put Python running in the browser? (like javascript, you know) Thanks! >>> >>> You can try to get PNaCl to work with Python to get a Python executable >>> that at least Chrome can run. >>> >> >> Two corrections: >> >> 1. CPython already works with NaCl and PNaCl (there are working patches >> in naclports to build it) >> > > Anything that should be upstreamed? > > >> 2. It can be used outside Chrome as well, using the standalone "sel_ldr" >> tool that will then allow to run a sandboxed CPython .nexe from the command >> line >> > > Sure, but I was just thinking about the "in browser" question Facundo > asked about. > FWIW, if you already have Chrome 31, go to: http://commondatastorage.googleapis.com/nativeclient-mirror/naclports/pepper_33/988/publish/python/pnacl/index.html This is CPython running on top of PNaCl, at near-native speed. With C extensions. With threads. It's 2.7.5 but we'll put up 3.4 too soon (anyone can do it though - based on naclports). The first load takes a bit of time, afterwards it's cached and instantaneous. 
Now all that's left is for someone to come up with a friendly API to wrap around the Pepper interface to conveniently access DOM :-) Eli
Re: [Python-Dev] unicode Exception messages in py2.7
2013/11/14 Chris Barker : > So a proposal: > > Use 'replace" mode for the encoding to the default, and at least the > user would see SOMETHING of the message. In a common case, it would be > a lot of ascii, and in the worse case it would be a lot of question > marks -- still better than a totally blank message. > > Another option would be to use the str(repr(the_message)) so the user > would get the escaped version. Though I think that would be more ugly. Unfortunately both of these things change behavior so cannot be changed in Python 2.7. > > What am I missing? This seems so obvious, and easy to do (though maybe > it's buried in the C implementation of Exceptions) -- Regards, Benjamin
Re: [Python-Dev] unicode Exception messages in py2.7
2013/11/14 Chris Barker : > (note this is about 2.7 -- sorry, but a lot of us still use that! I > can only assume that in 3.* this is a non-issue) > > I just discovered an issue that's been around a long time: > > If you create an Exception with a unicode object for the message, (...) In Python 2, there are too many similar corner cases. It is impossible to fix these bugs without taking the risk of introducing a regression. Seriously, *all* these tricky bugs are fixed in Python 3. So don't lose time trying to work around them, but invest in the future: upgrade to Python 3! Victor
Re: [Python-Dev] The pysandbox project is broken
Hi Victor, On Wed, Nov 13, 2013 at 12:58 AM, Victor Stinner wrote: > I now gave up on sandboxing Python. I just would like to warn other > core developers that trying to put a sandbox in Python is not a good > idea :-) I cannot thank you enough for writing this mail :-) It is a great place to point people to when they come along with some superficial idea about sandboxing Python. A bientôt, Armin.
Re: [Python-Dev] unicode Exception messages in py2.7
On 11/14/2013 04:02 PM, Benjamin Peterson wrote: > 2013/11/14 Chris Barker : >> So a proposal: >> >> Use 'replace" mode for the encoding to the default, and at least >> the user would see SOMETHING of the message. In a common case, it >> would be a lot of ascii, and in the worse case it would be a lot of >> question marks -- still better than a totally blank message. >> >> Another option would be to use the str(repr(the_message)) so the >> user would get the escaped version. Though I think that would be >> more ugly. > > Unfortunately both of these things change behavior so cannot be > changed in Python 2.7. Fixing any bug is "changing behavior"; 2.7 is not frozen for bugfixes. The real question is whether third-party code will break when the now-empty error messages appear with '?' littered through them? About the only things I can think of which might break would be doctests, but people *expect* those to break across third-dot releases of Python (one reason why I hate them). Exception repr is explicitly *not* part of any backward-compatibility guarantees in Python. Or code which explicitly works around the breakage could fail (urlparse changes between 2.7.3 and 2.7.4, anyone?) Tres. -- Tres Seaver +1 540-429-0999 tsea...@palladion.com Palladion Software "Excellence by Design" http://palladion.com
[Python-Dev] Add transform() and untranform() methods
Hi,

I saw that Nick Coghlan documented codecs.encode() and codecs.decode(), and changed the exception raised when codecs like rot_13 are used on bytes.decode() and str.encode().

I don't like the functions codecs.encode() and codecs.decode() because the type of the result depends on the encoding (second parameter). We try to avoid this in Python.

I would prefer to split the registry of codecs to have 3 registries:

- "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
- bytes: encode bytes=>bytes, decode bytes=>bytes
- str: encode str=>str, decode str=>str

And add transform() and untransform() methods to bytes and str types. In practice, it might be the same codecs registry for all codecs, just with a new attribute.

Examples:

- utf8: encoding
- zlib: bytes
- rot13: str

The result type of bytes.transform/untransform would be bytes, and the result type of str.transform/untransform would be str.

I don't know which exception should be raised when a codec is used in the wrong method. LookupError? TypeError "codec xxx cannot be used with method xxx.xx"? Something else?

codecs.encode/decode() documentation should be removed. The functions should be kept, just in case someone uses them.

Victor
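To make the point about result types concrete, here is a short illustration using only the existing codecs.encode() and codecs.decode() functions and the codec names "utf-8", "rot_13" and "zlib_codec". The helper method_rejects_codec is invented for the example. The last part relies on the behaviour of bytes.decode() in released CPython 3.4 and later, where a non-text codec is refused with LookupError; the exact exception was still under discussion at the time of this thread, so treat that part as an assumption about later versions.

```python
import codecs

# str -> bytes: an ordinary text encoding.
as_bytes = codecs.encode("hello", "utf-8")

# str -> str: rot_13 is a text-to-text transform.
as_text = codecs.encode("hello", "rot_13")

# bytes -> bytes: zlib_codec is a binary transform.
payload = b"hello" * 100
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")


def method_rejects_codec(data, encoding):
    """Return True if bytes.decode() refuses the codec with LookupError."""
    try:
        data.decode(encoding)
    except LookupError:
        return True
    return False
```

The same function name returns bytes in one call and str in the next, depending only on the second argument, which is the property being objected to.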
Re: [Python-Dev] Add transform() and untranform() methods
Oh, I forgot to mention that I sent this email in reaction to this issue: http://bugs.python.org/issue19585 Modifying the critical PyFrameObject because the codecs API raises surprising errors doesn't sound correct. I prefer to fix how codecs are used, rather than modifying the PyFrameObject. For more information, see issue #7475, which has a long history (4 years) and many messages. Martin von Loewis wrote "I would still be opposed to such a change, and I think it needs a PEP." and I still agree with him on this point. Because there are different opinions and no consensus, a PEP is required to explain why we took this decision and list rejected alternatives. http://bugs.python.org/issue7475 Victor 2013/11/14 Victor Stinner : > Hi, > > I saw that Nick Coghlan documented codecs.encode() and > codecs.decode(), and changed the exception raised when codecs like > rot_13 are used on bytes.decode() and str.encode(). > > I don't like the functions codecs.encode() and codecs.decode() because > the type of the result depends on the encoding (second parameter). We > try to avoid this in Python. > > I would prefer to split the registry of codecs to have 3 registries: > > - "encoding" (a better name can found): encode str=>bytes, decode bytes=>str > - bytes: encode bytes=>bytes, decode bytes=>bytes > - str: encode str=>str, decode str=>str > > And add transform() and untransform() methods to bytes and str types. > In practice, it might be same codecs registry for all codecs just with > a new attribute. > > Examples: > > - utf8: encoding > - zlib: bytes > - rot13: str > > The result type of bytes.transform/untransform would be bytes, and the > result type of str.transform/untransform would be str. > > I don't know which exception should be raised when a codec is used in > the wrong method. LookupError? TypeError "codec xxx cannot be used > with method xxx.xx"? Something else? > > codecs.encode/decode() documentation should be removed. 
The functions > should be kept, just in case if someone uses them. > > Victor
Re: [Python-Dev] peps: PEP 456: add some of the new implementation details to the PEP's text
On 11/14/2013 4:00 AM, Antoine Pitrou wrote: On Wed, 13 Nov 2013 23:33:02 +0100 (CET) christian.heimes wrote: +Small string optimization += + +Hash functions like SipHash24 have a costly initialization and finalization +code that can dominate speed of the algorithm for very short strings. On the +other hand Python calculates the hash value of short strings quite often. A +simple and fast function for especially for hashing of small strings can make 'for especially for hashing' is garbled. Delete first 'for'. +a measurably impact on performance. For example these measurements were taken 'measurable' +during a run of Python's regression tests. Additional measurements of other +code have shown a similar distribution. -- Terry Jan Reedy
Re: [Python-Dev] unicode Exception messages in py2.7
On 11/14/2013 4:55 PM, Tres Seaver wrote: About the only things I can think of which might break would be doctests, but people *expect* those to break across third-dot releases of Python (one reason why I hate them). My impression is that we avoid enhancing correct exception messages in bugfix (third-dot) releases because of both doctests and other in-code examination of messages. > Exception repr is explicitly *not* part of any backward-compatibility guarantees in Python. So we more freely change exception messages in version (second-dot) releases, without deprecation notices or waiting periods. -- Terry Jan Reedy
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
Walter Dörwald wrote: Unfortunaty the frame from the decorator shows up in the traceback. Maybe the decorator could remove its own frame from the traceback? -- Greg
Re: [Python-Dev] Add transform() and untranform() methods
On 15 Nov 2013 08:34, "Victor Stinner" wrote: > > Hi, > > I saw that Nick Coghlan documented codecs.encode() and > codecs.decode(), and changed the exception raised when codecs like > rot_13 are used on bytes.decode() and str.encode(). > > I don't like the functions codecs.encode() and codecs.decode() because > the type of the result depends on the encoding (second parameter). We > try to avoid this in Python. The type signature of those functions is just object -> object (Similar to the way the 2.x convenience methods were actually basestring -> basestring). > I would prefer to split the registry of codecs to have 3 registries: > > - "encoding" (a better name can found): encode str=>bytes, decode bytes=>str > - bytes: encode bytes=>bytes, decode bytes=>bytes > - str: encode str=>str, decode str=>str > You have to get it out of your head that codecs are just about text and binary data. They're not: they're arbitrary type transforms, and MAL deliberately wrote the module that way. > And add transform() and untransform() methods to bytes and str types. > In practice, it might be same codecs registry for all codecs just with > a new attribute. This is completely the wrong approach. There's zero justification for adding new builtin methods for this use case - encoding and decoding are generic operations, they should use functions not methods. What could be useful is allowing CodecInfo objects to supply an "expected input type" and an "expected output type" (ABCs and instance check overrides make that quite flexible). > > Examples: > > - utf8: encoding > - zlib: bytes > - rot13: str > > The result type of bytes.transform/untransform would be bytes, and the > result type of str.transform/untransform would be str. > > I don't know which exception should be raised when a codec is used in > the wrong method. LookupError? TypeError "codec xxx cannot be used > with method xxx.xx"? Something else? 
We already do this check in the existing convenience methods - it raises TypeError. > > codecs.encode/decode() documentation should be removed. The functions > should be kept, just in case if someone uses them. No. They're part of the regression test suite, and have been since Python 2.4. They embody MAL's intended "arbitrary type transform library" approach. They provide a source compatible mechanism for using binary codecs in single code base Python 2/3 projects. At this point, the only person that can get me to revert this clarification of MAL's original vision for the codecs module is Guido, since anything else completely fails to address the Python 3 adoption barrier posed by the current state of Python 3's binary codec support. Note that the only behavioural changes in the commits so far were to exception handling - everything else was just docs. The next planned commit (to restore the binary codec aliases) *is* a behavioural change - that's why I posted to the list about it (it received only two responses, both +1) Cheers, Nick. > > Victor
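The "expected input type" / "expected output type" idea mentioned in this reply could be prototyped outside the codecs module along the following lines. Everything here is hypothetical: CodecInfo has no such attributes, and the TypedCodec tuple, the TYPED_CODECS table and checked_encode() are invented names used only to show a type check happening before any codec work is done.

```python
import codecs
from collections import namedtuple

# Hypothetical description of a codec's expected types (not part of CodecInfo).
TypedCodec = namedtuple("TypedCodec", "name input_type output_type")

# Invented table covering the three examples from the thread.
TYPED_CODECS = {
    "utf-8": TypedCodec("utf-8", str, bytes),
    "zlib_codec": TypedCodec("zlib_codec", bytes, bytes),
    "rot_13": TypedCodec("rot_13", str, str),
}


def checked_encode(obj, encoding):
    """Encode obj, refusing mismatched input types before the codec runs."""
    info = TYPED_CODECS[encoding]
    if not isinstance(obj, info.input_type):
        raise TypeError(
            "codec %r expects %s input, got %s"
            % (info.name, info.input_type.__name__, type(obj).__name__)
        )
    return codecs.encode(obj, info.name)
```

Because the check uses isinstance(), the expected types could just as well be abstract base classes, which is the flexibility alluded to above.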
Re: [Python-Dev] Add transform() and untranform() methods
On 15 Nov 2013 08:42, "Victor Stinner" wrote: > > Oh, I forgot to mention that I sent this email in reaction to this issue: > > http://bugs.python.org/issue19585 > > Modifying the critical PyFrameObject because the codecs API raises > surprising errors doesn't sound correct. I prefer to fix how codecs > are used, than modifying the PyFrameObject. > > For more information, see the issue #7475 which a long history (4 > years) and many messages. Martin von Loewis wrote "I would still be > opposed to such a change, and I think it needs a PEP." and I still > agree with him on this point. Because they are different opinions and > no consensus, a PEP is required to explain why we took this decision > and list rejected alternatives. > > http://bugs.python.org/issue7475 Martin wrote that before it was pointed out there were existing functions to handle the problem (I was asking for a PEP back then, too). I posted my plan for dealing with this months ago without receiving any complaints, and I'm annoyed you waited until I had actually followed through and implemented it to complain about it and ask for Python 3's binary codec support to stay broken instead :P (Starting a new thread instead of replying to the one where I specifically asked about taking the next step does nothing to improve my mood) Regards, Nick. > > Victor > > 2013/11/14 Victor Stinner : > > Hi, > > > > I saw that Nick Coghlan documented codecs.encode() and > > codecs.decode(), and changed the exception raised when codecs like > > rot_13 are used on bytes.decode() and str.encode(). > > > > I don't like the functions codecs.encode() and codecs.decode() because > > the type of the result depends on the encoding (second parameter). We > > try to avoid this in Python. 
> > > > I would prefer to split the registry of codecs to have 3 registries: > > > > - "encoding" (a better name can found): encode str=>bytes, decode bytes=>str > > - bytes: encode bytes=>bytes, decode bytes=>bytes > > - str: encode str=>str, decode str=>str > > > > And add transform() and untransform() methods to bytes and str types. > > In practice, it might be same codecs registry for all codecs just with > > a new attribute. > > > > Examples: > > > > - utf8: encoding > > - zlib: bytes > > - rot13: str > > > > The result type of bytes.transform/untransform would be bytes, and the > > result type of str.transform/untransform would be str. > > > > I don't know which exception should be raised when a codec is used in > > the wrong method. LookupError? TypeError "codec xxx cannot be used > > with method xxx.xx"? Something else? > > > > codecs.encode/decode() documentation should be removed. The functions > > should be kept, just in case if someone uses them. > > > > Victor
[Python-Dev] "*zip-bomb" via codecs
It is possible to mount a DDoS using the fact that the codecs registry provides access to the gzip and bzip2 decompressors. Someone can send an HTTP request or email message with "gzip_codec" or "bzip2_codec" specified as the content encoding and a large, highly compressed gzip or bzip2 file as the content. A naive server will use the bytes.decode() method to decode the content. It is possible to create small compressed files which require a great deal of time and memory to decompress. Of course bytes.decode() will fail because the decoder returns bytes instead of a string, but the time and memory are already wasted. I have no working example but I'm sure it will be easy to create one. I suspect many services will be vulnerable to this attack.

A simple solution for this problem is to check that any foreign encoding is contained in a special set of safe encodings. But every program would have to check it explicitly. For a more general solution, bytes.decode() should reject the encoding *before* starting to decode. I.e. either all bytes->str decoders should be registered in a separate registry, or all codecs should have additional attributes which determine the input and output types.
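The "set of safe encodings" check described above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names `SAFE_TEXT_ENCODINGS` and `decode_untrusted` are hypothetical, the contents of the set are an arbitrary assumption, and the sketch targets Python 3. It relies on `codecs.lookup()` only to resolve aliases to a canonical name; no payload data is passed to the codec until the name has been accepted.

```python
import codecs

# Hypothetical allowlist, expressed as the canonical names that
# codecs.lookup() reports; which encodings belong here is an assumption.
SAFE_TEXT_ENCODINGS = frozenset({"ascii", "utf-8", "iso8859-1"})


def decode_untrusted(payload, encoding):
    """Decode bytes only if the requested codec is on the allowlist."""
    try:
        # Resolves aliases such as "UTF8" or "latin-1" without touching payload.
        canonical = codecs.lookup(encoding).name
    except LookupError:
        raise ValueError("unknown encoding: %r" % (encoding,))
    if canonical not in SAFE_TEXT_ENCODINGS:
        # Rejected before any decoding (or decompression) work is done.
        raise ValueError("encoding not allowed: %r" % (encoding,))
    return payload.decode(canonical)


text = decode_untrusted(b"caf\xc3\xa9", "UTF8")

try:
    decode_untrusted(b"irrelevant", "zlib_codec")
except ValueError as exc:
    rejected = str(exc)
```

The drawback noted in the message still applies: every program has to remember to make this call itself.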
Re: [Python-Dev] unicode Exception messages in py2.7
On 11/14/2013 02:59 PM, Terry Reedy wrote:
> On 11/14/2013 4:55 PM, Tres Seaver wrote:
>> About the only things I can think of which might break would be doctests,
>> but people *expect* those to break across third-dot releases of Python
>> (one reason why I hate them).
>
> My impression is that we avoid enhancing correct exception messages in
> bugfix (third-dot) releases because of both doctests and other in-code
> examination of messages.

But these exception messages are incorrect, and so we are okay to fix them, yes?

--
~Ethan~
Re: [Python-Dev] Add transform() and untranform() methods
15.11.13 01:03, Nick Coghlan wrote:
> We already do this check in the existing convenience methods - it raises TypeError.

The problem with this check is that it happens *after* encoding/decoding. This opens the door to DoS (see my last message).
Re: [Python-Dev] Add transform() and untranform() methods
On 15 Nov 2013 09:11, "Nick Coghlan" wrote: > > > On 15 Nov 2013 08:42, "Victor Stinner" wrote: > > > > Oh, I forgot to mention that I sent this email in reaction to this issue: > > > > http://bugs.python.org/issue19585 > > > > Modifying the critical PyFrameObject because the codecs API raises > > surprising errors doesn't sound correct. I prefer to fix how codecs > > are used, than modifying the PyFrameObject. > > > > For more information, see the issue #7475 which a long history (4 > > years) and many messages. Martin von Loewis wrote "I would still be > > opposed to such a change, and I think it needs a PEP." and I still > > agree with him on this point. Because they are different opinions and > > no consensus, a PEP is required to explain why we took this decision > > and list rejected alternatives. > > > > http://bugs.python.org/issue7475 > > Martin wrote that before it was pointed out there were existing functions to handle the problem (I was asking for a PEP back then, too). > > I posted my plan for dealing with this months ago without receiving any complaints, and I'm annoyed you waited until I had actually followed through and implemented it to complain about it and ask for Python 3's binary codec support to stay broken instead :P Something I *would* be entirely happy to do is write a retroactive PEP after beta 1 is out the door, explaining the history of this issue in a more coherent form than the comment history on issue 7475 and the many child issues it spawned. This would also provide a better launching point for other enhancements in Python 3.5 (frame annotations to remove the need for the exception chaining hack and better input validation mechanisms for codecs that allow the convenience methods to check that case explicitly rather than relying on the exception chaining). Cheers, Nick. > > (Starting a new thread instead of replying to the one where I specifically asked about taking the next step does nothing to improve my mood) > > Regards, > Nick. 
> > > > > Victor > > > > 2013/11/14 Victor Stinner : > > > Hi, > > > > > > I saw that Nick Coghlan documented codecs.encode() and > > > codecs.decode(), and changed the exception raised when codecs like > > > rot_13 are used on bytes.decode() and str.encode(). > > > > > > I don't like the functions codecs.encode() and codecs.decode() because > > > the type of the result depends on the encoding (second parameter). We > > > try to avoid this in Python. > > > > > > I would prefer to split the registry of codecs to have 3 registries: > > > > > > - "encoding" (a better name can found): encode str=>bytes, decode bytes=>str > > > - bytes: encode bytes=>bytes, decode bytes=>bytes > > > - str: encode str=>str, decode str=>str > > > > > > And add transform() and untransform() methods to bytes and str types. > > > In practice, it might be same codecs registry for all codecs just with > > > a new attribute. > > > > > > Examples: > > > > > > - utf8: encoding > > > - zlib: bytes > > > - rot13: str > > > > > > The result type of bytes.transform/untransform would be bytes, and the > > > result type of str.transform/untransform would be str. > > > > > > I don't know which exception should be raised when a codec is used in > > > the wrong method. LookupError? TypeError "codec xxx cannot be used > > > with method xxx.xx"? Something else? > > > > > > codecs.encode/decode() documentation should be removed. The functions > > > should be kept, just in case if someone uses them. > > > > > > Victor > > ___ > > Python-Dev mailing list > > Python-Dev@python.org > > https://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Add transform() and untranform() methods
15.11.13 00:32, Victor Stinner wrote:
> And add transform() and untransform() methods to bytes and str types.
> In practice, it might be same codecs registry for all codecs just with
> a new attribute.

If the transform() method is added, I prefer to have only one transformation method and to specify the direction by the transformation name ("bzip2"/"unbzip2").
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 04:55:19PM -0500, Tres Seaver wrote:

> Fixing any bug is "changing behavior"; 2.7 is not frozen for bugfixes.

It's not a given that the current behaviour *is* a bug. Exception messages in 2 are byte-strings, not Unicode. Trying to use Unicode instead is not, as far as I can tell, supported behaviour. If the exception message cannot be converted to a byte-string, suppressing the display of the message seems like perfectly reasonable behaviour to me:

py> class NoString:
...     def __str__(self):
...         raise ValueError
...
py> msg = NoString
py> msg = NoString()
py> print msg
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in __str__
ValueError
py> raise TypeError(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeErrorpy>

although it would be nice if a newline was used so the prompt was bumped to the next line.

The point is, I'm not convinced that this is a bug at all.

> The real question is whether third-party code will break when the
> now-empty error messages appear with '?' littered through them?

This behaviour goes back to at least Python 2.4, the oldest version I have easy access to at the moment that includes Unicode. Given that this alleged bug has been around for so long, I don't think that it affects terribly many people. That implies that fixing it won't benefit many people either.

> About the only things I can think of which might break would be doctests,
> but people *expect* those to break across third-dot releases of Python

Which people? I certainly don't expect doctests to break unless I've done something silly.

> (one reason why I hate them). Exception repr is explicitly *not* part of
> any backward-compatibility guarantees in Python.

Do you have a link for that explicit non-guarantee from the docs please?

--
Steven
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 1:55 PM, Tres Seaver wrote:

> Fixing any bug is "changing behavior"; 2.7 is not frozen for bugfixes.

Thank you.

> The real question is whether third-party code will break when the
> now-empty error messages appear with '?' littered through them?

right -- any bugfix changes behaviour, and any of them can break a test or code that is expecting (or working around) that behavior. So the key question here is whether there are many (any?) tests or function code out there that are counting on an empty message if and only if there happens to be a non-ascii character in an assigned message.

It's hard for me to imagine that that's a common thing to test for, but then I've been known to lack imagination ;-)

> About the only things I can think of which might break would be doctests,
> but people *expect* those to break across third-dot releases of Python
> (one reason why I hate them). Exception repr is explicitly *not* part of
> any backward-compatibility guarantees in Python. Or code which
> explicitly works around the breakage could fail (urlparse changes between
> 2.7.3 and 2.7.4, anyone?)

Sounds do-able to me, then...

-Thanks,
-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 1:20 PM, Victor Stinner wrote:
>> If you create an Exception with a unicode object for the message, (...)
>
> In Python 2, there are too many similar corner cases. It is impossible
> to fix these bugs without taking the risk of introducing a regression.

Yes, there are -- the auto-encoding is a serious pain everywhere. However, this is a case where the resulting Exception is silenced -- it's the only one I know of, and there can't be many like that.

> Seriously, *all* these tricky bugs are fixed in Python 3. So don't
> lose time on trying to workaround them, but invest in the future:
> upgrade to Python 3!

Maybe so -- but we are either maintaining 2.7 or not -- it WILL be around for a long time yet...

(amazing to me how many people are still using <=2.7, actually, even for new projects .. thank you Red Hat "Enterprise" Linux ;-) )

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 3:58 PM, Steven D'Aprano wrote:

> It's not a given that the current behaviour *is* a bug.

I'll concede that it's not a bug unless someone said somewhere that unicode messages should work .. but that's kind of a semantic argument.

I have to say it's a very odd choice to me that it suppresses the message, rather than raising an encoding error, like what happens everywhere else the default encoding is used.

In fact, I noticed that the message can be anything that can be stringified, which makes it particularly wacky that you can't use a unicode object.

> Exception
> messages in 2 are byte-strings, not Unicode.

well, they are anything that you can call str() on anyway...

> Trying to use Unicode
> instead is not, as far as I can tell, supported behaviour.

clearly not

> If the exception message cannot be converted to a byte-string,
> suppressing the display of the message seems like perfectly reasonable
> behaviour to me:

well, yes and no -- the fact is that unicode objects ARE special -- and it wouldn't hurt to treat them that way. And I'm not sure that suppressing the message when you've passed in a weird object that raises an exception when you try to convert it to a string makes sense either -- suppressing an exception is really not a good idea in general -- you really should have a good reason for it. I'm guessing that this was put in to save a lot of crashing from unicode objects, but what do I know?

Actually, when I think about it, Exceptions being raised when you call str() on something are probably pretty rare -- if you define a class with no __str__ method, you get a default string version -- there can't be many use-cases where you want to make sure no one tries to make a string out of your object...

> although it would be nice if a newline was used so the prompt was bumped
> to the next line.

yup -- that would be good.

> The point is, I'm not convinced that this is a bug at all.

OK -- to clarify the discussion a bit:

I think we all agree that this is not a fatal bug that MUST be fixed.

Is this something that could be improved or is the current behavior the best we could have, given the limitations of strings and unicode in py2 anyway?

If it's not a desirable change, then we're done -- sorry for the noise.

If it is a desirable change, then is the benefit worth the possible breakage of code? To assess that, you need to trade off the size of the benefit with the amount of breakage. I think it would be a pretty nice benefit, and I can't see that it would cause a lot of breakage.

Any idea how we could assess how much code or tests are out there in the world that this would affect? I contend that it wouldn't be much because:

If I had thought to write a test for this, I would have thought to fix my code so that it would either never use a unicode object for a message, or, like I have done in my code, encode it when passing it in to the Exception.

There is certainly a chance that some doctests would break, if people had not looked carefully at them -- i.e. that wanted to test that the exception was raised, but did not notice that the message didn't get through. How many are there? who knows?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Python-Dev] Add transform() and untranform() methods
On 11/14/2013 5:32 PM, Victor Stinner wrote:
> I don't like the functions codecs.encode() and codecs.decode() because
> the type of the result depends on the encoding (second parameter). We
> try to avoid this in Python.

Such dependence is common with arithmetic.

>>> 1 + 2
3
>>> 1 + 2.0
3.0
>>> 1 + 2+0j
(3+0j)

>>> sum((1,2,3), 0)
6
>>> sum((1,2,3), 0.0)
6.0
>>> sum((1,2,3), 0.0+0j)
(6+0j)

for f in (compile, eval, getattr, iter, max, min, next, open, pow, round, type, vars):
    type(f(*args)) # depends on the inputs

That is a large fraction of the non-class builtin functions.

--
Terry Jan Reedy
Re: [Python-Dev] Add transform() and untranform() methods
On 11/14/2013 6:03 PM, Nick Coghlan wrote:
> You have to get it out of your head that codecs are just about text and
> binary data.

99+% of the current codec module doc leads one to that impression. The fact that codecs are expected to have a file reader and writer and that the default 'strict' error handler is specified in 2 out of the 3 mostly redundant lists as raising a UnicodeError reinforces the impression.

> They're not: they're arbitrary type transforms, and MAL
> deliberately wrote the module that way.

Generic functions are quite pythonic. However, I am not sure how much benefit there is to registering an arbitrary pair of bijective functions

> This is completely the wrong approach. There's zero justification for
> adding new builtin methods for this use case - encoding and decoding are
> generic operations, they should use functions not methods.

Making 2&3 code easier is certainly a good reason for the codecs approach.

> The next planned commit (to restore the binary codec aliases) *is* a
> behavioural change - that's why I posted to the list about it (it
> received only two responses, both +1)

If I understand correctly, I am mildly +1, but did not respond, thinking that 2 to 0 was sufficient response for you to continue ;-).

--
Terry Jan Reedy
Re: [Python-Dev] unicode Exception messages in py2.7
On 11/14/2013 6:57 PM, Chris Barker wrote:
> On Thu, Nov 14, 2013 at 1:20 PM, Victor Stinner wrote:
>> Seriously, *all* these tricky bugs are fixed in Python 3. So don't
>> lose time on trying to workaround them, but invest in the future:
>> upgrade to Python 3!
>
> Maybe so -- but we are either maintaining 2.7 or not

That statement is too 'binary'. We normally fix general bugs* for two years and security bugs for 3 more years. That is already 'trinary'. For 2.7, we have already done 3 1/2 years of general bug fixing. I expect that that will taper off for the next 1 1/2 years.

* We sometimes do not backport a bug fix that theoretically could be backported because we think it would be too disruptive (because people depend on the bug). When we fix a bug with a feature change that cannot be backported, we do not usually create a separate backport patch unless the bug is severe. In either case, people who want the fix must upgrade.

Many unicode bugs in 2.x were fixed in 3.0 by making unicode the text type. For some but not all unicode issues, separate patches have been made for 2.7. People who want the general fix must upgrade. (The unicode future import gives some of the benefits, but maybe not all.)

A few more unicode bugs were fixed in 3.3 with the flexible string representation. People who want the 3.3 fix must upgrade, even from 3.2.

> -- it WILL be around for a long time yet...

1.5 was around for a long time; not sure if it is completely gone yet.

--
Terry Jan Reedy
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 09:09:06PM -0500, Terry Reedy wrote: > 1.5 was around for a long time; not sure if it is completely gone yet. It's not. I forget the details, but after the last American PyCon, somebody posted a message about a fellow they met who was still using 1.5 in production. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode Exception messages in py2.7
On Thu, Nov 14, 2013 at 04:02:17PM -0800, Chris Barker wrote: > On Thu, Nov 14, 2013 at 1:55 PM, Tres Seaver wrote: > > > Fixing any bug is "changing behavior"; 2.7 is not frozen for bugfixes. > > Thank you. > > > The real question is whether third-party code will break when the > > now-empty error messages appear with '?' littered through them? > > right -- any bugfix changes behaviour It isn't clear that this is a bug at all. Non-ascii Unicode strings are just a special case of the more general problem of what to do if printing the exception raises. If str(exception.message) raises, suppressing the message seems like a perfectly reasonable approach to me. -- Steven ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode Exception messages in py2.7
On 11/14/2013 7:41 PM, Chris Barker wrote:
> On Thu, Nov 14, 2013 at 3:58 PM, Steven D'Aprano wrote:
>> It's not a given that the current behaviour *is* a bug.
>
> I'll concede that it's not a bug unless someone said somewhere that
> unicode messages should work

In particular, what does the reference manual say?

> .. but that's kind of a semantic argument.

Given that committing a patch to an existing version is a binary action -- done or not, we have to have a binary semantic decision, 'bug' or not, even when the best answer is 'sort of'. We cannot 'sort of' apply a patch ;-).

> I have to say it's a very odd choice to me that it suppresses the
> message, rather than raising an encoding error, like what happens
> everywhere else the default encoding is used.

An encoding exception is raised but ignored. Exception handling has changed in some details in 3.x. Sometimes two sensible actions interact in certain contexts to produce an odd result.

> In fact, I noticed that the message can be anything that can be
> stringified, which makes it particularly wacky that you can't use a
> unicode object.

You can, as long as it can be stringified with the default args. If it cannot be, then convert it yourself, with the alternative you choose (raise or substitute).

> Is this something that could be improved or is the current behavior
> the best we could have, given the limitations of strings and unicode in
> py2 anyway?

From our (core developer) viewpoint that is the wrong question. 2.7 does not get enhancements. The situation would be different if there were going to be a 2.8.

--
Terry Jan Reedy
Re: [Python-Dev] unicode Exception messages in py2.7
On 14Nov2013 15:57, Chris Barker - NOAA Federal wrote: > (amazing to me how many people are still using <=2.7, actually, even > for new projects .. thank you Red Hat "Enterprise" Linux ;-) ) Well, one of the things RHEL gets you is platform stability (they backport fixes; primarily security in the older RHEL streams). So of course the Python dates to the time of the release. I install a current Python 2.7 into /usr/local on many RHEL boxes and target that for custom code. -- Cameron Simpson There is this special biologist word we use for 'stable'. It is 'dead'. - Jack Cohen ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unicode Exception messages in py2.7
On 15Nov2013 14:08, Steven D'Aprano wrote: > On Thu, Nov 14, 2013 at 04:02:17PM -0800, Chris Barker wrote: > > right -- any bugfix changes behaviour > > It isn't clear that this is a bug at all. > > Non-ascii Unicode strings are just a special case of the more general > problem of what to do if printing the exception raises. If > str(exception.message) raises, suppressing the message seems like a > perfectly reasonable approach to me. Not to me. Silent failure is really nasty. In fact, doesn't the Zen speak explicitly against it? I'm debugging a program right now with silent failures; my own code, with functions submitted to a queue for asynchronous execution, and the queue preserves the function result (or exception) for collection later; if that collection doesn't happen you get... silent failure! I think that if an exception escapes to the outside for reporting, if the reporting raises an exception (especially an "expectable" one like unicode coding/decoding errors), the reporting should have at least a layer of "ouch, report failed, try something uglier but more conservative". At least you'd know there had been a failure. Cheers, -- Cameron Simpson Windows is really user friendly - it doesn't crash on its own, it first opens a dialog box, saying it will crash and you have to click OK :-) - Zoltan Kocsi ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
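The "try something uglier but more conservative" fallback layer suggested above might look something like the following. It is a minimal sketch under stated assumptions: the helper name `describe_exception` and the wording of the fallback text are hypothetical, and it is written for Python 3 rather than the 2.7 interpreter under discussion. It does not reproduce how CPython actually reports uncaught exceptions.

```python
class NoString:
    """Stand-in for a message object whose __str__ always fails."""

    def __str__(self):
        raise ValueError("cannot stringify")


def describe_exception(exc):
    """Format an exception, falling back to a marker if str() itself raises."""
    name = type(exc).__name__
    try:
        message = str(exc)
    except Exception as report_error:
        # Conservative fallback: say that reporting failed, and how.
        return "%s: <message unavailable: %s while formatting>" % (
            name, type(report_error).__name__)
    return "%s: %s" % (name, message) if message else name


normal = describe_exception(ValueError("bad input"))
fallback = describe_exception(TypeError(NoString()))
```

With this layer the reader at least learns that the message existed and could not be rendered, instead of seeing a bare exception name.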
Re: [Python-Dev] unicode Exception messages in py2.7
On Fri, Nov 15, 2013 at 02:28:48PM +1100, Cameron Simpson wrote:

> > Non-ascii Unicode strings are just a special case of the more general
> > problem of what to do if printing the exception raises. If
> > str(exception.message) raises, suppressing the message seems like a
> > perfectly reasonable approach to me.
>
> Not to me. Silent failure is really nasty. In fact, doesn't the Zen
> speak explicitly against it?

But it's not really a silent failure, since you're already dealing with an exception, and that's the important one. The original exception is not suppressed, just the error message.

If the original exception was replaced with a different exception:

# this doesn't actually happen
py> raise ValueError(u"¿what?")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: error displaying exception message
py>

or lost altogether:

# neither does this
py> raise ValueError(u"¿what?")
py>

then I would consider that a bug. Ideally, we should get a chained exception so you can see both the original and subsequent exceptions:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in position 0: ordinal not in range(128)

but Python 2 doesn't have chained exceptions so that's not an option.

As for the Zen, the nice thing about that is that it can argue both sides of most questions :-) The Zen has something else to say about this:

Special cases aren't special enough to break the rules.

Except as the next line in the Zen suggests, sometimes they are :-)

UnicodeEncoding errors are just a special case of arbitrary objects that can't be converted to byte strings. If the exception message can't be stringified, in general there's really nothing you can do about it.

I suppose one might argue for inserting a meta-error message:

ValueError: ***the error message could not be displayed***

but that strikes me as too subtle, potentially confusing, and generally problematic.

Ultimately, in the absence of chained exceptions I don't think there's any good solution to the general problem, and I'm not convinced that treating Unicode strings as a special case is justified. It's been at least four, and possibly six (back to 2.2) point releases with this behaviour, and until now apparently nobody has noticed.

--
Steven
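For comparison, the chained-exception outcome described above as the ideal can be expressed directly in Python 3 with `raise ... from`. This is a hedged sketch only: `render_message` and `NoString` are hypothetical names, and the code does not show what any interpreter does on its own when displaying a traceback.

```python
class NoString:
    """Message object that cannot be converted to a string."""

    def __str__(self):
        raise ValueError("cannot stringify")


def render_message(exc):
    """Return str(exc), chaining any failure instead of hiding it."""
    try:
        return str(exc)
    except Exception as err:
        # Python 3 only: keep the stringification failure attached as
        # __cause__ so both errors are visible in the traceback.
        raise RuntimeError("error displaying exception message") from err


try:
    render_message(TypeError(NoString()))
except RuntimeError as outer:
    cause = outer.__cause__
    summary = "%s caused by %s" % (type(outer).__name__, type(cause).__name__)
```

Here neither failure is lost: the outer error says the message could not be displayed, and the inner one says why.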
Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.2013 at 00:42, Serhiy Storchaka wrote:
>
> 15.11.13 00:32, Victor Stinner wrote:
>> And add transform() and untransform() methods to bytes and str types.
>> In practice, it might be same codecs registry for all codecs just with
>> a new attribute.
>
> If the transform() method is added, I prefer to have only one
> transformation method and to specify the direction by the transformation
> name ("bzip2"/"unbzip2").

+1

Some of the transformations might not be revertible (s.transform("lower")? ;))

And the transform function probably doesn't need any error handling machinery.

What about the stream/iterator/incremental parts of the codec API?

Servus,
   Walter
Re: [Python-Dev] Add transform() and untranform() methods
On 15 November 2013 11:10, Terry Reedy wrote:
> On 11/14/2013 5:32 PM, Victor Stinner wrote:
>
>> I don't like the functions codecs.encode() and codecs.decode() because
>> the type of the result depends on the encoding (second parameter). We
>> try to avoid this in Python.
>
> Such dependence is common with arithmetic.
>
> >>> 1 + 2
> 3
> >>> 1 + 2.0
> 3.0
> >>> 1 + 2+0j
> (3+0j)
>
> >>> sum((1,2,3), 0)
> 6
> >>> sum((1,2,3), 0.0)
> 6.0
> >>> sum((1,2,3), 0.0+0j)
> (6+0j)
>
> for f in (compile, eval, getattr, iter, max, min, next, open, pow, round,
> type, vars):
>     type(f(*args)) # depends on the inputs
>
> That is a large fraction of the non-class builtin functions.

*Type* dependence between inputs and outputs is common (and completely non-controversial). The codecs system is different, since the supported input and output types are *value* dependent, driven by the name of the codec.

That's the part which makes the codec machinery interesting in general, since it combines a value driven lazy loading mechanism (based on the codec name) with the subsequent invocation of that mechanism: the default codec search algorithm goes hunting in the "encodings" package (or the alias dictionary), but you can register custom search algorithms and provide encodings any way you want. It does mean, however, that the most you can claim for the type signature of codecs.encode and codecs.decode is that they accept an object and return an object. Beyond that, it's completely driven by the value of the codec.

In Python 2.x, the type constraints imposed by the str and unicode convenience methods is "basestring in, basestring out". As it happens, all of the standard library codecs abide by that restriction, so it was easy to interpret the codecs module itself as having the same "basestring in, basestring out" limitation, especially given the heavy focus on text encodings in the way it was documented.
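The remark about registering custom search algorithms can be illustrated with a small sketch. Everything named here is made up for the example: the codec name `x_reverse` and the helper functions are hypothetical, and the "codec" simply reverses a string. The point is only that `codecs.register()` accepts a search function which returns a `codecs.CodecInfo`, and that the input and output types are whatever the looked-up codec decides.

```python
import codecs


def _reverse(text, errors="strict"):
    # Codec functions return (output, length_of_input_consumed).
    return text[::-1], len(text)


def _search(name):
    """Hypothetical search function: knows exactly one str->str codec."""
    if name == "x_reverse":
        return codecs.CodecInfo(encode=_reverse, decode=_reverse,
                                name="x_reverse")
    return None  # let the other registered search functions try


codecs.register(_search)

encoded = codecs.encode("abc", "x_reverse")
decoded = codecs.decode(encoded, "x_reverse")
```

Nothing in the signature of `codecs.encode()` says the result is a `str` here; that follows purely from the value `"x_reverse"`.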
In practice, the codecs weren't that open ended - some of them only accepted 8 bit strings, some only accepted unicode, some accepted both (perhaps relying on implicit decoding to unicode).

The migration to Python 3 made the contrast between the two far more stark however, hence the long and involved discussion on issue 7475, and the fact that the non-Unicode codecs are currently still missing their shorthand aliases.

The proposal I posted to issue 7475 back in April (and, in the absence of any objections to the proposal, finally implemented over the past few weeks) was to take advantage of the fact that the codecs.encode and codecs.decode convenience functions exist (and have been covered by the regression test suite) as far back as Python 2.4. I did this merely by documenting the existence of the functions for Python 2.7, 3.3 and 3.4, changing the exception messages thrown for codec output type errors on the convenience methods to reference them, and by updating the Python 3.4 What's New document to explain the changes.

This approach provides a Python 2/3 compatible solution for usage of non-Unicode encodings: users simply need to call the existing module level functions in the codecs module, rather than using the methods on specific builtin types. This approach also means that the binary codecs can be used with any bytes-like object (including memoryview and array.array), rather than being limited to types that implement a new method (like "transform"), and can also be used in Python 2/3 source compatible APIs (since the data driven nature of the problem makes 2to3 unusable as a solution, and that doesn't help single code base projects anyway).

From my point of view, this is now just a matter of better documenting the status quo, and nudging people in the right direction when it comes to using the appropriate API for non-Unicode codecs.
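As a rough illustration of the module-level approach described above, the following sketch uses `codecs.encode()` and `codecs.decode()` with two of the non-Unicode codecs. It assumes Python 3 and uses the long codec names `zlib_codec` and `rot_13`; the sample data is invented for the example.

```python
import codecs

payload = b"spam and eggs " * 50

# bytes -> bytes: compress and decompress through the codec registry.
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# The same call accepts other bytes-like objects, e.g. a memoryview.
from_view = codecs.decode(memoryview(packed), "zlib_codec")

# str -> str: a text transform that bytes.decode()/str.encode() cannot express.
scrambled = codecs.encode("hello", "rot_13")
restored = codecs.decode(scrambled, "rot_13")
```

The calling code never has to know which builtin type would have needed a new method; the codec name alone decides what goes in and what comes out.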
Since we now realise these functions have existed since Python 2.4, it
doesn't make sense to try to fundamentally change direction; instead, we
should work on making the existing approach better.

A few things I noticed while implementing the recent updates:

- as you noted in your other email, while MAL is on record as saying the
  codecs module is intended for arbitrary codecs, not just Unicode
  encodings, readers of the current docs can definitely be forgiven for
  not realising that. We really need to better separate the codecs
  module docs from the text model docs (two new sections in the language
  reference, one for the codecs machinery and one for the text model,
  would likely be appropriate. The io module docs and those for the
  builtin open function may also be affected)
- a mechanism for annotating frames would help avoid the need for nasty
  hacks like the exception wrapping that aims to make codec failures
  easier to debug
- if codecs exposed a way to separate the input type check from the
  invocation of the codec, we could redirect users to the module API for
  bad input types as well (e.g. calling "input str".encode("bz2"))
- if we want something that doesn't need to be imported, then encode()
  and decode() builtins ma
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
Nick Coghlan, 13.11.2013 17:25:
> Note that the specific problem with just annotating the exception
> rather than a specific frame is that you lose the stack context for
> where the annotation occurred. The current chaining workaround doesn't
> just change the exception message, it also breaks the stack into two
> pieces (inside and outside the codec) that get displayed separately.

I find this specific chain of exceptions a bit excessive, though:

"""
Failed example:
    str(result)
Expected:
    Traceback (most recent call last):
      ...
    LookupError: unknown encoding: UCS4
Got:
    LookupError: unknown encoding: UCS4

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File ".../py3km/python/lib/python3.4/doctest.py", line 1291, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        str(result)
      File "xslt.pxi", line 727, in lxml.etree._XSLTResultTree.__str__ (src/lxml/lxml.etree.c:143584)
      File "xslt.pxi", line 750, in lxml.etree._XSLTResultTree.__unicode__ (src/lxml/lxml.etree.c:143853)
    LookupError: decoding with 'UCS4' codec failed (LookupError: unknown encoding: UCS4)
"""

I can't see any bit of information being added by chaining the
exceptions in this specific case.

Remember that each change to exception messages and/or exception
chaining will break someone's doctests somewhere, and it's really ugly
to work around chained exceptions in (cross-Py-version) doctests.

I understand that this is helpful *in general*, though, i.e. for other
kinds of exceptions in codecs, so maybe changing the exception handling
in the doctest module could be a work-around for this kind of change?

Stefan
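
For readers who want to see the doctest side of this in isolation, here
is a small sketch, assuming Python 3. The function lookup_wrapped is
hypothetical and only imitates the shape of the wrapped codec failure
shown above; it is not the actual codecs implementation. The sketch uses
doctest's existing IGNORE_EXCEPTION_DETAIL option, which compares only
the exception type, as one possible (if not very pretty) way to keep
such a test passing when the message changes.

```python
import doctest


def lookup_wrapped(name):
    """Raise a chained LookupError, imitating a wrapped codec failure.

    >>> lookup_wrapped("UCS4")  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
        ...
    LookupError: unknown encoding: UCS4
    """
    try:
        raise LookupError("unknown encoding: " + name)
    except LookupError as exc:
        # Hypothetical wrapper message, modelled on the output above.
        message = "decoding with %r codec failed (LookupError: %s)" % (
            name,
            exc,
        )
        raise LookupError(message) from exc


# Run the docstring example directly, without relying on module lookup.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    lookup_wrapped.__doc__,
    {"lookup_wrapped": lookup_wrapped},
    "lookup_wrapped",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Without the option, the example would fail because the final message no
longer matches the expected text. This does nothing about the extra
chained traceback in the failure report, which is the part questioned
above.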