Re: [Python-Dev] peps: PEP 456: add some of the new implementation details to the PEP's text

2013-11-14 Thread Antoine Pitrou
On Wed, 13 Nov 2013 23:33:02 +0100 (CET)
christian.heimes  wrote:
>  
>  
> +Small string optimization
> +=
> +
> +Hash functions like SipHash24 have a costly initialization and finalization
> +code that can dominate speed of the algorithm for very short strings. On the
> +other hand Python calculates the hash value of short strings quite often. A
> +simple and fast function for especially for hashing of small strings can make
> +a measurably impact on performance. For example these measurements were taken
> +during a run of Python's regression tests. Additional measurements of other
> +code have shown a similar distribution.

Well, the text above talks about a "measurably (typo?) impact on
performance", but you aren't giving any performance numbers, which
doesn't help the reader of those lines.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522

2013-11-14 Thread Nick Coghlan
On 14 Nov 2013 13:52,  wrote:
>
> results for 784a02ec2a26 on branch "default"
> 
>
> test_codeccallbacks leaked [40, 40, 40] references, sum=120
> test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120
> test_codecs leaked [38, 38, 38] references, sum=114
> test_codecs leaked [24, 24, 24] memory blocks, sum=72
> test_email leaked [16, 16, 16] references, sum=48
> test_email leaked [16, 16, 16] memory blocks, sum=48

Hmm, it appears I have a reference leak somewhere.

Cheers,
Nick.

>
>
> Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
> '3:3:/home/antoine/cpython/refleaks/reflogx2QIb_', '-x']
> ___
> Python-checkins mailing list
> python-check...@python.org
> https://mail.python.org/mailman/listinfo/python-checkins


Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522

2013-11-14 Thread Nick Coghlan
On 14 Nov 2013 21:58, "Nick Coghlan"  wrote:
>
>
> On 14 Nov 2013 13:52,  wrote:
> >
> > results for 784a02ec2a26 on branch "default"
> > 
> >
> > test_codeccallbacks leaked [40, 40, 40] references, sum=120
> > test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120
> > test_codecs leaked [38, 38, 38] references, sum=114
> > test_codecs leaked [24, 24, 24] memory blocks, sum=72
> > test_email leaked [16, 16, 16] references, sum=48
> > test_email leaked [16, 16, 16] memory blocks, sum=48
>
> Hmm, it appears I have a reference leak somewhere.

Ah, Benjamin fixed it already. Thanks! :)

Cheers,
Nick.

>
> Cheers,
> Nick.
>
> >
> >
> > Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
> > '3:3:/home/antoine/cpython/refleaks/reflogx2QIb_', '-x']


Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors

2013-11-14 Thread Walter Dörwald

On 13.11.13 17:25, Nick Coghlan wrote:


On 14 November 2013 02:12, Nick Coghlan  wrote:

On 14 November 2013 00:30, Walter Dörwald  wrote:

On 13.11.13 14:51, nick.coghlan wrote:


http://hg.python.org/cpython/rev/854a2cea31b9
changeset:   87084:854a2cea31b9
user:Nick Coghlan 
date:Wed Nov 13 23:49:21 2013 +1000
summary:
Close #17828: better handling of codec errors

- output type errors now redirect users to the type-neutral
convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
will now be automatically wrapped in exceptions that give
the name of the codec involved



Wouldn't it be better to add an annotation API to the exception classes?
This would allow annotating all exceptions without having to replace the
exception object.


Hmm, it might be better to have the traceback machinery print the 
annotation information instead of BaseException.__str__, so we don't get 
any compatibility issues with custom __str__ implementations.



There's a reason the C API for this is private - it's a band aid fix,
because solving it properly is hard :)


Note that the specific problem with just annotating the exception
rather than a specific frame is that you lose the stack context for
where the annotation occurred. The current chaining workaround doesn't
just change the exception message, it also breaks the stack into two
pieces (inside and outside the codec) that get displayed separately.
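For readers unfamiliar with exception chaining, here is a minimal sketch of how wrapping splits a failure into an inner and an outer exception. It is not the private C helper discussed above: CodecFailure and decode_with_context are hypothetical names made up for this illustration, and the real change wraps the error differently.

```python
import codecs


class CodecFailure(Exception):
    """Hypothetical wrapper type, used only for this illustration."""


def decode_with_context(data, codec_name):
    """Decode data, naming the codec in a chained exception on failure."""
    try:
        return codecs.decode(data, codec_name)
    except Exception as exc:
        message = "decoding with %r codec failed" % codec_name
        # "from exc" stores the original error as __cause__, so the
        # traceback is displayed in two parts: the failure inside the
        # codec first, then the wrapper raised here.
        raise CodecFailure(message) from exc


try:
    decode_with_context(b"zz", "hex_codec")  # b"zz" is not valid hex
except CodecFailure as err:
    print("outer:", err)
    print("inner:", repr(err.__cause__))
```

The outer exception carries the codec name, while the inner one keeps the original message and stack.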

Mostly though, it boils down to the fact that I'm far more comfortable
changing codec exception stack trace details in some cases than I am
proposing a new API for all exceptions this close to the Python 3.4
feature freeze.


Sure, this is something that might go into 3.5, but not 3.4.


A more elegant (and comprehensive) solution as a PEP for 3.5 would
certainly be a nice thing to have, but I think this is still much
better than the 3.3 status quo.


Thinking further about this, I like your "frame annotation" suggestion

Tracebacks could then look like this:

>>> b"hello".decode("uu_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>: decoding with 'uu_codec' codec failed
ValueError: Missing "begin" line in input data

In fact the traceback already lays out the chain of events. What is 
missing is simply a little additional information.


Could frame annotation be added via decorators, i.e. something like this:

@annotate("while doing something with {param}")
def func(param):
   do something

annotate() would catch the exception, call .format() on the annotation 
string with the local variables of the frame as keyword arguments, 
attach the result to a special attribute of the frame and reraise the 
exception.


The traceback machinery would simply have to print this additional 
attribute.
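A rough sketch of such a decorator, under two loudly stated assumptions: frame objects do not accept new attributes from plain Python code, so this version stores the note on the exception object instead (in a made-up attribute named "annotations"), and it formats the template with the bound call arguments rather than with all local variables of the frame. It only approximates the idea and is not the proposed patch.

```python
import functools
import inspect


def annotate(template):
    """Hypothetical decorator: attach a formatted note to escaping exceptions."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Bind the call arguments to parameter names up front; this
            # stands in for "the local variables of the frame".
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                note = template.format(**bound.arguments)
                # Assumption: the note lives on the exception, not the frame.
                notes = getattr(exc, "annotations", [])
                notes.append(note)
                exc.annotations = notes
                raise  # re-raise the same exception object
        return wrapped
    return decorator


@annotate("while doing something with {param}")
def func(param):
    raise ValueError("boom")


try:
    func("spam")
except ValueError as exc:
    print(exc.annotations)
```

A traceback printer would still have to be taught to display the stored notes; that part is not shown here.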


Servus,
   Walter


Servus,
   Walter



Re: [Python-Dev] [Python-checkins] Daily reference leaks (784a02ec2a26): sum=522

2013-11-14 Thread Benjamin Peterson
2013/11/14 Antoine Pitrou :
> On Thu, 14 Nov 2013 22:01:32 +1000
> Nick Coghlan  wrote:
>> On 14 Nov 2013 21:58, "Nick Coghlan"  wrote:
>> >
>> >
>> > On 14 Nov 2013 13:52,  wrote:
>> > >
>> > > results for 784a02ec2a26 on branch "default"
>> > > 
>> > >
>> > > test_codeccallbacks leaked [40, 40, 40] references, sum=120
>> > > test_codeccallbacks leaked [40, 40, 40] memory blocks, sum=120
>> > > test_codecs leaked [38, 38, 38] references, sum=114
>> > > test_codecs leaked [24, 24, 24] memory blocks, sum=72
>> > > test_email leaked [16, 16, 16] references, sum=48
>> > > test_email leaked [16, 16, 16] memory blocks, sum=48
>> >
>> > Hmm, it appears I have a reference leak somewhere.
>>
>> Ah, Benjamin fixed it already. Thanks! :)
>
> The reference leak task has been running for quite some time on my
> personal machine and I believe it has proven useful. I have no problem
> continuing running it on the same machine (which is mostly sitting idle
> anyway), but maybe it should rather be hosted on our CI infrastructure?
> Any suggestions?

Thank you very much for running that, btw. I'm sure we would have
released a lot of horribly leaking stuff without it.

>
> (the script is quite rough with hardcoded stuff, but beating it into
> better shape could be a nice target for first-time contributors)

Perhaps someone can figure out how to run it on one of the buildbots?



-- 
Regards,
Benjamin


Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors

2013-11-14 Thread Walter Dörwald

On 14.11.13 14:22, Walter Dörwald wrote:


On 13.11.13 17:25, Nick Coghlan wrote:


>> [...]

A more elegant (and comprehensive) solution as a PEP for 3.5 would
certainly be a nice thing to have, but I think this is still much
better than the 3.3 status quo.


Thinking further about this, I like your "frame annotation" suggestion

Tracebacks could then look like this:

>>> b"hello".decode("uu_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>: decoding with 'uu_codec' codec failed
ValueError: Missing "begin" line in input data

In fact the traceback already lays out the chain of events. What is
missing is simply a little additional information.

Could frame annotation be added via decorators, i.e. something like this:

@annotate("while doing something with {param}")
def func(param):
    do something

annotate() would catch the exception, call .format() on the annotation
string with the local variables of the frame as keyword arguments,
attach the result to a special attribute of the frame and reraise the
exception.

The traceback machinery would simply have to print this additional
attribute.


http://bugs.python.org/19585 is a patch that implements that. With the 
patch the following code:


   import traceback

   @traceback.annotate("while handling x={x!r}")
   def handle(x):
       raise ValueError(42)

   handle("spam")

will give the traceback:

   Traceback (most recent call last):
     File "spam.py", line 8, in <module>
       handle("spam")
     File "frame-annotation/Lib/traceback.py", line 322, in wrapped
       f(*args, **kwargs)
     File "spam.py", line 5, in handle: while handling x='spam'
       raise ValueError(42)
   ValueError: 42

Unfortunately the frame from the decorator shows up in the traceback.

Servus,
   Walter



[Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Chris Barker
Folks,

(note this is about 2.7 -- sorry, but a lot of us still use that! I
can only assume that in 3.* this is a non-issue)

I just discovered an issue that's been around a long time:

If you create an Exception with a unicode object for the message, the
message can be silently ignored if it can not be encoded to ASCII (or,
more properly, the default encoding).

In my use-case, I was parsing a text file (utf-8), and wanted a bit of
that text to be part of the Exception message (an error reading the
file, I wanted the user to know what the text was surrounding the
ill-formatted part of the text file).

What I got was a blank message, and it took a lot of poking at it to
figure out why.

My solution was:

msg = u"Problem with line %i: %s This is not a valid time slot" % (linenum, line)
raise ValueError(msg.encode('ascii', 'ignore'))

which is really pretty painfully clunky.

This is an issue brought up in various tutorial and blog posts, and
all the solutions I've seen involve some similar clunkiness.

I also found this issue in the issue tracker:

http://bugs.python.org/issue2517

Which was resolved years ago, but as far as I can tell, only solved
the problem of being able to do:

unicode(an_exception)

and get the proper unicode message object. But we still can't raise
the darn thing and expect the user to see the message.

Why is this the case? I can print a unicode object to the terminal,
why can't raising an Exception print a unicode object?

I can imagine for backward compatibility, or maybe for non-unicode
terminals, or ??? Exceptions do need to print as ascii. However,
having a message simply get swallowed up and disappear seems like the
wrong solution.

 - auto-conversion to a default encoding is fraught with problems all
over the board -- I know that. I also know that too much code would
break too often if we didn't have auto-conversion.

 - for the most part, the auto-conversion uses 'strict' mode -- I
generally dislike this, as it means code crashes when  odd stuff gets
introduced after testing, but I can see why it is done.

 - However, I can see why for raising Exceptions, the decision was
made to swallow that error, so that the actual Exception intended is
raised, rather than a new UnicodeEncodeError.

 - But combining 'strict' with ignoring the encoding exception seems
like the worst of both worlds.

So a proposal:

Use 'replace' mode for the encoding to the default, and at least the
user would see SOMETHING of the message. In a common case, it would be
a lot of ascii, and in the worst case it would be a lot of question
marks -- still better than a totally blank message.

Another option would be to use the str(repr(the_message)) so the user
would get the escaped version. Though I think that would be more ugly.
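To make the options concrete, here is a small sketch of what each error handler does to a message containing one non-ASCII character. The sample message is made up, and 'backslashreplace' is used only as a stand-in for the escaped repr() form; it is not what str(repr(...)) returns exactly.

```python
# Hypothetical message with one non-ASCII character (U+00E9).
message = u"Problem with line 3: caf\u00e9 is not a valid time slot"

# The workaround from the post: the character is silently dropped.
ignored = message.encode("ascii", "ignore")

# The proposal: a '?' marks the spot where a character could not be encoded.
replaced = message.encode("ascii", "replace")

# Approximation of the repr() option: the code point survives as an escape.
escaped = message.encode("ascii", "backslashreplace")

print(ignored)
print(replaced)
print(escaped)
```

With 'replace' the reader at least sees that something was there; with the escaped form the original character can still be recovered.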

What am I missing? This seems so obvious, and easy to do (though maybe
it's buried in the C implementation of Exceptions)

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] The pysandbox project is broken

2013-11-14 Thread Eli Bendersky
On Wed, Nov 13, 2013 at 10:27 AM, Brett Cannon  wrote:

>
>
>
> On Wed, Nov 13, 2013 at 1:05 PM, Eli Bendersky  wrote:
>
>>
>>
>>
>> On Wed, Nov 13, 2013 at 6:58 AM, Brett Cannon  wrote:
>>
>>>
>>>
>>>
>>> On Wed, Nov 13, 2013 at 6:30 AM, Facundo Batista <
>>> facundobati...@gmail.com> wrote:
>>>
>>>> On Wed, Nov 13, 2013 at 4:37 AM, Maciej Fijalkowski
>>>> wrote:
>>>>
>>>> >> Do you think it would be productive to create an independent Python
>>>> >> compiler, designed with sandboxing in mind from the beginning?
>>>> >
>>>> > PyPy sandbox does work FYI
>>>> >
>>>> > It might not do exactly what you want, but it both provides a full
>>>> > python and security.
>>>>
>>>> If we have sandboxing using PyPy... what else do we need to get Python
>>>> running in the browser? (like javascript, you know)
>>>>
>>>> Thanks!

>>>
>>> You can try to get PNaCl to work with Python to get a Python executable
>>> that at least Chrome can run.
>>>
>>
>> Two corrections:
>>
>> 1. CPython already works with NaCl and PNaCl (there are working patches
>> in naclports to build it)
>>
>
> Anything that should be upstreamed?
>
>
>> 2. It can be used outside Chrome as well, using the standalone "sel_ldr"
>> tool that will then allow to run a sandboxed CPython .nexe from the command
>> line
>>
>
> Sure, but I was just thinking about the "in browser" question Facundo
> asked about.
>

FWIW, if you already have Chrome 31, go to:

http://commondatastorage.googleapis.com/nativeclient-mirror/naclports/pepper_33/988/publish/python/pnacl/index.html

This is CPython running on top of PNaCl, at near-native speed. With C
extensions. With threads. It's 2.7.5 but we'll put up 3.4 too soon (anyone
can do it though - based on naclports).

The first load takes a bit of time, afterwards it's cached and
instantaneous.

Now all that's left is for someone to come up with a friendly API to wrap
around the Pepper interface to conveniently access DOM :-)

Eli


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Benjamin Peterson
2013/11/14 Chris Barker :
> So a proposal:
>
> Use 'replace' mode for the encoding to the default, and at least the
> user would see SOMETHING of the message. In a common case, it would be
> a lot of ascii, and in the worst case it would be a lot of question
> marks -- still better than a totally blank message.
>
> Another option would be to use the str(repr(the_message)) so the user
> would get the escaped version. Though I think that would be more ugly.

Unfortunately both of these things change behavior so cannot be
changed in Python 2.7.

>
> What am I missing? This seems so obvious, and easy to do (though maybe
> it's buried in the C implementation of Exceptions)


-- 
Regards,
Benjamin


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Victor Stinner
2013/11/14 Chris Barker :
> (note this is about 2.7 -- sorry, but a lot of us still use that! I
> can only assume that in 3.* this is a non-issue)
>
> I just discovered an issue that's been around a long time:
>
> If you create an Exception with a unicode object for the message, (...)

In Python 2, there are too many similar corner cases. It is impossible
to fix these bugs without taking the risk of introducing a regression.

Seriously, *all* these tricky bugs are fixed in Python 3. So don't
lose time trying to work around them, but invest in the future:
upgrade to Python 3!

Victor


Re: [Python-Dev] The pysandbox project is broken

2013-11-14 Thread Armin Rigo
Hi Victor,

On Wed, Nov 13, 2013 at 12:58 AM, Victor Stinner
 wrote:
> I now gave up on sandboxing Python. I just would like to warn other
> core developers that trying to put a sandbox in Python is not a good
> idea :-)

I cannot thank you enough for writing this mail :-)  It is a great
place to point people to when they come along with some superficial
idea about sandboxing Python.


A bientôt,

Armin.


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Tres Seaver
On 11/14/2013 04:02 PM, Benjamin Peterson wrote:
> 2013/11/14 Chris Barker :
>> So a proposal:
>> 
>> Use 'replace' mode for the encoding to the default, and at least
>> the user would see SOMETHING of the message. In a common case, it
>> would be a lot of ascii, and in the worst case it would be a lot of
>> question marks -- still better than a totally blank message.
>> 
>> Another option would be to use the str(repr(the_message)) so the
>> user would get the escaped version. Though I think that would be
>> more ugly.
> 
> Unfortunately both of these things change behavior so cannot be 
> changed in Python 2.7.

Fixing any bug is "changing behavior";  2.7 is not frozen for bugfixes.
The real question is whether third-party code will break when the
now-empty error messages appear with '?' littered through them?

About the only things I can think of which might break would be doctests,
but people *expect* those to break across third-dot releases of Python
(one reason why I hate them).  Exception repr is explicitly *not* part of
any backward-compatibility guarantees in Python.  Or code which
explicitly works around the breakage could fail (urlparse changes between
2.7.3 and 2.7.4, anyone?)


Tres.
-- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   "Excellence by Design"http://palladion.com


[Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Victor Stinner
Hi,

I saw that Nick Coghlan documented codecs.encode() and
codecs.decode(), and changed the exception raised when codecs like
rot_13 are used on bytes.decode() and str.encode().

I don't like the functions codecs.encode() and codecs.decode() because
the type of the result depends on the encoding (second parameter). We
try to avoid this in Python.
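A short illustration of that point, written for Python 3 with made-up sample inputs: the same function returns bytes or str depending on which codec name is passed.

```python
import codecs

# A text encoding: str in, bytes out.
text_to_bytes = codecs.encode("hello", "utf-8")

# A binary transform: bytes in, bytes out.
bytes_to_bytes = codecs.encode(b"hello", "zlib_codec")

# A text transform: str in, str out.
text_to_text = codecs.encode("hello", "rot_13")

for result in (text_to_bytes, bytes_to_bytes, text_to_text):
    print(type(result).__name__)
```

The result type cannot be known from the call site alone; it is determined by the codec named in the second argument.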

I would prefer to split the registry of codecs to have 3 registries:

- "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
- bytes: encode bytes=>bytes, decode bytes=>bytes
- str:  encode str=>str, decode str=>str

And add transform() and untransform() methods to bytes and str types.
In practice, it might be same codecs registry for all codecs just with
a new attribute.

Examples:

- utf8: encoding
- zlib: bytes
- rot13: str

The result type of bytes.transform/untransform would be bytes, and the
result type of str.transform/untransform would be str.

I don't know which exception should be raised when a codec is used in
the wrong method. LookupError? TypeError "codec xxx cannot be used
with method xxx.xx"? Something else?

codecs.encode/decode() documentation should be removed. The functions
should be kept, just in case someone uses them.

Victor


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Victor Stinner
Oh, I forgot to mention that I sent this email in reaction to this issue:

http://bugs.python.org/issue19585

Modifying the critical PyFrameObject because the codecs API raises
surprising errors doesn't sound correct. I would rather fix how codecs
are used than modify PyFrameObject.

For more information, see issue #7475, which has a long history (4
years) and many messages. Martin von Loewis wrote "I would still be
opposed to such a change, and I think it needs a PEP." and I still
agree with him on this point. Because there are differing opinions and
no consensus, a PEP is required to explain why we took this decision
and to list the rejected alternatives.

http://bugs.python.org/issue7475

Victor

2013/11/14 Victor Stinner :
> Hi,
>
> I saw that Nick Coghlan documented codecs.encode() and
> codecs.decode(), and changed the exception raised when codecs like
> rot_13 are used on bytes.decode() and str.encode().
>
> I don't like the functions codecs.encode() and codecs.decode() because
> the type of the result depends on the encoding (second parameter). We
> try to avoid this in Python.
>
> I would prefer to split the registry of codecs to have 3 registries:
>
> - "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
> - bytes: encode bytes=>bytes, decode bytes=>bytes
> - str:  encode str=>str, decode str=>str
>
> And add transform() and untransform() methods to bytes and str types.
> In practice, it might be same codecs registry for all codecs just with
> a new attribute.
>
> Examples:
>
> - utf8: encoding
> - zlib: bytes
> - rot13: str
>
> The result type of bytes.transform/untransform would be bytes, and the
> result type of str.transform/untransform would be str.
>
> I don't know which exception should be raised when a codec is used in
> the wrong method. LookupError? TypeError "codec xxx cannot be used
> with method xxx.xx"? Something else?
>
> codecs.encode/decode() documentation should be removed. The functions
> should be kept, just in case if someone uses them.
>
> Victor


Re: [Python-Dev] peps: PEP 456: add some of the new implementation details to the PEP's text

2013-11-14 Thread Terry Reedy

On 11/14/2013 4:00 AM, Antoine Pitrou wrote:

On Wed, 13 Nov 2013 23:33:02 +0100 (CET)
christian.heimes  wrote:



+Small string optimization
+=
+
+Hash functions like SipHash24 have a costly initialization and finalization
+code that can dominate speed of the algorithm for very short strings. On the
+other hand Python calculates the hash value of short strings quite often. A
+simple and fast function for especially for hashing of small strings can make


'for especially for hashing' is garbled. Delete first 'for'.


+a measurably impact on performance. For example these measurements were taken


'measurable'


+during a run of Python's regression tests. Additional measurements of other
+code have shown a similar distribution.


--
Terry Jan Reedy



Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Terry Reedy

On 11/14/2013 4:55 PM, Tres Seaver wrote:


> About the only things I can think of which might break would be doctests,
> but people *expect* those to break across third-dot releases of Python
> (one reason why I hate them).

My impression is that we avoid enhancing correct exception messages in
bugfix (third-dot) releases because of both doctests and other in-code
examination of messages.

> Exception repr is explicitly *not* part of
> any backward-compatibility guarantees in Python.


So we more freely change exception messages in version (second-dot) 
releases, without deprecation notices or waiting periods.


--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors

2013-11-14 Thread Greg Ewing

Walter Dörwald wrote:

> Unfortunately the frame from the decorator shows up in the traceback.


Maybe the decorator could remove its own frame from
the traceback?

--
Greg


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Nick Coghlan
On 15 Nov 2013 08:34, "Victor Stinner"  wrote:
>
> Hi,
>
> I saw that Nick Coghlan documented codecs.encode() and
> codecs.decode(), and changed the exception raised when codecs like
> rot_13 are used on bytes.decode() and str.encode().
>
> I don't like the functions codecs.encode() and codecs.decode() because
> the type of the result depends on the encoding (second parameter). We
> try to avoid this in Python.

The type signature of those functions is just object -> object (Similar to
the way the 2.x convenience methods were actually basestring -> basestring).

> I would prefer to split the registry of codecs to have 3 registries:
>
> - "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
> - bytes: encode bytes=>bytes, decode bytes=>bytes
> - str:  encode str=>str, decode str=>str
>

You have to get it out of your head that codecs are just about text and
binary data. They're not: they're arbitrary type transforms, and MAL
deliberately wrote the module that way.

> And add transform() and untransform() methods to bytes and str types.
> In practice, it might be same codecs registry for all codecs just with
> a new attribute.

This is completely the wrong approach. There's zero justification for
adding new builtin methods for this use case - encoding and decoding are
generic operations, they should use functions not methods.

What could be useful is allowing CodecInfo objects to supply an "expected
input type" and an "expected output type" (ABCs and instance check
overrides make that quite flexible).

>
> Examples:
>
> - utf8: encoding
> - zlib: bytes
> - rot13: str
>
> The result type of bytes.transform/untransform would be bytes, and the
> result type of str.transform/untransform would be str.
>
> I don't know which exception should be raised when a codec is used in
> the wrong method. LookupError? TypeError "codec xxx cannot be used
> with method xxx.xx"? Something else?

We already do this check in the existing convenience methods - it raises
TypeError.

>
> codecs.encode/decode() documentation should be removed. The functions
> should be kept, just in case if someone uses them.

No. They're part of the regression test suite, and have been since Python
2.4. They embody MAL's intended "arbitrary type transform library"
approach. They provide a source compatible mechanism for using binary
codecs in single code base Python 2/3 projects.

At this point, the only person that can get me to revert this clarification
of MAL's original vision for the codecs module is Guido, since anything
else completely fails to address the Python 3 adoption barrier posed by the
current state of Python 3's binary codec support.

Note that the only behavioural changes in the commits so far were to
exception handling - everything else was just docs.

The next planned commit (to restore the binary codec aliases) *is* a
behavioural change - that's why I posted to the list about it (it received
only two responses, both +1)

Cheers,
Nick.

>
> Victor


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Nick Coghlan
On 15 Nov 2013 08:42, "Victor Stinner"  wrote:
>
> Oh, I forgot to mention that I sent this email in reaction to this issue:
>
> http://bugs.python.org/issue19585
>
> Modifying the critical PyFrameObject because the codecs API raises
> surprising errors doesn't sound correct. I prefer to fix how codecs
> are used, than modifying the PyFrameObject.
>
> For more information, see the issue #7475 which a long history (4
> years) and many messages. Martin von Loewis wrote "I would still be
> opposed to such a change, and I think it needs a PEP." and I still
> agree with him on this point. Because they are different opinions and
> no consensus, a PEP is required to explain why we took this decision
> and list rejected alternatives.
>
> http://bugs.python.org/issue7475

Martin wrote that before it was pointed out there were existing functions
to handle the problem (I was asking for a PEP back then, too).

I posted my plan for dealing with this months ago without receiving any
complaints, and I'm annoyed you waited until I had actually followed
through and implemented it to complain about it and ask for Python 3's
binary codec support to stay broken instead :P

(Starting a new thread instead of replying to the one where I specifically
asked about taking the next step does nothing to improve my mood)

Regards,
Nick.

>
> Victor
>
> 2013/11/14 Victor Stinner :
> > Hi,
> >
> > I saw that Nick Coghlan documented codecs.encode() and
> > codecs.decode(), and changed the exception raised when codecs like
> > rot_13 are used on bytes.decode() and str.encode().
> >
> > I don't like the functions codecs.encode() and codecs.decode() because
> > the type of the result depends on the encoding (second parameter). We
> > try to avoid this in Python.
> >
> > I would prefer to split the registry of codecs to have 3 registries:
> >
> > - "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
> > - bytes: encode bytes=>bytes, decode bytes=>bytes
> > - str:  encode str=>str, decode str=>str
> >
> > And add transform() and untransform() methods to bytes and str types.
> > In practice, it might be the same codecs registry for all codecs just with
> > a new attribute.
> >
> > Examples:
> >
> > - utf8: encoding
> > - zlib: bytes
> > - rot13: str
> >
> > The result type of bytes.transform/untransform would be bytes, and the
> > result type of str.transform/untransform would be str.
> >
> > I don't know which exception should be raised when a codec is used in
> > the wrong method. LookupError? TypeError "codec xxx cannot be used
> > with method xxx.xx"? Something else?
> >
> > codecs.encode/decode() documentation should be removed. The functions
> > should be kept, just in case someone uses them.
> >
> > Victor


[Python-Dev] "*zip-bomb" via codecs

2013-11-14 Thread Serhiy Storchaka
It is possible to mount a DoS attack using the fact that the codecs registry 
provides access to the gzip and bzip2 decompressors. Someone can send an HTTP 
request or email message with "gzip_codec" or "bzip2_codec" specified as the 
content encoding and a large, highly compressed gzip or bzip2 file as the 
content. A naive server will use the bytes.decode() method to decode the 
content. It is possible to create small compressed files which require a 
great deal of time and memory to decompress. Of course bytes.decode() will 
fail because the decoder returns bytes instead of a string, but the time and 
memory are already wasted.


I have no working example, but I'm sure it will be easy to create one. I 
suspect many services will be vulnerable to this attack.


A simple solution to this problem is to check that any foreign encoding is 
contained in a special set of safe encodings. But every program would have 
to check this explicitly. For a more general solution, bytes.decode() 
should reject the encoding *before* decoding starts. I.e. either all 
bytes->str decoders should be registered in a separate registry, or all 
codecs should have additional attributes which specify the input and 
output types.
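
The "set of safe encodings" check could look roughly like the sketch below. 
It assumes Python 3; the names SAFE_TEXT_ENCODINGS and safe_decode() are 
hypothetical, and the allow-list shown is only an illustration. The codec 
name is resolved with codecs.lookup() and rejected before any input is 
decoded.

```python
import codecs

# Hypothetical allow-list. The entries are the canonical names reported by
# codecs.lookup(), so aliases such as "UTF8" or "latin-1" resolve to them.
SAFE_TEXT_ENCODINGS = frozenset({"ascii", "utf-8", "iso8859-1"})


def safe_decode(data, encoding):
    """Decode bytes only if the named codec is on the allow-list."""
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        raise ValueError("unknown encoding: %r" % (encoding,))
    if canonical not in SAFE_TEXT_ENCODINGS:
        # Rejected before a single byte of input has been processed.
        raise ValueError("refusing to decode with %r" % (encoding,))
    return data.decode(canonical)
```

The drawback is the one mentioned above: every program has to remember to 
call such a helper instead of calling bytes.decode() directly.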




Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Ethan Furman

On 11/14/2013 02:59 PM, Terry Reedy wrote:
> On 11/14/2013 4:55 PM, Tres Seaver wrote:
>> About the only things I can think of which might break would be doctests,
>> but people *expect* those to break across third-dot releases of Python
>> (one reason why I hate them).
>
> My impression is that we avoid enhancing correct exception messages in bugfix
> (third-dot) releases because of both doctests and other in-code examination
> of messages.

But these exception messages are incorrect, and so we are okay to fix them, yes?

--
~Ethan~


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Serhiy Storchaka

15.11.13 01:03, Nick Coghlan wrote:
> We already do this check in the existing convenience methods - it raises
> TypeError.

The problem with this check is that it happens *after* 
encoding/decoding. This opens the door to DoS (see my last message).





Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Nick Coghlan
On 15 Nov 2013 09:11, "Nick Coghlan"  wrote:
>
>
> On 15 Nov 2013 08:42, "Victor Stinner"  wrote:
> >
> > Oh, I forgot to mention that I sent this email in reaction to this issue:
> >
> > http://bugs.python.org/issue19585
> >
> > Modifying the critical PyFrameObject because the codecs API raises
> > surprising errors doesn't sound correct. I prefer to fix how codecs
> > are used, rather than modifying the PyFrameObject.
> >
> > For more information, see issue #7475, which has a long history (4
> > years) and many messages. Martin von Loewis wrote "I would still be
> > opposed to such a change, and I think it needs a PEP." and I still
> > agree with him on this point. Because there are different opinions and
> > no consensus, a PEP is required to explain why we took this decision
> > and list rejected alternatives.
> >
> > http://bugs.python.org/issue7475
>
> Martin wrote that before it was pointed out there were existing functions
> to handle the problem (I was asking for a PEP back then, too).
>
> I posted my plan for dealing with this months ago without receiving any
> complaints, and I'm annoyed you waited until I had actually followed
> through and implemented it to complain about it and ask for Python 3's
> binary codec support to stay broken instead :P

Something I *would* be entirely happy to do is write a retroactive PEP
after beta 1 is out the door, explaining the history of this issue in a
more coherent form than the comment history on issue 7475 and the many
child issues it spawned.

This would also provide a better launching point for other enhancements in
Python 3.5 (frame annotations to remove the need for the exception chaining
hack and better input validation mechanisms for codecs that allow the
convenience methods to check that case explicitly rather than relying on
the exception chaining).

Cheers,
Nick.

>
> (Starting a new thread instead of replying to the one where I specifically
> asked about taking the next step does nothing to improve my mood)
>
> Regards,
> Nick.
>
> >
> > Victor
> >
> > 2013/11/14 Victor Stinner :
> > > Hi,
> > >
> > > I saw that Nick Coghlan documented codecs.encode() and
> > > codecs.decode(), and changed the exception raised when codecs like
> > > rot_13 are used on bytes.decode() and str.encode().
> > >
> > > I don't like the functions codecs.encode() and codecs.decode() because
> > > the type of the result depends on the encoding (second parameter). We
> > > try to avoid this in Python.
> > >
> > > I would prefer to split the registry of codecs to have 3 registries:
> > >
> > > - "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
> > > - bytes: encode bytes=>bytes, decode bytes=>bytes
> > > - str:  encode str=>str, decode str=>str
> > >
> > > And add transform() and untransform() methods to bytes and str types.
> > > In practice, it might be the same codecs registry for all codecs just with
> > > a new attribute.
> > >
> > > Examples:
> > >
> > > - utf8: encoding
> > > - zlib: bytes
> > > - rot13: str
> > >
> > > The result type of bytes.transform/untransform would be bytes, and the
> > > result type of str.transform/untransform would be str.
> > >
> > > I don't know which exception should be raised when a codec is used in
> > > the wrong method. LookupError? TypeError "codec xxx cannot be used
> > > with method xxx.xx"? Something else?
> > >
> > > codecs.encode/decode() documentation should be removed. The functions
> > > should be kept, just in case someone uses them.
> > >
> > > Victor


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Serhiy Storchaka

15.11.13 00:32, Victor Stinner wrote:
> And add transform() and untransform() methods to bytes and str types.
> In practice, it might be the same codecs registry for all codecs just with
> a new attribute.

If a transform() method is added, I would prefer to have only one 
transformation method and specify the direction by the transformation name 
("bzip2"/"unbzip2").
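
A minimal sketch of that idea, written as a plain function rather than a 
method so that it runs on Python 3 as it is today. The BYTES_TRANSFORMS 
table and the transform() helper are hypothetical, as are the "zlib" and 
"unzlib" names; only "bzip2"/"unbzip2" come from the suggestion above.

```python
import bz2
import zlib

# Hypothetical registry: each name selects one direction of one
# bytes -> bytes transformation, so no untransform() counterpart is needed.
BYTES_TRANSFORMS = {
    "bzip2": bz2.compress,
    "unbzip2": bz2.decompress,
    "zlib": zlib.compress,
    "unzlib": zlib.decompress,
}


def transform(data, name):
    """Apply the named bytes -> bytes transformation to data."""
    if not isinstance(data, bytes):
        # The input type is checked before any work is done.
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    try:
        func = BYTES_TRANSFORMS[name]
    except KeyError:
        raise LookupError("unknown transformation: %r" % (name,))
    return func(data)
```

With this layout, transform(transform(b"spam", "bzip2"), "unbzip2") gives 
back the original bytes.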




Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Steven D'Aprano
On Thu, Nov 14, 2013 at 04:55:19PM -0500, Tres Seaver wrote:

> Fixing any bug is "changing behavior";  2.7 is not frozen for bugfixes.

It's not a given that the current behaviour *is* a bug. Exception 
messages in 2 are byte-strings, not Unicode. Trying to use Unicode 
instead is not, as far as I can tell, supported behaviour.

If the exception message cannot be converted to a byte-string, 
suppressing the display of the message seems like perfectly reasonable 
behaviour to me:

py> class NoString:
...     def __str__(self):
...         raise ValueError
...
py> msg = NoString
py> msg = NoString()
py> print msg
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in __str__
ValueError
py> raise TypeError(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeErrorpy>

although it would be nice if a newline was used so the prompt was bumped 
to the next line.

The point is, I'm not convinced that this is a bug at all.


> The real question is whether third-party code will break when the
> now-empty error messages appear with '?' littered through them?

This behaviour goes back to at least Python 2.4, the oldest version I 
have easy access to at the moment that includes Unicode. Given that this 
alleged bug has been around for so long, I don't think that it affects 
terribly many people. That implies that fixing it won't benefit many 
people either.


> About the only things I can think of which might break would be doctests,
> but people *expect* those to break across third-dot releases of Python

Which people? I certainly don't expect doctests to break unless I've 
done something silly.


> (one reason why I hate them).  Exception repr is explicitly *not* part of
> any backward-compatibility guarantees in Python.

Do you have a link for that explicit non-guarantee from the docs please?



-- 
Steven


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Chris Barker
On Thu, Nov 14, 2013 at 1:55 PM, Tres Seaver  wrote:

> Fixing any bug is "changing behavior";  2.7 is not frozen for bugfixes.

Thank you.

> The real question is whether third-party code will break when the
> now-empty error messages appear with '?' littered through them?

right -- any bugfix changes behaviour, and any bugfix can break a test
or code that is expecting (or working around) that behavior. So the
key question here is whether there are many (any?) tests or functional
code out there that are counting on an empty message if and only if
there happens to be a non-ascii character in an assigned message.

It's hard for me to imagine that that's a common thing to test for,
but then I've been known to lack imagination ;-)

> About the only things I can think of which might break would be doctests,
> but people *expect* those to break across third-dot releases of Python
> (one reason why I hate them).  Exception repr is explicitly *not* part of
> any backward-compatibility guarantees in Python.  Or code which
> explicitly works around the breakage could fail (urlparse changes between
> 2.7.3 and 2.7.4, anyone?)

Sounds do-able to me, then...

-Thanks,
  -Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Chris Barker
On Thu, Nov 14, 2013 at 1:20 PM, Victor Stinner

>> If you create an Exception with a unicode object for the message, (...)
>
> In Python 2, there are too many similar corner cases. It is impossible
> to fix these bugs without taking the risk of introducing a regression.

Yes, there are -- the auto-encoding is a serious pain everywhere.

However, this is a case where the resulting Exception is silenced --
it's the only one I know of, and there can't be many like that.

> Seriously, *all* these tricky bugs are fixed in Python 3. So don't
> lose time on trying to work around them, but invest in the future:
> upgrade to Python 3!

Maybe so -- but we are either maintaining 2.7 or not -- it WILL be
around for a long time yet...

(amazing to me how many people are still using <=2.7, actually, even
for new projects .. thank you Red Hat "Enterprise" Linux ;-) )

-Chris
-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Chris Barker
On Thu, Nov 14, 2013 at 3:58 PM, Steven D'Aprano  wrote:

> It's not a given that the current behaviour *is* a bug.

I'll concede that it's not a bug unless someone said somewhere that
unicode messages should work .. but that's kind of a semantic
argument.

I have to say it's a very odd choice to me that it suppresses the
message, rather than raising an encoding error, like what happens
everywhere else the default encoding is used.

In fact, I noticed that the message can be anything that can be
stringified, which makes it particularly wacky that you can't use a
unicode object.

> Exception
> messages in 2 are byte-strings, not Unicode.

well, they are anything that you can call str() on anyway...

> Trying to use Unicode
> instead is not, as far as I can tell, supported behaviour.

clearly not

> If the exception message cannot be converted to a byte-string,
> suppressing the display of the message seems like perfectly reasonable
> behaviour to me:

well, yes and no -- the fact is that unicode objects ARE special --
and it wouldn't hurt to treat them that way. And I'm not sure that
suppressing the message when you've passed in a weird object that
raises an exception when you try to convert it to a string makes sense
either -- suppressing an exception is really not a good idea in
general -- you really should have a good reason for it. I'm guessing
that this was put in to save a lot of crashing from unicode objects,
but what do I know?

Actually, when I think about it, Exceptions being raised when you call
str() on something are probably pretty rare -- if you define a class
with no __str__ method, you get a default string version -- there
can't be many use-cases where you want to make sure no one tries to
make a string out of your object...

> although it would be nice if a newline was used so the prompt was bumped
> to the next line.

yup -- that would be good.

> The point is, I'm not convinced that this is a bug at all.

OK -- to clarify the discussion a bit:

I think we all agree that this is not a fatal bug that MUST be fixed.

Is this something that could be improved, or is the current behavior
the best we could have, given the limitations of strings and unicode in
py2 anyway?

If it's not a desirable change, then we're done -- sorry for the noise.

If it is a desirable change, then is the benefit worth the possible
breakage of code? To assess that, you need to weigh the size of
the benefit against the amount of breakage.

I think it would be a pretty nice benefit.

I can't see that it would cause a lot of breakage.

Any idea how we could assess how much code or how many tests are out
there in the world that this would affect?

I contend that it wouldn't be much because:

If I had thought to write a test for this, I would have thought to fix
my code so that it would either never use a unicode object for a
message, or, like I have done in my code, encode it when passing it in
to the Exception.
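
For illustration, the "encode it when passing it in" workaround might look 
like the sketch below. The helper name exception_text() and the choice of 
UTF-8 are assumptions, not something taken from my actual code; on Python 3 
the helper simply returns the message unchanged.

```python
import sys

PY2 = sys.version_info[0] == 2


def exception_text(message, encoding="utf-8"):
    """Return a message that is safe to hand to an exception constructor."""
    # The name `unicode` only exists on Python 2; the PY2 check
    # short-circuits, so it is never evaluated on Python 3.
    if PY2 and isinstance(message, unicode):  # noqa: F821
        return message.encode(encoding, "replace")
    return message
```

Usage would then be along the lines of 
raise ValueError(exception_text(some_unicode_message)).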

There is certainly a chance that some doctests would break, if people
had not looked carefully at them -- i.e. tests that were meant to check
that the exception was raised, where nobody noticed that the message
didn't get through.

How many are there? Who knows?

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Terry Reedy

On 11/14/2013 5:32 PM, Victor Stinner wrote:

> I don't like the functions codecs.encode() and codecs.decode() because
> the type of the result depends on the encoding (second parameter). We
> try to avoid this in Python.


Such dependence is common with arithmetic.

>>> 1 + 2
3
>>> 1 + 2.0
3.0
>>> 1 + 2+0j
(3+0j)

>>> sum((1,2,3), 0)
6
>>> sum((1,2,3), 0.0)
6.0
>>> sum((1,2,3), 0.0+0j)
(6+0j)

for f in (compile, eval, getattr, iter, max, min, next, open, pow,
          round, type, vars):
    type(f(*args)) # depends on the inputs

That is a large fraction of the non-class builtin functions.

--
Terry Jan Reedy



Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Terry Reedy

On 11/14/2013 6:03 PM, Nick Coghlan wrote:


> You have to get it out of your head that codecs are just about text and
> binary data.


99+% of the current codec module doc leads one to that impression. The 
fact that codecs are expected to have a file reader and writer and that 
the default 'strict' error handler is specified in 2 out of the 3 mostly 
redundant lists as raising a UnicodeError reinforces the impression.



> They're not: they're arbitrary type transforms, and MAL
> deliberately wrote the module that way.


Generic functions are quite pythonic. However, I am not sure how much 
benefit there is to registering an arbitrary pair of bijective functions.



> This is completely the wrong approach. There's zero justification for
> adding new builtin methods for this use case - encoding and decoding are
> generic operations, they should use functions not methods.


Making 2&3 code easier is certainly a good reason for the codecs approach.


> The next planned commit (to restore the binary codec aliases) *is* a
> behavioural change - that's why I posted to the list about it (it
> received only two responses, both +1)


If I understand correctly, I am mildly +1, but did not respond, thinking 
that 2 to 0 was sufficient response for you to continue ;-).


--
Terry Jan Reedy



Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Terry Reedy

On 11/14/2013 6:57 PM, Chris Barker wrote:
> On Thu, Nov 14, 2013 at 1:20 PM, Victor Stinner
>> Seriously, *all* these tricky bugs are fixed in Python 3. So don't
>> lose time on trying to work around them, but invest in the future:
>> upgrade to Python 3!
>
> Maybe so -- but we are either maintaining 2.7 or not


That statement is too 'binary'. We normally fix general bugs* for two 
years and security bugs for 3 more years. That is already 'trinary'. For 
2.7, we have already done 3 1/2 years of general bug fixing. I expect 
that that will taper off for the next 1 1/2 years.


* We sometimes do not backport a bug fix that theoretically could be 
backported because we think it would be too disruptive (because people 
depend on the bug). When we fix a bug with a feature change that cannot 
be backported, we do not usually create a separate backport patch unless 
the bug is severe. In either case, people who want the fix must upgrade.


Many unicode bugs in 2.x were fixed in 3.0 by making unicode the text 
type. For some but not all unicode issues, separate patches have been 
made for 2.7. People who want the general fix must upgrade. (The unicode 
future import gives some of the benefits, but maybe not all.)  A few 
more unicode bugs were fixed in 3.3 with the flexible string 
representation. People who want the 3.3 fix must upgrade, even from 3.2.



> -- it WILL be around for a long time yet...


1.5 was around for a long time; not sure if it is completely gone yet.

--
Terry Jan Reedy



Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Steven D'Aprano
On Thu, Nov 14, 2013 at 09:09:06PM -0500, Terry Reedy wrote:

> 1.5 was around for a long time; not sure if it is completely gone yet.

It's not. I forget the details, but after the last American PyCon, 
somebody posted a message about a fellow they met who was still using 
1.5 in production.


-- 
Steven


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Steven D'Aprano
On Thu, Nov 14, 2013 at 04:02:17PM -0800, Chris Barker wrote:
> On Thu, Nov 14, 2013 at 1:55 PM, Tres Seaver  wrote:
> 
> > Fixing any bug is "changing behavior";  2.7 is not frozen for bugfixes.
> 
> Thank you.
> 
> > The real question is whether third-party code will break when the
> > now-empty error messages appear with '?' littered through them?
> 
> right -- any bugfix changes behaviour

It isn't clear that this is a bug at all.

Non-ascii Unicode strings are just a special case of the more general 
problem of what to do if printing the exception raises. If 
str(exception.message) raises, suppressing the message seems like a 
perfectly reasonable approach to me.



-- 
Steven
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Terry Reedy

On 11/14/2013 7:41 PM, Chris Barker wrote:
> On Thu, Nov 14, 2013 at 3:58 PM, Steven D'Aprano  wrote:
>> It's not a given that the current behaviour *is* a bug.
>
> I'll concede that it's not a bug unless someone said somewhere that
> unicode messages should work


In particular, what does the reference manual say?


> .. but that's kind of a semantic argument.


Given that committing a patch to an existing version is a binary action 
-- done or not, we have to have a binary semantic decision, 'bug' or 
not, even when the best answer is 'sort of'. We cannot 'sort of' apply a 
patch ;-).



> I have to say it's a very odd choice to me that it suppresses the
> message, rather than raising an encoding error, like what happens
> everywhere else the default encoding is used.


An encoding exception is raised but ignored. Exception handling has 
changed in some details in 3.x. Sometimes two sensible actions interact 
in certain contexts to produce an odd result.



> In fact, I noticed that the message can be anything that can be
> stringified, which makes it particularly wacky that you can't use a
> unicode object.


You can, as long as it can be stringified with the default args. If it 
cannot be, then convert it yourself, with the alternative you choose 
(raise or substitute).



> Is this something that could be improved, or is the current behavior
> the best we could have, given the limitations of strings and unicode in
> py2 anyway?


From our (core developer) viewpoint, that is the wrong question. 2.7 
does not get enhancements. The situation would be different if there 
were going to be a 2.8.


--
Terry Jan Reedy



Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Cameron Simpson
On 14Nov2013 15:57, Chris Barker - NOAA Federal  wrote:
> (amazing to me how many people are still using <=2.7, actually, even
> for new projects .. thank you Red Hat "Enterprise" Linux ;-) )

Well, one of the things RHEL gets you is platform stability (they
backport fixes; primarily security in the older RHEL streams). So
of course the Python dates to the time of the release.

I install a current Python 2.7 into /usr/local on many RHEL boxes
and target that for custom code.
-- 
Cameron Simpson 

There is this special biologist word we use for 'stable'. It is 'dead'.
- Jack Cohen


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Cameron Simpson
On 15Nov2013 14:08, Steven D'Aprano  wrote:
> On Thu, Nov 14, 2013 at 04:02:17PM -0800, Chris Barker wrote:
> > right -- any bugfix changes behaviour
> 
> It isn't clear that this is a bug at all.
> 
> Non-ascii Unicode strings are just a special case of the more general 
> problem of what to do if printing the exception raises. If 
> str(exception.message) raises, suppressing the message seems like a 
> perfectly reasonable approach to me.

Not to me. Silent failure is really nasty. In fact, doesn't the Zen
speak explicitly against it?

I'm debugging a program right now with silent failures; my own code,
with functions submitted to a queue for asynchronous execution, and
the queue preserves the function result (or exception) for collection
later; if that collection doesn't happen you get... silent failure!

I think that if an exception escapes to the outside for reporting,
if the reporting raises an exception (especially an "expectable"
one like unicode coding/decoding errors), the reporting should have
at least a layer of "ouch, report failed, try something uglier but
more conservative". At least you'd know there had been a failure.
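
As a rough sketch of that layered fallback (describe_exception() is a 
made-up name, and this is not how the interpreter actually does its 
reporting):

```python
def describe_exception(exc):
    """Format an exception for reporting, degrading instead of going silent."""
    name = type(exc).__name__
    try:
        text = str(exc)
    except Exception:
        # The report failed: try something uglier but more conservative.
        try:
            text = "<message unprintable; args=%r>" % (exc.args,)
        except Exception:
            text = "<message unprintable>"
    if text:
        return "%s: %s" % (name, text)
    return name
```

The last-resort marker at least tells you that there was a message and 
that displaying it failed.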

Cheers,
-- 
Cameron Simpson 

Windows is really user friendly - it doesn't crash on its own, it first
opens a dialog box, saying it will crash and you have to click OK :-)
- Zoltan Kocsi


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-14 Thread Steven D'Aprano
On Fri, Nov 15, 2013 at 02:28:48PM +1100, Cameron Simpson wrote:

> > Non-ascii Unicode strings are just a special case of the more general 
> > problem of what to do if printing the exception raises. If 
> > str(exception.message) raises, suppressing the message seems like a 
> > perfectly reasonable approach to me.
> 
> Not to me. Silent failure is really nasty. In fact, doesn't the Zen
> speak explicitly against it?

But it's not really a silent failure, since you're already dealing with 
an exception, and that's the important one. The original exception is 
not suppressed, just the error message. If the original exception was 
replaced with a different exception:

# this doesn't actually happen
py> raise ValueError(u"¿what?")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: error displaying exception message
py>

or lost altogether:

# neither does this
py> raise ValueError(u"¿what?")
py> 


then I would consider that a bug. Ideally, we should get a chained 
exception so you can see both the original and subsequent exceptions:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbf' in 
position 0: ordinal not in range(128)

but Python 2 doesn't have chained exceptions so that's not an option.

As for the Zen, the nice thing about that is that it can argue both 
sides of most questions :-) The Zen has something else to say about 
this: Special cases aren't special enough to break the rules.

Except as the next line in the Zen suggests, sometimes they are :-)

UnicodeEncoding errors are just a special case of arbitrary objects that 
can't be converted to byte strings. If the exception message can't be 
stringified, in general there's really nothing you can do about it. I 
suppose one might argue for inserting a meta-error message:

ValueError: ***the error message could not be displayed***

but that strikes me as too subtle, potentially confusing, and generally 
problematic. Ultimately, in the absence of chained exceptions I don't 
think there's any good solution to the general problem, and I'm not 
convinced that treating Unicode strings as a special case is justified. 
It's been at least four, and possibly six (back to 2.2) point releases 
with this behaviour, and until now apparently nobody has noticed.


-- 
Steven


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Walter Dörwald
On 15.11.2013 at 00:42, Serhiy Storchaka wrote:
> 
> 15.11.13 00:32, Victor Stinner wrote:
>> And add transform() and untransform() methods to bytes and str types.
>> In practice, it might be same codecs registry for all codecs just with
>> a new attribute.
> 
> If the transform() method will be added, I prefer to have only one 
> transformation method and specify a direction by the transformation name 
> ("bzip2"/"unbzip2").

+1

Some of the transformations might not be reversible (s.transform("lower")? ;))

And the transform function probably doesn't need any error handling machinery.

What about the stream/iterator/incremental parts of the codec API?

Servus,
   Walter



Re: [Python-Dev] Add transform() and untranform() methods

2013-11-14 Thread Nick Coghlan
On 15 November 2013 11:10, Terry Reedy  wrote:
> On 11/14/2013 5:32 PM, Victor Stinner wrote:
>
>> I don't like the functions codecs.encode() and codecs.decode() because
>> the type of the result depends on the encoding (second parameter). We
>> try to avoid this in Python.
>
>
> Such dependence is common with arithmetic.
>
> >>> 1 + 2
> 3
> >>> 1 + 2.0
> 3.0
> >>> 1 + 2+0j
> (3+0j)
>
> >>> sum((1,2,3), 0)
> 6
> >>> sum((1,2,3), 0.0)
> 6.0
> >>> sum((1,2,3), 0.0+0j)
> (6+0j)
>
> for f in (compile, eval, getattr, iter, max, min, next, open, pow, round,
> type, vars):
>   type(f(*args)) # depends on the inputs
> That is a large fraction of the non-class builtin functions.

*Type* dependence between inputs and outputs is common (and completely
non-controversial). The codecs system is different, since the
supported input and output types are *value* dependent, driven by the
name of the codec.
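
As a rough illustration of that value dependence (a sketch only, assuming
Python 3.4 or later, where these standard library codecs are reachable
through the module-level functions), the same call changes its effective
type signature purely because the codec name changes:

```python
import codecs

# Same function, three codec names, three different type signatures.
utf8_out = codecs.encode("hello", "utf-8")        # str in, bytes out
hex_out = codecs.encode(b"hello", "hex_codec")    # bytes in, bytes out
rot13_out = codecs.encode("hello", "rot_13")      # str in, str out

print(type(utf8_out).__name__, type(hex_out).__name__, type(rot13_out).__name__)
```

Nothing in the argument types tells you which of these results to expect;
only the value of the second argument does.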

That's the part which makes the codec machinery interesting in
general, since it combines a value driven lazy loading mechanism
(based on the codec name) with the subsequent invocation of that
mechanism: the default codec search algorithm goes hunting in the
"encodings" package (or the alias dictionary), but you can register
custom search algorithms and provide encodings any way you want. It
does mean, however, that the most you can claim for the type signature
of codecs.encode and codecs.decode is that they accept an object and
return an object. Beyond that, it's completely driven by the value of
the codec.
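
A minimal sketch of the registration side, assuming Python 3. The codec
name "reverse_demo" and the helper names are made up for illustration and
are not part of the standard library; only codecs.register, codecs.CodecInfo,
codecs.encode and codecs.decode are real APIs here:

```python
import codecs

def _reverse(data, errors="strict"):
    # A codec function returns (output, amount of input consumed).
    return data[::-1], len(data)

def _search(name):
    # Hypothetical search function: called with the normalised codec name,
    # returns a CodecInfo for names it knows and None for everything else.
    if name == "reverse_demo":
        return codecs.CodecInfo(encode=_reverse, decode=_reverse,
                                name="reverse_demo")
    return None

codecs.register(_search)

# "Object in, object out": the result type follows whatever the codec does.
text_result = codecs.encode("abc", "reverse_demo")
bytes_result = codecs.decode(b"abc", "reverse_demo")
list_result = codecs.encode([1, 2, 3], "reverse_demo")
```

The lookup only happens when the name is first requested, which is the
lazy loading part; the module-level functions then simply hand the object
to whatever the search function returned.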

In Python 2.x, the type constraint imposed by the str and unicode
convenience methods is "basestring in, basestring out". As it happens,
all of the standard library codecs abide by that restriction, so it
was easy to interpret the codecs module itself as having the same
"basestring in, basestring out" limitation, especially given the heavy
focus on text encodings in the way it was documented. In practice, the
codecs weren't that open ended - some of them only accepted 8 bit
strings, some only accepted unicode, some accepted both (perhaps
relying on implicit decoding to unicode).

The migration to Python 3 made the contrast between the two far more
stark however, hence the long and involved discussion on issue 7475,
and the fact that the non-Unicode codecs are currently still missing
their shorthand aliases.

The proposal I posted to issue 7475 back in April (and, in the absence
of any objections to the proposal, finally implemented over the past
few weeks) was to take advantage of the fact that the codecs.encode
and codecs.decode convenience functions have existed (and have been
covered by the regression test suite) as far back as Python 2.4. I did
this merely by documenting the existence of the functions for Python 2.7,
3.3 and 3.4, changing the exception messages thrown for codec output
type errors on the convenience methods to reference them, and by
updating the Python 3.4 What's New document to explain the changes.
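
A small sketch of what that looks like from the user's side, assuming
Python 3.4 or later. The exact exception type and wording have varied
between versions, so the hypothetical helper below catches both
LookupError and TypeError and makes no claim about the message text:

```python
import codecs

def try_method(text, codec_name):
    """Return the exception str.encode raises for this codec, or None."""
    try:
        text.encode(codec_name)
    except (LookupError, TypeError) as exc:
        return exc
    return None

# The convenience method refuses a str-to-str codec...
err = try_method("hello", "rot_13")
print(type(err).__name__, err)

# ...while the module-level function handles it.
via_module = codecs.encode("hello", "rot_13")
```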

This approach provides a Python 2/3 compatible solution for usage of
non-Unicode encodings: users simply need to call the existing module
level functions in the codecs module, rather than using the methods on
specific builtin types. This approach also means that the binary
codecs can be used with any bytes-like object (including memoryview
and array.array), rather than being limited to types that implement a
new method (like "transform"), and can also be used in Python 2/3
source compatible APIs (since the data driven nature of the problem
makes 2to3 unusable as a solution, and that doesn't help single code
base projects anyway).
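
For example (a sketch assuming Python 3 and the standard library
"zlib_codec" and "hex_codec" names; the payload is arbitrary), the module
level functions accept memoryview and array.array inputs directly:

```python
import array
import codecs

payload = b"spam and eggs " * 8
view = memoryview(payload)          # bytes-like, has no encode/decode methods
arr = array.array("B", payload)     # likewise

# Binary codecs only need the buffer, not a particular type.
compressed = codecs.encode(view, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")
hex_from_array = codecs.encode(arr, "hex_codec")
```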

From my point of view, this is now just a matter of better documenting
the status quo, and nudging people in the right direction when it
comes to using the appropriate API for non-Unicode codecs. Since we
now realise these functions have existed since Python 2.4, it doesn't
make sense to try to fundamentally change direction, but instead to
work on making it better.

A few things I noticed while implementing the recent updates:

- as you noted in your other email, while MAL is on record as saying
the codecs module is intended for arbitrary codecs, not just Unicode
encodings, readers of the current docs can definitely be forgiven for
not realising that. We really need to better separate the codecs
module docs from the text model docs (two new sections in the language
reference, one for the codecs machinery and one for the text model
would likely be appropriate. The io module docs and those for the
builtin open function may also be affected)
- a mechanism for annotating frames would help avoid the need for
nasty hacks like the exception wrapping that aims to make codec
failures easier to debug
- if codecs exposed a way to separate the input type check from the
invocation of the codec, we could redirect users to the module API for
bad input types as well (e.g. calling "input str".encode("bz2"))
- if we want something that doesn't need to be imported, then encode()
and decode() builtins ma

Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors

2013-11-14 Thread Stefan Behnel
Nick Coghlan, 13.11.2013 17:25:
> Note that the specific problem with just annotating the exception
> rather than a specific frame is that you lose the stack context for
> where the annotation occurred. The current chaining workaround doesn't
> just change the exception message, it also breaks the stack into two
> pieces (inside and outside the codec) that get displayed separately.

I find this specific chain of exceptions a bit excessive, though:

"""
Failed example:
str(result)
Expected:
Traceback (most recent call last):
  ...
LookupError: unknown encoding: UCS4
Got:
LookupError: unknown encoding: UCS4

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../py3km/python/lib/python3.4/doctest.py", line 1291, in __run
compileflags, 1), test.globs)
  File "", line 1, in 
str(result)
  File "xslt.pxi", line 727, in lxml.etree._XSLTResultTree.__str__
(src/lxml/lxml.etree.c:143584)
  File "xslt.pxi", line 750, in lxml.etree._XSLTResultTree.__unicode__
(src/lxml/lxml.etree.c:143853)
LookupError: decoding with 'UCS4' codec failed (LookupError: unknown
encoding: UCS4)
"""

I can't see any bit of information being added by chaining the exceptions
in this specific case.
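
The shape of that output can be reproduced with a few lines of plain
Python. This is only a sketch: the function names are hypothetical and the
wrapper message is copied from the traceback quoted above, not taken from
the codec machinery itself.

```python
import traceback

def inner_lookup():
    raise LookupError("unknown encoding: UCS4")

def wrapped_decode():
    try:
        inner_lookup()
    except LookupError as exc:
        # Hypothetical wrapper, modelled on the message quoted above.
        raise LookupError(
            "decoding with 'UCS4' codec failed "
            "(LookupError: unknown encoding: UCS4)"
        ) from exc

def render_chain():
    try:
        wrapped_decode()
    except LookupError as exc:
        # Formats both exceptions, joined by the "direct cause" separator.
        return "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )

report = render_chain()
print(report)
```

The original message appears twice in the report, once on its own and once
embedded in the wrapper, which is the redundancy being pointed out.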

Remember that each change to exception messages and/or exception chaining
will break someone's doctests somewhere, and it's really ugly to work
around chained exceptions in (cross-Py-version) doctests.

I understand that this is helpful *in general*, though, i.e. for other
kinds of exceptions in codecs, so maybe changing the exception handling in
the doctest module could be a work-around for this kind of change?
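
Short of changing doctest itself, one existing option that sidesteps
message changes is the IGNORE_EXCEPTION_DETAIL directive, which compares
only the exception type. A sketch, with a hypothetical failing_decode
function standing in for the real call; note this is an existing doctest
feature, not the change suggested above:

```python
import doctest

def failing_decode():
    try:
        raise LookupError("unknown encoding: UCS4")
    except LookupError as exc:
        raise LookupError("decoding with 'UCS4' codec failed") from exc

# The expected message is the old one; the directive makes doctest
# ignore everything after the exception type.
EXAMPLE = """
>>> failing_decode()  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
LookupError: unknown encoding: UCS4
"""

test = doctest.DocTestParser().get_doctest(
    EXAMPLE, {"failing_decode": failing_decode}, "chained_example", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

The cost is that the doctest no longer checks the message at all, which
may or may not be acceptable for a given test suite.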

Stefan

