Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread M.-A. Lemburg
On 15.11.2013 08:13, Nick Coghlan wrote:
 On 15 November 2013 11:10, Terry Reedy tjre...@udel.edu wrote:
 On 11/14/2013 5:32 PM, Victor Stinner wrote:

 I don't like the functions codecs.encode() and codecs.decode() because
 the type of the result depends on the encoding (second parameter). We
 try to avoid this in Python.


 Such dependence is common with arithmetic.

 >>> 1 + 2
 3
 >>> 1 + 2.0
 3.0
 >>> 1 + 2+0j
 (3+0j)

 >>> sum((1,2,3), 0)
 6
 >>> sum((1,2,3), 0.0)
 6.0
 >>> sum((1,2,3), 0.0+0j)
 (6+0j)

 for f in (compile, eval, getattr, iter, max, min, next, open, pow, round,
 type, vars):
   type(f(*args)) # depends on the inputs
 That is a large fraction of the non-class builtin functions.
 
 *Type* dependence between inputs and outputs is common (and completely
 non-controversial). The codecs system is different, since the
 supported input and output types are *value* dependent, driven by the
 name of the codec.
 
 That's the part which makes the codec machinery interesting in
 general, since it combines a value driven lazy loading mechanism
 (based on the codec name) with the subsequent invocation of that
 mechanism: the default codec search algorithm goes hunting in the
 encodings package (or the alias dictionary), but you can register
 custom search algorithms and provide encodings any way you want. It
 does mean, however, that the most you can claim for the type signature
 of codecs.encode and codecs.decode is that they accept an object and
 return an object. Beyond that, it's completely driven by the value of
 the codec.

Indeed. You have to think of the codec registry as a mere
lookup mechanism - very much like an import. The implementation
of the imported module defines which types are supported and
how the encode/decode steps work.
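
A minimal sketch of that lookup step from Python code (rot_13 is used here
purely as a convenient stdlib example):

import codecs

# The registry maps a codec name to a CodecInfo object, much like an
# import maps a module name to a module object; the CodecInfo's own
# encode/decode callables decide which types they accept and return.
info = codecs.lookup("rot_13")
print(info.name)                  # 'rot-13'
encoded, consumed = info.encode("hello")
print(encoded, consumed)          # uryyb 5  (str in, str out for this codec)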

 In Python 2.x, the type constraints imposed by the str and unicode
 convenience methods are basestring in, basestring out. As it happens,
 all of the standard library codecs abide by that restriction, so it
 was easy to interpret the codecs module itself as having the same
 basestring in, basestring out limitation, especially given the heavy
 focus on text encodings in the way it was documented. In practice, the
 codecs weren't that open ended - some of them only accepted 8 bit
 strings, some only accepted unicode, some accepted both (perhaps
 relying on implicit decoding to unicode).
 
 The migration to Python 3 made the contrast between the two far more
 stark however, hence the long and involved discussion on issue 7475,
 and the fact that the non-Unicode codecs are currently still missing
 their shorthand aliases.
 
 The proposal I posted to issue 7475 back in April (and, in the absence
 of any objections to the proposal, finally implemented over the past
 few weeks) was to take advantage of the fact that the codecs.encode
 and codecs.decode convenience functions exist (and have been covered
 by the regression test suite) as far back as Python 2.4. I did this
 merely by documenting the existence of the functions for Python 2.7,
 3.3 and 3.4, changing the exception messages thrown for codec output
 type errors on the convenience methods to reference them, and by
 updating the Python 3.4 What's New document to explain the changes.
 
 This approach provides a Python 2/3 compatible solution for usage of
 non-Unicode encodings: users simply need to call the existing module
 level functions in the codecs module, rather than using the methods on
 specific builtin types. This approach also means that the binary
 codecs can be used with any bytes-like object (including memoryview
 and array.array), rather than being limited to types that implement a
 new method (like transform), and can also be used in Python 2/3
 source compatible APIs (since the data driven nature of the problem
 makes 2to3 unusable as a solution, and that doesn't help single code
 base projects anyway).

Right, and that was the main point in making codecs flexible
in this respect. There are many other types which can serve
as input and output - in the stdlib and interpreter as well as
in extension modules that implement their own types.
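
A small illustration of that point (sample data is mine; the codec names are
the full registered names): the module-level functions accept whatever the
individual codec accepts, including bytes-like objects such as bytearray and
memoryview:

import codecs

print(codecs.encode(b"abc", "hex_codec"))                  # b'616263'
print(codecs.decode(bytearray(b"616263"), "hex_codec"))    # b'abc'
print(codecs.encode(memoryview(b"abc"), "base64_codec"))   # b'YWJj\n'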

 From my point of view, this is now just a matter of better documenting
 the status quo, and nudging people in the right direction when it
 comes to using the appropriate API for non-Unicode codecs. Since we
 now realise these functions have existed since Python 2.4, it doesn't
 make sense to try to fundamentally change direction, but instead to
 work on making it better.
 
 A few things I noticed while implementing the recent updates:
 
 - as you noted in your other email, while MAL is on record as saying
 the codecs module is intended for arbitrary codecs, not just Unicode
 encodings, readers of the current docs can definitely be forgiven for
 not realising that. We really need to better separate the codecs
 module docs from the text model docs (two new sections in the language
 reference, one for the codecs machinery and one for the text model
 would likely be appropriate. The io module docs and those for the
 builtin open function may also be affected)

Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Antoine Pitrou
On Fri, 15 Nov 2013 09:03:37 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 
  And add transform() and untransform() methods to bytes and str types.
  In practice, it might be same codecs registry for all codecs just with
  a new attribute.
 
 This is completely the wrong approach. There's zero justification for
 adding new builtin methods for this use case - encoding and decoding are
 generic operations, they should use functions not methods.

I'm sorry, I disagree. The question is what use case it is solving, and
there's zero benefit in writing codecs.encode("zlib") compared to e.g.
zlib.compress().

A transform() or untransform() method, however, allows for a much more
convenient spelling, with easy cascading, e.g.:

b.transform("zlib").transform("base64")

In other words, there's zero justification for codecs.encode() and
codecs.decode(). The fact that the codecs machinery works on arbitrary
object transformation is a pointless genericity, if it doesn't bring
any additional convenience compared to the canonical functions in their
respective modules.
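
To make the comparison concrete, here is a sketch (with assumed sample data)
of the two spellings being weighed; in Python 3 both should produce the same
bytes, since the binary codecs delegate to the zlib and base64 modules
internally:

import base64
import codecs
import zlib

payload = b"some payload"

# module-level codec functions, driven by codec names
via_codecs = codecs.encode(codecs.encode(payload, "zlib_codec"), "base64_codec")

# the canonical per-module functions
via_modules = base64.encodebytes(zlib.compress(payload))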

 At this point, the only person that can get me to revert this clarification
 of MAL's original vision for the codecs module is Guido, since anything
 else completely fails to address the Python 3 adoption barrier posed by the
 current state of Python 3's binary codec support.

I'd like to challenge your assertion that your change addresses
anything.

It's not easier to change b.encode("zlib") into codecs.encode("zlib",
b), than it is to change it into zlib.compress(b).

Regards,

Antoine.




[Python-Dev] Finding overlapping matches with re assertions: bug or feature?

2013-11-15 Thread Tim Peters
I was surprised to find that this works:  if you want to find all
_overlapping_ matches for a regexp R, wrap it in

 (?=(R))

and feed it to (say) finditer.  Here's a very simple example, finding
all overlapping occurrences of "xx":

pat = re.compile(r"(?=(xx))")
for it in pat.finditer("xxxx"):
    print(it.span(1))

That displays:

(0, 2)
(1, 3)
(2, 4)

Is that a feature?  Or an accident?  It's very surprising to find a
non-empty match inside an empty match (the outermost lookahead
assertion).  If it's intended behavior, it's just in time for the
holiday season; e.g., to generate ASCII art for half an upside-down
Christmas tree:

pat = re.compile(r"(?=(x+))")
for it in pat.finditer("xx"):
    print(it.group(1))
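
For reference, a self-contained version of that snippet; the length of the
input string is an assumption (any run of 'x' characters shows the same
shape):

import re

pat = re.compile(r"(?=(x+))")
for it in pat.finditer("xxxxx"):
    print(it.group(1))
# prints:
# xxxxx
# xxxx
# xxx
# xx
# x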


Re: [Python-Dev] Finding overlapping matches with re assertions: bug or feature?

2013-11-15 Thread Paul Moore
On 15 November 2013 06:48, Tim Peters tim.pet...@gmail.com wrote:
 Is that a feature?  Or an accident?  It's very surprising to find a
 non-empty match inside an empty match (the outermost lookahead
 assertion).

Personally, I would read (?=(R)) as finding an empty match at a point
where R starts. There's no implication that R is in any sense inside
the match.

(?=(\w\w\w\w\w\w))\w\w\w finds the first 3 characters of words that
are 6 or more characters long. Once again, the lookahead extends
beyond the extent of the main match.
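
A quick demonstration of that reading (the sample text is an assumption);
note how the captured lookahead group extends past the three characters
actually consumed:

import re

pat = re.compile(r"(?=(\w\w\w\w\w\w))\w\w\w")
for m in pat.finditer("python regexes"):
    print(m.group(0), m.group(1), m.span())
# pyt python (0, 3)
# reg regexe (7, 10)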

It's obscure and a little bizarre, but I'd say it's intended and a
logical consequence of the definitions.

Paul


Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?

2013-11-15 Thread anatoly techtonik
On Tue, Nov 12, 2013 at 5:08 PM, Benjamin Peterson benja...@python.org wrote:
 2013/11/12 anatoly techtonik techto...@gmail.com:
 On Sun, Nov 10, 2013 at 8:34 AM, Benjamin Peterson benja...@python.org 
 wrote:
 2013/11/10 anatoly techtonik techto...@gmail.com:
 http://hg.python.org/cpython/file/1ee45eb6aab9/Parser/Python.asdl

 In Assign(expr* targets, expr value), why is the first argument a list?

 x = y = 42

 Thanks.

 Speaking of this ASDL. `expr* targets` means that multiple entities of
 `expr` under the name 'targets' can be passed to Assign statement.
 Assign uses them as left value. But `expr` definition contains things
 that can not be used as left side assignment targets:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

 If I understand correctly, this is compiled into C struct definitions
 (Python-ast.c), and there is code to traverse the structure, but
 where is code that validates that the structure is correct? Is it done
 on the first level - text file parsing, before ASDL is built? If so,
 then what is the role of this ASDL exactly that the first step is
 unable to solve?

 Only valid expression targets are allowed during AST construction. See
 set_expr_context in ast.c.

Oh my. Now there is also CST in addition to AST. This stuff -
http://docs.python.org/devguide/ - badly needs diagrams about data
transformation toolchain from Python source code to machine
 execution instructions. I'd like some pretty stuff, but a raw blockdiag
hack will do the job http://blockdiag.com/en/blockdiag/index.html

There is no set_expr_context in my copy of CPython code, which
seems to be some alpha of Python 3.4

 Is it possible to fix the ASDL to move the `expr` nodes that are allowed in
 Assign into an `expr` subset? What effect will it achieve? I mean - will the
 ASDL compiler complain about wrong stuff on the left side, or will it still
 be the role of some other component? Which one?

 I'm not sure what you mean by an `expr` subset.

Transform this:

expr = BoolOp(boolop op, expr* values)
 | BinOp(expr left, operator op, expr right)
 ...
 | Str(string s) -- need to specify raw, unicode, etc?
 | Bytes(bytes s)
 | NameConstant(singleton value)
 | Ellipsis

 -- the following expression can appear in assignment context
 | Attribute(expr value, identifier attr, expr_context ctx)
 | Subscript(expr value, slice slice, expr_context ctx)
 | Starred(expr value, expr_context ctx)
 | Name(identifier id, expr_context ctx)
 | List(expr* elts, expr_context ctx)
 | Tuple(expr* elts, expr_context ctx)

to this:

expr = BoolOp(boolop op, expr* values)
 | BinOp(expr left, operator op, expr right)
 ...
 | Str(string s) -- need to specify raw, unicode, etc?
 | Bytes(bytes s)
 | NameConstant(singleton value)
 | Ellipsis

 -- the following expression can appear in assignment context
 | expr_asgn

 expr_asgn =
   Attribute(expr value, identifier attr, expr_context ctx)
 | Subscript(expr value, slice slice, expr_context ctx)
 | Starred(expr value, expr_context ctx)
 | Name(identifier id, expr_context ctx)
 | List(expr* elts, expr_context ctx)
 | Tuple(expr* elts, expr_context ctx)


Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?

2013-11-15 Thread anatoly techtonik
On Fri, Nov 15, 2013 at 12:54 PM, anatoly techtonik techto...@gmail.com wrote:
 On Tue, Nov 12, 2013 at 5:08 PM, Benjamin Peterson benja...@python.org 
 wrote:
 2013/11/12 anatoly techtonik techto...@gmail.com:
 On Sun, Nov 10, 2013 at 8:34 AM, Benjamin Peterson benja...@python.org 
 wrote:
 2013/11/10 anatoly techtonik techto...@gmail.com:
 http://hg.python.org/cpython/file/1ee45eb6aab9/Parser/Python.asdl

 In Assign(expr* targets, expr value), why is the first argument a list?

 x = y = 42

 Thanks.

 Speaking of this ASDL. `expr* targets` means that multiple entities of
 `expr` under the name 'targets' can be passed to Assign statement.
 Assign uses them as left value. But `expr` definition contains things
 that can not be used as left side assignment targets:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

 If I understand correctly, this is compiled into C struct definitions
 (Python-ast.c), and there is code to traverse the structure, but
 where is code that validates that the structure is correct? Is it done
 on the first level - text file parsing, before ASDL is built? If so,
 then what is the role of this ASDL exactly that the first step is
 unable to solve?

 Only valid expression targets are allowed during AST construction. See
 set_expr_context in ast.c.

 Oh my. Now there is also CST in addition to AST. This stuff -
 http://docs.python.org/devguide/ - badly needs diagrams about data
 transformation toolchain from Python source code to machine
 execution instructions. I'd like some pretty stuff, but a raw blockdiag
 hack will do the job http://blockdiag.com/en/blockdiag/index.html

 There is no set_expr_context in my copy of CPython code, which
 seems to be some alpha of Python 3.4

 Is it possible to fix the ASDL to move the `expr` nodes that are allowed in
 Assign into an `expr` subset? What effect will it achieve? I mean - will the
 ASDL compiler complain about wrong stuff on the left side, or will it still
 be the role of some other component? Which one?

 I'm not sure what you mean by an `expr` subset.

 Transform this:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

 to this:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | expr_asgn

  expr_asgn =
Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

And also this:

  | Assign(expr* targets, expr value)

to this:

  | Assign(expr_asgn* targets, expr value)


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Steven D'Aprano
On Fri, Nov 15, 2013 at 05:13:34PM +1000, Nick Coghlan wrote:

 A few things I noticed while implementing the recent updates:
 
 - as you noted in your other email, while MAL is on record as saying
 the codecs module is intended for arbitrary codecs, not just Unicode
 encodings, readers of the current docs can definitely be forgiven for
 not realising that. We really need to better separate the codecs
 module docs from the text model docs (two new sections in the language
 reference, one for the codecs machinery and one for the text model
 would likely be appropriate. The io module docs and those for the
 builtin open function may also be affected)
 - a mechanism for annotating frames would help avoid the need for
 nasty hacks like the exception wrapping that aims to make codec
 failures easier to debug
 - if codecs exposed a way to separate the input type check from the
 invocation of the codec, we could redirect users to the module API for
 bad input types as well (e.g. calling "input str".encode("bz2"))

 - if we want something that doesn't need to be imported, then encode()
 and decode() builtins make more sense than new methods on str, bytes
 and bytearray objects (since builtins would support memoryview and
 array.array as well, and it avoids ambiguity regarding the direction
 of the operation)

Sounds good to me.

 - the codecs module should offer a way to register a new alias for an
 existing codec
 - the codecs module should offer a way to map a name to a CodecInfo
 object without registering a new search function

It would be really good to be able to query the available codecs. For 
example, many applications offer an Encoding menu, where you can 
specify the codec used for text. That's hard in Python, since you 
can't retrieve a list of known codecs.
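
There is still no official query API; a rough workaround sketch (my own, and
it only sees the codecs shipped in the encodings package, not anything added
via codecs.register()):

import pkgutil
import encodings
import encodings.aliases

stdlib_codecs = sorted(
    name for _, name, _ in pkgutil.iter_modules(encodings.__path__)
    if name != "aliases"
)
print(len(stdlib_codecs), stdlib_codecs[:5])

# alternate spellings live in the alias table
print(encodings.aliases.aliases["u8"])    # 'utf_8'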


-- 
Steven


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Serhiy Storchaka

15.11.13 12:02, Steven D'Aprano wrote:

It would be really good to be able to query the available codecs. For
example, many applications offer an Encoding menu, where you can
specify the codec used for text. That's hard in Python, since you
can't retrieve a list of known codecs.


And you can't determine which codecs are bytes-to-text encodings (as
opposed to bytes-to-bytes or str-to-str transforms).
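
The closest thing currently available is a private flag on CodecInfo
(CPython 3.4+), so this is only a sketch of what a public query could look
like:

import codecs

for name in ("utf-8", "rot_13", "zlib_codec"):
    info = codecs.lookup(name)
    # _is_text_encoding is private and undocumented: True for real text
    # encodings, False for the bytes/str transform codecs.
    print(info.name, getattr(info, "_is_text_encoding", "unknown"))
# utf-8 True
# rot-13 False
# zlib False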




Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Steven D'Aprano
On Fri, Nov 15, 2013 at 10:22:28AM +0100, Antoine Pitrou wrote:
 On Fri, 15 Nov 2013 09:03:37 +1000 Nick Coghlan ncogh...@gmail.com wrote:
  
   And add transform() and untransform() methods to bytes and str types.
   In practice, it might be same codecs registry for all codecs just with
   a new attribute.
  
  This is completely the wrong approach. There's zero justification for
  adding new builtin methods for this use case - encoding and decoding are
  generic operations, they should use functions not methods.
 
 I'm sorry, I disagree. The question is what use case it is solving, and
 there's zero benefit in writing codecs.encode("zlib") compared to e.g.
 zlib.compress().

One benefit is:

import codecs
codec = get_name_of_compression_codec()
result = codecs.encode(data, codec)


versus:


codec = get_name_of_compression_codec()
if codec == "zlib":
    import zlib
    encoder = zlib.compress
elif codec == "bz2":
    import bz2
    encoder = bz2.compress
elif codec == "gzip":
    import gzip
    encoder = gzip.compress
elif codec == "squash":
    import mySquashLib
    encoder = mySquashLib.squash
elif ...:
    ...  # and so on
result = encoder(data)



 A transform() or untransform() method, however, allows for a much more
 convenient spelling, with easy cascading, e.g.:
 
 b.transform("zlib").transform("base64")

Yes, that's quite nice. Although it need not be a method, a built-in 
function works for me too:

# either of these:
transform(transform(b, "zlib"), "base64")
encode(encode(b, "zlib"), "base64")


If encoding/decoding is intended to be completely generic (even if 99% 
of the uses will be with strings and bytes), is there any reason to 
prefer built-in functions rather than methods on object?


-- 
Steven


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Antoine Pitrou
On Fri, 15 Nov 2013 21:28:35 +1100
Steven D'Aprano st...@pearwood.info wrote:
 
 One benefit is:
 
 import codecs
 codec = get_name_of_compression_codec()
 result = codecs.encode(data, codec)

That's a good point.

 If encoding/decoding is intended to be completely generic (even if 99% 
 of the uses will be with strings and bytes), is there any reason to 
 prefer built-in functions rather than methods on object?

Practicality beats purity. Personally, I've never used codecs on
anything else than str and bytes objects.

Regards

Antoine.




Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Serhiy Storchaka

15.11.13 12:28, Steven D'Aprano wrote:

One benefit is:

import codecs
codec = get_name_of_compression_codec()
result = codecs.encode(data, codec)


And this is a security hole if you don't check the codec name before
calling a codec. See the topic about utilizing zip-bombs via the codecs
machinery.


Also, usually you need more than just decompressing binary data by a
Python codec name. You need to map the external compression name to the
internal Python codec name, you need to configure the decompressor
object with specific options, and perhaps you need different buffering
strategies for different compression algorithms. See for example the
zipfile and tarfile sources.
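
A sketch of that point (the names and options here are illustrative, loosely
modelled on what zipfile/tarfile do rather than taken from them):

import bz2
import lzma
import zlib

DECOMPRESSORS = {
    "deflate": lambda: zlib.decompressobj(-15),   # raw deflate stream, as in zip
    "bzip2": bz2.BZ2Decompressor,
    "lzma": lzma.LZMADecompressor,
}

def make_decompressor(method):
    try:
        return DECOMPRESSORS[method]()
    except KeyError:
        raise ValueError("unsupported compression method: %r" % method)

d = make_decompressor("bzip2")
print(d.decompress(bz2.compress(b"payload")))     # b'payload'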





Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Armin Rigo
Hi,

FWIW, the pure Python traceback.py module has a slightly different
(and saner) behavior:

>>> import traceback
>>> e = Exception(u"xx\u1234yy")
>>> traceback.print_exception(Exception, e, None)
Exception: xx\u1234yy

I'd suggest that the behavior of the two should be unified anyway.
The traceback module uses value.encode("ascii", "backslashreplace")
for any unicode object.
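
For concreteness, that behaviour is roughly (Python 2 syntax, matching the
thread):

>>> u"xx\u1234yy".encode("ascii", "backslashreplace")
'xx\\u1234yy'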


A bientôt,

Armin.


Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors

2013-11-15 Thread Nick Coghlan
On 15 November 2013 17:22, Stefan Behnel stefan...@behnel.de wrote:

 I can't see any bit of information being added by chaining the exceptions
 in this specific case.

 Remember that each change to exception messages and/or exception chaining
 will break someone's doctests somewhere, and it's really ugly to work
 around chained exceptions in (cross-Py-version) doctests.

 I understand that this is helpful *in general*, though, i.e. for other
 kinds of exceptions in codecs, so maybe changing the exception handling in
 the doctest module could be a work-around for this kind of change?

IIRC, doctest ignores the traceback contents by default - this is just
a bug where the chaining is also triggering for the initial codec
lookup when it should avoid doing that.

Created http://bugs.python.org/issue19609

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Nick Coghlan
On 15 November 2013 20:33, Antoine Pitrou solip...@pitrou.net wrote:
 On Fri, 15 Nov 2013 21:28:35 +1100
 Steven D'Aprano st...@pearwood.info wrote:

 One benefit is:

 import codecs
 codec = get_name_of_compression_codec()
 result = codecs.encode(data, codec)

 That's a good point.

 If encoding/decoding is intended to be completely generic (even if 99%
 of the uses will be with strings and bytes), is there any reason to
 prefer built-in functions rather than methods on object?

 Practicality beats purity. Personally, I've never used codecs on
 anything else than str and bytes objects.

The reason I'm now putting some effort into better documenting the
status quo for codec handling in Python 3 and filing off some of the
rough edges (rather than proposing adding any new APIs to Python 3.x)
is because the users I care about in this matter are web developers
that already make use of the binary codecs and are adopting the
single-source approach to handle supporting both Python 2 and Python
3. Armin Ronacher is the one who's been most vocal about the problem,
but he's definitely not alone.

A new API for binary transforms is potentially an academically
interesting concept, but it solves zero current real world problems.
By contrast, being clear about the fact that codecs.encode and
codecs.decode exist and are available as far back as Python 2.4 helps
to eliminate a genuine barrier to Python 3 adoption for a subset of
the community.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Antoine Pitrou
On Fri, 15 Nov 2013 21:45:31 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 
 The reason I'm now putting some effort into better documenting the
 status quo for codec handling in Python 3 and filing off some of the
 rough edges (rather than proposing adding any new APIs to Python 3.x)
 is because the users I care about in this matter are web developers
 that already make use of the binary codecs and are adopting the
 single-source approach to handle supporting both Python 2 and Python
 3. Armin Ronacher is the one who's been most vocal about the problem,
 but he's definitely not alone.

zlib.compress(something) works on both Python 2 and Python 3, why do
you need something else?

Regards

Antoine.


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Victor Stinner
2013/11/15 Nick Coghlan ncogh...@gmail.com:
 The reason I'm now putting some effort into better documenting the
 status quo for codec handling in Python 3 and filing off some of the
 rough edges (rather than proposing adding any new APIs to Python 3.x)
 is because the users I care about in this matter are web developers
 that already make use of the binary codecs and are adopting the
 single-source approach to handle supporting both Python 2 and Python
 3. Armin Ronacher is the one who's been most vocal about the problem,
 but he's definitely not alone.

Except for Armin Ronacher, I never saw anyone blocked when trying to
port a project to Python 3 because of these bytes=>bytes and str=>str
codecs. I did a quick search on Google but failed to find a question
like "how can I write .encode('hex') or .encode('zlib') in Python 3?".
It was just a quick search, and it's likely that many developers hit
this Python 3 regression, but I'm confident that developers are able
to work around this regression themselves (e.g. by using the right
Python module directly).

I have seen a lot of huge code bases ported to Python 3 without needing
these codecs. For example, Django, which is a web framework, has been
ported to Python 3; I know that Armin Ronacher also works on web
things (I don't know what exactly).

 A new API for binary transforms is potentially an academically
 interesting concept, but it solves zero current real world problems.

I would like to reply the same for these codecs: they are not solving
any real world problem :-)

Victor


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Paul Moore
On 15 November 2013 12:07, Victor Stinner victor.stin...@gmail.com wrote:
 A new API for binary transforms is potentially an academically
 interesting concept, but it solves zero current real world problems.

 I would like to reply the same for these codecs: they are not solving
 any real world problem :-)

As Nick is only documenting long-existing functions, I fail to see the
issue here.

If someone were to propose new methods, builtins, or module functions,
then I could see a reason for debate. But surely simply documenting
existing functions is not worth all this pushback?

Paul


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread M.-A. Lemburg
On 15.11.2013 12:45, Nick Coghlan wrote:
 On 15 November 2013 20:33, Antoine Pitrou solip...@pitrou.net wrote:
 On Fri, 15 Nov 2013 21:28:35 +1100
 Steven D'Aprano st...@pearwood.info wrote:

 One benefit is:

 import codecs
 codec = get_name_of_compression_codec()
 result = codecs.encode(data, codec)

 That's a good point.

 If encoding/decoding is intended to be completely generic (even if 99%
 of the uses will be with strings and bytes), is there any reason to
 prefer built-in functions rather than methods on object?

 Practicality beats purity. Personally, I've never used codecs on
 anything else than str and bytes objects.
 
 The reason I'm now putting some effort into better documenting the
 status quo for codec handling in Python 3 and filing off some of the
 rough edges (rather than proposing adding any new APIs to Python 3.x)
 is because the users I care about in this matter are web developers
 that already make use of the binary codecs and are adopting the
 single-source approach to handle supporting both Python 2 and Python
 3. Armin Ronacher is the one who's been most vocal about the problem,
 but he's definitely not alone.

You can add me to that list :-). Esp. the hex codec is very handy.
Google returns a few thousand hits for that codec alone.

One detail that people often tend to forget is the extensibility
of the codec system. It is easily possible to add new codecs
to the system to e.g. perform encoding, escaping, compression or
other conversion operations, so the set of codecs in the stdlib
is not the complete set of codecs used in the wild - and it's
not intended to be.

As example: We've written codecs for customers that perform
special types of XML un/escaping.
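
A toy sketch of such an extension (my own, not the customer code mentioned
above): a str-to-str codec for XML/HTML escaping, registered through a
custom search function:

import codecs
import html

def _search(name):
    if name != "xml_escape":
        return None
    return codecs.CodecInfo(
        name="xml_escape",
        encode=lambda s, errors="strict": (html.escape(s), len(s)),
        decode=lambda s, errors="strict": (html.unescape(s), len(s)),
    )

codecs.register(_search)
print(codecs.encode("a < b & c", "xml_escape"))   # a &lt; b &amp; c
print(codecs.decode("a &lt; b", "xml_escape"))    # a < b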

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 15 2013)
 Python Projects, Consulting and Support ...   http://www.egenix.com/
 mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2013-11-19: Python Meeting Duesseldorf ...  4 days to go

: Try our mxODBC.Connect Python Database Interface for free ! ::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Facundo Batista
On Thu, Nov 14, 2013 at 7:32 PM, Victor Stinner
victor.stin...@gmail.com wrote:

 I would prefer to split the registry of codecs to have 3 registries:

 - encoding (a better name can be found): encode str=>bytes, decode bytes=>str
 - bytes: encode bytes=>bytes, decode bytes=>bytes
 - str:  encode str=>str, decode str=>str

 And add transform() and untransform() methods to bytes and str types.
 In practice, it might be same codecs registry for all codecs just with
 a new attribute.

I like this idea very much.

But to see IIUC, let me be more explicit... you'll have (of course,
always py3k-speaking):

- bytes.decode() -> str ... here you can only use unicode encodings
- no bytes.encode(), like today
- bytes.transform() -> bytes ... here you can only use things like
zlib, rot13, etc

- str.encode() -> bytes ... here you can only use unicode encodings
- no str.decode(), like today
- str.transform() -> str ... here you can only use things like... like what?

When to use decode/encode was always a major pain point for people, so
doing this extra separation and cleaning would bring more clarity to
when to use what.
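
For what it's worth, the proposed split maps onto spellings that already
work today through the module-level functions (codec names here are just
examples):

import codecs

data = b"some bytes"
text = "some text"

data.decode("utf-8")                  # bytes -> str (unicode encodings only)
text.encode("utf-8")                  # str -> bytes (unicode encodings only)

codecs.encode(data, "zlib_codec")     # bytes -> bytes ("transform")
codecs.encode(text, "rot_13")         # str -> str ("transform")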

Thanks!

--
.Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Martin v. Löwis
On 15.11.13 00:57, Chris Barker wrote:
 Maybe so -- but we are either maintaining 2.7 or not -- it WILL be
 around for a long time yet...

Procedurally, it's really easy. Ultimately it's up to the release
manager to decide which changes go into a release and which don't, and
Benjamin has already voiced an opinion.

In addition, Guido van Rossum has voiced an opinion a while ago that
he doesn't consider fixing bugs for 2.7 very useful, and would rather
see maintenance focus on ongoing support for new operating systems,
compiler, build environments, etc. The rationale is that people who
have lived with the glitches of 2.x for so long surely have already
made their work-arounds, so they aren't helped with receiving bug fixes.

The same is true in your case: you indicated that you *already* work
around the problem. It may have been tedious when you had to do it,
but now it's done - and you might not even change your code even if
Python 2.7.x gets changed, since you might want to support older 2.7.x
releases for some time.

Regards,
Martin


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Nick Coghlan
On 15 November 2013 22:24, Paul Moore p.f.mo...@gmail.com wrote:
 On 15 November 2013 12:07, Victor Stinner victor.stin...@gmail.com wrote:
 A new API for binary transforms is potentially an academically
 interesting concept, but it solves zero current real world problems.

 I would like to reply the same for these codecs: they are not solving
 any real world problem :-)

 As Nick is only documenting long-existing functions, I fail to see the
 issue here.

 If someone were to propose new methods, builtins, or module functions,
 then I could see a reason for debate. But surely simply documenting
 existing functions is not worth all this pushback?

There's a bit more to it than that (and that's why I started the other
thread about the codec aliases before proceeding to the final step).

One of the changes Victor is concerned about is that when you use an
incorrect codec in one of the Unicode-encoding-only convenience
methods, the recent exception updates explicitly push users towards
using those module level functions instead:

>>> import codecs
>>> "no good".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use
codecs.encode() to encode to arbitrary types
>>> codecs.encode("just fine", "rot_13")
'whfg svar'

>>> b"no good".decode("quopri_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'quopri_codec' decoder returned 'bytes' instead of 'str';
use codecs.decode() to decode to arbitrary types
>>> codecs.decode(b"just fine", "quopri_codec")
b'just fine'

My perspective is that, in current Python, that *is* the right thing
for people to do, and any hypothetical new API proposed for Python 3.5
would do nothing to change what's right for Python 3.4 code (or Python
2/3 compatible code). I also find it bizarre that several of those
arguing that this is too niche a feature to be worth refining are
simultaneously in favour of a proposal to add new *methods on builtin
types* for the same niche feature.

The other part is the fact that I updated the What's New document to
highlight these tweaks:
http://docs.python.org/dev/whatsnew/3.4.html#improvements-to-handling-of-non-unicode-codecs

As noted earlier in the thread, Armin Ronacher has been the most vocal
of the users of this feature in Python 2 who lamented its absence in
Python 3 (see, for example,
http://lucumr.pocoo.org/2012/8/11/codec-confusion/), but I've also
received plenty of subsequent feedback along the lines of what he
said! (such as http://bugs.python.org/issue7475#msg187630).

Many of the proposed solutions from the people affected by the change
haven't been usable (since they've often been based on a
misunderstanding of why the method behaviour changed in Python 3 in
the first place), but the pain they experience is genuine, and it can
unnecessarily sour their whole experience of the transition. I
consider documenting the existing module level functions and nudging
users towards them when they try to use the affected codecs to be an
expedient way to say "yes, this is still available if you really want
to use it, but the required spelling is different".

However, the one thing I'm *not* going to do at this point is restore
the shorthand aliases, so those opposing the lowering of this barrier
to transition can take comfort in the fact they have succeeded in
ensuring that the out-of-the-box experience for users of this feature
migrating from Python 2 remains the unfriendly:

>>> b"abcdef".decode("hex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: hex

Rather than the more useful:

>>> b"abcdef".decode("hex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
codecs.decode() to decode to arbitrary types

Which would then lead them to the working (and still Python 2 compatible) code:

>>> codecs.decode(b"abcdef", "hex")
b'\xab\xcd\xef'

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Antoine Pitrou
On Fri, 15 Nov 2013 23:50:23 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 
 My perspective is that, in current Python, that *is* the right thing
 for people to do, and any hypothetical new API proposed for Python 3.5
 would do nothing to change what's right for Python 3.4 code (or Python
 2/3 compatible code). I also find it bizarre that several of those
 arguing that this is too niche a feature to be worth refining are
 simultaneously in favour of a proposal to add new *methods on builtin
 types* for the same niche feature.

I am not claiming it is a niche feature, I am claiming codecs.encode()
and codecs.decode() don't solve the use case like the .transform()
and .untransform() methods do.

(I do think it is a nice feature in Python 2, although I find myself
using it mainly at the interpreter prompt, rather than in production
code)

 As noted earlier in the thread, Armin Ronacher has been the most vocal
 of the users of this feature in Python 2 who lamented its absence in
 Python 3 (see, for example,
 http://lucumr.pocoo.org/2012/8/11/codec-confusion/), but I've also
 received plenty of subsequent feedback along the lines of what he
 said! (such as http://bugs.python.org/issue7475#msg187630).

The way I read it, the positive feedback was about .transform()
and .untransform(), not about recommending people switch to
codecs.encode() and codecs.decode().

 Rather than the more useful:
 
 >>> b"abcdef".decode("hex")
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
 codecs.decode() to decode to arbitrary types

I think this may be confusing.  TypeError seems to suggest that the
parameter type sent by the user to the method is wrong, which is not
the actual cause of the error.

Regards

Antoine.


Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?

2013-11-15 Thread Benjamin Peterson
2013/11/15 anatoly techtonik techto...@gmail.com:
 On Tue, Nov 12, 2013 at 5:08 PM, Benjamin Peterson benja...@python.org 
 wrote:
 2013/11/12 anatoly techtonik techto...@gmail.com:
 On Sun, Nov 10, 2013 at 8:34 AM, Benjamin Peterson benja...@python.org 
 wrote:
 2013/11/10 anatoly techtonik techto...@gmail.com:
 http://hg.python.org/cpython/file/1ee45eb6aab9/Parser/Python.asdl

 In Assign(expr* targets, expr value), why is the first argument a list?

 x = y = 42

 Thanks.

 Speaking of this ASDL. `expr* targets` means that multiple entities of
 `expr` under the name 'targets' can be passed to Assign statement.
 Assign uses them as left value. But `expr` definition contains things
 that can not be used as left side assignment targets:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

 If I understand correctly, this is compiled into C struct definitions
 (Python-ast.c), and there is code to traverse the structure, but
 where is code that validates that the structure is correct? Is it done
 on the first level - text file parsing, before ASDL is built? If so,
 then what is the role of this ASDL exactly that the first step is
 unable to solve?

 Only valid expression targets are allowed during AST construction. See
 set_expr_context in ast.c.

 Oh my. Now there is also CST in addition to AST. This stuff -
 http://docs.python.org/devguide/ - badly needs diagrams about data
 transformation toolchain from Python source code to machine
 execution instructions. I'd like some pretty stuff, but a raw blockdiag
 hack will do the job http://blockdiag.com/en/blockdiag/index.html

 There is no set_expr_context in my copy of CPython code, which
 seems to be some alpha of Python 3.4

It's actually called set_context.
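
A quick illustration of where that validation bites (example is mine): the
targets field is typed as expr*, but building an AST from source with a
non-assignable target is rejected:

import ast

tree = ast.parse("x = y = 42")
assign = tree.body[0]
print([type(t).__name__ for t in assign.targets])   # ['Name', 'Name']

try:
    ast.parse("1 + 2 = x")
except SyntaxError as exc:
    print(exc.msg)    # e.g. "can't assign to operator"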


 Is it possible to fix the ASDL to move the `expr` nodes that are allowed in
 Assign into an `expr` subset? What effect will it achieve? I mean - will the
 ASDL compiler complain about wrong stuff on the left side, or will it still
 be the role of some other component? Which one?

 I'm not sure what you mean by an `expr` subset.

 Transform this:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

 to this:

 expr = BoolOp(boolop op, expr* values)
  | BinOp(expr left, operator op, expr right)
  ...
  | Str(string s) -- need to specify raw, unicode, etc?
  | Bytes(bytes s)
  | NameConstant(singleton value)
  | Ellipsis

  -- the following expression can appear in assignment context
  | expr_asgn

  expr_asgn =
Attribute(expr value, identifier attr, expr_context ctx)
  | Subscript(expr value, slice slice, expr_context ctx)
  | Starred(expr value, expr_context ctx)
  | Name(identifier id, expr_context ctx)
  | List(expr* elts, expr_context ctx)
  | Tuple(expr* elts, expr_context ctx)

I doubt ASDL will let you do that.

-- 
Regards,
Benjamin


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Nick Coghlan
On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
 Rather than the more useful:

 >>> b"abcdef".decode("hex")
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
 codecs.decode() to decode to arbitrary types

 I think this may be confusing.  TypeError seems to suggest that the
 parameter type sent by the user to the method is wrong, which is not
 the actual cause of the error.

The TypeError isn't new, only the part after the semi-colon telling
them that codecs.decode() doesn't include the typecheck (because it
isn't constrained by the text model).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Stephen J. Turnbull
Walter Dörwald writes:
  On 15.11.2013 at 00:42, Serhiy Storchaka storch...@gmail.com wrote:
   
   15.11.13 00:32, Victor Stinner wrote:
   And add transform() and untransform() methods to bytes and str types.
   In practice, it might be same codecs registry for all codecs just with
   a new attribute.
   
   If the transform() method will be added, I prefer to have only
   one transformation method and specify a direction by the
   transformation name (bzip2/unbzip2).
  
  +1

-1

I can't support adding such methods (and that's why I ended up giving
Nick's proposal for exposing codecs.encode and codecs.decode a +1).
People think about these transformations as en- or de-coding, not
transforming, most of the time.  Even for a transformation that is
an involution (eg, rot13), people have a very clear idea of what's
encoded and what's not, and they are going to prefer the names
encode and decode for these (generic) operations in many cases.

Eg, I don't think s.transform(decoder) is an improvement over
decode(s, codec) (but tastes vary).[1]  It does mean that we need
to add a redundant method, and I don't really see an advantage to it.
The semantics seem slightly off to me, since the purpose of the
operation is to create a new object, not transform the original
in-place.  (But of course str.encode and bytes.decode are precedents
for those semantics.)


Footnotes: 
[1]  Arguments decoder and codec are identifiers, not metavariables.



Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Ethan Furman

On 11/14/2013 11:13 PM, Nick Coghlan wrote:


The proposal I posted to issue 7475 back in April (and, in the absence
of any objections to the proposal, finally implemented over the past
few weeks) was to take advantage of the fact that the codecs.encode
and codecs.decode convenience functions exist (and have been covered
by the regression test suite) as far back as Python 2.4. I did this
merely by documenting the existence of the functions for Python 2.7,
3.3 and 3.4, changing the exception messages thrown for codec output
type errors on the convenience methods to reference them, and by
updating the Python 3.4 What's New document to explain the changes.


Thanks for doing this work, Nick!

--
~Ethan~


Re: [Python-Dev] Add transform() and untransform() methods

2013-11-15 Thread Antoine Pitrou
On Sat, 16 Nov 2013 00:46:15 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
  Rather than the more useful:
 
  >>> b"abcdef".decode("hex")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
  codecs.decode() to decode to arbitrary types
 
  I think this may be confusing.  TypeError seems to suggest that the
  parameter type sent by the user to the method is wrong, which is not
  the actual cause of the error.
 
 The TypeError isn't new,

Really? That's not what your message said.

Regards

Antoine.


[Python-Dev] Summary of Python tracker Issues

2013-11-15 Thread Python tracker

ACTIVITY SUMMARY (2013-11-08 - 2013-11-15)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4265 (+38)
  closed 27119 (+45)
  total  31384 (+83)

Open issues with patches: 1979 


Issues opened (74)
==

#1180: Option to ignore or substitute ~/.pydistutils.cfg
http://bugs.python.org/issue1180  reopened by jason.coombs

#6466: duplicate get_version() code between cygwinccompiler and emxcc
http://bugs.python.org/issue6466  reopened by jason.coombs

#6516: reset owner/group to root for distutils tarballs
http://bugs.python.org/issue6516  reopened by jason.coombs

#16286: Use hash if available to optimize a==b and a!=b for bytes and 
http://bugs.python.org/issue16286  reopened by haypo

#17354: TypeError when running setup.py upload --show-response
http://bugs.python.org/issue17354  reopened by berker.peksag

#19466: Clear state of threads earlier in Python shutdown
http://bugs.python.org/issue19466  reopened by haypo

#19531: Loading -OO bytecode files if -O was requested can lead to pro
http://bugs.python.org/issue19531  opened by Sworddragon

#19532: compileall -f doesn't force to write bytecode files
http://bugs.python.org/issue19532  opened by Sworddragon

#19533: Unloading docstrings from memory if -OO is given
http://bugs.python.org/issue19533  opened by Sworddragon

#19534: normalize() in locale.py fails for sr_RS.UTF-8@latin
http://bugs.python.org/issue19534  opened by mfabian

#19535: Test failures with -OO
http://bugs.python.org/issue19535  opened by serhiy.storchaka

#19536: MatchObject should offer __getitem__()
http://bugs.python.org/issue19536  opened by brandon-rhodes

#19537: Fix misalignment in fastsearch_memchr_1char
http://bugs.python.org/issue19537  opened by schwab

#19538: Changed function prototypes in the PEP 384 stable ABI
http://bugs.python.org/issue19538  opened by theller

#19539: The 'raw_unicode_escape' codec buggy + not appropriate for Pyt
http://bugs.python.org/issue19539  opened by zuo

#19541: ast.dump(indent=True) prettyprinting
http://bugs.python.org/issue19541  opened by techtonik

#19542: WeakValueDictionary bug in setdefault()pop()
http://bugs.python.org/issue19542  opened by arigo

#19543: Add -3 warnings for codec convenience method changes
http://bugs.python.org/issue19543  opened by ncoghlan

#19544: Port distutils as found in Python 2.7 to Python 3.x.
http://bugs.python.org/issue19544  opened by jason.coombs

#19545: time.strptime exception context
http://bugs.python.org/issue19545  opened by Claudiu.Popa

#19546: configparser leaks implementation detail
http://bugs.python.org/issue19546  opened by Claudiu.Popa

#19547: HTTPS proxy support missing without warning
http://bugs.python.org/issue19547  opened by 02strich

#19548: 'codecs' module docs improvements
http://bugs.python.org/issue19548  opened by zuo

#19549: PKG-INFO is created with CRLF on Windows
http://bugs.python.org/issue19549  opened by techtonik

#19550: PEP 453: Windows installer integration
http://bugs.python.org/issue19550  opened by ncoghlan

#19551: PEP 453: Mac OS X installer integration
http://bugs.python.org/issue19551  opened by ncoghlan

#19552: PEP 453: venv module and pyvenv integration
http://bugs.python.org/issue19552  opened by ncoghlan

#19553: PEP 453: make install and make altinstall integration
http://bugs.python.org/issue19553  opened by ncoghlan

#19554: Enable all freebsd* host platforms
http://bugs.python.org/issue19554  opened by wg

#19555: SO config var not getting set
http://bugs.python.org/issue19555  opened by Marc.Abramowitz

#19557: ast - docs for every node type are missing
http://bugs.python.org/issue19557  opened by techtonik

#19558: Provide Tcl/Tk linkage information for extension module builds
http://bugs.python.org/issue19558  opened by ned.deily

#19561: request to reopen Issue837046 - pyport.h redeclares gethostnam
http://bugs.python.org/issue19561  opened by risto3

#19562: Added description for assert statement
http://bugs.python.org/issue19562  opened by thatiparthy

#19563: Changing barry's email  to ba...@python.org
http://bugs.python.org/issue19563  opened by thatiparthy

#19564: test_multiprocessing_spawn hangs
http://bugs.python.org/issue19564  opened by haypo

#19565: test_multiprocessing_spawn: RuntimeError and assertion error o
http://bugs.python.org/issue19565  opened by haypo

#19566: ERROR: test_close (test.test_asyncio.test_unix_events.FastChil
http://bugs.python.org/issue19566  opened by haypo

#19568: bytearray_setslice_linear() leaves the bytearray in an inconsi
http://bugs.python.org/issue19568  opened by haypo

#19569: Use __attribute__((deprecated)) to warn usage of deprecated fu
http://bugs.python.org/issue19569  opened by haypo

#19570: distutils' Command.ensure_dirname fails on Unicode
http://bugs.python.org/issue19570  opened by saschpe

#19572: Report more silently skipped tests as 

Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Trent Nelson
On Tue, Nov 12, 2013 at 01:16:55PM -0800, Victor Stinner wrote:
 pysandbox cannot be used in practice
 
 
 To protect the untrusted namespace, pysandbox installs a lot of
 different protections. Because of all these protections, it becomes
 hard to write Python code. Basic features like "del dict[key]" are
 denied. Passing an object to a sandbox is not possible to sandbox;
 pysandbox is unable to proxify arbitrary objects.
 
 For something more complex than evaluating 1+(2*3), pysandbox cannot
 be used in practice, because of all these protections. Individual
 protections cannot be disabled, all protections are required to get a
 secure sandbox.

This sounds a lot like the work I initially did with PyParallel to
try and intercept/prevent parallel threads mutating main-thread
objects.

I ended up arriving at a much better solution by just relying on
memory protection; main thread pages are set read-only prior to
parallel threads being able to run.  If a parallel thread attempts
to mutate a main thread object, an SEH is raised (SIGSEGV on POSIX),
which I catch in the ceval loop and convert into an exception.
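
Roughly, the underlying mechanism looks like this -- a minimal, POSIX-only
sketch using ctypes and mmap, purely illustrative (PyParallel flips the
protection of the interpreter's own object pages, and uses VirtualProtect()
rather than mprotect() on Windows):

    import ctypes, mmap

    libc = ctypes.CDLL(None, use_errno=True)

    page = mmap.mmap(-1, mmap.PAGESIZE)         # page-aligned, writable
    page[:11] = b"main thread"                  # the main thread fills it in

    addr = ctypes.addressof(ctypes.c_char.from_buffer(page))

    # Lock the page to read-only before parallel threads are allowed to run.
    libc.mprotect(ctypes.c_void_p(addr), mmap.PAGESIZE, mmap.PROT_READ)

    # Any write to the page now faults (access violation/SEH on Windows,
    # SIGSEGV elsewhere); PyParallel traps that in the ceval loop and turns
    # it into an exception.  Plain CPython would simply crash, so this
    # sketch does not attempt the write.

    libc.mprotect(ctypes.c_void_p(addr), mmap.PAGESIZE,
                  mmap.PROT_READ | mmap.PROT_WRITE)   # unlock afterwards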

See slide 138 of this: 
https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores-1

I'm wondering if this sort of an approach (which worked surprisingly
well) could be leveraged to also provide a sandbox environment?  The
goals are the same: robust protection against mutation of memory
allocated outside of the sandbox.

(I'm purely talking about memory mutation; haven't thought about how
 that could be extended to prevent file system interaction as well.)


Trent.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Armin Rigo
Hi again,

I figured that even using the traceback.py module and getting
Exception: \u1234\u1235\u5321 is rather useless if you tried to
raise an exception with a message in Thai.  I believe this to also be
a bug, so I opened https://bugs.pypy.org/issue1634 .  According to
this thread, however, python-dev is against it, so I didn't bother
adding a CPython bug.


A bientôt,

Armin.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Victor Stinner
2013/11/15 Trent Nelson tr...@snakebite.org:
 This sounds a lot like the work I initially did with PyParallel to
 try and intercept/prevent parallel threads mutating main-thread
 objects.

 I ended up arriving at a much better solution by just relying on
 memory protection; main thread pages are set read-only prior to
 parallel threads being able to run.  If a parallel thread attempts
 to mutate a main thread object, an SEH is raised (SIGSEGV on POSIX),
 which I catch in the ceval loop and convert into an exception.

Read-only is not enough, an attacker must not be able to read sensitive data.

Protection of memory pages sounds very low-level, so not very portable :-/

How do you know if the SIGSEGV comes from a legal call (parallel thread
thing) or a real bug?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Chris Barker
On Fri, Nov 15, 2013 at 9:21 AM, Armin Rigo ar...@tunes.org wrote:

 I figured that even using the traceback.py module and getting
 Exception: \u1234\u1235\u5321 is rather useless if you tried to
 raise an exception with a message in Thai.

yup.

 I believe this to also be
 a bug, so I opened https://bugs.pypy.org/issue1634 .  According to
 this thread, however, python-dev is against it, so I didn't bother
 adding a CPython bug.

According to that bug report, it looks like CPython doesn't
completely handle unicode Exception messages even in py3? Is that
really the case?

And from this thread, I'd say that it's unlikely anyone wants to change
this in py2, but I don't know that making py3 better in this regard is
off the table.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Trent Nelson

On Nov 15, 2013, at 12:34 PM, Victor Stinner wrote:

 2013/11/15 Trent Nelson tr...@snakebite.org:
This sounds a lot like the work I initially did with PyParallel to
try and intercept/prevent parallel threads mutating main-thread
objects.
 
I ended up arriving at a much better solution by just relying on
memory protection; main thread pages are set read-only prior to
parallel threads being able to run.  If a parallel thread attempts
to mutate a main thread object, an SEH is raised (SIGSEGV on POSIX),
which I catch in the ceval loop and convert into an exception.
 
 Read-only is not enough, an attacker must not be able to read sensitive data.

Well you could remove both write *and* read perms from pages, such that you 
would trap on read attempts too.  What's an example of sensitive data that 
you'd need to have residing in the same process that you also want to sandbox?

I was going to suggest something like:

with memory.protected:
    htpasswd = open('htpasswd', 'r').read()
    ...

But then I couldn't think of why you'd persist the sensitive data past the 
point you'd need it. 

 Protection of memory pages sounds very low-level, so not very portable :-/

It's a pretty fundamental provision provided by operating systems; granted, the 
interface differs (mprotect() versus VirtualProtect()), but the result is the 
same.

 How do you know if the SIGSEGV comes from a legal call (parallel thread
 thing) or a real bug?

You don't, but it doesn't really matter.  It'll be pretty obvious from looking 
at the offending line of code in the exception whether it was a legitimate 
memory protection error, or a bug in an extension module/CPython internals.

And having a ProtectionError bubble all the way back up to the top of the stack 
with exact details about the offending frame/line could be considered a nicer 
alternative to dumping core ;-)  (Unless you happen to be in an `except: pass` 
block.)

 Victor


Trent.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Brett Cannon
On Fri, Nov 15, 2013 at 12:41 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Fri, Nov 15, 2013 at 9:21 AM, Armin Rigo ar...@tunes.org wrote:

  I figured that even using the traceback.py module and getting
  Exception: \u1234\u1235\u5321 is rather useless if you tried to
  raise an exception with a message in Thai.

 yup.

  I believe this to also be
  a bug, so I opened https://bugs.pypy.org/issue1634 .  According to
  this thread, however, python-dev is against it, so I didn't bother
  adding a CPython bug.

 According to that bug report, it looks like CPython doesn't
 completely handle unicode Exception messages even in py3? Is that
 really the case?

 And from this thread, I'd say that it's unlikely anyone wants to change
 this in py2, but I don't know that making py3 better in this regard is
 off the table.


Making changes and improvements to Python 3 is totally an option.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Chris Barker
On Fri, Nov 15, 2013 at 5:24 AM, Martin v. Löwis mar...@v.loewis.de wrote:

 Procedurally, it's really easy. Ultimately it's up to the release
 manager to decide which changes go into a release and which don't, and
 Benjamin has already voiced an opinion.

Very early in the conversation, though honestly, probably nothing
compelling has been brought up...

 In addition, Guido van Rossum has voiced an opinion a while ago that
 he doesn't consider fixing bugs for 2.7 very useful, and would rather
 see maintenance focus on ongoing support for new operating systems,
 compiler, build environments, etc. The rationale is that people who
 have lived with the glitches of 2.x for so long surely have already
 made their work-arounds, so they aren't helped with receiving bug fixes.

If that's the policy, then that's the policy, but ...

 The same is true in your case: you indicated that you *already* work
 around the problem.

only in one script, and I just looked, and I missed a number of
locations in that script. I've been running that script for years with
very few changes, but last week, someone gave me a utf-8 data file --
it was really easy to read the file as utf-8 (change one line of
code), and bingo! everything else just worked.

Then I hit an exception, and banged my head against the wall for a
while -- though I guess this is what we always deal with anywhere we
introduce unicode to a previously-non-unicode-aware application. I'm
still a bit dumbfounded that you can't use a unicode message in an
Exception, though, still not sure why that's required...

 It may have been tedious when you had to do it,
 but now it's done - and you might not even change your code even if
 Python 2.7.x gets changed, since you might want to support older 2.7.x
 release for some time.

In this case, no -- but really this is more about making it easier to
just dump unicode in somewhere, or, in fact, simply give people more
meaningful errors when they do that...

And I have a lot of code that ignores this problem, and I'm sure it
will come up for me and others over and over again. But yes, it
clearly hasn't been a deal-breaker so far!

On Fri, Nov 15, 2013 at 2:48 AM, Armin Rigo ar...@tunes.org wrote:
 FWIW, the pure Python traceback.py module has a slightly different
 (and saner) behavior:

 >>> e = Exception(u"xx\u1234yy")
 >>> traceback.print_exception(Exception, e, None)
 Exception: xx\u1234yy

 I'd suggest that the behavior of the two should be unified anyway.
 The traceback module uses value.encode(ascii, backslashreplace)
 for any unicode object.

Nice observation -- so at least someone else agreed with me about what
the right thing to do is -- oh well.

On Thu, Nov 14, 2013 at 9:42 PM, Steven D'Aprano st...@pearwood.info wrote:
 I'm not
 convinced that treating Unicode strings as a special case is justified.
 It's been at least four, and possibly six (back to 2.2) point releases
 with this behaviour, and until now apparently nobody has noticed.

Not true -- apparently no one has brought it up on python-dev or posted
an issue, but I confirmed that I understood what was going on with a
little googling, including:

http://pythonhosted.org/kitchen/unicode-frustrations.html#frustration-5-exceptions

That document was written 19 March 2011, and at the time the library
worked with Python 2.3 and later.

Anyway, back to two questions:

1) could it be improved? it seems there is some disagreement on that one.

and

2) Is this a big enough deal to change 2.* ?

From what Martin says, No. So we don't need to argue about (1).

I sure hope py3 behavior is solid on this (sorry, no py3 to test on here...)

But I can't help myself:

Of all the guidelines for writing good code, the one I come back to
again and again is DRY -- it drives almost all of my code structure
decisions.

So, in this case, now I need to think about whether to put in a kludge
every single time I raise an Exception. In the script at hand, I
needed to change 7 instances of raising an Exception, out of 10 total.

Contrast that with one line of code changed in the Exception code.

In fact, what I'll probably do is write a little wrapper that does the
encoding for an arbitrary exception, and use that, something like:

def my_raise(exp, msg):
    raise exp(unicode(msg).encode('ascii', 'replace'))
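
A quick, hypothetical usage example, echoing Armin's snippet above -- the
non-ASCII characters degrade to '?', but the raise itself no longer trips
over the encoding:

    my_raise(ValueError, u"xx\u1234yy went wrong")
    # raises ValueError('xx?yy went wrong')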

But does it really make sense for me to write that and use it all over
the place, as well as everyone else doing their own kludges?

Oh well, I suppose the real lesson is go to Python 3

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-15 Thread Walter Dörwald
On 15.11.2013 at 16:57, Stephen J. Turnbull step...@xemacs.org wrote:
 
 Walter Dörwald writes:
 On 15.11.2013 at 00:42, Serhiy Storchaka storch...@gmail.com wrote:
 
 On 15.11.13 00:32, Victor Stinner wrote:
 And add transform() and untransform() methods to bytes and str types.
 In practice, it might be the same codecs registry for all codecs just with
 a new attribute.
 
 If the transform() method is added, I prefer to have only
 one transformation method and to specify the direction by the
 transformation name (bzip2/unbzip2).
 
 +1
 
 -1
 
 I can't support adding such methods (and that's why I ended up giving
 Nick's proposal for exposing codecs.encode and codecs.decode a +1).

My +1 was only for having the transformation be one-way under the condition 
that it is added at all.

 People think about these transformations as en- or de-coding, not
 transforming, most of the time.  Even for a transformation that is
 an involution (eg, rot13), people have a very clear idea of what's
 encoded and what's not, and they are going to prefer the names
 encode and decode for these (generic) operations in many cases.
 
 Eg, I don't think s.transform(decoder) is an improvement over
 decode(s, codec) (but tastes vary).[1]  It does mean that we need
 to add a redundant method, and I don't really see an advantage to it.

Actually my preferred method would be codec.decode(s), with codec being the
module that implements the functionality.

I don't think we need to invent another function registry.
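
As a rough illustration of that module-centric style, using only modules
that already exist today (my sketch, no new registry involved):

    import binascii
    import bz2

    compressed = bz2.compress(b"payload")   # the module itself is the "codec"
    assert bz2.decompress(compressed) == b"payload"

    hexed = binascii.hexlify(b"payload")    # b'7061796c6f6164'
    assert binascii.unhexlify(hexed) == b"payload"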

 The semantics seem slightly off to me, since the purpose of the
 operation is to create a new object, not transform the original
 in-place.

This would mean the method would have to be called transformed()?

  (But of course str.encode and bytes.decode are precedents
 for those semantics.)
 
 
 Footnotes: 
 [1]  Arguments decoder and codec are identifiers, not metavariables.

Servus,
   Walter

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Finding overlapping matches with re assertions: bug or feature?

2013-11-15 Thread Tim Peters
[Tim]
 Is that a feature?  Or an accident?  It's very surprising to find a
 non-empty match inside an empty match (the outermost lookahead
 assertion).

[Paul Moore]
 Personally, I would read (?=(R)) as finding an empty match at a point
 where R starts. There's no implication that R is in any sense inside
 the match.

 (?=(\w\w\w\w\w\w))\w\w\w finds the first 3 characters of words that
 are 6 or more characters long. Once again, the lookahead extends
 beyond the extent of the main match.

 It's obscure and a little bizarre, but I'd say its intended and a
 logical consequence of the definitions.

After sleeping on it, I woke up a lot less surprised.  You'd think
that after decades of regexps, I'd be used to that by now ;-)

Thanks for the response!  Your points sound valid to me, and I agree.
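
For anyone following along, a minimal illustration of the trick under
discussion (my own example, not the pattern from the original report):

    import re

    # The lookahead matches with zero width at every position, but the
    # captured group still grabs three characters, so overlapping runs
    # can be collected:
    print(re.findall(r"(?=(\w\w\w))", "abcde"))
    # -> ['abc', 'bcd', 'cde']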
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] cpython: Issue #19544 and Issue #6516: Restore support for --user and --group parameters

2013-11-15 Thread Christian Heimes
On 15.11.2013 19:07, jason.coombs wrote:
 http://hg.python.org/cpython/rev/b9c9c4b2effe
 changeset:   87119:b9c9c4b2effe
 user:Andrew Kuchling a...@amk.ca
 date:Fri Nov 15 13:01:52 2013 -0500
 summary:
   Issue #19544 and Issue #6516: Restore support for --user and --group 
 parameters to sdist command as found in Python 2.7 and originally slated for 
 Python 3.2 but accidentally rolled back as part of the distutils2 rollback. 
 Closes Issue #6516.
 

Your commit has broken the build:

./python -E -S -m sysconfig --generate-posix-vars
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Traceback (most recent call last):
  File "./setup.py", line 11, in <module>
    from distutils.core import Extension, setup
  File "/home/heimes/dev/python/cpython/Lib/distutils/core.py", line 18, in <module>
    from distutils.cmd import Command
  File "/home/heimes/dev/python/cpython/Lib/distutils/cmd.py", line 9, in <module>
    from distutils import util, dir_util, file_util, archive_util, dep_util
  File "/home/heimes/dev/python/cpython/Lib/distutils/archive_util.py", line 27, in <module>
    from grp import getgrnam
ImportError: No module named 'grp'


The grp module is built later by setup.py.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Christian Tismer

On 13/11/13 00:49, Josiah Carlson wrote:


Python-dev is for the development of the Python core language, the 
CPython runtime, and libraries. Your sandbox, despite using and 
requiring deep knowledge of the runtime, is not developing those 
things. If you had a series of requests for the language or runtime 
that would make your job easier, then your thread would be on-topic.




I think you should consider re-defining your perception of the purpose
of the python-dev list. Simple feature requests are not everything.
Instead, this list also touches on the general direction where Python
should go, and discusses the current hard-to-solve problems.

The sand-boxing feature via rexec, bastion etc. was perceived as a useful,
quite safe thing, until it was proven to be completely broken (Samuele
Pedroni et al., 2003 I think). After that, CPython simply removed those
features and failed completely to provide a better solution.

I appreciate very much that Victor tried his best to fill that old gap.
And after that breakage happened again, I think it is urgent to have an
in-depth discussion of how that situation should be treated in the future.

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode Exception messages in py2.7

2013-11-15 Thread Greg Ewing

Armin Rigo wrote:

I figured that even using the traceback.py module and getting
Exception: \u1234\u1235\u5321 is rather useless if you tried to
raise an exception with a message in Thai.


But at least it tells you that *something* went wrong,
and points to the place in the code where it happened.
That has to be better than pretending that nothing
happened at all.

Also, if the escaping preserves the original byte
sequence of the message, there's a chance that someone
will be able to figure out what the message said.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-15 Thread Nick Coghlan
On 16 Nov 2013 02:36, Antoine Pitrou solip...@pitrou.net wrote:

 On Sat, 16 Nov 2013 00:46:15 +1000
 Nick Coghlan ncogh...@gmail.com wrote:
  On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
   Rather than the more useful:
  
>>> b"abcdef".decode("hex")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
   codecs.decode() to decode to arbitrary types
  
   I think this may be confusing.  TypeError seems to suggest that the
   parameter type sent by the user to the method is wrong, which is not
   the actual cause of the error.
 
  The TypeError isn't new,

 Really? That's not what your message said.

The second example in my post included restoring the hex alias for
hex_codec (its absence is the reason for the current unknown encoding
error). The 3.2 and 3.3 error message for a restored alias would have been
TypeError: 'hex' decoder returned 'bytes' instead of 'str', which I agree
is confusing and uninformative - that's why I added the reference to the
module level functions to the output type errors *before* proposing the
restoration of the aliases.

So you can already use codecs.decode(s, 'hex_codec') in Python 3, you
just won't get a useful error leading you there if you use the more common
'hex' alias instead.
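
For example, this already works unchanged on both Python 2.7 and Python 3:

    import codecs

    encoded = codecs.encode(b"hello", "hex_codec")   # b'68656c6c6f'
    assert codecs.decode(encoded, "hex_codec") == b"hello"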

To address Serhiy's security concerns with the compression codecs (which
are technically independent of the question of restoring the aliases), I
also plan to document how to systematically blacklist particular codecs in
an application by setting attributes on the encodings module and/or
appropriate entries in sys.modules.
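
One way that kind of blacklisting can already be done (a sketch only; the
recipe I end up documenting may differ in detail) is to block the codec
module's import before the codec is ever looked up, since successful
lookups are cached:

    import sys
    import codecs

    # Must run at startup, before the codec is first used.
    sys.modules["encodings.bz2_codec"] = None

    try:
        codecs.lookup("bz2_codec")
    except LookupError as exc:
        print("codec blocked:", exc)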

Finally, I now plan to write a documentation PEP that suggests clearly
splitting the codecs module docs into two layers: the type agnostic core
infrastructure and the specific application of that infrastructure to the
implementation of the text encoding model.

The only functional *change* I'd still like to make for 3.4 is to restore
the shorthand aliases for the non-Unicode codecs (to ease the migration for
folks coming from Python 2), but this thread has convinced me I likely need
to write the PEP *before* doing that, and I still have to integrate
ensurepip into pyvenv before the beta 1 deadline.

So unless you and Victor are prepared to +1 the restoration of the codec
aliases (closing issue 7475) in anticipation of that codecs infrastructure
documentation PEP, the change to restore the aliases probably won't be in
3.4. (I *might* get the PEP written in time regardless, but I'm not betting
on it at this point).

Cheers,
Nick.


 Regards

 Antoine.
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 https://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Nick Coghlan
On 16 Nov 2013 08:25, Christian Tismer tis...@stackless.com wrote:

 On 13/11/13 00:49, Josiah Carlson wrote:


 Python-dev is for the development of the Python core language, the
CPython runtime, and libraries. Your sandbox, despite using and requiring
deep knowledge of the runtime, is not developing those things. If you had a
series of requests for the language or runtime that would make your job
easier, then your thread would be on-topic.


 I think you should consider re-defining your perception of the purpose
 of the python-dev list. Simple feature requests are not everything.
 Instead, this list also touches on the general direction where Python
 should go, and discusses the current hard-to-solve problems.

 The sand-boxing feature via rexec, bastion etc. was perceived as a useful,
 quite safe thing, until it was proven to be completely broken (Samuele
 Pedroni et al., 2003 I think). After that, CPython simply removed those
 features and failed completely to provide a better solution.

Using an OS level sandbox *is* better from a security point of view. It's
just not portable :P

Cheers,
Nick.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Guido van Rossum
On Fri, Nov 15, 2013 at 4:31 PM, Nick Coghlan ncogh...@gmail.com wrote:

 Using an OS level sandbox *is* better from a security point of view. It's
 just not portable :P


Honestly, I don't believe in portable security. :-)

BTW, in case it wasn't clear, I think it was a courageous step by Victor to
declare defeat. Negative results are also results, and they need to be
published. Thanks Victor!

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Add transform() and untranform() methods

2013-11-15 Thread Victor Stinner
2013/11/16 Nick Coghlan ncogh...@gmail.com:
 To address Serhiy's security concerns with the compression codecs (which are
 technically independent of the question of restoring the aliases), I also
 plan to document how to systematically blacklist particular codecs in an
 application by setting attributes on the encodings module and/or appropriate
 entries in sys.modules.

It would be simpler and safer to blacklist bytes->bytes and str->str
codecs from bytes.decode() and str.encode() directly. Marc-Andre
Lemburg proposed to add new attributes to CodecInfo to specify input
and output types.

 The only functional *change* I'd still like to make for 3.4 is to restore
 the shorthand aliases for the non-Unicode codecs (to ease the migration for
 folks coming from Python 2), but this thread has convinced me I likely need
 to write the PEP *before* doing that, and I still have to integrate
 ensurepip into pyvenv before the beta 1 deadline.

 So unless you and Victor are prepared to +1 the restoration of the codec
 aliases (closing issue 7475) in anticipation of that codecs infrastructure
 documentation PEP, the change to restore the aliases probably won't be in
 3.4. (I *might* get the PEP written in time regardless, but I'm not betting
 on it at this point).

Using the StackOverflow search engine, I found some posts where people
ask for the hex codec on Python 3. There are two answers: use the
binascii module or use codecs.encode(). So even though codecs.encode()
was never documented, it looks like it is used. So I now agree that
documenting it would not make the situation worse.

Adding transform()/untransform() methods to bytes and str is a
non-trivial change and not everybody likes them. Anyway, it's too late
for Python 3.4.

In my opinion, the best option is to add new input_type/output_type
attributes to CodecInfo right now, and modify the codecs so that
"abc".encode("hex") raises a LookupError (instead of a tricky error
message with some evil low-level hacks on the traceback and the
exception, which was my initial concern in this mail thread). It also
fixes the security vulnerability.

To keep backward compatibility (even with custom codecs registered
manually), if input_type/output_type is not defined, we should
consider that the codec is a classical text encoding (encode:
str -> bytes, decode: bytes -> str).
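
As a purely hypothetical sketch of the kind of check this would enable
(input_type/output_type are the *proposed* attributes from issue 19619,
they do not exist on CodecInfo today):

    import codecs

    def lookup_text_encoding(name):
        info = codecs.lookup(name)
        # Missing attributes mean "classic text encoding", as described
        # above (encode: str -> bytes, decode: bytes -> str).
        if (getattr(info, "input_type", str) is not str
                or getattr(info, "output_type", bytes) is not bytes):
            raise LookupError("%r is not a text encoding; "
                              "use codecs.encode()/decode() instead" % name)
        return info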

The type of codecs.encode() result is my least concern in this topic.

I created the following issue to implement my idea:
http://bugs.python.org/issue19619

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Christian Tismer

On 16.11.13 01:35, Guido van Rossum wrote:
On Fri, Nov 15, 2013 at 4:31 PM, Nick Coghlan ncogh...@gmail.com wrote:


Using an OS level sandbox *is* better from a security point of
view. It's just not portable :P


Honestly, I don't believe in portable security. :-)

BTW, in case it wasn't clear, I think it was a courageous step by 
Victor to declare defeat. Negative results are also results, and they 
need to be published. Thanks Victor!


Sure it was, and it was great to follow Victor's project!
I was about to use it in production, until I saw its flaws a while back.

Nevertheless, the issue has never been treated thoroughly enough to be able
to say "this is the way you implement that kind of security in Python",
whatever that should be.

So I think it is worth discussing, even if it is just to identify the levels
of security involved, to help people identify their individual needs.


My question is, actually:
Do we need to address this topic, or is it already crystal clear that
something like PyPy's approach is necessary and sufficient to solve the
common, undefined problem of "run some script on whatnot, with the
following security constraint"?


IOW: Do we really need a full abstraction, embedded in a virtual OS, or
is there already a compromise that suits 98 percent of the common needs?

I think, as a starter, categorizing the expectations of some measure of
'secure python' would make sense. And I'm asking the people with better
knowledge of these matters than I have (and not asking those who
don't... ;-) ).

cheers -- Chris

--
Christian Tismer :^)   mailto:tis...@stackless.com
Software Consulting  : Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121 :*Starship* http://starship.python.net/
14482 Potsdam: PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04   9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
  whom do you want to sponsor today?   http://www.stackless.com/

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)

2013-11-15 Thread Terry Reedy

http://bugs.python.org/issue19562
propose to change the first assert in Lib/datetime.py
  assert 1 <= month <= 12, month
to
  assert 1 <= month <= 12, 'month must be in 1..12'
to match the next two asserts out of the *53* in the file. I think that 
is the wrong direction of change, but that is not my question here.


Should stdlib code use assert at all?

If user input can trigger an assert, then the code should raise a normal 
exception that will not disappear with -OO.
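
To make the -OO point concrete (an illustrative sketch, not a proposed
patch):

    def month_checked_with_assert(month):
        assert 1 <= month <= 12, month      # silently skipped under -O/-OO
        return month

    def month_checked_with_raise(month):
        if not 1 <= month <= 12:
            raise ValueError('month must be in 1..12')   # survives -O/-OO
        return month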


If the assert is testing program logic, then it seems that the test 
belongs in the test file, in this case, test/test_datetime.py. For 
example, consider the following (backwards) code.


_DI4Y   = _days_before_year(5)
# A 4-year cycle has an extra leap day over what we'd get from pasting
# together 4 single years.
assert _DI4Y == 4 * 365 + 1

To me, the constant should be directly set to its known value.
_DI4Y = 4*365 + 1.
The function should then be tested in test_datetime.
  self.assertEqual(dt._days_before_year(5), dt._DI4Y)

Is there any policy on use of assert in stdlib production code?

--
Terry Jan Reedy

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)

2013-11-15 Thread Tim Peters
[Terry Reedy]
 Should stdlib code use assert at all?

Of course, and for exactly the same reasons we use `assert()` in
Python's C code:  to verify preconditions, postconditions, and
invariants that should never fail.  Assertions should never be used
to, e.g., verify user-supplied input (or anything else we believe
_may_ fail).

 If user input can trigger an assert, then the code should raise a normal
 exception that will not disappear with -OO.

Agreed.

 If the assert is testing program logic, then it seems that the test belongs
 in the test file, in this case, test/test_datetime.py. For example, consider
 the following (backwards) code.

 _DI4Y   = _days_before_year(5)
 # A 4-year cycle has an extra leap day over what we'd get from pasting
 # together 4 single years.
 assert _DI4Y == 4 * 365 + 1

 To me, the constant should be directly set to its known value.
 _DI4Y = 4*365 + 1.
 The function should then be tested in test_datetime.
   self.assertEqual(dt._days_before_year(5), dt._DI4Y)

I think making that change would be pointless code churn.  Harmful,
even.  As the guy who happened to have written that code ;-), I think
it's valuable to have the _code_ (not off buried in some monstrously
tedious test file) explain what the comments there do explain, and
verify with the assert.  If anyone needs to muck with the
implementation of datetime, it's crucial they understand what DI4Y
_means_, and that it's identical to _days_before_year(5).  Its actual
value (4*365 + 1) isn't really interesting.  Defining _DI4Y _as_
_days_before_year(5) captures its _meaning_.

Ain't broke - don't fix.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)

2013-11-15 Thread Mark Janssen
 Should stdlib code use assert at all?

 If user input can trigger an assert, then the code should raise a normal
 exception that will not disappear with -OO.

 If the assert is testing program logic, then it seems that the test belongs
 in the test file, in this case, test/test_datetime.py. For example, consider
 the following (backwards) code.

 Is there any policy on use of assert in stdlib production code?

It is my assertion that assert should only be used where a
system-level problem would occur, where you cannot trap an error
condition.

-- 
MarkJ
Tacoma, Washington
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] The pysandbox project is broken

2013-11-15 Thread Ethan Furman

On 11/15/2013 02:24 PM, Christian Tismer wrote:


I appreciate very much that Victor tried his best to fill that old gap.
And after that breakage happened again, I think it is urgent to have an
in-depth discussion of how that situation should be treated in the future.

+1

--
~Ethan~

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com