Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.2013 08:13, Nick Coghlan wrote: On 15 November 2013 11:10, Terry Reedy tjre...@udel.edu wrote: On 11/14/2013 5:32 PM, Victor Stinner wrote: I don't like the functions codecs.encode() and codecs.decode() because the type of the result depends on the encoding (second parameter). We try to avoid this in Python. Such dependence is common with arithmetic. 1 + 2 3 1 + 2.0 3.0 1 + 2+0j (3+0j) sum((1,2,3), 0) 6 sum((1,2,3), 0.0) 6.0 sum((1,2,3), 0.0+0j) (6+0j) for f in (compile, eval, getattr, iter, max, min, next, open, pow, round, type, vars): type(f(*args)) # depends on the inputs That is a large fraction of the non-class builtin functions. *Type* dependence between inputs and outputs is common (and completely non-controversial). The codecs system is different, since the supported input and output types are *value* dependent, driven by the name of the codec. That's the part which makes the codec machinery interesting in general, since it combines a value driven lazy loading mechanism (based on the codec name) with the subsequent invocation of that mechanism: the default codec search algorithm goes hunting in the encodings package (or the alias dictionary), but you can register custom search algorithms and provide encodings any way you want. It does mean, however, that the most you can claim for the type signature of codecs.encode and codecs.decode is that they accept an object and return an object. Beyond that, it's completely driven by the value of the codec. Indeed. You have to think of the codec registry as a mere lookup mechanism - very much like an import. The implementation of the imported module defines which types are supported and how the encode/decode steps work. In Python 2.x, the type constraints imposed by the str and unicode convenience methods is basestring in, basestring out. 
As it happens, all of the standard library codecs abide by that restriction, so it was easy to interpret the codecs module itself as having the same basestring in, basestring out limitation, especially given the heavy focus on text encodings in the way it was documented.

In practice, the codecs weren't that open ended - some of them only accepted 8 bit strings, some only accepted unicode, some accepted both (perhaps relying on implicit decoding to unicode).

The migration to Python 3 made the contrast between the two far more stark however, hence the long and involved discussion on issue 7475, and the fact that the non-Unicode codecs are currently still missing their shorthand aliases. The proposal I posted to issue 7475 back in April (and, in the absence of any objections to the proposal, finally implemented over the past few weeks) was to take advantage of the fact that the codecs.encode and codecs.decode convenience functions exist (and have been covered by the regression test suite) as far back as Python 2.4. I did this merely by documenting the existence of the functions for Python 2.7, 3.3 and 3.4, changing the exception messages thrown for codec output type errors on the convenience methods to reference them, and by updating the Python 3.4 What's New document to explain the changes. This approach provides a Python 2/3 compatible solution for usage of non-Unicode encodings: users simply need to call the existing module level functions in the codecs module, rather than using the methods on specific builtin types. This approach also means that the binary codecs can be used with any bytes-like object (including memoryview and array.array), rather than being limited to types that implement a new method (like transform), and can also be used in Python 2/3 source compatible APIs (since the data driven nature of the problem makes 2to3 unusable as a solution, and that doesn't help single code base projects anyway).
Right, and that was the main point in making codecs flexible in this respect. There are many other types which can serve as input and output - in the stdlib and interpreter as well as in extension modules that implement their own types. From my point of view, this is now just a matter of better documenting the status quo, and nudging people in the right direction when it comes to using the appropriate API for non-Unicode codecs. Since we now realise these functions have existed since Python 2.4, it doesn't make sense to try to fundamentally change direction, but instead to work on making it better. A few things I noticed while implementing the recent updates: - as you noted in your other email, while MAL is on record as saying the codecs module is intended for arbitrary codecs, not just Unicode encodings, readers of the current docs can definitely be forgiven for not realising that. We really need to better separate the codecs module docs from the text model docs (two new sections in the language reference, one for the codecs machinery and one for the text model would likely be appropriate. The io
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, 15 Nov 2013 09:03:37 +1000
Nick Coghlan ncogh...@gmail.com wrote:
> > And add transform() and untransform() methods to bytes and str types.
> > In practice, it might be same codecs registry for all codecs just with
> > a new attribute.
>
> This is completely the wrong approach. There's zero justification for
> adding new builtin methods for this use case - encoding and decoding
> are generic operations, they should use functions not methods.

I'm sorry, I disagree. The question is what use case it is solving, and there's zero benefit in writing codecs.encode("zlib") compared to e.g. zlib.compress().

A transform() or untransform() method, however, allows for a much more convenient spelling, with easy cascading, e.g.:

    b.transform("zlib").transform("base64")

In other words, there's zero justification for codecs.encode() and codecs.decode(). The fact that the codecs machinery works on arbitrary object transformation is a pointless genericity, if it doesn't bring any additional convenience compared to the canonical functions in their respective modules.

> At this point, the only person that can get me to revert this
> clarification of MAL's original vision for the codecs module is Guido,
> since anything else completely fails to address the Python 3 adoption
> barrier posed by the current state of Python 3's binary codec support.

I'd like to challenge your assertion that your change addresses anything. It's not easier to change b.encode("zlib") into codecs.encode(b, "zlib") than it is to change it into zlib.compress(b).

Regards,
Antoine.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
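For readers following the disagreement, here is a minimal sketch of the two spellings being compared, written for Python 3. It assumes the full codec names "zlib_codec" and "base64_codec" (the short aliases are the ones still under discussion in issue 7475), and the payload is made up for illustration.

```python
import base64
import codecs
import zlib

data = b"spam " * 100  # illustrative payload, not from the thread

# Spelling 1: the generic codecs functions, cascaded by nesting calls.
packed = codecs.encode(codecs.encode(data, "zlib_codec"), "base64_codec")

# Spelling 2: the canonical functions from their own modules.
# base64_codec wraps its output in newline-terminated lines, so the two
# encoded results are not compared byte for byte here.
packed_direct = base64.b64encode(zlib.compress(data))

# Both spellings round-trip back to the original bytes.
restored = codecs.decode(codecs.decode(packed, "base64_codec"), "zlib_codec")
restored_direct = zlib.decompress(base64.b64decode(packed_direct))
```

Neither spelling is shorter in this fixed case; the codecs form only differs in that the transformation is selected by a string.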
[Python-Dev] Finding overlapping matches with re assertions: bug or feature?
I was surprised to find that this works: if you want to find all _overlapping_ matches for a regexp R, wrap it in

    (?=(R))

and feed it to (say) finditer. Here's a very simple example, finding all overlapping occurrences of "xx":

    import re

    pat = re.compile("(?=(xx))")
    for it in pat.finditer("xxxx"):
        print(it.span(1))

That displays:

    (0, 2)
    (1, 3)
    (2, 4)

Is that a feature? Or an accident? It's very surprising to find a non-empty match inside an empty match (the outermost lookahead assertion).

If it's intended behavior, it's just in time for the holiday season; e.g., to generate ASCII art for half an upside-down Christmas tree:

    pat = re.compile("(?=(x+))")
    for it in pat.finditer("xx"):
        print(it.group(1))
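The idiom above can be wrapped in a small helper. This is only a sketch: the name overlapping_spans is hypothetical, and it assumes the pattern is passed as a string.

```python
import re

def overlapping_spans(pattern, text):
    """Return the span of every overlapping match of `pattern` in `text`."""
    # A capturing group inside a lookahead makes each overall match empty,
    # so the scan advances one character at a time while group 1 still
    # records what the pattern would have matched at that position.
    wrapped = re.compile("(?=(" + pattern + "))")
    return [m.span(1) for m in wrapped.finditer(text)]

spans = overlapping_spans("xx", "xxxx")
print(spans)  # [(0, 2), (1, 3), (2, 4)], as in the message above
```

Because the overall match is always empty, the interesting data lives in group 1, which is why the example reads span(1) rather than span().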
Re: [Python-Dev] Finding overlapping matches with re assertions: bug or feature?
On 15 November 2013 06:48, Tim Peters tim.pet...@gmail.com wrote:
> Is that a feature? Or an accident? It's very surprising to find a
> non-empty match inside an empty match (the outermost lookahead
> assertion).

Personally, I would read (?=(R)) as finding an empty match at a point where R starts. There's no implication that R is in any sense inside the match.

    (?=(\w\w\w\w\w\w))\w\w\w

finds the first 3 characters of words that are 6 or more characters long. Once again, the lookahead extends beyond the extent of the main match.

It's obscure and a little bizarre, but I'd say it's intended and a logical consequence of the definitions.

Paul
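A short sketch of the point that a lookahead may inspect text beyond the match itself. The compact `\w{6}` and `\w{3}` spellings and the sample string are assumptions made for this illustration.

```python
import re

# Zero-width lookahead for six word characters, then consume only three.
pat = re.compile(r"(?=\w{6})\w{3}")

m = pat.search("a python regexp")
prefix = m.group(0)  # the consumed text
span = m.span()      # covers three characters only

# No run of six word characters here, so nothing matches.
no_match = pat.search("short word")
```

The match covers three characters even though six had to be present for it to succeed.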
Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?
On Tue, Nov 12, 2013 at 5:08 PM, Benjamin Peterson benja...@python.org wrote:
> 2013/11/12 anatoly techtonik techto...@gmail.com:
> > On Sun, Nov 10, 2013 at 8:34 AM, Benjamin Peterson benja...@python.org wrote:
> > > 2013/11/10 anatoly techtonik techto...@gmail.com:
> > > > http://hg.python.org/cpython/file/1ee45eb6aab9/Parser/Python.asdl
> > > >
> > > > In Assign(expr* targets, expr value), why is the first argument a list?
> > >
> > > x = y = 42
> >
> > Thanks.
> >
> > Speaking of this ASDL. `expr* targets` means that multiple entities of
> > `expr` under the name 'targets' can be passed to Assign statement.
> > Assign uses them as left value. But `expr` definition contains things
> > that can not be used as left side assignment targets:
> >
> >     expr = BoolOp(boolop op, expr* values)
> >          | BinOp(expr left, operator op, expr right)
> >          ...
> >          | Str(string s) -- need to specify raw, unicode, etc?
> >          | Bytes(bytes s)
> >          | NameConstant(singleton value)
> >          | Ellipsis
> >
> >          -- the following expression can appear in assignment context
> >          | Attribute(expr value, identifier attr, expr_context ctx)
> >          | Subscript(expr value, slice slice, expr_context ctx)
> >          | Starred(expr value, expr_context ctx)
> >          | Name(identifier id, expr_context ctx)
> >          | List(expr* elts, expr_context ctx)
> >          | Tuple(expr* elts, expr_context ctx)
> >
> > If I understand correctly, this is compiled into C struct definitions
> > (Python-ast.c), and there is a code to traverse the structure, but
> > where is code that validates that the structure is correct? Is it done
> > on the first level - text file parsing, before ASDL is built? If so,
> > then what is the role of this ASDL exactly that the first step is
> > unable to solve?
>
> Only valid expression targets are allowed during AST construction. See
> set_expr_context in ast.c.

Oh my. Now there is also CST in addition to AST. This stuff - http://docs.python.org/devguide/ - badly needs diagrams about data transformation toolchain from Python source code to machine execution instructions. I'd like some pretty stuff, but raw blockdiag hack will do the job: http://blockdiag.com/en/blockdiag/index.html

There is no set_expr_context in my copy of CPython code, which seems to be some alpha of Python 3.4.

> > Is it possible to fix ASDL to move `expr` that are allowed in Assign
> > into `expr` subset? What effect will it achieve? I mean - will ASDL
> > compiler complain about wrong stuff on the left side, or it will still
> > be a role of some other component. Which one?
>
> I'm not sure what you mean by an `expr` subset.

Transform this:

    expr = BoolOp(boolop op, expr* values)
         | BinOp(expr left, operator op, expr right)
         ...
         | Str(string s) -- need to specify raw, unicode, etc?
         | Bytes(bytes s)
         | NameConstant(singleton value)
         | Ellipsis

         -- the following expression can appear in assignment context
         | Attribute(expr value, identifier attr, expr_context ctx)
         | Subscript(expr value, slice slice, expr_context ctx)
         | Starred(expr value, expr_context ctx)
         | Name(identifier id, expr_context ctx)
         | List(expr* elts, expr_context ctx)
         | Tuple(expr* elts, expr_context ctx)

to this:

    expr = BoolOp(boolop op, expr* values)
         | BinOp(expr left, operator op, expr right)
         ...
         | Str(string s) -- need to specify raw, unicode, etc?
         | Bytes(bytes s)
         | NameConstant(singleton value)
         | Ellipsis

         -- the following expression can appear in assignment context
         | expr_asgn

    expr_asgn = Attribute(expr value, identifier attr, expr_context ctx)
              | Subscript(expr value, slice slice, expr_context ctx)
              | Starred(expr value, expr_context ctx)
              | Name(identifier id, expr_context ctx)
              | List(expr* elts, expr_context ctx)
              | Tuple(expr* elts, expr_context ctx)
Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?
On Fri, Nov 15, 2013 at 12:54 PM, anatoly techtonik techto...@gmail.com wrote:
> [...]
> to this:
>
>     expr = BoolOp(boolop op, expr* values)
>          ...
>          -- the following expression can appear in assignment context
>          | expr_asgn
>
>     expr_asgn = Attribute(expr value, identifier attr, expr_context ctx)
>               | Subscript(expr value, slice slice, expr_context ctx)
>               | Starred(expr value, expr_context ctx)
>               | Name(identifier id, expr_context ctx)
>               | List(expr* elts, expr_context ctx)
>               | Tuple(expr* elts, expr_context ctx)

And also this:

    | Assign(expr* targets, expr value)

to this:

    | Assign(expr_asgn* targets, expr value)
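The `targets` list and the rejection of invalid targets can both be observed from Python with the ast module. This is a minimal sketch; it shows only the externally visible behaviour and does not show where inside CPython the check is implemented.

```python
import ast

# A chained assignment has one value and several targets.
node = ast.parse("x = y = 42").body[0]
target_names = [t.id for t in node.targets]

# The ASDL types targets as plain `expr`, so the check for things that
# cannot be assigned to lives elsewhere; from the outside it shows up as
# a SyntaxError when the source is compiled.
try:
    compile("f() = 42", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
```

So the grammar file describes the shape of the tree, while a later step decides which expressions are acceptable on the left-hand side.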
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, Nov 15, 2013 at 05:13:34PM +1000, Nick Coghlan wrote:
> A few things I noticed while implementing the recent updates:
>
> - as you noted in your other email, while MAL is on record as saying
>   the codecs module is intended for arbitrary codecs, not just Unicode
>   encodings, readers of the current docs can definitely be forgiven
>   for not realising that. We really need to better separate the codecs
>   module docs from the text model docs (two new sections in the
>   language reference, one for the codecs machinery and one for the
>   text model, would likely be appropriate. The io module docs and
>   those for the builtin open function may also be affected)
> - a mechanism for annotating frames would help avoid the need for
>   nasty hacks like the exception wrapping that aims to make codec
>   failures easier to debug
> - if codecs exposed a way to separate the input type check from the
>   invocation of the codec, we could redirect users to the module API
>   for bad input types as well (e.g. calling "input str".encode("bz2"))
> - if we want something that doesn't need to be imported, then encode()
>   and decode() builtins make more sense than new methods on str, bytes
>   and bytearray objects (since builtins would support memoryview and
>   array.array as well, and it avoids ambiguity regarding the direction
>   of the operation)

Sounds good to me.

> - the codecs module should offer a way to register a new alias for an
>   existing codec
> - the codecs module should offer a way to map a name to a CodecInfo
>   object without registering a new search function

It would be really good to be able to query the available codecs. For example, many applications offer an "Encoding" menu, where you can specify the codec used for text. That's hard in Python, since you can't retrieve a list of known codecs.

--
Steven
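In the absence of a query API, one approximation is to walk the alias table that ships with the encodings package. This is a sketch under stated assumptions: the helper name known_codec_names is hypothetical, the result only covers codecs bundled with the interpreter that have at least one alias, and codecs added through codecs.register() are not seen at all.

```python
import codecs
import encodings.aliases

def known_codec_names():
    """Best-effort list of codec names that resolve on this interpreter."""
    # encodings.aliases.aliases maps alias -> module name inside the
    # encodings package; its values approximate the bundled codecs.
    candidates = set(encodings.aliases.aliases.values())
    found = set()
    for name in candidates:
        try:
            found.add(codecs.lookup(name).name)
        except LookupError:
            # e.g. platform-specific codecs that are unavailable here
            pass
    return sorted(found)

names = known_codec_names()
```

The list mixes text encodings with binary transforms, which is exactly the gap raised in the follow-up message.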
Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.13 12:02, Steven D'Aprano wrote:
> It would be really good to be able to query the available codecs. For
> example, many applications offer an "Encoding" menu, where you can
> specify the codec used for text. That's hard in Python, since you
> can't retrieve a list of known codecs.

And you can't determine which codec is a binary-text encoding.
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, Nov 15, 2013 at 10:22:28AM +0100, Antoine Pitrou wrote:
> On Fri, 15 Nov 2013 09:03:37 +1000
> Nick Coghlan ncogh...@gmail.com wrote:
> > > And add transform() and untransform() methods to bytes and str
> > > types. In practice, it might be same codecs registry for all
> > > codecs just with a new attribute.
> >
> > This is completely the wrong approach. There's zero justification
> > for adding new builtin methods for this use case - encoding and
> > decoding are generic operations, they should use functions not
> > methods.
>
> I'm sorry, I disagree. The question is what use case it is solving,
> and there's zero benefit in writing codecs.encode("zlib") compared to
> e.g. zlib.compress().

One benefit is:

    import codecs
    codec = get_name_of_compression_codec()
    result = codecs.encode(data, codec)

versus:

    codec = get_name_of_compression_codec()
    if codec == "zlib":
        import zlib
        encoder = zlib.compress
    elif codec == "bz2":
        import bz2
        encoder = bz2.compress
    elif codec == "gzip":
        import gzip
        encoder = gzip.compress
    elif codec == "squash":
        import mySquashLib
        encoder = mySquashLib.squash
    elif ...:
        # and so on
    result = encoder(data)

> A transform() or untransform() method, however, allows for a much more
> convenient spelling, with easy cascading, e.g.:
>
>     b.transform("zlib").transform("base64")

Yes, that's quite nice. Although it need not be a method, a built-in function works for me too:

    # either of these:
    transform(transform(b, "zlib"), "base64")
    encode(encode(b, "zlib"), "base64")

If encoding/decoding is intended to be completely generic (even if 99% of the uses will be with strings and bytes), is there any reason to prefer built-in functions rather than methods on object?

--
Steven
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, 15 Nov 2013 21:28:35 +1100
Steven D'Aprano st...@pearwood.info wrote:
> One benefit is:
>
>     import codecs
>     codec = get_name_of_compression_codec()
>     result = codecs.encode(data, codec)

That's a good point.

> If encoding/decoding is intended to be completely generic (even if 99%
> of the uses will be with strings and bytes), is there any reason to
> prefer built-in functions rather than methods on object?

Practicality beats purity. Personally, I've never used codecs on anything other than str and bytes objects.

Regards

Antoine.
Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.13 12:28, Steven D'Aprano wrote:
> One benefit is:
>
>     import codecs
>     codec = get_name_of_compression_codec()
>     result = codecs.encode(data, codec)

And this is a security hole if you don't check the codec name before calling a codec. See the topic about exploiting zip bombs via the codecs machinery.

Also, you usually need more than just decompressing binary data by a Python codec name. You need to map the external compression name to the internal Python codec name, you need to configure the decompressor object with specific options, and perhaps you need different buffering strategies for different compression algorithms. See for example the zipfile and tarfile sources.
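One way to act on the first concern is an explicit allow-list that also maps external names to Python codec names. This is only a sketch: the ALLOWED_CODECS table, the external names in it, and the decode_untrusted helper are all hypothetical, and it does nothing about output size limits, so it does not by itself defend against decompression bombs.

```python
import codecs

# Hypothetical policy: external names the application is willing to accept,
# mapped to the Python codec that implements each one.
ALLOWED_CODECS = {
    "deflate": "zlib_codec",
    "bzip2": "bz2_codec",
}

def decode_untrusted(data, external_name):
    """Decode `data` only if `external_name` is on the allow-list."""
    codec = ALLOWED_CODECS.get(external_name)
    if codec is None:
        raise ValueError("unsupported compression: %r" % (external_name,))
    return codecs.decode(data, codec)
```

Any name that is not in the table is refused before the codec registry is consulted, so a caller cannot select an arbitrary registered codec.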
Re: [Python-Dev] unicode Exception messages in py2.7
Hi,

FWIW, the pure Python traceback.py module has a slightly different (and saner) behavior:

    >>> e = Exception(u"xx\u1234yy")
    >>> traceback.print_exception(Exception, e, None)
    Exception: xx\u1234yy

I'd suggest that the behavior of the two should be unified anyway. The traceback module uses value.encode("ascii", "backslashreplace") for any unicode object.

See you soon,

Armin.
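The encoding call mentioned above can be tried on its own. A minimal sketch, using the same sample string as the message:

```python
text = u"xx\u1234yy"

# Characters outside ASCII become backslash escapes instead of raising
# UnicodeEncodeError.
safe = text.encode("ascii", "backslashreplace")
```

The result is a plain ASCII byte string containing the literal characters `\u1234`, which is why the traceback output stays printable on any terminal.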
Re: [Python-Dev] [Python-checkins] cpython: Close #17828: better handling of codec errors
On 15 November 2013 17:22, Stefan Behnel stefan...@behnel.de wrote:
> I can't see any bit of information being added by chaining the
> exceptions in this specific case.
>
> Remember that each change to exception messages and/or exception
> chaining will break someone's doctests somewhere, and it's really ugly
> to work around chained exceptions in (cross-Py-version) doctests.
>
> I understand that this is helpful *in general*, though, i.e. for other
> kinds of exceptions in codecs, so maybe changing the exception
> handling in the doctest module could be a work-around for this kind of
> change?

IIRC, doctest ignores the traceback contents by default - this is just a bug where the chaining is also triggering for the initial codec lookup when it should avoid doing that.

Created http://bugs.python.org/issue19609

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Add transform() and untranform() methods
On 15 November 2013 20:33, Antoine Pitrou solip...@pitrou.net wrote:
> On Fri, 15 Nov 2013 21:28:35 +1100
> Steven D'Aprano st...@pearwood.info wrote:
> > One benefit is:
> >
> >     import codecs
> >     codec = get_name_of_compression_codec()
> >     result = codecs.encode(data, codec)
>
> That's a good point.
>
> > If encoding/decoding is intended to be completely generic (even if
> > 99% of the uses will be with strings and bytes), is there any reason
> > to prefer built-in functions rather than methods on object?
>
> Practicality beats purity. Personally, I've never used codecs on
> anything other than str and bytes objects.

The reason I'm now putting some effort into better documenting the status quo for codec handling in Python 3 and filing off some of the rough edges (rather than proposing adding any new APIs to Python 3.x) is because the users I care about in this matter are web developers that already make use of the binary codecs and are adopting the single-source approach to handle supporting both Python 2 and Python 3. Armin Ronacher is the one who's been most vocal about the problem, but he's definitely not alone.

A new API for binary transforms is potentially an academically interesting concept, but it solves zero current real world problems. By contrast, being clear about the fact that codecs.encode and codecs.decode exist and are available as far back as Python 2.4 helps to eliminate a genuine barrier to Python 3 adoption for a subset of the community.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, 15 Nov 2013 21:45:31 +1000
Nick Coghlan ncogh...@gmail.com wrote:
> The reason I'm now putting some effort into better documenting the
> status quo for codec handling in Python 3 and filing off some of the
> rough edges (rather than proposing adding any new APIs to Python 3.x)
> is because the users I care about in this matter are web developers
> that already make use of the binary codecs and are adopting the
> single-source approach to handle supporting both Python 2 and Python
> 3. Armin Ronacher is the one who's been most vocal about the problem,
> but he's definitely not alone.

zlib.compress(something) works on both Python 2 and Python 3, why do you need something else?

Regards

Antoine.
Re: [Python-Dev] Add transform() and untranform() methods
2013/11/15 Nick Coghlan ncogh...@gmail.com:
> The reason I'm now putting some effort into better documenting the
> status quo for codec handling in Python 3 and filing off some of the
> rough edges (rather than proposing adding any new APIs to Python 3.x)
> is because the users I care about in this matter are web developers
> that already make use of the binary codecs and are adopting the
> single-source approach to handle supporting both Python 2 and Python
> 3. Armin Ronacher is the one who's been most vocal about the problem,
> but he's definitely not alone.

Apart from Armin Ronacher, I have never seen anyone blocked when trying to port a project to Python 3 because of these bytes=>bytes and str=>str codecs. I did a quick search on Google but I failed to find a question like "how can I write .encode("hex") or .encode("zlib") in Python 3?". It was just a quick search, and it's likely that many developers hit this Python 3 regression, but I'm confident that developers are able to work around it themselves (e.g. by using the right Python module directly).

I have seen a lot of huge code bases ported to Python 3 without the need for these codecs. For example, Django, which is a web framework, has been ported to Python 3. I know that Armin Ronacher also works on web things (I don't know what exactly).

> A new API for binary transforms is potentially an academically
> interesting concept, but it solves zero current real world problems.

I would like to reply the same for these codecs: they are not solving any real world problem :-)

Victor
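As an illustration of the workaround mentioned above, here is a minimal Python 3 sketch for the hex case. The sample bytes are made up, and "hex_codec" is used as the codec name on the assumption that the short "hex" alias may not be available on every Python 3 release.

```python
import binascii
import codecs

raw = b"\x00\xffPy"  # illustrative input

via_module = binascii.hexlify(raw)           # direct module call
via_codec = codecs.encode(raw, "hex_codec")  # same result via the codec machinery
```

Both calls produce the same bytes, so the choice between them is about spelling and portability rather than behaviour.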
Re: [Python-Dev] Add transform() and untranform() methods
On 15 November 2013 12:07, Victor Stinner victor.stin...@gmail.com wrote:
> > A new API for binary transforms is potentially an academically
> > interesting concept, but it solves zero current real world problems.
>
> I would like to reply the same for these codecs: they are not solving
> any real world problem :-)

As Nick is only documenting long-existing functions, I fail to see the issue here.

If someone were to propose new methods, builtins, or module functions, then I could see a reason for debate. But surely simply documenting existing functions is not worth all this pushback?

Paul
Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.2013 12:45, Nick Coghlan wrote:
> On 15 November 2013 20:33, Antoine Pitrou solip...@pitrou.net wrote:
> > On Fri, 15 Nov 2013 21:28:35 +1100
> > Steven D'Aprano st...@pearwood.info wrote:
> > > One benefit is:
> > >
> > >     import codecs
> > >     codec = get_name_of_compression_codec()
> > >     result = codecs.encode(data, codec)
> >
> > That's a good point.
> >
> > > If encoding/decoding is intended to be completely generic (even
> > > if 99% of the uses will be with strings and bytes), is there any
> > > reason to prefer built-in functions rather than methods on object?
> >
> > Practicality beats purity. Personally, I've never used codecs on
> > anything other than str and bytes objects.
>
> The reason I'm now putting some effort into better documenting the
> status quo for codec handling in Python 3 and filing off some of the
> rough edges (rather than proposing adding any new APIs to Python 3.x)
> is because the users I care about in this matter are web developers
> that already make use of the binary codecs and are adopting the
> single-source approach to handle supporting both Python 2 and Python
> 3. Armin Ronacher is the one who's been most vocal about the problem,
> but he's definitely not alone.

You can add me to that list :-). Esp. the hex codec is very handy. Google returns a few thousand hits for that codec alone.

One detail that people often tend to forget is the extensibility of the codec system. It is easily possible to add new codecs to the system to e.g. perform encoding, escaping, compression or other conversion operations, so the set of codecs in the stdlib is not the complete set of codecs used in the wild - and it's not intended to be.

As an example: we've written codecs for customers that perform special types of XML un/escaping.

--
Marc-Andre Lemburg
eGenix.com
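The extensibility described above can be sketched with a toy bytes-to-bytes codec. Everything specific here is an assumption made for illustration: the codec name "x_reverse", the helper functions, and the choice of byte reversal as the transformation.

```python
import codecs

def _reverse_encode(data, errors="strict"):
    # Stateless encoders return (output, length of input consumed).
    return bytes(data)[::-1], len(data)

def _reverse_decode(data, errors="strict"):
    return bytes(data)[::-1], len(data)

def _search(name):
    # Search functions receive the normalized codec name and return a
    # CodecInfo for names they handle, or None for everything else.
    if name == "x_reverse":
        return codecs.CodecInfo(_reverse_encode, _reverse_decode, name="x_reverse")
    return None

codecs.register(_search)

encoded = codecs.encode(b"abc", "x_reverse")
decoded = codecs.decode(encoded, "x_reverse")
```

Once the search function is registered, the new name is usable through codecs.encode and codecs.decode without any change to the built-in types.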
Re: [Python-Dev] Add transform() and untranform() methods
On Thu, Nov 14, 2013 at 7:32 PM, Victor Stinner victor.stin...@gmail.com wrote:
> I would prefer to split the registry of codecs to have 3 registries:
>
> - encoding (a better name can be found): encode str=>bytes, decode bytes=>str
> - bytes: encode bytes=>bytes, decode bytes=>bytes
> - str: encode str=>str, decode str=>str
>
> And add transform() and untransform() methods to bytes and str types. In practice, it might be same codecs registry for all codecs just with a new attribute.

I like this idea very much.

But to check that I understand correctly, let me be more explicit... you'll have (of course, always py3k-speaking):

- bytes.decode() -> str ... here you can only use unicode encodings
- no bytes.encode(), like today
- bytes.transform() -> bytes ... here you can only use things like zlib, rot13, etc
- str.encode() -> bytes ... here you can only use unicode encodings
- no str.decode(), like today
- str.transform() -> str ... here you can only use things like... like what?

When to use decode/encode was always a major pain point for people, so doing this extra separation and cleaning would bring more clarity to when to use what.

Thanks!

--
.Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista
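For comparison, the three categories in the proposal above can already be exercised on Python 3 with existing calls; transform() and untransform() do not exist, so this sketch uses the module-level codecs functions for the non-text cases. The codec names "zlib_codec" and "rot_13" are the stdlib module names; the sample data is made up.

```python
import codecs

text = "caf\u00e9"

# str -> bytes and back: a text encoding, available as methods.
raw = text.encode("utf-8")
round_tripped = raw.decode("utf-8")

# bytes -> bytes: a binary transform, only via the codecs functions.
packed = codecs.encode(raw * 50, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# str -> str: a text transform, again only via the codecs functions.
scrambled = codecs.encode("hello", "rot_13")
```

Note that on Python 3 the stdlib rot_13 codec works on str, so it falls into the str -> str group rather than the bytes -> bytes one.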
Re: [Python-Dev] unicode Exception messages in py2.7
On 15.11.13 00:57, Chris Barker wrote:
> Maybe so -- but we are either maintaining 2.7 or not -- it WILL be around for a long time yet...

Procedurally, it's really easy. Ultimately it's up to the release manager to decide which changes go into a release and which don't, and Benjamin has already voiced an opinion.

In addition, Guido van Rossum voiced an opinion a while ago that he doesn't consider fixing bugs for 2.7 very useful, and would rather see maintenance focus on ongoing support for new operating systems, compilers, build environments, etc. The rationale is that people who have lived with the glitches of 2.x for so long surely have already made their work-arounds, so they aren't helped by receiving bug fixes.

The same is true in your case: you indicated that you *already* work around the problem. It may have been tedious when you had to do it, but now it's done - and you might not even change your code even if Python 2.7.x gets changed, since you might want to support older 2.7.x releases for some time.

Regards,
Martin
Re: [Python-Dev] Add transform() and untranform() methods
On 15 November 2013 22:24, Paul Moore p.f.mo...@gmail.com wrote:
> On 15 November 2013 12:07, Victor Stinner victor.stin...@gmail.com wrote:
>>> A new API for binary transforms is potentially an academically interesting concept, but it solves zero current real world problems.
>>
>> I would like to reply the same for these codecs: they are not solving any real world problem :-)
>
> As Nick is only documenting long-existing functions, I fail to see the issue here. If someone were to propose new methods, builtins, or module functions, then I could see a reason for debate. But surely simply documenting existing functions is not worth all this pushback?

There's a bit more to it than that (and that's why I started the other thread about the codec aliases before proceeding to the final step).

One of the changes Victor is concerned about is that when you use an incorrect codec in one of the Unicode-encoding-only convenience methods, the recent exception updates explicitly push users towards using those module level functions instead:

    >>> import codecs
    >>> "no good".encode("rot_13")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use codecs.encode() to encode to arbitrary types
    >>> codecs.encode("just fine", "rot_13")
    'whfg svar'
    >>> b"no good".decode("quopri_codec")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'quopri_codec' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types
    >>> codecs.decode(b"just fine", "quopri_codec")
    b'just fine'

My perspective is that, in current Python, that *is* the right thing for people to do, and any hypothetical new API proposed for Python 3.5 would do nothing to change what's right for Python 3.4 code (or Python 2/3 compatible code). I also find it bizarre that several of those arguing that this is too niche a feature to be worth refining are simultaneously in favour of a proposal to add new *methods on builtin types* for the same niche feature.

The other part is the fact that I updated the What's New document to highlight these tweaks: http://docs.python.org/dev/whatsnew/3.4.html#improvements-to-handling-of-non-unicode-codecs

As noted earlier in the thread, Armin Ronacher has been the most vocal of the users of this feature in Python 2 that lamented its absence in Python 3 (see, for example, http://lucumr.pocoo.org/2012/8/11/codec-confusion/), but I've also received plenty of subsequent feedback along the lines of "what he said!" (such as http://bugs.python.org/issue7475#msg187630).

Many of the proposed solutions from the people affected by the change haven't been usable (since they've often been based on a misunderstanding of why the method behaviour changed in Python 3 in the first place), but the pain they experience is genuine, and it can unnecessarily sour their whole experience of the transition. I consider documenting the existing module level functions and nudging users towards them when they try to use the affected codecs to be an expedient way to say "yes, this is still available if you really want to use it, but the required spelling is different".

However, the one thing I'm *not* going to do at this point is restore the shorthand aliases, so those opposing the lowering of this barrier to transition can take comfort in the fact they have succeeded in ensuring that the out-of-the-box experience for users of this feature migrating from Python 2 remains the unfriendly:

    >>> b"abcdef".decode("hex")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    LookupError: unknown encoding: hex

Rather than the more useful:

    >>> b"abcdef".decode("hex")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types

Which would then lead them to the working (and still Python 2 compatible) code:

    >>> codecs.decode(b"abcdef", "hex")
    b'\xab\xcd\xef'

Regards,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
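The spelling change described in this message can be checked with a short script. It is a sketch under a few assumptions: it uses the full codec names "rot_13" and "hex_codec" rather than the shorthand aliases under discussion, and it accepts either TypeError or LookupError from the method call, since which one is raised depends on the Python 3 version in use.

```python
import codecs

# The str method only accepts text encodings, so a str -> str codec fails.
try:
    "no good".encode("rot_13")
except (LookupError, TypeError) as exc:
    method_error = exc
else:
    method_error = None

# The module-level functions impose no such restriction.
via_module = codecs.encode("just fine", "rot_13")
hex_decoded = codecs.decode(b"abcdef", "hex_codec")
```

The same two codecs calls are also valid on Python 2.7, which is the single-source use case the message is concerned with.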
Re: [Python-Dev] Add transform() and untranform() methods
On Fri, 15 Nov 2013 23:50:23 +1000 Nick Coghlan ncogh...@gmail.com wrote:
> My perspective is that, in current Python, that *is* the right thing for people to do, and any hypothetical new API proposed for Python 3.5 would do nothing to change what's right for Python 3.4 code (or Python 2/3 compatible code). I also find it bizarre that several of those arguing that this is too niche a feature to be worth refining are simultaneously in favour of a proposal to add new *methods on builtin types* for the same niche feature.

I am not claiming it is a niche feature, I am claiming codecs.encode() and codecs.decode() don't solve the use case like the .transform() and .untransform() methods do.

(I do think it is a nice feature in Python 2, although I find myself using it mainly at the interpreter prompt, rather than in production code)

> As noted earlier in the thread, Armin Ronacher has been the most vocal of the users of this feature in Python 2 that lamented its absence in Python 3 (see, for example, http://lucumr.pocoo.org/2012/8/11/codec-confusion/), but I've also received plenty of subsequent feedback along the lines of "what he said!" (such as http://bugs.python.org/issue7475#msg187630).

The way I read it, the positive feedback was about .transform() and .untransform(), not about recommending people switch to codecs.encode() and codecs.decode().

> Rather than the more useful:
>
>     >>> b"abcdef".decode("hex")
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types

I think this may be confusing. TypeError seems to suggest that the parameter type sent by the user to the method is wrong, which is not the actual cause of the error.

Regards

Antoine.
Re: [Python-Dev] Assign(expr* targets, expr value) - why targetS?
2013/11/15 anatoly techtonik techto...@gmail.com:
> On Tue, Nov 12, 2013 at 5:08 PM, Benjamin Peterson benja...@python.org wrote:
>> 2013/11/12 anatoly techtonik techto...@gmail.com:
>>> On Sun, Nov 10, 2013 at 8:34 AM, Benjamin Peterson benja...@python.org wrote:
>>>> 2013/11/10 anatoly techtonik techto...@gmail.com:
>>>>> http://hg.python.org/cpython/file/1ee45eb6aab9/Parser/Python.asdl
>>>>>
>>>>> In Assign(expr* targets, expr value), why is the first argument a list?
>>>>
>>>> x = y = 42
>>>
>>> Thanks.
>>>
>>> Speaking of this ASDL. `expr* targets` means that multiple entities of `expr` under the name 'targets' can be passed to Assign statement. Assign uses them as left value. But `expr` definition contains things that can not be used as left side assignment targets:
>>>
>>>     expr = BoolOp(boolop op, expr* values)
>>>          | BinOp(expr left, operator op, expr right)
>>>          ...
>>>          | Str(string s) -- need to specify raw, unicode, etc?
>>>          | Bytes(bytes s)
>>>          | NameConstant(singleton value)
>>>          | Ellipsis
>>>
>>>          -- the following expression can appear in assignment context
>>>          | Attribute(expr value, identifier attr, expr_context ctx)
>>>          | Subscript(expr value, slice slice, expr_context ctx)
>>>          | Starred(expr value, expr_context ctx)
>>>          | Name(identifier id, expr_context ctx)
>>>          | List(expr* elts, expr_context ctx)
>>>          | Tuple(expr* elts, expr_context ctx)
>>>
>>> If I understand correctly, this is compiled into C struct definitions (Python-ast.c), and there is code to traverse the structure, but where is the code that validates that the structure is correct? Is it done on the first level - text file parsing, before the ASDL is built? If so, then what is the role of this ASDL exactly that the first step is unable to solve?
>>
>> Only valid expression targets are allowed during AST construction. See set_expr_context in ast.c.
>
> Oh my. Now there is also CST in addition to AST. This stuff - http://docs.python.org/devguide/ - badly needs diagrams about the data transformation toolchain from Python source code to machine execution instructions. I'd like some pretty stuff, but a raw blockdiag hack will do the job http://blockdiag.com/en/blockdiag/index.html
>
> There is no set_expr_context in my copy of CPython code, which seems to be some alpha of Python 3.4

It's actually called set_context.

>>> Is it possible to fix ASDL to move `expr` that are allowed in Assign into an `expr` subset? What effect will it achieve? I mean - will the ASDL compiler complain about wrong stuff on the left side, or will it still be the role of some other component? Which one?
>>
>> I'm not sure what you mean by an `expr` subset.
>
> Transform this:
>
>     expr = BoolOp(boolop op, expr* values)
>          | BinOp(expr left, operator op, expr right)
>          ...
>          | Str(string s) -- need to specify raw, unicode, etc?
>          | Bytes(bytes s)
>          | NameConstant(singleton value)
>          | Ellipsis
>
>          -- the following expression can appear in assignment context
>          | Attribute(expr value, identifier attr, expr_context ctx)
>          | Subscript(expr value, slice slice, expr_context ctx)
>          | Starred(expr value, expr_context ctx)
>          | Name(identifier id, expr_context ctx)
>          | List(expr* elts, expr_context ctx)
>          | Tuple(expr* elts, expr_context ctx)
>
> to this:
>
>     expr = BoolOp(boolop op, expr* values)
>          | BinOp(expr left, operator op, expr right)
>          ...
>          | Str(string s) -- need to specify raw, unicode, etc?
>          | Bytes(bytes s)
>          | NameConstant(singleton value)
>          | Ellipsis
>
>          -- the following expression can appear in assignment context
>          | expr_asgn
>
>     expr_asgn = Attribute(expr value, identifier attr, expr_context ctx)
>               | Subscript(expr value, slice slice, expr_context ctx)
>               | Starred(expr value, expr_context ctx)
>               | Name(identifier id, expr_context ctx)
>               | List(expr* elts, expr_context ctx)
>               | Tuple(expr* elts, expr_context ctx)

I doubt ASDL will let you do that.

--
Regards,
Benjamin
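Both answers in this exchange can be observed from Python with the ast module. The sketch below, which uses only stdlib calls, shows that a chained assignment produces a single Assign node with several targets, and that a target the grammar accepts as `expr` but which is not assignable is rejected when the source is compiled. The variable names are illustrative only.

```python
import ast

# "x = y = 42" parses to one Assign node with two targets.
tree = ast.parse("x = y = 42")
assign = tree.body[0]
target_names = [t.id for t in assign.targets]
target_contexts = [type(t.ctx).__name__ for t in assign.targets]

# A literal is an expr, but it is not a valid assignment target; the
# check happens while the AST is being built, not in the ASDL itself.
try:
    compile("1 = x", "<example>", "exec")
except SyntaxError:
    literal_target_rejected = True
else:
    literal_target_rejected = False
```

Each target carries a Store context, which is the expr_context field the ASDL listing attaches to the assignable node types.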
Re: [Python-Dev] Add transform() and untranform() methods
On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
>> Rather than the more useful:
>>
>>     >>> b"abcdef".decode("hex")
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>     TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types
>
> I think this may be confusing. TypeError seems to suggest that the parameter type sent by the user to the method is wrong, which is not the actual cause of the error.

The TypeError isn't new, only the part after the semi-colon telling them that codecs.decode() doesn't include the typecheck (because it isn't constrained by the text model).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] Add transform() and untranform() methods
Walter Dörwald writes:
> On 15.11.2013 at 00:42, Serhiy Storchaka storch...@gmail.com wrote:
>> 15.11.13 00:32, Victor Stinner wrote:
>>> And add transform() and untransform() methods to bytes and str types. In practice, it might be same codecs registry for all codecs just with a new attribute.
>>
>> If the transform() method will be added, I prefer to have only one transformation method and specify a direction by the transformation name (bzip2/unbzip2).
>
> +1

-1

I can't support adding such methods (and that's why I ended up giving Nick's proposal for exposing codecs.encode and codecs.decode a +1). People think about these transformations as en- or de-coding, not transforming, most of the time. Even for a transformation that is an involution (eg, rot13), people have a very clear idea of what's encoded and what's not, and they are going to prefer the names encode and decode for these (generic) operations in many cases.

Eg, I don't think s.transform(decoder) is an improvement over decode(s, codec) (but tastes vary).[1] It does mean that we need to add a redundant method, and I don't really see an advantage to it. The semantics seem slightly off to me, since the purpose of the operation is to create a new object, not transform the original in-place. (But of course str.encode and bytes.decode are precedents for those semantics.)

Footnotes:
[1] Arguments decoder and codec are identifiers, not metavariables.
Re: [Python-Dev] Add transform() and untranform() methods
On 11/14/2013 11:13 PM, Nick Coghlan wrote:
> The proposal I posted to issue 7475 back in April (and, in the absence of any objections to the proposal, finally implemented over the past few weeks) was to take advantage of the fact that the codecs.encode and codecs.decode convenience functions exist (and have been covered by the regression test suite) as far back as Python 2.4. I did this merely by documenting the existence of the functions for Python 2.7, 3.3 and 3.4, changing the exception messages thrown for codec output type errors on the convenience methods to reference them, and by updating the Python 3.4 What's New document to explain the changes.

Thanks for doing this work, Nick!

--
~Ethan~
Re: [Python-Dev] Add transform() and untranform() methods
On Sat, 16 Nov 2013 00:46:15 +1000 Nick Coghlan ncogh...@gmail.com wrote:
> On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
>>> Rather than the more useful:
>>>
>>>     >>> b"abcdef".decode("hex")
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types
>>
>> I think this may be confusing. TypeError seems to suggest that the parameter type sent by the user to the method is wrong, which is not the actual cause of the error.
>
> The TypeError isn't new,

Really? That's not what your message said.

Regards

Antoine.
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2013-11-08 - 2013-11-15)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4265 (+38)
  closed 27119 (+45)
  total  31384 (+83)

Open issues with patches: 1979

Issues opened (74)
==================

#1180: Option to ignore or substitute ~/.pydistutils.cfg
http://bugs.python.org/issue1180  reopened by jason.coombs

#6466: duplicate get_version() code between cygwinccompiler and emxcc
http://bugs.python.org/issue6466  reopened by jason.coombs

#6516: reset owner/group to root for distutils tarballs
http://bugs.python.org/issue6516  reopened by jason.coombs

#16286: Use hash if available to optimize a==b and a!=b for bytes and
http://bugs.python.org/issue16286  reopened by haypo

#17354: TypeError when running setup.py upload --show-response
http://bugs.python.org/issue17354  reopened by berker.peksag

#19466: Clear state of threads earlier in Python shutdown
http://bugs.python.org/issue19466  reopened by haypo

#19531: Loading -OO bytecode files if -O was requested can lead to pro
http://bugs.python.org/issue19531  opened by Sworddragon

#19532: compileall -f doesn't force to write bytecode files
http://bugs.python.org/issue19532  opened by Sworddragon

#19533: Unloading docstrings from memory if -OO is given
http://bugs.python.org/issue19533  opened by Sworddragon

#19534: normalize() in locale.py fails for sr_RS.UTF-8@latin
http://bugs.python.org/issue19534  opened by mfabian

#19535: Test failures with -OO
http://bugs.python.org/issue19535  opened by serhiy.storchaka

#19536: MatchObject should offer __getitem__()
http://bugs.python.org/issue19536  opened by brandon-rhodes

#19537: Fix misalignment in fastsearch_memchr_1char
http://bugs.python.org/issue19537  opened by schwab

#19538: Changed function prototypes in the PEP 384 stable ABI
http://bugs.python.org/issue19538  opened by theller

#19539: The 'raw_unicode_escape' codec buggy + not appropriate for Pyt
http://bugs.python.org/issue19539  opened by zuo

#19541: ast.dump(indent=True) prettyprinting
http://bugs.python.org/issue19541  opened by techtonik

#19542: WeakValueDictionary bug in setdefault()pop()
http://bugs.python.org/issue19542  opened by arigo

#19543: Add -3 warnings for codec convenience method changes
http://bugs.python.org/issue19543  opened by ncoghlan

#19544: Port distutils as found in Python 2.7 to Python 3.x.
http://bugs.python.org/issue19544  opened by jason.coombs

#19545: time.strptime exception context
http://bugs.python.org/issue19545  opened by Claudiu.Popa

#19546: configparser leaks implementation detail
http://bugs.python.org/issue19546  opened by Claudiu.Popa

#19547: HTTPS proxy support missing without warning
http://bugs.python.org/issue19547  opened by 02strich

#19548: 'codecs' module docs improvements
http://bugs.python.org/issue19548  opened by zuo

#19549: PKG-INFO is created with CRLF on Windows
http://bugs.python.org/issue19549  opened by techtonik

#19550: PEP 453: Windows installer integration
http://bugs.python.org/issue19550  opened by ncoghlan

#19551: PEP 453: Mac OS X installer integration
http://bugs.python.org/issue19551  opened by ncoghlan

#19552: PEP 453: venv module and pyvenv integration
http://bugs.python.org/issue19552  opened by ncoghlan

#19553: PEP 453: make install and make altinstall integration
http://bugs.python.org/issue19553  opened by ncoghlan

#19554: Enable all freebsd* host platforms
http://bugs.python.org/issue19554  opened by wg

#19555: SO config var not getting set
http://bugs.python.org/issue19555  opened by Marc.Abramowitz

#19557: ast - docs for every node type are missing
http://bugs.python.org/issue19557  opened by techtonik

#19558: Provide Tcl/Tk linkage information for extension module builds
http://bugs.python.org/issue19558  opened by ned.deily

#19561: request to reopen Issue837046 - pyport.h redeclares gethostnam
http://bugs.python.org/issue19561  opened by risto3

#19562: Added description for assert statement
http://bugs.python.org/issue19562  opened by thatiparthy

#19563: Changing barry's email to ba...@python.org
http://bugs.python.org/issue19563  opened by thatiparthy

#19564: test_multiprocessing_spawn hangs
http://bugs.python.org/issue19564  opened by haypo

#19565: test_multiprocessing_spawn: RuntimeError and assertion error o
http://bugs.python.org/issue19565  opened by haypo

#19566: ERROR: test_close (test.test_asyncio.test_unix_events.FastChil
http://bugs.python.org/issue19566  opened by haypo

#19568: bytearray_setslice_linear() leaves the bytearray in an inconsi
http://bugs.python.org/issue19568  opened by haypo

#19569: Use __attribute__((deprecated)) to warn usage of deprecated fu
http://bugs.python.org/issue19569  opened by haypo

#19570: distutils' Command.ensure_dirname fails on Unicode
http://bugs.python.org/issue19570  opened by saschpe

#19572: Report more silently skipped tests as
Re: [Python-Dev] The pysandbox project is broken
On Tue, Nov 12, 2013 at 01:16:55PM -0800, Victor Stinner wrote:
> pysandbox cannot be used in practice
>
> To protect the untrusted namespace, pysandbox installs a lot of different protections. Because of all these protections, it becomes hard to write Python code. Basic features like del dict[key] are denied. Passing an object to a sandbox is not possible, because pysandbox is unable to proxify arbitrary objects.
>
> For something more complex than evaluating 1+(2*3), pysandbox cannot be used in practice, because of all these protections. Individual protections cannot be disabled, all protections are required to get a secure sandbox.

This sounds a lot like the work I initially did with PyParallel to try and intercept/prevent parallel threads mutating main-thread objects.

I ended up arriving at a much better solution by just relying on memory protection; main thread pages are set read-only prior to parallel threads being able to run. If a parallel thread attempts to mutate a main thread object, a SEH is raised (SIGSEGV on POSIX), which I catch in the ceval loop and convert into an exception.

See slide 138 of this: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores-1

I'm wondering if this sort of an approach (which worked surprisingly well) could be leveraged to also provide a sandbox environment? The goals are the same: robust protection against mutation of memory allocated outside of the sandbox.

(I'm purely talking about memory mutation; haven't thought about how that could be extended to prevent file system interaction as well.)

Trent.
Re: [Python-Dev] unicode Exception messages in py2.7
Hi again,

I figured that even using the traceback.py module and getting "Exception: \u1234\u1235\u5321" is rather useless if you tried to raise an exception with a message in Thai. I believe this to also be a bug, so I opened https://bugs.pypy.org/issue1634 . According to this thread, however, python-dev is against it, so I didn't bother adding a CPython bug.

See you soon,
Armin.
Re: [Python-Dev] The pysandbox project is broken
2013/11/15 Trent Nelson tr...@snakebite.org:
> This sounds a lot like the work I initially did with PyParallel to try and intercept/prevent parallel threads mutating main-thread objects.
>
> I ended up arriving at a much better solution by just relying on memory protection; main thread pages are set read-only prior to parallel threads being able to run. If a parallel thread attempts to mutate a main thread object, a SEH is raised (SIGSEGV on POSIX), which I catch in the ceval loop and convert into an exception.

Read-only is not enough, an attacker must not be able to read sensitive data.

Protections of memory pages sound very low-level, so not very portable :-/

How do you know if SIGSEGV comes from a legal call (parallel thread thing) or a real bug?

Victor
Re: [Python-Dev] unicode Exception messages in py2.7
On Fri, Nov 15, 2013 at 9:21 AM, Armin Rigo ar...@tunes.org wrote:
> I figured that even using the traceback.py module and getting "Exception: \u1234\u1235\u5321" is rather useless if you tried to raise an exception with a message in Thai.

yup.

> I believe this to also be a bug, so I opened https://bugs.pypy.org/issue1634 . According to this thread, however, python-dev is against it, so I didn't bother adding a CPython bug.

According to that bug report, it looks like CPython doesn't completely handle unicode Exception messages even in py3? Is that really the case?

And from this thread, I'd say that it's unlikely anyone wants to change this in py2, but I don't know that making py3 better in this regard is off the table.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division, NOAA/NOS/ORR
chris.bar...@noaa.gov
Re: [Python-Dev] The pysandbox project is broken
On Nov 15, 2013, at 12:34 PM, Victor Stinner wrote:
> 2013/11/15 Trent Nelson tr...@snakebite.org:
>> This sounds a lot like the work I initially did with PyParallel to try and intercept/prevent parallel threads mutating main-thread objects.
>>
>> I ended up arriving at a much better solution by just relying on memory protection; main thread pages are set read-only prior to parallel threads being able to run. If a parallel thread attempts to mutate a main thread object, a SEH is raised (SIGSEGV on POSIX), which I catch in the ceval loop and convert into an exception.
>
> Read-only is not enough, an attacker must not be able to read sensitive data.

Well you could remove both write *and* read perms from pages, such that you would trap on read attempts too. What's an example of sensitive data that you'd need to have residing in the same process that you also want to sandbox?

I was going to suggest something like:

    with memory.protected:
        htpasswd = open('htpasswd', 'r').read()
        ...

But then I couldn't think of why you'd persist the sensitive data past the point you'd need it.

> Protections of memory pages sound very low-level, so not very portable :-/

It's a pretty fundamental provision provided by operating systems; granted, the interface differs (mprotect() versus VirtualProtect()), but the result is the same.

> How do you know if SIGSEGV comes from a legal call (parallel thread thing) or a real bug?

You don't, but it doesn't really matter. It'll be pretty obvious from looking at the offending line of code in the exception whether it was a legitimate memory protection error, or a bug in an extension module/CPython internals.

And having a ProtectionError bubble all the way back up to the top of the stack with exact details about the offending frame/line could be considered a nicer alternative to dumping core ;-) (Unless you happen to be in an `except: pass` block.)

> Victor

Trent.
Re: [Python-Dev] unicode Exception messages in py2.7
On Fri, Nov 15, 2013 at 12:41 PM, Chris Barker chris.bar...@noaa.gov wrote:
> On Fri, Nov 15, 2013 at 9:21 AM, Armin Rigo ar...@tunes.org wrote:
>> I figured that even using the traceback.py module and getting "Exception: \u1234\u1235\u5321" is rather useless if you tried to raise an exception with a message in Thai.
>
> yup.
>
>> I believe this to also be a bug, so I opened https://bugs.pypy.org/issue1634 . According to this thread, however, python-dev is against it, so I didn't bother adding a CPython bug.
>
> According to that bug report, it looks like CPython doesn't completely handle unicode Exception messages even in py3? Is that really the case?
>
> And from this thread, I'd say that it's unlikely anyone wants to change this in py2, but I don't know that making py3 better in this regard is off the table.

Making changes and improvements to Python 3 is totally an option.
Re: [Python-Dev] unicode Exception messages in py2.7
On Fri, Nov 15, 2013 at 5:24 AM, Martin v. Löwis mar...@v.loewis.de wrote: Procedurally, it's really easy. Ultimately it's up to the release manager to decide which changes go into a release and which don't, and Benjamin has already voiced an opinion. Very early in the conversation, though honestly, probably nothing compelling has been brought up... In addition, Guido van Rossum has voiced an opinion a while ago that he doesn't consider fixing bugs for 2.7 very useful, and would rather see maintenance focus on ongoing support for new operating systems, compiler, build environments, etc. The rationale is that people who have lived with the glitches of 2.x for so long surely have already made their work-arounds, so they aren't helped with receiving bug fixes. If that's the policy, then that's the policy, but ... The same is true in your case: you indicated that you *already* work around the problem. only in one script, and I just looked, and I missed a numer of locations in that script. I've been running that script for years with very few changes, but ;last week, someone gave me a utf-8 data file -- it was really easy to read the file as utf-8 (change one line of code), and bingo! everything else just worked. Then I hit an exception, and banged my head against the wall for a while -- though I guess this is what we always deal with anywhere we introduce unicode to a previously-non-unicode-aware application. I'm still a bit dumbfounded that you can't use a unicode message in an Exception, though, still not sure why that's required... It may have been tedious when you had to do it, but now it's done - and you might not even change your code even if Python 2.7.x gets changed, since you might want to support older 2.7.x release for some time. In this case, no -- but really this is more about making it easier to just dump unicode in somewhere, or, in fact simple give people more meaningful errors when they do that... 
And I have a lot of code that ignores this problem, and I'm sure it will come up for me and others over and over again. But yes, it clearly hasn't been a deal-breaker so far!

On Fri, Nov 15, 2013 at 2:48 AM, Armin Rigo ar...@tunes.org wrote:
> FWIW, the pure Python traceback.py module has a slightly different (and saner) behavior:
>
>     >>> e = Exception(u"xx\u1234yy")
>     >>> traceback.print_exception(Exception, e, None)
>     Exception: xx\u1234yy
>
> I'd suggest that the behavior of the two should be unified anyway. The traceback module uses value.encode("ascii", "backslashreplace") for any unicode object.

Nice observation -- so at least someone else agreed with me about what the right thing to do is -- oh well.

On Thu, Nov 14, 2013 at 9:42 PM, Steven D'Aprano st...@pearwood.info wrote:
> I'm not convinced that treating Unicode strings as a special case is justified. It's been at least four, and possibly six (back to 2.2) point releases with this behaviour, and until now apparently nobody has noticed.

Not true -- apparently no one has brought it up on python-dev or posted an issue, but I confirmed that I understood what was going on with a little googling, including:

http://pythonhosted.org/kitchen/unicode-frustrations.html#frustration-5-exceptions

That document was written 19 March 2011, and at the time the library worked with Python 2.3 and later.

Anyway, back to two questions:

1) Could it be improved? It seems there is some disagreement on that one.

2) Is this a big enough deal to change 2.*? From what Martin says, no. So we don't need to argue about (1).

I sure hope py3 behavior is solid on this (sorry, no py3 to test on here...)

But I can't help myself: of all the guidelines for writing good code, the one I come back to again and again is DRY -- it drives almost all of my code structure decisions. So, in this case, now I need to think about whether to put in a kludge every single time I raise an Exception.
In the script at hand, I needed to change 7 instances of raising an Exception, out of 10 total. Contrast that with one line of code changed in the Exception code. In fact, what I'll probably do is write a little wrapper that does the encoding for an arbitrary exception, and use that, something like:

    def my_raise(exp, msg):
        raise exp(unicode(msg).encode('ascii', 'replace'))

But does it really make sense for me to write that and use it all over the place, as well as everyone else doing their own kludges?

Oh well, I suppose the real lesson is "go to Python 3".

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

chris.bar...@noaa.gov
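The wrapper idea above can be sketched as follows. This is only an illustration, not code from the thread: it is written for Python 3 so that it runs today (the original snippet targets 2.7 and uses `unicode`), it uses `backslashreplace` as the traceback module is said to do rather than `replace`, and the helper name `ascii_safe` is made up for the example.

```python
def ascii_safe(msg):
    """Return msg as ASCII-only text; other characters become backslash escapes."""
    # backslashreplace keeps the code points recoverable, unlike 'replace',
    # which would turn every non-ASCII character into '?'.
    return str(msg).encode("ascii", "backslashreplace").decode("ascii")


def my_raise(exc_type, msg):
    """Raise exc_type with a message that is safe to print on any terminal."""
    raise exc_type(ascii_safe(msg))


if __name__ == "__main__":
    try:
        my_raise(ValueError, "bad value: xx\u1234yy")
    except ValueError as exc:
        print(exc)
```

Keeping the escaping in one helper is the DRY point made above: call sites stay a single line, and the policy (replace versus backslashreplace) can be changed in one place.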
Re: [Python-Dev] Add transform() and untranform() methods
On 15.11.2013 at 16:57, Stephen J. Turnbull step...@xemacs.org wrote:
> Walter Dörwald writes:
>> On 15.11.2013 at 00:42, Serhiy Storchaka storch...@gmail.com wrote:
>>> 15.11.13 00:32, Victor Stinner wrote:
>>>> And add transform() and untransform() methods to bytes and str types. In practice, it might be same codecs registry for all codecs just with a new attribute.
>>>
>>> If the transform() method will be added, I prefer to have only one transformation method and specify a direction by the transformation name (bzip2/unbzip2).
>>
>> +1
>
> -1
>
> I can't support adding such methods (and that's why I ended up giving Nick's proposal for exposing codecs.encode and codecs.decode a +1).

My +1 was only for having the transformation be one-way under the condition that it is added at all.

> People think about these transformations as en- or de-coding, not transforming, most of the time. Even for a transformation that is an involution (eg, rot13), people have a very clear idea of what's encoded and what's not, and they are going to prefer the names encode and decode for these (generic) operations in many cases.
>
> Eg, I don't think s.transform(decoder) is an improvement over decode(s, codec) (but tastes vary).[1] It does mean that we need to add a redundant method, and I don't really see an advantage to it.

Actually my preferred method would be codec.decode(s), codec being the module that implements the functionality. I don't think we need to invent another function registry.

> The semantics seem slightly off to me, since the purpose of the operation is to create a new object, not transform the original in-place.

This would mean the method would have to be called transformed()?

> (But of course str.encode and bytes.decode are precedents for those semantics.)
>
> Footnotes:
> [1] Arguments decoder and codec are identifiers, not metavariables.
Servus, Walter
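To make the two spellings discussed above concrete, here is a small sketch for Python 3 that compares calling the implementing module directly with going through the codec registry. The choice of zlib is an assumption made for the example (the thread mentions bzip2 and rot13); `zlib.compress`, `zlib.decompress` and the `zlib_codec` codec all ship with the standard library.

```python
import codecs
import zlib

data = b"spam and eggs " * 10

# Spelling 1: call the module that implements the functionality.
packed_direct = zlib.compress(data)
unpacked_direct = zlib.decompress(packed_direct)

# Spelling 2: look the same transformation up by name in the codec registry.
packed_by_name = codecs.encode(data, "zlib_codec")
unpacked_by_name = codecs.decode(packed_by_name, "zlib_codec")

if __name__ == "__main__":
    print(unpacked_direct == data, unpacked_by_name == data)
```

The registry spelling is useful when the transformation name arrives as data; the module spelling needs no lookup and makes the input and output types obvious from the function's documentation.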
Re: [Python-Dev] Finding overlapping matches with re assertions: bug or feature?
[Tim]
> Is that a feature? Or an accident? It's very surprising to find a non-empty match inside an empty match (the outermost lookahead assertion).

[Paul Moore]
> Personally, I would read (?=(R)) as finding an empty match at a point where R starts. There's no implication that R is in any sense inside the match.
>
> (?=\w\w\w\w\w\w)\w\w\w finds the first 3 characters of words that are 6 or more characters long. Once again, the lookahead extends beyond the extent of the main match.
>
> It's obscure and a little bizarre, but I'd say it's intended and a logical consequence of the definitions.

After sleeping on it, I woke up a lot less surprised. You'd think that after decades of regexps, I'd be used to that by now ;-)

Thanks for the response! Your points sound valid to me, and I agree.
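The two readings above can be checked with a short sketch. The helper names are invented for the example, and the second pattern adds a `\b` anchor (not in the message) so that the scan only starts at the beginning of a word.

```python
import re


def overlapping(pattern, text):
    """Return (start, text) for every position where pattern matches, overlaps included."""
    # The overall match is empty; the capture group inside the lookahead
    # carries the non-empty text that starts at that position.
    probe = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in probe.finditer(text)]


def long_word_prefixes(text):
    """Return the first 3 characters of each word that has at least 6 characters."""
    # The lookahead checks 6 characters, but only 3 are consumed by the match.
    return re.findall(r"\b(?=\w{6})\w{3}", text)


if __name__ == "__main__":
    print(overlapping("aba", "ababa"))
    print(long_word_prefixes("short lengthy wonderful"))
```

In both cases the lookahead inspects text that lies outside the span the match itself consumes, which is the behaviour described as intended above.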
Re: [Python-Dev] cpython: Issue #19544 and Issue #6516: Restore support for --user and --group parameters
On 15.11.2013 19:07, jason.coombs wrote:
> http://hg.python.org/cpython/rev/b9c9c4b2effe
> changeset:   87119:b9c9c4b2effe
> user:        Andrew Kuchling a...@amk.ca
> date:        Fri Nov 15 13:01:52 2013 -0500
> summary:
>   Issue #19544 and Issue #6516: Restore support for --user and --group parameters to sdist command as found in Python 2.7 and originally slated for Python 3.2 but accidentally rolled back as part of the distutils2 rollback. Closes Issue #6516.

Your commit has broken the build:

    ./python -E -S -m sysconfig --generate-posix-vars
    Could not find platform dependent libraries <exec_prefix>
    Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
    Traceback (most recent call last):
      File "./setup.py", line 11, in <module>
        from distutils.core import Extension, setup
      File "/home/heimes/dev/python/cpython/Lib/distutils/core.py", line 18, in <module>
        from distutils.cmd import Command
      File "/home/heimes/dev/python/cpython/Lib/distutils/cmd.py", line 9, in <module>
        from distutils import util, dir_util, file_util, archive_util, dep_util
      File "/home/heimes/dev/python/cpython/Lib/distutils/archive_util.py", line 27, in <module>
        from grp import getgrnam
    ImportError: No module named 'grp'

The grp module is built later by setup.py.
Re: [Python-Dev] The pysandbox project is broken
On 13/11/13 00:49, Josiah Carlson wrote:
> Python-dev is for the development of the Python core language, the CPython runtime, and libraries. Your sandbox, despite using and requiring deep knowledge of the runtime, is not developing those things. If you had a series of requests for the language or runtime that would make your job easier, then your thread would be on-topic.

I think you should consider redefining your perception of the purpose of the python-dev list. Simple feature requests are not everything. Instead, this list also touches the general direction where Python should go, and discusses the current hard-to-solve problems.

The sand-boxing feature via rexec, bastion etc. was perceived as a useful, quite safe thing, until it was proven to be completely broken (Samuele Pedroni et al., 2003 I think). After that, CPython simply removed those features and failed completely to provide a better solution.

I appreciate very much that Victor tried his best to fill that old gap. And after that breakage happened again, I think it is urgent to have an in-depth discussion of how that situation should be treated in the future.

--
Christian Tismer             :^)   mailto:tis...@stackless.com
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
Re: [Python-Dev] unicode Exception messages in py2.7
Armin Rigo wrote:
> I figured that even using the traceback.py module and getting Exception: \u1234\u1235\u5321 is rather useless if you tried to raise an exception with a message in Thai.

But at least it tells you that *something* went wrong, and points to the place in the code where it happened. That has to be better than pretending that nothing happened at all.

Also, if the escaping preserves the original byte sequence of the message, there's a chance that someone will be able to figure out what the message said.

--
Greg
Re: [Python-Dev] Add transform() and untranform() methods
On 16 Nov 2013 02:36, Antoine Pitrou solip...@pitrou.net wrote:
> On Sat, 16 Nov 2013 00:46:15 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>> On 16 November 2013 00:04, Antoine Pitrou solip...@pitrou.net wrote:
>>>> Rather than the more useful:
>>>>
>>>>     >>> b"abcdef".decode("hex")
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in <module>
>>>>     TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use codecs.decode() to decode to arbitrary types
>>>
>>> I think this may be confusing. TypeError seems to suggest that the parameter type sent by the user to the method is wrong, which is not the actual cause of the error.
>>
>> The TypeError isn't new,
>
> Really? That's not what your message said.

The second example in my post included restoring the "hex" alias for "hex_codec" (its absence is the reason for the current "unknown encoding" error). The 3.2 and 3.3 error message for a restored alias would have been "TypeError: 'hex' decoder returned 'bytes' instead of 'str'", which I agree is confusing and uninformative - that's why I added the reference to the module level functions to the output type errors *before* proposing the restoration of the aliases.

So you can already use codecs.decode(s, 'hex_codec') in Python 3, you just won't get a useful error leading you there if you use the more common 'hex' alias instead.

To address Serhiy's security concerns with the compression codecs (which are technically independent of the question of restoring the aliases), I also plan to document how to systematically blacklist particular codecs in an application by setting attributes on the encodings module and/or appropriate entries in sys.modules.

Finally, I now plan to write a documentation PEP that suggests clearly splitting the codecs module docs into two layers: the type agnostic core infrastructure and the specific application of that infrastructure to the implementation of the text encoding model.
The only functional *change* I'd still like to make for 3.4 is to restore the shorthand aliases for the non-Unicode codecs (to ease the migration for folks coming from Python 2), but this thread has convinced me I likely need to write the PEP *before* doing that, and I still have to integrate ensurepip into pyvenv before the beta 1 deadline.

So unless you and Victor are prepared to +1 the restoration of the codec aliases (closing issue 7475) in anticipation of that codecs infrastructure documentation PEP, the change to restore the aliases probably won't be in 3.4. (I *might* get the PEP written in time regardless, but I'm not betting on it at this point).

Cheers,
Nick.

> Regards
> Antoine.
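For readers following along, the module level functions referred to above can be exercised like this on Python 3. The sample values are made up for the illustration; the codec names `hex_codec` and `rot_13` are the long names that exist whether or not the shorthand aliases are restored.

```python
import codecs

# bytes -> bytes: a binary transform, not reachable through bytes.decode()
hex_text = codecs.encode(b"abc", "hex_codec")
raw = codecs.decode(hex_text, "hex_codec")

# str -> str: a text transform, not reachable through str.encode()
rotated = codecs.encode("hello", "rot_13")

# str -> bytes: a classical text encoding, same result as "hello".encode("utf-8")
utf8 = codecs.encode("hello", "utf-8")

if __name__ == "__main__":
    for value in (hex_text, raw, rotated, utf8):
        # The result type depends on the codec name, not on the function called.
        print(type(value).__name__, repr(value))
```

This is the value-driven behaviour discussed in the thread: the same two functions return `bytes` or `str` depending on which codec the second argument names.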
Re: [Python-Dev] The pysandbox project is broken
On 16 Nov 2013 08:25, Christian Tismer tis...@stackless.com wrote:
> On 13/11/13 00:49, Josiah Carlson wrote:
>> Python-dev is for the development of the Python core language, the CPython runtime, and libraries. Your sandbox, despite using and requiring deep knowledge of the runtime, is not developing those things. If you had a series of requests for the language or runtime that would make your job easier, then your thread would be on-topic.
>
> I think you should consider redefining your perception of the purpose of the python-dev list. Simple feature requests are not everything. Instead, this list also touches the general direction where Python should go, and discusses the current hard-to-solve problems.
>
> The sand-boxing feature via rexec, bastion etc. was perceived as a useful, quite safe thing, until it was proven to be completely broken (Samuele Pedroni et al., 2003 I think). After that, CPython simply removed those features and failed completely to provide a better solution.

Using an OS level sandbox *is* better from a security point of view. It's just not portable :P

Cheers,
Nick.
Re: [Python-Dev] The pysandbox project is broken
On Fri, Nov 15, 2013 at 4:31 PM, Nick Coghlan ncogh...@gmail.com wrote:
> Using an OS level sandbox *is* better from a security point of view. It's just not portable :P

Honestly, I don't believe in portable security. :-)

BTW, in case it wasn't clear, I think it was a courageous step by Victor to declare defeat. Negative results are also results, and they need to be published. Thanks Victor!

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Add transform() and untranform() methods
2013/11/16 Nick Coghlan ncogh...@gmail.com:
> To address Serhiy's security concerns with the compression codecs (which are technically independent of the question of restoring the aliases), I also plan to document how to systematically blacklist particular codecs in an application by setting attributes on the encodings module and/or appropriate entries in sys.modules.

It would be simpler and safer to blacklist bytes=>bytes and str=>str codecs from bytes.decode() and str.encode() directly. Marc Andre Lemburg proposed to add new attributes in CodecInfo to specify input and output types.

> The only functional *change* I'd still like to make for 3.4 is to restore the shorthand aliases for the non-Unicode codecs (to ease the migration for folks coming from Python 2), but this thread has convinced me I likely need to write the PEP *before* doing that, and I still have to integrate ensurepip into pyvenv before the beta 1 deadline.
>
> So unless you and Victor are prepared to +1 the restoration of the codec aliases (closing issue 7475) in anticipation of that codecs infrastructure documentation PEP, the change to restore the aliases probably won't be in 3.4. (I *might* get the PEP written in time regardless, but I'm not betting on it at this point).

Using the StackOverflow search engine, I found some posts where people ask for the hex codec on Python 3. There are two answers: use the binascii module or use codecs.encode(). So even if codecs.encode() was never documented, it looks like it is used. So I now agree that documenting it would not make the situation worse.

Adding transform()/untransform() methods to bytes and str is a non-trivial change and not everybody likes them. Anyway, it's too late for Python 3.4.
In my opinion, the best option is to add new input_type/output_type attributes to CodecInfo right now, and modify the codecs so "abc".encode("hex") raises a LookupError (instead of a tricky error message with some evil low-level hacks on the traceback and the exception, which is my initial concern in this mail thread). It also fixes the security vulnerability.

To keep backward compatibility (even with custom codecs registered manually), if input_type/output_type is not defined, we should consider that the codec is a classical text encoding (encode str=>bytes, decode bytes=>str).

The type of the codecs.encode() result is my least concern in this topic.

I created the following issue to implement my idea: http://bugs.python.org/issue19619

Victor
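The proposal above can be sketched roughly as follows. This is a hedged illustration, not the implementation attached to the issue: the `DECLARED_ENCODE_TYPES` table and the `text_encode` function are hypothetical stand-ins for the proposed input_type/output_type attributes, which real CodecInfo objects do not carry.

```python
import codecs

# Hypothetical (input type, output type) declarations for encode().
# The keys are taken from the registry so the sketch does not hard-code names.
DECLARED_ENCODE_TYPES = {
    codecs.lookup("hex_codec").name: (bytes, bytes),
    codecs.lookup("rot_13").name: (str, str),
}


def text_encode(text, encoding):
    """Encode str to bytes, refusing codecs that are not text encodings."""
    info = codecs.lookup(encoding)
    # Backward compatibility rule from the proposal: a codec without a
    # declaration is treated as a classical text encoding (str => bytes).
    declared = DECLARED_ENCODE_TYPES.get(info.name, (str, bytes))
    if declared != (str, bytes):
        raise LookupError("%r is not a text encoding" % encoding)
    encoded, _consumed = info.encode(text)
    return encoded


if __name__ == "__main__":
    print(text_encode("abc", "utf-8"))
    try:
        text_encode("abc", "hex_codec")
    except LookupError as exc:
        print("rejected:", exc)
```

Rejecting the codec before it runs, rather than checking the result type afterwards, is what addresses the security concern: a compression codec is never invoked on untrusted input through the text-only entry point.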
Re: [Python-Dev] The pysandbox project is broken
On 16.11.13 01:35, Guido van Rossum wrote:
> On Fri, Nov 15, 2013 at 4:31 PM, Nick Coghlan ncogh...@gmail.com wrote:
>> Using an OS level sandbox *is* better from a security point of view. It's just not portable :P
>
> Honestly, I don't believe in portable security. :-)
>
> BTW, in case it wasn't clear, I think it was a courageous step by Victor to declare defeat. Negative results are also results, and they need to be published. Thanks Victor!

Sure it was, and it was great to follow Victor's project! I was about to use it in production, until I saw its flaws, a while back.

Nevertheless, the issue has never been treated thoroughly enough to be able to say "this way you implement that security in Python", whatever that should be.

So I think it is worth discussing, even if only to identify the levels of security involved, to help people identify their individual needs.

My question is, actually: Do we need to address this topic, or is it already crystal clear that something like PyPy's approach is necessary and sufficient to solve the common, undefined problem of "run some script on whatnot, with the following security constraint"?

IOW: Do we really need a full abstraction, embedded in a virtual OS, or is there already a compromise that suits 98 percent of the common needs?

I think as a starter, categorizing the expectations of some measure of 'secure python' would make sense. And I'm asking the people with better knowledge of these matters than I have. (and not asking those who don't... ;-) )

cheers -- Chris

--
Christian Tismer             :^)   mailto:tis...@stackless.com
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship* http://starship.python.net/
14482 Potsdam                :     PGP key - http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/
[Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)
http://bugs.python.org/issue19562 proposes to change the first assert in Lib/datetime.py

    assert 1 <= month <= 12, month

to

    assert 1 <= month <= 12, 'month must be in 1..12'

to match the next two asserts out of the *53* in the file. I think that is the wrong direction of change, but that is not my question here.

Should stdlib code use assert at all?

If user input can trigger an assert, then the code should raise a normal exception that will not disappear with -OO.

If the assert is testing program logic, then it seems that the test belongs in the test file, in this case, test/test_datetime.py. For example, consider the following (backwards) code.

    _DI4Y = _days_before_year(5)
    # A 4-year cycle has an extra leap day over what we'd get from pasting
    # together 4 single years.
    assert _DI4Y == 4 * 365 + 1

To me, the constant should be directly set to its known value.

    _DI4Y = 4*365 + 1

The function should then be tested in test_datetime.

    self.assertEqual(dt._days_before_year(5), dt._DI4Y)

Is there any policy on use of assert in stdlib production code?

--
Terry Jan Reedy
Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)
[Terry Reedy]
> Should stdlib code use assert at all?

Of course, and for exactly the same reasons we use `assert()` in Python's C code: to verify preconditions, postconditions, and invariants that should never fail. Assertions should never be used to, e.g., verify user-supplied input (or anything else we believe _may_ fail).

> If user input can trigger an assert, then the code should raise a normal exception that will not disappear with -OO.

Agreed.

> If the assert is testing program logic, then it seems that the test belongs in the test file, in this case, test/test_datetime.py. For example, consider the following (backwards) code.
>
>     _DI4Y = _days_before_year(5)
>     # A 4-year cycle has an extra leap day over what we'd get from pasting
>     # together 4 single years.
>     assert _DI4Y == 4 * 365 + 1
>
> To me, the constant should be directly set to its known value.
>
>     _DI4Y = 4*365 + 1
>
> The function should then be tested in test_datetime.
>
>     self.assertEqual(dt._days_before_year(5), dt._DI4Y)

I think making that change would be pointless code churn. Harmful, even. As the guy who happened to have written that code ;-), I think it's valuable to have the _code_ (not off buried in some monstrously tedious test file) explain what the comments there do explain, and verify with the assert. If anyone needs to muck with the implementation of datetime, it's crucial they understand what DI4Y _means_, and that it's identical to _days_before_year(5). Its actual value (4*365 + 1) isn't really interesting. Defining _DI4Y _as_ _days_before_year(5) captures its _meaning_.

Ain't broke - don't fix.
Re: [Python-Dev] (#19562) Asserts in Python stdlib code (datetime.py)
> Should stdlib code use assert at all?
>
> If user input can trigger an assert, then the code should raise a normal exception that will not disappear with -OO.
>
> If the assert is testing program logic, then it seems that the test belongs in the test file, in this case, test/test_datetime.py. For example, consider the following (backwards) code.
>
> Is there any policy on use of assert in stdlib production code?

It is my assertion that assert should only be used where a system-level problem would occur, where you cannot trap an error condition.

--
MarkJ
Tacoma, Washington
Re: [Python-Dev] The pysandbox project is broken
On 11/15/2013 02:24 PM, Christian Tismer wrote:
> I appreciate very much that Victor tried his best to fill that old gap. And after that breakage happened again, I think it is urgent to have an in-depth discussion how that situation should be treated in the future.

+1

--
~Ethan~