Re: [Python-Dev] readd u'' literal support in 3.3?
On Sat, 2011-12-10 at 15:55 +1000, Nick Coghlan wrote: So I'm back to being -1 on the idea of adding back u'' literals for 3.3. Instead, people should explicitly call str() on any literals that they want to be actual str instances both in 3.x and in 2.x when the unicode literals future import is in effect.

After thinking on it a while, I can't see anything wrong with this strategy except for the 10X performance hit for defining native literals. Truth be told, in the vast majority of WSGI apps only high-level WSGI libraries (like WebOb and Werkzeug) and standalone middleware really need to work with native strings. And the middleware really should be using the high-level libraries to parse WSGI anyway. So there are a finite number of places where it's actually a real issue.

As someone who ported WebOb and other stuff built on top of it to Python 3 without using from __future__ import unicode_literals, I'm kinda sad that to be using best practice I'll have to go back and flip the polarity on everything. It's my cross to bear, though. If I have any issue with it in the future I'll bring u'' back up.

- C

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fixing the XML batteries
Martin v. Löwis, 11.12.2011 23:39: I can't recall anyone working on any substantial improvements during the last six years or so, and the reason for that seems obvious to me. What do you think is the reason? It's not at all obvious to me. Just to repeat myself for the third time here: lack of interest. Stefan
Re: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module.
On Sun, Dec 11, 2011 at 11:45:06PM +0100, Antoine Pitrou wrote: On Sat, 10 Dec 2011 20:40:17 +0100 lars.gustaebel python-check...@python.org wrote:

     The :mod:`tarfile` module makes it possible to read and write tar
    -archives, including those using gzip or bz2 compression.
    +archives, including those using gzip, bz2 and lzma compression.
     (:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Perhaps there should be a versionchanged directive for lzma support?

This is now fixed.

-- Lars Gustäbel l...@gustaebel.de

There's no present. There's only the immediate future and the recent past. (George Carlin)
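For readers who want to try the lzma support mentioned in the commit, here is a minimal sketch, assuming Python 3.3 or later built with the lzma module. The helper name roundtrip_xz and the member name are made up for illustration; the "w:xz" and "r:xz" mode strings are the ones tarfile documents for xz-compressed archives.

```python
import io
import tarfile


def roundtrip_xz(name, payload):
    """Write one member into an in-memory .tar.xz archive and read it back."""
    buf = io.BytesIO()
    # "w:xz" asks tarfile to compress the archive with lzma.
    with tarfile.open(fileobj=buf, mode="w:xz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    # "r:xz" reads it back; plain "r" would also auto-detect the compression.
    with tarfile.open(fileobj=buf, mode="r:xz") as archive:
        return archive.extractfile(name).read()


print(roundtrip_xz("hello.txt", b"hello, lzma"))
```

The archive is kept in a BytesIO buffer only so the sketch leaves no files behind; passing a filename to tarfile.open works the same way.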
Re: [Python-Dev] Fixing the XML batteries
Martin v. Löwis, 11.12.2011 23:03: On 09.12.2011 10:09, Xavier Morel wrote: On 2011-12-09, at 09:41, Martin v. Löwis wrote:

a) The stdlib documentation should help users to choose the right tool right from the start. Instead of using the totally misleading wording that it uses now, it should be honest about the performance characteristics of MiniDOM and should actively suggest that those who don't know what to choose (or even *that* they can choose) should not use MiniDOM in the first place. [...]

Minidom is inferior in interface flow and pythonicity, in terseness, in speed, in memory consumption (even more so using cElementTree, and that's not something which can be fixed unless minidom gets a C accelerator), etc… Even after fixing minidom (if anybody has the time and drive to commit to it), ET/cET should be preferred over it.

I don't mind pointing people to ElementTree, although I disagree that the ET interface is superior to DOM.

Yes, that's clearly a point where we agree to disagree, and I understand that you are as biased towards minidom as I am biased towards ElementTree. However, I think I made it clear that the implementation of cElementTree (and lxml.etree as well, for that matter) is largely superior to MiniDOM in terms of performance, for any sensible meaning of the word performance. And I'm also convinced that the API is largely superior in terms of usability. ET certainly matches Python as a language much better than MiniDOM. But that's just my personal opinion.

It's Stefan's reasoning as to *why* people should be pointed to ET, and what words should be used to do that. IOW, I detest bashing some part of the standard library, just to urge users to use some other part of the standard library.

I'm all for finding a good way of putting it into words, as long as it keeps uninformed users from making the wrong decision and getting the wrong idea of how complicated and slow Python is.
People are still using PyXML, despite its no longer being maintained.

My experience with that is that it's only *new* users that are still running into PyXML by accident, because they didn't see that it's a dead project and they find it through ancient web pages that tell them that they need it because it's the way to do XML in Python and if minidom is not enough, use PyXML. Maybe we should misuse the stdlib documentation to clear that up as well. PyXML is just too attractive a name for a dead project. Just look through the xml-sig page: basically all requests regarding PyXML during the last five years deal with problems in installing it, i.e. *before* even starting to use it. So you can't use this to claim that people really *are* still using it.

Telling them to replace 4DOM with minidom is much more appropriate

Do you actually have any evidence that anyone is still actively using 4DOM?

than telling them to rewrite in ET.

I usually encourage people to rewrite minidom code for ET. It makes the code simpler, more readable, more maintainable and much faster.

Stefan
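To make the API comparison in this message concrete, here is a small sketch that extracts the same data with both stdlib modules. The document, the element names and the two helper functions are invented for illustration; it shows only the difference in interface and says nothing about the performance claims.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this comparison.
DOC = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_minidom(text):
    """Collect (id, title) pairs using the DOM interface."""
    dom = minidom.parseString(text)
    result = []
    for book in dom.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0]
        # In DOM the text lives in a child text node, not on the element.
        result.append((book.getAttribute("id"), title.firstChild.data))
    return result


def titles_etree(text):
    """Collect the same pairs using the ElementTree interface."""
    root = ET.fromstring(text)
    return [(book.get("id"), book.findtext("title")) for book in root.iter("book")]


print(titles_minidom(DOC))
print(titles_etree(DOC))
```

Both functions return the same list; which one reads better is exactly the matter of taste the two posters disagree on.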
Re: [Python-Dev] Fixing the XML batteries
Stefan Behnel, 12.12.2011 10:59: Just look through the xml-sig page Hmm, I meant xml-sig mailing list archive here ... Stefan
Re: [Python-Dev] readd u'' literal support in 3.3?
On Mon, Dec 12, 2011 at 3:40 AM, Chris McDonough chr...@plope.com wrote: Truth be told, in the vast majority of WSGI apps only high-level WSGI libraries (like WebOb and Werkzeug) and standalone middleware really need to work with native strings. And the middleware really should be using the high-level libraries to parse WSGI anyway. So there are a finite number of places where it's actually a real issue.

And those only if they're using six or a similar joint-codebase strategy, *and* using unicode_literals in a 2.x module that also does WSGI. If they're using 2to3 and stick with explicit u'', they'll be fine. Unfortunately, AFAIR, nobody in the PEP discussions brought up either the unicode_literals import OR the strategy of using a common codebase, so 2to3 on plain code and writing new Python3 code were the only porting scenarios discussed. (Not that I'm sure it would've made a difference, as I'm not sure what we could have done differently that would still support simple Python3 code and easy 2to3 porting.)

As someone who ported WebOb and other stuff built on top of it to Python 3 without using from __future__ import unicode_literals, I'm kinda sad that to be using best practice I'll have to go back and flip the polarity on everything.

Eh? If you don't need unicode_literals, what's the problem?
Re: [Python-Dev] readd u'' literal support in 3.3?
On Mon, 2011-12-12 at 09:50 -0500, PJ Eby wrote: As someone who ported WebOb and other stuff built on top of it to Python 3 without using from __future__ import unicode_literals, I'm kinda sad that to be using best practice I'll have to go back and flip the polarity on everything. Eh? If you don't need unicode_literals, what's the problem? Porting the WebOb code sucked. It's only about 5K lines of code but the porting effort took me about 80 hours. Some of the problem is certainly my own idiocy, but some of it is just because straddling code across Python 2 and Python 3 currently requires that you change lots and lots of code for suspect benefit. - C
[Python-Dev] (no subject)
Guido posted this on Google+: IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html; its link is near the bottom of the web page). Will this document have a broad use, such that we should make sure it is accurate (to avoid any future confusion)? I skimmed through and found that it covers a lot of ground, not necessarily about vulnerabilities, with some inaccuracies but not a ton that I noticed. If it doesn't matter then no big deal. Just thought I'd bring it up. -eric
[Python-Dev] IEEE/ISO draft on Python vulnerabilities
re-sending with subject :) On Mon, Dec 12, 2011 at 2:44 PM, Eric Snow ericsnowcurren...@gmail.com wrote: Guido posted this on Google+: IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html; its link is near the bottom of the web page). Will this document have a broad use, such that we should make sure it is accurate (to avoid any future confusion)? I skimmed through and found that it covers a lot of ground, not necessarily about vulnerabilities, with some inaccuracies but not a ton that I noticed. If it doesn't matter then no big deal. Just thought I'd bring it up. -eric
Re: [Python-Dev] (no subject)
The authors are definitely interested in feedback! Best probably to post it to my G+ thread.

On Mon, Dec 12, 2011 at 1:44 PM, Eric Snow ericsnowcurren...@gmail.com wrote: Guido posted this on Google+: IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html; its link is near the bottom of the web page). Will this document have a broad use, such that we should make sure it is accurate (to avoid any future confusion)? I skimmed through and found that it covers a lot of ground, not necessarily about vulnerabilities, with some inaccuracies but not a ton that I noticed. If it doesn't matter then no big deal. Just thought I'd bring it up. -eric

-- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] IEEE/ISO draft on Python vulnerabilities
IEEE/ISO are working on a draft document about Python vulnerabilities: http://grouper.ieee.org/groups/plv/DocLog/300-399/360-thru-379/22-WG23-N-0372/n0372.pdf (in the context of a larger effort to classify vulnerabilities in all languages: ISO/IEC TR 24772:2010, available from ISO at no cost at: http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html; its link is near the bottom of the web page).

Random comments. I didn't read everything.

--

Vulnerability descriptions for the language Python Standards and terminology based on the 3.x standard only. (...) Automatic conversion also occurs when an integer becomes too large to fit within the constraints of the large integer specified in the language (typically C) used to create the Python interpreter. On a 32-bit machine this would be the range -2^30 to 2^30-1. When an integer becomes too large to fit into that range it is converted to an extended precision integer of arbitrary length. (...) otherwise, if either argument is a floating point number, the other is converted to floating otherwise, if either argument is a long integer, the other is converted to long integer;

10 and 2**1024 have the same type (int) in Python 3. I don't really understand what extended precision means. There are no more long integers.

--

Python.16 Wrap-around Error [XYY] (...) ... exception handling for floating point operations cannot be assumed to catch this type of error because they are not standardized in the underlying C language.

Can you give me an example of such a problem? If there is really an issue, can we configure the FPU to catch such an error? pyfpe.h has PyFPE_START_PROTECT and PyFPE_END_PROTECT macros, but they do nothing by default. You can enable this protection using ./configure --with-fpectl.

--

if(y 0):print(x)

Even if this example is valid, it is surprising to see parentheses around the condition in Python. if y 0: print(x) or even if y 0: print(x) would be better.
--

Python also encourages structured programming by not introducing any of the following constructs which could easily lead to unstructured code: - Labels and branching statements such as GO TO; - Case, GO TO DEPENDING, EVALUATE, switch and other statements that branch dependent on a variable's value; and - ALTER which changes GO TO label to branch to a different label.

You have to modify the language (and so build your own interpreter) to add a goto instruction to Python. Or do you mean that someone may want to implement something like goto using exceptions for example?

--

When sorting a list using the sort() method, attempting to inspect or mutate the content of the list will result in undefined behaviour.

Oh... I never imagined such a use case. Let's try:

    $ ./python
    Python 3.3.0a0 (default:3ad7d01acbf4+, Dec 12 2011, 21:07:55)
    >>> def hack(x):
    ...     mylist.append(10)
    ...     return
    ...
    >>> mylist=[1]
    >>> mylist.sort(key=hack)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: list modified during sort

Same behaviour with Python 2.7 and 3.2: so the Python behaviour is defined, you get a ValueError. Are there other ways to inspect or mutate a list while sorting it?

--

The sequence of keys in a dictionary is undefined because the hashing function used to index the keys is unspecified therefore different implementations are likely to yield different sequences.

Exactly. You might mention that collections.OrderedDict has a defined behaviour: it lists keys (and values) in the insertion order.

--

Mixing tabs and spaces to indent is defined differently for UNIX and non-UNIX platforms;

You can use the -tt command line option to raise an IndentationError (a block can still be indented using spaces and tabs).

Victor
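Three of the points in this message can be checked in a few lines. The following is a sketch under the assumption of CPython 3.x; the ValueError on mutation during a sort is what CPython does and may not hold for other implementations. The names mutating_key, data and ordered are illustrative only.

```python
from collections import OrderedDict

# Python 3 has a single integer type, whatever the magnitude.
small, huge = 10, 2 ** 1024
print(type(small) is type(huge))


def mutating_key(item):
    """Key function that (wrongly) appends to the list being sorted."""
    data.append(10)
    return item


data = [3, 1, 2]
try:
    data.sort(key=mutating_key)
except ValueError as exc:
    # CPython detects the mutation and reports it once the sort finishes.
    sort_error = str(exc)
else:
    sort_error = None
print(sort_error)

# OrderedDict iterates over its keys in insertion order.
ordered = OrderedDict()
for key in ("zebra", "apple", "mango"):
    ordered[key] = len(key)
print(list(ordered))
```

The sort check uses a three-element list rather than the one-element list in the session above, simply to make it obvious that a real sort takes place.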
Re: [Python-Dev] readd u'' literal support in 3.3?
On Tue, Dec 13, 2011 at 12:50 AM, PJ Eby p...@telecommunity.com wrote: Unfortunately, AFAIR, nobody in the PEP discussions brought up either the unicode_literals import OR the strategy of using a common codebase, so 2to3 on plain code and writing new Python3 code were the only porting scenarios discussed. (Not that I'm sure it would've made a difference, as I'm not sure what we could have done differently that would still support simple Python3 code and easy 2to3 porting.)

That's not web-sig's fault though - it's only as people have been trying it and *succeeding* that we've come to realise that single code base approaches are significantly more feasible than we originally anticipated.

Now, depending on whether you need to support 2.5 and earlier, we even have a reasonable answer to the native strings problem:

If supporting only 2.6+, use "from __future__ import unicode_literals" and the 'str' builtin:

    Import at top of module: from __future__ import unicode_literals
    Text:   ""
    Native: str("")
    Binary: b""

If also supporting 2.5 and earlier, use six (or an equivalent compatibility module):

    Import at top of module: from six import u, b
    Text:   u("")
    Native: ""
    Binary: b("")

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
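The 2.6+ recipe above can be written out as a small module. This is a sketch only: the variable names and sample values are invented, and on Python 3 the future import is accepted but has no effect.

```python
from __future__ import unicode_literals

# Text: a bare literal is unicode on 2.x (because of the import) and str on 3.x.
text = "caf\u00e9"

# Native: str() yields the interpreter's own str type on both lines,
# which is what WSGI expects for environ keys and header names.
native = str("Content-Type")

# Binary: a b"" literal is str on 2.x and bytes on 3.x.
binary = b"\x00\x01"

# Type of a bare literal under the future import: unicode on 2.x, str on 3.x.
text_type = type("")

print(type(text).__name__, type(native).__name__, type(binary).__name__)
```

On 2.x, str("...") encodes the unicode literal with the default ASCII codec, so this spelling is only safe for ASCII-only native strings. The extra function call per literal is where the performance cost mentioned earlier in the thread comes from.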
[Python-Dev] str.format implementation
Hi, I'm hoping to get some kind of consensus about the divergences between the implementation and documentation of str.format (http://mail.python.org/pipermail/python-dev/2011-June/111860.html and the linked bug report contain examples of the divergences). These pertain to the arg_name, attribute_name, and element_index fields of the grammar in the docs:

    replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
    field_name        ::= arg_name ("." attribute_name | "[" element_index "]")*
    arg_name          ::= [identifier | integer]
    attribute_name    ::= identifier
    element_index     ::= integer | index_string
    index_string      ::= <any source character except "]"> +

Nothing definitive emerged from the last round of discussion, and as far as I can recall there are now three proposals for what kind of changes might be worth making:

(1) the implementation should conform to the docs;*
(2) like (1) with the change that element_index should be changed to integer | identifier (rendering index_string otiose);
(3) like (1) with the change that index_string should be changed to 'any source character except ], }, or {'.

* the docs link integer to http://docs.python.org/reference/lexical_analysis.html#grammar-token-integer but the current implementation only allows decimal integers, which seems reasonable and worth retaining.

(2) was suggested by Greg Ewing on python-dev and (3) by Petri Lehtinen in the bug report. (Petri actually suggested that braces be disallowed except for the nesting in the format_spec, but it comes to the same thing.) None of these should be difficult to implement; patches exist for (1) and (2). (2) and (3) would lead to format strings that are easier for the programmer to visually parse; (1) would make the indexing part of the replacement field conform more closely to the way indexing with strings behaves in Python generally, where arbitrary strings can be used. (It wouldn't conform exactly, obviously, since ']' would still be excluded.)
I personally would prefer (1) to (2) or (3), and (3) to (2), had I my druthers, but it doesn't matter a *whole* lot to me; I'd prefer any of them to nothing (or to changing the docs to reflect the current batty behavior). -- Ben Wolfson Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure. [Larousse, Drink entry]
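For readers unfamiliar with the grammar fields under discussion, here is a short sketch of the uncontroversial forms, where docs and implementation agree. The class Point and the sample data are invented for illustration; the divergent cases themselves are in the linked bug report and are not reproduced here.

```python
class Point(object):
    """Tiny illustrative class with two attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(3, 4)
scores = {"alice": 90, "bob": 72}
row = ["zero", "one", "two"]

# arg_name followed by "." attribute_name
by_attribute = "{0.x},{0.y}".format(point)

# element_index as index_string: the key is the string "alice", written unquoted
by_string_index = "{0[alice]}".format(scores)

# element_index as integer: the digits are converted to the int 1
by_integer_index = "{0[1]}".format(row)

# arg_name as identifier, i.e. keyword arguments
by_name = "{p.x} {s[bob]}".format(p=point, s=scores)

print(by_attribute, by_string_index, by_integer_index, by_name)
```

Note that an all-digit index is always treated as an integer, so a dictionary with the string key "1" cannot be indexed this way.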
[Python-Dev] PyUnicodeObject / PyASCIIObject questions
(see http://www.python.org/dev/peps/pep-0393/ and http://hg.python.org/cpython/file/6f097ff9ac04/Include/unicodeobject.h )

    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;
        Py_hash_t hash;
        struct {
            unsigned int interned:2;
            unsigned int kind:2;   /* now 3 in implementation */
            unsigned int compact:1;
            unsigned int ascii:1;
            unsigned int ready:1;
        } state;
        wchar_t *wstr;
    } PyASCIIObject;

    typedef struct {
        PyASCIIObject _base;
        Py_ssize_t utf8_length;
        char *utf8;
        Py_ssize_t wstr_length;
    } PyCompactUnicodeObject;

    typedef struct {
        PyCompactUnicodeObject _base;
        union {
            void *any;
            Py_UCS1 *latin1;
            Py_UCS2 *ucs2;
            Py_UCS4 *ucs4;
        } data;
    } PyUnicodeObject;

(1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD? Is it because of the names (.length vs .size), or a holdover from when unicode (as opposed to str) did not expect to be compact, or is there a deeper reason?

(2) Why does PyASCIIObject have a wstr member, and why does PyCompactUnicodeObject have wstr_length? As best I can tell from the PEP or header file, wstr is only meaningful when either:

(2a) wstr is shared with (and redundant to) the canonical representation -- which will therefore not be ASCII. So wstr (and wstr_length) shouldn't need to be represented explicitly, and certainly not in the PyASCIIObject base.

or

(2b) The string is a Legacy String (and PyUnicode_READY has not been called). Because it is a Legacy String, the object header must already be a full PyUnicodeObject, and the wstr fields could at least be stored there.

I'm also not sure why wstr can't be stored in the existing .data member -- once PyUnicode_READY is called, it will either be there (shared) or be discarded. Are there other times when the wstr will be explicitly re-filled and cached?

(3) I would feel much less nervous if the remaining 4 values of PyUnicode_Kind were explicitly reserved, and the macros raised an error when they showed up.
(Better still would be to allow other values, and to have the macros delegate to some attribute on the (sub) type object.) Discussion on py-ideas strongly suggested that people should not be rolling their own string representations, and that it won't really save as much as people think it will, etc ... but I'm not sure that saying "do it without inheritance" is the best solution -- and that is what treating kind as an exhaustive list does. -jJ
Re: [Python-Dev] PyUnicodeObject / PyASCIIObject questions
(1) Why is PyObject_HEAD used instead of PyObject_VAR_HEAD? Is it because of the names (.length vs .size), or a holdover from when unicode (as opposed to str) did not expect to be compact, or is there a deeper reason?

The unicode object is not a var object. In a var object, tp_itemsize gives the element size, which is not possible for unicode objects, since the itemsize may vary by instance. In addition, not all instances have the items after the base object (plus the size of the base object in tp_basicsize is also not always correct).

(2) Why does PyASCIIObject have a wstr member, and why does PyCompactUnicodeObject have wstr_length? As best I can tell from the PEP or header file, wstr is only meaningful when either:

No. wstr is most of all relevant if someone calls PyUnicode_AsUnicode(AndSize); any unicode object might get the wstr pointer filled out at some point. It can be shared only if sizeof(Py_UNICODE) matches the canonical width of the string. wstr_length is only relevant if wstr is not NULL. For a pure ASCII string (and also for Latin-1 and other BMP strings), the wstr length will always equal the canonical length (number of code points). The optimization of dropping wstr_length from the representation was made only for ASCII objects.

I'm also not sure why wstr can't be stored in the existing .data member -- once PyUnicode_READY is called, it will either be there (shared) or be discarded.

Most objects won't have the .data member. For those that do, .data holds the canonical representation (and *only* after PyUnicode_READY has been called).

(3) I would feel much less nervous if the remaining 4 values of PyUnicode_Kind were explicitly reserved, and the macros raised an error when they showed up. (Better still would be to allow other values, and to have the macros delegate to some attribute on the (sub) type object.)
Discussion on py-ideas strongly suggested that people should not be rolling their own string representations, and that it won't really save as much as people think it will, etc ... but I'm not sure that saying "do it without inheritance" is the best solution -- and that is what treating kind as an exhaustive list does.

If people use C, they can construct all kinds of illegal representations, for any object (e.g. lists where the stored length differs from the actual length, dictionaries where key and value are switched, and so on). If they do that, they likely get crashes and other failures, so they quickly stop doing it. In the specific case of kind values: many places will either work incorrectly, or have an assertion in debug mode already if an unexpected kind is encountered. I don't mind adding such checks to more places, but I also don't see a need to explicitly care about this specific class of bugs where people would have to deliberately try to cheat.

Regards, Martin
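The effect of the kind field can be observed from pure Python, without touching the C structs. The following is a rough sketch, assuming CPython 3.3 or later, where PEP 393 is implemented; other interpreters may store strings differently. The helper bytes_per_char is made up for this illustration.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by differencing two string sizes.

    Subtracting the sizes of two strings of different lengths cancels
    out the fixed object header, leaving only the character data.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001F40D"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such a build the ASCII and Latin-1 samples should come out at the narrowest width, the BMP sample wider, and the astral sample widest, matching the Py_UCS1, Py_UCS2 and Py_UCS4 members of the data union quoted earlier.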