[Python-Dev] Omission in re.sub?
I've just come across an omission in re.sub which I hadn't noticed before.

In re.sub the replacement string can contain escape sequences, for example:

    >>> repr(re.sub(r"x", r"\n", "axb"))
    "'a\\nb'"

However:

    >>> repr(re.sub(r"x", r"\x0A", "axb"))
    "'ax0Ab'"

Yes, it doesn't recognise \xNN. Is there a reason for this?

The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
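The behaviour described in this message can be probed with a short script. This is only a sketch: the variable names are invented for illustration, and because the handling of \xNN in a replacement template has differed between Python versions, that case is reported rather than asserted.

```python
import re

# "\n" in the replacement template is expanded by re.sub itself.
newline_result = re.sub(r"x", r"\n", "axb")

# A doubled backslash in the template yields one literal backslash.
backslash_result = re.sub(r"x", r"\\", "axb")

# A callable replacement is used as-is, with no template processing,
# so it can produce any character regardless of the template rules.
callable_result = re.sub(r"x", lambda match: "\x0A", "axb")

# "\xNN" in a template: some versions pass it through, others reject it,
# so this only records what the running interpreter does.
try:
    hex_result = repr(re.sub(r"x", r"\x0A", "axb"))
except re.error as exc:
    hex_result = "re.error: {}".format(exc)

print(repr(newline_result), repr(backslash_result), repr(callable_result))
print(hex_result)
```

Passing a function as the replacement is one way to sidestep the question entirely, since its return value is inserted without any escape processing.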
Re: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?)
On Sat, Dec 10, 2011 at 5:30 PM, Terry Reedy tjre...@udel.edu wrote:
> Is doctest really insisting that the whole line "Traceback (most recent call last):" exactly match, with nothing added? It really should not, as that is not part of the language spec. This seems like the tail wagging the dog.

It's a regular expression match, actually. The standard matcher ignores everything between the Traceback line (matched by a regex) and the first unindented line that follows in the doctest.

However, if you explicitly try to match a traceback with the ellipsis matcher, intending to observe whether certain specific lines are printed, then you wouldn't be using doctest's built-in matcher, and that was the case I was concerned about.

As it turns out, though, I was confused about when this latter case occurs: in order to do it, you have to intentionally print a traceback (e.g. via traceback.format_exception() and friends), rather than allowing the exception to propagate normally. This doesn't happen nearly as often in my doctests as I thought it did, but if format_exception() changes it'll still affect some people.

The other piece I was pointing out was that if you change the message without changing the doctest regex, then pasting an interpreter transcript into a doctest will no longer work, because doctest will think it's trying to match non-error output. So that has to be changed when the exception format changes.

So, no actual objection here; just saying that if you don't change that regex, people who create *new* doctests with tracebacks won't be able to get them to work without deleting the version info from their copy-pasted tracebacks. I was also concerned about a situation that, while it exists, does not occur anywhere near as frequently as I thought it would in my own tests, even for things that seriously abuse Python internals and likely can't be ported to Python 3 anyway. ;-)
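The two points above (the stack between the header and the exception line is skipped, and a header that no longer matches the regex is treated as ordinary output) can be checked with a small sketch. The "[3.3]" header text is made up purely for illustration and is not a proposed format; run_transcript is a hypothetical helper.

```python
import doctest

# Header matches doctest's traceback pattern; the indented stack line is a
# placeholder and is not compared against the real traceback.
STANDARD = '''
>>> raise ValueError("boom")
Traceback (most recent call last):
  anything indented here is skipped
ValueError: boom
'''

# Hypothetical header carrying extra version text (invented for this sketch).
ALTERED = '''
>>> raise ValueError("boom")
Traceback [3.3] (most recent call last):
  anything indented here is skipped
ValueError: boom
'''

def run_transcript(text):
    """Run one transcript as a doctest and return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(text, {}, "transcript", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts are of interest here.
    return runner.run(test, out=lambda message: None)

print(run_transcript(STANDARD))
print(run_transcript(ALTERED))
```

With the stock doctest module, the first transcript should pass and the second should be reported as an unexpected exception, which is the copy-paste problem described above.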
Re: [Python-Dev] Omission in re.sub?
As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed.

I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?

--Guido

On Sun, Dec 11, 2011 at 12:12 PM, MRAB pyt...@mrabarnett.plus.com wrote:
> I've just come across an omission in re.sub which I hadn't noticed before.
>
> In re.sub the replacement string can contain escape sequences, for example:
>
>     >>> repr(re.sub(r"x", r"\n", "axb"))
>     "'a\\nb'"
>
> However:
>
>     >>> repr(re.sub(r"x", r"\x0A", "axb"))
>     "'ax0Ab'"
>
> Yes, it doesn't recognise \xNN. Is there a reason for this?
>
> The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Omission in re.sub?
On 11/12/2011 20:27, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:12 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>> I've just come across an omission in re.sub which I hadn't noticed before.
>>
>> In re.sub the replacement string can contain escape sequences, for example:
>>
>>     >>> repr(re.sub(r"x", r"\n", "axb"))
>>     "'a\\nb'"
>>
>> However:
>>
>>     >>> repr(re.sub(r"x", r"\x0A", "axb"))
>>     "'ax0Ab'"
>>
>> Yes, it doesn't recognise \xNN. Is there a reason for this?
>>
>> The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)
>
> As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed.
>
> I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?

The documentation says:

"That is, \n is converted to a single newline character, \r is converted to a linefeed, and so forth."

All of the other escape sequences work as expected, except for \u and \U which aren't supported at all in re.

I should probably also add \N{...} to the list for completeness.
Re: [Python-Dev] Omission in re.sub?
I guess the current rule is that any escapes referring to characters by a numeric value are not supported; this probably made some kind of sense because \1 etc. are backreferences. But since we're discouraging octal escapes anyway I think it's fine to improve over this.

On Sun, Dec 11, 2011 at 12:47 PM, MRAB pyt...@mrabarnett.plus.com wrote:
> On 11/12/2011 20:27, Guido van Rossum wrote:
>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>>> I've just come across an omission in re.sub which I hadn't noticed before.
>>>
>>> In re.sub the replacement string can contain escape sequences, for example:
>>>
>>>     >>> repr(re.sub(r"x", r"\n", "axb"))
>>>     "'a\\nb'"
>>>
>>> However:
>>>
>>>     >>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>     "'ax0Ab'"
>>>
>>> Yes, it doesn't recognise \xNN. Is there a reason for this?
>>>
>>> The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)
>>
>> As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed.
>>
>> I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?
>
> The documentation says:
>
> "That is, \n is converted to a single newline character, \r is converted to a linefeed, and so forth."
>
> All of the other escape sequences work as expected, except for \u and \U which aren't supported at all in re.
>
> I should probably also add \N{...} to the list for completeness.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Fixing the XML batteries
On 09.12.2011 10:09, Xavier Morel wrote:
> On 2011-12-09, at 09:41 , Martin v. Löwis wrote:
>>> a) The stdlib documentation should help users to choose the right tool right from the start. Instead of using the totally misleading wording that it uses now, it should be honest about the performance characteristics of MiniDOM and should actively suggest that those who don't know what to choose (or even *that* they can choose) should not use MiniDOM in the first place.
>
> [...]
>
> Minidom is inferior in interface flow and pythonicity, in terseness, in speed, in memory consumption (even more so using cElementTree, and that's not something which can be fixed unless minidom gets a C accelerator), etc… Even after fixing minidom (if anybody has the time and drive to commit to it), ET/cET should be preferred over it.

I don't mind pointing people to ElementTree, although I disagree that the ET interface is superior to DOM. What I object to is Stefan's reasoning as to *why* people should be pointed to ET, and the words that should be used to do that.

IOW, I detest bashing some part of the standard library just to urge users to use some other part of the standard library. People are still using PyXML, even though it is no longer maintained. Telling them to replace 4DOM with minidom is much more appropriate than telling them to rewrite in ET.

Regards,
Martin
Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()
On 09.12.2011 10:12, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 6:44 PM, Martin v. Löwis mar...@v.loewis.de wrote:
>> On 09.12.2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100 victor.stinner python-check...@python.org wrote:
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable object?
>>
>> It can convert a unicode subtype object into an exact unicode object. I'd rename it to _PyUnicode_AsExactUnicode, and undocument it.
>
> Isn't it basically just exposing a C level version of the unicode() builtin's behaviour?

No. To call the unicode() builtin, do PyObject_CallFunction((PyObject *)&PyUnicode_Type, "O", param) or some such. PyUnicode_Copy doesn't correspond to any Python-level API.

> While I agree the name could be better (and PyUnicode_AsExactUnicode would certainly work), why make it private?

I suggest being minimalistic in extensions to the API. There should be a demonstrated need for an API before adding it, which I don't see in this case.

In general, it will be difficult to find a demonstrable need for new APIs, since the majority (more than 99%) of API use cases is already covered by the abstract object API (i.e. what ceval uses).

The unicode type in particular has a bad tradition of adding tons of functions to the C API, only for us to find out a few releases later that the API is obsolete (e.g. needs additional/different parameters), so we carry unused functions around just because some extension module may use them.

Regards,
Martin
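The distinction between a unicode subtype object and an exact unicode object can be illustrated at the Python level. This is not the C function under discussion, only a sketch of the two kinds of object; the Tagged class is hypothetical.

```python
class Tagged(str):
    """Hypothetical str subtype, used only for this illustration."""

tagged = Tagged("abc")

# Calling str() on an instance of a subtype that does not override __str__
# gives back an object whose type is exactly str, with the same contents.
exact = str(tagged)

print(type(tagged).__name__, type(exact).__name__, exact == tagged)
```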
Re: [Python-Dev] Omission in re.sub?
On 11/12/2011 21:04, Guido van Rossum wrote:
> On Sun, Dec 11, 2011 at 12:47 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>> On 11/12/2011 20:27, Guido van Rossum wrote:
>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>>>> I've just come across an omission in re.sub which I hadn't noticed before.
>>>>
>>>> In re.sub the replacement string can contain escape sequences, for example:
>>>>
>>>>     >>> repr(re.sub(r"x", r"\n", "axb"))
>>>>     "'a\\nb'"
>>>>
>>>> However:
>>>>
>>>>     >>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>>     "'ax0Ab'"
>>>>
>>>> Yes, it doesn't recognise \xNN. Is there a reason for this?
>>>>
>>>> The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)
>>>
>>> As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed.
>>>
>>> I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?
>>
>> The documentation says:
>>
>> "That is, \n is converted to a single newline character, \r is converted to a linefeed, and so forth."
>>
>> All of the other escape sequences work as expected, except for \u and \U which aren't supported at all in re.
>>
>> I should probably also add \N{...} to the list for completeness.
>
> I guess the current rule is that any escapes referring to characters by a numeric value are not supported; this probably made some kind of sense because \1 etc. are backreferences. But since we're discouraging octal escapes anyway I think it's fine to improve over this.

A pattern can contain them, even octal escapes (must be 3 digits).
Re: [Python-Dev] Fixing the XML batteries
On 09.12.2011 16:09, Dirkjan Ochtman wrote:
> On Fri, Dec 9, 2011 at 09:02, Stefan Behnel stefan...@behnel.de wrote:
>> a) The stdlib documentation should help users to choose the right tool right from the start.
>> b) cElementTree should finally lose its special status as a separate library and disappear as an accelerator module behind ElementTree.
>
> An at least somewhat informed +1 from me. The ElementTree API is a very good way to deal with XML from Python, and it deserves to be promoted over the included alternatives. Let's deprecate the NiCad batteries and try to guide users toward the Li-Ion ones.

If you are proposing to deprecate minidom: -1

Regards,
Martin
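For readers weighing the two APIs discussed in this thread, the sketch below reads the same small document with both ElementTree and minidom. The sample XML and the variable names are invented for illustration, and the example makes no claim about speed or memory use.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Made-up sample document, used only for this comparison.
DOC = "<jobs><job id='1'>print</job><job id='2'>mail</job></jobs>"

# ElementTree: elements expose attributes via get() and text via .text.
root = ET.fromstring(DOC)
et_jobs = [(job.get("id"), job.text) for job in root.findall("job")]

# minidom: W3C DOM style; text lives in a child text node.
dom = minidom.parseString(DOC)
dom_jobs = [
    (node.getAttribute("id"), node.firstChild.data)
    for node in dom.getElementsByTagName("job")
]

print(et_jobs)
print(dom_jobs)
```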
Re: [Python-Dev] Fixing the XML batteries
> I can't recall anyone working on any substantial improvements during the last six years or so, and the reason for that seems obvious to me.

What do you think is the reason? It's not at all obvious to me.

Regards,
Martin
Re: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module.
On Sat, 10 Dec 2011 20:40:17 +0100 lars.gustaebel python-check...@python.org wrote:
>  The :mod:`tarfile` module makes it possible to read and write tar
> -archives, including those using gzip or bz2 compression.
> +archives, including those using gzip, bz2 and lzma compression.
>  (:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Perhaps there should be a versionchanged directive for lzma support?

Regards

Antoine.
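As a sketch of what the documented lzma support looks like in use, assuming a Python build (3.3 or later) in which the lzma module is available; the roundtrip helper and the member name are hypothetical.

```python
import io
import tarfile

def roundtrip(payload, name="hello.txt"):
    """Write payload into an in-memory .tar.xz archive and read it back."""
    buffer = io.BytesIO()
    # "w:xz" selects lzma compression for writing.
    with tarfile.open(fileobj=buffer, mode="w:xz") as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    # "r:xz" reads an lzma-compressed archive.
    with tarfile.open(fileobj=buffer, mode="r:xz") as archive:
        return archive.extractfile(name).read()

print(roundtrip(b"hello"))
```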
Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()
On 09.12.2011 20:32, Antoine Pitrou wrote:
> On Fri, 09 Dec 2011 19:51:14 +0100 Victor Stinner victor.stin...@haypocalc.com wrote:
>> On 09/12/2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100 victor.stinner python-check...@python.org wrote:
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable object?
>>
>> PyUnicode_Copy() can be used to modify a string to create a new string with the same length. It is used for example by str.upper(), str.title(), ... (fixup()).
>
> Then the doc should mention that the returned string can be modified. Otherwise it's a bit obscure why the function exists.

I'm skeptical about this modification part. If you make a copy, it's not clear at all that the new characters that you put in will fit in range with the width of the unicode string. Even decreasing the ordinal of a character may be incorrect as the result may not be canonical anymore.

Regards,
Martin
Re: [Python-Dev] Fixing the XML batteries
On 2011-12-11, at 23:03 , Martin v. Löwis wrote:
> People are still using PyXML, even though it is no longer maintained. Telling them to replace 4DOM with minidom is much more appropriate than telling them to rewrite in ET.

From my understanding, Stefan's suggestion is mostly aimed at new Python users trying to manipulate XML and not knowing what to use (yet). It's not about telling people to rewrite existing codebases (that's a good idea as well when possible, as far as I'm concerned, but it's a different issue).
Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()
On Sunday, 11 December 2011 at 23:44 +0100, Martin v. Löwis wrote:
> On 09.12.2011 20:32, Antoine Pitrou wrote:
>> On Fri, 09 Dec 2011 19:51:14 +0100 Victor Stinner victor.stin...@haypocalc.com wrote:
>>> On 09/12/2011 01:35, Antoine Pitrou wrote:
>>>> On Fri, 09 Dec 2011 00:16:02 +0100 victor.stinner python-check...@python.org wrote:
>>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>>> +
>>>>> +   Get a new copy of a Unicode object.
>>>>> +
>>>>> +   .. versionadded:: 3.3
>>>>
>>>> I'm not sure I understand. Why would you make a copy of an immutable object?
>>>
>>> PyUnicode_Copy() can be used to modify a string to create a new string with the same length. It is used for example by str.upper(), str.title(), ... (fixup()).
>>
>> Then the doc should mention that the returned string can be modified. Otherwise it's a bit obscure why the function exists.
>
> I'm skeptical about this modification part. If you make a copy, it's not clear at all that the new characters that you put in will fit in range with the width of the unicode string. Even decreasing the ordinal of a character may be incorrect as the result may not be canonical anymore.

Ah, good point. And perhaps a good reason to make the API private.

Regards

Antoine.
Re: [Python-Dev] readd u'' literal support in 3.3?
On 09.12.2011 11:17, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy tjre...@udel.edu wrote:
>> On 12/8/2011 8:39 PM, Vinay Sajip wrote:
>>> on an entire codebase (for example, using setup.py with flags to run 2to3 during setup).
>>
>> Oh. That explains the 'slow' complaint.
>
> As Chris pointed out though, the real problem with the repeatedly run 2to3 workflow is that it can make interpreting tracebacks from the field *really* hard.

It's hard, but not *really* hard. In most cases, the line numbers in the 2to3 result are exactly the same as in the original, and if not, the quoted source in the traceback will give you enough context to find the source line of the problem.

Regards,
Martin
Re: [Python-Dev] readd u'' literal support in 3.3?
> When running 2to3 from a setup.py script, does it run on the whole codebase or only files that are found newer by the make-like timestamp-based dependency system?

If you run build repeatedly (e.g. in a development cycle), then it will process only the modified files (comparing time stamps between the build/ area and the original source).

Regards,
Martin
Re: [Python-Dev] 2to3 and timestamps
> When running 2to3 from a setup.py script, does it run on the whole codebase or only files that are found newer by the make-like timestamp-based dependency system? If it’s the former, as some messages seem to show (sorry no time to test right now), ISTM we can fix distutils to do the latter (unless there are bugs due to import rewriting to use explicit relative imports when there are extension modules -- blergh).

It would be better to teach 2to3 to do it by itself. Not everybody runs 2to3 through a setup.py script.

For the 2to3 command line tool, the issue is where it should place the output. It currently supports writing diffs to stdout (without saving any conversion result), and overwriting the original file (which means that the original files are lost). So before you consider incremental output, you need to consider original-preserving saves first.

Regards,
Martin
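The make-like timestamp check mentioned in this thread can be sketched as follows. This is not the code distutils or 2to3 actually uses; is_stale and stale_sources are hypothetical helpers that only illustrate comparing modification times between a source file and its converted copy.

```python
import os

def is_stale(source, target):
    """Return True if target is missing or older than source."""
    try:
        target_mtime = os.path.getmtime(target)
    except OSError:
        # No converted copy yet, so the source must be processed.
        return True
    return os.path.getmtime(source) > target_mtime

def stale_sources(pairs):
    """From (source, target) pairs, keep the sources needing conversion."""
    return [source for source, target in pairs if is_stale(source, target)]
```

A scheme like this presupposes that converted output is written somewhere other than over the original file, which is the original-preserving save discussed above.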
Re: [Python-Dev] readd u'' literal support in 3.3?
> Even in the plans that involve 2to3 though, "drop everything prior to 2.6" was always supposed to be step 0, so "single codebase" adds much less of a burden than I thought.

Are you talking about general porting, or about Twisted?

It is a common misconception that "drop everything prior to 2.6" was a recommended step 0 for porting to Python 3. That was never recommended. Instead, what *was* recommended is "port to Python 2.6", which for many projects already supporting, say, 2.5, was a no-op, so people read more into that than was actually necessary. With the project ported to 2.6, you could then make use of the 3k warnings to learn what issues you would face when porting to 3k.

Regards,
Martin
Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()
On Friday, 9 December 2011 at 20:32:16, Antoine Pitrou wrote:
> ... it's a bit obscure why the function exists.

Yeah ok, I marked the function as private: I renamed it to _PyUnicode_Copy() and undocumented it.

Victor
Re: [Python-Dev] Omission in re.sub?
On Sun, Dec 11, 2011 at 2:36 PM, MRAB pyt...@mrabarnett.plus.com wrote:
> On 11/12/2011 21:04, Guido van Rossum wrote:
>> On Sun, Dec 11, 2011 at 12:47 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>>> On 11/12/2011 20:27, Guido van Rossum wrote:
>>>> On Sun, Dec 11, 2011 at 12:12 PM, MRAB pyt...@mrabarnett.plus.com wrote:
>>>>> I've just come across an omission in re.sub which I hadn't noticed before.
>>>>>
>>>>> In re.sub the replacement string can contain escape sequences, for example:
>>>>>
>>>>>     >>> repr(re.sub(r"x", r"\n", "axb"))
>>>>>     "'a\\nb'"
>>>>>
>>>>> However:
>>>>>
>>>>>     >>> repr(re.sub(r"x", r"\x0A", "axb"))
>>>>>     "'ax0Ab'"
>>>>>
>>>>> Yes, it doesn't recognise \xNN. Is there a reason for this?
>>>>>
>>>>> The regex module does the same, but is there any objection to me fixing it in the regex module? (I'm thinking about compatibility with re here.)
>>>>
>>>> As long as there's a way to place a single backslash in the output this seems fine to me, though I'm not sure it's important. Of course it will likely break some test... the test will then have to be fixed.
>>>>
>>>> I can't remember why we did this -- is there a full list of all the escapes that re.sub() interprets somewhere? I thought it was pretty limited. Maybe it's the related list of escapes that are supported in regular expressions?
>>>
>>> The documentation says:
>>>
>>> "That is, \n is converted to a single newline character, \r is converted to a linefeed, and so forth."
>>>
>>> All of the other escape sequences work as expected, except for \u and \U which aren't supported at all in re.
>>>
>>> I should probably also add \N{...} to the list for completeness.
>>
>> I guess the current rule is that any escapes referring to characters by a numeric value are not supported; this probably made some kind of sense because \1 etc. are backreferences. But since we're discouraging octal escapes anyway I think it's fine to improve over this.
>
> A pattern can contain them, even octal escapes (must be 3 digits).

Fine, then I think we should model this. Though I think that we could start deprecating octal escapes in patterns so that eventually we can support over 99 backreferences. So maybe we should just not start supporting octal in the substitution string now.

--
--Guido van Rossum (python.org/~guido)
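The overlap between backreferences and octal escapes that this exchange turns on can be seen in a short sketch using the standard re module (the variable names are invented for illustration):

```python
import re

# In a pattern, \1 is a backreference to group 1.
backref_match = re.fullmatch(r"(a)\1", "aa")

# Three octal digits in a pattern denote a character: 0o101 == ord("A").
octal_match = re.fullmatch(r"\101", "A")

# In a replacement template, \g<1> names the group explicitly, so a
# following digit is not absorbed into the group number.
explicit_group = re.sub(r"(a)", r"\g<1>0", "a")

print(backref_match is not None, octal_match is not None, explicit_group)
```

Because one or two digits after a backslash are read as a group number and three octal digits as a character, the digit count alone decides the meaning, which is why the two features compete for the same syntax.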
Re: [Python-Dev] Fixing the XML batteries
Martin,

You seem heavily invested in minidom.

In the near future I will need to parse and rewrite parts of an XML file created by a third-party program (PrintShopMail, for the curious). It contains both binary and textual data.

Would you recommend minidom for this purpose? What other purposes would you recommend minidom for?

xml-confused-ly yours,
~Ethan~

(Comments by others are, of course, also welcome. :)