[Python-Dev] Omission in re.sub?

2011-12-11 Thread MRAB

I've just come across an omission in re.sub which I hadn't noticed
before.

In re.sub the replacement string can contain escape sequences, for
example:

>>> repr(re.sub(r"x", r"\n", "axb"))
"'a\\nb'"

However:

>>> repr(re.sub(r"x", r"\x0A", "axb"))
"'ax0Ab'"

Yes, it doesn't recognise \xNN.

Is there a reason for this?

The regex module does the same, but is there any objection to me fixing
it in the regex module? (I'm thinking about compatibility with re here.)
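
For readers who want to try this, here is a minimal sketch. It assumes a recent Python 3, where the re documentation says an unknown letter escape in a replacement template is an error, unlike the 2011 behaviour reported above. The helper names `sub_literal` and `try_template` are made up for illustration.

```python
import re

# A raw-string template: re.sub itself turns the two characters \n into a newline.
newline_result = re.sub(r"x", r"\n", "axb")


def sub_literal(pattern, replacement, text):
    """Substitute `replacement` verbatim, with no template escape processing.

    Hypothetical helper: when the replacement is a function, re.sub uses
    its return value as-is.
    """
    return re.sub(pattern, lambda match: replacement, text)


# "\x0A" (non-raw) is decoded by the Python compiler, so the replacement is
# already a one-character string before re.sub sees it.
hex_result = sub_literal(r"x", "\x0A", "axb")


def try_template(template):
    """Return the substitution result, or the re.error the template raised."""
    try:
        return re.sub(r"x", template, "axb")
    except re.error as exc:
        return exc


# Version-dependent: older releases returned a string, newer ones raise.
outcome = try_template(r"\x0A")
```

Passing a function sidesteps the template rules entirely, which may be the simplest workaround when the replacement must contain an arbitrary code point.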
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Tag trackbacks with version (was Re: readd u'' literal support in 3.3?)

2011-12-11 Thread PJ Eby
On Sat, Dec 10, 2011 at 5:30 PM, Terry Reedy tjre...@udel.edu wrote:

> Is doctest really insisting that the whole line
>   Traceback (most recent call last):
> exactly match, with nothing added? It really should not, as that is not
> part of the language spec. This seems like the tail wagging the dog.


It's a regular expression match, actually.  The standard matcher ignores
everything between the Traceback line (matched by a regex) and the first
unindented line that follows in the doctest.  However, if you explicitly
try to match a traceback with the ellipsis matcher, intending to observe
whether certain specific lines are printed, then you wouldn't be using
doctest's built-in matcher, and that was the case I was concerned about.
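
The behaviour of the standard matcher can be checked with a small sketch like the one below. The sample text and the helper name `run_sample` are illustrative assumptions. The indented middle line is deliberately not a real stack frame, to show that doctest skips whatever sits between the header and the first unindented line.

```python
import doctest

# Hypothetical sample: the indented middle line is not a real stack frame.
SAMPLE = """
>>> int("http")
Traceback (most recent call last):
    (anything indented here is skipped by the default matcher)
ValueError: invalid literal for int() with base 10: 'http'
"""


def run_sample(text):
    """Run one doctest string and return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(text, {}, "sample", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


failed, attempted = run_sample(SAMPLE)
```

Only the header line and the final exception line take part in the comparison here, so extra text appended to the header line itself would be what breaks a pasted transcript.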

However, as it turns out, I was confused about when this latter case
occurs: in order to do it, you have to actually intentionally print a
traceback (e.g. via traceback.format_exception() and friends), rather than
allowing the exception to propagate normally.  This doesn't happen nearly
as often in my doctests as I thought it did, but if format_exception()
changes it'll still affect some people.

The other piece I was pointing out was that if you change the message
without changing the doctest regex, then pasting an interpreter transcript
into a doctest will no longer work, because doctest will think it's trying
to match non-error output.  So that has to be changed when the exception
format changes.

So, no actual objection here; just saying that if you don't change that
regex, people who create *new* doctests with tracebacks won't be able to
get them to work without deleting the version info from their copy-pasted
tracebacks.  I was also concerned about a situation that, while it exists,
does not occur anywhere near as frequently as I thought it would in my own
tests, even for things that seriously abuse Python internals and likely
can't be ported to Python 3 anyway.  ;-)


Re: [Python-Dev] Omission in re.sub?

2011-12-11 Thread Guido van Rossum
As long as there's a way to place a single backslash in the output
this seems fine to me, though I'm not sure it's important. Of course
it will likely break some test... the test will then have to be fixed.

I can't remember why we did this -- is there a full list of all the
escapes that re.sub() interprets somewhere? I thought it was pretty
limited. Maybe it's the related list of escapes that are supported in
regular expressions?

--Guido

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Omission in re.sub?

2011-12-11 Thread MRAB

On 11/12/2011 20:27, Guido van Rossum wrote:

> As long as there's a way to place a single backslash in the output
> this seems fine to me, though I'm not sure it's important. Of course
> it will likely break some test... the test will then have to be
> fixed.
>
> I can't remember why we did this -- is there a full list of all the
> escapes that re.sub() interprets somewhere? I thought it was pretty
> limited. Maybe it's the related list of escapes that are supported
> in regular expressions?

The documentation says: "That is, \n is converted to a single newline
character, \r is converted to a linefeed, and so forth."


All of the other escape sequences work as expected, except for \u
and \U which aren't supported at all in re.

I should probably also add \N{...} to the list for completeness.
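
As a rough illustration of the escapes that do work in a replacement template, the sketch below checks a handful of them, including the doubled backslash that yields a single backslash in the output. The dictionary name `TEMPLATES` is made up for this example. Note that in the re module `\r` produces a carriage return (code point 13), whatever the quoted wording suggests.

```python
import re

# Raw templates mapped to the single character each should decode to.
TEMPLATES = {
    r"\n": "\n",   # newline (line feed)
    r"\r": "\r",   # carriage return
    r"\t": "\t",   # horizontal tab
    r"\\": "\\",   # one backslash in the output
}

# Replace the middle character of "axb" and keep only what was substituted.
decoded = {template: re.sub("x", template, "axb")[1:-1] for template in TEMPLATES}

# A digit after the backslash is a group reference, not a character code.
doubled = re.sub(r"(x)", r"\1\1", "axb")
```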


Re: [Python-Dev] Omission in re.sub?

2011-12-11 Thread Guido van Rossum
I guess the current rule is that any escapes referring to characters
by a numeric value are not supported; this probably made some kind of
sense because \1 etc. are backreferences. But since we're discouraging
octal escapes anyway I think it's fine to improve over this.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Fixing the XML batteries

2011-12-11 Thread Martin v. Löwis
On 09.12.2011 10:09, Xavier Morel wrote:
> On 2011-12-09, at 09:41, Martin v. Löwis wrote:
>> a) The stdlib documentation should help users to choose the right
>> tool right from the start. Instead of using the totally
>> misleading wording that it uses now, it should be honest about
>> the performance characteristics of MiniDOM and should actively
>> suggest that those who don't know what to choose (or even *that*
>> they can choose) should not use MiniDOM in the first place.
>
> [...]
>
> Minidom is inferior in interface flow and pythonicity, in terseness,
> in speed, in memory consumption (even more so using cElementTree, and
> that's not something which can be fixed unless minidom gets a C
> accelerator), etc… Even after fixing minidom (if anybody has the time
> and drive to commit to it), ET/cET should be preferred over it.

I don't mind pointing people to ElementTree, although I disagree that
the ET interface is superior to DOM. What I object to is Stefan's
reasoning as to *why* people should be pointed to ET, and the words
used to do that. IOW, I detest bashing some part of the standard
library, just to urge users to use some other part of the standard library.

People are still using PyXML, despite its not being maintained anymore.
Telling them to replace 4DOM with minidom is much more appropriate than
telling them to rewrite in ET.

Regards,
Martin


Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()

2011-12-11 Thread Martin v. Löwis
On 09.12.2011 10:12, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 6:44 PM, Martin v. Löwis mar...@v.loewis.de wrote:
>> On 09.12.2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100
>>> victor.stinner python-check...@python.org wrote:
>>>
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable
>>> object?
>>
>> It can convert a unicode subtype object into an exact unicode
>> object.
>>
>> I'd rename it to _PyUnicode_AsExactUnicode, and undocument it.
>
> Isn't it basically just exposing a C level version of the unicode()
> builtin's behaviour?

No. To call the unicode() builtin, do

  PyObject_CallFunction((PyObject *)&PyUnicode_Type, "O", param)

or some such. PyUnicode_Copy doesn't correspond to any Python-level
API.
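
A rough Python 3 analogue of this distinction, using `str` in place of `unicode`, is sketched below. The `Shouty` class is hypothetical, and the exact-type result of `str.__str__` reflects CPython behaviour as I understand it rather than anything stated in this thread.

```python
class Shouty(str):
    """Hypothetical str subclass that overrides __str__."""

    def __str__(self):
        return self.upper() + "!"


s = Shouty("abc")

# The builtin dispatches to the subclass, so the value can change.
via_builtin = str(s)

# The base-class conversion ignores the override and keeps the value.
exact = str.__str__(s)
```

The builtin can run arbitrary subclass code, whereas a plain copy into the exact type cannot, which is one way to read the "No" above.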

> While I agree the name could be better (and
> PyUnicode_AsExactUnicode would certainly work), why make it private?

I suggest being minimalistic in extensions to the API. There should
be a demonstrated need for an API before adding it, which I don't
see in this case.

In general, it will be difficult to find a demonstrable need for new
APIs, since the majority (more than 99%) of API use cases is already
covered by the abstract object API (i.e. what ceval uses).

The unicode type in particular has a bad tradition of adding tons
of functions to the C API, only for us to find out a few releases later
that the API is obsolete (e.g. needs additional/different parameters),
so we carry unused functions around just because some extension module
may use them.

Regards,
Martin


Re: [Python-Dev] Omission in re.sub?

2011-12-11 Thread MRAB

On 11/12/2011 21:04, Guido van Rossum wrote:

> I guess the current rule is that any escapes referring to characters
> by a numeric value are not supported; this probably made some kind of
> sense because \1 etc. are backreferences. But since we're discouraging
> octal escapes anyway I think it's fine to improve over this.


A pattern can contain them, even octal escapes (must be 3 digits).
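
A short sketch of how a pattern tells the two apart, as far as I can tell from the re module's behaviour: three octal digits after the backslash denote a character, while a single digit is a backreference. The variable names are illustrative only.

```python
import re

# Three octal digits in a pattern denote a character: 0o101 == 65 == "A".
octal_match = re.fullmatch(r"\101", "A")

# One digit after the backslash refers back to a capturing group.
backref_match = re.fullmatch(r"(ab)\1", "abab")

# The backreference must repeat the captured text exactly.
backref_miss = re.fullmatch(r"(ab)\1", "abcd")
```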
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fixing the XML batteries

2011-12-11 Thread Martin v. Löwis
On 09.12.2011 16:09, Dirkjan Ochtman wrote:
> On Fri, Dec 9, 2011 at 09:02, Stefan Behnel stefan...@behnel.de wrote:
>> a) The stdlib documentation should help users to choose the right tool right
>> from the start.
>> b) cElementTree should finally lose its special status as a separate
>> library and disappear as an accelerator module behind ElementTree.
>
> An at least somewhat informed +1 from me. The ElementTree API is a
> very good way to deal with XML from Python, and it deserves to be
> promoted over the included alternatives.
>
> Let's deprecate the NiCad batteries and try to guide users toward the
> Li-Ion ones.

If you are proposing to deprecate minidom: -1

Regards,
Martin


Re: [Python-Dev] Fixing the XML batteries

2011-12-11 Thread Martin v. Löwis
> I can't recall anyone working on any substantial improvements during the
> last six years or so, and the reason for that seems obvious to me.

What do you think is the reason? It's not at all obvious to me.

Regards,
Martin


Re: [Python-Dev] cpython: Issue #5689: Add support for lzma compression to the tarfile module.

2011-12-11 Thread Antoine Pitrou
On Sat, 10 Dec 2011 20:40:17 +0100
lars.gustaebel python-check...@python.org wrote:
  
>  The :mod:`tarfile` module makes it possible to read and write tar
> -archives, including those using gzip or bz2 compression.
> +archives, including those using gzip, bz2 and lzma compression.
>  (:file:`.zip` files can be read and written using the :mod:`zipfile` module.)

Perhaps there should be a versionchanged directive for lzma support?
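
For context, a minimal sketch of what the new support looks like from user code, assuming Python 3.3 or later built with the lzma module. The helper names `make_xz_tar` and `read_member` are invented for this example.

```python
import io
import tarfile


def make_xz_tar(name, payload):
    """Return the bytes of a .tar.xz archive holding one member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:xz") as archive:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_member(archive_bytes, name):
    """Read one member back; "r:*" lets tarfile detect the compression."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        return archive.extractfile(name).read()


data = make_xz_tar("hello.txt", b"hello")
```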

Regards

Antoine.




Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()

2011-12-11 Thread Martin v. Löwis
On 09.12.2011 20:32, Antoine Pitrou wrote:
> On Fri, 09 Dec 2011 19:51:14 +0100
> Victor Stinner victor.stin...@haypocalc.com wrote:
>> On 09/12/2011 01:35, Antoine Pitrou wrote:
>>> On Fri, 09 Dec 2011 00:16:02 +0100
>>> victor.stinner python-check...@python.org wrote:
>>>
>>>> +.. c:function:: PyObject* PyUnicode_Copy(PyObject *unicode)
>>>> +
>>>> +   Get a new copy of a Unicode object.
>>>> +
>>>> +   .. versionadded:: 3.3
>>>
>>> I'm not sure I understand. Why would you make a copy of an immutable
>>> object?
>>
>> PyUnicode_Copy() can be used to modify a string to create a new string
>> with the same length. It is used for example by str.upper(),
>> str.title(), ... (fixup()).
>
> Then the doc should mention that the returned string can be modified.
> Otherwise it's a bit obscure why the function exists.

I'm skeptical about this modification part. If you make a copy, it's
not clear at all that the new characters that you put in will fit
in range with the width of the unicode string. Even decreasing the
ordinal of a character may be incorrect as the result may not be
canonical anymore.

Regards,
Martin


Re: [Python-Dev] Fixing the XML batteries

2011-12-11 Thread Xavier Morel
On 2011-12-11, at 23:03, Martin v. Löwis wrote:
> People are still using PyXML, despite its not being maintained anymore.
> Telling them to replace 4DOM with minidom is much more appropriate than
> telling them to rewrite in ET.

From my understanding, Stefan's suggestion is mostly aimed at new
Python users trying to manipulate XML and not knowing what to use
(yet). It's not about telling people to rewrite an existing codebase
(it's a good idea as well when possible, as far as I'm concerned, but
it's a different issue).
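
To make the choice facing a new user concrete, here is a small sketch of the same edit done with both stdlib APIs. The sample document and the function names are invented for illustration and are not taken from the thread.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical input document.
DOC = "<jobs><job id='1'>draft</job></jobs>"


def retitle_with_elementtree(xml_text, new_text):
    """Replace the text of the first <job> element using ElementTree."""
    root = ET.fromstring(xml_text)
    root.find("job").text = new_text
    return ET.tostring(root, encoding="unicode")


def retitle_with_minidom(xml_text, new_text):
    """Replace the text of the first <job> element using minidom."""
    dom = minidom.parseString(xml_text)
    job = dom.getElementsByTagName("job")[0]
    job.firstChild.data = new_text  # the text lives in a child Text node
    return dom.documentElement.toxml()


et_out = retitle_with_elementtree(DOC, "final")
dom_out = retitle_with_minidom(DOC, "final")
```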


Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()

2011-12-11 Thread Antoine Pitrou
On Sunday, 11 December 2011 at 23:44 +0100, Martin v. Löwis wrote:
> I'm skeptical about this modification part. If you make a copy, it's
> not clear at all that the new characters that you put in will fit
> in range with the width of the unicode string. Even decreasing the
> ordinal of a character may be incorrect as the result may not be
> canonical anymore.

Ah, good point. And perhaps a good reason to make the API private.

Regards

Antoine.




Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-11 Thread Martin v. Löwis
On 09.12.2011 11:17, Nick Coghlan wrote:
> On Fri, Dec 9, 2011 at 8:03 PM, Terry Reedy tjre...@udel.edu wrote:
>> On 12/8/2011 8:39 PM, Vinay Sajip wrote:
>>> on an entire codebase (for example, using setup.py with flags to run 2to3
>>> during setup).
>>
>> Oh. That explains the 'slow' complaint.
>
> As Chris pointed out though, the real problem with the repeatedly run
> 2to3 workflow is that it can make interpreting tracebacks from the
> field *really* hard.

It's hard, but not *really* hard. In most cases, the line numbers
in the 2to3 result are exactly the same as in the original, and if
not, the quoted source in the traceback will give you enough context
to find the source line of the problem.

Regards,
Martin


Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-11 Thread Martin v. Löwis
> When running 2to3 from a setup.py script, does it run on the whole
> codebase or only files that are found newer by the make-like
> timestamp-based dependency system?

If you run build repeatedly (e.g. in a development cycle), then
it will process only the modified files (comparing time stamps
between the build/ area and the original source).
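
The timestamp comparison described here can be sketched roughly as follows. This is not the distutils implementation; the function names `is_stale` and `files_to_convert` are hypothetical.

```python
import os


def is_stale(source_path, built_path):
    """Return True if built_path is missing or older than source_path."""
    if not os.path.exists(built_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(built_path)


def files_to_convert(pairs):
    """Filter (source, built) path pairs down to those needing conversion."""
    return [(src, dst) for src, dst in pairs if is_stale(src, dst)]
```

A repeated build then only pays the conversion cost for pairs that `files_to_convert` returns.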

Regards,
Martin


Re: [Python-Dev] 2to3 and timestamps

2011-12-11 Thread Martin v. Löwis
>> When running 2to3 from a setup.py script, does it run on the whole
>> codebase or only files that are found newer by the make-like
>> timestamp-based dependency system?  If it’s the former, as some messages
>> seem to show (sorry no time to test right now), ISTM we can fix
>> distutils to do the latter (unless there are bugs due to import
>> rewriting to use explicit relative imports when there are extension
>> modules—blergh).
>
> It would be better to teach 2to3 to do it by itself. Not everybody runs
> 2to3 through a setup.py script.

For the 2to3 command line tool, the issue is where it shall place the
output. It currently supports writing diffs to stdout (without saving
any conversion result), and overwriting the original file (which means
that it loses the original files).

So before you try to consider incremental output, you need to consider
original-preserving saves first.
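
One possible shape for an original-preserving save is sketched below, under the assumption that converted files are mirrored into a separate output tree. The `convert` argument is a placeholder for whatever rewrites the source text. It is not a real 2to3 entry point, and `save_converted` is a made-up name.

```python
import os


def save_converted(source_path, source_root, output_root, convert):
    """Write convert(text) under output_root, leaving source_path untouched.

    `convert` is a placeholder callable taking and returning source text.
    """
    relative = os.path.relpath(source_path, source_root)
    target = os.path.join(output_root, relative)
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    with open(source_path, encoding="utf-8") as handle:
        text = handle.read()
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(convert(text))
    return target
```

With the output kept in its own tree, each source file has a stable counterpart whose timestamp an incremental run could compare against.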

Regards,
Martin


Re: [Python-Dev] readd u'' literal support in 3.3?

2011-12-11 Thread Martin v. Löwis
> Even in the plans that involve 2to3
> though, "drop everything prior to 2.6" was always supposed to be step 0,
> so "single codebase" adds much less of a burden than I thought.

Are you talking about general porting, or about Twisted?

It is a common misconception that "drop everything prior to 2.6" was
a recommended step 0 for porting to Python 3. That was never
recommended.

Instead, what *was* recommended is "port to Python 2.6", which for many
projects already supporting, say, 2.5, was a no-op, so people read more
into that than was actually necessary. With the project ported to 2.6,
you could then make use of the 3k warnings to learn what issues you
would face when porting to 3k.

Regards,
Martin


Re: [Python-Dev] cpython: Document PyUnicode_Copy() and PyUnicode_EncodeCodePage()

2011-12-11 Thread Victor Stinner
On Friday, 9 December 2011 at 20:32:16, Antoine Pitrou wrote:
> ... it's a bit obscure why the function exists.

Yeah ok, I marked the function as private: I renamed it to _PyUnicode_Copy()
and undocumented it.

Victor


Re: [Python-Dev] Omission in re.sub?

2011-12-11 Thread Guido van Rossum
On Sun, Dec 11, 2011 at 2:36 PM, MRAB pyt...@mrabarnett.plus.com wrote:
> A pattern can contain them, even octal escapes (must be 3 digits).

Fine, then I think we should model this. Though I think that we could
start deprecating octal escapes in patterns so that eventually we can
support over 99 backreferences. So maybe we should just not start
supporting octal in the substitution string now.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Fixing the XML batteries

2011-12-11 Thread Ethan Furman

Martin,

You seem heavily invested in minidom.

In the near future I will need to parse and rewrite parts of an XML file
created by a third-party program (PrintShopMail, for the curious).

It contains both binary and textual data.

Would you recommend minidom for this purpose?  What other purposes would 
you recommend minidom for?


xml-confused-ly yours,

~Ethan~

(Comments by others are, of course, also welcome. :)