Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Terry Reedy

On 8/29/2011 9:00 AM, Barry Warsaw wrote:

On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:


A PEP should IMO only cover end-user aspects of the new re module.
Code organization is typically not in the PEP. To give a specific
example: you mentioned that there is (near) code duplication in
MRAB's module. As a reviewer, I would discuss whether this can be
eliminated - but not in the PEP.


+1


I think at this point we need a tracker issue to which such reviews can
be attached, for safe-keeping, even if most discussion continues here.


--
Terry Jan Reedy


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 27, 2011, at 01:15 PM, Ben Finney wrote:

>My question is directed more to M-A Lemburg's passage above, and its
>implicit assumption that the user understand the changes between
>“Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
>needs relate to those semantics.

More likely, it'll be a choice between wanting Unicode 6 semantics, and "don't
care".  So the PEP could include some clues as to why you'd care to use regex
instead of re.

-Barry


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 26, 2011, at 05:25 PM, Dan Stromberg wrote:

>from __future__ import is an established way of trying something for a while
>to see if it's going to work.

Actually, no.

The documentation says:

-snip snip-
__future__ is a real module, and serves three purposes:

* To avoid confusing existing tools that analyze import statements and expect
  to find the modules they’re importing.
* To ensure that future statements run under releases prior to 2.1 at least
  yield runtime exceptions (the import of __future__ will fail, because there
  was no module of that name prior to 2.1).
* To document when incompatible changes were introduced, and when they will be
  — or were — made mandatory. This is a form of executable documentation, and
  can be inspected programmatically via importing __future__ and examining its
  contents.
-snip snip-

So, really the __future__ module is a way to introduce accepted but
incompatible changes in a controlled way, through successive releases.  It's
never been used to introduce experimental features that might be removed if
they don't work out.

Cheers,
-Barry


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Barry Warsaw
On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:

>A PEP should IMO only cover end-user aspects of the new re module.
>Code organization is typically not in the PEP. To give a specific
>example: you mentioned that there is (near) code duplication in
>MRAB's module. As a reviewer, I would discuss whether this can be
>eliminated - but not in the PEP.

+1

-Barry



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-29 Thread Ezio Melotti
On Sun, Aug 28, 2011 at 7:28 AM, Guido van Rossum  wrote:

>
> Are you volunteering? (Even if you don't want to be the only
> maintainer, it still sounds like you'd be a good co-maintainer of the
> regex module.)
>

My name is listed in the experts index for 're' [0], and that should make me
already "co-maintainer" for the module.


> [...]
>
> >   4) add documentation for the module and the (public) functions in
> > Doc/library (this should be done anyway).
>
> Does regex have a significant public C interface? (_sre.c doesn't.)
> Does it have a Python-level interface beyond what re.py offers (apart
> from the obvious new flags and new regex syntax/semantics)?
>

I don't think it does.
Explaining the new syntax/semantics is useful for developers (e.g. what \p
and \X are supposed to match), but also for users, so it's fine to have this
documented in Doc/library/re.rst (and I don't think it's necessary to
duplicate it in the README/PEP/Wiki).
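
As a rough illustration of what such documentation would need to explain, here is a sketch that uses only the standard unicodedata module rather than any regex engine. The helper names are made up for this example, and the "cluster" count is a deliberate simplification (base characters followed by combining marks), not the full grapheme cluster rules that \X is meant to follow:

```python
import unicodedata


def has_category(ch, category):
    """True if the code point ch has the given general category, e.g. 'Lu'.

    This is roughly the question a pattern such as \\p{Lu} asks of each
    character it tries to match.
    """
    return unicodedata.category(ch) == category


def count_base_characters(text):
    """Count code points that are not combining marks.

    Simplified stand-in for counting \\X matches: a base character plus
    its combining marks is treated as one user-perceived character.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(has_category("\u00c9", "Lu"))    # LATIN CAPITAL LETTER E WITH ACUTE
print(len(decomposed), count_base_characters(decomposed))
```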


>
> > This will ensure that the general quality of the code is good, and when
> > someone actually has to work on the code, there's enough documentation to
> > make it possible.
>
> That sounds like a good description of a process that could lead to
> acceptance of regex as a re replacement.
>
>
So if we want to get this done I think we need Matthew for 1) (unless
someone else wants to do it and have him review the result).
If making a diff with the current re is doable and makes sense, we can use
the rietveld instance on the bug tracker to make the review for 2).  The
same could be done with a diff that replaces the whole module though.
3) will follow after 2), and 4) is not difficult and can be done when we
actually replace re (it's probably enough to reorganize a bit and convert to
rst the page on PyPI).

Best Regards,
Ezio Melotti

[0]: http://docs.python.org/devguide/experts.html#stdlib


[Python-Dev] Should we move to replace re with regex?

2011-08-28 Thread Guido van Rossum
Someone asked me off-line what I wanted besides talk. Here's the list
I came up with:

You could, for instance, volunteer to do a thorough code review of
the regex code, trying to think of ways to break it (e.g. bad syntax
or extreme use of nesting etc., or bad data). Or you could volunteer
to maintain it in the future. Or you could try to port it to PEP 393.
Or you could systematically go over the given list of differences
between re and regex and decide whether they are likely to be
backwards incompatibilities that will break existing code. Or you
could try to add some of the functionality requested by Tom C in one
of his several bugs.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-28 Thread Nick Coghlan
On Sun, Aug 28, 2011 at 2:28 PM, Guido van Rossum  wrote:
> On Sat, Aug 27, 2011 at 8:59 PM, Ezio Melotti  wrote:
>> I think it would be good to:
>>   1) have some document that explains the general design and main (internal)
>> functions of the module (e.g. a PEP);
>
> I don't think that such a document needs to be a PEP; PEPs are usually
> intended where there is significant discussion expected, not just to
> explain things. A README file or a Wiki page would be fine, as long as
> it's sufficiently comprehensive.

timsort.txt and dictnotes.txt may be useful precedents for the kind of
thing that is useful on that front. IIRC, the pymalloc stuff has a
massive embedded comment, which can also work.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Terry Reedy

On 8/27/2011 11:54 PM, Guido van Rossum wrote:


If so, it would be like the decimal
module, which closely tracks the IEEE decimal standard, rather than the
binary float standard.


Well, I would hope that for each "major" Python version (i.e. 3.2,
3.3, 3.4, ...) we would pick a specific version of the Unicode
standard and declare our desire to be compliant with that Unicode
standard version, and not switch allegiances in some bugfix version
(e.g. 3.2.3, 3.3.1, ...).


Definitely. The unicode version would have to be frozen with beta 1 if 
not before. (I am quite sure the decimal module also freezes the IEEE 
standard version *it* follows for each Python version.)


In my view, x.y is a version of the Python language while the x.y.z 
CPython releases are progressively better implementations of that one 
language, starting with x.y.0. This is the main reason I suggested that 
the first CPython release for the 3.3 language be called 3.3.0, as it 
now is. In this view, there is no question of an x.y.z+1 release 
changing the definition of the x.y language.


--
Terry Jan Reedy



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Guido van Rossum
On Sat, Aug 27, 2011 at 8:59 PM, Ezio Melotti  wrote:
> On Sat, Aug 27, 2011 at 4:56 AM, Antoine Pitrou  wrote:
>>
>> On Sat, 27 Aug 2011 04:37:21 +0300
>> Ezio Melotti  wrote:
>> >
>> > I'm not sure it's worth doing an extensive review of the code, a better
>> > approach might be to require extensive test coverage  (and a review of
>> > tests).  If the code seems well written, commented, documented (I think
>> > proper rst documentation is still missing),
>>
>> Isn't this precisely what a review is supposed to assess?
>
> This can be done without actually knowing and understanding every single
> function in the module (I got the impression that someone wants this kind of
> review, correct me if I'm wrong).

Wasn't me. I've long given up expecting to understand every line of
code in CPython. I'm happy if the code is written in a way that makes
it possible to read and understand it as the need arises.

>> > We will get familiar with the code once we start contributing
>> > to it and fixing bugs, as it already happens with most of the other
>> > modules.
>>
>> I'm not sure it's a good idea for a module with more than 1 lines
>> of C code (and 4000 lines of pure Python code). This is several times
>> the size of multiprocessing. The C code looks very cleanly written, but
>> it's still a big chunk of algorithmically sophisticated code.
>
> Even unicodeobject.c is 10k+ lines of C code and I got familiar with (parts
> of) it just by fixing bugs in specific functions.
> I took a look at the regex code and it seems clear, with enough comments and
> several small functions that are easy to follow and understand.
> multiprocessing requires good knowledge of a number of concepts and
> platform-specific issues that make it more difficult to understand and
> maintain (but maybe regex-related concepts seem easier to me because I'm
> already familiar with them).

Are you volunteering? (Even if you don't want to be the only
maintainer, it still sounds like you'd be a good co-maintainer of the
regex module.)

> I think it would be good to:
>   1) have some document that explains the general design and main (internal)
> functions of the module (e.g. a PEP);

I don't think that such a document needs to be a PEP; PEPs are usually
intended where there is significant discussion expected, not just to
explain things. A README file or a Wiki page would be fine, as long as
it's sufficiently comprehensive.

>   2) make a review on rietveld (possibly only of the diff with re, to limit
> the review to the new code only), so that people can ask questions, discuss
> and understand the code;

That would be an interesting exercise indeed.

>   3) possibly update the document/PEP with the outcome of the rietveld
> review(s) and/or address the issues discussed (if any);

Yeah, of course.

>   4) add documentation for the module and the (public) functions in
> Doc/library (this should be done anyway).

Does regex have a significant public C interface? (_sre.c doesn't.)
Does it have a Python-level interface beyond what re.py offers (apart
from the obvious new flags and new regex syntax/semantics)?

> This will ensure that the general quality of the code is good, and when
> someone actually has to work on the code, there's enough documentation to
> make it possible.

That sounds like a good description of a process that could lead to
acceptance of regex as a re replacement.

>> Another "interesting" question is whether it's easy to port to the PEP
>> 393 string representation, if it gets accepted.

It's very likely that PEP 393 is accepted. So likely, in fact, that I
would recommend that you start porting regex to PEP 393 now. The
experience would benefit both your understanding of the regex module
and the quality of the PEP and its implementation.

I like what I hear here!

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Ezio Melotti
On Sat, Aug 27, 2011 at 4:56 AM, Antoine Pitrou  wrote:

> On Sat, 27 Aug 2011 04:37:21 +0300
> Ezio Melotti  wrote:
> >
> > I'm not sure it's worth doing an extensive review of the code, a better
> > approach might be to require extensive test coverage  (and a review of
> > tests).  If the code seems well written, commented, documented (I think
> > proper rst documentation is still missing),
>
> Isn't this precisely what a review is supposed to assess?
>

This can be done without actually knowing and understanding every single
function in the module (I got the impression that someone wants this kind of
review, correct me if I'm wrong).


>
> > We will get familiar with the code once we start contributing
> > to it and fixing bugs, as it already happens with most of the other
> modules.
>
> I'm not sure it's a good idea for a module with more than 1 lines
> of C code (and 4000 lines of pure Python code). This is several times
> the size of multiprocessing. The C code looks very cleanly written, but
> it's still a big chunk of algorithmically sophisticated code.
>

Even unicodeobject.c is 10k+ lines of C code and I got familiar with (parts
of) it just by fixing bugs in specific functions.
I took a look at the regex code and it seems clear, with enough comments and
several small functions that are easy to follow and understand.
multiprocessing requires good knowledge of a number of concepts and
platform-specific issues that make it more difficult to understand and
maintain (but maybe regex-related concepts seem easier to me because I'm
already familiar with them).

I think it would be good to:
  1) have some document that explains the general design and main (internal)
functions of the module (e.g. a PEP);
  2) make a review on rietveld (possibly only of the diff with re, to limit
the review to the new code only), so that people can ask questions, discuss
and understand the code;
  3) possibly update the document/PEP with the outcome of the rietveld
review(s) and/or address the issues discussed (if any);
  4) add documentation for the module and the (public) functions in
Doc/library (this should be done anyway).

This will ensure that the general quality of the code is good, and when
someone actually has to work on the code, there's enough documentation to
make it possible.

Best Regards,
Ezio Melotti


>
> Another "interesting" question is whether it's easy to port to the PEP
> 393 string representation, if it gets accepted.
>
> Regards
>
> Antoine.
>
>


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Guido van Rossum
On Sat, Aug 27, 2011 at 5:48 PM, Terry Reedy  wrote:
> Many of the things regex does differently might be called either bug fixes
> or feature changes, depending on one's viewpoint. Regex should definitely
> not be 'bug-compatible'.

Well, as you said, it depends on one's viewpoint. If there's a bug in
the treatment of non-BMP character ranges, that's a bug, and fixing it
shouldn't break anybody's code (unless it was worth breaking :-). But
if there's a change that e.g. (hypothetical example) makes a different
choice about how empty matches are treated in some edge case, and the
old behavior was properly documented, that's a feature change, and I'd
rather introduce a flag to select the new behavior (or, if we have to,
a flag to preserve the old behavior, if the new behavior is really
considered much better and much more useful).
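
To make the hypothetical concrete, here is a minimal sketch of an opt-in switch. It is not how re or regex implement splitting; the split() function and its new_behavior parameter are invented for this illustration, and it is built only on re.finditer:

```python
import re


def split(pattern, string, new_behavior=False):
    """Split string on pattern, with an opt-in switch for empty matches.

    new_behavior=False: empty (zero-width) matches never split, standing
    in for a documented old behavior.
    new_behavior=True: empty matches split as well.
    """
    pieces = []
    last = 0
    for match in re.finditer(pattern, string):
        if match.start() == match.end() and not new_behavior:
            continue  # old behavior: ignore zero-width matches
        pieces.append(string[last:match.start()])
        last = match.end()
    pieces.append(string[last:])
    return pieces


print(split(r"\b", "a b"))                     # old behavior
print(split(r"\b", "a b", new_behavior=True))  # opted in
print(split(",", "a,b"))                       # unaffected either way
```

Existing callers keep the result they were promised, and only code that asks for the new semantics sees a difference, which is the property a compatibility flag is meant to provide.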

> I think regex should be unicode-standard compliant as much as possible, and
> let the chips fall where they may.

In most cases the Unicode improvements in regex are not where it is
incompatible; e.g. adding \X and named ranges are fine new additions
and IIUC the syntax was carefully designed not to introduce any
incompatibilities (within the limitations of \-escapes).

It's the many other "improvements" to the regex module that sometimes
make it incompatible. There's a comprehensive list here:
http://pypi.python.org/pypi/regex . Somebody should just go over it
and for each difference make a recommendation for whether to treat
this as a bugfix, a compatible new feature, or an incompatibility that
requires some kind of flag. (We could have a single flag for all
incompatibilities, or several flags.)

> If so, it would be like the decimal
> module, which closely tracks the IEEE decimal standard, rather than the
> binary float standard.

Well, I would hope that for each "major" Python version (i.e. 3.2,
3.3, 3.4, ...) we would pick a specific version of the Unicode
standard and declare our desire to be compliant with that Unicode
standard version, and not switch allegiances in some bugfix version
(e.g. 3.2.3, 3.3.1, ...).
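
Which Unicode version a given interpreter's character database follows can be read at run time. A small sketch, assuming CPython with the standard unicodedata module; the unicode_pin() helper is a name invented here:

```python
import sys
import unicodedata


def unicode_pin():
    """Return (Python 'x.y' version string, Unicode database version string)."""
    python_xy = "%d.%d" % sys.version_info[:2]
    return python_xy, unicodedata.unidata_version


print("Python %s ships Unicode data %s" % unicode_pin())
```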

> Regex is already much more compliant than re, as shown by Tom Christiansen.

Nobody disagrees with this or thinks it's a bad thing. :-)

> This is pretty obviously intentional on MB's part.

That's also clear.

> It is also probably intentional that re *not* match today's Unicode
> TR18 specifications.

That I'm not so sure of. I think it's more the case that TR18 evolved
and that the re modules didn't -- probably mostly because nobody had
the time and nobody was aware of the TR18 changes.

> These are reasons why both Ezio and I suggested on the tracker adding regex
> without deleting re. (I personally would not mind just replacing re with
> regex, but then I have no legacy re code to break. So I am not suggesting
> that out of respect for those who do.)

That option is definitely still on the table. At the very least a
thorough review of the stated differences between re and regex should
be done -- I trust that MR has been very thorough in his listing of
those differences. The issues regarding maintenance and stability of
MR's code can be solved in a number of ways -- if MR doesn't mind I
would certainly be willing to give him core committer access (though
I'd still recommend that he use his time primarily to train others in
maintaining this important code base).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Ezio Melotti
On Sun, Aug 28, 2011 at 3:48 AM, Terry Reedy  wrote:

>
> These are reasons why both Ezio and I suggested on the tracker adding regex
> without deleting re. (I personally would not mind just replacing re with
> regex, but then I have no legacy re code to break. So I am not suggesting
> that out of respect for those who do.)
>

I would actually prefer to replace re.

Before doing that we should make a list of all the differences between the
two modules (possibly in the PEP).  On the regex page on PyPI there's
already a list that can be used for this purpose [0].
For bug fixes it *shouldn't* be a problem if the behavior changes.  New
features shouldn't bring any backward-incompatible behavioral changes, and,
as far as I understand, Matthew introduced the NEW flag [1], to avoid
problems when they do.

I think re should be kept around only if there are too many
incompatibilities left and if they can't be fixed in regex.

Best Regards,
Ezio Melotti


[0]: http://pypi.python.org/pypi/regex/0.1.20110717
[1]: "The NEW flag turns on the new behaviour of this module, which can
differ from that of the 're' module, such as splitting on zero-width
matches, inline flags affecting only what follows, and being able to turn
inline flags off."


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Terry Reedy

On 8/27/2011 7:39 PM, Greg Ewing wrote:

Nick Coghlan wrote:


The next step needed is for someone to volunteer to write and champion
a PEP that:


Would it be feasible and desirable to modify regex so
that it *is* backwards-compatible with re, with a view
to making it a drop-in replacement at some point?

If not, the PEP should discuss this also.


Many of the things regex does differently might be called either bug 
fixes or feature changes, depending on one's viewpoint. Regex should 
definitely not be 'bug-compatible'.


I think regex should be unicode-standard compliant as much as possible, 
and let the chips fall where they may. If so, it would be like the 
decimal module, which closely tracks the IEEE decimal standard, rather 
than the binary float standard. Regex is already much more compliant 
than re, as shown by Tom Christiansen. This is pretty obviously 
intentional on MB's part. It is also probably intentional that re *not* 
match today's Unicode TR18 specifications.


These are reasons why both Ezio and I suggested on the tracker adding 
regex without deleting re. (I personally would not mind just replacing 
re with regex, but then I have no legacy re code to break. So I am not 
suggesting that out of respect for those who do.)


--
Terry Jan Reedy



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Greg Ewing

Nick Coghlan wrote:


The next step needed is for someone to volunteer to write and champion
a PEP that:


Would it be feasible and desirable to modify regex so
that it *is* backwards-compatible with re, with a view
to making it a drop-in replacement at some point?

If not, the PEP should discuss this also.

--
Greg


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Steven D'Aprano

Dan Stromberg wrote:

On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin wrote:


On Sat, Aug 27, 2011 at 11:48, Dan Stromberg  wrote:

No, this was not the intent of __future__. The intent is that a

feature is desirable but also backwards incompatible (e.g. introduces
a new keyword) so that for 1 (sometimes more) releases we require the
users to use the __future__ import.

There was never any intent to use __future__ for experimental
features. If we want that maybe we could have from __experimental__
import .

OK.  So what -is- the purpose of from __future__ import?

It's in the first paragraph.



I disagree.  The first paragraph says this has something to do with new
keywords.  It doesn't appear to say what we expect users to -do- with it.
Both are important.


Have you read the PEP? I found it very helpful.

http://www.python.org/dev/peps/pep-0236/

The motivation given in the first paragraph is pretty clear to me: 
__future__ is machinery added to Python to aid the transition when a 
backwards incompatible change is made.


Perhaps it needs a note stating explicitly that it is not for trying out 
new features which may or may not be added at a later date. That may 
help prevent confusion in the, er, future.



[...]

And if people do complain, what are python-dev's options?


The PEP includes a question very similar to that:


  Q: Going back to the nested_scopes example, what if release 2.2
 comes along and I still haven't changed my code?  How can I keep
 the 2.1 behavior then?

  A: By continuing to use 2.1, and not moving to 2.2 until you do
 change your code.  The purpose of future_statement is to make
 life easier for people who keep current with the latest release
 in a timely fashion.  We don't hate you if you don't, but your
 problems are much harder to solve, and somebody with those
 problems will need to write a PEP addressing them.
 future_statement is aimed at a different audience.


To me, it's quite clear: once a feature change hits __future__, it is 
already part of the language. It may be an optional part for at least 
one release, but removing it again will require the same deprecation 
process as removing any other language feature (see PEP 5 for more details).




--
Steven



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Martin v. Löwis
> I disagree.  The first paragraph says this has something to do with new
> keywords.  It doesn't appear to say what we expect users to -do- with
> it.  Both are important.

Well, users can use the new features...

> Is it "You'd better try this, because it's going in eventually.  If you
> don't try it out before it becomes default behavior, you have no right
> to complain"?

No. It's "we have that feature which will be activated in a future
version. If you want to use it today, use the __future__ import. If
you don't want to use it (now or in the future), just don't."

> And if people do complain, what are python-dev's options?

That will depend on the complaint. If it's "I don't like the new
feature", then the obvious response is "don't use it, then".

Regards,
Martin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Virgil Dupras
On 2011-08-27, at 2:20 PM, Dan Stromberg wrote:

> 
> On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin  wrote:
> On Sat, Aug 27, 2011 at 11:48, Dan Stromberg  wrote:
> No, this was not the intent of __future__. The intent is that a
> feature is desirable but also backwards incompatible (e.g. introduces
> a new keyword) so that for 1 (sometimes more) releases we require the
> users to use the __future__ import.
> 
> There was never any intent to use __future__ for experimental
> features. If we want that maybe we could have from __experimental__
> import .
> 
> OK.  So what -is- the purpose of from __future__ import?
> 
> It's in the first paragraph. 
> 
> I disagree.  The first paragraph says this has something to do with new 
> keywords.  It doesn't appear to say what we expect users to -do- with it.  
> Both are important.
> 
> Is it "You'd better try this, because it's going in eventually.  If you don't 
> try it out before it becomes default behavior, you have no right to complain"?
> 
> And if people do complain, what are python-dev's options?
> 

__future__ imports have nothing to do with "trying stuff before it comes", it 
has to do with backward compatibility. For example, the "with_statement" was a 
__future__ import because introducing the "with" keyword would break any code 
using "with" as a token. I don't think that the goal of introducing "with" as a 
future import was "we're gonna see how it pans out, and decide if we really 
introduce it later".

__future__ means "It's coming, prepare your code".
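
A quick way to see this, sketched under the assumption of a Python 3 interpreter: a future import for a feature that has long since become the default still compiles, while a name that was never a future feature is rejected at compile time. The compiles() helper and the name no_such_feature are made up for the example:

```python
def compiles(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Code "prepared" years ago keeps compiling after the feature became default.
prepared = "from __future__ import with_statement\n"
# A feature that was never accepted has no entry to import.
unknown = "from __future__ import no_such_feature\n"

print(compiles(prepared))
print(compiles(unknown))
```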


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Dan Stromberg
On Sat, Aug 27, 2011 at 9:53 AM, Brian Curtin wrote:

> On Sat, Aug 27, 2011 at 11:48, Dan Stromberg  wrote:
>>
>> No, this was not the intent of __future__. The intent is that a
>>> feature is desirable but also backwards incompatible (e.g. introduces
>>> a new keyword) so that for 1 (sometimes more) releases we require the
>>> users to use the __future__ import.
>>>
>>> There was never any intent to use __future__ for experimental
>>> features. If we want that maybe we could have from __experimental__
>>> import .
>>>
>>> OK.  So what -is- the purpose of from __future__ import?
>>
>
> It's in the first paragraph.
>

I disagree.  The first paragraph says this has something to do with new
keywords.  It doesn't appear to say what we expect users to -do- with it.
Both are important.

Is it "You'd better try this, because it's going in eventually.  If you
don't try it out before it becomes default behavior, you have no right to
complain"?

And if people do complain, what are python-dev's options?


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Martin v. Löwis
Am 27.08.2011 12:10, schrieb Antoine Pitrou:
> On Sat, 27 Aug 2011 08:02:31 +0200
> "Martin v. Löwis"  wrote:
>>> I'm not sure it's worth doing an extensive review of the code, a better
>>> approach might be to require extensive test coverage  (and a review of
>>> tests).
>>
>> I think it's worth it. It's really bad if only one developer fully
>> understands the regex implementation.
> 
> Could such a review be the topic of an informational PEP?

Well, the reviewer would also have to dive into the code details,
e.g. through Rietveld. Of course, referencing the Rietveld issue in
the PEP might be appropriate.

A PEP should IMO only cover end-user aspects of the new re module.
Code organization is typically not in the PEP. To give a specific
example: you mentioned that there is (near) code duplication in
MRAB's module. As a reviewer, I would discuss whether this can be
eliminated - but not in the PEP.

Regards,
Martin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Brian Curtin
On Sat, Aug 27, 2011 at 11:48, Dan Stromberg  wrote:
>> No, this was not the intent of __future__. The intent is that a
>> feature is desirable but also backwards incompatible (e.g. introduces
>> a new keyword) so that for 1 (sometimes more) releases we require the
>> users to use the __future__ import.
>>
>> There was never any intent to use __future__ for experimental
>> features. If we want that maybe we could have from __experimental__
>> import .
>
> OK.  So what -is- the purpose of from __future__ import?

It's in the first paragraph.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Dan Stromberg
On Sat, Aug 27, 2011 at 9:19 AM, Guido van Rossum  wrote:

> On Fri, Aug 26, 2011 at 11:01 PM, Dan Stromberg 
> wrote:
> [Steven]
> >> Have there been any __future__ features that were added provisionally?
> >
> > I can't either, but ISTR hearing that from __future__ import was started
> > with such an intent.  Irrespective, it's hard to import something from
> > "future" without at least suspecting that you're on the bleeding edge.
>
> No, this was not the intent of __future__. The intent is that a
> feature is desirable but also backwards incompatible (e.g. introduces
> a new keyword) so that for 1 (sometimes more) releases we require the
> users to use the __future__ import.
>
> There was never any intent to use __future__ for experimental
> features. If we want that maybe we could have from __experimental__
> import .
>
OK.  So what -is- the purpose of from __future__ import?


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Guido van Rossum
On Fri, Aug 26, 2011 at 11:01 PM, Dan Stromberg  wrote:
[Steven]
>> Have there been any __future__ features that were added provisionally?
>
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent.  Irrespective, it's hard to import something from
> "future" without at least suspecting that you're on the bleeding edge.

No, this was not the intent of __future__. The intent is that a
feature is desirable but also backwards incompatible (e.g. introduces
a new keyword) so that for 1 (sometimes more) releases we require the
users to use the __future__ import.

There was never any intent to use __future__ for experimental
features. If we want that maybe we could have from __experimental__
import .

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread exarkun

On 26 Aug, 09:45 pm, gu...@python.org wrote:
> I just made a pass of all the Unicode-related bugs filed by Tom
> Christiansen, and found that in several, the response was "this is
> fixed in the regex module [by Matthew Barnett]". I started replying
> that I thought that we should fix the bugs in the re module (i.e.,
> really in _sre.c) but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3. It would mean that we won't
> fix any of these bugs in earlier Python versions, but I could live
> with that.
>
> However, I don't know much about regex -- how compatible is it, how
> fast is it (including extreme cases where the backtracking goes
> crazy), how bug-free is it, and so on. Plus, how much work would it be
> to actually incorporate it into CPython as a complete drop-in
> replacement of the re package (such that nobody needs to change their
> imports or the flags they pass to the re module).
>
> We'd also probably have to train some core developers to be familiar
> enough with the code to maintain and evolve it -- I assume we can't
> just volunteer Matthew to do so forever... :-)
>
> What's the alternative? Is adding the requested bug fixes and new
> features to _sre.c really that hard?


What about other Python implementations (ie, PEP 399)?  For this to be 
seriously considered, shouldn't there also be a pure Python 
implementation of the functionality?


Jean-Paul


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Antoine Pitrou
On Sat, 27 Aug 2011 08:02:31 +0200
"Martin v. Löwis"  wrote:
> > I'm not sure it's worth doing an extensive review of the code, a better
> > approach might be to require extensive test coverage  (and a review of
> > tests).
> 
> I think it's worth it. It's really bad if only one developer fully
> understands the regex implementation.

Could such a review be the topic of an informational PEP?

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Antoine Pitrou
On Sat, 27 Aug 2011 09:18:14 +0200
"Martin v. Löwis"  wrote:
> Am 27.08.2011 08:33, schrieb Terry Reedy:
> > On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> > 
> >> Another "interesting" question is whether it's easy to port to the PEP
> >> 393 string representation, if it gets accepted.
> > 
> > Will the re module need porting also?
> 
> That's a quality-of-implementation issue (in both cases). In principle,
> the modules should continue to work unmodified, and indeed SRE does.
> However, the module will then match on Py_UNICODE, which may be
> expensive to produce, and may not meet your expectations of surrogate
> pair handling.
> 
> So realistically, the module should be ported, which has the challenge
> that matching needs to operate on three different representations. The
> modules already support two representations (unsigned char and
> Py_UNICODE), but probably switching on type, not on state.

From what I've seen, re generates two different sets of functions at
compile-time (with a stringlib-like approach), while regex has a
run-time flag to choose between the two representations (where,
interestingly, the two code paths are explicitly spelled, almost
duplicate of each other).
Matthew, please correct me if I'm wrong.

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Nick Coghlan
On Sat, Aug 27, 2011 at 4:01 PM, Dan Stromberg  wrote:
> You're talking technically, which is important, but wasn't what I was
> suggesting would be helped.
>
> Politically, and from a marketing standpoint, it's easier to withdraw a
> feature you've given with a "Play with this, see if it works for you"
> warning.

The standard library isn't for playing. "pip install regex" is for
playing. If we aren't sure we want to make the transition, then it
doesn't go in.

However, to my mind, reviewing and incorporating regex is a far more
feasible model than trying to enhance the existing re module with a
comparable feature set. At the moment, there's already an obvious way
to get enhanced regex support in Python: install regex and use it
instead of the standard library's re module. That's enough to pretty
much kill any motivation anyone might have to make major changes to re
itself.

We're at least getting one thing right this time that we got wrong
with multiprocessing, though - we're much, much further out from the
3.3 release than we were from the 2.6 release when multiprocessing was
added to the standard library :)

The next step needed is for someone to volunteer to write and champion
a PEP that:
- articulates the deficiencies in the current re module (the regex
docs already cover some of this, as do Tom Christiansen's notes on the
issue tracker)
- explains why upgrading re in place is not feasible (e.g. noting that
the availability of regex really limits the desire for anyone to
reinvent that particular wheel, so even things that are theoretically
possible may be highly unlikely in practice)
- proposes a transition plan (personally, I'd be fine with an optparse
-> argparse style transition where re remains around indefinitely to
support legacy code, but new users are pointed towards regex. But
depending on compatibility details, merging the two APIs in the
existing re namespace may also be feasible)
- proposes a maintenance strategy (I don't know how much Matthew has
written regarding internal design details, but that kind of thing
could really help. Matthew agreeing to continue maintenance as part of
the standard library would also help a great deal, but wouldn't be
enough on its own - while it's good for modules to have active
maintainers to make the final call on associated design decisions, it's
potentially problematic when other core developers don't understand
what the code is doing well enough to fix bugs in it)
- confirms that the regex test suite can be incorporated cleanly into
the standard library regression test suite (the difficulty of this was
something that was underestimated for the inclusion of
multiprocessing. Test suite integration is also the final sticking
point holding up the PEP 380 'yield from' patch, although that's close
to being resolved following the PyConAU sprints)
- documents tests conducted (e.g. micro-benchmark results, fusil results)

PEP 371 (addition of multiprocessing), PEP 389 (addition of argparse)
and Jesse's reflections on the way multiprocessing was added
(http://jessenoller.com/2009/01/28/multiprocessing-in-hindsight/) are
well worth reading for anyone considering stepping up to write a PEP.
That last also highlights why even Matthew's support, however capably
he has handled maintenance of regex as an independent project,
wouldn't be enough - we had Richard Oudkerk's support and agreement to
continue maintenance as the original author of multiprocessing, but he
became unavailable early in the integration process. If Jesse hadn't
been able to take up most of that slack, the likely result would have
been reversion of the changes and removal of multiprocessing from the
2.6 release.

Writing PEPs can be quite a frustrating experience (since a lot of
feedback will be negative as people try to poke holes in the idea to
see if it stands up to close scrutiny), but it's also really
satisfying and rewarding if they end up getting accepted and
incorporated :)

>> Have there been any __future__ features that were added provisionally?
>
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent.  Irrespective, it's hard to import something from
> "future" without at least suspecting that you're on the bleeding edge.

No, we make an explicit guarantee that future imports will never go
away once they've been added. They may become redundant, but they
won't break. There's no provision in the future mechanism for changes
that are added and then later removed (see
http://docs.python.org/dev/library/__future__).

They're strictly for cases where backwards incompatibilities (usually,
but not always, new keywords) may break existing code.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-27 Thread Martin v. Löwis
Am 27.08.2011 08:33, schrieb Terry Reedy:
> On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> 
>> Another "interesting" question is whether it's easy to port to the PEP
>> 393 string representation, if it gets accepted.
> 
> Will the re module need porting also?

That's a quality-of-implementation issue (in both cases). In principle,
the modules should continue to work unmodified, and indeed SRE does.
However, the module will then match on Py_UNICODE, which may be
expensive to produce, and may not meet your expectations of surrogate
pair handling.

So realistically, the module should be ported, which has the challenge
that matching needs to operate on three different representations. The
modules already support two representations (unsigned char and
Py_UNICODE), but probably switching on type, not on state.
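
To make "switching on state" concrete, here is a rough Python sketch,
not code from either module. It assumes the PEP 393 rule that a string
is stored with 1, 2 or 4 bytes per code point depending on its widest
character; storage_width, make_search and search are hypothetical
names, and the real work would of course be done in C.

```python
def storage_width(text):
    """Bytes per code point a PEP 393 style string would need for text."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1
    if widest < 0x10000:
        return 2
    return 4


def make_search(width):
    """Build a literal search nominally specialised for one storage width."""
    limit = 1 << (8 * width)

    def search(haystack, needle):
        # A needle with a code point too wide for this representation
        # cannot occur in the haystack, so the scan can be skipped.
        if any(ord(ch) >= limit for ch in needle):
            return -1
        return haystack.find(needle)

    return search


# One code path per representation, chosen by the state of the string
# at run time rather than by its type.
SEARCH_BY_WIDTH = {width: make_search(width) for width in (1, 2, 4)}


def search(haystack, needle):
    return SEARCH_BY_WIDTH[storage_width(haystack)](haystack, needle)
```

The point of the sketch is only the dispatch: the same str type can
arrive in any of three layouts, so the choice cannot be made from the
argument type alone.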

Regards,
Martin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Terry Reedy

On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> Another "interesting" question is whether it's easy to port to the PEP
> 393 string representation, if it gets accepted.


Will the re module need porting also?

--
Terry Jan Reedy



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Martin v. Löwis
> I can't either, but ISTR hearing that from __future__ import was started
> with such an intent. 

No, not at all. The original intention was to enable features that
definitely would be added, just not right now. Tim Peters always
objected to claims that future imports were talking about provisional
features.

> Politically, and from a marketing standpoint, it's easier to withdraw
> a feature you've given with a "Play with this, see if it works for
> you" warning.

We don't want to add features to Python that we may have to withdraw.
If there is doubt whether they should be added, they shouldn't be added.
If they do get added, we have to live with it (until, say, Python 4,
where bad features can be removed again).

Regards,
Martin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Martin v. Löwis
> I'm not sure it's worth doing an extensive review of the code, a better
> approach might be to require extensive test coverage  (and a review of
> tests).

I think it's worth it. It's really bad if only one developer fully
understands the regex implementation.

Regards,
Martin



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Dan Stromberg
On Fri, Aug 26, 2011 at 8:47 PM, Steven D'Aprano wrote:

> Antoine Pitrou wrote:
>
>> On Fri, 26 Aug 2011 17:25:56 -0700
>> Dan Stromberg  wrote:
>>
>>> If you add regex as "import regex", and the new regex module doesn't work
>>> out, regex might be harder to get rid of.  from __future__ import is an
>>> established way of trying something for a while to see if it's going to
>>> work.
>>>
>>
>> That's an interesting idea. This way, integrating the new module would
>> be a less risky move, since if it gives us too many problems, we could
>> back out our decision in the next feature release.
>>
>
> I'm not sure that's correct. If there are differences in either the
> interface or the behaviour between the new regex and re, then reverting will
> be a pain regardless of whether you have:
>
> from __future__ import re
> re.compile(...)
>
> or
>
> import regex
> regex.compile(...)
>
>
> Either way, if the new regex library goes away, code will break, and fixing
> it may not be easy.


You're talking technically, which is important, but wasn't what I was
suggesting would be helped.

Politically, and from a marketing standpoint, it's easier to withdraw a
feature you've given with a "Play with this, see if it works for you"
warning.

> Have there been any __future__ features that were added provisionally?
>

I can't either, but ISTR hearing that from __future__ import was started
with such an intent.  Irrespective, it's hard to import something from
"future" without at least suspecting that you're on the bleeding edge.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Steven D'Aprano

Antoine Pitrou wrote:
> On Fri, 26 Aug 2011 17:25:56 -0700
> Dan Stromberg  wrote:
>
> [...]
>> If you add regex as "import regex", and the new regex module doesn't work
>> out, regex might be harder to get rid of.  from __future__ import is an
>> established way of trying something for a while to see if it's going to
>> work.
>
> That's an interesting idea. This way, integrating the new module would
> be a less risky move, since if it gives us too many problems, we could
> back out our decision in the next feature release.


I'm not sure that's correct. If there are differences in either the 
interface or the behaviour between the new regex and re, then reverting 
will be a pain regardless of whether you have:


    from __future__ import re
    re.compile(...)

or

    import regex
    regex.compile(...)


Either way, if the new regex library goes away, code will break, and 
fixing it may not be easy. It's not likely to be so easy that merely 
deleting the "from __future__ ..." line will do it, but if it is that 
easy, then using "import re as regex" will be just as easy.
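
For what it's worth, the aliasing approach needs no language support at
all. A minimal sketch, assuming the code only uses the API subset that
both modules share; re_impl and words are hypothetical names, and regex
here means the third-party module from PyPI:

```python
try:
    import regex as re_impl   # third-party module, used if it is installed
except ImportError:
    import re as re_impl      # standard library fallback

# Only features common to both modules are used below.
WORD = re_impl.compile(r"[A-Za-z]+")


def words(text):
    """Return the ASCII-letter words found in text."""
    return WORD.findall(text)


print(words("drop-in replacement"))
```

If regex ever went away, the except branch would already be the
migration path; code relying on regex-only behaviour would break either
way.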


Have there been any __future__ features that were added provisionally? I
can't think of any. That's not what __future__ is for, at least 
according to PEP 236.


http://www.python.org/dev/peps/pep-0236/

I can't think of any __future__ feature that could be easily reverted 
once people start relying on it. Either syntax would break, or behaviour 
would change.


The PEP even explicitly states that __future__ should not be used for 
changes which are backward compatible:


Note that there is no need to involve the future_statement machinery
in new features unless they can break existing code; fully backward-
compatible additions can-- and should --be introduced without a
corresponding future_statement.


I wasn't around for the move from 1.4 regex to 1.5 re, so I don't know 
what was done poorly last time. But I can't see why we should treat 
regular expressions so differently from (say) argparse and optparse.


from __future__ import optparse

No. Just... no.




--
Steven



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Steven D'Aprano

Ben Finney wrote:
> Steven D'Aprano  writes:
>
>> Ben Finney wrote:
>>> "M.-A. Lemburg"  writes:
>>>
>>>> No, you tell them: "If you want Unicode 6 semantics, use regex, if
>>>> you're fine with Unicode 2.0/3.0 semantics, use re".
>>>
>>> What do we say, then, to those who are unaware of the different
>>> semantics between those versions of Unicode, and want regular expression
>>> to “just work” in Python?
>>>
>>> To which document can we direct them to understand what semantics they
>>> want?
>>
>> Presumably, like all modules, both the re and the regex module will
>> have their own individual pages in the library reference.
>
> My question is directed more to M-A Lemburg's passage above, and its
> implicit assumption that the user understand the changes between
> “Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
> needs relate to those semantics.
>
> For programmers who know they want to follow Unicode conventions in
> Python, but don't know the distinction M-A Lemburg is drawing, to which
> document does he recommend we direct them?



I can only repeat my answer: the docs for the new regex module should 
include a discussion of the differences. If that requires summarising 
the differences that M-A Lemburg refers to, then so be it.
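
As a rough idea of the kind of probe such a discussion could start
from, here is a small sketch. It assumes Python 3, where str patterns
match with Unicode rules unless re.ASCII is given, and it does not by
itself demonstrate the Unicode 2.0/3.0 versus Unicode 6 differences; it
only shows how a user can see which Unicode database their interpreter
ships and how much the choice of semantics changes a simple match.

```python
import re
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
UNICODE_DB_VERSION = unicodedata.unidata_version

TEXT = "na\u00efve caf\u00e9"  # "naive cafe" with accented i and e

# Default str matching: accented letters count as word characters.
unicode_words = re.findall(r"\w+", TEXT)

# ASCII-only matching: accented letters split or truncate the words.
ascii_words = re.findall(r"\w+", TEXT, flags=re.ASCII)

print(UNICODE_DB_VERSION)
print(unicode_words)
print(ascii_words)
```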




> “The Unicode specification document in its various versions” isn't a
> feasible answer.


Presumably the Unicode spec will be the canonical source, but I agree 
that we should not expect people to read that in order to make a 
decision between re and regex.



--
Steven


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Ben Finney
Steven D'Aprano  writes:

> Ben Finney wrote:
> > "M.-A. Lemburg"  writes:
>
> >> No, you tell them: "If you want Unicode 6 semantics, use regex, if
> >> you're fine with Unicode 2.0/3.0 semantics, use re".
> >
> > What do we say, then, to those who are unaware of the different
> > semantics between those versions of Unicode, and want regular expression
> > to “just work” in Python?
> >
> > To which document can we direct them to understand what semantics they
> > want?
>
> Presumably, like all modules, both the re and the regex module will
> have their own individual pages in the library reference.

My question is directed more to M-A Lemburg's passage above, and its
implicit assumption that the user understand the changes between
“Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own
needs relate to those semantics.

For programmers who know they want to follow Unicode conventions in
Python, but don't know the distinction M-A Lemburg is drawing, to which
document does he recommend we direct them?

“The Unicode specification document in its various versions” isn't a
feasible answer.

-- 
 \ “Computers are useless. They can only give you answers.” —Pablo |
  `\   Picasso |
_o__)  |
Ben Finney



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Fri, 26 Aug 2011 17:25:56 -0700
Dan Stromberg  wrote:
> On Fri, Aug 26, 2011 at 5:08 PM, Antoine Pitrou  wrote:
> 
> > On Fri, 26 Aug 2011 15:48:42 -0700
> > Dan Stromberg  wrote:
> > >
> > > Then there probably should be a from __future__ import for a while.
> >
> > If you are willing to use a "from __future__ import", why not simply
> >
> >import regex as re
> >
> > ? We're not Perl, we don't have built-in syntactic support for regular
> > expressions.
> >
> > Regards
> >
> 
> If you add regex as "import regex", and the new regex module doesn't work
> out, regex might be harder to get rid of.  from __future__ import is an
> established way of trying something for a while to see if it's going to
> work.

That's an interesting idea. This way, integrating the new module would
be a less risky move, since if it gives us too many problems, we could
back out our decision in the next feature release.

Regards

Antoine.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Sat, 27 Aug 2011 04:37:21 +0300
Ezio Melotti  wrote:
> 
> I'm not sure it's worth doing an extensive review of the code, a better
> approach might be to require extensive test coverage  (and a review of
> tests).  If the code seems well written, commented, documented (I think
> proper rst documentation is still missing),

Isn't this precisely what a review is supposed to assess?

> We will get familiar with the code once we start contributing
> to it and fixing bugs, as it already happens with most of the other modules.

I'm not sure it's a good idea for a module with more than 1 lines
of C code (and 4000 lines of pure Python code). This is several times
the size of multiprocessing. The C code looks very cleanly written, but
it's still a big chunk of algorithmically sophisticated code.

Another "interesting" question is whether it's easy to port to the PEP
393 string representation, if it gets accepted.

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Steven D'Aprano

Ben Finney wrote:
> "M.-A. Lemburg"  writes:
>
>> No, you tell them: "If you want Unicode 6 semantics, use regex, if
>> you're fine with Unicode 2.0/3.0 semantics, use re".
>
> What do we say, then, to those who are unaware of the different
> semantics between those versions of Unicode, and want regular expression
> to “just work” in Python?
>
> To which document can we direct them to understand what semantics they
> want?


Presumably, like all modules, both the re and the regex module will have 
their own individual pages in the library reference. As the newcomer, 
regex should include a discussion of differences between the two. This 
can then be quietly dropped once re becomes formally deprecated.


(Assuming that the std lib keeps re and regex in parallel for a few 
releases, which is not a given.)


However, I note that last time, the old regex module was just documented 
as obsolete with little detailed discussion of the differences:


http://docs.python.org/release/1.5/lib/node69.html#SECTION00530


--
Steven


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Ezio Melotti
On Sat, Aug 27, 2011 at 1:57 AM, Guido van Rossum  wrote:

> On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis" 
> wrote:
> > [...]
> > Among us, some are more "regex gurus" than others; you know
> > who you are. I guess the PSF would pay for the review, if that
> > is what it would take.
>
> Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows
> more?
>

Matthew has always been responsive on the tracker, usually fixing reported
bugs in a matter of days, and I think he's willing to keep doing so once the
regex module is included.  Even if I haven't yet tried the module myself
(I'm planning to do it though), it seems quite popular out there (the
download number on PyPI apparently gets reset for each new release, so I
don't know the exact total), and apparently people are already using it as a
replacement of re.

I'm not sure it's worth doing an extensive review of the code, a better
approach might be to require extensive test coverage  (and a review of
tests).  If the code seems well written, commented, documented (I think
proper rst documentation is still missing), and tested (both with unittest
and out in the wild), and Matthew is willing to maintain it, I think we can
include it.  We will get familiar with the code once we start contributing
to it and fixing bugs, as it already happens with most of the other modules.

See also the "New regex module for 3.2?" thread (
http://mail.python.org/pipermail/python-dev/2010-July/101606.html ).

Best Regards,
Ezio Melotti


>
> --
> --Guido van Rossum (python.org/~guido )
>


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Ben Finney
"M.-A. Lemburg"  writes:

> Guido van Rossum wrote:

> > I really don't want to have to tell people "Oh, that bug is fixed
> > but you have to use regex instead of re" and then a few years later
> > have to tell them "Oh, we're deprecating regex, you should just use
> > re".
>
> No, you tell them: "If you want Unicode 6 semantics, use regex, if
> you're fine with Unicode 2.0/3.0 semantics, use re".

What do we say, then, to those who are unaware of the different
semantics between those versions of Unicode, and want regular expression
to “just work” in Python?

To which document can we direct them to understand what semantics they
want?

> After all, it's not like re suddenly stopped working :-)

For some value of “working”, that is. The trick is to know whether that
value is what one wants.

-- 
 \“The fact of your own existence is the most astonishing fact |
  `\you'll ever have to confront. Don't dare ever see your life as |
_o__)boring, monotonous, or joyless.” —Richard Dawkins, 2010-03-10 |
Ben Finney



Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Dan Stromberg
On Fri, Aug 26, 2011 at 5:08 PM, Antoine Pitrou  wrote:

> On Fri, 26 Aug 2011 15:48:42 -0700
> Dan Stromberg  wrote:
> >
> > Then there probably should be a from __future__ import for a while.
>
> If you are willing to use a "from __future__ import", why not simply
>
>import regex as re
>
> ? We're not Perl, we don't have built-in syntactic support for regular
> expressions.
>
> Regards
>

If you add regex as "import regex", and the new regex module doesn't work
out, regex might be harder to get rid of.  from __future__ import is an
established way of trying something for a while to see if it's going to
work.

EG: "from __future__ import re", where re is really the new module.

But whatever.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Fri, 26 Aug 2011 15:48:42 -0700
Dan Stromberg  wrote:
> 
> Then there probably should be a from __future__ import for a while.

If you are willing to use a "from __future__ import", why not simply

import regex as re

? We're not Perl, we don't have built-in syntactic support for regular
expressions.

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Sat, 27 Aug 2011 01:00:31 +0200
"M.-A. Lemburg"  wrote:
> > 
> > I can't say I liked how that transition was handled last time around.
> > I really don't want to have to tell people "Oh, that bug is fixed but
> > you have to use regex instead of re" and then a few years later have
> > to tell them "Oh, we're deprecating regex, you should just use re".
> 
> No, you tell them: "If you want Unicode 6 semantics, use regex,
> if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
> it's not like re suddenly stopped working :-)

It has a whole lot of new features in addition to better unicode
support. See for yourself:
https://code.google.com/p/mrab-regex-hg/wiki/GeneralDetails

> Perhaps we could have a summer of code student do a review and
> analysis to get familiar with the code and then have at least
> two developers know the code well enough to support it for
> a while.

I'm not sure a GSoC student would be the best candidate to do a review
matching our expectations.

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Fri, 26 Aug 2011 15:47:21 -0700
Guido van Rossum  wrote:
> > The best way would be to contact the author, Matthew Barnett,
> 
> I had added him to the beginning of this thread but someone took him off.
> 
> > or to ask
> > on the tracker on http://bugs.python.org/issue2636. He has been quite
> > willing to answer such questions in the past, AFAIR.
> 
> So, that issue is about something called "regexp". AFAIK Matthew
> (MRAB) wrote something called "regex"
> (http://pypi.python.org/pypi/regex). Are they two different things???

No, it's the same.  The source is at
https://code.google.com/p/mrab-regex-hg/, btw.

Regards

Antoine.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Tom Christiansen
"M.-A. Lemburg"  wrote
   on Sat, 27 Aug 2011 01:00:31 +0200: 

> The good part is that it's based on the re code, the FUD comes
> from the fact that the new lib is 380kB larger than the old one
> and that's not even counting the generated 500kB of lookup
> tables.

Well, you have to put the property tables somewhere, somehow.
There are various schemes for demand loading them as needed,
but I don't know whether those are used.

--tom


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Guido van Rossum
On Fri, Aug 26, 2011 at 4:21 PM, MRAB  wrote:
> On 27/08/2011 00:08, Tom Christiansen wrote:
>>
>> "M.-A. Lemburg"  wrote
>>    on Sat, 27 Aug 2011 01:00:31 +0200:
>>
>>> The good part is that it's based on the re code, the FUD comes
>>> from the fact that the new lib is 380kB larger than the old one
>>> and that's not even counting the generated 500kB of lookup
>>> tables.
>>
>> Well, you have to put the property tables somewhere, somehow.
>> There are various schemes for demand loading them as needed,
>> but I don't know whether those are used.
>>
> FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
> tables.

I wouldn't hold the size of the generated tables against you. :-)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread MRAB

On 27/08/2011 00:08, Tom Christiansen wrote:
> "M.-A. Lemburg"  wrote
>    on Sat, 27 Aug 2011 01:00:31 +0200:
>
>> The good part is that it's based on the re code, the FUD comes
>> from the fact that the new lib is 380kB larger than the old one
>> and that's not even counting the generated 500kB of lookup
>> tables.
>
> Well, you have to put the property tables somewhere, somehow.
> There are various schemes for demand loading them as needed,
> but I don't know whether those are used.
>
FYI, the .pyd for Python v3.2 is 227KB, about half of which is property
tables.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread M.-A. Lemburg
Guido van Rossum wrote:
> On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg  wrote:
>> Guido van Rossum wrote:
>>> I just made a pass of all the Unicode-related bugs filed by Tom
>>> Christiansen, and found that in several, the response was "this is
>>> fixed in the regex module [by Matthew Barnett]". I started replying
>>> that I thought that we should fix the bugs in the re module (i.e.,
>>> really in _sre.c) but on second thought I wonder if maybe regex is
>>> mature enough to replace re in Python 3.3. It would mean that we won't
>>> fix any of these bugs in earlier Python versions, but I could live
>>> with that.
>>>
>>> However, I don't know much about regex -- how compatible is it, how
>>> fast is it (including extreme cases where the backtracking goes
>>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>>> to actually incorporate it into CPython as a complete drop-in
>>> replacement of the re package (such that nobody needs to change their
>>> imports or the flags they pass to the re module).
>>>
>>> We'd also probably have to train some core developers to be familiar
>>> enough with the code to maintain and evolve it -- I assume we can't
>>> just volunteer Matthew to do so forever... :-)
>>>
>>> What's the alternative? Is adding the requested bug fixes and new
>>> features to _sre.c really that hard?
>>
>> Why not simply add the new lib, see whether it works out and
>> then decide which path to follow.
>>
>> We've done that with the old regex lib. It took a few years
>> and releases to have people port their applications to the
>> then new re module and syntax, but in the end it worked.
>>
>> With a new regex library there are likely going to be quite
>> a few subtle differences between re and regex - even if it's
>> just doing things in a more Unicode compatible way.
>>
>> I don't think anyone can actually list all the differences given
>> the complex nature of regular expressions, so people will
>> likely need a few years and releases to get used to it before
>> a switch can be made.
> 
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".

No, you tell them: "If you want Unicode 6 semantics, use regex,
if you're fine with Unicode 2.0/3.0 semantics, use re". After all,
it's not like re suddenly stopped working :-)

> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The good part is that it's based on the re code, the FUD comes
from the fact that the new lib is 380kB larger than the old one
and that's not even counting the generated 500kB of lookup
tables.

If no one steps up to do a review or analysis, I think the
only practical way to test the lib is to give it a prominent
chance to prove itself.

The other aspect is maintenance.

Perhaps we could have a summer of code student do a review and
analysis to get familiar with the code and then have at least
two developers know the code well enough to support it for
a while.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Guido van Rossum
On Fri, Aug 26, 2011 at 3:54 PM, "Martin v. Löwis"  wrote:
>> However, I don't know much about regex
>
> The problem really is: nobody does (except for Matthew Barnett
> probably). This means that this contribution might be stuck
> "forever": somebody would have to review the module, identify
> issues, approve it, and take the blame if something breaks.
> That takes considerable time and has a considerable risk, for
> little expected glory - so nobody has volunteered to
> mentor/manage integration of that code.
>
> I believe most core contributors (who have run into this code)
> consider it worthwhile, but are just too scared to take action.
>
> Among us, some are more "regex gurus" than others; you know
> who you are. I guess the PSF would pay for the review, if that
> is what it would take.

Makes sense. I noticed Ezio seems quite in favor of regex. Maybe he knows more?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Martin v. Löwis
> However, I don't know much about regex

The problem really is: nobody does (except for Matthew Barnett
probably). This means that this contribution might be stuck
"forever": somebody would have to review the module, identify
issues, approve it, and take the blame if something breaks.
That takes considerable time and has a considerable risk, for
little expected glory - so nobody has volunteered to
mentor/manage integration of that code.

I believe most core contributors (who have run into this code)
consider it worthwhile, but are just too scared to take action.

Among us, some are more "regex gurus" than others; you know
who you are. I guess the PSF would pay for the review, if that
is what it would take.

Regards,
Martin


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Dan Stromberg
On Fri, Aug 26, 2011 at 2:45 PM, Guido van Rossum  wrote:

> ...but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3.
>

I agree that the move from regex to re was kind of painful.

It seems someone should merge the unit tests for re and regex, and apply the
merged result to each for the sake of comparison.  There might also be a
need to expand the merged result to include new things.
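
A rough sketch of what such a shared comparison could look like, assuming the third-party regex package is optional; the CASES table and the run_cases helper are hypothetical and are not taken from either module's real test suite:

```python
import re

try:
    import regex  # third-party package from PyPI; optional for this sketch
except ImportError:
    regex = None

# Hypothetical shared cases: (pattern, text, expected match text or None).
CASES = [
    (r"a+b", "xaaab", "aaab"),
    (r"\d{4}-\d{2}", "on 2011-08-26", "2011-08"),
    (r"^foo$", "bar", None),
]

def run_cases(module):
    """Run every shared case against one module and collect disagreements."""
    failures = []
    for pattern, text, expected in CASES:
        m = module.search(pattern, text)
        got = m.group(0) if m else None
        if got != expected:
            failures.append((pattern, text, expected, got))
    return failures

# Apply the same cases to each available implementation for comparison.
modules = [m for m in (re, regex) if m is not None]
results = {m.__name__: run_cases(m) for m in modules}
for name, failures in results.items():
    print(name, "failures:", failures)
```

An empty failure list for both modules means they agree on these cases; any entry points at a behavioural difference worth documenting.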

Then there probably should be a from __future__ import for a while.


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Guido van Rossum
On Fri, Aug 26, 2011 at 3:33 PM, Antoine Pitrou  wrote:
> On Fri, 26 Aug 2011 15:18:35 -0700
> Guido van Rossum  wrote:
>>
>> I can't say I liked how that transition was handled last time around.
>> I really don't want to have to tell people "Oh, that bug is fixed but
>> you have to use regex instead of re" and then a few years later have
>> to tell them "Oh, we're deprecating regex, you should just use re".
>>
>> I'm really hoping someone has more actual technical understanding of
>> re vs. regex and can give us some facts about the differences, rather
>> than, frankly, FUD.
>
> The best way would be to contact the author, Matthew Barnett,

I had added him to the beginning of this thread but someone took him off.

> or to ask
> on the tracker on http://bugs.python.org/issue2636. He has been quite
> willing to answer such questions in the past, AFAIR.

So, that issue is about something called "regexp". AFAIK Matthew
(MRAB) wrote something called "regex"
(http://pypi.python.org/pypi/regex). Are they two different things???

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Antoine Pitrou
On Fri, 26 Aug 2011 15:18:35 -0700
Guido van Rossum  wrote:
> 
> I can't say I liked how that transition was handled last time around.
> I really don't want to have to tell people "Oh, that bug is fixed but
> you have to use regex instead of re" and then a few years later have
> to tell them "Oh, we're deprecating regex, you should just use re".
> 
> I'm really hoping someone has more actual technical understanding of
> re vs. regex and can give us some facts about the differences, rather
> than, frankly, FUD.

The best way would be to contact the author, Matthew Barnett, or to ask
on the tracker on http://bugs.python.org/issue2636. He has been quite
willing to answer such questions in the past, AFAIR.

Regards

Antoine.




Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Guido van Rossum
On Fri, Aug 26, 2011 at 3:09 PM, M.-A. Lemburg  wrote:
> Guido van Rossum wrote:
>> I just made a pass of all the Unicode-related bugs filed by Tom
>> Christiansen, and found that in several, the response was "this is
>> fixed in the regex module [by Matthew Barnett]". I started replying
>> that I thought that we should fix the bugs in the re module (i.e.,
>> really in _sre.c) but on second thought I wonder if maybe regex is
>> mature enough to replace re in Python 3.3. It would mean that we won't
>> fix any of these bugs in earlier Python versions, but I could live
>> with that.
>>
>> However, I don't know much about regex -- how compatible is it, how
>> fast is it (including extreme cases where the backtracking goes
>> crazy), how bug-free is it, and so on. Plus, how much work would it be
>> to actually incorporate it into CPython as a complete drop-in
>> replacement of the re package (such that nobody needs to change their
>> imports or the flags they pass to the re module).
>>
>> We'd also probably have to train some core developers to be familiar
>> enough with the code to maintain and evolve it -- I assume we can't
>> just volunteer Matthew to do so forever... :-)
>>
>> What's the alternative? Is adding the requested bug fixes and new
>> features to _sre.c really that hard?
>
> Why not simply add the new lib, see whether it works out and
> then decide which path to follow.
>
> We've done that with the old regex lib. It took a few years
> and releases to have people port their applications to the
> then new re module and syntax, but in the end it worked.
>
> With a new regex library there are likely going to be quite
> a few subtle differences between re and regex - even if it's
> just doing things in a more Unicode compatible way.
>
> I don't think anyone can actually list all the differences given
> the complex nature of regular expressions, so people will
> likely need a few years and releases to get used to it before
> a switch can be made.

I can't say I liked how that transition was handled last time around.
I really don't want to have to tell people "Oh, that bug is fixed but
you have to use regex instead of re" and then a few years later have
to tell them "Oh, we're deprecating regex, you should just use re".

I'm really hoping someone has more actual technical understanding of
re vs. regex and can give us some facts about the differences, rather
than, frankly, FUD.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread M.-A. Lemburg
Guido van Rossum wrote:
> I just made a pass of all the Unicode-related bugs filed by Tom
> Christiansen, and found that in several, the response was "this is
> fixed in the regex module [by Matthew Barnett]". I started replying
> that I thought that we should fix the bugs in the re module (i.e.,
> really in _sre.c) but on second thought I wonder if maybe regex is
> mature enough to replace re in Python 3.3. It would mean that we won't
> fix any of these bugs in earlier Python versions, but I could live
> with that.
> 
> However, I don't know much about regex -- how compatible is it, how
> fast is it (including extreme cases where the backtracking goes
> crazy), how bug-free is it, and so on. Plus, how much work would it be
> to actually incorporate it into CPython as a complete drop-in
> replacement of the re package (such that nobody needs to change their
> imports or the flags they pass to the re module).
> 
> We'd also probably have to train some core developers to be familiar
> enough with the code to maintain and evolve it -- I assume we can't
> just volunteer Matthew to do so forever... :-)
> 
> What's the alternative? Is adding the requested bug fixes and new
> features to _sre.c really that hard?

Why not simply add the new lib, see whether it works out and
then decide which path to follow.

We've done that with the old regex lib. It took a few years
and releases to have people port their applications to the
then new re module and syntax, but in the end it worked.

With a new regex library there are likely going to be quite
a few subtle differences between re and regex - even if it's
just doing things in a more Unicode compatible way.

I don't think anyone can actually list all the differences given
the complex nature of regular expressions, so people will
likely need a few years and releases to get used to it before
a switch can be made.

-- 
Marc-Andre Lemburg
eGenix.com


[Python-Dev] Should we move to replace re with regex?

2011-08-26 Thread Guido van Rossum
I just made a pass of all the Unicode-related bugs filed by Tom
Christiansen, and found that in several, the response was "this is
fixed in the regex module [by Matthew Barnett]". I started replying
that I thought that we should fix the bugs in the re module (i.e.,
really in _sre.c) but on second thought I wonder if maybe regex is
mature enough to replace re in Python 3.3. It would mean that we won't
fix any of these bugs in earlier Python versions, but I could live
with that.

However, I don't know much about regex -- how compatible is it, how
fast is it (including extreme cases where the backtracking goes
crazy), how bug-free is it, and so on. Plus, how much work would it be
to actually incorporate it into CPython as a complete drop-in
replacement of the re package (such that nobody needs to change their
imports or the flags they pass to the re module).

We'd also probably have to train some core developers to be familiar
enough with the code to maintain and evolve it -- I assume we can't
just volunteer Matthew to do so forever... :-)

What's the alternative? Is adding the requested bug fixes and new
features to _sre.c really that hard?

-- 
--Guido van Rossum (python.org/~guido)