Re: [Python-Dev] re performance
* Armin Rigo, 2017-01-28, 12:44:
> The theoretical kind of regexp is about giving a "yes/no" answer,
> whereas the concrete "re" or "regexp" modules give a match object,
> which lets you ask for the subgroups' location, for example. Strange
> as it may seem, I am not aware of a way to do that using the
> linear-time approach of the theory---if it answers "yes", then you
> have no way of knowing *where* the subgroups matched. Another issue
> is that the theoretical engine has no notion of greedy/non-greedy
> matching.

RE2 has linear execution time, and it supports both capture groups and
greedy/non-greedy matching. The implementation is explained in this
article:
https://swtch.com/~rsc/regexp/regexp3.html

--
Jakub Wilk
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
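To make the two features named in the quoted message concrete, here is a small sketch using the standard `re` module (not RE2). It shows a match object reporting where each subgroup matched, and how a greedy and a non-greedy quantifier divide the same input differently. The patterns and the input string are made up for illustration.

```python
import re

text = "aaabbb"

# Greedy: (a+) takes as many 'a's as it can, leaving nothing for (a*).
greedy = re.match(r"(a+)(a*)(b*)", text)
# Non-greedy: (a+?) takes as few 'a's as it can; (a*) picks up the rest.
lazy = re.match(r"(a+?)(a*)(b*)", text)

# span(i) gives the (start, end) offsets of subgroup i in the input.
greedy_spans = [greedy.span(i) for i in range(1, 4)]
lazy_spans = [lazy.span(i) for i in range(1, 4)]

print(greedy_spans)  # [(0, 3), (3, 3), (3, 6)]
print(lazy_spans)    # [(0, 1), (1, 3), (3, 6)]
```

Both patterns accept the same string, so a pure "yes/no" engine could not tell them apart; the difference only shows up in the reported subgroup positions.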
Re: [Python-Dev] re performance
On Sat, 28 Jan 2017 12:07:05 -0500
Barry Warsaw wrote:
> On Jan 28, 2017, at 03:43 PM, Nick Coghlan wrote:
>
> >I still think it could be a good candidate for a first "bundled"
> >module, where we don't migrate it fully into the CPython development
> >process, but *do* officially bless it and provide it by default in the
> >form of a bundled wheel file (similar to what we do with pip).
>
> How would that work exactly? I.e. is there a PEP?
>
> While I think it could be a good idea to bundle (more?) third party packages
> for a variety of reasons, I want to make sure it's done in a way that's still
> friendly to downstream repackagers, as I'm sure you do too. :)

That sounds like a lot of effort and maintenance... Don't we bundle pip
*exactly* so that we don't have to bundle other third-party packages
and instead tell users to "just use `pip install `"?

To sum it up, how about we simply add an official suggestion to use
regex in the docs?

Regards

Antoine.
Re: [Python-Dev] re performance
Why not declare re deprecated and remove it in Python 4? I am pretty sure
everyone wants to keep re in all 3.x releases, but that support need not
extend beyond. So Py4 would have no battery for re, but it would (should!)
be common knowledge that regex was the go-to module for general-purpose
pattern matching. If re has advantages in certain situations someone might
upgrade the 3.x implementation and provide it as a 3rd-party module, though
the effort involved would be significant, so someone would have to be
motivated to keep it.

regards
 Steve

Steve Holden

On Sun, Jan 29, 2017 at 4:13 PM, Antoine Pitrou wrote:
> On Sat, 28 Jan 2017 12:07:05 -0500
> Barry Warsaw wrote:
> > On Jan 28, 2017, at 03:43 PM, Nick Coghlan wrote:
> >
> > >I still think it could be a good candidate for a first "bundled"
> > >module, where we don't migrate it fully into the CPython development
> > >process, but *do* officially bless it and provide it by default in the
> > >form of a bundled wheel file (similar to what we do with pip).
> >
> > How would that work exactly? I.e. is there a PEP?
> >
> > While I think it could be a good idea to bundle (more?) third party
> > packages for a variety of reasons, I want to make sure it's done in a
> > way that's still friendly to downstream repackagers, as I'm sure you
> > do too. :)
>
> That sounds like a lot of effort and maintenance... Don't we bundle pip
> *exactly* so that we don't have to bundle other third-party packages
> and instead tell users to "just use `pip install `"?
>
> To sum it up, how about we simply add an official suggestion to use
> regex in the docs?
>
> Regards
>
> Antoine.
Re: [Python-Dev] re performance
On 29 January 2017 at 20:30, Steve Holden wrote:
> Why not declare re deprecated and remove it in Python 4? I am pretty sure
> everyone wants to keep re in all 3.x releases, but that support need not
> extend beyond. So Py4 would have no battery for re, but it would (should!)
> be common knowledge that regex was the go-to module for general-purpose
> pattern matching. If re has advantages in certain situations someone might
> upgrade the 3.x implementation and provide it as a 3rd-party module, though
> the effort involved would be significant, so someone would have to be
> motivated to keep it.

Not having regex capability distributed with Python is a pretty big
regression. There are still a lot of users who don't have access to
3rd party modules.

Paul
Re: [Python-Dev] re performance
On 29.01.17 22:30, Steve Holden wrote:
> Why not declare re deprecated and remove it in Python 4? I am pretty sure
> everyone wants to keep re in all 3.x releases, but that support need not
> extend beyond. So Py4 would have no battery for re, but it would (should!)
> be common knowledge that regex was the go-to module for general-purpose
> pattern matching. If re has advantages in certain situations someone might
> upgrade the 3.x implementation and provide it as a 3rd-party module, though
> the effort involved would be significant, so someone would have to be
> motivated to keep it.

Regular expressions are used in a number of standard modules and scripts.
Excluding them from the stdlib would require excluding or rewriting a
large part of the stdlib.
Re: [Python-Dev] re performance
On 29.01.17 12:18, Jakub Wilk wrote:
> * Armin Rigo, 2017-01-28, 12:44:
> > The theoretical kind of regexp is about giving a "yes/no" answer,
> > whereas the concrete "re" or "regexp" modules give a match object,
> > which lets you ask for the subgroups' location, for example. Strange
> > as it may seem, I am not aware of a way to do that using the
> > linear-time approach of the theory---if it answers "yes", then you
> > have no way of knowing *where* the subgroups matched. Another issue
> > is that the theoretical engine has no notion of greedy/non-greedy
> > matching.
>
> RE2 has linear execution time, and it supports both capture groups and
> greedy/non-greedy matching. The implementation is explained in this
> article:
> https://swtch.com/~rsc/regexp/regexp3.html

Not all features of Python regular expressions can be implemented with
linear complexity. It is possible to compile a subset of regular
expressions to an implementation with linear complexity. Patches are
welcome.
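As one illustration of the point above, backreferences are a commonly cited example of a feature that goes beyond regular languages, and RE2 does not support them. The message itself does not name specific features, so treat this as an assumed example. A minimal sketch with the standard `re` module and a made-up pattern:

```python
import re

# \1 must repeat exactly what group 1 captured, so the pattern accepts
# strings of the form a^n b a^n, which a finite automaton cannot recognize.
pattern = re.compile(r"(a+)b\1")

balanced = pattern.fullmatch("aaabaaa")    # three 'a's on both sides
unbalanced = pattern.fullmatch("aaabaa")   # three on the left, two on the right

print(balanced is not None)    # True
print(unbalanced is not None)  # False
```

Patterns that avoid such constructs are the ones that could, in principle, be handed to a linear-time engine.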
Re: [Python-Dev] re performance
Armin Rigo wrote:
> The theoretical kind of regexp is about giving a "yes/no" answer,
> whereas the concrete "re" or "regexp" modules give a match object,
> which lets you ask for the subgroups' location, for example.
>
> Another issue is that the theoretical engine has no notion of
> greedy/non-greedy matching.

These things aren't part of the classical theory of REs that is usually
taught, but it should be possible to do them in linear time. They can be
done for context-free languages using e.g. an LALR parser, and regular
languages are a subset of context-free languages.

--
Greg
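The claim that subgroup positions and greediness can be tracked without backtracking can be sketched as follows. This is not the LALR route mentioned above, and it is not RE2's actual code; it is a toy version of the thread-list NFA simulation often called a Pike VM. The instruction names (`char`, `split`, `jmp`, `save`, `match`), the `run` and `program` helpers, and the hand-compiled program for `(a+)(b*)` are all assumptions made for illustration.

```python
def run(prog, text):
    """Match `prog` against `text` from position 0; return capture slots or None."""
    nslots = 1 + max(op[1] for op in prog if op[0] == "save")

    def add(threads, seen, pc, pos, caps):
        # Follow instructions that consume no input, in priority order.
        if pc in seen:
            return
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            add(threads, seen, op[1], pos, caps)
        elif op[0] == "split":          # op[1] is preferred over op[2]
            add(threads, seen, op[1], pos, caps)
            add(threads, seen, op[2], pos, caps)
        elif op[0] == "save":           # record the current offset in a slot
            caps = caps[:op[1]] + (pos,) + caps[op[1] + 1:]
            add(threads, seen, pc + 1, pos, caps)
        else:                           # "char" or "match" waits in the list
            threads.append((pc, caps))

    current, best = [], None
    add(current, set(), 0, 0, (None,) * nslots)
    for pos in range(len(text) + 1):
        following, seen = [], set()
        for pc, caps in current:
            op = prog[pc]
            if op[0] == "match":
                best = caps
                break                   # drop lower-priority threads
            if pos < len(text) and op[1] == text[pos]:
                add(following, seen, pc + 1, pos + 1, caps)
        current = following
        if not current:
            break
    return best


def program(greedy):
    # Hand-compiled (a+)(b*) or (a+?)(b*); slots 0-1 and 2-3 bound the groups.
    loop = ("split", 1, 3) if greedy else ("split", 3, 1)
    return [
        ("save", 0), ("char", "a"), loop, ("save", 1),
        ("save", 2), ("split", 6, 8), ("char", "b"), ("jmp", 5),
        ("save", 3), ("match",),
    ]


print(run(program(True), "aabbb"))   # (0, 2, 2, 5)
print(run(program(False), "aabbb"))  # (0, 1, 1, 1)
```

Each input position is visited once and the thread list never holds more than one entry per instruction, so the work is bounded by the text length times the program size. Greedy versus non-greedy is just the order of the two targets in the `split` instruction.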
Re: [Python-Dev] Generator objects and list comprehensions?
On Thu, Jan 26, 2017 at 4:09 AM, Ivan Levkivskyi wrote:
> Concerning list/set/dict comprehensions, I am much more in favor of making
> comprehensions simply equivalent to for-loops (more or less like you
> proposed using yield from). The only reason to introduce auxiliary function
> scope was to prevent the loop variables from leaking outside comprehensions.
> Formally, this is indeed backward incompatible, but I doubt many people
> depend on the current counter-intuitive behavior.
>
> Concerning generator expressions, probably it is indeed better to simply
> prohibit yield inside them.

Thank you to everyone who responded to my post and provided excellent
analysis.

For Python, I don't know what the best way to proceed is:

OPTION 1

Make a SyntaxError:

    [(yield 1) for x in range(10)]

and update the documentation to explain that this is an invalid construct.

This would have certainly helped me identify the source of the problem as
I tried porting buildbot 0.9 to Python 3.

However, while not very common, there is Python 2.x code that uses that.
I found these cases in the buildbot code which I changed so as to work on
Python 2 and 3:

https://github.com/buildbot/buildbot/pull/2661
https://github.com/buildbot/buildbot/pull/2673

OPTION 2

Make this return a list on Python 3, like in Python 2:

    [(yield 1) for x in range(10)]

As pointed out by others on this mailing list, there are some problems
associated with that. I don't know if there are many Python 2 codebases
out there with this construct, but it would be nice to have one less
Python 2 -> 3 porting gotcha.

I'm OK with either approach. Leaving things the way they are in Python 3
is no good, IMHO.

--
Craig
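For anyone checking how a given interpreter treats the construct, a rough sketch follows. The helper names `compiles`, `collect`, and `drive` are hypothetical. CPython 3.8 and later reject the comprehension at compile time, while older 3.x releases accept it, so the first result depends on the version in use. The explicit-loop rewrite relies on `return` with a value inside a generator, which is Python 3 only, so it is not a drop-in answer for code that must also run on Python 2.

```python
import sys


def compiles(source):
    """Return True if `source` compiles as an expression inside a function body."""
    wrapped = "def f():\n    return " + source + "\n"
    try:
        compile(wrapped, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Version-dependent: False on CPython 3.8+, True on earlier 3.x releases.
accepted = compiles("[(yield 1) for x in range(10)]")
print(sys.version_info[:2], accepted)


def collect(n):
    """Explicit-loop equivalent: yield n times, return the sent values as a list."""
    results = []
    for _ in range(n):
        results.append((yield 1))
    return results


def drive(gen, values):
    """Send `values` into `gen` and return the generator's return value."""
    next(gen)                      # advance to the first yield
    try:
        for value in values:
            gen.send(value)
    except StopIteration as stop:
        return stop.value
    return None


print(drive(collect(3), ["a", "b", "c"]))  # ['a', 'b', 'c']
```

Writing the loop out by hand makes it explicit which function the `yield` belongs to, which is the ambiguity behind both options above.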