Re: [Python-Dev] re performance
On 2017-01-26 21:46, Sven R. Kunze wrote:
> On 26.01.2017 22:33, Vlastimil Brom wrote:
>> Hi,
>> I can't speak about the details of mrab's implementation, but using
>> regex, I get the resulting match instantly: [...]
>
> Nice! I focused on the stdlib re module, as this is what is mainly used
> by other frameworks (like Django).
>
>> (I personally prefer to use regex for its other advantages, beyond this
>> artificial case, but it certainly doesn't hurt to have better
>> performance here too :)
>
> Me too. So it seems as if regex already uses a better algorithm, although
> I couldn't find any reference to a theoretical regex framework such as
> DFA, NFA, Thompson multiple-state simulation or the like.

It still uses backtracking, like in the re module.

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] re performance
On 2017-01-26 21:13, Sven R. Kunze wrote:
> Hi folks,
>
> I recently refreshed the theoretical basics of regular expressions
> *indulging in reminiscences*. So, I read
> https://swtch.com/~rsc/regexp/regexp1.html
>
> However, reaching the chart in the lower third of the article, I saw
> Python 2.4 measured against a naive Thompson matching implementation, and
> I was surprised how badly it performed compared to an unoptimized version
> of an older-than-dirt algorithm.
>
> So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà:
> 100% CPU and no results so far. Nothing has changed at all since 2007.
>
> >>> import re
> >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
> (still waiting)
>
> Quoting from the article: "Some might argue that this test is unfair to
> the backtracking implementations, since it focuses on an uncommon corner
> case. This argument misses the point: given a choice between an
> implementation with a predictable, consistent, fast running time on all
> inputs or one that usually runs quickly but can take years of CPU time
> (or more) on some inputs, the decision should be easy."
>
> Victor, as the head of the Python performance department, and Matthew, as
> the maintainer of the new regex module: what is your stance on this
> particular issue?
>
> From my perspective, regular expressions might be worth optimizing,
> especially for web applications (URL matching usually uses regexes), but
> also for other applications where I've seen many tight loops using
> regexes as well. So, I am probing interest in this topic here.

It's possible to add some checks to reduce the problem, as I've done in the
regex module, but I'm not sure how easy it would be to retrofit it into the
existing re code. (It wasn't pleasant in the regex module, so...)
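[As an illustration for readers: the following is not the internal check
MRAB refers to, only a demonstration that this particular test pattern has
a backtracking-free equivalent. The nested optionals `a?`*30 + `a`*30
simply describe "between 30 and 60 a's", which re matches instantly when
written as a counted repetition:]

```python
import re

# The pathological pattern (r'a?'*30 + r'a'*30) matches any run of
# 30 to 60 'a' characters. Written as chained optionals it forces a
# backtracking engine through an exponential number of choices, but
# the equivalent counted repetition is matched without backtracking.
m = re.match(r'a{30,60}', 'a' * 30)
print(m.span())  # (0, 30)
```

Rewriting a pattern this way only works when the author spots the
equivalence; it does not fix the engine itself.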
Re: [Python-Dev] re performance
On 26.01.2017 22:33, Vlastimil Brom wrote:
> Hi,
> I can't speak about the details of mrab's implementation, but using
> regex, I get the resulting match instantly: [...]

Nice! I focused on the stdlib re module, as this is what is mainly used by
other frameworks (like Django).

> (I personally prefer to use regex for its other advantages, beyond this
> artificial case, but it certainly doesn't hurt to have better performance
> here too :)

Me too. So it seems as if regex already uses a better algorithm, although I
couldn't find any reference to a theoretical regex framework such as DFA,
NFA, Thompson multiple-state simulation or the like.

Cheers,
Sven
Re: [Python-Dev] re performance
2017-01-26 22:13 GMT+01:00 Sven R. Kunze:
> Hi folks,
>
> I recently refreshed the theoretical basics of regular expressions
> *indulging in reminiscences*. So, I read
> https://swtch.com/~rsc/regexp/regexp1.html
>
> However, reaching the chart in the lower third of the article, I saw
> Python 2.4 measured against a naive Thompson matching implementation, and
> I was surprised how badly it performed compared to an unoptimized version
> of an older-than-dirt algorithm.
>
> So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà:
> 100% CPU and no results so far. Nothing has changed at all since 2007.
>
> >>> import re
> >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
> (still waiting)
>
> Quoting from the article: "Some might argue that this test is unfair to
> the backtracking implementations, since it focuses on an uncommon corner
> case. This argument misses the point: given a choice between an
> implementation with a predictable, consistent, fast running time on all
> inputs or one that usually runs quickly but can take years of CPU time
> (or more) on some inputs, the decision should be easy."
>
> Victor, as the head of the Python performance department, and Matthew, as
> the maintainer of the new regex module: what is your stance on this
> particular issue?
>
> From my perspective, regular expressions might be worth optimizing,
> especially for web applications (URL matching usually uses regexes), but
> also for other applications where I've seen many tight loops using
> regexes as well. So, I am probing interest in this topic here.
>
> Best,
> Sven

Hi,
I can't speak about the details of mrab's implementation, but using regex,
I get the resulting match instantly:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import regex
>>> regex.match(r'a?'*30 + r'a'*30, 'a'*30)
>>>

(I personally prefer to use regex for its other advantages, beyond this
artificial case, but it certainly doesn't hurt to have better performance
here too :)

regards,
vbr
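[For readers who want to reproduce Vlastimil's result outside the REPL, a
minimal sketch; it assumes the third-party regex package is installed (it
is not part of the stdlib) and falls back gracefully when it is not:]

```python
# Requires the third-party 'regex' package (pip install regex).
try:
    import regex
except ImportError:
    regex = None  # not installed; nothing to demonstrate

if regex is not None:
    # The same pathological pattern that hangs stdlib re.match
    # returns immediately with the regex module, per this thread.
    m = regex.match(r'a?' * 30 + r'a' * 30, 'a' * 30)
    print(m.group() == 'a' * 30)
```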
[Python-Dev] re performance
Hi folks,

I recently refreshed the theoretical basics of regular expressions
*indulging in reminiscences*. So, I read
https://swtch.com/~rsc/regexp/regexp1.html

However, reaching the chart in the lower third of the article, I saw Python
2.4 measured against a naive Thompson matching implementation, and I was
surprised how badly it performed compared to an unoptimized version of an
older-than-dirt algorithm.

So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà:
100% CPU and no results so far. Nothing has changed at all since 2007.

>>> import re
>>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
(still waiting)

Quoting from the article: "Some might argue that this test is unfair to the
backtracking implementations, since it focuses on an uncommon corner case.
This argument misses the point: given a choice between an implementation
with a predictable, consistent, fast running time on all inputs or one that
usually runs quickly but can take years of CPU time (or more) on some
inputs, the decision should be easy."

Victor, as the head of the Python performance department, and Matthew, as
the maintainer of the new regex module: what is your stance on this
particular issue?

From my perspective, regular expressions might be worth optimizing,
especially for web applications (URL matching usually uses regexes), but
also for other applications where I've seen many tight loops using regexes
as well. So, I am probing interest in this topic here.

Best,
Sven
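[The exponential blow-up described above can be observed at smaller sizes,
where the match still completes; the time roughly doubles with each added
`a?`. A sketch for readers who want to see it without hanging their
interpreter (absolute timings are machine-dependent):]

```python
import re
import time

def time_pathological(n):
    # a?^n a^n matched against a^n: a backtracking engine explores on
    # the order of 2**n ways to distribute the 'a's before succeeding.
    pattern = r'a?' * n + r'a' * n
    start = time.perf_counter()
    assert re.match(pattern, 'a' * n) is not None
    return time.perf_counter() - start

for n in (5, 10, 15, 20):
    print(n, time_pathological(n))
```

At n=30, as in the post above, the expected work is roughly 2**30 steps,
which is why the call appears to hang.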
Re: [Python-Dev] Why is PyDateTimeAPI not initialized by datetime module import?
On Thu, Jan 26, 2017 at 11:00 AM, Skip Montanaro wrote:
> I just got burned (wasted a good day or so) by the fact that
> PyDateTimeAPI wasn't initialized. The datetime.rst doc states (emphasis
> mine):
>
> Before using any of these functions, the header file :file:`datetime.h`
> must be included in your source (note that this is not included by
> :file:`Python.h`), and the macro :c:macro:`PyDateTime_IMPORT` must be
> invoked, *usually as part of the module initialisation function*.
>
> I thought that surely the datetime module itself would initialize that
> stuff. Why not?

It is fairly common for modules that export a C API in a capsule to only
initialize that capsule when explicitly asked. For example, if you want to
use the numpy C API, you need to call import_array() in your module
initialization function. [1]

I don't know how expensive PyDateTime_IMPORT is, but it cannot be called in
_datetimemodule.c because it is guarded by the Py_BUILD_CORE macro and is
not available when _datetimemodule.c itself is compiled. I don't know
whether this can be easily circumvented, but it is not hard to remember
that one should initialize a C API before using it.

Maybe we can improve the error message when PyDateTime_IMPORT is forgotten.
Feel free to open an issue for that.

[1]: https://docs.scipy.org/doc/numpy-1.12.0/user/c-info.how-to-extend.html#required-subroutine
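[The capsule mechanism described above can be inspected from Python: in
CPython, the PyDateTime_IMPORT macro expands to a PyCapsule_Import of an
object that the datetime module stores as the datetime_CAPI attribute. A
minimal sketch:]

```python
import datetime

# PyDateTime_IMPORT loads this capsule into the static PyDateTimeAPI
# pointer; a C extension that skips the macro is dereferencing a NULL
# pointer instead of the struct this capsule wraps.
caps = datetime.datetime_CAPI
print(type(caps).__name__)  # PyCapsule
```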
[Python-Dev] Why is PyDateTimeAPI not initialized by datetime module import?
I just got burned (wasted a good day or so) by the fact that PyDateTimeAPI
wasn't initialized. The datetime.rst doc states (emphasis mine):

    Before using any of these functions, the header file :file:`datetime.h`
    must be included in your source (note that this is not included by
    :file:`Python.h`), and the macro :c:macro:`PyDateTime_IMPORT` must be
    invoked, *usually as part of the module initialisation function*.

I thought that surely the datetime module itself would initialize that
stuff. Why not? Is it so terribly expensive that the C API requires this
rather weird hack? The code's been there for ages, so there must have been
a good reason at one time. Is that reason still valid today? (I haven't
programmed at the C API level for a good long while, or I'm sure I'd have
encountered this before.)

Thx,
Skip
Re: [Python-Dev] Generator objects and list comprehensions?
On 26 January 2017 at 00:53, Nathaniel Smith wrote:
> I'm not sure this is actually a good idea, given the potential for
> ambiguity and backwards compatibility considerations -- I'm kind of
> leaning towards the deprecate/error option on balance :-). But it's an
> option that would probably be less confusing than the status quo, and
> would make it easier to write the kind of straddling code the original
> poster was trying to write.

Concerning list/set/dict comprehensions, I am much more in favor of making
comprehensions simply equivalent to for-loops (more or less like you
proposed, using yield from). The only reason to introduce the auxiliary
function scope was to prevent the loop variables from leaking outside
comprehensions. Formally, this is indeed backward incompatible, but I doubt
many people depend on the current counter-intuitive behavior.

Concerning generator expressions, it is probably indeed better to simply
prohibit yield inside them.

--
Ivan
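[The scope difference discussed above can be demonstrated directly: in
Python 3 the comprehension's loop variable lives in a hidden function
scope, while a plain for-loop's variable leaks into the enclosing
namespace. A minimal sketch:]

```python
ns = {}

# The comprehension's loop variable 'x' stays in its own scope.
exec("squares = [x * x for x in range(3)]", ns)
print('x' in ns)   # False

# The for-loop's variable 'y' leaks into the namespace.
exec("for y in range(3): pass", ns)
print(ns['y'])     # 2
```

As a historical note, the other half of this proposal did land: yield
inside comprehensions and generator expressions became a SyntaxError in
Python 3.8.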