Re: [Python-Dev] re performance

2017-01-26 Thread MRAB

On 2017-01-26 21:46, Sven R. Kunze wrote:

On 26.01.2017 22:33, Vlastimil Brom wrote:

Hi,
I can't speak about the details of mrab's implementation, but using
regex, I get the resulting match instantly: [...]


Nice! I focused on the stdlib re module as this is mainly used by other
frameworks (like Django).


(I personally prefer regex for advantages other than this
artificial case, but it certainly doesn't hurt to have better
performance here too. :)


Me, too.

So, it seems as if regex already uses a better algorithm, although I
couldn't find any reference to a theoretical framework such as DFAs,
NFAs, or Thompson multiple-state simulation.


It still uses backtracking, like in the re module.
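
For readers curious what the alternative looks like: the Thompson-style
multi-state simulation the article describes can be sketched for this
particular pattern family. This is a toy illustration only, not how re or
regex is actually implemented:

```python
def match_optional_pattern(n, text):
    """Full match of ('a?' * n + 'a' * n) against text, using a
    Thompson-style breadth-first state-set simulation instead of
    backtracking: O(len(text) * n) time, no exponential blow-up."""
    # Each token is (char, optional); the first n are 'a?', the rest 'a'.
    tokens = [('a', True)] * n + [('a', False)] * n

    def eps_closure(states):
        # An optional token may be skipped without consuming input.
        closed, stack = set(), list(states)
        while stack:
            s = stack.pop()
            if s in closed:
                continue
            closed.add(s)
            if s < len(tokens) and tokens[s][1]:
                stack.append(s + 1)
        return closed

    states = eps_closure({0})
    for ch in text:
        states = eps_closure(
            {s + 1 for s in states if s < len(tokens) and tokens[s][0] == ch}
        )
    return len(tokens) in states  # accepted iff the end state was reached

print(match_optional_pattern(30, 'a' * 30))  # True, instantly
```

The state set can never hold more than 2n + 1 positions, which is what keeps
the running time linear in the input regardless of how the pattern nests.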

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] re performance

2017-01-26 Thread MRAB

On 2017-01-26 21:13, Sven R. Kunze wrote:

Hi folks,

I recently refreshed the theoretical basics of regular expressions
*indulging in reminiscences*. So, I read https://swtch.com/~rsc/regexp/regexp1.html

However, reaching the chart in the lower third of the article, I saw
Python 2.4 measured against a naive Thompson matching implementation,
and I was surprised by how badly it performed compared to an
unoptimized version of an older-than-dirt algorithm.

So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà:
100% CPU and no results so far. Nothing has changed at all since 2007.

  >>> import re
  >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
 (still waiting)

Quoting from the article: " Some might argue that this test is unfair to
the backtracking implementations, since it focuses on an uncommon corner
case. This argument misses the point: given a choice between an
implementation with a predictable, consistent, fast running time on all
inputs or one that usually runs quickly but can take years of CPU time
(or more) on some inputs, the decision should be easy."

Victor, as the head of the Python performance department, and Matthew, as
the maintainer of the new regex module, what is your stance on this
particular issue?

  From my perspective, regular expressions might be worth optimizing,
especially for web applications (URL matching usually uses regexes), but
also for other applications where I've seen many tight loops using
regexes. So, I am probing interest in this topic here.
It's possible to add some checks to reduce the problem, as I've done in 
the regex module, but I'm not sure how easy it would be to retrofit it 
into the existing re code. (It wasn't pleasant in the regex module, so...).
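
For illustration, one classic way to bound the damage while keeping a
backtracking design is to memoize (pattern position, text position) pairs so
each is explored at most once. This is only a sketch of that general
technique on the toy pattern family from this thread; it is not a description
of the actual checks in the regex module:

```python
from functools import lru_cache

def backtrack_match(n, text):
    """Backtracking full-match of ('a?' * n + 'a' * n) against text.
    Memoizing (pattern_pos, text_pos) pairs caps the total work at
    O(n * len(text)) instead of O(2**n)."""
    tokens = [('a', True)] * n + [('a', False)] * n  # (char, optional)

    @lru_cache(maxsize=None)
    def m(i, j):
        if i == len(tokens):
            return j == len(text)
        ch, optional = tokens[i]
        # Branch 1: consume one character of input with this token.
        if j < len(text) and text[j] == ch and m(i + 1, j + 1):
            return True
        # Branch 2: skip the token entirely if it is optional.
        return optional and m(i + 1, j)

    return m(0, 0)

print(backtrack_match(30, 'a' * 30))  # True, instantly
```

Without the lru_cache line, the same code revisits the same (i, j) pairs
exponentially many times, which is exactly the blow-up shown earlier in the
thread.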




Re: [Python-Dev] re performance

2017-01-26 Thread Sven R. Kunze

On 26.01.2017 22:33, Vlastimil Brom wrote:

Hi,
I can't speak about the details of mrab's implementation, but using
regex, I get the resulting match instantly: [...]


Nice! I focused on the stdlib re module as this is mainly used by other 
frameworks (like Django).



(I personally prefer regex for advantages other than this
artificial case, but it certainly doesn't hurt to have better
performance here too. :)


Me, too.

So, it seems as if regex already uses a better algorithm, although I
couldn't find any reference to a theoretical framework such as DFAs,
NFAs, or Thompson multiple-state simulation.



Cheers,
Sven


Re: [Python-Dev] re performance

2017-01-26 Thread Vlastimil Brom
2017-01-26 22:13 GMT+01:00 Sven R. Kunze:
> Hi folks,
>
> I recently refreshed the theoretical basics of regular expressions *indulging
> in reminiscences*. So, I read https://swtch.com/~rsc/regexp/regexp1.html
>
> However, reaching the chart in the lower third of the article, I saw Python
> 2.4 measured against a naive Thompson matching implementation, and I was
> surprised by how badly it performed compared to an unoptimized version of
> an older-than-dirt algorithm.
>
> So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà: 100%
> CPU and no results so far. Nothing has changed at all since 2007.
>
> >>> import re
> >>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
> (still waiting)
>
> Quoting from the article: " Some might argue that this test is unfair to the
> backtracking implementations, since it focuses on an uncommon corner case.
> This argument misses the point: given a choice between an implementation
> with a predictable, consistent, fast running time on all inputs or one that
> usually runs quickly but can take years of CPU time (or more) on some
> inputs, the decision should be easy."
>
> Victor, as the head of the Python performance department, and Matthew, as the
> maintainer of the new regex module, what is your stance on this particular
> issue?
>
> From my perspective, regular expressions might be worth optimizing,
> especially for web applications (URL matching usually uses regexes), but
> also for other applications where I've seen many tight loops using
> regexes. So, I am probing interest in this topic here.
>
> Best,
> Sven
>

Hi,
I can't speak about the details of mrab's implementation, but using
regex, I get the resulting match instantly:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 07:18:10) [MSC v.1900
32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import regex
>>> regex.match(r'a?'*30 + r'a'*30, 'a'*30)

>>>

(I personally prefer regex for advantages other than this
artificial case, but it certainly doesn't hurt to have better
performance here too. :)

regards,
vbr


[Python-Dev] re performance

2017-01-26 Thread Sven R. Kunze

Hi folks,

I recently refreshed the theoretical basics of regular expressions
*indulging in reminiscences*. So, I read https://swtch.com/~rsc/regexp/regexp1.html


However, reaching the chart in the lower third of the article, I saw
Python 2.4 measured against a naive Thompson matching implementation,
and I was surprised by how badly it performed compared to an
unoptimized version of an older-than-dirt algorithm.


So, I decided to give it a try with Python 2.7 and Python 3.5. Et voilà:
100% CPU and no results so far. Nothing has changed at all since 2007.


>>> import re
>>> re.match(r'a?'*30 + r'a'*30, 'a'*30)
 (still waiting)
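
A scaled-down version of the same experiment makes the exponential growth
visible without waiting forever; the exact timings depend on the machine:

```python
import re
import time

# Smaller instances of the same pathological pattern: with a
# backtracking engine, the running time roughly doubles with
# every increment of n.
for n in (10, 15, 20):
    pattern = re.compile(r'a?' * n + r'a' * n)
    start = time.perf_counter()
    assert pattern.match('a' * n) is not None
    print(n, f'{time.perf_counter() - start:.4f}s')
```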

Quoting from the article: " Some might argue that this test is unfair to 
the backtracking implementations, since it focuses on an uncommon corner 
case. This argument misses the point: given a choice between an 
implementation with a predictable, consistent, fast running time on all 
inputs or one that usually runs quickly but can take years of CPU time 
(or more) on some inputs, the decision should be easy."


Victor, as the head of the Python performance department, and Matthew, as
the maintainer of the new regex module, what is your stance on this
particular issue?


From my perspective, regular expressions might be worth optimizing,
especially for web applications (URL matching usually uses regexes), but
also for other applications where I've seen many tight loops using
regexes. So, I am probing interest in this topic here.


Best,
Sven



Re: [Python-Dev] Why is PyDateTimeAPI not initialized by datetime module import?

2017-01-26 Thread Alexander Belopolsky
On Thu, Jan 26, 2017 at 11:00 AM, Skip Montanaro wrote:

> I just got burned (wasted a good day or so) by the fact that PyDateTimeAPI
> wasn't initialized. The datetime.rst doc states (emphasis mine):
>
> Before using any of these functions, the header file :file:`datetime.h`
> must be included in your source (note that this is not included by
> :file:`Python.h`), and the macro :c:macro:`PyDateTime_IMPORT` must be
> invoked, *usually as part of the module initialisation function*.
>
> I thought that surely the datetime module itself would initialize that
> stuff. Why not?
>

It is fairly common for modules that export a C API in a capsule to only
initialize that capsule when explicitly asked.  For example, if you want to
use the numpy C API, you need to call import_array() in your module
initialization function. [1]  I don't know how expensive PyDateTime_IMPORT
is, but it cannot be called in _datetimemodule.c because it is guarded by
the Py_BUILD_CORE macro and is not available when _datetimemodule.c itself
is compiled.  I don't know whether this can be easily circumvented, but it
is not hard to remember that one should initialize the C API before using it.
Maybe we can improve the error message when PyDateTime_IMPORT is forgotten.
Feel free to open an issue for that.

[1]:
https://docs.scipy.org/doc/numpy-1.12.0/user/c-info.how-to-extend.html#required-subroutine


[Python-Dev] Why is PyDateTimeAPI not initialized by datetime module import?

2017-01-26 Thread Skip Montanaro
I just got burned (wasted a good day or so) by the fact that PyDateTimeAPI
wasn't initialized. The datetime.rst doc states (emphasis mine):

Before using any of these functions, the header file :file:`datetime.h`
must be included in your source (note that this is not included by
:file:`Python.h`), and the macro :c:macro:`PyDateTime_IMPORT` must be
invoked, *usually as part of the module initialisation function*.

I thought that surely the datetime module itself would initialize that
stuff. Why not? Is it so terribly expensive that the C API requires this
rather weird hack? The code's been there for ages, so there must have been
a good reason at one time. Is that reason still valid today? (I haven't
programmed at the C API level for a good long while, or I'm sure I'd have
encountered this before.)

Thx,

Skip


Re: [Python-Dev] Generator objects and list comprehensions?

2017-01-26 Thread Ivan Levkivskyi
On 26 January 2017 at 00:53, Nathaniel Smith wrote:

>
> I'm not sure this is actually a good idea, given the potential for
> ambiguity and backwards compatibility considerations -- I'm kind of leaning
> towards the deprecate/error option on balance :-). But it's an option that
> would probably be less confusing than the status quo, and would make it
> easier to write the kind of straddling code the original poster was trying
> to write.
>
>
Concerning list/set/dict comprehensions, I am much more in favor of making
comprehensions simply equivalent to for-loops (more or less as you
proposed, using yield from). The only reason to introduce the auxiliary
function scope was to prevent the loop variables from leaking outside
comprehensions. Formally, this is indeed backward incompatible, but I doubt
many people depend on the current counter-intuitive behavior.
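
The auxiliary function scope mentioned here is easy to observe: in Python 3
the comprehension's loop variable does not leak into the enclosing scope.

```python
x = 'outer'
squares = [x * x for x in range(5)]  # the comprehension has its own scope
assert squares == [0, 1, 4, 9, 16]
assert x == 'outer'  # loop variable x did not leak (it did in Python 2)
```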

Concerning generator expressions, probably it is indeed better to simply
prohibit yield inside them.
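
This is in fact the direction the language later took: since Python 3.8,
yield inside a comprehension or generator expression is a SyntaxError. A
quick check:

```python
# Since Python 3.8, yield is rejected inside generator expressions
# and comprehensions, matching the prohibition suggested above.
try:
    compile("g = ((yield x) for x in range(3))", "<string>", "exec")
    outcome = "accepted"
except SyntaxError:
    outcome = "rejected"
print(outcome)  # 'rejected' on Python 3.8+
```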

--
Ivan