[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido]
> The key seems to be:

Except none of that quoted text (which I'll skip repeating) gives the
slightest clue as to _why_ it may be an improvement.  So you split the
needle into two pieces.  So what?  What's the _point_?  Why would
someone even imagine that might help?

Why is one half then searched right to left, but the other half left
to right?  There's actually "a reason" for searching the right half
left to right, but because the shift on a mismatch in the left half is
a constant ("per(x)") independent of the mismatching position, it's
actually irrelevant in which order the left half's characters are
compared (well, at least in some variations of these newer
algorithms).  Over-specification can be actively misleading too.
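
For readers unfamiliar with the notation, per(x) is the smallest period of
the pattern x: the smallest p such that x[i] == x[i+p] wherever both sides
are defined.  A deliberately naive helper, just to pin the definition down
(the real preprocessing doesn't compute it this way):

    def period(x):
        # smallest p such that x[i] == x[i + p] for all valid i
        for p in range(1, len(x) + 1):
            if all(x[i] == x[i + p] for i in range(len(x) - p)):
                return p
        return len(x)

    print(period("abab"))      # -> 2
    print(period("abcabcab"))  # -> 3
    print(period("abcd"))      # -> 4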

What's the key idea?  "Split the needle into two pieces" is a key
_step_, not a key _insight_.


> I need a Python version though.

Dennis wrote one for his first stab, which you can download from the
bug report (it's attached there as fastsearch.py).

On the bug report's data, running that under CPython 3.9.0 on my box
reduced the time from about 3 1/2 hours to about 20 seconds.  Running
it under PyPy, to under one second (and the current C implementation
under 0.1 second).

> I am not able to dream up any hard cases -- like other posters, my own
> use of substring search is usually looking for a short string in a relatively
> short piece of text. I doubt even the current optimizations matter to my uses.

They probably do, but not dramatically.  Would you, e.g., notice a
factor of 2 either way in those cases?  A factor of 4?  Of 10?  When
Fredrik first wrote this code, he routinely saw factors of 2 to 10
speedup on all sorts of string-searching benchmarks, both contrived
and "natural". The speedups were especially pronounced for Unicode
strings at the time, where skipping futile Unicode character
comparisons could be more valuable than when skipping (shorter)
single-byte comparisons.

I should also note that the fixed-string searching code is used as a
subroutine by parts of our regexp implementation, by str.count(),
str.replace(), similar `bytes` operations, and so on.

> What are typical hard cases used for?

It's kinda like asking what typical hard rounding cases for pow() are
used for ;-) They aren't deliberate. They "just happen", typically
when a pattern and the text to search contain self-similarities "but
with glitches".   Search the string

   text =  "X" * 10_000_000

for

needle = "X" * 1_000_000

Easy!  It matches at once.  But now tack "Y" on to the end of the
needle.  Then it's impossible to match.  Brute force search first
finds a million-character prefix match at text[0 : 1_000_000] but then
fails to match the trailing "Y".  So brute force uses another million
compares to match the prefix at text[1 : 1_000_001]. But again fails
to match the trailing "Y".  Then another million plus 1 compares to
fail to match starting at index 2, and again at index 3, and again at
index 4, ... it runs to trillions of comparisons before it finally fails.
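
To make that concrete, here's a scaled-down sketch (toy code, not
CPython's implementation) that just counts the character comparisons
naive left-to-right brute force performs:

    def brute_force_count(haystack, needle):
        # count individual character comparisons done by naive search
        comparisons = 0
        n, m = len(haystack), len(needle)
        for start in range(n - m + 1):
            for j in range(m):
                comparisons += 1
                if haystack[start + j] != needle[j]:
                    break
            else:
                return start, comparisons  # found a match
        return -1, comparisons

    text = "X" * 10_000          # scaled down from 10_000_000
    needle = "X" * 1_000 + "Y"   # scaled down from 1_000_000 + "Y"
    print(brute_force_count(text, needle))   # -> (-1, 9_009_000)

About 9 million comparisons for a 10_000-character haystack, and the
count grows quadratically as both sizes scale up.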

The current code starts comparing at the right end of each trial
position first. Then an "X" from the text and the needle's trailing
"Y" mismatch at once.  That's a million times faster.

Here's a simple case where the current code is horribly slow, because
neither "right-to-left" nor any of its preprocessing tricks do any
good at all.  Even Daniel Sunday's full-blown algorithm[1] (which the
current code strips down _almost_ to the point of non-existence)
wouldn't help (unless it used a variant I've never actually seen that
compared the rarest pattern character first):

("X" * 10_000_000).find("X" * 500_000 + "Y" + "X" * 500_000)

The newer code returns -1 seemingly instantly.
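
For illustration only, here's a simplified "compare the last character
of the window first" scan (a toy stand-in, not the actual fastsearch
code, which has more tricks than this) run on a scaled-down version of
that case:

    def last_char_first_count(haystack, needle):
        # count comparisons for a scan that checks the window's last
        # character before doing a left-to-right match of the rest
        comparisons = 0
        n, m = len(haystack), len(needle)
        for start in range(n - m + 1):
            comparisons += 1
            if haystack[start + m - 1] != needle[-1]:
                continue              # cheap rejection -- never fires here
            for j in range(m - 1):
                comparisons += 1
                if haystack[start + j] != needle[j]:
                    break
            else:
                return start, comparisons
        return -1, comparisons

    text = "X" * 10_000                         # scaled down
    needle = "X" * 500 + "Y" + "X" * 500        # "Y" buried in the middle
    print(last_char_first_count(text, needle))  # -> (-1, 4_518_000)

The trailing character of the needle is "X", which matches everywhere
in the text, so the cheap rejection never helps and every window still
degenerates into a long scan.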

> DNA search? (That would be cool!)

It can certainly help. Self-similarities are bound to be common in
long strings from  a 4-character alphabet (ACGT). But, to be fair,
serious work with genomes typically invests in building a giant suffix
tree (or enhanced suffix array) for each cataloged sequence to be
searched. No preprocessing of needles is needed then to guarantee
worst-case efficient searches of many kinds (including things like
finding the longest adjacent repeated subsequences, more like a regexp
(.+)\1 kind of search).
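
To illustrate what that "(.+)\1 kind of search" means in regexp terms
(a toy example only - a suffix tree or enhanced suffix array does this
at scale without the backtracking cost):

    import re

    # find an adjacent repeated block in a short genome-like string
    m = re.search(r"(.+)\1", "ACGTACGTTTG")
    print(m.group(1))   # prints: ACGT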

But the new code could quite plausibly speed more casual prototyping
of such explorations.

[1] https://dl.acm.org/doi/abs/10.1145/79173.79184
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XUP6S2IVJAW5K2NSHZ7UOKN5YQFNUWVQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Brett Cannon
On Wed., Oct. 14, 2020, 17:37 Tim Peters,  wrote:

> [Steven D'Aprano ]
> > Perhaps this is a silly suggestion, but could we offer this as an
> > external function in the stdlib rather than a string method?
> >
> > Leave it up to the user to decide whether or not their data best suits
> > the find method or the new search function. It sounds like we can offer
> > some rough heuristics, but the only way to really know is "try it and
> > see which works best for you".
> >
> > The `string` module is an obvious place for it.
>
> I think this is premature. There is almost never an optimization
> that's a pure win in all cases. For example, on some platforms
> `timsort` will never be as fast as the old samplesort in cases with a
> very large number of equal elements, and on all platforms `timsort`
> consumes more memory.  And it's a wash on "random" data on most
> platforms. Nevertheless, it was a significant speed win for many -
> possibly even most - real-life cases.
>
> So far, the PR reduces the runtime in the bug report from about 3 1/2
> hours to under a tenth of a second. It would be crazy not to gift
> users such dramatic relief by default, unless there are good reasons
> not to. Too soon to say. On tests of random strings with characters
> following a Zipf distribution (more realistic than uniform but still
> very easy to generate - and not contrived in any way to favor either
> approach), the current new code is usually faster than the status quo,
> in no case has been worse than twice as slow, and in a number of cases
> has been 20x faster.  It's already clearly a win _overall_.
>
> The bulk of the "but it's slower" cases now are found with very short
> needles (patterns), which was expected (read my comments on the bug
> report), and exacerbated by the fact that the structure of the random test
> generation is quite likely to create cases where a short needle is
> found early in the haystack. This combination makes the higher
> preprocessing overhead as damaging as possible. Dennis also expanded
> the current code's "32-bit bitmask" trick (which is an extreme
> simplification of Daniel Sunday's algorithm) to a fixed-size 256-byte
> table of character-class-specific skip counts, which went a _very_
> long way toward eliminating the cases where the current code enjoyed a
> "pure luck" advantage over the new code. But that increased the
> preprocessing overhead again, which again is relatively most
> significant for short needles - those 256 bytes have to be initialized
> every time, no matter the lengths of the needle or haystack.
>
> If the new code were changed to exempt short needles, even now it
> might be hard to make an objective case not to adopt it.
>

This is where it sounds like we might want to go. If we can figure out a
reasonable, cheap heuristic to define "too short"-- for either the search
string or the string to search -- to use the fancier algorithm then I would
support adding the fancy string search.

-Brett


> But it would leave open a different idea for an "opt-in" facility: one
> that allowed giving a needle to a function and getting back an object
> capturing the results of preprocessing.  Then a different wrapper
> around the search code that accepted the already-pre-processed info.
> There are, e.g., certainly cases where repeated searches for a given
> 4-character needle could be sped significantly by exploiting the new
> code, but where the overhead of preprocessing is too much to bear in
> "random" performance testing. It would also open the door to more
> aggressive (expensive) kinds of preprocessing.  That's where users may
> be able to make useful, informed judgments.
>
> [David Mertz]
> > That said, I do recognize that the `re` module also has pathological
> cases,
> > and the standard library documentation explicitly says "maybe you want to
> > try the third-party `regex` implementation."  That is sort of precedent
> for
> > this approach.
>
> `regex` also suffers exponential time disasters, they're just harder
> to stumble into - and there are books explaining in detail how and why
> regex engines can fall into these traps.
>
> It's far less _expected_ in plain string searches, and Dennis was
> aware of the new algorithms because, apparently, (at least) glibc's
> memmem switched to one some years ago. So we're playing catch-up here.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/ECPXBBF7OUNYLDURCUKYXIOTGPGVBMHX/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Guido van Rossum
On Wed, Oct 14, 2020 at 9:56 AM Tim Peters  wrote:

> [Guido]
> > Maybe someone reading this can finish the Wikipedia page on
> > Two-Way Search? The code example trails off with a function with
> > some incomprehensible remarks and then a TODO..
>
> Yes, the Wikipedia page is worse than useless in its current state,
> although some of the references it lists are helpful.  This is a much
> better summary:
>
> http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
>
> but, I believe, still far too telegraphic.
>

The key seems to be:

"""
The searching phase of the Two Way algorithm consists in first comparing
the character of xr from left to right, then the character of x from right
to left.
When a mismatch occurs when scanning the k-th character of xr, then a shift
of length k is performed.
When a mismatch occurs when scanning x, or when an occurrence of the
pattern is found, then a shift of length per(x) is performed.
Such a scheme leads to a quadratic worst case algorithm, this can be
avoided by a prefix memorization: when a shift of length per(x) is
performed the length of the matching prefix of the pattern at the beginning
of the window (namely m-per(x)) after the shift is memorized to avoid to
scan it again during the next attempt.
"""

The preprocessing comes down to splitting the original search string
("needle") in two parts, xl and xr, using some clever algorithm (IIUC the
wikipedia page does describe this, although my brain is currently too
addled to visualize it).

The original paper is by far the best account I've seen so far, with
> complete code and exhaustive explanations and proofs.  Even examples
> ;-)
>

I need a Python version though.


> But here's the thing: I don't believe this algorithm _can_ be
> reduced to an elevator pitch.  Excruciating details appear to be
> fundamental at every step, and I've seen nothing yet anywhere
> approaching an "intuitive explanation" :-(  What happens instead is
> that you run it on the hardest cases you can dream up, and your jaw
> drops in amazement :-)
>

I am not able to dream up any hard cases -- like other posters, my own use
of substring search is usually looking for a short string in a relatively
short piece of text. I doubt even the current optimizations matter to my
uses. What are typical hard cases used for? DNA search? (That would be
cool!)

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HSLPDIFQ2YEWBBO3XJKYQHMP3LML3DNS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Chris Angelico
On Thu, Oct 15, 2020 at 11:38 AM Tim Peters  wrote:
> I think this is premature. There is almost never an optimization
> that's a pure win in all cases. For example, on some platforms
> `timsort` will never be as fast as the old samplesort in cases with a
> very large number of equal elements, and on all platforms `timsort`
> consumes more memory.  And it's a wash on "random" data on most
> platforms. Nevertheless, it was a significant speed win for many -
> possibly even most - real-life cases.

And timsort is one of my go-tos for teaching the concept of hybrid
sorting algorithms, because, at its heart, it's simple enough to
explain, and it manages to be so incredibly valuable in real-world
code. :)

> But it would leave open a different idea for an "opt-in" facility: one
> that allowed giving a needle to a function and getting back an object
> capturing the results of preprocessing.  Then a different wrapper
> around the search code that accepted the already-pre-processed info.
> There are, e.g., certainly cases where repeated searches for a given
> 4-character needle could be sped significantly by exploiting the new
> code, but where the overhead of preprocessing is too much to bear in
> "random" performance testing. It would also open the door to more
> aggressive (expensive) kinds of preprocessing.  That's where users may
> be able to make useful, informed judgments.

Kinda like the way a compiled regex is used? I like this idea. So it'd
be heuristics in the core language that choose a good default for most
situations, and then a str method that returns a preprocessed needle.
In a Python implementation that doesn't want to use two different
algorithms, that preprocessor could return the string unchanged, but
otherwise it's an opaque object usable only in string searches.
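
To make the analogy concrete, a purely hypothetical sketch - the names
here are invented for illustration, nothing like this exists today:

    import re

    pat = re.compile("ABCA")                 # preprocessing paid once...
    print(pat.search("xxABCAxx").start())    # -> 2  ...reused per search

    # A string-search analogue might read (hypothetical API):
    #
    #   prepared = "ABCA".prepare_for_search()   # opaque preprocessed needle
    #   big_text.find(prepared)                  # skips redoing the setup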

+1, if my interpretation of your description is correct.

ChrisA
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4MOC4RQDFPFGCF2OHUAK4YACGGYMTFGS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Steven D'Aprano ]
> Perhaps this is a silly suggestion, but could we offer this as an
> external function in the stdlib rather than a string method?
>
> Leave it up to the user to decide whether or not their data best suits
> the find method or the new search function. It sounds like we can offer
> some rough heuristics, but the only way to really know is "try it and
> see which works best for you".
>
> The `string` module is an obvious place for it.

I think this is premature. There is almost never an optimization
that's a pure win in all cases. For example, on some platforms
`timsort` will never be as fast as the old samplesort in cases with a
very large number of equal elements, and on all platforms `timsort`
consumes more memory.  And it's a wash on "random" data on most
platforms. Nevertheless, it was a significant speed win for many -
possibly even most - real-life cases.

So far, the PR reduces the runtime in the bug report from about 3 1/2
hours to under a tenth of a second. It would be crazy not to gift
users such dramatic relief by default, unless there are good reasons
not to. Too soon to say. On tests of random strings with characters
following a Zipf distribution (more realistic than uniform but still
very easy to generate - and not contrived in any way to favor either
approach), the current new code is usually faster than the status quo,
in no case has been worse than twice as slow, and in a number of cases
has been 20x faster.  It's already clearly a win _overall_.
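
For anyone who wants to poke at that kind of test, here's one plausible
way to generate Zipf-ish random strings (a sketch only - the actual
test harness may differ in its details):

    import random
    import string

    def zipf_string(length, alphabet=string.ascii_lowercase, s=1.0):
        # character k gets weight 1/(k+1)**s, so a few characters dominate
        weights = [1.0 / (k + 1) ** s for k in range(len(alphabet))]
        return "".join(random.choices(alphabet, weights=weights, k=length))

    haystack = zipf_string(1_000_000)
    needle = zipf_string(10)
    print(haystack.find(needle))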

The bulk of the "but it's slower" cases now are found with very short
needles (patterns), which was expected (read my comments on the bug
report), and exacerbated by the fact that the structure of the random test
generation is quite likely to create cases where a short needle is
found early in the haystack. This combination makes the higher
preprocessing overhead as damaging as possible. Dennis also expanded
the current code's "32-bit bitmask" trick (which is an extreme
simplification of Daniel Sunday's algorithm) to a fixed-size 256-byte
table of character-class-specific skip counts, which went a _very_
long way toward eliminating the cases where the current code enjoyed a
"pure luck" advantage over the new code. But that increased the
preprocessing overhead again, which again is relatively most
significant for short needles - those 256 bytes have to be initialized
every time, no matter the lengths of the needle or haystack.
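
For readers who haven't met the Sunday/Horspool family, here's a minimal
sketch of the general "per-character skip table" idea (an illustration
only, not the code in the PR, which differs in important details):

    def build_skip_table(needle):
        # 256-entry table: how far the window may shift, keyed by the
        # byte seen at the end of the current window
        m = len(needle)
        table = [m] * 256                 # bytes absent from the needle
        for i, b in enumerate(needle[:-1]):
            table[b] = m - 1 - i          # distance from last occurrence to end
        return table

    def horspool_find(haystack, needle):
        m, n = len(needle), len(haystack)
        if m == 0:
            return 0
        table = build_skip_table(needle)
        i = 0
        while i <= n - m:
            if haystack[i:i + m] == needle:
                return i
            i += table[haystack[i + m - 1]]   # skip based on window's last byte
        return -1

    print(horspool_find(b"the quick brown fox", b"brown"))   # -> 10

The table itself is where the preprocessing cost shows up: it has to be
filled in for every search, however short the needle.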

If the new code were changed to exempt short needles, even now it
might be hard to make an objective case not to adopt it.

But it would leave open a different idea for an "opt-in" facility: one
that allowed giving a needle to a function and getting back an object
capturing the results of preprocessing.  Then a different wrapper
around the search code that accepted the already-pre-processed info.
There are, e.g., certainly cases where repeated searches for a given
4-character needle could be sped significantly by exploiting the new
code, but where the overhead of preprocessing is too much to bear in
"random" performance testing. It would also open the door to more
aggressive (expensive) kinds of preprocessing.  That's where users may
be able to make useful, informed judgments.

[David Mertz]
> That said, I do recognize that the `re` module also has pathological cases,
> and the standard library documentation explicitly says "maybe you want to
> try the third-party `regex` implementation."  That is sort of precedent for
> this approach.

`regex` also suffers exponential time disasters, they're just harder
to stumble into - and there are books explaining in detail how and why
regex engines can fall into these traps.

It's far less _expected_ in plain string searches, and Dennis was
aware of the new algorithms because, apparently, (at least) glibc's
memmem switched to one some years ago. So we're playing catch-up here.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ECPXBBF7OUNYLDURCUKYXIOTGPGVBMHX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread David Mertz
On Wed, Oct 14, 2020 at 7:45 PM Steven D'Aprano  wrote:

> Perhaps this is a silly suggestion, but could we offer this as an
> external function in the stdlib rather than a string method?
>

That feels unworkable to me.

For one thing, the 'in' operator hits this same issue, doesn't it? But for
another, the example initially posted where the current code did or didn't
hit that quadratic breakdown was just one extra character in a long
string.  I cannot imagine end users knowing in advance whether their exact
search will hit that unlucky circumstance that is very data dependent.

If it was a case of "in certain circumstances, this other function might be
twice as fast" I can see that being useful.  But here we get a big-O
explosion that went from milliseconds to hours on ostensibly similar
operations.

That said, I do recognize that the `re` module also has pathological cases,
and the standard library documentation explicitly says "maybe you want to
try the third-party `regex` implementation."  That is sort of precedent for
this approach.

-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EN75GDICTTCHKMKEQQDETKNPFF4AJ3QR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Oscar Benjamin
On Wed, 14 Oct 2020 at 19:12, Ivan Pozdeev via Python-Dev
 wrote:
>
>
> On 14.10.2020 17:04, M.-A. Lemburg wrote:
> > On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
> >>>   Would it be possible to get the data for older runs back, so that
> >> it's easier to find the changes which caused the slowdown ?
> >>
> >> Unfortunately no. The reasons are that that data was misleading because
> >> different points were computed with a different version of pyperformance 
> >> and
> >> therefore with different packages (and therefore different code). So the 
> >> points
> >> could not be compared among themselves.
> >>
> >> Also, past data didn't include 3.9 commits because the data gathering was 
> >> not
> >> automated and it didn't run in a long time :(
> > Make sense.
> >
> > Would it be possible rerun the tests with the current
> > setup for say the last 1000 revisions or perhaps a subset of these
> > (e.g. every 10th revision) to try to binary search for the revision which
> > introduced the change ?
>
> Git has a facility for this called "bisection". It can be run in automated 
> mode, exactly for a scenario like this.

As Victor mentioned earlier, asv (airspeed velocity) has this feature
as "asv find" and it can run a benchmark and has some heuristics to
bisect when a "slowdown" occurred. In some situations that is very
useful for identifying a commit that introduced a performance
regression but in other cases there can be a number of changes that
have different effects and the timings can be noisy. This can mean
that the bisection doesn't converge properly to a relevant commit.

Often there's really no substitute for running over all commits and
being able to see how the benchmark changes over time. You can do this
with asv and then the web interface gives a timeseries where the
data-points link directly to the commits on github. I'm currently
preparing the SymPy 1.7 release so Aaron Meurer recently ran asv over
all commits from 1.6 to master and you can see the output here:
https://www.asmeurer.com/sympy_benchmarks/

--
Oscar
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YU5KOX7SHMSGUO4B3PIOZFMKA6S5GJ22/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Steven D'Aprano
Perhaps this is a silly suggestion, but could we offer this as an 
external function in the stdlib rather than a string method?

Leave it up to the user to decide whether or not their data best suits 
the find method or the new search function. It sounds like we can offer 
some rough heuristics, but the only way to really know is "try it and 
see which works best for you".

The `string` module is an obvious place for it.

-- 
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SVMEBOHDPPODZQGDOGSUPZYT4B7UIIAW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Batuhan Taskaya
I've indexed the vast majority of the files from the top 4K PyPI packages
into this system, and here are the results for __version__ usage in
argparse, cgi, csv, decimal, imaplib, ipaddress, optparse, pickle,
platform, re, smtpd, socketserver, tabnanny (result of a quick grep):



rawdata/clean/argparse/setup.py: argparse.__version__
rawdata/pypi/junitparser-1.4.1/bin/junitparser: argparse.__version__
rawdata/pypi/interpret_community-0.15.1/interpret_community/mlflow/mlflow.py: pickle.__version__

The pickle in the last example looks like a result of import cloudpickle 
as pickle, so we are safe to eliminate that.


Here is the query if you want to try it yourself with different
parameters:
https://search.tree.science/?query=Attribute%28Name%28%27argparse%27%7C%27cgi%27%7C%27csv%27%7C%27decimal%27%7C%27imaplib%27%7C%27ipaddress%27%7C%27optparse%27%7C%27platform%27%7C%27pickle%27%7C%27re%27%7C%27smtpd%27%7C%27socketserver%27%7C%27tabnanny%27%29%2C+%22__version__%22%29 
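
For a quicker local check, something along these lines with the ast
module also works (a rough sketch - not how the index above is built):

    import ast

    MODULES = {"argparse", "cgi", "csv", "decimal", "pickle", "platform"}

    def find_version_accesses(source):
        # collect (lineno, module) for accesses like argparse.__version__
        hits = []
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Attribute)
                    and node.attr == "__version__"
                    and isinstance(node.value, ast.Name)
                    and node.value.id in MODULES):
                hits.append((node.lineno, node.value.id))
        return hits

    src = "import argparse\nprint(argparse.__version__)\n"
    print(find_version_accesses(src))   # -> [(2, 'argparse')]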


On 14.10.2020 21:23, Neil Schemenauer wrote:

On 2020-10-14, Serhiy Storchaka wrote:

I propose to remove __version__ in all stdlib modules. Are there any
exceptions?

I agree that these kinds of meta attributes are not useful and it
would be nice to clean them up.  However, IMHO, maybe the cleanup is
not worth breaking Python programs.  We could remove them from the
documentation, add comments (or deprecation warnings) telling people
not to use them.

I think it would be okay to remove them if we could show that the
top N PyPI packages don't use these attributes or at least very few
of them do.  As someone who regularly tests alpha releases, I've
found it quite painful to do since nearly every release is breaking
3rd party packages that my code depends on.  I feel we should try
hard to avoid breaking things unless there is a strong reason and
there is no easy way to provide backwards compatibility.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MI2SLQCZIKBRFX7HCUB7G4B64MTZ6XVC/
Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MSQTTUZOW6KSECSZE5XH65LANGII2P5F/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Neil Schemenauer
On 2020-10-14, Serhiy Storchaka wrote:
> I propose to remove __version__ in all stdlib modules. Are there any
> exceptions?

I agree that these kinds of meta attributes are not useful and it
would be nice to clean them up.  However, IMHO, maybe the cleanup is
not worth breaking Python programs.  We could remove them from the
documentation, add comments (or deprecation warnings) telling people
not to use them.

I think it would be okay to remove them if we could show that the
top N PyPI packages don't use these attributes or at least very few
of them do.  As someone who regularly tests alpha releases, I've
found it quite painful to do since nearly every release is breaking
3rd party packages that my code depends on.  I feel we should try
hard to avoid breaking things unless there is a strong reason and
there is no easy way to provide backwards compatibility.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MI2SLQCZIKBRFX7HCUB7G4B64MTZ6XVC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Performance benchmarks for 3.9

2020-10-14 Thread Terry Reedy

On 10/14/2020 9:16 AM, Pablo Galindo Salgado wrote:


You can check these benchmarks I am talking about by:

* Go here: https://speed.python.org/comparison/
* In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo 
latest in branch '3.8'"


At the moment, there are only results for 'speed-python', none for 
'Broadwell-EP'.  What do those terms mean?


If one leaves all 5 versions checked, they are mis-ordered 3.9, 3.7, 
3.8, 3.6, master.  The correct sequence, 3.6 to master, would be easier 
to read and interpret.  Then pick colors to maximize contrast between 
adjacent bars.



* To better read the plot, I would recommend to select a "Normalization" 
to the 3.8 branch (this is in the top part of the page)


Or either end of whatever sequence one includes.


    and to check the "horizontal" checkbox.


Overall, there have been many substantial improvements since 3.6.

--
Terry Jan Reedy

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DDQC76IF5DF335MTYNCGN7OAFJQPYZUI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Ivan Pozdeev via Python-Dev



On 14.10.2020 17:04, M.-A. Lemburg wrote:

On 14.10.2020 16:00, Pablo Galindo Salgado wrote:

  Would it be possible to get the data for older runs back, so that

it's easier to find the changes which caused the slowdown ?

Unfortunately no. The reasons are that that data was misleading because
different points were computed with a different version of pyperformance and
therefore with different packages (and therefore different code). So the points
could not be compared among themselves.

Also, past data didn't include 3.9 commits because the data gathering was not
automated and it didn't run in a long time :(

Make sense.

Would it be possible rerun the tests with the current
setup for say the last 1000 revisions or perhaps a subset of these
(e.g. every 10th revision) to try to binary search for the revision which
introduced the change ?


Git has a facility for this called "bisection". It can be run in automated 
mode, exactly for a scenario like this.

https://git-scm.com/docs/git-bisect



 On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  wrote:

 Hi Pablo,

 thanks for pointing this out.

 Would it be possible to get the data for older runs back, so that
 it's easier to find the changes which caused the slowdown ?

 Going to the timeline, it seems that the system only has data
 for Oct 14 (today):

 
https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none

 In addition to unpack_sequence, the regex_dna test has slowed
 down a lot compared to Py3.8.

 
https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
 
https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py

 Thanks.

 On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
 > Hi!
 >
 > I have updated the branch benchmarks in the pyperformance server and now 
they
 > include 3.9. There are
 > some benchmarks that are faster but on the other hand some benchmarks are
 > substantially slower, pointing
 > at a possible performance regression in 3.9 in some aspects. In 
particular
 some
 > tests like "unpack sequence" are
 > almost 20% slower. As there are some other tests were 3.9 is faster, is
 not fair
 > to conclude that 3.9 is slower, but
 > this is something we should look into in my opinion.
 >
 > You can check these benchmarks I am talking about by:
 >
 > * Go here: https://speed.python.org/comparison/
 > * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo 
latest
 > in branch '3.8'"
 > * To better read the plot, I would recommend to select a "Normalization"
 to the
 > 3.8 branch (this is in the top part of the page)
 >    and to check the "horizontal" checkbox.
 >
 > These benchmarks are very stable: I have executed them several times 
over the
 > weekend yielding the same results and,
 > more importantly, they are being executed on a server specially prepared 
to
 > running reproducible benchmarks: CPU affinity,
 > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU
 governor
 > set to performance mode, IRQ affinity is
 > disabled for the benchmarking CPU nodes...etc so you can trust these 
numbers.
 >
 > I kindly suggest for everyone interested in trying to improve the 3.9 
(and
 > master) performance, to review these benchmarks
 > and try to identify the problems and fix them or to find what changes
 introduced
 > the regressions in the first place. All benchmarks
 > are the ones being executed by the pyperformance suite
 > (https://github.com/python/pyperformance) so you can execute them
 > locally if you need to.
 >
 > ---
 >
 > On a related note, I am also working on the speed.python.org
 
 >  server to provide more automation and
 > ideally some integrations with GitHub to detect performance regressions. 
For
 > now, I have done the following:
 >
 > * Recompute benchmarks for all branches using the same version of
 > pyperformance (except master) so they can
 >    be compared with each other. This can only be seen in the "Comparison"
 > tab: https://speed.python.org/comparison/
 > * I am setting daily builds of the master branch so we can detect 
performance
 > regressions with daily granularity. These
 >    daily builds will be located in the "Changes" and "Timeline" tabs
 > (https://speed.python.org/timeline/).
 > * Once the daily builds are working as expected, I plan to work on 
trying to
 > automatically comment or PRs or on bpo if
 > we detect that a commit has introduced some notable performance 
regression.
 >
 > Regards from sunny London,
 > Pablo Galindo Salgado.
 >
 > 

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible instead to run git-bisect for only a _particular_
benchmark? It seems that may be all that’s needed to track down particular
regressions. Also, if e.g. git-bisect is used it wouldn’t be every e.g.
10th revision but rather O(log(n)) revisions.

That only works if there is a single change that produced the issue and not
many small changes that have a cumulative effect, which is normally the
case. Also, it does not work (is more tricky to make it work) if the issue
was introduced, then fixed somehow and then introduced again or in a worse
way.

On Wed, 14 Oct 2020 at 18:58, Chris Jerdonek 
wrote:

> On Wed, Oct 14, 2020 at 8:03 AM Pablo Galindo Salgado <
> pablog...@gmail.com> wrote:
>
>> > Would it be possible rerun the tests with the current
>> setup for say the last 1000 revisions or perhaps a subset of these
>> (e.g. every 10th revision) to try to binary search for the revision which
>> introduced the change ?
>>
>> Every run takes 1-2 h so doing 1000 would be certainly time-consuming :)
>>
>
> Would it be possible instead to run git-bisect for only a _particular_
> benchmark? It seems that may be all that’s needed to track down particular
> regressions. Also, if e.g. git-bisect is used it wouldn’t be every e.g.
> 10th revision but rather O(log(n)) revisions.
>
> —Chris
>
>
>
>
> That's why from now on I am trying to invest in daily builds for master,
>> so we can answer that exact question if we detect regressions in the
>> future.
>>
>>
>> On Wed, 14 Oct 2020 at 15:04, M.-A. Lemburg  wrote:
>>
>>> On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
>>> >> Would it be possible to get the data for older runs back, so that
>>> > it's easier to find the changes which caused the slowdown ?
>>> >
>>> > Unfortunately no. The reasons are that that data was misleading because
>>> > different points were computed with a different version of
>>> pyperformance and
>>> > therefore with different packages (and therefore different code). So
>>> the points
>>> > could not be compared among themselves.
>>> >
>>> > Also, past data didn't include 3.9 commits because the data gathering
>>> was not
>>> > automated and it didn't run in a long time :(
>>>
>>> Make sense.
>>>
>>> Would it be possible rerun the tests with the current
>>> setup for say the last 1000 revisions or perhaps a subset of these
>>> (e.g. every 10th revision) to try to binary search for the revision which
>>> introduced the change ?
>>>
>>> > On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  wrote:
>>> >
>>> > Hi Pablo,
>>> >
>>> > thanks for pointing this out.
>>> >
>>> > Would it be possible to get the data for older runs back, so that
>>> > it's easier to find the changes which caused the slowdown ?
>>> >
>>> > Going to the timeline, it seems that the system only has data
>>> > for Oct 14 (today):
>>> >
>>> >
>>> https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
>>> >
>>> > In addition to unpack_sequence, the regex_dna test has slowed
>>> > down a lot compared to Py3.8.
>>> >
>>> >
>>> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
>>> >
>>> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
>>> >
>>> > Thanks.
>>> >
>>> > On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
>>> > > Hi!
>>> > >
>>> > > I have updated the branch benchmarks in the pyperformance server
>>> and now they
>>> > > include 3.9. There are
>>> > > some benchmarks that are faster but on the other hand some
>>> benchmarks are
>>> > > substantially slower, pointing
>>> > > at a possible performance regression in 3.9 in some aspects. In
>>> particular
>>> > some
>>> > > tests like "unpack sequence" are
>>> > > almost 20% slower. As there are some other tests were 3.9 is
>>> faster, is
>>> > not fair
>>> > > to conclude that 3.9 is slower, but
>>> > > this is something we should look into in my opinion.
>>> > >
>>> > > You can check these benchmarks I am talking about by:
>>> > >
>>> > > * Go here: https://speed.python.org/comparison/
>>> > > * In the left bar, select "lto-pgo latest in branch '3.9'" and
>>> "lto-pgo latest
>>> > > in branch '3.8'"
>>> > > * To better read the plot, I would recommend to select a
>>> "Normalization"
>>> > to the
>>> > > 3.8 branch (this is in the top part of the page)
>>> > >and to check the "horizontal" checkbox.
>>> > >
>>> > > These benchmarks are very stable: I have executed them several
>>> times over the
>>> > > weekend yielding the same results and,
>>> > > more importantly, they are being executed on a server specially
>>> prepared to
>>> > > running reproducible benchmarks: CPU affinity,
>>> > > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is
>>> fixed, CPU
>>> > governor
>>> > > set to performance 

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Chris Jerdonek
On Wed, Oct 14, 2020 at 8:03 AM Pablo Galindo Salgado 
wrote:

> > Would it be possible rerun the tests with the current
> setup for say the last 1000 revisions or perhaps a subset of these
> (e.g. every 10th revision) to try to binary search for the revision which
> introduced the change ?
>
> Every run takes 1-2 h so doing 1000 would be certainly time-consuming :)
>

Would it be possible instead to run git-bisect for only a _particular_
benchmark? It seems that may be all that’s needed to track down particular
regressions. Also, if e.g. git-bisect is used it wouldn’t be every e.g.
10th revision but rather O(log(n)) revisions.

—Chris




That's why from now on I am trying to invest in daily builds for master,
> so we can answer that exact question if we detect regressions in the
> future.
>
>
> On Wed, 14 Oct 2020 at 15:04, M.-A. Lemburg  wrote:
>
>> On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
>> >> Would it be possible to get the data for older runs back, so that
>> > it's easier to find the changes which caused the slowdown ?
>> >
>> > Unfortunately no. The reasons are that that data was misleading because
>> > different points were computed with a different version of
>> pyperformance and
>> > therefore with different packages (and therefore different code). So
>> the points
>> > could not be compared among themselves.
>> >
>> > Also, past data didn't include 3.9 commits because the data gathering
>> was not
>> > automated and it didn't run in a long time :(
>>
>> Make sense.
>>
>> Would it be possible rerun the tests with the current
>> setup for say the last 1000 revisions or perhaps a subset of these
>> (e.g. every 10th revision) to try to binary search for the revision which
>> introduced the change ?
>>
>> > On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  wrote:
>> >
>> > Hi Pablo,
>> >
>> > thanks for pointing this out.
>> >
>> > Would it be possible to get the data for older runs back, so that
>> > it's easier to find the changes which caused the slowdown ?
>> >
>> > Going to the timeline, it seems that the system only has data
>> > for Oct 14 (today):
>> >
>> >
>> https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
>> >
>> > In addition to unpack_sequence, the regex_dna test has slowed
>> > down a lot compared to Py3.8.
>> >
>> >
>> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
>> >
>> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
>> >
>> > Thanks.
>> >
>> > On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
>> > > Hi!
>> > >
>> > > I have updated the branch benchmarks in the pyperformance server
>> and now they
>> > > include 3.9. There are
>> > > some benchmarks that are faster but on the other hand some
>> benchmarks are
>> > > substantially slower, pointing
>> > > at a possible performance regression in 3.9 in some aspects. In
>> particular
>> > some
>> > > tests like "unpack sequence" are
>> > > almost 20% slower. As there are some other tests were 3.9 is
>> faster, is
>> > not fair
>> > > to conclude that 3.9 is slower, but
>> > > this is something we should look into in my opinion.
>> > >
>> > > You can check these benchmarks I am talking about by:
>> > >
>> > > * Go here: https://speed.python.org/comparison/
>> > > * In the left bar, select "lto-pgo latest in branch '3.9'" and
>> "lto-pgo latest
>> > > in branch '3.8'"
>> > > * To better read the plot, I would recommend to select a
>> "Normalization"
>> > to the
>> > > 3.8 branch (this is in the top part of the page)
>> > >and to check the "horizontal" checkbox.
>> > >
>> > > These benchmarks are very stable: I have executed them several
>> times over the
>> > > weekend yielding the same results and,
>> > > more importantly, they are being executed on a server specially
>> prepared to
>> > > running reproducible benchmarks: CPU affinity,
>> > > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is
>> fixed, CPU
>> > governor
>> > > set to performance mode, IRQ affinity is
>> > > disabled for the benchmarking CPU nodes...etc so you can trust
>> these numbers.
>> > >
>> > > I kindly suggest for everyone interested in trying to improve the
>> 3.9 (and
>> > > master) performance, to review these benchmarks
>> > > and try to identify the problems and fix them or to find what
>> changes
>> > introduced
>> > > the regressions in the first place. All benchmarks
>> > > are the ones being executed by the pyperformance suite
>> > > (https://github.com/python/pyperformance) so you can execute them
>> > > locally if you need to.
>> > >
>> > > ---
>> > >
>> > > On a related note, I am also working on the speed.python.org
>> > 
>> > > 

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Brett Cannon
I think if the project is not maintained externally and thus synced into
the stdlib we can drop the attributes.

On Wed, Oct 14, 2020 at 8:44 AM Guido van Rossum  wrote:

> None of these have seen much adoption, so I think we can lose them without
> dire consequences. The info should be moved into a docstring or comment.
>
> On Wed, Oct 14, 2020 at 06:54 Serhiy Storchaka 
> wrote:
>
>> Some module attributes in the stdlib have attribute __version__. It
>> makes sense if the module is developed independently from Python, but
>> after inclusion in the stdlib it no longer have separate releases which
>> should be identified by version. New changes goes into module usually
>> without changing the value of __version__. Different versions of the
>> module for different Python version can have different features but the
>> same __version__.
>>
>> I propose to remove __version__ in all stdlib modules. Are there any
>> exceptions?
>>
>> Also, what do you think about other meta attributes like __author__,
>> __credits__, __email__, __copyright__, __about__, __date__?
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/KBU4EU2JULXSMUZULD5HJJWCGOMN52MK/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> --
> --Guido (mobile)
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/RGHWIBDQJOHRUM726BR2WYZOBO2T5WIY/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NS6YVA3QCAXELK7KROOLULKXOF3KFFJA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tal Einat
On Wed, Oct 14, 2020 at 7:57 PM Tim Peters  wrote:
>
> [Guido]
> > Maybe someone reading this can finish the Wikipedia page on
> > Two-Way Search? The code example trails off with a function with
> > some incomprehensible remarks and then a TODO..
>
> Yes, the Wikipedia page is worse than useless in its current state,
> although some of the references it lists are helpful.  This is a much
> better summary:
>
> http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
>
> but, I believe, still far too telegraphic.
>
> The original paper is by far the best account I've seen so far, with
> complete code and exhaustive explanations and proofs.  Even examples
> ;-)
>
> But here's the thing: I don't believe this algorithm _can_ be
> reduced to an elevator pitch.  Excruciating details appear to be
> fundamental at every step, and I've seen nothing yet anywhere
> approaching an "intuitive explanation" :-(  What happens instead is
> that you run it on the hardest cases you can dream up, and your jaw
> drops in amazement :-)

That sounds very interesting! I'll definitely be digging deeper into
the algorithm, papers and proofs of the underlying "Critical
Factorization" theorem. My goal would be to find a good way to explain
them, ideally some kind of interactive article. Ideally that would
also lead to better Wikipedia pages.

I'd be happy to help with this project. I've done quite a bit of
relevant work while developing my fuzzysearch[1] library.

- Tal Einat

[1] https://pypi.org/project/fuzzysearch/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HJQTATKMX7SOEUGI5XBVA6JQZKDFCBEH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido]
> Maybe someone reading this can finish the Wikipedia page on
> Two-Way Search? The code example trails off with a function with
> some incomprehensible remarks and then a TODO..

Yes, the Wikipedia page is worse than useless in its current state,
although some of the references it lists are helpful.  This is a much
better summary:

http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260

but, I believe, still far too telegraphic.

The original paper is by far the best account I've seen so far, with
complete code and exhaustive explanations and proofs.  Even examples
;-)

But here's the thing: I don't believe this algorithm _can_ be
reduced to an elevator pitch.  Excruciating details appear to be
fundamental at every step, and I've seen nothing yet anywhere
approaching an "intuitive explanation" :-(  What happens instead is
that you run it on the hardest cases you can dream up, and your jaw
drops in amazement :-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B2MPSYXRKFA3V3GP45GAFSIKKCK5NHJ3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Guido van Rossum
Maybe someone reading this can finish the Wikipedia page on Two-Way Search?
The code example trails off with a function with some incomprehensible
remarks and then a TODO...

On Wed, Oct 14, 2020 at 9:07 AM Tim Peters  wrote:

> Rest assured that Dennis is aware of that pragmatics may change for
> shorter needles.
>
> The code has always made a special-case of 1-character needles,
> because it's impossible "even in theory" to improve over
> straightforward brute force search then.
>
> Say the length of the text to search is `t`, and the length of the
> pattern `p`. Brute force and the current code have worst case O(t*p)
> behavior. The new code, worst case O(t+p). If that's as deep as you
> dig into it, it seems all but obvious then that O(t*p) just can't be
> that bad when p is small, so keep it simple.
>
> But there's another twist:  the current and new code both have O(t/p)
> cases too (but brute force, and even Knuth-Morris-Pratt, don't). That
> can be highly beneficial even for p as small as 3.
>
> Unfortunately, the exact cases in which the current and new code enjoy
> O(t/p) behavior aren't the same.
>
> Lying a bit:  In general the current code has just two tricks. One of
> those tricks is useless (pure waste of time) if the pattern happens to
> end with a repeated character pair, and is most effective if the last
> character appears nowhere else in the pattern. The other trick is most
> effective if the set of characters in the text has no intersection
> with the set of characters in the pattern (note that the search is
> certain to fail then), but useless if the set of text characters is a
> subset of the set of pattern characters (as, e.g., it very often is in
> real-life searches in apps with a small alphabet, like [ACGT]+ for
> genomes).
>
> But I don't know how to characterize when the new code gets maximum
> benefit. It's not based on intuitive tricks.  The paper that
> introduced it[1] says it's based on "a deep theorem on words
> known as the Critical Factorization Theorem due to Cesari, Duval,
> Vincent, and Lothaire", and I still don't fully understand why it
> works.
>
> It's clear enough, though, that the new code gets real systematic
> value out of patterns with repetitions (like "abab"), where the
> current code gets real value from those only by accident (e.g., "a"
> and "b" happen to appear rarely in the text being searched, in which
> case that the pattern has repetitions is irrelevant).
>
> But, as I said in the first message, the PR is "preliminary". There
> are still worlds of tweaks that have been thought of but not yet tried
> even once.
>
> [1] search for "Two-Way String Matching" by Crochemore and Perrin.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/G53VXXYWWEM26S2XKVX5W6P54R47YQTG/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NHG5YVVVZBQSEZBSCDVLETDTFSHBMKBV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 17:59, Antoine Pitrou wrote:
> 
> Le 14/10/2020 à 17:25, M.-A. Lemburg a écrit :
>>
>> Well, there's a trend here:
>>
>> [...]
>>
>> Those two benchmarks were somewhat faster in Py3.7 and got slower in 3.8
>> and then again in 3.9, so this is more than just an artifact.
> 
> unpack-sequence is a micro-benchmark.  It's useful if you want to
> investigate the cause of a regression witnessed elsewhere (or if you're
> changing things in precisely that part of the interpreter), but it's not
> relevant in itself to measure Python performance.

Since unpacking is done a lot in Python applications, this particular
micro benchmark does have an effect on overall performance and there
was some recent discussion about exactly this part of the code slowing
down (even though the effects were related to macOS only AFAIR).
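
Roughly, this is the kind of operation the unpack_sequence benchmark
times (a sketch, not the pyperformance harness itself):

    import timeit

    setup = "seq = tuple(range(10))"
    stmt = "a, b, c, d, e, f, g, h, i, j = seq"
    print(timeit.timeit(stmt, setup=setup, number=1_000_000))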

As with most micro benchmarks, you typically don't see the effect
of one slowdown or speedup in applications. Only if several such
changes come together, you notice a change.

That said, it's still good practice to keep an eye on such performance
regressions and also to improve upon micro benchmarks.

The latter was my main motivation for writing pybench back in 1997,
which focuses on such micro benchmarks, rather than higher level
benchmarks, where it's much harder to find out why performance
changed.

> regex-dna is a "mini"-benchmark. I suppose someone could look if there
> were any potentially relevant changes done in the regex engine, that
> would explain the changes.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 14 2020)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/S5RDUVCUER2GIAOBNTLOZ7QUNCRDDIWJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
Rest assured that Dennis is aware of that pragmatics may change for
shorter needles.

The code has always made a special-case of 1-character needles,
because it's impossible "even in theory" to improve over
straightforward brute force search then.

Say the length of the text to search is `t`, and the length of the
pattern `p`. Brute force and the current code have worst case O(t*p)
behavior. The new code, worst case O(t+p). If that's as deep as you
dig into it, it seems all but obvious then that O(t*p) just can't be
that bad when p is small, so keep it simple.

But there's another twist:  the current and new code both have O(t/p)
cases too (but brute force, and even Knuth-Morris-Pratt, don't). That
can be highly beneficial even for p as small as 3.

Unfortunately, the exact cases in which the current and new code enjoy
O(t/p) behavior aren't the same.

Lying a bit:  In general the current code has just two tricks. One of
those tricks is useless (pure waste of time) if the pattern happens to
end with a repeated character pair, and is most effective if the last
character appears nowhere else in the pattern. The other trick is most
effective if the set of characters in the text has no intersection
with the set of characters in the pattern (note that the search is
certain to fail then), but useless if the set of text characters is a
subset of the set of pattern characters (as, e.g., it very often is in
real-life searches in apps with a small alphabet, like [ACGT]+ for
genomes).
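
Here's a small way to build such a case (my own construction, not from
the bug report): every character in the text also occurs in the needle,
so a skip based on "this text character isn't in the pattern" never
fires.

    import random

    # DNA-like text whose character set is a subset of the needle's
    # characters, so that second trick buys nothing.
    random.seed(0)
    text = "".join(random.choice("ACGT") for _ in range(1_000_000))
    needle = "ACGT" * 8          # arbitrary pattern over the same alphabet
    print(needle in text)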

But I don't know how to characterize when the new code gets maximum
benefit. It's not based on intuitive tricks.  The paper that
introduced it[1] says it's based on "a deep theorem on words
known as the Critical Factorization Theorem due to Cesari, Duval,
Vincent, and Lothaire", and I still don't fully understand why it
works.

It's clear enough, though, that the new code gets real systematic
value out of patterns with repetitions (like "abab"), where the
current code gets real value from those only by accident (e.g., "a"
and "b" happen to appear rarely in the text being searched, in which
case the fact that the pattern has repetitions is irrelevant).
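
A concrete (made-up) instance of that kind of "abab"-style input:

    # A periodic needle over the same two characters as the text; the
    # needle's internal repetitions are the kind of structure the new
    # code exploits systematically rather than by accident.
    text = "ab" * 2_000_000
    needle = "ab" * 1_000 + "ba"   # never matches, but is full of near-matches
    print(needle in text)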

But, as I said in the first message, the PR is "preliminary". There
are still worlds of tweaks that have been thought of but not yet tried
even once.

[1] search for "Two-Way String Matching" by Crochemore and Perrin.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G53VXXYWWEM26S2XKVX5W6P54R47YQTG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Antoine Pitrou

On 14/10/2020 at 17:25, M.-A. Lemburg wrote:
> 
> Well, there's a trend here:
> 
> [...]
> 
> Those two benchmarks were somewhat faster in Py3.7 and got slower in 3.8
> and then again in 3.9, so this is more than just an artifact.

unpack-sequence is a micro-benchmark.  It's useful if you want to
investigate the cause of a regression witnessed elsewhere (or if you're
changing things in precisely that part of the interpreter), but it's not
relevant in itself to measure Python performance.

regex-dna is a "mini"-benchmark. I suppose someone could look if there
were any potentially relevant changes done in the regex engine, that
would explain the changes.
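
For anyone who hasn't looked at it, the benchmark does regex-dna-style
work along these lines (a loose sketch, not the actual bm_regex_dna
code), so its runtime is dominated by the re engine:

    import re

    # Count occurrences of a few variant patterns in a DNA-like string.
    seq = "acggtaaatttaccctggtacgat" * 40_000
    for pattern in ("agggtaaa|tttaccct", "a[act]ggtaaa|tttacc[agt]t"):
        print(pattern, len(re.findall(pattern, seq)))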

Regards

Antoine.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/52RWCMURIETH5IYWXCDTO7PKC5CGFNH6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Guido van Rossum
None of these have seen much adoption, so I think we can lose them without
dire consequences. The info should be moved into a docstring or comment.

On Wed, Oct 14, 2020 at 06:54 Serhiy Storchaka  wrote:

> Some module attributes in the stdlib have attribute __version__. It
> makes sense if the module is developed independently from Python, but
> after inclusion in the stdlib it no longer have separate releases which
> should be identified by version. New changes goes into module usually
> without changing the value of __version__. Different versions of the
> module for different Python version can have different features but the
> same __version__.
>
> I propose to remove __version__ in all stdlib modules. Are there any
> exceptions?
>
> Also, what do you think about other meta attributes like __author__,
> __credits__, __email__, __copyright__, __about__, __date__?
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/KBU4EU2JULXSMUZULD5HJJWCGOMN52MK/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
--Guido (mobile)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RGHWIBDQJOHRUM726BR2WYZOBO2T5WIY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Victor Stinner
Hi,

I was always confused by the __version__ variable of *some* modules.
It's surprising, since it's no longer incremented when the module is
fixed or gets new features. Also, the number is unrelated to the
Python version. I suggest removing __version__.

__author__, __credits__, __email__, __copyright__: can this
information be kept as *comments*?

__date__: what is that? Is it still relevant in 2020, since Python uses
Git? I suggest removing it.

Some modules even have a changelog. Should it be removed, since it
hasn't been updated in years?

Many of these variables are ghosts from the early CVS time of Python :-)

Victor

On Wed, Oct 14, 2020 at 15:57, Serhiy Storchaka wrote:
>
> Some module attributes in the stdlib have attribute __version__. It
> makes sense if the module is developed independently from Python, but
> after inclusion in the stdlib it no longer have separate releases which
> should be identified by version. New changes goes into module usually
> without changing the value of __version__. Different versions of the
> module for different Python version can have different features but the
> same __version__.
>
> I propose to remove __version__ in all stdlib modules. Are there any
> exceptions?
>
> Also, what do you think about other meta attributes like __author__,
> __credits__, __email__, __copyright__, __about__, __date__?
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/KBU4EU2JULXSMUZULD5HJJWCGOMN52MK/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HH726TXVDUF4GSTUJALKU3WT5LIQNOAZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Victor Stinner
I suggest limiting it to one "dot" per week, since CodeSpeed (the website
used to browse the benchmark results) is more or less limited to 50 dots
(it can display more if you only display a single benchmark).

Previously, it was closer to one "dot" per month, which allowed
displaying a timeline over 5 years. In my experience, significant
performance changes are rare and only happen once every 3 months, so a
granularity of one day is not needed.

We may consider using the tool "asv", which has a nice web UI to
browse results. It also provides a tool to automatically run a bisection
to identify which commit introduced a speedup or slowdown.

Last time I checked, asv had a simpler way to run benchmarks than
pyperf. It doesn't spawn multiple processes, for example. I don't know
if it would be possible to plug pyperf into asv.
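
For comparison, the pyperf side of the same kind of micro-benchmark
looks roughly like this (a minimal sketch; running the script spawns
several worker processes and calibrates the number of loops itself):

    import pyperf

    # Minimal pyperf micro-benchmark; prints mean +- standard deviation.
    runner = pyperf.Runner()
    runner.timeit(name="unpack", stmt="a, b, c = t", setup="t = (1, 2, 3)")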

Victor

On Wed, Oct 14, 2020 at 17:03, Pablo Galindo Salgado wrote:
>
> > Would it be possible rerun the tests with the current
> setup for say the last 1000 revisions or perhaps a subset of these
> (e.g. every 10th revision) to try to binary search for the revision which
> introduced the change ?
>
> Every run takes 1-2 h so doing 1000 would be certainly time-consuming :)
>
> That's why from now on I am trying to invest in daily builds for master,
> so we can answer that exact question if we detect regressions in the future.
>
>
> On Wed, 14 Oct 2020 at 15:04, M.-A. Lemburg  wrote:
>>
>> On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
>> >> Would it be possible to get the data for older runs back, so that
>> > it's easier to find the changes which caused the slowdown ?
>> >
>> > Unfortunately no. The reasons are that that data was misleading because
>> > different points were computed with a different version of pyperformance 
>> > and
>> > therefore with different packages (and therefore different code). So the 
>> > points
>> > could not be compared among themselves.
>> >
>> > Also, past data didn't include 3.9 commits because the data gathering was 
>> > not
>> > automated and it didn't run in a long time :(
>>
>> Make sense.
>>
>> Would it be possible rerun the tests with the current
>> setup for say the last 1000 revisions or perhaps a subset of these
>> (e.g. every 10th revision) to try to binary search for the revision which
>> introduced the change ?
>>
>> > On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg > > > wrote:
>> >
>> > Hi Pablo,
>> >
>> > thanks for pointing this out.
>> >
>> > Would it be possible to get the data for older runs back, so that
>> > it's easier to find the changes which caused the slowdown ?
>> >
>> > Going to the timeline, it seems that the system only has data
>> > for Oct 14 (today):
>> >
>> > 
>> > https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
>> >
>> > In addition to unpack_sequence, the regex_dna test has slowed
>> > down a lot compared to Py3.8.
>> >
>> > 
>> > https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
>> > 
>> > https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
>> >
>> > Thanks.
>> >
>> > On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
>> > > Hi!
>> > >
>> > > I have updated the branch benchmarks in the pyperformance server and 
>> > now they
>> > > include 3.9. There are
>> > > some benchmarks that are faster but on the other hand some 
>> > benchmarks are
>> > > substantially slower, pointing
>> > > at a possible performance regression in 3.9 in some aspects. In 
>> > particular
>> > some
>> > > tests like "unpack sequence" are
>> > > almost 20% slower. As there are some other tests were 3.9 is faster, 
>> > is
>> > not fair
>> > > to conclude that 3.9 is slower, but
>> > > this is something we should look into in my opinion.
>> > >
>> > > You can check these benchmarks I am talking about by:
>> > >
>> > > * Go here: https://speed.python.org/comparison/
>> > > * In the left bar, select "lto-pgo latest in branch '3.9'" and 
>> > "lto-pgo latest
>> > > in branch '3.8'"
>> > > * To better read the plot, I would recommend to select a 
>> > "Normalization"
>> > to the
>> > > 3.8 branch (this is in the top part of the page)
>> > >and to check the "horizontal" checkbox.
>> > >
>> > > These benchmarks are very stable: I have executed them several times 
>> > over the
>> > > weekend yielding the same results and,
>> > > more importantly, they are being executed on a server specially 
>> > prepared to
>> > > running reproducible benchmarks: CPU affinity,
>> > > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, 
>> > CPU
>> > governor
>> > > set to performance mode, IRQ affinity is
>> > > disabled for the benchmarking CPU nodes...etc so you can trust these 
>> > numbers.
>> > >
>> > > I kindly suggest for everyone 

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 16:14, Antoine Pitrou wrote:
> On 14/10/2020 at 15:16, Pablo Galindo Salgado wrote:
>> Hi!
>>
>> I have updated the branch benchmarks in the pyperformance server and now
>> they include 3.9. There are
>> some benchmarks that are faster but on the other hand some benchmarks
>> are substantially slower, pointing
>> at a possible performance regression in 3.9 in some aspects. In
>> particular some tests like "unpack sequence" are
>> almost 20% slower. As there are some other tests were 3.9 is faster, is
>> not fair to conclude that 3.9 is slower, but
>> this is something we should look into in my opinion.
>>
>> You can check these benchmarks I am talking about by:
>>
>> * Go here: https://speed.python.org/comparison/
>> * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
>> latest in branch '3.8'"
>> * To better read the plot, I would recommend to select a "Normalization"
>> to the 3.8 branch (this is in the top part of the page)
>>    and to check the "horizontal" checkbox.
> Those numbers tell me that it's a wash.  I wouldn't worry about a small
> regression on a micro- or mini-benchmark while the overall picture is
> stable.

Well, there's a trend here:



Those two benchmarks were somewhat faster in Py3.7 and got slower in 3.8 and
then again in 3.9, so this is more than just an artifact.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 14 2020)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XICRZGNDU6TOG3VNFHKTOCVBQO3ZEZ5L/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> I wouldn't worry about a small regression on a micro- or mini-benchmark
> while the overall picture is stable.

Absolutely, I agree it's not something to *worry* about, but I think it
makes sense to investigate, as
the possible fix may be trivial. Part of the reason I wanted to recompute
them was that
the micro-benchmarks published in the What's New of 3.9 were confusing a
lot of users who
were wondering whether 3.9 was slower.

On Wed, 14 Oct 2020 at 15:14, Antoine Pitrou  wrote:

>
> On 14/10/2020 at 15:16, Pablo Galindo Salgado wrote:
> > Hi!
> >
> > I have updated the branch benchmarks in the pyperformance server and now
> > they include 3.9. There are
> > some benchmarks that are faster but on the other hand some benchmarks
> > are substantially slower, pointing
> > at a possible performance regression in 3.9 in some aspects. In
> > particular some tests like "unpack sequence" are
> > almost 20% slower. As there are some other tests were 3.9 is faster, is
> > not fair to conclude that 3.9 is slower, but
> > this is something we should look into in my opinion.
> >
> > You can check these benchmarks I am talking about by:
> >
> > * Go here: https://speed.python.org/comparison/
> > * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
> > latest in branch '3.8'"
> > * To better read the plot, I would recommend to select a "Normalization"
> > to the 3.8 branch (this is in the top part of the page)
> >and to check the "horizontal" checkbox.
>
> Those numbers tell me that it's a wash.  I wouldn't worry about a small
> regression on a micro- or mini-benchmark while the overall picture is
> stable.
>
> Regards
>
> Antoine.
> ___
> python-committers mailing list -- python-committ...@python.org
> To unsubscribe send an email to python-committers-le...@python.org
> https://mail.python.org/mailman3/lists/python-committers.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-committ...@python.org/message/WMNBN4LI5W7U5HKPJWQOHGZXK4X3IRHV/
> Code of Conduct: https://www.python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BZVCEUK42OZEN733LZB6OYXDV22GGXLL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible rerun the tests with the current
> setup for say the last 1000 revisions or perhaps a subset of these
> (e.g. every 10th revision) to try to binary search for the revision which
> introduced the change ?

Every run takes 1-2 h, so doing 1000 would certainly be time-consuming :)

That's why from now on I am trying to invest in daily builds for master,
so we can answer that exact question if we detect regressions in the future.


On Wed, 14 Oct 2020 at 15:04, M.-A. Lemburg  wrote:

> On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
> >> Would it be possible to get the data for older runs back, so that
> > it's easier to find the changes which caused the slowdown ?
> >
> > Unfortunately no. The reasons are that that data was misleading because
> > different points were computed with a different version of pyperformance
> and
> > therefore with different packages (and therefore different code). So the
> points
> > could not be compared among themselves.
> >
> > Also, past data didn't include 3.9 commits because the data gathering
> was not
> > automated and it didn't run in a long time :(
>
> Make sense.
>
> Would it be possible rerun the tests with the current
> setup for say the last 1000 revisions or perhaps a subset of these
> (e.g. every 10th revision) to try to binary search for the revision which
> introduced the change ?
>
> > On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  > > wrote:
> >
> > Hi Pablo,
> >
> > thanks for pointing this out.
> >
> > Would it be possible to get the data for older runs back, so that
> > it's easier to find the changes which caused the slowdown ?
> >
> > Going to the timeline, it seems that the system only has data
> > for Oct 14 (today):
> >
> >
> https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
> >
> > In addition to unpack_sequence, the regex_dna test has slowed
> > down a lot compared to Py3.8.
> >
> >
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
> >
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
> >
> > Thanks.
> >
> > On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
> > > Hi!
> > >
> > > I have updated the branch benchmarks in the pyperformance server
> and now they
> > > include 3.9. There are
> > > some benchmarks that are faster but on the other hand some
> benchmarks are
> > > substantially slower, pointing
> > > at a possible performance regression in 3.9 in some aspects. In
> particular
> > some
> > > tests like "unpack sequence" are
> > > almost 20% slower. As there are some other tests were 3.9 is
> faster, is
> > not fair
> > > to conclude that 3.9 is slower, but
> > > this is something we should look into in my opinion.
> > >
> > > You can check these benchmarks I am talking about by:
> > >
> > > * Go here: https://speed.python.org/comparison/
> > > * In the left bar, select "lto-pgo latest in branch '3.9'" and
> "lto-pgo latest
> > > in branch '3.8'"
> > > * To better read the plot, I would recommend to select a
> "Normalization"
> > to the
> > > 3.8 branch (this is in the top part of the page)
> > >and to check the "horizontal" checkbox.
> > >
> > > These benchmarks are very stable: I have executed them several
> times over the
> > > weekend yielding the same results and,
> > > more importantly, they are being executed on a server specially
> prepared to
> > > running reproducible benchmarks: CPU affinity,
> > > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed,
> CPU
> > governor
> > > set to performance mode, IRQ affinity is
> > > disabled for the benchmarking CPU nodes...etc so you can trust
> these numbers.
> > >
> > > I kindly suggest for everyone interested in trying to improve the
> 3.9 (and
> > > master) performance, to review these benchmarks
> > > and try to identify the problems and fix them or to find what
> changes
> > introduced
> > > the regressions in the first place. All benchmarks
> > > are the ones being executed by the pyperformance suite
> > > (https://github.com/python/pyperformance) so you can execute them
> > > locally if you need to.
> > >
> > > ---
> > >
> > > On a related note, I am also working on the speed.python.org
> > 
> > >  server to provide more automation and
> > > ideally some integrations with GitHub to detect performance
> regressions. For
> > > now, I have done the following:
> > >
> > > * Recompute benchmarks for all branches using the same version of
> > > pyperformance (except master) so they can
> > >be compared with each other. This can only be seen in the
> "Comparison"
> > > tab: https://speed.python.org/comparison/
> 

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Antoine Pitrou

On 14/10/2020 at 15:16, Pablo Galindo Salgado wrote:
> Hi!
> 
> I have updated the branch benchmarks in the pyperformance server and now
> they include 3.9. There are
> some benchmarks that are faster but on the other hand some benchmarks
> are substantially slower, pointing
> at a possible performance regression in 3.9 in some aspects. In
> particular some tests like "unpack sequence" are
> almost 20% slower. As there are some other tests were 3.9 is faster, is
> not fair to conclude that 3.9 is slower, but
> this is something we should look into in my opinion.
> 
> You can check these benchmarks I am talking about by:
> 
> * Go here: https://speed.python.org/comparison/
> * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
> latest in branch '3.8'"
> * To better read the plot, I would recommend to select a "Normalization"
> to the 3.8 branch (this is in the top part of the page)
>    and to check the "horizontal" checkbox.

Those numbers tell me that it's a wash.  I wouldn't worry about a small
regression on a micro- or mini-benchmark while the overall picture is
stable.

Regards

Antoine.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WMNBN4LI5W7U5HKPJWQOHGZXK4X3IRHV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 16:00, Pablo Galindo Salgado wrote:
>> Would it be possible to get the data for older runs back, so that
>> it's easier to find the changes which caused the slowdown ?
> 
> Unfortunately no. The reasons are that that data was misleading because
> different points were computed with a different version of pyperformance and
> therefore with different packages (and therefore different code). So the 
> points
> could not be compared among themselves.
> 
> Also, past data didn't include 3.9 commits because the data gathering was not
> automated and it didn't run in a long time :(

Makes sense.

Would it be possible to rerun the tests with the current
setup for, say, the last 1000 revisions, or perhaps a subset of these
(e.g. every 10th revision), to try to binary search for the revision which
introduced the change?

> On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  > wrote:
> 
> Hi Pablo,
> 
> thanks for pointing this out.
> 
> Would it be possible to get the data for older runs back, so that
> it's easier to find the changes which caused the slowdown ?
> 
> Going to the timeline, it seems that the system only has data
> for Oct 14 (today):
> 
> 
> https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
> 
> In addition to unpack_sequence, the regex_dna test has slowed
> down a lot compared to Py3.8.
> 
> 
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
> 
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
> 
> Thanks.
> 
> On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
> > Hi!
> >
> > I have updated the branch benchmarks in the pyperformance server and 
> now they
> > include 3.9. There are
> > some benchmarks that are faster but on the other hand some benchmarks 
> are
> > substantially slower, pointing
> > at a possible performance regression in 3.9 in some aspects. In 
> particular
> some
> > tests like "unpack sequence" are
> > almost 20% slower. As there are some other tests were 3.9 is faster, is
> not fair
> > to conclude that 3.9 is slower, but
> > this is something we should look into in my opinion.
> >
> > You can check these benchmarks I am talking about by:
> >
> > * Go here: https://speed.python.org/comparison/
> > * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo 
> latest
> > in branch '3.8'"
> > * To better read the plot, I would recommend to select a "Normalization"
> to the
> > 3.8 branch (this is in the top part of the page)
> >    and to check the "horizontal" checkbox.
> >
> > These benchmarks are very stable: I have executed them several times 
> over the
> > weekend yielding the same results and,
> > more importantly, they are being executed on a server specially 
> prepared to
> > running reproducible benchmarks: CPU affinity,
> > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU
> governor
> > set to performance mode, IRQ affinity is
> > disabled for the benchmarking CPU nodes...etc so you can trust these 
> numbers.
> >
> > I kindly suggest for everyone interested in trying to improve the 3.9 
> (and
> > master) performance, to review these benchmarks
> > and try to identify the problems and fix them or to find what changes
> introduced
> > the regressions in the first place. All benchmarks
> > are the ones being executed by the pyperformance suite
> > (https://github.com/python/pyperformance) so you can execute them
> > locally if you need to.
> >
> > ---
> >
> > On a related note, I am also working on the speed.python.org
> 
> >  server to provide more automation and
> > ideally some integrations with GitHub to detect performance 
> regressions. For
> > now, I have done the following:
> >
> > * Recompute benchmarks for all branches using the same version of
> > pyperformance (except master) so they can
> >    be compared with each other. This can only be seen in the 
> "Comparison"
> > tab: https://speed.python.org/comparison/
> > * I am setting daily builds of the master branch so we can detect 
> performance
> > regressions with daily granularity. These
> >    daily builds will be located in the "Changes" and "Timeline" tabs
> > (https://speed.python.org/timeline/).
> > * Once the daily builds are working as expected, I plan to work on 
> trying to
> > automatically comment or PRs or on bpo if
> > we detect that a commit has introduced some notable performance 
> regression.
> >
> > Regards from sunny London,
> > Pablo Galindo Salgado.
> >
> > ___
> > python-committers mailing list -- 

[Python-Dev] Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Serhiy Storchaka
Some modules in the stdlib have an attribute __version__. It
makes sense if the module is developed independently from Python, but
after inclusion in the stdlib it no longer has separate releases which
would need to be identified by a version. New changes usually go into
the module without changing the value of __version__. Different versions
of the module for different Python versions can have different features
but the same __version__.

I propose to remove __version__ in all stdlib modules. Are there any
exceptions?

Also, what do you think about other meta attributes like __author__,
__credits__, __email__, __copyright__, __about__, __date__?
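
For illustration, a quick way to see the situation (the module names
here are just a few examples I spot-checked, not an exhaustive survey):

    import importlib

    # A handful of stdlib modules that still expose a __version__ even
    # though it no longer corresponds to separate releases.
    for name in ("argparse", "csv", "decimal", "re"):
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "<no __version__>"))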
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KBU4EU2JULXSMUZULD5HJJWCGOMN52MK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
Hi Pablo,

thanks for pointing this out.

Would it be possible to get the data for older runs back, so that
it's easier to find the changes which caused the slowdown ?

Going to the timeline, it seems that the system only has data
for Oct 14 (today):

https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none

In addition to unpack_sequence, the regex_dna test has slowed
down a lot compared to Py3.8.

https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py

Thanks.

On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
> Hi!
> 
> I have updated the branch benchmarks in the pyperformance server and now they
> include 3.9. There are
> some benchmarks that are faster but on the other hand some benchmarks are
> substantially slower, pointing
> at a possible performance regression in 3.9 in some aspects. In particular 
> some
> tests like "unpack sequence" are
> almost 20% slower. As there are some other tests were 3.9 is faster, is not 
> fair
> to conclude that 3.9 is slower, but
> this is something we should look into in my opinion.
> 
> You can check these benchmarks I am talking about by:
> 
> * Go here: https://speed.python.org/comparison/
> * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo latest
> in branch '3.8'"
> * To better read the plot, I would recommend to select a "Normalization" to 
> the
> 3.8 branch (this is in the top part of the page)
>    and to check the "horizontal" checkbox.
> 
> These benchmarks are very stable: I have executed them several times over the
> weekend yielding the same results and,
> more importantly, they are being executed on a server specially prepared to
> running reproducible benchmarks: CPU affinity,
> CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU 
> governor
> set to performance mode, IRQ affinity is
> disabled for the benchmarking CPU nodes...etc so you can trust these numbers.
> 
> I kindly suggest for everyone interested in trying to improve the 3.9 (and
> master) performance, to review these benchmarks
> and try to identify the problems and fix them or to find what changes 
> introduced
> the regressions in the first place. All benchmarks
> are the ones being executed by the pyperformance suite
> (https://github.com/python/pyperformance) so you can execute them
> locally if you need to.
> 
> ---
> 
> On a related note, I am also working on the speed.python.org
>  server to provide more automation and
> ideally some integrations with GitHub to detect performance regressions. For
> now, I have done the following:
> 
> * Recompute benchmarks for all branches using the same version of
> pyperformance (except master) so they can
>    be compared with each other. This can only be seen in the "Comparison"
> tab: https://speed.python.org/comparison/
> * I am setting daily builds of the master branch so we can detect performance
> regressions with daily granularity. These
>    daily builds will be located in the "Changes" and "Timeline" tabs
> (https://speed.python.org/timeline/).
> * Once the daily builds are working as expected, I plan to work on trying to
> automatically comment or PRs or on bpo if
> we detect that a commit has introduced some notable performance regression.
> 
> Regards from sunny London,
> Pablo Galindo Salgado.
> 
> ___
> python-committers mailing list -- python-committ...@python.org
> To unsubscribe send an email to python-committers-le...@python.org
> https://mail.python.org/mailman3/lists/python-committers.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-committ...@python.org/message/G3LB4BCAY7T7WG22YQJNQ64XA4BXBCT4/
> Code of Conduct: https://www.python.org/psf/codeofconduct/
> 

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 14 2020)
>>> Python Projects, Coaching and Support ...https://www.egenix.com/
>>> Python Product Development ...https://consulting.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   https://www.egenix.com/company/contact/
 https://www.malemburg.com/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/P7Y2HFD62P6FYLNJFE6BZ7XHQ2NEN6J5/
Code of Conduct: 

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible to get the data for older runs back, so that
> it's easier to find the changes which caused the slowdown ?

Unfortunately no. The reason is that that data was misleading, because
different points were computed with different versions of pyperformance
and therefore with different packages (and therefore different code). So
the points could not be compared among themselves.

Also, past data didn't include 3.9 commits because the data gathering was
not automated and it hadn't run in a long time :(


On Wed, 14 Oct 2020 at 14:57, M.-A. Lemburg  wrote:

> Hi Pablo,
>
> thanks for pointing this out.
>
> Would it be possible to get the data for older runs back, so that
> it's easier to find the changes which caused the slowdown ?
>
> Going to the timeline, it seems that the system only has data
> for Oct 14 (today):
>
>
> https://speed.python.org/timeline/#/?exe=12=regex_dna=1=1000=off=on=on=none
>
> In addition to unpack_sequence, the regex_dna test has slowed
> down a lot compared to Py3.8.
>
>
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_unpack_sequence.py
>
> https://github.com/python/pyperformance/blob/master/pyperformance/benchmarks/bm_regex_dna.py
>
> Thanks.
>
> On 14.10.2020 15:16, Pablo Galindo Salgado wrote:
> > Hi!
> >
> > I have updated the branch benchmarks in the pyperformance server and now
> they
> > include 3.9. There are
> > some benchmarks that are faster but on the other hand some benchmarks are
> > substantially slower, pointing
> > at a possible performance regression in 3.9 in some aspects. In
> particular some
> > tests like "unpack sequence" are
> > almost 20% slower. As there are some other tests were 3.9 is faster, is
> not fair
> > to conclude that 3.9 is slower, but
> > this is something we should look into in my opinion.
> >
> > You can check these benchmarks I am talking about by:
> >
> > * Go here: https://speed.python.org/comparison/
> > * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
> latest
> > in branch '3.8'"
> > * To better read the plot, I would recommend to select a "Normalization"
> to the
> > 3.8 branch (this is in the top part of the page)
> >and to check the "horizontal" checkbox.
> >
> > These benchmarks are very stable: I have executed them several times
> over the
> > weekend yielding the same results and,
> > more importantly, they are being executed on a server specially prepared
> to
> > running reproducible benchmarks: CPU affinity,
> > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU
> governor
> > set to performance mode, IRQ affinity is
> > disabled for the benchmarking CPU nodes...etc so you can trust these
> numbers.
> >
> > I kindly suggest for everyone interested in trying to improve the 3.9
> (and
> > master) performance, to review these benchmarks
> > and try to identify the problems and fix them or to find what changes
> introduced
> > the regressions in the first place. All benchmarks
> > are the ones being executed by the pyperformance suite
> > (https://github.com/python/pyperformance) so you can execute them
> > locally if you need to.
> >
> > ---
> >
> > On a related note, I am also working on the speed.python.org
> >  server to provide more automation and
> > ideally some integrations with GitHub to detect performance regressions.
> For
> > now, I have done the following:
> >
> > * Recompute benchmarks for all branches using the same version of
> > pyperformance (except master) so they can
> >be compared with each other. This can only be seen in the "Comparison"
> > tab: https://speed.python.org/comparison/
> > * I am setting daily builds of the master branch so we can detect
> performance
> > regressions with daily granularity. These
> >daily builds will be located in the "Changes" and "Timeline" tabs
> > (https://speed.python.org/timeline/).
> > * Once the daily builds are working as expected, I plan to work on
> trying to
> > automatically comment or PRs or on bpo if
> > we detect that a commit has introduced some notable performance
> regression.
> >
> > Regards from sunny London,
> > Pablo Galindo Salgado.
> >
> > ___
> > python-committers mailing list -- python-committ...@python.org
> > To unsubscribe send an email to python-committers-le...@python.org
> > https://mail.python.org/mailman3/lists/python-committers.python.org/
> > Message archived at
> https://mail.python.org/archives/list/python-committ...@python.org/message/G3LB4BCAY7T7WG22YQJNQ64XA4BXBCT4/
> > Code of Conduct: https://www.python.org/psf/codeofconduct/
> >
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Experts (#1, Oct 14 2020)
> >>> Python Projects, Coaching and Support ...https://www.egenix.com/
> >>> Python Product Development ...https://consulting.egenix.com/
> 

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> The performance figures in the Python 3.9 "What's New"

Those are also micro-benchmarks, which can have no effect at all on
macro-benchmarks. The ones I am
linking are almost all macro-benchmarks, so, unfortunately, the ones
in the Python 3.9 "What's New" are
not lying, and they seem to be correlated with the same issue.

Also, although they are not incorrect, those benchmarks in the Python 3.9
"What's New" were not executed with LTO/PGO/CPU isolation, etc., so I would
kindly suggest taking the ones on speed.python.org as the canonical
ones if they start
to differ in any way.

Pablo

On Wed, 14 Oct 2020 at 14:25, Paul Moore  wrote:

> The performance figures in the Python 3.9 "What's New" (here -
> https://docs.python.org/3/whatsnew/3.9.html#optimizations) did look
> oddly like a lot of things went slower, to me. I assumed I'd misread
> the figures, and moved on, but maybe I was wrong to do so...
>
> Paul
>
> On Wed, 14 Oct 2020 at 14:17, Pablo Galindo Salgado 
> wrote:
> >
> > Hi!
> >
> > I have updated the branch benchmarks in the pyperformance server and now
> they include 3.9. There are
> > some benchmarks that are faster but on the other hand some benchmarks
> are substantially slower, pointing
> > at a possible performance regression in 3.9 in some aspects. In
> particular some tests like "unpack sequence" are
> > almost 20% slower. As there are some other tests were 3.9 is faster, is
> not fair to conclude that 3.9 is slower, but
> > this is something we should look into in my opinion.
> >
> > You can check these benchmarks I am talking about by:
> >
> > * Go here: https://speed.python.org/comparison/
> > * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
> latest in branch '3.8'"
> > * To better read the plot, I would recommend to select a "Normalization"
> to the 3.8 branch (this is in the top part of the page)
> >and to check the "horizontal" checkbox.
> >
> > These benchmarks are very stable: I have executed them several times
> over the weekend yielding the same results and,
> > more importantly, they are being executed on a server specially prepared
> to running reproducible benchmarks: CPU affinity,
> > CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU
> governor set to performance mode, IRQ affinity is
> > disabled for the benchmarking CPU nodes...etc so you can trust these
> numbers.
> >
> > I kindly suggest for everyone interested in trying to improve the 3.9
> (and master) performance, to review these benchmarks
> > and try to identify the problems and fix them or to find what changes
> introduced the regressions in the first place. All benchmarks
> > are the ones being executed by the pyperformance suite (
> https://github.com/python/pyperformance) so you can execute them
> > locally if you need to.
> >
> > ---
> >
> > On a related note, I am also working on the speed.python.org server to
> provide more automation and
> > ideally some integrations with GitHub to detect performance regressions.
> For now, I have done the following:
> >
> > * Recompute benchmarks for all branches using the same version of
> pyperformance (except master) so they can
> >be compared with each other. This can only be seen in the
> "Comparison" tab: https://speed.python.org/comparison/
> > * I am setting daily builds of the master branch so we can detect
> performance regressions with daily granularity. These
> >daily builds will be located in the "Changes" and "Timeline" tabs (
> https://speed.python.org/timeline/).
> > * Once the daily builds are working as expected, I plan to work on
> trying to automatically comment or PRs or on bpo if
> > we detect that a commit has introduced some notable performance
> regression.
> >
> > Regards from sunny London,
> > Pablo Galindo Salgado.
> > ___
> > python-committers mailing list -- python-committ...@python.org
> > To unsubscribe send an email to python-committers-le...@python.org
> > https://mail.python.org/mailman3/lists/python-committers.python.org/
> > Message archived at
> https://mail.python.org/archives/list/python-committ...@python.org/message/G3LB4BCAY7T7WG22YQJNQ64XA4BXBCT4/
> > Code of Conduct: https://www.python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4OG362VITFQMDLZRWVHMEAQQIIAX2KOT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Paul Moore
The performance figures in the Python 3.9 "What's New" (here -
https://docs.python.org/3/whatsnew/3.9.html#optimizations) did look
oddly like a lot of things went slower, to me. I assumed I'd misread
the figures, and moved on, but maybe I was wrong to do so...

Paul

On Wed, 14 Oct 2020 at 14:17, Pablo Galindo Salgado  wrote:
>
> Hi!
>
> I have updated the branch benchmarks in the pyperformance server and now they 
> include 3.9. There are
> some benchmarks that are faster but on the other hand some benchmarks are 
> substantially slower, pointing
> at a possible performance regression in 3.9 in some aspects. In particular 
> some tests like "unpack sequence" are
> almost 20% slower. As there are some other tests were 3.9 is faster, is not 
> fair to conclude that 3.9 is slower, but
> this is something we should look into in my opinion.
>
> You can check these benchmarks I am talking about by:
>
> * Go here: https://speed.python.org/comparison/
> * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo 
> latest in branch '3.8'"
> * To better read the plot, I would recommend to select a "Normalization" to 
> the 3.8 branch (this is in the top part of the page)
>and to check the "horizontal" checkbox.
>
> These benchmarks are very stable: I have executed them several times over the 
> weekend yielding the same results and,
> more importantly, they are being executed on a server specially prepared to 
> running reproducible benchmarks: CPU affinity,
> CPU isolation, CPU pinning for NUMA nodes, CPU frequency is fixed, CPU 
> governor set to performance mode, IRQ affinity is
> disabled for the benchmarking CPU nodes...etc so you can trust these numbers.
>
> I kindly suggest for everyone interested in trying to improve the 3.9 (and 
> master) performance, to review these benchmarks
> and try to identify the problems and fix them or to find what changes 
> introduced the regressions in the first place. All benchmarks
> are the ones being executed by the pyperformance suite 
> (https://github.com/python/pyperformance) so you can execute them
> locally if you need to.
>
> ---
>
> On a related note, I am also working on the speed.python.org server to 
> provide more automation and
> ideally some integrations with GitHub to detect performance regressions. For 
> now, I have done the following:
>
> * Recompute benchmarks for all branches using the same version of 
> pyperformance (except master) so they can
>be compared with each other. This can only be seen in the "Comparison" 
> tab: https://speed.python.org/comparison/
> * I am setting daily builds of the master branch so we can detect performance 
> regressions with daily granularity. These
>daily builds will be located in the "Changes" and "Timeline" tabs 
> (https://speed.python.org/timeline/).
> * Once the daily builds are working as expected, I plan to work on trying to 
> automatically comment or PRs or on bpo if
> we detect that a commit has introduced some notable performance regression.
>
> Regards from sunny London,
> Pablo Galindo Salgado.
> ___
> python-committers mailing list -- python-committ...@python.org
> To unsubscribe send an email to python-committers-le...@python.org
> https://mail.python.org/mailman3/lists/python-committers.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-committ...@python.org/message/G3LB4BCAY7T7WG22YQJNQ64XA4BXBCT4/
> Code of Conduct: https://www.python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XHRZO6MFHFJETR54TSIXBMLFDJOXS3Z4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
Hi!

I have updated the branch benchmarks on the pyperformance server and now
they include 3.9. There are
some benchmarks that are faster, but on the other hand some benchmarks are
substantially slower, pointing
at a possible performance regression in 3.9 in some aspects. In particular,
some tests like "unpack sequence" are
almost 20% slower. As there are some other tests where 3.9 is faster, it is
not fair to conclude that 3.9 is slower, but
this is something we should look into, in my opinion.

You can check these benchmarks I am talking about by:

* Go here: https://speed.python.org/comparison/
* In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo
latest in branch '3.8'"
* To better read the plot, I would recommend selecting a "Normalization" to
the 3.8 branch (this is in the top part of the page)
   and checking the "horizontal" checkbox.

These benchmarks are very stable: I have executed them several times over
the weekend, yielding the same results, and,
more importantly, they are being executed on a server specially prepared for
running reproducible benchmarks: CPU affinity,
CPU isolation, CPU pinning for NUMA nodes, CPU frequency fixed, CPU
governor set to performance mode, IRQ affinity
disabled for the benchmarking CPU nodes, etc., so you can trust these
numbers.

I kindly suggest that everyone interested in trying to improve the 3.9 (and
master) performance review these benchmarks
and try to identify the problems and fix them, or to find what changes
introduced the regressions in the first place. All benchmarks
are the ones being executed by the pyperformance suite
(https://github.com/python/pyperformance), so you can execute them
locally if you need to.

---

On a related note, I am also working on the speed.python.org server to
provide more automation and
ideally some integrations with GitHub to detect performance regressions.
For now, I have done the following:

* Recomputed benchmarks for all branches using the same version of
pyperformance (except master) so they can
   be compared with each other. This can only be seen in the "Comparison"
tab: https://speed.python.org/comparison/
* I am setting up daily builds of the master branch so we can detect
performance regressions with daily granularity. These
   daily builds will be located in the "Changes" and "Timeline" tabs
(https://speed.python.org/timeline/).
* Once the daily builds are working as expected, I plan to work on trying
to automatically comment on PRs or on bpo if
we detect that a commit has introduced some notable performance regression.

Regards from sunny London,
Pablo Galindo Salgado.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G3LB4BCAY7T7WG22YQJNQ64XA4BXBCT4/
Code of Conduct: http://python.org/psf/codeofconduct/