[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido] > The key seems to be: Except none of that quoted text (which I'll skip repeating) gives the slightest clue as to _why_ it may be an improvement. So you split the needle into two pieces. So what? What's the _point_? Why would someone even imagine that might help? Why is one half then

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Brett Cannon
On Wed., Oct. 14, 2020, 17:37 Tim Peters, wrote: > [Steven D'Aprano ] > > Perhaps this is a silly suggestion, but could we offer this as an > > external function in the stdlib rather than a string method? > > > > Leave it up to the user to decide whether or not their data best suits > > the find

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Guido van Rossum
On Wed, Oct 14, 2020 at 9:56 AM Tim Peters wrote: > [Guido] > > Maybe someone reading this can finish the Wikipedia page on > > Two-Way Search? The code example trails off with a function with > > some incomprehensible remarks and then a TODO.. > > Yes, the Wikipedia page is worse than useless

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Chris Angelico
On Thu, Oct 15, 2020 at 11:38 AM Tim Peters wrote: > I think this is premature. There is almost never an optimization > that's a pure win in all cases. For example, on some platforms > `timsort` will never be as fast as the old samplesort in cases with a > very large number of equal elements, and

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Steven D'Aprano ] > Perhaps this is a silly suggestion, but could we offer this as an > external function in the stdlib rather than a string method? > > Leave it up to the user to decide whether or not their data best suits > the find method or the new search function. It sounds like we can offer

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread David Mertz
On Wed, Oct 14, 2020 at 7:45 PM Steven D'Aprano wrote: > Perhaps this is a silly suggestion, but could we offer this as an > external function in the stdlib rather than a string method? > That feels unworkable to me. For one thing, the 'in' operator hits this same issue, doesn't it? But for

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Oscar Benjamin
On Wed, 14 Oct 2020 at 19:12, Ivan Pozdeev via Python-Dev wrote: > > > On 14.10.2020 17:04, M.-A. Lemburg wrote: > > On 14.10.2020 16:00, Pablo Galindo Salgado wrote: > >>> Would it be possible to get the data for older runs back, so that > >> it's easier to find the changes which caused the

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Steven D'Aprano
Perhaps this is a silly suggestion, but could we offer this as an external function in the stdlib rather than a string method? Leave it up to the user to decide whether or not their data best suits the find method or the new search function. It sounds like we can offer some rough heuristics,

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Batuhan Taskaya
I've indexed the vast majority of the files from the top 4K PyPI packages into this system, and here are the results for __version__ usage in argparse, cgi, csv, decimal, imaplib, ipaddress, optparse, pickle, platform, re, smtpd, socketserver, tabnanny (result of a quick grep)

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Neil Schemenauer
On 2020-10-14, Serhiy Storchaka wrote: > I propose to remove __version__ in all stdlib modules. Are there any > exceptions? I agree that these kinds of meta attributes are not useful and it would be nice to clean them up. However, IMHO, maybe the cleanup is not worth breaking Python programs.

[Python-Dev] Re: Performance benchmarks for 3.9

2020-10-14 Thread Terry Reedy
On 10/14/2020 9:16 AM, Pablo Galindo Salgado wrote: You can check these benchmarks I am talking about by: * Go here: https://speed.python.org/comparison/ * In the left bar, select "lto-pgo latest in branch '3.9'" and "lto-pgo latest in branch '3.8'" At the moment, there are only results for

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Ivan Pozdeev via Python-Dev
On 14.10.2020 17:04, M.-A. Lemburg wrote: On 14.10.2020 16:00, Pablo Galindo Salgado wrote:  Would it be possible to get the data for older runs back, so that it's easier to find the changes which caused the slowdown ? Unfortunately no. The reasons are that that data was misleading because

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible instead to run git-bisect for only a _particular_ benchmark? It seems that may be all that’s needed to track down particular regressions. Also, if e.g. git-bisect is used it wouldn’t be every e.g. 10th revision but rather O(log(n)) revisions. That only works if there is a
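A minimal sketch of the O(log(n)) idea quoted above, not the pyperformance tooling itself. It assumes the premise that bisection relies on: the first revision in the range is fast, the last one is slow, and every revision after the first slow one is also slow. The names `first_bad` and `is_slow` are hypothetical; `is_slow` stands in for running one particular benchmark on a given revision.

```python
def first_bad(revisions, is_slow):
    """Bisect for the first slow revision.

    Assumes revisions[0] is fast, revisions[-1] is slow, and that
    slowness, once introduced, persists in all later revisions.
    Returns (index of first slow revision, number of benchmark runs).
    """
    lo, hi = 0, len(revisions) - 1  # lo is known fast, hi is known slow
    runs = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        runs += 1  # one benchmark run per probed revision
        if is_slow(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi, runs


# Hypothetical example: 1000 revisions, regression introduced at index 637.
index, runs = first_bad(list(range(1000)), lambda rev: rev >= 637)
print(index, runs)
```

Under those assumptions, 1000 revisions need about 10 benchmark runs rather than 100 (every 10th revision) or 1000.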

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Chris Jerdonek
On Wed, Oct 14, 2020 at 8:03 AM Pablo Galindo Salgado wrote: > > Would it be possible to rerun the tests with the current > setup for say the last 1000 revisions or perhaps a subset of these > (e.g. every 10th revision) to try to binary search for the revision which > introduced the change ? > >

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Brett Cannon
I think if the project is not maintained externally and thus synced into the stdlib we can drop the attributes. On Wed, Oct 14, 2020 at 8:44 AM Guido van Rossum wrote: > None of these have seen much adoption, so I think we can lose them without > dire consequences. The info should be moved into

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tal Einat
On Wed, Oct 14, 2020 at 7:57 PM Tim Peters wrote: > > [Guido] > > Maybe someone reading this can finish the Wikipedia page on > > Two-Way Search? The code example trails off with a function with > > some incomprehensible remarks and then a TODO.. > > Yes, the Wikipedia page is worse than useless

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido] > Maybe someone reading this can finish the Wikipedia page on > Two-Way Search? The code example trails off with a function with > some incomprehensible remarks and then a TODO.. Yes, the Wikipedia page is worse than useless in its current state, although some of the references it lists

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Guido van Rossum
Maybe someone reading this can finish the Wikipedia page on Two-Way Search? The code example trails off with a function with some incomprehensible remarks and then a TODO... On Wed, Oct 14, 2020 at 9:07 AM Tim Peters wrote: > Rest assured that Dennis is aware that pragmatics may change for >

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 17:59, Antoine Pitrou wrote: > > Le 14/10/2020 à 17:25, M.-A. Lemburg a écrit : >> >> Well, there's a trend here: >> >> [...] >> >> Those two benchmarks were somewhat faster in Py3.7 and got slower in 3.8 >> and then again in 3.9, so this is more than just an artifact. > >

[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
Rest assured that Dennis is aware that pragmatics may change for shorter needles. The code has always made a special case of 1-character needles, because it's impossible "even in theory" to improve over straightforward brute force search then. Say the length of the text to search is `t`, and
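For readers unfamiliar with the baseline being discussed, here is a rough sketch of straightforward brute force search. It is an illustration only, not CPython's implementation; the function name `brute_force_find` and the variable `n` for the needle length are assumptions made for this example (the snippet above names only `t`).

```python
def brute_force_find(text, needle):
    """Return the lowest index where needle occurs in text, or -1.

    Illustrative only: tries every alignment of the needle against
    the text, left to right, and compares character by character.
    """
    t, n = len(text), len(needle)
    for i in range(t - n + 1):  # every possible starting position
        if text[i:i + n] == needle:
            return i
    return -1


print(brute_force_find("abracadabra", "c"))    # same result as "abracadabra".find("c")
print(brute_force_find("abracadabra", "cad"))  # same result as "abracadabra".find("cad")
```

With a 1-character needle each alignment costs a single comparison, so the loop looks at each text character at most once.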

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Antoine Pitrou
Le 14/10/2020 à 17:25, M.-A. Lemburg a écrit : > > Well, there's a trend here: > > [...] > > Those two benchmarks were somewhat faster in Py3.7 and got slower in 3.8 > and then again in 3.9, so this is more than just an artifact. unpack-sequence is a micro-benchmark. It's useful if you want

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Guido van Rossum
None of these have seen much adoption, so I think we can lose them without dire consequences. The info should be moved into a docstring or comment. On Wed, Oct 14, 2020 at 06:54 Serhiy Storchaka wrote: > Some module attributes in the stdlib have attribute __version__. It > makes sense if the

[Python-Dev] Re: Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Victor Stinner
Hi, I was always confused by the __version__ variable of *some* modules. It's surprising since it's no longer incremented when the module is fixed or gets new features. Also, the number is unrelated to the Python version. I suggest removing __version__. __author__, __credits__, __email__,

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Victor Stinner
I suggest limiting it to one "dot" per week, since CodeSpeed (the website for browsing the benchmark results) is somehow limited to 50 dots (it can display more if you only display a single benchmark). Previously, it was closer to one "dot" per month, which made it possible to display a timeline over 5 years. In

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 16:14, Antoine Pitrou wrote: > Le 14/10/2020 à 15:16, Pablo Galindo Salgado a écrit : >> Hi! >> >> I have updated the branch benchmarks in the pyperformance server and now >> they include 3.9. There are >> some benchmarks that are faster but on the other hand some benchmarks >> are

[Python-Dev] Re: [python-committers] Re: Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> I wouldn't worry about a small regression on a micro- or mini-benchmark while the overall picture is stable. Absolutely, I agree it is not something to *worry* about, but I think it makes sense to investigate as the possible fix may be trivial. Part of the reason I wanted to recompute them was because

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible to rerun the tests with the current setup for say the last 1000 revisions or perhaps a subset of these (e.g. every 10th revision) to try to binary search for the revision which introduced the change ? Every run takes 1-2 h so doing 1000 would certainly be time-consuming :)

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Antoine Pitrou
Le 14/10/2020 à 15:16, Pablo Galindo Salgado a écrit : > Hi! > > I have updated the branch benchmarks in the pyperformance server and now > they include 3.9. There are > some benchmarks that are faster but on the other hand some benchmarks > are substantially slower, pointing > at a possible

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
On 14.10.2020 16:00, Pablo Galindo Salgado wrote: >> Would it be possible to get the data for older runs back, so that > it's easier to find the changes which caused the slowdown ? > > Unfortunately no. The reasons are that that data was misleading because > different points were computed with a

[Python-Dev] Remove module's __version__ attributes in the stdlib

2020-10-14 Thread Serhiy Storchaka
Some modules in the stdlib have a __version__ attribute. It makes sense if the module is developed independently from Python, but after inclusion in the stdlib it no longer has separate releases which should be identified by version. New changes usually go into the module without changing

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread M.-A. Lemburg
Hi Pablo, thanks for pointing this out. Would it be possible to get the data for older runs back, so that it's easier to find the changes which caused the slowdown ? Going to the timeline, it seems that the system only has data for Oct 14 (today):

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> Would it be possible to get the data for older runs back, so that it's easier to find the changes which caused the slowdown ? Unfortunately no. The reasons are that that data was misleading because different points were computed with a different version of pyperformance and therefore with

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
> The performance figures in the Python 3.9 "What's New" Those are also micro-benchmarks, which can have no effect at all on macro-benchmarks. The ones I am linking are almost all macro-benchmarks, so, unfortunately, the ones in Python 3.9 "What's New" are not lying and they seem to be correlated

[Python-Dev] Re: [python-committers] Performance benchmarks for 3.9

2020-10-14 Thread Paul Moore
The performance figures in the Python 3.9 "What's New" (here - https://docs.python.org/3/whatsnew/3.9.html#optimizations) did look oddly like a lot of things went slower, to me. I assumed I'd misread the figures, and moved on, but maybe I was wrong to do so... Paul On Wed, 14 Oct 2020 at 14:17,

[Python-Dev] Performance benchmarks for 3.9

2020-10-14 Thread Pablo Galindo Salgado
Hi! I have updated the branch benchmarks in the pyperformance server and now they include 3.9. There are some benchmarks that are faster but on the other hand some benchmarks are substantially slower, pointing at a possible performance regression in 3.9 in some aspects. In particular some tests