[Python-Dev] Re: Python Software Foundation Board of Directors Election 2022

2022-06-19 Thread Tim Peters
Heads up! I found my PSF Board voting info in my gmail spam folder
today; looks like it was mailed out this morning.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UDX2JHQZHICWWWMUGD3LBH32ABHWVCZC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Can I ask a real dumb procedural question about GitHub email?

2022-05-04 Thread Tim Peters
[Skip Montanaro ]
> I subscribe to the python/cpython stuff on GitHub. I find it basically
> impossible to follow because of the volume.
> ...
> How (if at all) do people deal with this firehose of email? Am I the
> only person dumb enough to have tried?

My observation is that, over time, all sources of email strive to
enlist the aid of every electron in the universe to transform all of
creation into more copies of its output, directed to my gmail account.

I gave up. These days, if Google doesn't put something in the
"priority" section of my gmail inbox, chances are the only thing I'll
see of it is a fraction-of-a-second glance at the subject line as I
check the "select all" box, on page after page, before deleting it.
That gets done every day almost as often as clicking on "report spam
and unsubscribe".

I fondly remember the days when email volume was merely unbearable ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FE7522WPE6Q3W55KNNFLVZXR6G3CWCNH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-17 Thread Tim Peters
[Barry Scott and Steve Dower share tips for convincing Visual Studio
 to show assembler without recompiling the file]

Thanks, fellows! That mostly ;-) worked. The remaining problem is that
breakpoints just didn't work. They showed up "visually", and in the table
of set breakpoints, but code went whizzing right by them every time.

I didn't investigate. It's possible, e.g., that the connection between C
source and the generated PGO code was so obscure that VS just gave up - or
just blew it.

Instead I wrote a Python loop to run a division of interest "forever". That
way I hoped I'd be likely to land in the loop of interest by luck when I
broke into the process.

Which worked! So here's the body of the main loop:

7FFE451D2760  mov eax,dword ptr [rcx-4]
7FFE451D2763  lea rcx,[rcx-4]
7FFE451D2767  shl r9,1Eh
7FFE451D276B  or  r9,rax
7FFE451D276E  cmp r8,0Ah
7FFE451D2772  jne long_div+25Bh (07FFE451D27CBh)
7FFE451D2774  mov rax,rdi
7FFE451D2777  mul rax,r9
7FFE451D277A  mov rax,rdx
7FFE451D277D  shr rax,3
7FFE451D2781  mov dword ptr [r10+rcx],eax
7FFE451D2785  mov eax,eax
7FFE451D2787  imul rax,r8
7FFE451D278B  sub r9,rax
7FFE451D278E  sub rbx,1
7FFE451D2792  jns long_div+1F0h (07FFE451D2760h)

And above the loop is this line, which you'll recognize as loading the same
scaled reciprocal of 10 as the gcc code Mark posted earlier. The code above
moves %rdi into %rax before the mul instruction:

7FFE451D2747  mov rdi,0CCCCCCCCCCCCCCCDh

Note an odd decision here: the MS code compares the divisor to 10 on _every_
iteration. There are not two "10 or not 10?" loop bodies. Instead, if
the divisor isn't 10, "jne long_div+25Bh" jumps to code not shown here, a
few instructions that use hardware division, and then jump back into the
tail end of the loop above to finish computing the remainder (etc).
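
For concreteness, here's a small Python sketch of the trick those
instructions implement (a reconstruction for illustration, not code from
MSVC or CPython). The constant is ceil(2**67 / 10), and for every
0 <= x < 2**64 the multiply-and-shift agrees exactly with x // 10, so no
division instruction is needed; the remainder costs one more multiply
and a subtract, just as in the loop above:

    RECIP_10 = 0xCCCCCCCCCCCCCCCD   # ceil(2**67 / 10)

    def divmod10(x):
        assert 0 <= x < 1 << 64
        q = ((x * RECIP_10) >> 64) >> 3   # high 64 bits of the product, then >> 3
        return q, x - 10 * q              # remainder via multiply and subtract

    assert all(divmod10(x) == divmod(x, 10)
               for x in (0, 9, 10, 123456789, 2**64 - 1))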

So they not only optimized division by 10, they added a useless test and
two branches to every iteration of the loop when we're not dividing by 10
;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DGRMIVHOIAYQEQJD32ED4MCBOPFE5SIT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Tim Peters
[Guido]
> I don't think there's a way to do a PGO build from Visual Studio; but
> a command prompt in the repo can do it using `PCbuild\build.bat --pgo`.
> Just be patient with it.

Thanks! That worked, and was easy, and gave me an executable that runs
"// 10" at supernatural speed.

Alas, Visual Studio will not show a disassembly window unless the
debugger is running, and there appears to be no way to convince it to
run its debugger without it first recompiling the source file you're
staring at. Which it recompiles without benefit of PGO.

So, in the end, it was an elaborate way to reproduce the non-PGO
optimized machine code I already saw :-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D47HHYYASRHS56AZL6SEK4Y7K5FNSJOP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-16 Thread Tim Peters
[Tim, incidentally notes that passing 10 as the divisor to inplace_divrem1()
 is "impossibly fast" on Windows, consuming less than a third the time as
 when passing seemingly any other divisor]

[Mark Dickinson, discovers much the same is true under other, but not all,
 Linux-y builds, due to the generated code doing a runtime(!) check for
 10 and using a scaled integer reciprocal multiplication trick, instead of
 HW division, in that specific case]

And there's a picture of this in the dictionary, next to the "dubious
optimization" entry ;-)

I have to believe the same is true under Visual Studio 2019, but
offhand don't know how to prove that. I understand Steve uses PGO to
build the python.org Windows release, but I've never done that - the
"Release build" configuration I get from Github does not use PGO, and
the code I get from my own "release builds" is equally slow for all
divisors (and the generated assembler is obviously not trying to
special-case anything).

You later gave PGO the credit, which made sense from the start. But I
_assumed_ PGO was latching on to // 10 because all sorts of programs
use that all the time simply as a consequence of converting Python
integers to strings. But that's not it. The base-conversion code
doesn't call inplace_divrem1() - it does its own division loops with
compile-time constants.

So, as you already said ;-), it appears to be mostly an accident due
to the specific tests we feed to PGO builds.

More interesting to me now is the "over 3x faster" empirical
observation. The paper I referenced earlier has a cheaper way to use
scaled integer reciprocals, requiring nothing worse than 32x32->64 int
mult, and, to compute the fudged reciprocal, nothing worse than
64x32 -> 32 int divide.

OTOH, libdivide already has working code ready to use, and - unlike
the paper I referenced - does not require "normalizing" the divisor
(shifting left to ensure it's >= base/2).

libdivide needs more cycles to compute its 64-bit reciprocal, but in
the context of bigints that's amortized over the number of digits in
the numerator. A win is a win ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-15 Thread Tim Peters
[Gregory P. Smith ]
> ...
> That only appears true in default boring -O2 builds.  Use
> `./configure --enable-optimizations` and the C version is *much* faster
> than your asm one...
>
> 250ns for C vs 370ns for your asm divl one using old gcc 9.3 on my
> zen3 when compiled using --enable-optimizations.

Something is missing here, but can't guess what without seeing the
generated machine code. But I trust Mark will do that.

> tested using ` -m timeit -n 150 -s 'x = 10**1000; r=x//10; assert r == 
> 10**999, r' 'x//10' `
>
> Use of __asm__ appears to have prevented the compiler from being able to fully
> optimize that code in PGO/FDO mode.
>
> I trust the compiler toolchain to know what's close to best here.

Mark is extremely experienced in numeric programming, and is the most
capable numerical analyst contributing to CPython. That's why he
trusts nothing ;-)


> ...  there's probably interesting ways to optimize bignum division using
> opencl and vector hardware as well - for the case when you know you've
> got over a dozen digits; but that's what numpy et. al. are for.

numpy has no support for bigints, beyond allowing the use of PyObject*
in numpy arrays (dtype='object'). In which case _all_ the bigint math
is performed by the hosting Python implementation.

People dead serious about bigint speed use the GMP library, which is a
massive amount of code and 100% devoted to peak speed. In Python,
gmpy2 supplies Python bindings for the GMP/MPIR, MPFR, and MPC
libraries.

> Bignums in python are a convenience. Most values normal code deals
> with are less than 2**100.

There's very little that "most normal code" uses. Not bigints, not
HTML, not regexps, not sockets, on & on & on. Whichever bubbles you
live in are nevertheless bubbles ;-)

The speed of + - * // imposes fundamental limits on the speed of almost
all bigint algorithms. A reasonably quick win for bigint // would be
most welcome.

Anyone dead serious about working with bigints in Python "should"
install gmpy2. But it's still fun to prototype work in plain old
Python.

Note: I added an implementation of Karatsuba multiplication to CPython
two decades ago (which gives a major O() improvement over "schoolbook"
multiplication). I've often said that was a mistake, which I wouldn't
make again. Because it added relative mountains of new code to
longobject.c, and people who need fast bigint * should _still_ look to
GMP anyway (which implements several additional, even more gonzo, *
algorithms).

But neither do I want to remove it. It works fine, and the speed boost
is welcome (in my bubbles, multi-thousand bit ints are common).

What Mark is looking at here has a _very_ much more limited scope: the
two (I think) places in longobject.c that are dividing two native-size
machine ints in time-critical loops. "But nobody divides big ints
anyway - why would they?". For example, if you're investigating crypto
schemes, modular exponentiation with multi-thousand bit ints can be
common, and under the covers, CPython uses the same code for both //
and %. One modular exponentiation has to do a * and % for each bit
position in the exponent, plus more proportional to the number  of
bits set in the exponent.
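
To make the cost structure concrete, here's a toy sketch of the binary
square-and-multiply pattern (illustration only - the real thing is the
built-in pow(b, e, m), and CPython's long_pow() is considerably more
elaborate):

    def modpow(b, e, m):
        # one squaring (a * then a %) per bit of e, plus an extra
        # multiply-and-reduce for each bit of e that is set
        result = 1
        b %= m
        while e:
            if e & 1:
                result = result * b % m
            b = b * b % m
            e >>= 1
        return result

    assert modpow(3, 10**4 + 7, 10**9 + 9) == pow(3, 10**4 + 7, 10**9 + 9)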

About trusting "the toolchain", here under Windows on a capable-enough
desktop box (Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, 3601 MHz, 4
Core(s), 8 Logical Processor(s)), using the released 3.10.1:

$ python --version
Python 3.10.1

$ python -m timeit -n 150 -s "x = 10**1000" "x//10"
150 loops, best of 5: 376 nsec per loop

Which actually makes little sense to me. 10**1000 requires 111 CPython
"digits", which is the number of times the loop in `inplace_divrem1()`
has to go around. Under 4 nsec per iteration seems close to impossibly
fast on a 3.8GHz box, given the presence of any division instruction.
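
(Rough arithmetic: 376 nsec / 111 digits is about 3.4 nsec per digit,
or only ~13 clock cycles per loop iteration at 3.8GHz - well below what
a 64-bit hardware divide typically costs on chips of this generation.)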

However, dividing by 10 is not a worst case on this box. Dividing by
100 is over 3x slower:

$ python -m timeit -n 150 -s "x = 10**1000" "x//100"
150 loops, best of 5: 1.25 usec per loop

Dividing by the largest single Python "digit" is apparently the same:

$ python -m timeit -n 150 -s "x = 10**1000; d=2**30-1" "x//d"
150 loops, best of 5: 1.29 usec per loop

In fact, much the same story for dividing by 1, 2, 3, 4, 5, 6, 7, 8,
9, and 11. Got bored then ;-) They're _all_ much slower than dividing
by 10 in this case.

Trust nothing :-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GLKIVXMJMVH35SFIR455O6SURNOTZCKD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2022-01-14 Thread Tim Peters
[Mark Dickinson ]
>> Division may still be problematic.

Heh. I'm half convinced that heavy duty bigint packages are so often
written in assembler because their authors are driven insane by trying
to trick C compilers into generating "the obvious" machine
instructions needed.

An alternative to HW division is to multiply by a scaled integer
reciprocal instead, but it's very delicate to get all cases right. GMP
heavyweights Niels Moller and Torbjorn Granlund wrote up the most
efficient I've seen of this ilk in their paper "Improved division by
invariant integers".

It requires a single x single -> double multiply, plus another multiply
from which only the low bits are needed, to get the right quotient and
remainder.

Ironically, it would run faster if CPython used all 32 bits of its
internal digits - some operations in their algorithm are expected to
overflow at times (including + and -!), and it's crucial that they be
done modulo the base in use. That happens "by magic" if the base
matches the unsigned type's width.

They only need a single-width scaled reciprocal, which can be computed
with a HW double / single -> single divide if available, or via
excruciating division-free scaled integer Newton refinement. Of course
the cost of computing the reciprocal once pays off each time the same
denominator is used.
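
Here's a rough Python sketch of the "compute it once, reuse it per
digit" idea only - it cheats by using a double-width reciprocal, which
is exactly the luxury their algorithm does without, so this shows the
shape of the approach rather than their algorithm:

    def make_div_by(d, k=64):
        m = (1 << (2 * k)) // d + 1      # scaled reciprocal, computed once
        def div(x):                      # each division of a k-bit x is then
            q = (x * m) >> (2 * k)       # a multiply and a shift
            r = x - q * d
            if r >= d:                   # defensive correction (never taken
                q, r = q + 1, r - d      # when d < 2**k)
            return q, r
        return div

    div10 = make_div_by(10)
    assert div10(2**64 - 1) == divmod(2**64 - 1, 10)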

Curiously,

"""
It is curious that on these processors, the combination of our
reciprocal algorithm (Alg. 2) and division algorithm (Alg. 4) is
significantly faster than the built in assembler instruction
for 2/1 division.
"""

HW division may have gotten faster since then, though.

> On that note: Python divisions are somewhat crippled even on x64. Assuming
> 30-bit digits, the basic building block that's needed for multi-precision 
> division
> is a 64-bit-by-32-bit unsigned integer division, emitting a 32-bit quotient 
> (and
> ideally also a 32-bit remainder).

Which is an instance of what they mean by "2/1 division" above.

> ...
> IOW, forcing use of DIVL instead of DIVQ, in combination with getting the 
> remainder
> directly from the DIV instruction instead of computing it separately, gives a 
> 41%
> speedup in this particular worst case. I'd expect the effect to be even more 
> marked
> on x86, but haven't yet done those timings.

> ...
> I don't know whether we really want to open the door to using inline assembly
> for performance reasons in longobject.c, but it's interesting to see the 
> effect.

Indeed it is! But changing the problem to use multiplication instead
may be both faster and more portable, albeit at the cost of subtler
code.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QGCHH6B7HCQRARPPVELVROO6FMWQY3G4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2021-12-30 Thread Tim Peters
>> The reason for digits being a multiple of 5 bits should be revisited vs
>> its original intent

> I added that. The only intent was to make it easier to implement
> bigint exponentiation easily ...

That said, I see the comments in longintrepr.h note a stronger constraint:

"""
the marshal code currently expects that PyLong_SHIFT is a multiple of 15
"""

But that's doubtless also shallow.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YSNDNGB2QUBS3ZX2JZNFZCDITRTDGBEM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Is anyone using 15-bit PyLong digits (PYLONG_BITS_IN_DIGIT=15)?

2021-12-30 Thread Tim Peters
[Gregory P. Smith ]
> The reason for digits being a multiple of 5 bits should be revisited vs
> its original intent

I added that. The only intent was to make it easier to implement
bigint exponentiation while viewing the exponent as being in
base 32 (so as to chew up 5 bits at a time). Since the number of
digit bits _was_ 15 at the time, and it was "obvious enough" that 30
bits would work fine when 64-bit boxes became important enough to care
about, it was easy to settle on "multiple of 5" as a quick, pragmatic
tradeoff. The number controls how much space and time is needed to
precompute a lookup table for a "large" (many bits) power - and powers
can be very large indeed in the modular exponentiation case.

I doubt Python will move beyond 30-bit digits. While 64-bit support
became increasingly and steadily more widespread (both SW and HW) in
the 32-bit world, I see very much less enthusiasm to move beyond 64
bits. Yes, 64x64 -> 128 multiply is spreading, but that's about it.
Some applications can make very good use of that, but, in the grand
scheme of things, they remain niche. Whole universes of apps are
getting real benefit by moving from 32 to 64 bits. That's needed just
to get integers big enough to represent physical memory addresses in
many consumer desktops now.  But we'll likely never need to go beyond
64 bits for that.

In any case, "multiple of 5" is a shallow constraint, and getting rid
of it shouldn't require more than rewriting longobject.c's
long_pow()'s

for (j = PyLong_SHIFT - 5; j >= 0; j -= 5) {

loop in a more convoluted way to cross digit boundaries when trying to
suck up "the next" 5 bits.

In fact, there are good reasons to make that loop more convoluted
anyway, but I never got around to it (it "should" instead be aiming to
find "the next" 5-bit slice that ends with a 1-bit, which would allow
cutting the aforementioned precomputed table in half).
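
A sketch of the fixed-window scheme in Python, for concreteness
(illustration of the idea only - long_pow() works digit by digit and
has many more special cases):

    def window_pow(b, e, m, width=5):
        table = [1] * (1 << width)            # b**0 .. b**31 (mod m), computed once
        for i in range(1, 1 << width):
            table[i] = table[i - 1] * b % m
        result = 1
        nbits = max(e.bit_length(), 1)
        top = (nbits + width - 1) // width * width - width
        for shift in range(top, -1, -width):  # most-significant chunk first
            for _ in range(width):
                result = result * result % m  # `width` squarings per chunk
            chunk = (e >> shift) & ((1 << width) - 1)
            if chunk:
                result = result * table[chunk] % m
        return result

    assert window_pow(3, 12345, 10**9 + 7) == pow(3, 12345, 10**9 + 7)

Requiring each 5-bit slice to end with a 1-bit - the more convoluted
loop mentioned above - would mean only odd powers need to be stored,
halving the table.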
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CP3EH5EWVDJKN63XAOMDCOL4OTWUF2JY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Having Sorted Containers in stdlib?

2021-11-13 Thread Tim Peters
I started writing up a SortedDict use case I have, but it's very
elaborate and I expect it would just end with endless pointless
argument about other approaches I _could_ take. But I already know all
those ;-)

So let's look at something conceptually "dead easy" instead: priority
queues. They're a basic building block in algorithms from many fields.

In the distributed Python, `heapq` is the only thing that comes close
to being reasonably efficient and scalable for that.

However:

- It only supports min-heaps. It's not that max-heaps aren't also
useful (indeed, _under the covers_ CPython also implements max-heaps
for its own internal use). It's more that the original API simply
doesn't generalize cleanly.

- So there's a body of word-of-mouth "tricks" for making a min-heap
"act like" a max-heap. For integer or float priorities, it suffices to
use the negation of "the real" priority. For other kinds of values, it
gets trickier. In the end, you can wrap each item in a class that
implements `x.__lt__(y)` by doing `y.value < x.value`. But few people
appear to be aware of that, while nobody wants to endure the bother
;-)

- It's often the case that "the priority" needs to be computed from an
item's value. `heapq` has no direct support for that. Instead there's
another body of word-of-mouth tricks akin to the old
decorate-sort-undecorate approach sorting used. Rather than store the
values in the heap, you store `(priority, value)` tuples. But then
various unpleasant things can happen if two priorities are tied: in
that case tuple comparison falls back to comparing the values. But,
e.g., value comparison may be very expensive, or the values may not
support __lt__ at all. So instead you store `(priority,
unique_integer, value)` triples (both tricks are sketched in code after
this list). And hope that you don't really need a max-heap, lest it get
even more obscure.
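
For concreteness, a small sketch of both tricks (the names here are
made up for illustration):

    import heapq
    from itertools import count

    class MaxItem:                        # trick 1: reversed __lt__ gives a max-heap
        __slots__ = ('value',)
        def __init__(self, value):
            self.value = value
        def __lt__(self, other):
            return other.value < self.value

    tie = count()                         # trick 2: computed priority plus a unique
    heap = []                             # tiebreaker, so values are never compared
    for task in ("short", "a much longer task", "mid-sized"):
        heapq.heappush(heap, (-len(task), next(tie), task))
    assert heapq.heappop(heap)[-1] == "a much longer task"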

Using SortedContainers, none of those are issues. It's all
straightforward and obvious. If you have a list L always sorted by
priority, extracting from a "min queue" just requires L.pop(0), or
from a "max queue" L.pop(-1) (or L.pop() - -1 is the default, same as
for the Python list.pop()). No tricks are needed. Fine too if
sometimes you want to extract the min, while at other times the max.
Or, e.g., peek at the 5 highest-priority values and pick the one that
best fits resources available at the time.

In cases of computed priorities, the SortedKeyList constructor allows
specifying a function to be used to compute the keys used to determine
the list's ordering. Again straightforward and obvious.

from sortedcontainers import SortedKeyList
L = SortedKeyList([(i, j) for i in range(1, 5)
  for j in range(1, 5)],
  key=lambda t: t[0]/t[1])

>>> [t for t in L]
[(1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 4), (1, 1),
 (2, 2), (3, 3), (4, 4), (4, 3), (3, 2), (2, 1), (4, 2),
  (3, 1), (4, 1)]

That said, in most of my code I end up _removing_ uses of
SortedContainers, in favor of faster ways of getting the job done. The
package isn't the last resort for me, it's the first. I can write code
in straightforward ways that scale well from the start. If I need more
speed later, fine, I can puzzle out faster ways to get the task done.
If you're proud that you _start_ by plotting ways to minimize the
number of sorts you do, you're coding in C++, not Python ;-)

So I suggest people are staring at the wrong end when asking for use
cases that can't be done without the package. Those are necessarily
non-trivial. Having a sorted collection is "more than good enough" for
many tasks. As above, e.g., there's no reason to imagine that Grant
Jenks had priority queues in mind at all, yet the flavors of sorted
lists he implemented are very easy to use for that task.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XXAXVV3KJ4ORQ7K6ELSS4R7S5725ASRE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Having Sorted Containers in stdlib?

2021-11-11 Thread Tim Peters
[Christopher Barker ]
> Earlier in the thread, we were pointed to multiple implementations.
>
> Is this particular one clearly the “best”[*]?
>
> If so, then sure.
>
> -CHB
>
> [*] best meaning “most appropriate for the stdlib”. A couple folks have
> already pointed to the quality of the code. But my understanding is that
> different algorithms are more or less appropriate for different use
> cases. So is this one fairly “universal”?

As noted earlier, SortedContainers is downloaded from PyPI hundreds of
thousands of times every day, and that's been so for a long time.
Best I can tell, no similar package is in the same universe as that.

The author identified over a dozen potential competitors here[1], but
most didn't look like serious projects to him. He ran extensive timing
tests against the rest, reported there too. His package usually wins,
and is at worst competitive. It usually wins on memory burden too.

Some of these things are from people intrigued by "a data structure" they
read about and implement for fun and self-education. That sometimes
shines through in data-structure-specific APIs. While the
SortedContainers extensive docs explain how things are implemented (if
you look in the right spots for that), the API is remarkably agnostic
(& useful). For example, this is a method of SortedList:

irange(minimum=None, maximum=None, inclusive=(True, True), reverse=False)

which returns an iterator over a contiguous sorted range of values
(neither, either, or both of whose endpoints may be included,
depending on what you pass for `inclusive`). This "makes sense"
regardless of how the structure may be implemented. Of course
SortedSet and SortedDict support `irange()` too.
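
For example (a quick illustration of irange() on a SortedList):

    from sortedcontainers import SortedList

    sl = SortedList([1, 2, 3, 5, 8, 13])
    list(sl.irange(2, 8))                           # [2, 3, 5, 8]
    list(sl.irange(2, 8, inclusive=(True, False)))  # [2, 3, 5]
    list(sl.irange(2, 8, reverse=True))             # [8, 5, 3, 2]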

[1] http://www.grantjenks.com/docs/sortedcontainers/performance.html
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FD3DSXZUBDUC3PFQH2SS47NIZ36TTOL6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Having Sorted Containers in stdlib?

2021-11-10 Thread Tim Peters
[Bob Fang ]
> This is a modest proposal to consider having sorted containers
> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.

+1 from me, but if and only if Grant Jenks (its author) wants that too.

It's first-rate code in all respects, including that it's a fine
example _of_ Python programming (it's written in Python, not C).

People often say "well, put a thing on PyPI first, and see whether
people really want it!".

SortedContainers has been on PyPI for years straight now, and still
gets hundreds of thousands of downloads from there every day:

https://pypistats.org/packages/sortedcontainers

So it's already widely adopted in the community. What else would it
need to prove?

If it still doesn't qualify, "well, put a thing on PyPI first" is just
obstructionism ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SE6YOVJQI6OEWNM5SAQL3X5VPEFGVZAA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Having Sorted Containers in stdlib?

2021-11-10 Thread Tim Peters
[Christopher Barker ]
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

For example, for some mappings with totally ordered keys, it can be
useful to ask for the value associated with a key that's not actually
there, because "close to the key is good enough".

A SortedDict supports several ways of doing that efficiently,
depending on what - exactly - an app means by "good enough". For
example, it's easy to find _the_ closest key. Or the k closest keys.
Or the two single keys <= and >=. Or ... Pick one; take some form of
average of their values; use a domain-specific interpolation function
on their keys & values; ...
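
For instance, a minimal sketch of "find _the_ closest key" (the helper
is just for illustration, not part of the package; it leans on the
indexable keys view and bisect_left):

    from sortedcontainers import SortedDict

    def closest_key(sd, key):
        keys = sd.keys()                # indexable, sorted view
        i = sd.bisect_left(key)         # first position whose key is >= key
        if i == 0:
            return keys[0]
        if i == len(keys):
            return keys[-1]
        lo, hi = keys[i - 1], keys[i]
        return lo if key - lo <= hi - key else hi

    cases = SortedDict({10.0: 3, 35.5: 7, 48.9: 2})   # latitude -> case count
    assert closest_key(cases, 36.0) == 35.5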

To make it concrete, suppose you're running a COVID-19 app where
reports of new cases are coming in around the clock. One SortedDict
may be used to map latitude to the number of confirmed cases reported.
If a researcher asks for the number at a specific latitude, but no
exact match is found, fine, you can still efficiently show them the
results across the smallest known interval containing the latitude of
interest. Or if they want the results broken into buckets of latitudes
in 15-degree (N-degree - any N they ask for) intervals, also easy:
enumerating all entries in a given key range also goes fast. Or ...

A sorted list supports all sorts of membership queries efficiently.
Under the covers, a SortedDict _is_ a SortedList, plus a regular
Python dict to associate values with the list's elements.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FVDQZ4FJMSPYYTKNDRGLUXMOE226YIDK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: containment and the empty container

2021-11-08 Thread Tim Peters
[Ethan Furman ]
> When is an empty container contained by a non-empty container?

That depends on how the non-empty container's type defines
__contains__. The "stringish" types (str, byte, bytearray) work _very_
differently from others (list, set, tuple) in this respect.

   t in x

for the latter answers whether t is contained in x, but in the former
case answers whether t is a _sub_sequence (contiguous) of x.

So, e.g.,

>>> "ab" in "cab"
True
>>> b"ab" in b"cab"
True
>>> bytearray(b"ab") in bytearray(b"cab")
True

while

>>> ['a', 'b'] in ['c', 'a', 'b']
False
>>> ('a', 'b') in ('c', 'a', 'b')
False

For every stringish type S, S() (the empty value of that type) is a
subsequence of every value of type S, and `in` reflects that truth.

> Personally, I have never had a use for '' in 'some string' being True, and
> always have to guard against that scenario.

Don't know whether or not any of my code relies on it, but am pretty
sure any relevant code would have already special-cased an empty
string out of any code path that would have tried to apply `in` to it.
Not particularly because it would be surprised by True, but because
there's nothing useful it could _do_ with an empty string regardless
(e.g., every slice of it would be equal to the original - an empty
string is an end case for any string algorithm).

> .. on the other hand, it seems that collections of related flags are often 
> treated
> as in set theory, where the empty set is a member of any non-empty set.

Not how any set theory I've ever seen works: a set S contains the
empty set if and only if some member of S _is_ the empty set.  Which
is also how Python's frozenset.__contains__ works (a plain Python set
can never contain the empty set, because only a frozenset can contain
a set (empty or not)).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SSIRKW7TR5RUTOXLFWVGI7NAIZOHQKGH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: The list.sort(reverse=True) method not consistent with description

2021-10-30 Thread Tim Peters
[Raymond Bisdorff ]
> I fully agree with your point. By default, all the components of the
> tuple should be used in the comparison.
>
> Yet, I was confused by the following result.
>  >>> from operator import itemgetter
>  >>> L = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (3, 'e')]
>  >>> L.sort(key=itemgetter(0), reverse=True)
>  >>> L = [(3, 'e'), (2, 'd'), (2, 'b'), (1, 'c'), (1, 'a')]
>
> Should the tuples comparison is in this case, I thought, not be solely
> based on the first tuple component?

I'm not sure what you're doing there. The last line in your example
isn't showing the result of sorting, it's assigning a brand new list
to `L`. Here's the same thing, but changing the last line to show the
result of sorting:

>>> from operator import itemgetter
>>> L = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (3, 'e')]
>>> L.sort(key=itemgetter(0), reverse=True)
>>> L
[(3, 'e'), (2, 'b'), (2, 'd'), (1, 'a'), (1, 'c')]

So stability does matter when comparing only the tuples' first
components, and tuples with the same first component did retain their
original order.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ATYT7LDUZ4W422CFRDQJB7KHKSAFM674/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: The list.sort(reverse=True) method not consistent with description

2021-10-30 Thread Tim Peters
[Raymond Bisdorff ]
> ...
> Please notice the following inconsistency in Python3.10.0 and before of
> a sort(reverse=True) result:
>
>  >>> L = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (3, 'e')]
>  >>> L.sort(reverse=True)
>  >>> L
>  >>> [(3, 'e'), (2, 'd'), (2, 'b'), (1, 'c'), (1, 'a')]

Looks good to me.

> it should be:
>
>  >>> L = [(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (3, 'e')]
>  >>> reverseTuplesSort(L)
> [(3, 'e'), (2, 'b'), (2, 'd'), (1, 'a'), (1, 'c')]

Stability is irrelevant in the example, because no two list elements
are equal. You appear to be thinking, perhaps, that s[0] == t[0] when
s == (1, 'a') and t == (1, 'c') means s and t "are equal", but that's
not so at all.

>>> s = (1, 'a')
>>> t = (1, 'c')
>>> s == t
False
>>> s < t
True
>>> t > s
True

So s MUST come before t in a forward sort, and t MUST come before s in
a reverse sort.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/V22SQQI4FC4XD4TNCCXWBLCOCH2W4XU3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Packing a long list of numbers into memory

2021-10-20 Thread Tim Peters
Sorry for the spam! A bunch of these were backed up in the moderation
queue. I used the UI to set the list to auto-discard future messages
from this address, but then clicked "Accept" in the mistaken sense of
"yes, accept my request to auto-nuke this clown". But it took "Accept"
to mean "sure thing, boss! I'll send this to everyone ASAP!!".
Computer software. Everyone, please, get a useful job instead ;-)

On Wed, Oct 20, 2021 at 10:43 PM  wrote:
>
> that is great to see this post . https://bit.ly/3C551OO
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/5BMKWRYJVHTKLIW4MJM7J7SEYN2OXNJE/
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FZDBRF5D7ZBTXRUVJXPSBRFA2RRMI7LE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: (Adaptive) Shivers sort variant of Tim sort

2021-08-28 Thread Tim Peters
[Laurent Lyaudet ]
> ...
> My benchmarks could be improved but however I found that Shivers' sort
> and adaptive Shivers' sort (aka Jugé's sort) performs better than
> Tim's sort.

Cool! Could you move this to the issue report already open on this?

Replace list sorting merge_collapse()?
https://bugs.python.org/issue34561

Alas, that's been inactive for nearly 3 years, but we _were_ making
some decent progress before everyone got distracted by "more
important" stuff.

You'll see there that Vincent Jugé contributed another version,
designed to address that his sort did substantially worse than what we
already have in some cases with very few runs. Which, for reasons
explained there, is likely important.

Before that, the best alternative known was - and by far - Munro &
Wild's "powersort" merge strategy. That's deeply rooted in theory
about fast approximations to optimal binary search trees.

Where that left off, Vincent was going to try proving properties of
his new variant, plus investigate other ideas. Alas, that's where it
ended, as Vincent ran out of time too.

In any case, yes, I'm entirely in favor of replacing timsort's merge
strategy, by something with provably good properties even in worst
cases. As explained in the issue report, the merge strategy I made up
was merely the first thing I tried that didn't obviously suck ;-) -
almost all the work went into making merging itself as zippy as
reasonably possible.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CQ32BVLF66ITQ6NBSYCQUPB7TPAGD5AF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Dropping out of this list

2021-08-19 Thread Tim Peters
[me]
> If you want more active moderation, volunteer for the job. I'd happily
> give it up, and acknowledge that my laissez-faire moderation approach
> is out of style.

But, please, don't tell _me_ off-list that you volunteer. I want no
say in who would become a new moderator - I'm already doing the job as
I believe it's _best_ done. "Hands off" as much as possible.

So if the community disagrees, they need to work out among themselves
"the rules" for what should be suppressed, and pick moderators they
believe will enforce those rules.

I'm old, and cut my online discussion teeth via Usenet. It was total
chaos, and I came to appreciate that deeply. Nobody was silenced by a
central authority, but then again nobody could insist on being heard
either. News readers quickly grew rather sophisticated notions of
"killfiles", which allowed a user to quickly render posters they found
useless, and/or toxic threads, and/or ... (anything that could be
identified by a web of user-supplied regular expressions) completely
invisible to them.

The unsolvable problem I see with trying to ask a moderator to do that
for everyone at once is that, to stop complaints about over-permissive
moderation, the moderator would have to emulate the union of all list
members' killfiles (had they ability to make their own), not their
intersection. And then essentially no messages would be approved. You
think I'm joking, but, e.g., back in the day I knew someone who
killfiled every poster who had a surname that "sounded Armenian". And
you better believe that _some_ people today would insist that _this_
message be suppressed because "killfile" evokes violence.

Me, I'd consider suppressing it only because I generally find very
little value in meta-posts, which this post _about_ python-dev
moderation is ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OER42TVZSUC5YG6TT7TWI263UC7SYU4N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Dropping out of this list

2021-08-18 Thread Tim Peters
Various variations on:

> ... I am also considering unsubscribing if someone doesn't step in and stop
> the mess going on between Brett and Marco. ...

Overall, "me too!" pile-ons _are_ "the [bulk of the] mess" to most
list subscribers.

It will die out on its own in time. Dr. Brett should know by now that
he'll never get the "last word" with Dr. Marco, And everyone should
know by now that Dr. Marco won't let anyone else get the last word
either ;-)

So stop feeding it. That's the practical thing to do.

Like almost everything else in CPython-land, list moderation is a
volunteer thing.  I've been the most active (by far) list moderator
here essentially forever, but I see my only _legitimate_ job as
stopping spam. I might step in to block blatantly illegal posts too,
but that hasn't yet come up.

If you want more active moderation, volunteer for the job. I'd happily
give it up, and acknowledge that my laissez-faire moderation approach
is out of style.

Else suit yourself: unsubscribe if you like, and come back whenever
you like. It's all self-serve, and you're welcome here whenever "here"
is welcome to you.

As a matter of courtesy, though, it's less disruptive to the list
membership if you just unsubscribe rather than announce to the 3,585
list members that you're considering unsubscribing ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZK5YV5SPYCGKGE3UX2BTRFAGMGRKIJIS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Problems with dict subclassing performance

2021-08-15 Thread Tim Peters
[Marco Sulla ]
> I repeat, even the worst AI will understand from the context what I
> meant.

Amazingly enough, the truth value of a proposition does not increase
via repetition ;-)

>>> bool(True * 1_000_000_000)
True
>>> bool(False * 1_000_000_000)
False

> But let me do a very rude example:
> What if I said to Steven "I pretend immediate suck my $%#$"? Do you
> think you and the others will not understand the sense? :D

Yes! But, for me, solely because of the "suck my $%#$" part.  I still
have NO idea what the "pretend immediate" part means to you. I see
Chris spelled out, in some detail, what "pretend" means to native
English speakers. None of which make sense in this context either,
unless you're saying you're going to "make believe" that someone is
going to "immediate suck [your] $%#$". But, in that case, what you
choose to fantasize about doesn't really seem relevant either.
"Pretend" just doesn't make any more sense here than, say,
"hippopotamus" would.

> C'Mon, you are offending my poor intelligence.
>
> As I wanted to say, I pretend from Steven his excuses for his
> insinuation, immediately. Is this clear now,

Not to me, no.  Although I bet the new phrasing "insinuation" gets
much closer to your intent than "excuses".

You think Steven was indirectly accusing you of unethical behavior
(trolling for StackOverflow upvotes)? That's not the sense I got from
his original reply, but I can understand it if you did. If that's your
complaint, I'll leave it to Steven to say what his intent was - and,
if appropriate, to apologize for unintended offense.

> or must I take an English course at Cambridge?

This isn't about advanced English usage. It's about the ordinary
meanings of ordinary words in (what should be!) simple contexts.  If I
said to you

Capisco con il pesce il nodo insolito!

I doubt you'd suggest I study Italian at the University of Bologna ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4LWHML3I2UWU3Y2K6ZJ7RMP352JUDSTK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Problems with dict subclassing performance

2021-08-15 Thread Tim Peters
[Marco Sulla ]
> It's the Netiquette, Chris. It's older than Internet. It's a gross
> violation of the Netiquette remarking grammatical or syntactical
> errors. I think that also the least advanced AI will understand what I
> meant.

As multiple people have said now, including me, they had no idea what
you meant by "I pretend your immediate excuses". It's not a complaint
that it's expressed inelegantly, but that they can't make _any_ sense
of it. By my count, this is at least the second time you've declined
to explain what you meant, but instead implied the person who said
they couldn't understand it was being dishonest.

> I think anyway that now the sense of my request is __very clear__. I
> ask the intervention of a moderator. How can I do this? Thank you in
> advance.

You already did, and I already replied. I don't view myself as a
parent, referee, or judge here, and have never suppressed anyone for
anything they said. I expect people to work out their differences
among themselves, like adults.

If you want to pursue this, I can only suggest bringing up a Code of
Conduct complaint. You do that like so:

https://www.python.org/psf/conduct/

"""
If you believe that someone is violating the code of conduct, or have
any other concerns, please contact a member of the Python Software
Foundation Code of Conduct work group immediately. They can be reached
by emailing conduct...@python.org
"""
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ND7WVOPRQUTXBR3K6XSAJ3C2YL23WVCR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Problems with dict subclassing performance

2021-08-15 Thread Tim Peters
[Marco Sulla ]
> Oh, this is enough. The sense of the phrase was very clear and you all
> have understood it.

Sincerely, I have no idea what "I pretend your immediate excuses."
means, in or out of context.

> Remarking grammatical errors is a gross violation
> of the Netiquette. I ask __immediately__ the intervent of a moderator,

I'm the only active moderator on this list, and I expect everyone has
noticed I don't exercise those powers at all except to try to keep
spam from showing up to begin with.

I had no problem at all with your original post, but, as usual,
_personally_ find almost no value in meta-posts (posts _about_ posts).

People make them anyway, and that's OK by me. They eventually burn
themselves out.

I would _like_ it most if everyone dropped this thread now, unless
they want to say something about the original topic (dict subclass
performance). But, towards that end, I'm not going to threaten anyone
with any moderation action.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/T5Z35KJLY3RQGS2HXYHLTR3QFURO34RF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Windows buildbots may be broken

2021-08-09 Thread Tim Peters
Sorry, all! This post was pure spam - I clicked the wrong button on
the moderator UI. The list has already been set to auto-reject any
future posts from this member.

On Mon, Aug 9, 2021 at 10:51 AM ridhimaortiz--- via Python-Dev
 wrote:
>
> It is really nice post. https://bit.ly/3fsxwwl
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/WMJOVXKRH2ILPADU5QMLIVEIXHWPBOLZ/
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/INKI3S6ZZTQDVXZ3Y5BIJJO6QMGQJ5EK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: [slightly OT] cryptographically strong random.SystemRandom()

2021-07-09 Thread Tim Peters
[Ethan Furman]
> A question [1] has arisen about the viability of `random.SystemRandom` in
> Pythons before and after the secrets module was introduced
> (3.5 I think) -- specifically
>
>  does it give independent and uniform discrete distribution for
>  cryptographic purposes across CPython 3.x versions?
>
> ,,,
> [1] https://stackoverflow.com/q/68319071/208880

`secrets` is just a wrapper around `random.SystemRandom`, so the
presence or absence of `secrets` doesn't matter.
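
For instance, both of these draw on the same OS source (os.urandom())
under the covers:

    import random, secrets

    sr = random.SystemRandom()
    sr.randrange(10**6)         # OS entropy via os.urandom()
    secrets.randbelow(10**6)    # same underlying source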

As to SystemRandom, all answers depend on the quality of the platform
os.urandom(), which Python has no control over. See my answer here,
and the comments on it:

https://stackoverflow.com/questions/20936993/how-can-i-create-a-random-number-that-is-cryptographically-secure-in-python/20937265#20937265
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7TZWAZCNFC4ZMRQENWQ5DDIFU6QPGN4Z/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Why list.sort() uses mergesort and not timsort?

2021-06-07 Thread Tim Peters
[Dan Stromberg ]
> ...
> Timsort added the innovation of making mergesort in-place, plus a little
> (though already common) O(n^2) sorting for small sublists.

Actually, both were already very common in mergesorts. "timsort" is
much more a work of engineering than of insight ;-) That is, it
combined many pre-existing ideas, but pushed them hard, and got them
to work smoothly together.

The most novel part is "galloping" to vastly cut the number of
comparisons needed in many real-world cases of partially ordered
inputs. I took that idea from (as briefly explained in listsort.txt)
papers seeking to speed real-life unions and intersections of sorted
lists, presumably in database implementations.  I later discovered
that the idea had already been applied to a mergesort, as noted in a
paper by McIlroy (cited in listsort.txt) - but the paper didn't give
any details, and best I can tell he never published his code.
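
A toy sketch of the underlying idea - plain exponential search
(timsort's real gallop_left/gallop_right, described in listsort.txt,
are more elaborate):

    from bisect import bisect_left

    def exponential_search(a, key):
        # Leftmost insertion point for key in sorted list a: probe
        # indices 1, 2, 4, 8, ... until key is bracketed, then binary
        # search just the bracketed slice.  O(log k) comparisons when
        # the answer is at index k - the payoff when matches cluster.
        if not a or a[0] >= key:
            return 0
        bound = 1
        while bound < len(a) and a[bound] < key:
            bound *= 2
        return bisect_left(a, key, bound // 2, min(bound, len(a)))

    assert exponential_search([1, 3, 5, 7, 9, 11], 8) == 4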

Without that, timsort wouldn't have been worth the bother of writing -
it was generally no faster than Python's prior sort implementation on
randomly-ordered inputs, and is quite a large pile of code.  But many
cases of real-world inputs do have significant pre-existing order of
some kind, where even a "perfect" quicksort (always picks _the_ median
as the partition pivot) remains O(n log n) but timsort O(n).

Curiously, though, it was the only implementation ever tried of a
mergesort-based algorithm that wasn't notably _slower_ on
randomly-ordered inputs than Python's prior "samplesort" algorithm
(like quicksort but with a very high-quality guess for the median
element based on recursively sample-sorting a largish random sample).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y22TRSA5I4JCNC6GRBX7QZYF4MLGPCYK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Anyone else gotten bizarre personal replies to mailing list posts?

2021-05-04 Thread Tim Peters
FYI, I just force-unsubscribed this member (Hoi Lam Poon) from
python-dev.  Normally I don't do things like that, since, e.g, we have
no way to know whether the sender address was spoofed in emails we
get.  But in this case Hoi's name has come up several times as the
sender of individual spam, and if Hoi is actually a legit member they
should have noticed and spoken up.

Of course this can't stop them from spamming. Just intended to
increase the effort for them, if they're determined to continue.

[nothing new below]

On Tue, May 4, 2021 at 11:34 AM Jonathan Goble  wrote:
>
> I just got one now from the same person on the dependabot thread. Entirely in 
> Chinese except for "GNU license" in the middle, along with an attached jpeg 
> of a random SciPy-related tweet (according to Gmail's preview thumbnail; I 
> didn't actually open the attachment for obvious reasons).
>
> On Sun, May 2, 2021, 5:08 AM Jeff Allen  wrote:
>>
>> Yes, I got one from the same address today. Thanks for pointing out these 
>> are individual peformances: it was annoying when I thought it was spam to 
>> the list.
>>
>> Although Hoi Lam Poon is a real (female) name, it may signify a generated 
>> lampoon.
>>
>> Jeff Allen
>>
>> On 23/04/2021 16:38, Nathaniel Smith wrote:
>>
>> I just got the reply below sent directly to my personal account, and I'm 
>> confused about what's going on. If it's just a one off I'll chalk it up to 
>> random internet weirdness, but if other folks are getting these too it might 
>> be something the list admins should look into? Or... something?
>>
>> -- Forwarded message -
>> From: Hoi lam Poon 
>>
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at 
>> https://mail.python.org/archives/list/python-dev@python.org/message/3EA53I3Y72ACEPDVG467NMNTXHRL3NXL/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/JNYRFZ4FTIFUJFS7FPNMKI2HXW7SLL52/
> Code of Conduct: http://python.org/psf/codeofconduct/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LFULZM6GLDUAS5SFQ4F454QMRGEH3EGH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: On the migration from master to main

2021-05-03 Thread Tim Peters
I'm guessing it's time to fiddle local CPython clones to account for
master->main renaming now?

If so, I've seen two blobs of instructions, which are very similar but
not identical:

Blob 1 ("origin"):

"""
You just need to update your local clone after the branch name changes.
From the local clone of the repository on a computer,
run the following commands to update the name of the default branch.

$ git branch -m master main
$ git fetch origin
$ git branch -u origin/main main

Apart from that, you should update any local script or command that uses
the name "master" to use the name "main".
"""

Blob 2 ("upstream"):

"""
The CPython repository's default branch was renamed from ``master`` to ``main``
after the Python 3.10b1 release. If you had cloned the repository before this
change, you can rename your local branch as follows::

git branch -m master main
git fetch upstream
git branch -u upstream/main main
"""

From my dim understanding, "upstream" makes more sense, but I don't
know. On my box:

"""
$ git remote -v
origin  https://github.com/tim-one/cpython (fetch)
origin  https://github.com/tim-one/cpython (push)
upstreamhttps://github.com/python/cpython (fetch)
upstreamhttps://github.com/python/cpython (push)
"""

so syncing with the universally shared "upstream" just makes more sense to me.

Right? Wrong? Do we need some mix of both? Something else?

I'd rather get it right the first time than try to untangle a mess
after the fact ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/X3V42DYOHFLGAUPODFLXMYZWDE3VY47W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PyGC and PyObject_Malloc introspection

2021-01-14 Thread Tim Peters
[Julien Danjou]
> ...
> Supposedly PyObject_Malloc() returns some memory space to store a
> PyObject. If that was true all the time, that would allow anyone to
> introspect the allocated memory and understand why it's being used.
>
> Unfortunately, this is not the case. Objects whose types are tracked by
> the GC go through _PyObject_GC_Alloc() which changes the underlying
> memory structure to be (PyGC_HEAD + PyObject).
>
> This is a bummer as there then no safe way that I can think of to know
> if an allocated memory space is gc-tracked or gc-untracked. It makes it
> therefore impossible to introspect the memory allocated by
> PyObject_Malloc().

I'm not clear on exactly what it is you're after, but CPython faces
the same question all the time: _given_ a pointer to an object, is
there, or is there not, a GC header prepended?  That's answered by
this C API function:

"""
int PyObject_IS_GC(PyObject *obj)

Returns non-zero if the object implements the garbage collector
protocol, otherwise returns 0.

The object cannot be tracked by the garbage collector if this function
returns 0.
"""

FYI, the implementation usually resolves it by looking at whether
obj's type object has the Py_TPFLAGS_HAVE_GC flag set.
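
For anyone poking at this from Python rather than C, a rough analogue
(a sketch of mine, not an official API) is to test the same type flag
via type.__flags__:

```python
import gc

# Py_TPFLAGS_HAVE_GC is bit 14 of tp_flags, which type.__flags__ exposes.
Py_TPFLAGS_HAVE_GC = 1 << 14

def type_supports_gc(obj):
    return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)

print(type_supports_gc([]))   # True  - lists get a GC header prepended
print(type_supports_gc(42))   # False - ints never do
print(gc.is_tracked([]))      # related but different question: is this
                              # particular object currently being tracked?
```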
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LBYGKGEPB5ZAFEQGOBIICNEDRYK4PIT3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-17 Thread Tim Peters
[Tim]
> ...
> Alas, the higher preprocessing costs leave the current PR slower in "too
> many" cases too, especially when the needle is short and found early
> in the haystack. Then any preprocessing cost approaches a pure waste
> of time.

But that was this morning.  Since then, Dennis changed the PR to back
off to the current code when the needle is "too small". There are very
few cases we know of now where the PR code is slower at all than the
current code, none where it's dramatically slower, many where it's
significantly faster, and some non-contrived cases where it's
dramatically faster (for example, over a factor of 10 in
stringbench.py's "late match, 100 characters" forward-search tests,
and especially beneficial for Unicode (as opposed to bytes)).  Then
there are the pathological cases like in the original issue report,
where it's multiple orders of magnitude faster (about 3 1/2 hours to
under a tenth of a second in the issue report's case).

Still waiting for someone who thinks string search speed is critical
in their real app to give it a try.  In the absence of that, I endorse
merging this.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5K2ISKORAH327WLQ43QG5QMPWXNHMFTC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-17 Thread Tim Peters
[Tim]
>> Note that no "extra" storage is needed to exploit this. No character
>> lookups, no extra expenses in time or space of any kind.  Just "if we
>> mismatch on the k'th try, we can jump ahead k positions".

[Antoine Pitrou ]
> Ok, so that means that on a N-character haystack, it'll always do at
> least N character comparisons?
>
> IIRC, the current algorithm is able to do much less than that in
> favorable cases.  If the needle is k characters long and they are all
> distinct, it's able to do N/k comparisons.

It's an excellent question, and from what I said it's a correct inference.

But I only described how the algorithm matches the "v" part of the
"needle = u + v" splitting.  When matching the "u" part, skips
materially exceeding the count of comparisons made are possible.

The paper claims the minimal number of comparisons needed (ignoring
preprocessing) is 2N/k, so same order as the current algorithm. But,
for the constant factor, Dennis's code achieves N/k, because he
augmented the Crochemore-Perrin algorithm with a variant of Sunday's
algorithm (a simplification, but not as extreme as the current code's
simplification). Note that best case near N/k is a known lower bound
for best cases - impossible to do materially better.

In the 'x' * 100 + 'y' example I ended my msg with, recall that v was
just "y", so the part I described was of minimum value - if the last
haystack character in the current search window isn't "y", the fact that
the mismatch occurred on the first try leads to a shift of just 1.

But if the character one beyond the end of the current search window
isn't "x" or "y", then Sunday's algorithm allows us to skip 102 positions
(len(needle) + 1).  His algorithm requires precomputing a skip-count
table of O(len(sigma)) space, where sigma is the universe of possible
characters ("the alphabet").  That's "a whole lot" for Unicode.
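
A minimal Python sketch of Sunday's rule, for illustration only (this is
not the PR's code, and it builds the full per-character table whose size
is exactly the concern above):

```python
def sunday_shift_table(needle):
    # shift[c]: how far the window may slide when the character just past
    # the current window is c; if c isn't in the needle at all, the whole
    # window can jump len(needle) + 1 positions.
    m = len(needle)
    shift = {}
    for i, c in enumerate(needle):
        shift[c] = m - i              # later occurrences overwrite earlier ones
    return shift, m + 1

def sunday_find(haystack, needle):
    shift, default = sunday_shift_table(needle)
    m, n = len(needle), len(haystack)
    i = 0
    while i + m <= n:
        if haystack[i:i + m] == needle:
            return i
        if i + m == n:                # nothing past the window; no match possible
            return -1
        i += shift.get(haystack[i + m], default)
    return -1

print(sunday_find("abcdefgh" * 3 + "xyz", "xyz"))   # 24
```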

Fredrik and Dennis simplified Sunday's algorithm to weaker versions
that require relatively small constant space instead, independent of
needle, pattern, and alphabet size.

Despite the current code's simplification of that nearly to the point
of non-existence (it reduces the space needed to 32 bits, and with
only one possible skip count), it's nevertheless so effective that it
beat Dennis's initially pure Crochemore-Perrin code by dramatic margins
in a significant number of exceptionally lucky (for the current code)
cases.  After adding a variation of Sunday (the current PR uses 256
bytes, and has 64K possible skip counts - but perhaps other tradeoffs
would work better), we haven't yet seen a case (contrived or natural)
where the current code is dramatically faster.

But we have seen non-contrived cases where the current PR is
dramatically faster, including in the benchmark code Fredrik gifted us
with (Tools/stringbench/stringbench.py in your Python tree).  Alas,
the higher preprocessing costs leave the current PR slower in "too
many" cases too, especially when the needle is short and found early
in the haystack. Then any preprocessing cost approaches a pure waste
of time.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MHGNAZH7X4HS2KSWTRDVSX2AJXN7POXZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-16 Thread Tim Peters
[Tim Peters, explains one of the new algorithm's surprisingly
 effective moving parts]

[Chris Angelico ]
> Thank you, great explanation. Can this be added to the source code
> if/when this algorithm gets implemented?

No ;-)  While I enjoy trying to make hard things clear(er), I need to
understand them inside out myself first, and I'm still not there with
this algorithm.

Even of what I do grok now, I cheated some to make that post simple.
For example, if you read it carefully, you'll realize that the
explanation I gave doesn't actually explain why the "x" * 100 example
is correct: the explanation implicitly relied on that the left half
the splitting (u) is at least as long as the right half (v).  But
that's not always the case (and, in particular, for "x" * 100, u is
empty!).

Tal Einat posted earlier that he was keen to try to work up a clear
explanation, and I look forward to that. All the expositions I've
found of this algorithm so far are hard to approach.  The original
paper is still the best I've seen.  Everything I wrote was but a gloss
on its Lemma 3.2.

> I've learned all kinds of things from the huge block comments
> about timsort, list growth, and the like. They make great reading
> when you're trying to figure out how to explain things (or if
> you're a bored nerd browsing for curiosity's sake - that works too!).

Takes one to know one ;-)  There's beauty and elegance in this
algorithm, but they'll go unloved without explanations clear enough
for the non-obsessed to follow.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HMSOQD22CLHLZ4VEWV6FPRA2RC3TM75H/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-16 Thread Tim Peters
I don't plan on making a series of these posts, just this one, to give
people _some_ insight into why the new algorithm gets systematic
benefits the current algorithm can't.  It splits the needle into two
pieces, u and v, very carefully selected by subtle linear-time needle
preprocessing (and it's quite a chore to prove that a suitable
splitting always exists!):

needle == u + v

There are a whole bunch of things that can be said about this
splitting (and must be said to prove worst-case linear time), but
here's just one for which it's easy to show the benefit:  no
(non-empty) prefix of v is a suffix of u.

Now I think it's quite easy to twist your head into knots trying to
see what follows from that, so I'll just give a highly generalizable
concrete example.  Suppose v starts with "xyza".  Then u cannot end
with "x", "xy", "xyz", or "xyza". If u did, then u would have a suffix
that's also a prefix of v.

A match attempt at a given haystack position starts by matching the
characters in v one at a time, left to right. Here with haystack on
the first line, needle on the second, and "0" denoting "don't care /
unknown / doesn't exist" - it's just _some_ glyph so the example has
some chance of lining up under annoying proportional fonts :-(

Say the first character matches:

0x00
0xyzavvv

And the second and third:

0xyz
0xyzavvv

But the 4th doesn't:

0xyzZ000
0xyzavvv

Is there any possibility of a match if we shift the needle one to the
right? No! If we tried, the last character of u would line up with the
"x" in the haystack, and we already know "x" is not a suffix of u.

How about if we shifted two to the right? No again! And for the same
reason: then the last two characters of u would line up with "xy" in
the haystack, but we already know "xy" is not a suffix of u.

And so on.  In general, if we find a mismatch while trying the k'th
(1-based) character of v, we can shift the needle k places to the
right before there's any possibility of matching.  In this case, k=4,
so we can jump to:

0xyzA000
0xyzavvv

Note that no "extra" storage is needed to exploit this. No character
lookups, no extra expenses in time or space of any kind.  Just "if we
mismatch on the k'th try, we can jump ahead k positions".  Nothing
special about "xyza" either - this works with any characters at all!

"xyza" isn't by any stretch a hard case regardless.  But it goes
_some_ way toward hinting at why potentially hard cases are rendered
toothless too.  For example, the needle "x" * 100 is split into

u = "" # yup, empty!
v = needle

Now, e.g., if we find a mismatch at the 83'rd character of v, we can
leap ahead 83 positions immediately.

What about needle "x" * 100 + "y"?  I picked that to crush your
premature optimism ;-) Now the splitting is

u = "x" * 100
v = "y"

and the wonderful trick above only applies to a trivial 1-character
string. The "hard part" is all in matching the u piece now, which I'll
leave as an exercise for the reader ;-)
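
For the curious, the v-matching loop is tiny in toy Python form (a sketch
only - the linear-time preprocessing that picks u and v, and the matching
of the u piece, are the parts left out):

```python
def try_v_then_shift(haystack, window_start, u, v):
    # Scan v left to right at its place in the window.  On a mismatch at the
    # k'th (1-based) character of v, it's safe to slide the whole window k
    # places, because no non-empty prefix of v is a suffix of u.
    pos = window_start + len(u)
    for k, ch in enumerate(v, start=1):
        if pos + k - 1 >= len(haystack) or haystack[pos + k - 1] != ch:
            return False, k           # mismatch: shift the window k places
    return True, None                 # v matched; now go match u (not shown)

# The example above: mismatch on the 4th character of v => shift 4.
print(try_v_then_shift("0xyzZ000", 0, "0", "xyzavvv"))   # (False, 4)
```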
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HPMYTQSJ4IXUYUCZE2EYCIF34RTQDW7O/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-16 Thread Tim Peters
[Marco Sulla]
> Excuse me if I intrude in an algorithm that I have not understood, but
> the new optimization can be applied to regexps too?

The algorithm is limited to searching for fixed strings.

However, _part_ of our regexp implementation (the bit that looks ahead
for a fixed string) will inherit it by magic.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Z4R6VUAOXMH2D5NDGWZX52BLU6F7XNXE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-15 Thread Tim Peters
[Guido]
> I am not able to dream up any hard cases -- like other posters,
> my own use of substring search is usually looking for a short
> string in a relatively short piece of text. I doubt even the current
> optimizations matter to my uses.

I should  have responded to this part differently.   What I said was
fine ;-) , but it's a mistake to focus exclusively on pathologies
here.  What Fredrik did was with the aim of significantly speeding
utterly ordinary searches.  For example, search for "xyz" in
"abcdefgh...xyz".

Brute force starts comparing at the first location:

abcdefgh...
xyz

The current code compares "c" with "z" first.  They don't match.  Now
what?  It looks at the _next_ character in the haystack, "d".  Thanks
to preprocessing the needle, it knows that "d" appears nowhere in the
needle (actually, the code is so keen on "tiny constant extra space"
that preprocessing only saves enough info to get a _probabilistic_
guess about whether "d" is in the needle, but one that's always
correct when "not in the needle" is its guess).
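
In toy Python form, that guess is roughly (a sketch only; the real code
squeezes it into a single small bitmask, which is where the false
positives - but never false negatives - come from):

```python
def char_mask(needle, width=64):
    # One bit per (character code modulo width).  Distinct characters can
    # collide on a bit, so "bit set" only means "maybe in the needle".
    mask = 0
    for ch in needle:
        mask |= 1 << (ord(ch) % width)
    return mask

def possibly_in_needle(mask, ch, width=64):
    return bool(mask & (1 << (ord(ch) % width)))

mask = char_mask("xyz")
print(possibly_in_needle(mask, "d"))   # False: safe to skip past this position
print(possibly_in_needle(mask, "x"))   # True: might be in the needle
```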

For that reason, there's no possible match when starting the attempt
at _any_ position that leaves "d" overlapping with the needle.  So it
can immediately skip even trying starting at text indices 1, 2, or 3.
It can jump immediately to try next at index 4:

abcdefgh...
0123xyz

Then the same kind of thing, seeing that "g" and "z" don't match, and
that "h" isn't in the needle.  And so on, jumping 4 positions at a
time until finally hitting "xyz" at the end of the haystack.

The new code gets similar kinds of benefits in _many_ similarly
ordinary searches too, but they're harder to explain because they're
not based on intuitive tricks explicitly designed to speed ordinary
cases.  They're more happy consequences of making pathological cases
impossible.

From randomized tests so far, it's already clear that the new code is
finding more of this nature to exploit in non-pathological cases than
the current code.  Although that's partly (but only partly) a
consequence of Dennis augmenting the new algorithm with a more
powerful version of the specific trick explained above (which is an
extreme simplification of Daniel Sunday's algorithm, which was also
aimed at speeding ordinary searches).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y3DFFBHNMHDGRE2GIEMH7XLY5YR6BMKR/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-15 Thread Tim Peters
[Dennis Sweeney ]
> Here's my attempt at some heuristic motivation:

Thanks, Dennis!  It helps.  One gloss:

> 
> The key insight though is that the worst strings are still
> "periodic enough", and if we have two different patterns going on,
> then we can intentionally split them apart.

The amazing (to me) thing is that splitting into JUST two parts is
always enough to guarantee linearity.  What if there are a million
different patterns going on?  Doesn't matter!  I assume this
remarkable outcome is a consequence of the Critical Factorization
Theorem.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IKPKLR43U522PC55JB7GQZMQSGJHNCKF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido]
> The key seems to be:

Except none of that quoted text (which I'll skip repeating) gives the
slightest clue as to _why_ it may be an improvement.  So you split the
needle into two pieces.  So what?  What's the _point_?  Why would
someone even imagine that might help?

Why is one half then searched right to left, but the other half left
to right?  There's actually "a reason" for searching the right half
left to right, but because the shift on a mismatch in the left half is
a constant ("per(x)") independent of the mismatching position, it's
actually irrelevant in which order the left half's characters are
compared (well, at least in some variations of these newer
algorithms).  Over-specification can be actively misleading too.

What's the key idea?  "Split the needle into two pieces" is a key
_step_, not a key _insight_.


> I need a Python version though.

Dennis wrote one for his first stab, which you can download from the
bug report (it's attached there as fastsearch.py).

On the bug report's data, running that under CPython 3.9.0 on my box
reduced the time from about 3 1/2 hours to about 20 seconds.  Running
it under PyPy, to under one second (and the current C implementation
under 0.1 second).

> I am not able to dream up any hard cases -- like other posters, my own
> use of substring search is usually looking for a short string in a relatively
> short piece of text. I doubt even the current optimizations matter to my uses.

They probably do, but not dramatically.  Would you, e.g., notice a
factor of 2 either way in those cases?  A factor of 4?  Of 10?  When
Fredrik first wrote this code, he routinely saw factors of 2 to 10
speedup on all sorts of string-searching benchmarks, both contrived
and "natural". The speedups were especially pronounced for Unicode
strings at the time, where skipping futile Unicode character
comparisons could be more valuable than when skipping (shorter)
single-byte comparisons.

Should also note that the fixed string searching code is also used as
a subroutine by parts of our regexp implementation, by str.count(),
str.replace(), similar `bytes` operations, and so on.

> What are typical hard cases used for?

It's kinda like asking what typical hard rounding cases for pow() are
used for ;-) They aren't deliberate. They "just happen", typically
when a pattern and the text to search contain self-similarities "but
with glitches".   Search the string

   text =  "X" * 10_000_000

for

needle = "X" * 1_000_000

Easy!  It matches at once.  But now tack "Y" on to the end of the
needle.  Then it's impossible to match.  Brute force search first
finds a prefix match at text[0 : 1_000_000] but then fails to match the
trailing "Y".  So brute force uses another million compares to match
the prefix at text[1 : 1_000_001]. But again fails to match the trailing
"Y".  Then another million plus 1 compares to fail to match starting
at index 2, and again at index 3, and again at index 4, ... it
approaches a trillion comparisons before it finally fails.

The current code starts comparing at the right end of each trial
position first. Then an "X" from the text and the needle's trailing
"Y" mismatch at once.  That's a million times faster.

Here's a simple case where the current code is horribly slow, because
neither "right-to-left" nor any of its preprocessing tricks do any
good at all.  Even Daniel Sunday's full-blown algorithm[1] (which the
current code strips down _almost_ to the point of non-existence)
wouldn't help (unless it used a variant I've never actually seen that
compared the rarest pattern character first):

("X" * 10_000_000).find("X" * 500_000 + "Y" + "X" * 500_000)

The newer code returns -1 seemingly instantly.
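
If you'd like to feel the difference yourself, a throwaway harness like
this is enough (nothing official - absolute times depend entirely on your
Python build and machine):

```python
import time

text = "X" * 10_000_000
needles = ("X" * 1_000_000 + "Y",
           "X" * 500_000 + "Y" + "X" * 500_000)
for needle in needles:
    start = time.perf_counter()
    result = text.find(needle)                       # both searches fail (-1)
    print(result, round(time.perf_counter() - start, 3), "seconds")
```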

> DNA search? (That would be cool!)

It can certainly help. Self-similarities are bound to be common in
long strings from  a 4-character alphabet (ACGT). But, to be fair,
serious work with genomes typically invests in building a giant suffix
tree (or enhanced suffix array) for each cataloged sequence to be
searched. No preprocessing of needles is needed then to guarantee
worst-case efficient searches of many kinds (including things like
finding the longest adjacent repeated subsequences, more like a regexp
(.+)\1 kind of search).

But the new code could quite plausibly speed more casual prototyping
of such explorations.

[1] https://dl.acm.org/doi/abs/10.1145/79173.79184
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XUP6S2IVJAW5K2NSHZ7UOKN5YQFNUWVQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Steven D'Aprano ]
> Perhaps this is a silly suggestion, but could we offer this as an
> external function in the stdlib rather than a string method?
>
> Leave it up to the user to decide whether or not their data best suits
> the find method or the new search function. It sounds like we can offer
> some rough heuristics, but the only way to really know is "try it and
> see which works best for you".
>
> The `string` module is an obvious place for it.

I think this is premature. There is almost never an optimization
that's a pure win in all cases. For example, on some platforms
`timsort` will never be as fast as the old samplesort in cases with a
very large number of equal elements, and on all platforms `timsort`
consumes more memory.  And it's a wash on "random" data on most
platforms. Nevertheless, it was a significant speed win for many -
possibly even most - real-life cases.

So far, the PR reduces the runtime in the bug report from about 3 1/2
hours to under a tenth of a second. It would be crazy not to gift
users such dramatic relief by default, unless there are good reasons
not to. Too soon to say. On tests of random strings with characters
following a Zipf distribution (more realistic than uniform but still
very easy to generate - and not contrived in any way to favor either
approach), the current new code is usually faster than the status quo,
in no case has been worse than twice as slow, and in a number of cases
has been 20x faster.  It's already clearly a win _overall_.

The bulk of the "but it's slower" cases now are found with very short
needles (patterns), which was expected (read my comments on the bug
report), and exacerbated by that the structure of the random test
generation is quite likely to create cases where a short needle is
found early in the haystack. This combination makes the higher
preprocessing overhead as damaging as possible. Dennis also expanded
the current code's "32-bit bitmask" trick (which is an extreme
simplification of Daniel Sunday's algorithm) to a fixed-size 256-byte
table of character-class-specific skip counts, which went a _very_
long way toward eliminating the cases where the current code enjoyed a
"pure luck" advantage over the new code. But that increased the
preprocessing overhead again, which again is relatively most
significant for short needles - those 256 bytes have to be initialized
every time, no matter the lengths of the needle or haystack.

If the new code were changed to exempt short needles, even now it
might be hard to make an objective case not to adopt it.

But it would leave open a different idea for an "opt-in" facility: one
that allowed you to give a needle to a function and get back an object
capturing the results of preprocessing.  Then a different wrapper
around the search code that accepted the already-pre-processed info.
There are, e.g., certainly cases where repeated searches for a given
4-character needle could be sped significantly by exploiting the new
code, but where the overhead of preprocessing is too much to bear in
"random" performance testing. It would also open the door to more
aggressive (expensive) kinds of preprocessing.  That's where users may
be able to make useful, informed judgments.
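
The general shape of that idea is familiar from the `re` module, which is
used below only to make the sketch runnable - nothing like this exists
today for plain str.find:

```python
import re

# Pay the (possibly expensive) preprocessing cost once...
prepared = re.compile(re.escape("ACGT"))

# ...then reuse the preprocessed needle across many haystacks.
for haystack in ("TTACGTTT", "GGGG", "ACGTACGT"):
    m = prepared.search(haystack)
    print(m.start() if m else -1)        # 2, -1, 0
```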

[David Mertz]
> That said, I do recognize that the `re` module also has pathological cases,
> and the standard library documentation explicitly says "maybe you want to
> try the third-party `regex` implementation."  That is sort of precedent for
> this approach.

`regex` also suffers exponential time disasters, they're just harder
to stumble into - and there are books explaining in detail how and why
regex engines can fall into these traps.

It's far less _expected_ in plain string searches, and Dennis was
aware of the new algorithms because, apparently, (at least) glibc's
memmem switched to one some years ago. So we're playing catch-up here.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ECPXBBF7OUNYLDURCUKYXIOTGPGVBMHX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
[Guido]
> Maybe someone reading this can finish the Wikipedia page on
> Two-Way Search? The code example trails off with a function with
> some incomprehensible remarks and then a TODO..

Yes, the Wikipedia page is worse than useless in its current state,
although some of the references it lists are helpful.  This is a much
better summary:

http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260

but, I believe, still far too telegraphic.

The original paper is by far the best account I've seen so far, with
complete code and exhaustive explanations and proofs.  Even examples
;-)

But here's the thing: I don't believe this algorithm _can_ be
reduced to an elevator pitch.  Excruciating details appear to be
fundamental at every step, and I've seen nothing yet anywhere
approaching an "intuitive explanation" :-(  What happens instead is
that you run it on the hardest cases you can dream up, and your jaw
drops in amazement :-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B2MPSYXRKFA3V3GP45GAFSIKKCK5NHJ3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Changing Python's string search algorithms

2020-10-14 Thread Tim Peters
Rest assured that Dennis is aware of that pragmatics may change for
shorter needles.

The code has always made a special-case of 1-character needles,
because it's impossible "even in theory" to improve over
straightforward brute force search then.

Say the length of the text to search is `t`, and the length of the
pattern `p`. Brute force and the current code have worst case O(t*p)
behavior. The new code, worst case O(t+p). If that's as deep as you
dig into it, it seems all but obvious then that O(t*p) just can't be
that bad when p is small, so keep it simple.

But there's another twist:  the current and new code both have O(t/p)
cases too (but brute force, and even Knuth-Morris-Pratt, don't). That
can be highly beneficial even for p as small as 3.
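
To put rough numbers on those bounds (just arithmetic, not a benchmark):

```python
t, p = 10_000_000, 1_000_000
print(f"{t * p:,}")    # 10,000,000,000,000 - worst-case comparisons, O(t*p)
print(f"{t + p:,}")    # 11,000,000 - worst case for the new code, O(t+p)
print(f"{t // p:,}")   # 10 - an O(t/p) case barely looks at the haystack
```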

Unfortunately, the exact cases in which the current and new code enjoy
O(t/p) behavior aren't the same.

Lying a bit:  In general the current code has just two tricks. One of
those tricks is useless (pure waste of time) if the pattern happens to
end with a repeated character pair, and is most effective if the last
character appears nowhere else in the pattern. The other trick is most
effective if the set of characters in the text has no intersection
with the set of characters in the pattern (note that the search is
certain to fail then), but useless if the set of text characters is a
subset of the set of pattern characters (as, e.g., it very often is in
real-life searches in apps with a small alphabet, like [ACGT]+ for
genomes).

But I don't know how to characterize when the new code gets maximum
benefit. It's not based on intuitive tricks.  The paper that
introduced it[1] says it's based on "a deep theorem on words
known as the Critical Factorization Theorem due to Cesari, Duval,
Vincent, and Lothaire", and I still don't fully understand why it
works.

It's clear enough, though, that the new code gets real systematic
value out of patterns with repetitions (like "abab"), where the
current code gets real value from those only by accident (e.g., "a"
and "b" happen to appear rarely in the text being searched, in which
case that the pattern has repetitions is irrelevant).

But, as I said in the first message, the PR is "preliminary". There
are still worlds of tweaks that have been thought of but not yet tried
even once.

[1] search for "Two-Way String Matching" by Crochemore and Perrin.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G53VXXYWWEM26S2XKVX5W6P54R47YQTG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Changing Python's string search algorithms

2020-10-13 Thread Tim Peters
Fredrik Lundh crafted our current string search algorithms, and
they've served us very well.  They're nearly always as fast as
dumbest-possible brute force search, and sometimes much faster. This
was bought with some very cheap one-pass preprocessing of the pattern
(the substring to search _for_), to compute a single integer "skip",
and a single 32-bit bitmask, which can sometimes be used to skip over
areas where it can prove matches are doomed to fail.

Unlike most "fancier than brute" algorithms, it does not build
preprocessing tables of size proportional to the length of the
pattern, or to the size of the alphabet. The extra space is O(1), and
tiny, independent of pattern and alphabet size.

A weakness is that worst-case search time is proportional to the
product of the lengths of the inputs. This can be so for searches that
succeed as well as for those that fail.  This was recently
dramatically illustrated by this bug report:

https://bugs.python.org/issue41972

where a single large string search consumed well over 3 hours of CPU
time before it succeeded. It "should take" a fraction of a second.

While it was news to me, Dennis Sweeney was aware that other
algorithms requiring only O(1) extra space are now well known with
worst-case linear time. That's really quite remarkable :-) They do
require fancier (but still linear-time) preprocessing, so can't always
be a pure win.

So this message: string (which includes byte) search is an extremely
common operation, across all kinds of applications, so changes here
need as many eyeballs on them as they can get. What's typical? What's
important? What doesn't matter? What about reverse searches ("rfind")?
How can we quantify the acceptable region of tradeoffs?

My guess is that searches for substrings less than 50 characters long
in strings under ten thousand characters are most important for most
apps, but I just pulled that guess out of my butt. I'm also guessing
that searches for multi-million-character substrings (like in the bug
report above) are rare indeed, and that any change sparing them from
quadratic-time disaster would be "way more than good enough".

But I don't know. If you can help, Dennis already has a preliminary PR
implementing one of the newer algorithms, augmented with Fredrik's
unique ideas:

https://github.com/python/cpython/pull/22679

By "help" I don't necessarily mean code review - it could, e.g., be a
huge help to test-drive it on your own critical string-slinging apps.
Faster? Slower? A wash? Wrong answer? Crash?

Due to the natures of the old and new algorithms, neither will be
faster or slower in all cases, the newer one should never be
dramatically slower than the old one, the new one will be
spectacularly faster in some cases, and "in general" it seems
impossible to guess in advance. It depends on the value the fancier
preprocessing can extract from the pattern versus the higher
preprocessing cost, and that in turn depends on details of exactly
what the patterns and texts to be searched are.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ECAZN35JCEE67ZVYHALRXDP4FILGR53Y/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Deferred, coalescing, and other very recent reference counting optimization

2020-09-03 Thread Tim Peters
I'm surprised nobody has mentioned this:  there are no "unboxed" types
in CPython - in effect, every object user code creates is allocated
from the heap. Even, e.g., integers and floats. So even non-contrived
code can create garbage at a ferocious rate. For example, think about
this simple function, which naively computes the Euclidean distance
between two n-vectors:

```python
def dist(xs, ys):
from math import sqrt
if len(xs) != len(ys):
raise ValueError("inputs must have same length")
return sqrt(sum((x - y)**2 for x, y in zip(xs, ys)))
```

In general, `len(xs)` and `len(ys)` each create a new integer object,
which both become trash the instant `!=` completes.  Say the common
length is `n`.

`zip` then creates `n` 2-tuple objects, each of which lives only long
enough to be unpacked into `x` and `y`.  The result of `x-y` lives
only long enough to be squared, and the result of that lives only long
enough to be added into `sum()`'s internal accumulator.  Finally, the
grand total lives only long enough to extract its square root.

With "immediate" reclamation of garbage via refcounting, memory use is
trivial regardless of how large `n` is, as CPython reuses the same heap
space over & over & over, ASAP.  The space for each 2-tuple is
reclaimed before `x-y` is computed, the space for that is reclaimed
when the square completes, and the space for the square is reclaimed
right after `sum()` folds it in.  It's memory-efficient and
cache-friendly "in the small".
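
A tiny way to watch that reuse happen - CPython-specific, and in no way
guaranteed:

```python
# Each iteration's temporary float dies before the next one is created, so
# CPython typically hands back the same couple of heap slots over and over.
xs = [float(i) for i in range(1000)]
ys = [float(i) * 1.25 for i in range(1000)]
slots = {id((x - y) ** 2) for x, y in zip(xs, ys)}
print(len(slots))   # usually a very small number, nowhere near 1000
```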

Of course that's _assuming_, e.g., that `(x-y).__pow__(2)` doesn't
save away its arguments somewhere that outlives the method call, but
the compiler has no way to know whether it does.  The only thing it
can assume about the element types is that they support the methods
the code invokes.  In fact, the compiler has no idea whether the
result type of `x-y` is even the same as the type of `x` or of `y`.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NU4T5TFPDVYDCR5ADY6KKJ6USWVFD3TZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622 version 2 (Structural Pattern Matching)

2020-07-08 Thread Tim Peters
One microscopic point:

[Guido]
> ...
> (if `.x` is unacceptable, it’s unclear why `^x` would be any
> better),

As Python's self-appointed spokesperson for the elderly, there's one
very clear difference:  a leading "." is - literally - one microscopic
point, all but invisible.  A leading caret is far easier to see, on a
variety of devices and using a variety of fonts.  Indeed, I missed the
leading dot in ".x" in your email the first two times I read that
sentence.

But a caret is harder to type.  So here's an off-the-wall idea:  use
an ellipsis. If you're still using a maximal-munch lexer, ellipsis
followed by an identifier is currently a syntax error.  "...x" is far
easier to see than ".x', easier to type than "^x", and retains the
mnemonic connection that "something is a named load pattern if and
only if it has dots".

"..." is also a mnemonic for "OK, here I want to match ... umm ... let
me think ... I know!  A fixed value." ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TPTECZKGQY53L7ZKIQOVOXOTMQE6A447/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Recent PEP-8 change

2020-07-02 Thread Tim Peters
[Paul Moore ]
> (This is a genuine question, and I'm terrified of being yelled at for
> asking it, which gives an idea of the way this thread has gone - but I
> genuinely do want to know, to try to improve my own writing).
>
> What *is* the correct inclusive way to refer to an unidentified person
> in a technical document, without sacrificing clarity by using
> convoluted circumlocutions like "he/her/they" or over-use of the
> passive voice? My impression is that commonly accepted language rules
> and usage are lagging behind, and there's no good answer to this
> question yet :-(

I've used singular "they" for years; if someone truly objects to
that, they've kept it to themself.

"Authorities" are moving in that direction too; e.g.,

https://apastyle.apa.org/style-grammar-guidelines/grammar/singular-they

"""
The singular “they” is a generic third-person singular pronoun in
English. Use of the singular “they” is endorsed as part of APA Style
because it is inclusive of all people and helps writers avoid making
assumptions about gender. Although usage of the singular “they” was
once discouraged in academic writing, many advocacy groups and
publishers have accepted and endorsed it, including Merriam-Webster’s
Dictionary.
"""

Then again, we're talking about humans.  There's nothing you can do -
or refrain from doing - that won't mortally offend someone :-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4MOBYZJUEMUKKI2GELZNXTRMT7QPRGR3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Please refrain from posting on the PEP 8 threads for 24 hours

2020-06-30 Thread Tim Peters
[Victor Stinner ]
> If someone continues to feed the PEP 8 discussion, would it be
> possible to change their account to require moderation for 1 day or
> maybe even up to 1 week? I know that Mailman 3 makes it possible.

I see no such capability.  I could, for example, manually fiddle
things so that messages from your email address are held for
moderation, but that would persist until I manually undid it again.

But there seem to be many things Mailman 3 is theoretically capable of
that aren't available via the list owner web UI. That's the only admin
access I have (or want).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FSIQ36ZYGUMPCYXUOABBV6TCDVU2H467/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Please refrain from posting on the PEP 8 threads for 24 hours

2020-06-30 Thread Tim Peters
[Brett Cannon  wrote:]
> Regardless of what side you fall on, I think we can agree that
> emotions are running very high at the moment. Nothing is going
> to change in at least the next 24 hours, so I am personally
> asking folks to step back for at least that long and think about:
>
> Is what you want to say going to contribute to the discussion?
> Is it being phrased in a polite, constructive fashion?
>
> If after 24 hours of reflection you feel your email still meets these
> criteria then I would say it's reasonable to send that email.

As python-dev's primary moderator, I endorse and second Brett's request.

No, I'm not slapping blanket moderation on the list unless forced to.
This is just a sensible adult-to-adult request to cool down and
restore at least a shallow illusion;-) of mutual civility.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HB22ZQECC5DJPIOFXHHDUAMGD6YN6WFO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Lists placed into Emergency Moderation status

2020-06-30 Thread Tim Peters
 [Ernest W. Durbin III ]
>>> Reviewing, I may have misinterpreted the message from PSF Executive
>>> Director regarding the situation.
>>>
>>> It does appear that python-ideas moderators contacted postmaster@.
>>> Appears I misread a message saying “it looks like it’s happening on
>>> python-dev too” to mean that the request was for both lists.

[Tim Peters]
>> It depends on who you want to annoy least ;-)
>>
>> If it's the position of the PSF that some kind(s) of messages must be
>> suppressed, then I'll need a more-or-less clear definition, from them,
>> of what those consist of.  At which point I'll agree to enforce them,
>> or step down as a python-dev moderator.

[Thomas Wouters ]
> Just to clarify: this had nothing to do with the PSF, but with the
> Steering Council

My understanding is that the "PSF Executive Director" Ernest cited has
nothing to do with the Steering Council, but with the PSF ;-)  In any
case, that's why I said "if it's the position of the PSF ...".

> and the python-ideas moderators. We were discussing the PEP-8 "controversy"
> last night, after Titus (one of the python-ideas moderators) asked about 
> putting *that* list in full moderation, and we talked about how the
> discussion had bled
> over into python-dev. We weren't clear among ourselves which lists we were
> talking about anymore when we asked Ernest

So it wasn't the PSF Executive Director.  Regardless, same thing in
the end to me:  if someone who claims authority to do this wants to
enforce some content suppression rules on python-dev, that's a
business I'm very reluctant to get into, but can't judge without a
clearer definition of what's out of bounds.

I do OK with "spam" and "wholly off topic", but Python people telling
each other their attitudes, beliefs, and/or behaviors are disgusting,
unacceptable, evil ... strikes me as being a healthy (if unpleasant)
family fight.  I won't take a side on that in the moderation role.

> to see about emergency moderation mode (which wasn't as easy as we
> would've expected),

To judge from the change I undid myself for python-dev, it appeared to
amount to no more than putting ".*" in a regexp field to match all
Subject lines. (To be clear, I undid this for python-dev; Ernest did
not, so blame me - as the night wore on, I wanted to kill the change
before I went to sleep, so people posting overnight (from my POV)
wouldn't have to wait hours & hours for posts to go through.)

> which is why Ernest did it for both lists. We should've been more careful
> about this, and looped in the python-dev moderators straight away. Sorry
> about that.

Not a real problem! Just another minor computer-related busy-work
annoyance, in a life overflowing with such for decade after decade
after decade ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AAXZP3GD6PWW36FNQLHUD545NM7W6MX4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Lists placed into Emergency Moderation status

2020-06-29 Thread Tim Peters
[Ernest W. Durbin III ]
> Reviewing, I may have misinterpreted the message from PSF Executive
> Director regarding the situation.
>
> It does appear that python-ideas moderators contacted postmaster@.
> Appears I misread a message saying “it looks like it’s happening on
> python-dev too” to mean that the request was for both lists.
>
> Should I disable moderation on python-dev?

It depends on who you want to annoy least ;-)

If it's the position of the PSF that some kind(s) of messages must be
suppressed, then I'll need a more-or-less clear definition, from them,
of what those consist of.  At which point I'll agree to enforce them,
or step down as a python-dev moderator.

Otherwise, as I said, I've seen nothing on python-dev so far that I
would have rejected.  So all "emergency moderation status" is doing
for python-dev so far is annoying me with more email to go approve
messages waiting in the queue, and to annoy those messages' senders
with delays (waiting for moderator action).

So it's not, in reality, accomplishing anything of value here.  So, if
I were you, I'd disable emergency moderation status on python-dev.
But then I'm old, retired, and notoriously pig-headed ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LZVR6SLZN4LCGD4SC4BCVHKMUH6GKA6D/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Lists placed into Emergency Moderation status

2020-06-29 Thread Tim Peters
[Ernest W. Durbin III ]
> At the request of the list moderators of python-ideas and python-dev,
> both lists have been placed into emergency moderation mode. All
> new posts must be approved before landing on the list.
>
> When directed by the list moderators, this moderation will be disabled.

I do at least 99% of python-dev moderation actions, and this was news
to me.  I'm not going to reject anything Python-related based on
whether I (or anyone else) agree(s) with its political spin.  If
that's what's wanted, please contact whichever python-dev moderator(s)
did request this, and tell them the job of actually doing python-dev
moderation is all theirs until this passes.  Otherwise anything
Python-related short of pure abuse will be approved.

Seriously. I'm not going to reject sincere "you're a racist!" posts or
"I know you are, but what am I?" posts.  But I don't even know whether
that's what this is about - all I've seen is some moderately heated
discussion, none of which I would have rejected.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PYRZRPDJRVVJVICEC3N4YLR2XG2A7SEC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-25 Thread Tim Peters
[Tim]
>>>> See reply to Glenn. Can you give an example of a dotted name that is
>>>> not a constant value pattern? An example of a non-dotted name that is?
>>>> If you can't do either (and I cannot)), then that's simply what "if

[Rhodri James ]
>>> case long.chain.of.attributes:

[Tim]
>> That's a dotted name and so is a constant value pattern - read the PEP.
>>
>>  Every dotted name in a pattern is looked up using normal Python
>>  name resolution rules, and the value is used for comparison by
>>  equality with the matching expression (same as for literals).

[Rhodri]
> Then I am surprised, which is worse.  "long.chain.of.attributes" looks
> like an assignment target, and I would have expected the case to have
> been a name pattern.

As always, I don't care whether something is obvious at first glance.
I care whether something can be learned with reasonable effort, and
"sticks" _after_ it's learned.  There's essentially nothing truly
obvious about programming.

This, from the PEP, is the entire grammar for a "name pattern":

name_pattern: NAME !('.' | '(' | '=')

That's it.  A plain name not followed by a dot, left paren, or equality sign.
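
To see the distinction in running code (using syntax that is valid in
today's Python, 3.10 and later; details of the draft PEP changed before
release, but this particular rule survived):

```python
import enum

class Color(enum.Enum):
    RED = 1
    GREEN = 2

def describe(c):
    match c:
        case Color.RED:     # dotted name: looked up, compared by equality
            return "red"
        case other:         # plain name, no dot/paren/equals: binds c to `other`
            return f"not red: {other}"

print(describe(Color.RED))      # red
print(describe(Color.GREEN))    # not red: Color.GREEN
```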

While it may or may not surprise any given person at first glance,
it's very simple.  Put a fraction of the effort into learning it as
you're willing to expend on articulating surprise, and it would
already be far behind you ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/HQHVWUA7TSDOOGJTK4CO32CS3TRHEMW6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-25 Thread Tim Peters
[Rhodri James ]
> I'm seriously going to maintain that I will forget the meaning of "case
> _:" quickly and regularly,

Actually,  you won't - trust me ;-)

> just as I quickly and regularly forget to use
> "|" instead of "+" for set union.  More accurately, I will quickly and
> regularly forget that in this one place, "_" is special.

Because that's the opposite of "accurate". There's nothing special
about "_" "in this one place". It's but a single application of that
"_" is used as a wildcard in _all_ matching contexts throughout the
PEP.

And it's not even new with this PEP.  "_" is routinely used already in
lots of code to mean "the syntax requires a binding target here, but I
don't care about the binding", from

lists = [[] for _ in range(100)]

to

first, _, third = triple

The last is especially relevant, because that's already a form of destructuring.

The only thing new about this use of "_" in the PEP is that it
specifies no binding will occur. Binding does occur in the examples
above (because there's nothing AT ALL special about "_" now - it's
just a one-character identifier, and all the rest is convention,
including that the REPL uses it to store the value of the
last-displayed object).


>> See reply to Glenn. Can you give an example of a dotted name that is
>> not a constant value pattern? An example of a non-dotted name that is?
>> If you can't do either (and I cannot)), then that's simply what "if

>case long.chain.of.attributes:

That's a dotted name and so is a constant value pattern - read the PEP.

Every dotted name in a pattern is looked up using normal Python
name resolution rules, and the value is used for comparison by
equality with the matching expression (same as for literals).


> or more likely
>
>case (foo.x, foo.y)

Ditto.

> for the first.  For the second, it's a no-brainer that you can't have a
> non-dotted name as a constant value pattern, since the current constant
> value pattern mandates a leading dot.

Not so.  _Some_ dot is necessary and sufficient to identify a
constant value pattern now.  A leading dot is only _required_ in case
an intended constant value pattern would have no dots otherwise.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VDDYNQO7JOEZ2ENSHWIJAYBGXAHLBVLI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-24 Thread Tim Peters
[Taine Zhao ]
> "or" brings an intuition of the execution order of pattern matching, just
> like how people already know about "short-circuiting".
>
> "or" 's operator precedence also suggests the syntax of OR patterns.
>
> As we have "|"  as an existing operator, it seems that there might be
> cases that the precedence of "|" is not consistent with it in an
> expression. This will mislead users.
>
> You said "All reuse of symbols carries baggage", I'd say,
>
> All **inconsistent** reuse of symbols carries baggage,  but the
> consistent reuse builds good intuitive sense and shows the good
> taste of designers.

We're not talking about abstract computation here:  this is a specific
feature, and "|" is the _only_ infix operator.  The PEP considered and
rejected "&" and a unary "not", so that's the universe we're left
with.  With only one "operator", it's really hard to "mislead" ;-)

In any case, the model here is far more regular expressions than
Python int arithmetic or set unions.  "|" means essentially the same
thing in the PEP as it does in Python regexps:  try subpatterns one
a time, left to right.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LH6WUBPBAEPSOAMSB37ZWA2Z2G7X47AB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-24 Thread Tim Peters
[Ethan Furman ]
> "case _:" is easy to miss -- I missed it several times reading through the 
> PEP.

As I said, I don't care about "shallow first impressions".  I care
about how a thing hangs together _after_ climbing its learning curve -
which in this case is about a nanometer tall ;-)

You're not seriously going to maintain that you're struggling to grasp
the meaning of "case _:" now, right?

> Huh.  I would consider "case _:" to be the wart, especially since
> "case default:" or "case anything:" or "case i_dont_care:" all do
> basically the same thing (although they bind to the given name,

Having climbed the trivial learning curve, only a deliberate wise ass
would code garbage like "case i_dont_care:". I don't care about them
either. The one obvious way to do it has already been made clear to
them. You may as well, e.g., complain that there's nothing to stop a
wise ass from writing "-5+6" where everyone else writes "+1".

> while _ does not bind to anything, but of what practical importance is that?).

One obvious way to do it is of major practical importance.

> ...
> Besides which, if we use "|" instead of "or" then we can't later allow more
> general expressions.

Good!  The PEP is quite complicated enough already.  But if you want
to pursue this seriously, you're going to have your work cut for you
to explain why "|" is more sacred than "or" with respect to "more
general expressions".  If you don't want to preclude anything, then
you need to invent syntax that has no meaning at all now.

>> ".NAME" grated at first, but extends the idea that dotted names are
>> always constant value patterns to "if and only if". So it has mnemonic
>> value.

> How do you get from "." to "iff" ?

See reply to Glenn. Can you give an example of a dotted name that is
not a constant value pattern? An example of a non-dotted name that is?
If you can't do either (and I cannot), then that's simply what "if
and only if" means.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YGTDPCASYUQYCMU5PS2S5YO7DBNDYYWP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-24 Thread Tim Peters
[Tim]
>> ".NAME" grated at first, but extends the idea that dotted names are
>> always constant value patterns to "if and only if". So it has mnemonic
>> value. When context alone can't distinguish whether a name is meant as
>> (in effect) an lvalue or an rvalue, no syntax decorations can prevent
>> coding errors. Names in destructuring constructs are overwhelmingly
>> intended as lvalues, so adding extra cruft to say "no, I meant rvalue"
>> is the pragmatic choice.


[Glenn Linderman ]
> This is just a bunch of words to me, without meaning.
>
> I'd like to understand it.

Did you read the PEP?

> What do you mean by "the idea that dotted names are always constant
> value patterns"?

Under the PEP's "Constant Value Pattern"  section:

Every dotted name in a pattern is looked up using normal Python
name resolution rules, and the value is used for comparison by
equality with the matching expression (same as for literals).

That's what I meant.  "Contains a dot" implies "constant value pattern".

> What do you mean by 'extends (the above) to "if and only if" '?

Because the next sentence from the PEP:

 As a special case to avoid ambiguity with name patterns, simple
 names must be prefixed with a dot to be considered a reference:

completes turning "contains a dot" into a necessary and sufficient
("if and only if") condition for distinguishing a constant value
pattern from a name pattern.  Where "constant value pattern" and "name
pattern" are again used with the PEP's meanings.


> As a result of not understanding the above, I see no mnemonic value.

While I do.  "If I want `xyz` to be interpreted as a constant value
pattern, it needs to contain a dot: `.xyz` should do it.  If I want
`enums.HI` to be interpreted as a constant value, it already contains
a dot, so it will be."

> My understanding of the "." as proposed is that it is optional, except
> in cases where it would be ambiguous... seems like it would be better if
> it were required for one case or the other, so that there would be no
> need to determine whether or not it is ambiguous by the surrounding
> state/code/declarations.

A dot is required, when and only when you want the chunk of syntax to
be interpreted as a constant value pattern.  I said nothing at all
about "_leading_" dots, which appear to be all you have in mind there.
_Some_ dot is mandatory to make it a constant value pattern; a
_leading_ dot may or may not be required.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YEUQNW5NUAW2OGMRJN22G3JMSIB4PA4J/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: PEP 622: Structural Pattern Matching

2020-06-24 Thread Tim Peters
You got everything right the first time ;-)  The PEP is an extended
illustration of "although that way may not be obvious at first unless
you're Dutch".

I too thought "why not else:?" at first. But "case _:" covers it in
the one obvious way after grasping how general wildcard matches are.
Introducing "else:" too would be adding a wart (redundancy) just to
stop shallow-first-impression whining.

"|" is also a fine way to express alternatives. "case" has its own
sub-language with its own rules, and "|" is widely used to express
alternatives (whether in regexps, formal grammars, ...). Spell it,
e.g., "or", and then I wonder "what does short-circuiting have to do
with it?". All reuse of symbols carries baggage.

".NAME" grated at first, but extends the idea that dotted names are
always constant value patterns to "if and only if". So it has mnemonic
value. When context alone can't distinguish whether a name is meant as
(in effect) an lvalue or an rvalue, no syntax decorations can prevent
coding errors. Names in destructuring constructs are overwhelmingly
intended as lvalues, so adding extra cruft to say "no, I meant rvalue"
is the pragmatic choice.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YDEHQRHB5S4O6KP5ROUZGOKSR3H54T34/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: How to enable tracemalloc for the test suite?

2020-04-05 Thread Tim Peters
For posterity, just recording best guesses for the other mysteries left hanging:

- PYTHONTRACEMALLOC didn't work for you because Victor's traceback
showed that Py_FinalizeEx was executing _PyImport_Fini, one statement
_after_ it disabled tracemalloc via _PyTraceMalloc_Fini.

- The address passed to free() was for the struct representing the
False singleton, which is static.  Since it wasn't obtained from a
malloc/realloc call to begin with, the free() debug code didn't find
anything it expected to find.  As noted earlier, if tracemalloc had
still been active, it wouldn't have found that address in its
database, and so would have produced no output at all.

So - that was easy ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BRPNFTDWNJKVZS3KUKWYPN46KGER6LDU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: How to enable tracemalloc for the test suite?

2020-04-04 Thread Tim Peters
[Skip Montanaro ]
> ...
> I thought setting PYTHONTRACEMALLOC should provoke some useful output,
> but I was confused into thinking I was (am?) still missing something
> because it continued to produce this message:
>
> Enable tracemalloc to get the memory block allocation traceback

Ah, I overlooked that.


> which suggests to me tracemalloc still isn't enabled. That's emitted
> from Modules/_tracemalloc.c and seems to be properly protected:
>
> if (!_Py_tracemalloc_config.tracing) {
> PUTS(fd, "Enable tracemalloc to get the memory block "
>  "allocation traceback\n\n");
> return;
> }
>
> so I think there is still more to do.

Agreed - for whatever reason, tracemalloc is not enabled for you at
the time your run dies, despite your

> % PYTHONTRACEMALLOC=5 ./python ./Tools/scripts/run_tests.py -R 5:50:reflog.txt test_rattlesnake

command line.  Have to leave that mystery for Victor!

So between the "obvious" possibilities:

- something is going nuts spraying 0 all over memory

- the pointer passed to free() wasn't actually obtained from a
malloc/realloc function

there's nothing in the output to favor one over the other.  Except
that dereferencing addresses near the pointer didn't blow up (null
bytes were returned), so the pointer is at least in the process's
address space.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/74XUF5UNOID5YZ2EOUJD5YXOSNHIBEGT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: How to enable tracemalloc for the test suite?

2020-04-04 Thread Tim Peters
[Skip Montanaro ]
> I've got a memory issue in my modified Python interpreter I'm trying
> to debug. Output at the end of the problematic unit test looks like this:

To my eyes, you left out the most important part ;-)  A traceback
showing who made the fatal free() call to begin with.

In debug mode, allocations are padded on both ends with various stuff:
 FORBIDDENBYTEs (0xfd), a count of the number of bytes originally
requested, and a serial number incremented by 1 each time
alloc/realloc is called.  The memory you requested is entirely filled
with CLEANBYTEs (0xcd).

On deallocation, that stuff is checked (that's where your run is
dying), and the memory is filled with DEADBYTEs (0xdd) (a weak but
nevertheless often effective way to detect "double free"s).
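
Roughly, a debug-build request for n bytes lays out like this (per
Misc/SpecialBuilds.txt - exact details vary across versions; S is
sizeof(size_t)):

    [S: n] [S: 0xfd..] [n: your bytes, initially 0xcd..] [S: 0xfd..] [S: serial]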

Now in your case, absolutely every byte shown is 0x00.  There are no
useful clues at all remaining.  It's not just a short buffer overrun,
_everything_ has been NULLed out, including before the requested
memory.

So something has gone nuts spraying 0 all over memory ... or the
pointer passed to free() was nonsense to begin with.  If it were a
case of double free, we'd usually expect to see some DEADBYTEs in the
debug output.  But there weren't any.

However, you did not get any output from tracemalloc, despite that it
sounds like you set up the right magic to activate it.  Which leans
toward the "complete nonsense" hypothesis.  If tracemalloc doesn't
find the address in its dict of currently-allocated addresses, looks
like it fails silently, simply not producing any output.  Which is
what you're experiencing.

So where was free() called and what was passed to it?  All the
evidence is consistent with the idea that what was passed to free()
was never obtained from a malloc or realloc call to begin with.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A32E5ZVJ4IZBNDHMLOXH2OOAB2CKAYJN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-02-03 Thread Tim Peters
[Tim]
>> PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x
>> and y are the same object, then equality comparison returns True
>> and inequality False. No attempt is made to execute __eq__ or
>> __ne__ methods in those cases.
>> ...
>> If it's intended that Python-the-language requires this, that needs to
>> be documented.

[Raymond]
> This has been slowly, but perhaps incompletely documented over the
> years and has become baked in the some of the collections ABCs as well.
>  For example, Sequence.__contains__() is defined as:
>
> def __contains__(self, value):
> for v in self:
> if v is value or v == value:  # note the identity test
> return True
> return False

But it's unclear to me whether that's intended to constrain all
implementations, or is just mimicking CPython's list.__contains__.
That's always a problem with operational definitions.  For example,
does it also constrain all implementations to check in iteration
order?  The order can be visible, e.g, in the number of times v.__eq__
is called.


> Various collections need to assume reflexivity, not just for speed, but so 
> that we
> can reason about them and so that they can maintain internal consistency. For
> example, MutableSet defines pop() as:
>
> def pop(self):
> """Return the popped value.  Raise KeyError if empty."""
> it = iter(self)
> try:
> value = next(it)
> except StopIteration:
> raise KeyError from None
> self.discard(value)
> return value

As above, except CPython's own set implementation doesn't faithfully
conform to that:

>>> x = set(range(0, 10, 2))
>>> next(iter(x))
0
>>> x.pop() # returns first in iteration order
0
>>> x.add(1)
>>> next(iter(x))
1
>>> x.pop()  # ditto
1
>>> x.add(1)  # but try it again!
>>> next(iter(x))
1
>>> x.pop() # oops! didn't pop the first in iteration order
2

Not that I care ;-)  Just emphasizing that it's tricky to say no more
(or less) than what's intended.

> That pop() logic implicitly assumes an invariant between membership and 
> iteration:
>
>assert(x in collection for x in collection)

Missing an "all".

> We really don't want to pop() a value *x* and then find that *x* is still
> in the container.   This would happen if iter() found the *x*, but discard()
> couldn't find the object because the object can't or won't recognize itself:

Speaking of which, why is "discard()" called instead of "remove()"?
It's sending a mixed message:  discard() is appropriate when you're
_not_ sure the object being removed is present.


>  s = {float('NaN')}
>  s.pop()
>  assert not s  # Do we want the language to guarantee that
>   # s is now empty?  I think we must.

I can't imagine an actual container implementation that wouldn't, but
no actual container implements pop() in the odd way MutableSet.pop()
is written.  CPython's set.pop does nothing of the sort - doesn't even
have a pointer equality test (except against C's NULL and `dummy`,
used merely to find "the first (starting at the search finger)" slot
actually in use).

In a world where we decided that the identity shortcut is _not_
guaranteed by the language, the real consequence would be that the
MutableSet.pop() implementation would need to be changed (or made
NotImplemented, or documented as being specific to CPython).

> The code for clear() depends on pop() working:
>
> def clear(self):
> """This is slow (creates N new iterators!) but effective."""
> try:
> while True:
> self.pop()
> except KeyError:
> pass
>
> It would unfortunate if clear() could not guarantee a post-condition that the
> container is empty:

That's again a consequence of how MutableSet.pop was written.  No
actual container has any problem implementing clear() without needing
any kind of object comparison.

>  s = {float('NaN')}
>  s.clear()
>  assert not s   # Can this be allowed to fail?

No, but as above it's a very far stretch to say that clear() emptying
a container _relies_ on the object identity shortcut.  That's just a
consequence of an odd specific clear() implementation, relying in turn
on an odd specific pop() implementation that assumes the shortcut is
in place.


> The case of count() is less clear-cut, but even there 
> identity-implies-equality
> improves our ability to reason about code:

Absolutely!  That "x is x implies equality" is very useful.  But
that's not the question ;-)

>  Given some list, *s*, possibly already populated, would you want the
> following code to always work:
>
>  c = s.count(x)
>  s.append(x)
>  assert s.count(x) == c + 1 # To me, this is fundamental
>   to what the word 
> "count" means.

I would, yes.  But it's also possible to define 

[Python-Dev] Re: Python library for the ServiceNow REST API

2020-01-28 Thread Tim Peters
[Guido]
> Honestly that looked like a spammer.

I approved the message, and it looked like "probably spam" to me too.
But it may have just been a low-quality message, and the new moderator
UI still doesn't support adding custom text to a rejection message.
Under the old system, I _would_ have rejected it, but added a message
explaining why.

As I've said before, I give broad benefit-of-the-doubt to marginal
messages, and for that reason alone some spam is certain to get
approved from time to time.

A vast majority of spam directed to this list does get discarded,
because there's usually no plausible doubt at all.  I don't believe a
legitimate message has ever been rejected, and _that's_ more important
to me.

> Note that at the bottom they wrote
>
>   servicenow training
>
> A typical strategy for such spammers is to send a message to a popular
> mailing list with some text that looks vaguely related (to the spammer,
> who is usually not well versed in the subject of the list) and add some
> link that they want to promote (presumably a hacked or scam site).

I went to the site first, and my up-to-date Malwarebytes installation
(both the full commercial product and the browser extension) found
nothing to complain about except that the page linked to one ad
network (common as mud).  It got a cleaner report from that than
_most_ sites I visit from day to day.  For example, Malwarebytes also
points out that www.python.org links to one "ad network or tracker"
(ssl.google-analytics.com).


> The list moderator ought to ban the user.

I will not, but fine by me if some other python-dev moderator does.
That poster is still subject to moderation.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EX2QEOZV3BRH5DH3JZOJPULSSZDQZDG3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-24 Thread Tim Peters
[Terry Reedy ]

[& skipping all the parts I agree with]

> ...
> Covered by "For user-defined classes which do not define __contains__()
> but do define __iter__(), x in y is True if some value z, for which the
> expression x is z or x == z is true, is produced while iterating over y.
> " in
>
> https://docs.python.org/3/reference/expressions.html#membership-test-operations

Just went through that with Serhiy.  Very briefly, two problems:

1. Those docs are over-specified if we intend that the "x is z"  is an
implementation choice rather than a language requirement.

2. Repeating all that stuff all over the docs is a PITA, error-prone,
and ugly, since contexts that call PyObject_RichCompareBool() under
the covers are all over the place too.

Document it carefully in _one_ place (the Reference Manual already has
your good start), and just plant links to that in high-value locations
elsewhere in the docs.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4EVEQZZPEPAGSJ25SSG2EQWQ3MUP2ZHW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-24 Thread Tim Peters
[Tim]
>> I think it needs more words, though, to flesh out what about this is
>> allowed by the language (as opposed to what CPython happens to do),
>> and to get closer to what Guido is trying to get at with his
>> "*implicit* calls".  For example, it's at work here, but there's not a
>> built-in container in sight:
>>
>> >>> import math
>> >>> def f():
>> ... while True:
>> ... yield math.nan
>> >>> math.nan in f()
>> True

[Serhiy Storchaka ]
> It is documented in:
> https://docs.python.org/3/reference/expressions.html#membership-test-operations
>
> """
> For user-defined classes which do not define :meth:`__contains__` but do
> define
> :meth:`__iter__`, ``x in y`` is ``True`` if some value ``z``, for which the
> expression ``x is z or x == z`` is true, is produced while iterating
> over ``y``.
> ...

But _should_ it be?  There are two parts to that:

1. Consensus seems to be that whether - and where - implicit equality
comparisons exploit object identity should be implementation-defined.
If so, the quoted docs are over-specified, elevating a CPython
implementation choice to a language requirement.

2. As I said in the first message, consequences of this choice are
"all over the place".  Repeating equivalents of "x is z or x == z" all
over the docs is ugly and error-prone, and especially if it also needs
to spell out that the "x is z" part is something CPython does but that
other implementations may not do.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QLBATZTQQU5H4BGJAFRO3DNXN23NB7UU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-24 Thread Tim Peters
[Terry Reedy ]
> ...
> It is, in the section on how to understand and use value comparison
> *operators* ('==', etc.).
> https://docs.python.org/3/reference/expressions.html#value-comparisons
>
> First "The default behavior for equality comparison (== and !=) is based
> on the identity of the objects."
>
> Then in particular, "The built-in containers typically assume identical
> objects are equal to themselves. That lets them bypass equality tests
> for identical objects to improve performance and to maintain their
> internal invariants."

Cool!  I didn't find that, and I assume Victor didn't either.  It's good!

I think it needs more words, though, to flesh out what about this is
allowed by the language (as opposed to what CPython happens to do),
and to get closer to what Guido is trying to get at with his
"*implicit* calls".  For example, it's at work here, but there's not a
built-in container in sight:

>>> import math
>>> def f():
... while True:
... yield math.nan
>>> math.nan in f()
True


>> For example, in __eq__() method documentation:
>> https://docs.python.org/dev/reference/datamodel.html#object.__eq

> The text that follows discusses rich comparison *special methods* and
> how to write them.  It should refer people to the value-comparisons
> section for information on specific classes, as in the second quote
> above.  It would not hurt if the operator section referred people back
> to special method discussion.  I think you should go ahead and add one
> or both links.

Agreed.  If I want to know why my __eq__ (or __ne__) isn't being
called, it's __eq__/__ne__ docs I'm going to look at first.  And
probably last, unless I'm nudged in the right direction.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/T4BAR4DVTMTR3GOJ2Y2AL2ZBCL6VSPUK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-24 Thread Tim Peters
[Inada Naoki ]
> FWIW, (list|tuple).__eq__ and (list|tuple).__contains__ uses it too.
> It is very important to compare recursive sequences.
>
> >>> x = []
> >>> x.append(x)
> >>> y = [x]
> >>> z = [x]
> >>> y == z
> True

That's a visible consequence, but I'm afraid this too must be
considered an implementation-defined quirk.  If doing a fine job of
comparing recursive containers was the actual goal, we would have
needed to do a much fancier thing, akin to testing graph isomorphism:

>>> a = []
>>> b = []
>>> for x in a, b:
... x.append(x)
>>> a
[[...]]
>>> b
[[...]]
>>> a == a
True
>>> b == b
True
>>> a == b
Traceback (most recent call last):
...
RecursionError: maximum recursion depth exceeded in comparison

Instead the happy behavior in your example (and in my first two
examples) just follows as a no-effort consequence of the
special-casing for identity equality in PyObject_RichCompareBool().

(Note that the _top_ level comparisons in my "a == a" and "b == b"
examples do _not_ exploit that the comparands are the same object -
it's list_richcompare() calling PyObject_RichCompareBool() where the
latter tells the former that a[0] and a[0] are equal.)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/L4RPMHX7USTWLACBC5URKB3WXNBH2DKA/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?

2020-01-23 Thread Tim Peters
PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut:  if x
and y are the same object, then equality comparison returns True and
inequality False.  No attempt is made to execute __eq__ or __ne__
methods in those cases.

This has visible consequences all over the place, but they don't
appear to be documented.  For example,

>>> import math
>>> ([math.nan] * 5).count(math.nan)
5

despite that `math.nan == math.nan` is False.

It's usually clear which methods will be called, and when, but not
really here.  Any _context_ that calls PyObject_RichCompareBool()
under the covers, for an equality or inequality test, may or may not
invoke __eq__ or __ne__, depending on whether the comparands are the
same object.  Also any context that inlines these special cases to
avoid the overhead of calling PyObject_RichCompareBool() at all.
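
Another way to see it, with a contrived class thrown in just for
illustration (CPython behavior - its __eq__ is never called here):

>>> class Never:
...     def __eq__(self, other):
...         print("__eq__ called")
...         return False
>>> x = Never()
>>> x in [x]      # identity shortcut says True; nothing is printed
True
>>> [x].count(x)  # ditto
1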

If it's intended that Python-the-language requires this, that needs to
be documented.

Or if it's implementation-defined, then _that_ needs to be documented.

Which isn't straightforward in either case, in part because
PyObject_RichCompareBool isn't a language-level concept.

This came up recently when someone _noticed_ the list.count(NaN)
behavior, and Victor made a PR to document it:

https://github.com/python/cpython/pull/18130

I'm pushing back, because documenting it _only_ for .count() makes
.count() seem unique in a way it isn't, and doesn't resolve the
fundamental issue:  is this language behavior, or implementation
behavior?

Which I don't want to argue about.  But you certainly should ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3ZAMS473HGHSI64XB3UV4XBICTG2DKVF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Documenting Python's float.__str__()

2020-01-21 Thread Tim Peters
[Serhiy Storchaka]
> This is not the only difference between '.17g' and repr().
>
> >>> '%.17g' % 1.23456789
> '1.23456788'
> >>> format(1.23456789, '.17g')
> '1.23456788'
> >>> repr(1.23456789)
> '1.23456789'

More amazingly ;-), repr() isn't even always the same as a %g format
specifying exactly the same number of digits as repr() produces.

That's because repr(x) currently returns the shortest string such that
eval(repr(x)) == x, but %g rounds correctly to the given number of
digits.  Not always the same thing!

>>> x = 2.0 ** 89
>>> print(repr(x))
6.189700196426902e+26
>>> print("%.16g" % x) # repr produced 16 digits
6.189700196426901e+26

The repr() output is NOT correctly rounded.  To see which one is
correctly rounded, here's an easy way:

>>> import decimal
>>> decimal.Decimal(x)
Decimal('618970019642690137449562112')

The "37449562112" is rounded off, and is less than half a unit in the
last place, so correct rounding truncates the last digit to 1.

But there is no string with 16 digits other than the incorrectly
rounded one repr() returns that gives x back.  In particular, the
correctly rounded 16 digit string does not:

>>> 6.189700196426901e+26 # 16-digit correctly rounded fails
6.189700196426901e+26
>>> x == _
False

To my mind it's idiotic(*) that "shortest string" requires incorrect
rounding in some cases.

In Python's history, eval(repr(x)) == x is something that was always
intended, so long as the writing and reading was done by the same
Python instance on the same machine.  Maybe it's time to document that
;-)

But CPython goes far beyond that now, also supplying correct rounding,
_except_ for repr's output, where - for reasons already illustrated -
"correct rounding" and "shortest" can't always both be satisfied.

(*) What wouldn't be idiotic?  For repr(x) to return the shortest
_correctly rounded_ string such that eval(repr(x)) == x.  In the
example, that would require repr(x) to produce a 17-digit output (and
17 is the most that's ever needed for a Python float on boxes with
IEEE doubles).  But "shortest string" was taken ultra literally by the
people who first worked out routines capable of doing that, so has
become a de facto standard now.
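
For instance, sticking with the x above, the correctly rounded 17-digit
string does round-trip:

>>> x = 2.0 ** 89
>>> float("%.17g" % x) == x
True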
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/F3BOGGTGJPZS3RR7FKG7YE6GYADHYI76/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Object deallocation during the finalization of Python program

2020-01-09 Thread Tim Peters
[Pau Freixes ]
> Recently I've been facing a really weird bug where a Python program
> was randomly segfaulting during the finalization, the program was
> using some C extensions via Cython.

There's nothing general that can be said that would help.  These
things require excruciating details to resolve.

It's generally the case that these things are caused by code missing
some aspect of correct use of Python's C API.  Which can be subtle!
It's very rare that such misuse is found in the core, but not so rare
in extensions.

Here's the most recent.  Be aware that it's a very long report, and -
indeed - is stuffed to the gills with excruciating details ;-)

https://bugs.python.org/issue38006

In that case a popular C extension wasn't really "playing by the
rules", but we changed CPython's cyclic garbage collector to prevent
it from segfaulting anyway.

So, one thing you didn't tell us:  which version of Python were you
using?  If not the most recent (3.8.1), try again with that (which
contains the patches discussed in the issue report above).

> Debugging the issue I realized that during the deallocation of one of
> the Python objects the deallocation function was trying to release a
> pointer that was surprisingly assigned to  NULL. The pointer was at
> the same time held by another Python object that was an attribute of
> the Python object that had the deallocation function, something like
> this:
>
> ...
>
> So my question would be, could CPython deallocate the objects during
> the finalization step without considering the dependencies between
> objects?

No, it could not.  BUT.  But but but.  If C code isn't playing by all
the rules, it's possible that CPython doesn't _know_ what the
dependencies actually are.  CPython can only know about pointers that
the user's C API calls tell CPython about.  In the report above, the C
extension didn't tell CPython about some pointers in cycles at all.

Curiously, for most kinds of garbage collectors, failing to inform the
collector about pointers is massively & widely & very reliably
catastrophic.  If it doesn't see a pointer, it can conclude that a
live object is actually trash.  But CPython's cyclic collector follows
pointers to prove objects _are_ trash, not to prove that they're
alive.  So when CPython isn't told about a pointer, it usually "fails
soft", believing a trash object is actually alive instead.  That can
cause a memory leak, but usually not a crash.

But not always.  In the presence of weakrefs too, believing a trash
object is actually alive can cause some kinds of finalization to be
done "too late", which can cause a crash when refcounting discovers
that the object actually is trash.  No, you're not going to get a
simple example of that ;-)  Dig through the issue report above.

> If this is not the right list to make this kind of questions, just let
> me know what would be the best place for making this kind of questions

Open an issue report instead?  Discussions won't solve it, alas.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WTH2LH3S52CN435B6I273BJ3QJW5GTES/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-29 Thread Tim Peters
[Larry]
> It's a lightweight abstract dependency graph.  Its nodes are opaque,
> only required to be hashable.  And it doesn't require that you give it
> all the nodes in strict dependency order.
>
> When you add a node, you can also optionally specify
> dependencies, and those dependencies aren't required
> to be nodes that the graph has previously seen.  So you can add
> a node A that depends on B and C, without showing it B or C
> first.  When that happens it creates placeholder nodes for B
> and C, and naturally these nodes have no dependencies.  You
> can then later inform the graph that node B has dependencies
> on X Y and Z.
>
> (My specific use case is a package manager.  Packages are identified
> with unique strings.  So the nodes I give the graph are simply
> those package names.  It's pretty common for me to get a package
> with dependencies on packages I haven't seen yet.  Designing the
> graph to create placeholders let me skip making two passes over
> the package list, pre-creating the package objects, etc etc.  This
> made the code simpler and presumably faster.)
>
> The graph API maintains an externally-visible "ready" iterable of
> nodes.  And since B can go from ready to not-ready, it can get added
> to "ready" and subsequently removed.
>
> Also, when a node is satisfied, you simply remove it from the graph.
>  If A depends on B and C, and they all represent jobs, and B and C
> have no dependencies, they'll be in the "ready" iterable.  You iterate
> over "ready", and execute those jobs, and assuming they are
> successful you then remove them from the graph.  When A's
> dependencies are all satisfied, it'll get added to the "ready" iterable.
>  And naturally when B and C were removed from the graph, they were
> removed from the "ready" iterable too.

OK!  You're doing a topological sort.  There are natural & simple ways
to do that right now that scale efficiently to very large graphs (time
& space linear in the sum of the number of nodes and the number of
dependencies).  Curiously, the difficulties are mostly driven by the
quality of _error_ messages you want (in case of, e.g., cycles in the
dependency graph).

Lots of those implementations became deterministic "by magic" when
ordered dicts were introduced.  This is why:  a bare bones
implementation needs to associate two pieces of info with each node:
a list of its immediate successors, and an integer count of the number
of its immediate predecessors.  A dict is the natural way to implement
that association.

When all the dependency info has been entered, then:

- First all nodes are emitted whose predecessor count is 0.  Iterating
over the association dict once is the natural way to find them, and
that order is defined now.

- Then, as each emitted node N is marked done:
  for child in N.successors:
  assert child.predcount > 0
  child.predcount -= 1
  if child.predcount == 0:
  emit(child)

That's about all there is to it.  But there's no end to the cruft that
_can_ be added to, e.g., verify that a node is marked done no more
than once, or to compute & display strongly connected components if
not all nodes are emitted, etc.

Ordered sets could improve this "natural" implementation if the
successor lists were successor sets instead, simply because then -
like lists - their iteration order would be defined, which can in turn
affect the order in which nodes are emitted in the loop just above.
But lists are adequate to the task, are cheaper in both time and
space, and can't _not_ be ordered ;-)
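
For concreteness, here's a bare-bones sketch of the above (hypothetical
names, no cycle reporting, and "emit" and "mark done" collapsed into a
single step):

def topsort(deps):
    # deps maps node -> iterable of nodes it depends on
    succs = {}   # node -> list of immediate successors
    npreds = {}  # node -> count of immediate predecessors
    for node, predecessors in deps.items():
        npreds.setdefault(node, 0)
        for p in predecessors:
            succs.setdefault(p, []).append(node)
            npreds.setdefault(p, 0)
            npreds[node] += 1
    ready = [n for n, count in npreds.items() if count == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for child in succs.get(n, ()):
            npreds[child] -= 1
            if npreds[child] == 0:
                ready.append(child)
    return order  # shorter than len(npreds) iff the graph has a cycle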


> Thus it's this "ready" collection that I want to a) be iterable, b) be 
> stable, and
> c) support fast removal.  I add and remove nodes from that set a lot, which I
> realized would perturb the iteration order from run to run, etc etc etc, and 
> here
> we are.

The sketch above (which is a bog-common way to implement topsorts)
doesn't build a data structure recording the final order, and, in
fact, never deletes anything.  You might protest that the initial
iteration step (scanning the association dict to find nodes with no
predecessors) is an expense you skip because nodes with predecessors
are deleted from your "ready" structure all along.  But nodes with
predecessors are _touched_ either way, and merely skipping over them
is bound to be cheaper than deleting them.

After that initial step, no search of any kind is needed again.

> I grant you maybe it's a weird approach.  But after two false starts, where I
> was starting to boil the oceans with ABCs and "node is satisfied" APis and
> separate iterator objects, I had a re-think and hit on this super lightweight
> design.  I'm actually quite pleased with it--it's short, it has a minimal API,
> it's readable, it was easy to get right, and it solves my problem.

Whereas I would have started with a topsort and finished it while I
was sleeping ;-)  Seriously, I've written such a thing many times, but
never reuse the code because it's so easy to 

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-28 Thread Tim Peters
[Larry]
> Here is the original description of my problem, from the original email in
> this thread.  I considered this an adequate explanation of my problem
> at the time.

>> I do have a use case for this. In one project I maintain a "ready" list of
>> jobs; I need to iterate over it, but I also want fast lookup because I
>> sometimes remove jobs when they're subsequently marked "not ready".

Which is a description not of "the problem", but of the operations you
want to perform.  It's the first time in my life I've heard of a
"ready list of jobs" where the programmer sometimes decides later "oh,
no - this one and that one aren't ready after all" ;-)

Not that it matters much - you want the operations you want.  The lack
of "oh - of course - that's a problem we all face at times"
motivation, though, works against it being _compelling_ on its own.

> ...
> In subsequent emails I've clarified that my workloads are small enough--and
> computers are fast enough--that almost any data structure would work fine
> for me here, e.g. a list.
>
>  I don't think my needs should drive the decision making process regardless.
>  I only described my problem to get the conversation rolling.

Which it did :-)  Others went on to, explicitly or implicitly, suggest
that an ordered set must also, e.g., support the entire
MutableSequence interface, and even define what happens if you mutate
the ordered set _while_ iterating over it (lists define the results in
such cases, but ordered dicts raise an exception).


> I opine that anybody who iterates over set objects and has bugs in their
> code would appreciate set objects maintaining insertion order.  I suspect
> it would make their debugging easier, because given identical inputs their
> set iteration would behave identically, thus making their bugs that much
> more stable.  That's as much use case as I have for the feature.

That's much stronger to me than some weird FIFO-only queue supporting
fast deletion of interior elements (which - sorry - I've never had a
use for).

And fine by me if such a thing is added.  All else being equal, I'd
prefer that the builtin set type be ordered, simply because it's less
surprising. and dicts are ordered now (although, as Inada Naoki said,
their current implementation isn't suitable for a FIFO queue, doe
primarily to O(N) time needed to find "the first" key+value record).

I believe we agree that adding _some_ flavor of OrderedSet to
`collections` would be a fine start.  I'm just not cool with replacing
the builtin set before there's an implementation fast and
memory-efficient enough that _current_ heavy set users won't feel
betrayed by major slowdowns or memory bloat when the switch is made.

Toward that end, my opinion - which I won't defend - is that
OrderedSet should not promise better than O(N)-time indexing (if it
supports indexing at all), and should - like current sets and dicts -
try to raise an exception if it can detect that the container has been
mutated during iteration.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AH4LQB3OCMCE22FEWUALRND5P6AHUWE6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-27 Thread Tim Peters
[Nick Coghlan ]
> I took Larry's request a slightly different way: he has a use case where
> he wants order preservation (so built in sets aren't good), but combined
> with low cost duplicate identification and elimination and removal of
> arbitrary elements (so lists and collections.deque aren't good). Organising
> a work queue that way seems common enough ...

Is it?  I confess I haven't thought of a plausible use case.  Larry
didn't really explain his problem, just suggested that an ordered set
would be "a solution" to it.

The problem:  whether there are duplicates "in the queue" is a
question about an implementation detail, hard for me to translate to a
question about the _task_ to be solved.

For example, suppose "the task" is to make a deep copy of a web site.
A "job" consists of following a URL, sucking down the page text, and
adding new jobs for contained URLs on the page.

We probably don't want to suck down the page text multiple times for a
given URL, but checking whether a URL is currently already in the job
queue is asking a question about an implementation detail that misses
the point:  we want to know whether that URL has already been chased,
period.  Whether the URL is currently in the queue is irrelevant to
that.  The only logical relationship is that a job that has finished
_was_ in the queue at some time before the job finished.

So, ya, I've seen and implemented lots of work queues along these
lines - but an OrderedSet would be an "attractive nuisance"
(offhandedly appearing to solve a problem it doesn't actually
address):

jobs = some_kind_of_queue()
finished_jobs = set()
...
while jobs:
job = jobs.get()
if job in finished_jobs:
continue
try:
work on the job
possibly adding (sub)jobs to the `jobs` queue
except TransientExceptions:
jobs.put(job)  # try again later
else:
finished_jobs.add(job)

Clear?  There is a queue here, and a set, but they can't be combined
in a useful way:  the set contents have nothing to do with what's
currently in the queue - the set consists of jobs that have been
successfully completed; the queue doesn't care whether it contains
duplicates, and merely skips over jobs that have been completed by the
time the queue spits them out (which strikes me as more robust design
than duplicating logic everywhere a job is added to a queue to try to
prevent adding already-done jobs to begin with).

Similarly, in other places I've used a "finished" flag on the object
instead of a set, or a dict mapping jobs to info about job status and
results.  But in all these cases, the vital info associated with a job
really has little to do with whether the job is currently sitting on a
queue.

If "is it currently on the queue?" _is_ a common question to ask,
perhaps someone could give a semi-realistic example?
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2XDEVSO5S5ZLVZTLX43UHBOJWMXYJIIB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-26 Thread Tim Peters
[Tim]
>> - I don't have a theory for why dict build time is _so_ much higher
>> than dict lookup time for the nasty keys.

To be clearer, in context this was meant to be _compared to_ the
situation for sets.  These were the numbers:

11184810 nasty keys
dict build           23.32
dict lookup          13.26
set build             9.79
set lookup            8.46

Build time is higher than lookup time for both, which isn't
surprising.  But it's _much_ higher (whether in absolute or relative
terms) for dicts than for sets.

[Serhiy Storchaka ]
> 1/2 of items were copied to a new hashtable when the dict grew up. 1/4
> of items were copied twice. 1/8 of items were copied three times, etc.
> In average every item is copied 1 time, so the build time is twice the
> lookup time when the cost of lookup is dominated.

I agree the average total number of table insertions per element
approaches 2 either way (counting insertions due to internal table
growth as well as insertions visible from Python code).  That's why
it's expected that build time exceeds lookup time (the latter does
exactly one lookup per element).


> But the lookup benchmarking includes the overhead of iterating Python
> loop, which is more significant for "good" keys, thus the lookup time is
> larger than the build time.

Iteration time is too minor to care much about.  Even in the worst
case (sets with the nasty keys), the Python:

for k in d:
pass

takes under a second here.  Take a full second off the "set lookup"
time, and dict build time still exceeds dict lookup time (whether in
absolute or relative terms) much more so than for sets.

But I'm not going to pursue it.  The high-order bit was just to
demonstrate that the set object's more complicated (than dicts) probe
sequence does buy highly significant speedups - in at least one highly
contrived case.  Since the probe sequence changes were aimed at
reducing cache misses, a full account requires comparing cache misses
between the dict and set implementations.  That's a mess, and slogging
through it appears unlikely to result in insight.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IYPHUKW7DGYNAFAC4AOCDYYUWANHN5YE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-25 Thread Tim Peters
[Tim]
> I know the theoretical number of probes for dicts, but not for sets
> anymore.  The latter use a mix of probe strategies now, "randomish"
> jumps (same as for dicts) but also purely linear ("up by 1") probing
> to try to exploit L1 cache.
>
> It's not _apparent_ to me that the mix actually helps ;-)  I just
> trust that Raymond timed stuff until he was sure it does.

To illustrate, I contrived a case where the set optimizations clearly
pay off.  Output (3.8.1, Win 10 Pro, quad core box with plenty of
RAM):

11184810 nice keys
dict build            0.54
dict iter             0.07
dict lookup           0.81
set build             0.54
set iter              0.08
set lookup            0.82

11184810 nasty keys
dict build           23.32
dict iter             0.09
dict lookup          13.26
set build             9.79
set iter              0.28
set lookup            8.46

I didn't make heroic efforts to keep the machine quiet, and there's
substantial variation across runs.  For example, there's no real
difference between "0.07" and "0.08".

Some things to note:

- The "nice" keys act much the same regardless of whether dict or set.
Dicts have an advantage for iteration "in theory" in the absence of
deletions because there are no "holes" in the area storing the
hash+key+value records.  But for these specific keys, the same is true
of sets:  the initial hash probes are at indices 0, 1, 2, ...,
len(the_set)-1, and there are no holes in its hash+key records either
(all the empty slots are together "at the right end").

- Sets get a lot of value out of their fancier probe sequence for the
nasty keys.  There are lots of collisions, and sets can much more
often resolve them from L1 cache.  Better than a factor of 2 for build
time is nothing to sneeze at.

- Iteration for sets is substantially slower in this case, for two
reasons:  (1) sets _do_ have holes for this collection of keys; and,
(2) while I don't think it's documented, the maximum load factor for
sets is lower than for dicts, and 11184810 was picked to give the
maximum load factor for a dict with 2**24 slots.  The set in this case
has twice as many slots as the dict, and all the extras are empty
slots that iteration has to skip over.

- I don't have a theory for why dict build time is _so_ much higher
than dict lookup time for the nasty keys.

- If you don't know a lot about how these things are implemented, just
about everything is baffling ;-)  Integer keys are important in
practice, but things about how modern dicts are implemented were
_driven_ by opportunities to exploit properties of the mostly-trivial
int hash function.  For "general" timing, it's better to use string
keys, whose hash codes are "pretty randomish" (but can vary across
runs).  When using int keys, it's really easy to _end up_ measuring
happy behaviors _specific_ to int keys.

Code:

from time import perf_counter as now
import sys
from collections import deque

def disp(text, f):
print(f"{text:20s} {f:5.2f}")

M = sys.hash_info.modulus
targetlen = 2**24 * 2 // 3 # highest load factor for dict
hc = 1
badkeys = []
while len(badkeys) < targetlen:
# add no more than 100 ints with hash code `hc`
toadd = min(100, targetlen - len(badkeys))
badkeys.extend(hc + i * M for i in range(toadd))
hc += 713 # more than less arbitrary
goodkeys = list(range(targetlen))

for tag, keys in ("nice", goodkeys), ("nasty", badkeys):
print()
print(len(keys), tag, "keys")

for thetype, builder in ("dict", dict.fromkeys), ("set", set):
s = now()
d = builder(keys)
f = now()
disp(thetype + " build", f-s)

s = now()
deque(d, maxlen=0)
f = now()
disp(thetype + " iter", f-s)

s = now()
for k in keys:
assert k in d
f = now()
disp(thetype + " lookup", f-s)

del d
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/S3UB5TCSUBLYSESG7WO2HNCFBZB5N4BG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-23 Thread Tim Peters
>> Also, I believe that max "reasonable" integer range of no collision
>> is (-2305843009213693951, 2305843009213693951), ...

> Any range  that does _not_ contain both -2 and -1 (-1 is an annoying
> special case, with hash(-1) == hash(-2) == -2), and spans no more than
> sys.hash_info.modulus integers.  Apart from that, the sign and
> magnitude of the start of the range don't matter; e.g.,
>
> >>> len(set(hash(i) for i in range(10**5000, 10**5000 + 100)))
> 100
> >>> len(set(hash(i) for i in range(-10**5000, -10**5000 + 100)))
> 100

Sorry again!  That only shows that the hash codes are unique.  Dicts
and sets use only the last N bits to determine the start of the probe
sequence, for some value of N >= 3 (depending on the table size).  So
for a table of size a million, the last 20 bits (100 ~= 2**20) are
interesting.  But same thing:

>>> MASK = (1 << 20) - 1
>>> len(set(hash(i) & MASK for i in range(-10**5000, -10**5000 + 100)))
100
>>> len(set(hash(i) & MASK for i in range(-10**5000, -10**5000 + 100)))
100
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BXHOS73YMJDSVZUM74VYXXYN5WW63BZ4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-23 Thread Tim Peters
[Kyle]
> ...
> For some reason, I had assumed in the back of my head (without
> giving it much thought) that the average collision rate would be the
> same for set items and dict keys. Thanks for the useful information.

I know the theoretical number of probes for dicts, but not for sets
anymore.  The latter use a mix of probe strategies now, "randomish"
jumps (same as for dicts) but also purely linear ("up by 1") probing
to try to exploit L1 cache.

It's not _apparent_ to me that the mix actually helps ;-)  I just
trust that Raymond timed stuff until he was sure it does.

> ...
> Ah, I forgot to consider how the hash function actually works for continuous
> integers. A better comparison to demonstrate the collision differences would
> likely use random strings.

And, at an extreme, a class with a __hash__ that always returns the
same integer.

> Also, I believe that max "reasonable" integer range of no collision
> is (-2305843009213693951, 2305843009213693951), ...

Any range  that does _not_ contain both -2 and -1 (-1 is an annoying
special case, with hash(-1) == hash(-2) == -2), and spans no more than
sys.hash_info.modulus integers.  Apart from that, the sign and
magnitude of the start of the range don't matter; e.g.,

>>> len(set(hash(i) for i in range(10**5000, 10**5000 + 100)))
100
>>> len(set(hash(i) for i in range(-10**5000, -10**5000 + 100)))
100
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FXTRZLOHSWYV5KEGJ4DYNS7HOMUOPVE5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-23 Thread Tim Peters
Sorry!  A previous attempt to reply got sent before I typed anything :-(

Very briefly:

> >>> timeit.timeit("set(i for i in range(1000))", number=100_000)

[and other examples using a range of integers]

The collision resolution strategy for sets evolved to be fancier than
for dicts, to reduce cache misses.  This is important for sets because
the _only_ interesting thing about an element wrt a set is whether or
not the set contains it.   Lookup speed is everything for sets.

If you use a contiguous range of "reasonable" integers for keys, the
integer hash function is perfect:  there's never a collision.  So any
such test misses all the work Raymond did to speed set lookups.
String keys have sufficiently "random" hashes to reliably create
collisions, though.  Cache misses can, of course, have massive effects
on timing.
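
To see why a contiguous range of small ints can't collide: their hashes
are just the ints themselves, so consecutive keys land in consecutive
table slots.

>>> [hash(i) for i in range(5)]
[0, 1, 2, 3, 4]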

> Add (much faster for dicts):
> >>> timeit.timeit("s = set(); s.add(0)", number=100_000_000)
> 13.330938750001224
> >>> timeit.timeit("d = {}; d[0] = None", number=100_000_000)
> 5.788865385999088

In the former case you're primarily measuring the time to look up the
"add" method of sets (that's more expensive than adding 0 to the set).
A better comparison would, e.g., move `add = s.add` to a setup line,
and use plain "add(0)" in the loop.
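
Something along these lines (an untested sketch - note it also moves set
creation out of the timed statement):

>>> timeit.timeit("add(0)", setup="s = set(); add = s.add", number=100_000_000)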

That's it!
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5UKYDB36WDRRIYOLRD52QS4I43SCAAJ6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-23 Thread Tim Peters
peculation since I haven't ran any benchmarks for an 
>>> OrderedSet implementation, but I strongly suspect that OrderedSet will end 
>>> up having better average performance for add, pop, and update than trying 
>>> to use a dictionary with None values (assuming it's implemented in C or at 
>>> least with a C accelerator).
>>>
>>> Not to mention allowing the ability to update (for both addition and 
>>> removal) based on intersections and unions across ordered sets, which 
>>> currently isn't possible to do with dictionaries (at least not directly 
>>> with |=, &=, -=. and ^=). I'm not sure how much of a practical use case 
>>> this would have for ordered sets specifically, but it's a nice bonus.
>>>
>>> On Sat, Dec 21, 2019 at 7:59 AM Larry Hastings  wrote:
>>>>
>>>>
>>>> (mixing together messages in the thread, sorry threaded-view readers)
>>>>
>>>> On 12/19/19 3:15 PM, Tim Peters wrote:
>>>>
>>>> [Nick]
>>>>
>>>> I took Larry's request a slightly different way:
>>>>
>>>> [...]
>>>>
>>>> Only Larry can answer whether that would meet his perceived need.  My
>>>> _guess_ is that he wouldn't know OrderedSet existed, and, even if he
>>>> did, he'd use a dict with None values anyway (because it's less hassle
>>>> and does everything he wanted).
>>>>
>>>>
>>>> At last, something I can speak knowledgeably about: Larry's use case!  
>>>> Regarding Larry, I'd say
>>>>
>>>> his use case was small enough that almost anything maintaining order would 
>>>> have worked, even a list,
>>>> he often does have a pretty good idea what goodies are salted away in the 
>>>> Python standard library, and
>>>> a hypothetical collections.OrderedSet would probably work just fine.  And 
>>>> he'd probably use it too, as that makes the code clearer / easier to read. 
>>>>  If you read code using an OrderedSet, you know what the intent was and 
>>>> what the semantics are.  If you see code using a dict but setting the 
>>>> values to None, you think "that's crazy" and now you have a puzzle to 
>>>> solve.  Let's hope those comments are accurate!
>>>>
>>>>
>>>> Also, speaking personally, at least once (maybe twice?) in this thread 
>>>> folks have asked "would YOU, Mr Larry, really want ordered sets if it 
>>>> meant sets were slightly slower?"
>>>>
>>>> The first thing I'd say is, I'm not sure why people should care about 
>>>> what's best for me.  That's sweet of you!  But you really shouldn't.
>>>>
>>>> The second thing is, my needs are modest, so the speed hit we're talking 
>>>> about would likely be statistically insignificant, for me.
>>>>
>>>> And the third thing is, I don't really put the set() API through much of a 
>>>> workout anyway.  I build sets, I add and remove items, I iterate over 
>>>> them, I do the occasional union, and only very rarely do I use anything 
>>>> else.  Even then, when I write code where I reach for a fancy operation 
>>>> like intersection or symmetric_difference--what tends to happen is, I have 
>>>> a rethink and a refactor, and after I simplify my code I don't need the 
>>>> fancy operations anymore.  I can't remember the last time I used one of 
>>>> these fancy operations where the code really stuck around for a long time.
>>>>
>>>> So, personally?  Sure, I'd take that tradeoff.  As already established, I 
>>>> like that dict objects maintain their order, and I think it'd be swell if 
>>>> sets maintained their order too.  I suspect the performance hit wouldn't 
>>>> affect me in any meaningful way, and I could benefit from the 
>>>> order-maintaining semantics.  I bet it'd have other benefits too, like 
>>>> making regression tests more stable.  And my (admittedly uninformed!) 
>>>> assumption is, the loss of performance would mostly be in these 
>>>> sophisticated operations that I never seem to use anyway.
>>>>
>>>> But I don't mistake my needs for the needs of the Python community at 
>>>> large.  I'd be mighty interested to read other folks coming forward and 
>>>> saying, for example, "please don't make set objects any slower!" and 
>>>> talking us through neat real-world use c

[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-20 Thread Tim Peters
[Inada Naoki ]
> I just meant the performance of the next(iter(D)) is the most critical part
> when you implement an ordered set on top of the current dict and use it as a queue.

Which is a good point.  I added a lot more, though, because Wes didn't
even mention queues in his question:

 [Wes Turner ]
 How slow and space-inefficient would it be to just implement the set methods
 on top of dict?

I can't guess what he was _really_ asking about, so now he's got lots
of answers ;-)

> This code should be O(N), but it's O(N^2) if q is implemented on top
> of the dict.
>
>    while q:
>        item = q.popleft()
>
> Sorry for the confusion.

To flesh this out, the FIFO queue entries are all together in a
contiguous vector, with no "holes" in the middle.  Pushing a new item
appends "to the right" (higher index in the vector), while popping an
item leaves "a hole" at the left.  But there are no holes "in the
middle" in this case.

So the first real entry is at a higher index with every pop, but
iteration starts at index 0 every time.  The more pops there are, the
more holes iteration has to skip over, one at a time, to find the
first real entry remaining.

Until pushing a new item would fall off the right end of the vector.
Then the table is rebuilt from scratch, moving all the real entries to
form a contiguous block starting at index 0 again.


>> And the current set implementation (like the older dict
>> implementation) never needs to "rebuild the table from scratch" unless
>> the cardinality of the set keeps growing.

> It is a bit misleading.  If "the cardinality of the set" means len(S),

Yes.

> set requires the rebuild in low frequency if its items are random.
>
> Anyway, it is a smaller problem than next(iter(D)) because the
> amortized cost is O(1).

For the specific case of FIFO queues, sure.  In general, though, "O(1)
period". is more desirable than "amortized O(1)".  This is especially
true when sets get very large.


> Current dict is not optimized for queue usage.

Nor should it be :-)  That's what collections.deque is for, pushes and
pops, at both ends, always time O(1) period.  No holes, no internal
rearrangements, and O(1) "wasted" space.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5VDU52JX2GFM6546W6FCFKXIODDAQKP4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-20 Thread Tim Peters
[Wes Turner ]
>> How slow and space-inefficient would it be to just implement the set methods
>> on top of dict?

[Inada Naoki ]
> Speed:  Dict doesn't cache the position of the first item.  Calling
> next(iter(D)) repeatedly is O(N) in worst case.
> ...

See also Raymond's (only) message in this thread.  We would also lose
low-level speed optimizations specific to the current set
implementation.  And we would need to define what "insertion order"
_means_ for operators (union, intersection, symmetric difference) that
build a new set out of other sets without a user-level "insert" in
sight.  However that's defined, it may constrain possible
implementations in ways that are inherently slower (for example, may
require the implementation to iterate over the larger operand where
they currently iterate over the smaller).

And the current set implementation (like the older dict
implementation) never needs to "rebuild the table from scratch" unless
the cardinality of the set keeps growing.  As Raymond telegraphically
hinted, the current dict implementation can, at times, in the presence
of deletions, require rebuilding the table from scratch even if the
dict's maximum size remains bounded.

That last can't be seen directly from Python code (rebuilding the
table is invisible at the Python level).  Here's a short example:

d = dict.fromkeys(range(3))
while True:
    del d[0]
    d[0] = None

Run that with a breakpoint set on dictobject.c's `dictresize()`
function.  You'll see that it rebuilds the table from scratch every
3rd time through the loop.  In effect, for the purpose of deciding
whether it needs to rebuild, the current dict implementation sees no
difference between adding a new element and deleting an existing
element.  Deleting leaves behind "holes" that periodically have to be
squashed out of existence so that insertion order can be maintained in
a dead simple way upon future additions.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BVCDKT5ULY324RZATMCEFGAMYOBJFHIZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-19 Thread Tim Peters
[Nick]
> I must admit that I was assuming without stating that a full OrderedSet
> implementation would support the MutableSequence interface.

Efficient access via index position too would be an enormous new
requirement.  My bet:  basic operations would need to change from O(1)
to O(log(N)).

BTW, in previous msgs there are links to various implementations
calling themselves "ordered sets".  One of them supplies O(1)
indexing, but at the expense of making deletion O(N) (!):

https://pypi.org/project/ordered-set/

If efficient indexing is really wanted, then the original "use case"
Larry gave was definitely obscuring an XY problem ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IBRSGUTHIMOZ6JGIYJBQJFXEANFZI4V5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-19 Thread Tim Peters
[David Mertz ]
> It's not obvious to me that insertion order is even the most obvious or
> most commonly relevant sort order.  I'm sure it is for Larry's program, but
> often a work queue might want some other order.  Very often queues
> might instead, for example, have a priority number assigned to them.

Indeed, and it makes me more cautious that claims for the _usefulness_
(rather than just consistency) of an ordered set are missing an XY
problem.  The only "natural" value of insertion-order ordering for a
dynamic ordered set is that it supports FIFO queues.  Well, that's one
thing that a deque already excels at, but much more obviously,
flexibly, and space- and time- efficiently.

For dicts the motivating use cases were much more "static", like
preserving textual order of keyword argument specifications, and
avoiding gratuitous changing of order when round-tripping key-value
encodings like much of JSON.
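Both of those are easy to see in today's Python - a quick, runnable
illustration (nothing new here, just behavior current Python already
guarantees):

    import json

    def f(**kwargs):
        return list(kwargs)

    assert f(z=1, a=2, m=3) == ['z', 'a', 'm']    # textual keyword order kept

    blob = '{"z": 1, "a": 2, "m": 3}'
    assert json.dumps(json.loads(blob)) == blob   # key order round-trips unchanged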

So what problem(s) would a dynamic ordered set really be aiming at?  I
don't know.  Consistency & predictability I understand and appreciate,
but little beyond that.  Even FIFO ordering is somewhat a PITA, since
`next(iter(set))` is an overly elaborate way to need to spell "get the
first" if a FIFO queue _is_ a prime dynamic use case.  And there's no
efficient way at all to access the other end (although I suppose
set.pop() would - like dict.popitem() - change to give a _destructive_
way to get at "the other end").
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/J52ATOHFXNBSEJV6QQZBFXSC2TAHOWGW/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-19 Thread Tim Peters
[Nick]
> I took Larry's request a slightly different way:

Sorry, I was unclear:  by "use case" I had in mind what appeared to me
to be the overwhelming thrust of the _entirety_ of this thread so far,
not Larry's original request.

> he has a use case where he wants order preservation (so built in sets aren't
> good), but combined with low cost duplicate identification and elimination and
> removal of arbitrary elements (so lists and collections.deque aren't good).

Sure.

> Organising a work queue that way seems common enough to me to be
> worthy of a stdlib solution that doesn't force you to either separate a
> "job id" from the "job" object in an ordered dict, or else use an ordered
> dict with "None" values.
>
> So while it means answering "No" to the "Should builtin sets preserve
> order?" question (at least for now), adding collections.OrderedSet *would*
> address that "duplicate free pending task queue" use case.

Only Larry can answer whether that would meet his perceived need.  My
_guess_ is that he wouldn't know OrderedSet existed, and, even if he
did, he'd use a dict with None values anyway (because it's less hassle
and does everything he wanted).

But I have to add that I don't know enough about his original use case
to be sure it wasn't "an XY problem":

https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
"""
The XY problem is asking about your attempted solution rather than
your actual problem.

That is, you are trying to solve problem X, and you think solution Y
would work, but instead of asking about X when you run into trouble,
you ask about Y.
"""

Because I've never had a "job queue" problem where an OrderedSet would
have helped.  Can't guess what's different about Larry's problem.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TV2XVX2NKUGJJZNFKO4TSMEPVQI6Y75Q/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-19 Thread Tim Peters
[Nick Coghlan ]
> Starting with "collections.OrderedSet" seems like a reasonable idea,
> though - that way "like a built-in set, but insertion order preserving" will
> have an obvious and readily available answer, and it should also
> make performance comparisons easier.

Ya, I suggested starting with collections.OrderedSet earlier, but gave up on it.

The problem is that the "use case" here isn't really a matter of
functionality, but of consistency:  "it would be nice if", like dicts
enjoy now, the iteration order of sets was defined in an
implementation-independent way.  That itch isn't scratched unless the
builtin set type defines it.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PM5ENMLR665XG32AS2FEAEUVDG3AFWV6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-17 Thread Tim Peters
[Larry]
> "I don't care about performance" is not because I'm aching for Python to
> run my code slowly.  It's because I'm 100% confident that the Python
> community will lovingly optimize the implementation.

I'm not ;-)

>  So when I have my language designer hat on, I really don't concern myself
> with performance.  As I thought I said earlier in the thread, I think we 
> should
> figure out the semantics we want first, and then we figure out how to make it 
> fast.

Pretty much the opposite of how Python started.  The original lists
and dicts were deliberately designed to have dirt simple
implementations, in reaction against ABC's "theoretically optimal"
data structures that were a nightmare to maintain, and had so much
overhead to support "optimality" in all cases that they were, in fact,
much slower in common cases.

There isn't magic waiting to be uncovered here:  if you want O(1)
deletion at arbitrary positions in an ordered sequence, a doubly
linked list is _the_ way to do it.  That's why, e.g., Raymond said he
still uses a doubly linked list, instead of an ordered dict, in the
LRU cache implementation.  If that isn't clear, a cache can be
maintained in least-to-most recently accessed order with an ordered
dict like so:

if key in cache:
    cached_value = cache.pop(key)  # remove key
else:
    compute cached_value
    assert key not in cache
cache[key] = cached_value  # most recently used at the end now
return cached_value

and deleting the least recently used is just "del
cache[next(iter(cache))]" (off topic, just noting this is a fine use
for the "first()" function that's the subject of a different thread).

We _could_ structure an ordered dict's hash+key+value records as a
doubly linked list instead (and avoid the need for O(N)
reorganizations).  But then we piss away much of the memory savings
(to store the new links) that was the _point_ of compact dicts to
begin with.

So there was a compromise.  No links, deletion on its own is O(1), but
can periodically require O(N) under-the-covers table reorganizations
to squash out the holes.  "Suitable enough" for ordered dicts, but
_thought_ to be unsuitable for ordered sets (which appear to have
higher rates of mixing deletions with insertions - the LRU cache
example being an exception).

But there are also other optimizations in the current set
implementation, so "fine, add the doubly linked list to sets but not
to dicts" is only part of it.

Which may or may not be possible to match, let alone beat, in an
ordered set implementation.  A practical barrier now is that Python is
too mature to bank on loving optimizations _after_ a change to a core
feature is released.  It's going to need a highly polished
implementation first.

I certainly don't object to anyone trying, but it won't be me ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UXWGNLGC3BG2XSMYN5H73FVKP3O2XUUC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-17 Thread Tim Peters
...

[Larry]
>> One prominent Python core developer** wanted this feature for years, and I 
>> recall
>> them saying something like:
>>
>> Guido says, "When a programmer iterates over a dictionary and they see the 
>> keys
>> shift around when the dictionary changes, they learn something!"  To that I 
>> say--"Yes!
>> They learn that Python is unreliable and random!"

[Tim]
> I never wanted ordered dicts, but never objected to them either.  All
> in all, given that _I_ haven't seen a performance degradation, I like
> that they're more predictable, and love the memory savings.

That reads like I was correcting Larry for misquoting _me_.  Sorry!
He wasn't quoting me, and I didn't think he was.  What he did quote
was delightfully snarky enough that I didn't want to snip it, but not
substantial enough to be worth the bother of addressing.  So I just
moved on to summarize what I said at the time (i.e., nothing).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LXJU3EKB2KM3BT6MTGDLWVHTZUUIZGJK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Larry]
> Didn't some paths also get slightly slower as a result of maintaining
> insertion order when mixing insertions and deletions?

I paid no attention at the time.  But in going from "compact dict" to
"ordered dict", deletion all by itself got marginally cheaper.  The
downside was the need to rearrange the whole dict when too many
"holes" built up.  "Compact (but unordered) dict" doesn't need that.

> My recollection is that that was part of the debate--not only "are we going to
> regret inflicting these semantics on posterity, and on other 
> implementations?",
> but also "are these semantics worth the admittedly-small performance hit, in
> Python's most important and most-used data structure?".

There's a layer of indirection in compact dicts - lookups are
distributed across two arrays.  In non-compact unordered dicts,
everything is in a single array.  Cache effects may or may not make a
measurable difference then, depending on all sorts of things.

> Also, I don't recall anything about us resigning ourselves to explicitly
> maintain ordering on dicts because people were relying on it, "battling
> windmills", etc.  Dict ordering had never been guaranteed, a lack
> of guarantee Python had always taken particularly seriously.  Rather, we
> decided it was a desirable feature, and one worth pursuing even at the
> cost of a small loss of performance.

I'm not convinced there was a loss of performance.  The delay between
the implementation supplying ordered dicts, and the language
guaranteeing it, was, I suspect, more a matter of getting extensive
real-world experience about whether the possible new need to massively
rearrange dict internals to remove "holes" would bite someone too
savagely to live with.  But, again, I wasn't paying attention at the
time.

> One prominent Python core developer** wanted this feature for years, and I 
> recall
> them saying something like:
>
> Guido says, "When a programmer iterates over a dictionary and they see the 
> keys
> shift around when the dictionary changes, they learn something!"  To that I 
> say--"Yes!
> They learn that Python is unreliable and random!"

I never wanted ordered dicts, but never objected to them either.  All
in all, given that _I_ haven't seen a performance degradation, I like
that they're more predictable, and love the memory savings.

But as Raymond said (& I elaborated on), there's reason to believe
that the implementation of ordered dicts is less suited to sets, where
high rates of mixing adds and deletes is more common (thus triggering
high rates of massive internal dict rearranging).  Which is why I said
much earlier that I'd be +1 on ordered sets only when there's an
implementation that's as fast as what we have now.

Backing up:

> Python is the language where speed, correctness, and readability trump
> performance every time.

Speed trumping performance didn't make any sense ;-)

So assuming you didn't intend to type "speed", I think you must have,
say, Scheme or Haskell in mind there.  "Practicality beats purity" is
never seen on forums for those languages.  Implementation speed & pain
have played huge roles in many Python decisions.  As someone who has
wrestled with removing the GIL, you should know that better than
anyone ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QK3KERIPYD3Q3XNKZJBQBQ6NUUKT63WN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Tim]
>> If it's desired that "insertion order" be consistent across runs,
>> platforms, and releases, then what "insertion order" _means_ needs to
>> be rigorously defined & specified for all set operations.  This was
>> comparatively trivial for dicts, because there are, e.g., no
>> commutative binary operators defined on dicts.

[Larry]
> My intuition is that figuring out sensible semantics here is maybe not
> trivial, but hardly impossible.  If this proposal goes anywhere I'd be
> willing to contribute to figuring it out.

No, it _is_ easy.  It's just tedious, adding piles of words to the
docs, and every constraint also constrains possible implementations.
You snipped my example of implementing union, which you should really
think about instead ;-)


>> If you want to insist that `a | b` first list all the elements of a,
>> and then all the elements of b that weren't already in a, regardless
>> of cost, then you create another kind of unintuitive surprise:  in
>> general the result of "a | b" will display differently than the result
>> of "b | a" (although the results will compare equal), and despite that
>> the _user_ didn't "insert" anything.

> Call me weird--and I won't disagree--but I find nothing unintuitive about 
> that.
>  After all, that's already the world we live in: there are any number of sets
> that compare as equal but display differently.  In current Python:
>
> >>> a = {'a', 'b', 'c'}
> >>> d = {'d', 'e', 'f'}
> >>> a | d
> {'f', 'e', 'd', 'a', 'b', 'c'}
> >>> d | a
> {'f', 'b', 'd', 'a', 'e', 'c'}

Yup, it happens.  But under the sample union implementation I gave, it
would never happen when the sets had different cardinalities (the
different sizes are used to force a "standard" order then).  For
mathematical sets, | is commutative (it makes no difference to the
result if the arguments are swapped - but can make a _big_ difference
to implementation performance unless the implementation is free to
pick the better order).

> ...
> This is also true for dicts, in current Python, which of course do maintain
> insertion order.  Dicts don't have the | operator, so I approximate the
> operation by duplicating the dict (which AFAIK preserves insertion order)

Ya, it does, but I don't believe that's documented (it should be).

> and using update.

Too different to be interesting here - update() isn't commutative.
For sets, union, intersection, and symmetric difference are
commutative.

> ...
> Since dicts already behave in exactly that way, I don't think it would be too
> surprising if sets behaved that way too.  In fact, I think it's a little 
> surprising
> that they behave differently, which I suppose was my whole thesis from
> the beginning.

I appreciate that dicts and sets behave differently in visible ways
now.  It just doesn't bother me enough that I'm cool with slowing set
operations to "fix that".
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/L2VGSHROPMW4BV6ORYMFKQ4ZJ5AXTZLE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Petr Viktorin ]
> ...
> Originally, making dicts ordered was all about performance (or rather
> memory efficiency, which falls in the same bucket.) It wasn't added
> because it's better semantics-wise.

As I tried to flesh out a bit in a recent message, the original
"compact dict" idea got all the memory savings, but did _not_ maintain
insertion order.  Maintaining insertion order, too, complicated
deletions (see recent message), but was deliberately done because
people really did want "ordered dicts".

> Here's one (very simplified and maybe biased) view of the history of dicts:
>
> * 2.x: Dicts are unordered, please don't rely on the order.
> * 3.3: Dict iteration order is now randomized. We told you not to rely
> on it!
> * 3.6: We now use an optimized implementation of dict that saves memory!
> As a side effect it makes dicts ordered, but that's an implementation
> detail, please don't rely on it.
> * 3.7: Oops, people are now relying on dicts being ordered. Insisting on
> people not relying on it is battling windmills. Also, it's actually
> useful sometimes, and alternate implementations can support it pretty
> easily. Let's make it a language feature! (Later it turns out
> MicroPython can't support it easily. Tough luck.)

A very nice summary!  My only quibble is as above:  the "compact dict"
implementation doesn't maintain insertion order naturally, _unless_
there are no deletions (which is, e.g., true of dicts constructed to
pass keyword arguments).  The code got hairier to maintain insertion
order in the presence of mixing insertions and deletions.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TRVOCOLQGSOM2OLLAI3UPRCTFIKIWWH6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Raymond]
> ...
> * The ordering we have for dicts uses a hash table that indexes into a 
> sequence.
> That works reasonably well for typical dict operations but is unsuitable for 
> set
> operations where some common use cases make interspersed additions
> and deletions (that is why the LRU cache still uses a cheaply updated
> doubly-linked list rather than deleting and reinserting dict entries).

I'm going to take a stab at fleshing that out a bit:  ideally, an
ordered dict stores hash+key+value records in a contiguous array with
no "holes".  That's ("no holes") where the memory savings comes from.
The "holes" are in the hash table, which only hold indices _into_ the
contiguous array (and the smallest width of C integer indices
sufficient to get to every slot in the array).

"The problem" is that deletions leave _two_ kinds of holes:  one in
the hash table, and another in the contiguous vector.  The latter
holes cannot be filled with subsequent new hash+key+value records
because that would break insertion order.

So in an app that mixes additions and deletions, the ordered dict
needs to be massively rearranged at times to squash out the holes left
behind by deletions, effectively moving all the holes to "the end",
where they can again be used to reflect insertion order.
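To put a toy picture on that (illustration only - the real thing is C,
and the index values here are hand-picked rather than computed from
actual hash slots):

    indices = [None, 0, None, 2, None, 1, None, None]  # sparse hash table of entry indexes
    entries = [                                         # dense, insertion-ordered records
        (hash('a'), 'a', 1),
        (hash('b'), 'b', 2),
        (hash('c'), 'c', 3),
    ]

    # Deleting 'b' leaves two holes: one in the hash table, one in the dense
    # array.  The latter can't be reused without breaking insertion order, so
    # a periodic rebuild squashes the holes back out to the end.
    entries[1] = None
    indices[5] = "DUMMY"
    entries = [e for e in entries if e is not None]     # what a rebuild does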

Unordered dicts never had to be rearranged unless the total size
changed "a lot", and that remains true of the set implementation.  But
in apps mixing adds and deletes, ordered dicts can need massive
internal rearrangement at times even if the total size never changes
by more than 1.

Algorithms doing a lot of mixing of adds and deletes seem a lot more
common for sets than for dicts, and so the ordered dict's basic
implementation _approach_ is a lot less suitable for sets.  Or, at
least, that's my best attempt to flesh out Raymond's telegraphic
thinking there.

Note:  the holes left by deletions _wouldn't_ be "a problem" _except_
for maintaining insertion order.  If we were only after the memory
savings, then on deletion "the last" record in the contiguous array
could be moved into the hole at once, leaving the array hole-free
again.  But that also changes the order.  IIRC, that's what the
original "compact dict" _did_ do.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QQXSSJWHKTUNHSMSHVM7XLMDBMUV7BDX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Tim]
> BTW, what should
>
> {1, 2} | {3, 4, 5, 6, 7}
>
> return as ordered sets?  Beats me.

[Larry]
> The obvious answer is {1, 2, 3, 4, 5, 6, 7}.

Why?  An obvious implementation that doesn't ignore performance entirely is:

def union(smaller, larger):
    if len(larger) < len(smaller):
        smaller, larger = larger, smaller
    result = larger.copy()
    for x in smaller:
        result.add(x)
    return result

In the example, that would first copy {3, 4, 5, 6, 7}, and then add 1
and 2 (in that order) to it, giving {3, 4, 5, 6, 7, 1, 2} as the
result.
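You can see the same thing today by letting dicts (which do preserve
insertion order) stand in for ordered sets - a sketch only, not a
proposal for how sets would actually be implemented:

    def union(smaller, larger):
        if len(larger) < len(smaller):
            smaller, larger = larger, smaller
        result = dict(larger)       # copy the larger operand
        result.update(smaller)      # then add the smaller one's elements
        return result

    a = dict.fromkeys([1, 2])
    b = dict.fromkeys([3, 4, 5, 6, 7])
    print(list(union(a, b)))        # [3, 4, 5, 6, 7, 1, 2], not [1, 2, 3, ...]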

If it's desired that "insertion order" be consistent across runs,
platforms, and releases, then what "insertion order" _means_ needs to
be rigorously defined & specified for all set operations.  This was
comparatively trivial for dicts, because there are, e.g., no
commutative binary operators defined on dicts.

If you want to insist that `a | b` first list all the elements of a,
and then all the elements of b that weren't already in a, regardless
of cost, then you create another kind of unintuitive surprise:  in
general the result of "a | b" will display differently than the result
of "b | a" (although the results will compare equal), and despite that
the _user_ didn't "insert" anything.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OU37ZU46BCHI6HLA7E3NEWCDOLQOHRNF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-16 Thread Tim Peters
[Guido]
> ...
> the language should not disappoint them, optimization opportunities be damned.

I would like to distinguish between two kinds of "optimization
opportunities":  theoretical ones that may or may not be exploited
some day, and those that CPython has _already_ exploited.

That is, we don't have a blank slate here.  As Raymond said, the set
implementation already diverged from the dict implementation in
fundamental ways for "go fast" reasons.  How much are people willing
to see set operations slow down to get this new feature?

For me, "none" ;-)  Really.  I have no particular use for "ordered"
sets, but have set-slinging code that benefits _today_ from the "go
fast" work Raymond did for them.

Analogy:  it was always obvious that list.sort() is "better" stable
than not, but I ended up crafting a non-stable samplesort variant that
ran faster than any stable sort anyone tried to write.  For years.  So
we stuck with that, to avoid slowing down sorting across releases.

The stable "timsort" finally managed to be _as_ fast as the older
samplesort in almost all cases, and was very much faster in many
real-world cases.  "Goes faster" was the thing that really sold it,
and so much so that its increased memory use didn't count for much
against it in comparison.

Kinda similarly, "ordered dicts" were (as has already been pointed
out) originally introduced as a memory optimization ("compact" dicts),
where the "ordered" part fell out more-or-less for free.  The same
approach wouldn't save memory for sets (or so I'm told), so the
original motivation for compact dicts doesn't carry over.

So I'm +1 on ordered sets if and only if there's an implementation
that's at least as fast as what we already have.  If not now, then,
like ordered dicts evolved, offer a slower OrderedSet type in the
`collections` module for those who really want it, and wait for magic
;-)

BTW, what should

{1, 2} | {3, 4, 5, 6, 7}

return as ordered sets?  Beats me.  The imbalance between the
operands' cardinalities can be arbitrarily large, and "first make a
copy of the larger one, then loop over the smaller one" is the obvious
way to implement union to minimize the number of lookups needed.  The
speed penalty for, e.g., considering "the left" operand's elements'
insertion orders to precede all "the right" operand's elements'
insertion orders can be arbitrarily large.

The concept of "insertion order" just doesn't make well-defined sense
to me for any operation that produces a set from multiple input sets,
unless it means no more than "whatever order happens to be used
internally to add elements to the result set".  Dicts don't have
anything like that, although dict.update comes close, but in that case
the result is defined by mutating the dict via a one-at-a-time loop
over the argument dict.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YXSGUPZYTO7TKOVVU32276M54TMITVVQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Should set objects maintain insertion order too?

2019-12-15 Thread Tim Peters
[Larry Hastings ]
> As of 3.7, dict objects are guaranteed to maintain insertion order.  But set
> objects make no such guarantee, and AFAIK in practice they don't maintain
> insertion order either.

If they ever appear to, it's an accident you shouldn't rely on.

>  Should they?

From Raymond, 22 Dec 2017:

https://twitter.com/raymondh/status/944454031870607360
"""
Sets use a different algorithm that isn't as amendable to retaining
insertion order. Set-to-set operations lose their flexibility and
optimizations if order is required. Set mathematics are defined in
terms of unordered sets. In short, set ordering isn't in the immediate
future.
"""

Which is more an answer to "will they?" than "should they?" ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WMIZRFJZXI3CD5YMLOC5Z3LXVXD7HM4R/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Macros instead of inline functions?

2019-12-04 Thread Tim Peters
[Skip Montanaro ]
> ...
> I don't think stable code which uses macros should be changed (though
> I see the INCREF/DECREF macros just call private inline functions, so
> some conversion has clearly been done). Still, in new code, shouldn't
> the use of macros for more than trivial use cases (constant defs,
> simple one-liners) be discouraged at this point?

Within reason, I think so.  Like C's old "register" declaration,
compilers will eventually evolve to make better decisions about what
should be done than humans generally do.

But there are macros that exist just to reduce repetitive, error-prone
typing, and others that set up control flow (like the trashcan macros,
or listobject.c's IFLT).  Which is the "within reason" part above.
There are still fine uses for macros in C.

It's just that faking inline functions is no longer one of them ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KGU63DELU4YJTVFALCP55SXQIJ2QN5WJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Compacting the Uncompactable

2019-09-28 Thread Tim Peters
Short course:  a replacement for malloc for use in contexts that can't
"move memory" after an address is passed out, but want/need the
benefits of compactification anyway.

Key idea:  if the allocator dedicates each OS page to requests of a
specific class, then consider two pages devoted to the same class,
where `-` is free space, and X and Y are allocated chunks:

page1: X--XX-X---X
page2: --Y--Y--YY-

Because there's no overlap in the offsets of allocated blocks here, we
can copy the 4 Ys into page1 (or Xs into page2).  Then tell the OS to
change its page tables so that the virtual addresses of page1 and
page2 _both_ map to the physical page page1 referred to.  page2's
physical page can be returned to the OS then.

No _virtual_ address malloc ever handed out needs to change.

The rest is making this all safe & "go fast".
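A toy sketch of just the "meshing" decision (nothing here touches real
page tables; the offsets simply mirror the picture above):

    def meshable(used1, used2):
        """Two pages of the same size class can mesh iff no offset is live in both."""
        return not (used1 & used2)

    def mesh(page1, page2):
        # Copy page2's live chunks into page1's free slots; the real allocator
        # then remaps page2's virtual address onto page1's physical page and
        # returns page2's physical page to the OS.
        page1.update(page2)

    page1 = {0: 'X', 3: 'X', 4: 'X', 6: 'X', 10: 'X'}   # X--XX-X---X
    page2 = {2: 'Y', 5: 'Y', 8: 'Y', 9: 'Y'}            # --Y--Y--YY-
    if meshable(set(page1), set(page2)):
        mesh(page1, page2)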

On Sat, Sep 28, 2019 at 2:06 PM MRAB  wrote:
>
> Here's a video about memory fragmentation and compaction that you might
> find interesting:
>
> "Compacting the Uncompactable" by Bobby Powers
> https://www.youtube.com/watch?v=c1UBJbfR-H0
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/G6OR45TETKIFZDVWAK5ZGLLFTIC422TG/
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VMP5RXVQV574SXNJC5SNIJNAL3PYZZC6/


[Python-Dev] Re: Spammy stuff (was Re: congrats on 3.5! Alas, windows 7 users are having problems installing it)

2019-09-15 Thread Tim Peters
[Tim]
> While python-dev has several "official" moderators, best I can tell
> I'm the only one who has reviewed these messages for years.

I should clarify that!  That's not meant to be a dig at the other
moderators.  I review everything because I'm retired and am near the
computer many hours almost every day.  That's been so for at least a
decade.  So the others rarely get a chance - the queue is very likely
empty if they ever look.  After a few years of that, they may well
stop looking ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZZXYE2PPVEURQ23CEOFYS5YPBJR5BDDT/


[Python-Dev] Re: Spammy stuff (was Re: congrats on 3.5! Alas, windows 7 users are having problems installing it)

2019-09-15 Thread Tim Peters
[This is about the mailing list, not about Python development]

That python-dev-owner has gotten two complaints about this message so
far suggests I should explain what's going on ;-)

New list members are automatically moderated.  Their posts sit in a
moderation queue waiting for moderator action.

While python-dev has several "official" moderators, best I can tell
I'm the only one who has reviewed these messages for years.

So - yup! - I approved this spam.  The Subject was legit, the email
address was from Google rather than, e.g., some obscure Russian host,
and the link was to a site with an obviously computer-related name.
"Good enough".  Usually, but not in this case.

So, sorry, but sometimes one of these just will slip through.  I've
since fiddled the list to auto-discard future posts from this address,
but have no ability to purge the spam from the list archives.

There are other not-spam but "low quality" messages I've let through
too, since the list migrated to Mailman 3.  That's because we lost
some very useful moderator functionality (which I've griped about on a
Mailman 3 issue tracker):

- It's no longer possible to forward a message from the queue to a
different address.  So, e.g., rather than throw them away, I've let
through some newbie messages that I would previously have forwarded to
the Python help list.

- It's no longer possible to write custom text for a "Reject" action.
So, again, rather than throw away messages that should obviously be
posted to a different list, with the author getting nothing but a
generic explanation-free "rejected" message, I've let them through.

Yes, I _could_ use my own my email account and copy/paste/fiddle to
forward/reply in such cases, independent of Mailman functionality.
And sometimes I do.  But there's only so much time I can give to this,
and some days I settle for clearing the queue just as quickly as
Tim-ly possible.

Some of that will get better if/when Mailman 3 restores lost
functionality, but I doubt anything can be done that stops spam 100%
short of erring on the side of rejecting marginally legit messages
too.  But that's not how my brain is wired, so you would need a
different moderator if that's what you want.

In the meantime, I'll continue cashing my lavish moderator checks and
just laugh at complaints ;-)

On Sun, Sep 15, 2019 at 12:21 AM  wrote:
>
> Check this article here which will help you out.
> [useless link]
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/3DPJTBS252J3XX5FMFRXTOFWYIUDEO6H/


[Python-Dev] Re: PEP 572 TargetScopeError

2019-08-09 Thread Tim Peters
[Guido]
> I don't see how this debate can avoid a vote in the Steering Council.

FWIW, I found Nick's last post wholly persuasive:  back off to
SyntaxError for now, and think about adding a more specific exception
later for _all_ cases (not just walrus) in which a scope conflict
isn't allowed (e.g., his example of declaring a function's formal
argument `global` in the function).
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7ILVNCFSYMJ4HP5INVA34N4DVEKSEYAU/


[Python-Dev] Re: PEP 572 TargetScopeError

2019-08-08 Thread Tim Peters
[Barry Warsaw ]
> bpo-37757: https://bugs.python.org/issue37757

Really couldn't care less whether it's TargetScopeError or
SyntaxError, but don't understand the only rationale given here for
preferring the latter:

> To me, “TargetScopeError” is pretty obscure and doesn’t give users an
> obvious clue as to what’s going wrong, without searching the interwebs.

Whereas SyntaxError would give no clue whatsoever, and nothing useful
to search for.  In contrast, a search for TargetScopeError would
presumably find a precisely relevant explanation as the top hit
(indeed, it already does today).

I don't care because it's unlikely an error most people will make at
all - or, for those too clever for their own good, make more than once
;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WBWCV7PBDYY6PVSXIX2QX7GKXY54PREU/


[Python-Dev] Re: Can someone please finish merging GH-13482?

2019-08-07 Thread Tim Peters
[Brett Cannon ]
> We probably need to update https://devguide.python.org/committing/ to
> have a step-by-step list of how to make a merge works and how to
> handle backports instead of the wall of text that we have. (It's already
> outdated anyway, e.g. `Misc/ACKS` really isn't important as git itself
> records the author of the commit and so that can be used with
> `Misc/ACKS` for old commits to gather the list of folks who have"
> contributed.)

Don't put too much weight on my screwups ;-)  I was appalled to hear
that the OP's contribution was still sitting unmerged, and was in a
hurry to resolve that at a time I _had_ no significant time to give to
it.

Mariatta and Terry Reedy finished up what I left undone, so in the end
it's all good :-)

And there's a problem with the GitHub workflow docs that may be unique
to me:  we have helpful layers of automation, but they're shortening a
git workflow I don't understand even without the automation.  With the
automation, I'm just doubly clueless.

That's because I'm old.  My capacity to give a rip about source
control system quirks was apparently entirely used up when I spent
several weeks mastering the intricacies of Mercurial.  Try as I might,
I just haven't been able to force myself to become competent with git.
It's not that I disapprove of git!  It's apparently more that we're
each born with a finite capacity for being _able_ to learn Yet Another
New Source Control System, and mine was used up on YANSCS #8 ;-)

aging-isn't-for-the-optimistic-ly y'rs  - tim
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ACAJVCPHE2N5RNI4BNL2JK6INMJRPZ2N/


[Python-Dev] Re: Can someone please finish merging GH-13482?

2019-08-06 Thread Tim Peters
[Mariatta ]
>> - Since this is a 1st-time contributor, does it need a change to the ACKS
>> file?
>
> I think the change is trivial enough, the misc/acks is not necessary.

>> - Anything else?
>
> 1. Does it need to be backported? If so, please add the "needs backport to
> .." label.
>
> 2. Add the "🤖 automerge" label. The bot will take care of merging. It
> will use the PR title and description as commit message.
> If the PR title/description is not good enough as commit message, edit it
> first before adding the "🤖 automerge" label.

>> So I'll watch and do it myself next time ;-)
>
> Hopefully the above instructions are clear and simple enough, so I'll let
> you trigger the automerge this time.

So I tried ;-)  No idea what did and didn't work.  I got several of these
messages:

"""
Sorry, I can't merge this PR. Reason: Base branch was modified. Review and
try the merge again..
"""

I'm guessing (don't know) those are about failed auto-backports.  I don't
have time now to try to figure it all out - it seemed to have spawned a
number of new GH PRs for the backports, and I'm thoroughly lost now :-(

The change did show up when I pulled from upstream master, so I expect that
part did work.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SHIDYDSL43D6ANHGBCGVPMHSGCU5XK2P/


[Python-Dev] Can someone please finish merging GH-13482?

2019-08-06 Thread Tim Peters
https://github.com/python/cpython/pull/13482

is a simple doc change for difflib, which I approved some months ago.
But I don't know the current workflow well enough to finish it myself.
Like:

- Does something special need to be done for doc changes?

- Since this is a 1st-time contributor, does it need a change to the ACKS file?

- Anything else?

I'm sure this is all obvious after you've done it once, but I haven't.
So I'll watch and do it myself next time ;-)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TFTQCJ7IZNCE3XUWEC4IVHPXBDZIKVEI/


  1   2   3   4   5   6   7   8   9   10   >