[issue42422] Py_Decref on value crash the interpreter in Python/ceval.c:1104

2020-11-21 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

From https://github.com/python/cpython/blob/master/Lib/test/crashers/bogus_code_obj.py:

"""
Broken bytecode objects can easily crash the interpreter.
This is not going to be fixed.  It is generally agreed that there is no
point in writing a bytecode verifier and putting it in CPython just for
this.  Moreover, a verifier is bound to accept only a subset of all safe
bytecodes, so it could lead to unnecessary breakage.
For security purposes, "restricted" interpreters are not going to let
the user build or load random bytecodes anyway.  Otherwise, this is a
"won't fix" case.
"""

import types

co = types.CodeType(0, 0, 0, 0, 0, b'\x04\x71\x00\x00',
    (), (), (), '', '', 1, b'')
exec(co)
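For contrast, here is a minimal sketch (mine, not from the crasher file) of the supported path, where compile() builds a well-formed code object that exec() can run safely; the crasher instead hands types.CodeType arbitrary bytes, which CPython does not verify.

```python
import types

# The supported path: compile() produces a well-formed code object.
co = compile("x = 1 + 1", "<demo>", "exec")
assert isinstance(co, types.CodeType)

# exec() on a compiler-built code object is safe.
ns = {}
exec(co, ns)
assert ns["x"] == 2
```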

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue42422>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42334] Subclassing int and complex with keyword arguments weird

2020-11-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Here's an example:

class A(complex):
    def __init__(self, *args, test):
        self.test = test
    def __new__(cls, *args, test):
        return super().__new__(cls, *args)

>>> a = A(1, test="TEST")
>>> a
(1+0j)
>>> a.test
'TEST'

>>> b = A(1, 1, test="TEST2")
>>> b
(1+1j)
>>> b.test
'TEST2'

--

___
Python tracker 
<https://bugs.python.org/issue42334>



[issue42334] Subclassing int and complex with keyword arguments weird

2020-11-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

This is because int, str, and complex are immutable. If I have 

class MyInt(int):
    def __init__(self, stuff):
        pass

then when I call MyInt("19"), the string "19" is passed to the constructor 
int.__new__ before the overridden initializer MyInt.__init__ ever runs. The only 
way to change which value gets stored is to implement MyInt.__new__ and override 
the int constructor itself.

This is not a bug.
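A minimal sketch of that workaround (my example, not from the report): intercept the argument in __new__, since for an immutable type that is where the value is actually fixed. The "0x"-prefix handling is just an illustrative transformation.

```python
class MyInt(int):
    def __new__(cls, stuff):
        # int is immutable, so the stored value must be chosen here,
        # not in __init__. As a demo, accept hex strings like "0x10".
        if isinstance(stuff, str) and stuff.startswith("0x"):
            return super().__new__(cls, stuff, 16)
        return super().__new__(cls, stuff)

assert MyInt("19") == 19
assert MyInt("0x10") == 16
```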

--
nosy: +Dennis Sweeney




[issue41972] bytes.find consistently hangs in a particular scenario

2020-11-06 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

>  Toward that end it would be fine to use "very high" cutoffs, and save tuning 
> for later.

This feels reasonable to me -- I changed the cutoff to the more cautious 
`if (m >= 100 && n - m >= 5000)`, where the averages are very consistently faster 
by my measurements, and it seems that Tal confirms that, at least for the 
`m >= 100` part. More tuning may be worth exploring later, but this seems pretty 
safe for now, and it should fix all of the truly catastrophic cases like in the 
original post.

--

___
Python tracker 
<https://bugs.python.org/issue41972>



[issue41972] bytes.find consistently hangs in a particular scenario

2020-11-06 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I used the following script, and the raw results are attached.


import pyperf
from itertools import product  # this import is missing in the original listing

# zipf_string (defined elsewhere, not shown here) builds a random string
# of the given length with a Zipf-like character distribution.

runner = pyperf.Runner()

ms = [12, 16, 24, 32, 48, 64, 96, 128,
      192, 256, 384, 512, 768, 1024, 1536]

ns = [2000, 3000, 4000, 6000, 8000,
      12000, 16000, 24000, 32000, 48000,
      64000, 96000]

for n, m in product(ns, ms):
    runner.timeit(f"needle={m}, haystack={n}",
                  setup="needle = zipf_string(m); haystack = zipf_string(n)",
                  stmt="needle in haystack",
                  globals=locals() | globals())
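The script depends on a zipf_string helper that isn't shown in the message. A plausible stand-in (my assumption about its behavior, inferred purely from the name) would be:

```python
import random

def zipf_string(length, alphabet="abcdefghijklmnopqrstuvwxyz", seed=None):
    # Hypothetical reconstruction: draw characters with Zipf-like
    # weights (the i-th letter has weight 1/(i+1)), so a few characters
    # dominate, roughly as in natural text.
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(len(alphabet))]
    return "".join(rng.choices(alphabet, weights=weights, k=length))

s = zipf_string(1000, seed=42)
assert len(s) == 1000
```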

--
Added file: https://bugs.python.org/file49578/length_benchmarks.txt




[issue41972] bytes.find consistently hangs in a particular scenario

2020-11-06 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I am attempting to better understand the performance characteristics to 
determine where a cutoff should go.

Attached is a colorful table of benchmarks comparing the existing algorithm 
against the PR with the cutoff changed to `if (1)` (always two-way) or `if (0)` 
(always the status quo), tested on a variety of needle lengths and haystack 
lengths.

--
Added file: https://bugs.python.org/file49577/Table of benchmarks on lengths.jpg




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-23 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Here are those zipf-distributed benchmarks for PR 22904: 
https://pastebin.com/raw/qBaMi2dm

Ignoring differences of <5%, there are 33 cases that got slower, but 477 cases 
that got faster.

Here's a stringbench.py run for PR 22904: https://pastebin.com/raw/ABm32bA0

It looks like the stringbench times get a bit worse on a few cases, but I would 
attribute that to the benchmarks having many "difficult" cases with a unique 
character at the end of the needle, such as:

s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E"),

which the status quo already handles as well as possible, whereas the PR best 
handles the case where some middle "cut" character is unique. Who knows how 
common these cases are.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-23 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

This is the approach in the PR:

# JUMP_BOTH
while ...:
    if haystack[j + cut] != needle[cut]:
        # Sunday skip
        j += table[haystack[j + len(needle)]]
        continue
    j += rest_of_the_two_way_algorithm()

If I understand correctly, this is what you're suggesting:

# JUMP_MAX
while ...:
    shift = rest_of_the_two_way_algorithm()
    j += max(shift, table[haystack[j + len(needle)]])

I implemented this, and on my Zipf benchmark, JUMP_BOTH was faster on 410 cases 
(at most 1.86x faster), while JUMP_MAX was faster on 179 cases (at most 1.3x 
faster).

Some things to notice about the JUMP_BOTH approach:

* The if-statement is actually the first step of the 
rest_of_the_two_way_algorithm, so there's no extra comparison over the pure 
two-way algorithm.

* The if-block only executes in situations where the two-way algorithm would 
say to skip ahead by only 1. In all other situations, the two-way algorithm 
skips by at least 2.

The typical purpose of the Sunday skip (as far as I can tell) is that in 
Sunday's algorithm, we find a mismatch, then have no a priori idea of how far 
ahead to skip, other than knowing that it has to be at least 1, so we check the 
character 1 to the right of the window. Another way of thinking about this 
would be to increment the window by 1, look at the last character in the new 
window, and jump ahead to line it up.
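The plain Sunday skip described above can be sketched as runnable Python (illustrative only; the PR is C, and the hybrid with two-way is not shown here):

```python
def sunday_find(haystack, needle):
    # Sunday's bad-character rule: on a mismatch, look at the character
    # just right of the current window; shift so its last occurrence in
    # the needle lines up, or past it entirely (m + 1) if it never occurs.
    m, n = len(needle), len(haystack)
    shift = {c: m - i for i, c in enumerate(needle)}
    j = 0
    while j <= n - m:
        if haystack[j:j + m] == needle:
            return j
        if j + m >= n:
            return -1
        j += shift.get(haystack[j + m], m + 1)
    return -1

assert sunday_find("hello world", "world") == 6
```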

However, with the hybrid method of the PR, we *do* have some a priori skip, 
bigger than 1, known before we check the table. So instead of doing the maximum 
of the two-way skip and the Sunday skip, why not do both? As in: do the two-way 
shift, then extend that with a Sunday shift in the next iteration.

I tried another version also with this in mind, something like:

# JUMP_AFTER
while ...:
    # Guaranteed to jump to a mismatched position
    j += rest_of_the_two_way_algorithm() - 1
    j += table[haystack[j + len(needle)]]

But this resulted in slightly-worse performance: JUMP_BOTH was faster in 344 
cases (at most 1.9x faster), while JUMP_AFTER was faster in 265 cases (at most 
1.32x faster). My guess for the explanation of this is that since most of the 
time is spent on Sunday shifts lining up the cut character, it's beneficial for 
the compiler to generate specialized code for that case.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-22 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I attached a new PR, with a lot of the same ideas.

The major differences between this and the last PR:

* The first character to be checked at each alignment is the first character of 
the right half, rather than the last.

* If that first character does not match, then the character immediately 
following the window is looked up in the table, and we jump forward accordingly 
(Sunday's trick).

I'll post some more benchmarks soon, but preliminarily, it seems like this 
swapping of which character gets matched first is better for some needles and 
worse for others, which makes sense. Stringbench.py, for example, has some 
needles with a unique character at the end, which favors the status quo and the 
old PR, but other strings work better with this PR.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-22 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +21836
pull_request: https://github.com/python/cpython/pull/22904




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-19 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Unfortunately, due to licensing issues, it looks like I'll have to ditch the PR 
and start from scratch: it was too heavily based on the glibc implementation, 
which has a stricter license. It may take a good deal of time before I have 
a potential replacement, but I'll try to work on a "clean-room" implementation. 
Lesson learned!

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I posted the example thinking that having a concrete walkthrough might be good 
for discussion, and it looks like it was. ;-)

This makes me curious how a simplified-but-not-as-simplified-as-the-status-quo 
Sunday algorithm would fare: using the Sunday algorithm, but with a shift 
lookup modulo 128 rather than a boolean lookup. It might not be provably O(n), 
but it might be faster for some cases. Although if the common case is already 
fast enough, maybe we do want a provably-linear algorithm for the big stuff.
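That middle-ground table might look like this sketch (my reading of the idea: per-character shifts rather than a boolean bit-set, folded into a fixed 128-entry table):

```python
def sunday_table_mod128(needle):
    # Fixed-size shift table: slot i holds the Sunday shift for every
    # character whose code is congruent to i mod 128. A collision can
    # only make a shift smaller (more conservative), never unsafe,
    # because later needle positions overwrite earlier ones.
    m = len(needle)
    table = [m + 1] * 128  # absent character: skip past the window
    for i, c in enumerate(needle):
        table[ord(c) % 128] = m - i
    return table

t = sunday_table_mod128("abc")
assert t[ord("c") % 128] == 1   # last character: minimal shift
assert t[ord("z") % 128] == 4   # absent character: m + 1
```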

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

> Dennis, I think that's expected, right? Two-way on its own can exploit 
> nothing about individual characters - it only preprocesses the needle to 
> break the possibility for quadratic-time behavior due to periods in the 
> needle.

Yes, that's correct.

> It sounds like you switched the current code's Sunday-ish approach to a more 
> Boyer-Moore-ish approach. That probably hurt, and especially for shorter 
> needles (where Sunday can yield a maximum shift of len(needle)+1, not 
> len(needle) - the "+1" is more significant the smaller len(needle)).

I'm definitely open to switching things up, but I'm not totally sure there 
would be a difference. In Sunday, the pattern is somehow scanned at each step, 
and then upon finding a mismatch, we look up in the table the first character 
outside the window. With the PR as written, when looking up in the table the 
last character of the window, we haven't actually produced a mismatch yet; the 
'continue' statements jump ahead *until* the last characters match (mod 128), 
and then the two-way scanning begins, which determines how the window gets 
shifted after that. But after this two-way-determined shift, the last character 
in the new window immediately gets looked up again in the table at the top of 
the loop, effectively letting the two-way shift immediately be extended to a 
longer shift. I suppose that the line `j += i - suffix + 1;` (which is very 
often an increment by 1) could be changed to a table lookup of the character 
just beyond the window, but I don't quite understand why that would be better 
than incrementing j and then doing the table lookups at the top of the loop.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

FWIW, one of the "# Made the spaces line up" is actually a "skip ahead by the 
needle length".

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Below is one of the tests that got run when I happened to import something, and 
I thought it was a good illustration of the Boyer-Moore bad character shift 
table.

It's worth noting in particular that the table is the dominant force for speed 
in some common cases, with the two-way stuff only ever being checked once in 
this example. The shift table can be defeated with pathological strings, and 
that's where the two-way stuff begins to shine.

Checking " 32 bit (ARM)" in "3.10.0a1+ (heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]".

Two-way with needle=" 32 bit (ARM)" and haystack="(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
Split " 32 bit (ARM)" into " 32 bit" and " (ARM)".
needle is NOT completely periodic.
Using period 8.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)"
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)"
Table says shift by 11.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the '3's line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Here made the spaces line up
Last character not found in string.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Checking the right half.
No match.
Jump forward without checking left half.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 5.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 10.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the spaces line up
Table says shift by 4.
> "(heads/two-way-dirty:cf4e398e94, Oct 18 2020, 20:09:21) [MSC v.1927 32 bit (Intel)]"
> " 32 bit (ARM)" # Made the lparens line up
Last character not found in string.
Reached end. Returning -1.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-17 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

> But there _also_ seem to be real (but much smaller) benefits
> for the "search backward" cases, which I don't recall seeing
> when I tried it.  Do you have a guess as to why?

I did change `skip = mlast - 1;` to `skip = mlast;` as you had pointed out.
So it could be that, or just a compiler/benchmark ghost.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-17 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I added the cutoff for strings >= 10 characters, and I converted the PR from a 
draft to "Ready to Review."

When running stringbench.py before and after the PR, I get these results:

Summary:
Unicode Before: 81.82   Bytes Before: 92.62
Unicode After:  64.70   Bytes After:  62.41

Full results here:
https://pastebin.com/raw/DChzMjhH

And on the random zipf benchmarks: 

Summary:
14 cases slower (median 1.16x slower, at most 1.52x slower)
601 cases faster (median 2.15x faster, at most 21.94x faster)

Full results here:
https://pastebin.com/raw/ez6529Bp

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-16 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I'm doing a couple more timing tests to try to understand exactly when the 
cutoff should be applied (based on some combination of needle and haystack 
lengths).

Can the rolling hash algorithm be made to go sublinear like O(n/m)? It looked 
like it was pretty essential that it hash/unhash each and every haystack 
character. You could probably do a bloom-like check where you jump ahead by the 
needle length sometimes, but you'd then have to re-hash the m characters in the 
new window anyway.
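For reference, a sketch of the rolling-hash (Rabin-Karp style) scan under discussion, which shows why it touches every haystack character (illustrative Python; parameter names are mine):

```python
def rabin_karp(haystack, needle, base=256, mod=(1 << 31) - 1):
    # Every haystack character is hashed in and later hashed out, so the
    # scan is Theta(n): leaping ahead by the needle length would force a
    # re-hash of all m characters of the new window anyway.
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return -1 if m else 0
    high = pow(base, m - 1, mod)
    h_needle = h_window = 0
    for i in range(m):
        h_needle = (h_needle * base + ord(needle[i])) % mod
        h_window = (h_window * base + ord(haystack[i])) % mod
    for j in range(n - m + 1):
        if h_window == h_needle and haystack[j:j + m] == needle:
            return j
        if j + m < n:
            # Slide the window: drop haystack[j], append haystack[j+m].
            h_window = ((h_window - ord(haystack[j]) * high) * base
                        + ord(haystack[j + m])) % mod
    return -1

assert rabin_karp("hello world", "world") == 6
```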

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-14 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

The most recent batch of commits added a jump table.
Between master and PR 22679 now, there are 151 cases slower than master and 463 
that are faster than master.
The slower cases are at most twice as slow, but the faster cases are often 
10-20x faster.
I could add a cutoff to use a simpler algorithm instead, for needles of length 
less than ~10,
but I wanted to get the "purer" data out before making that change.

The benchmark data is here: https://pastebin.com/raw/bzQ4xQgM

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

That test needle happened to end with a G and not have another G until much 
earlier. The status quo took advantage of that, but the PR only takes advantage 
of the skip value for a certain middle character. Perhaps it could do both.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

@Tim I got this again for that benchmark:

length=3442, value=ASXABCDHAB...: Mean +- std dev: 2.39 ms +- 0.01 ms

Unfortunately not a ghost.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Another algorithmic possibility: Instead of the bitset, we could have a 
stack-allocated

uint8_t jump[32]; // maybe 64? Maybe uint16_t?

It would say this: If the last character lined up in the haystack is congruent 
to i mod (1 << 8), then jump ahead by (needle_len if jump[i] == 255 else jump[i]), 
where jump[i] gives the distance between the end of the needle and the last 
occurrence in the needle of a character congruent to i.

Is this sort of constant-but-more-than-a-few-bytes stack-space an acceptable 
constant memory cost? If so, I believe that could be a big improvement.
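A sketch of that proposed table (the sizes and the fold are the message's open questions; here I fold by the table size, since a uint8_t jump[32] can only index 32 slots, and I use 255 as the "not present" sentinel):

```python
def build_jump(needle, size=32):
    # jump[i]: distance from the end of the needle to the last needle
    # character whose code is congruent to i mod size; 255 is the
    # sentinel meaning "no such character: jump a whole needle length".
    m = len(needle)
    jump = [255] * size
    for k, c in enumerate(needle):
        jump[ord(c) % size] = min(m - 1 - k, 254)
    return jump

j = build_jump("abc", size=32)
assert j[ord("c") % 32] == 0    # last character is distance 0 from the end
assert j[ord("z") % 32] == 255  # absent: sentinel
```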

There are also a bunch of little tweaks that could be done here: For example, 
should a hypothetical jump-table jump to line up the last character in the 
needle, or jump to line up the middle character in the needle? The max from two 
tables? Should we search for the last characters to be equal, or just make sure 
the last character is in the needle-bit-set (like in the PR)? Should there be a 
second bit-set for the right half of the string? Should there be a skip value 
computed for the last character as well as the middle character (middle 
character only in the PR)? etc. etc. I'll be sure to try some of these things.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

bench_table.txt gives my results (`ref` is Master, `change` is with PR 22679).
The change gives 342 faster cases and 275 slower cases, and 9 cases with no 
change.

I chose a random word of length 10**6 with a zipf character distribution for 
the haystack, then 20 random needles (also zipf) of each length and tested 
those same needles and haystack for both.

I ran them with this:

from lots_of_benches import needles, haystack
needles: list[str]
haystack: str

from pyperf import Runner
runner = Runner()

for needle in needles:
    n = len(needle)
    abbrev = needle if n <= 10 else f"{needle[:10]}..."
    runner.timeit(
        name=f"length={n}, value={abbrev}",
        stmt="needle in haystack",
        globals=globals(),
    )

--
Added file: https://bugs.python.org/file49518/bench_table.txt




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I used random_bench.py to compare PR 22679 to Master, and the results are in 
bench_results.txt. Results were varied. I suppose this depends on what cases we 
want to optimize for.

--
Added file: https://bugs.python.org/file49512/random_bench.py




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

PR 22679 is a draft that does the two-way algorithm but also adds both of the 
tricks from Fredrik's implementation: a bit-set "bloom filter" and remembering 
the skip-distance between some pair of characters.

--
Added file: https://bugs.python.org/file49511/bench_results.txt




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +21650
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22679




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Here is a C implementation of the two-way algorithm that should work as a 
drop-in replacement for Objects/stringlib/fastsearch.h.

Benchmarking so far, it looks like it is a bit slower in a lot of cases. But 
it's also a bit faster in some other cases and oodles faster in the really 
bad cases.

I wonder if there's a good heuristic cutoff (for the needle size?) where the 
two-way usually becomes better.

--
Added file: https://bugs.python.org/file49508/fastsearch.h




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-11 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

> Offhand do you know what the _best_ timing for two-way search is in a 
> pattern-not-found case?

Looking at the glibc implementation, in the top-level "else" clause
(for when the string isn't completely periodic),
we have:

period = MAX (suffix, needle_len - suffix) + 1;

so period > needle_len / 2.

Then during each iteration of the j-loop (where j is the index into the 
haystack), j is sometimes incremented by `period`, which would probably give 
O(n/m) best case.

Here's a sublinear example for the two-way algorithm:

N = 10**7
haystack = "ABC" * N
needle1 = "ABC" * (N // 10) + "DDD"
needle2 = "ABC" * 10 + "DDD"

=== Results ===
Pure Python Two-Way, shorter needle: 1.7142754
Pure Python Two-Way, longer needle:  0.00188450667
Python builtin, shorter needle:      0.0310714082
Python builtin, longer needle:       0.0305660707

This case is surprisingly better than the builtin:

N = 10**7
haystack = "ABC" * N
needle1 = "DDD" + "ABC" * (N // 10)
needle2 = "DDD" + "ABC" * 10

=== Results ===
Pure Python Two-Way, shorter needle: 0.00208950774
Pure Python Two-Way, longer needle:  0.00178789534
Python builtin, shorter needle:      0.0299981028
Python builtin, longer needle:       0.029634395

This was measured using the attached fastsearch.py. There are some 
manipulations in there like string slicing that would make more sense as 
pointer math in C, so especially accounting for that, I'm guessing it would be 
pretty competitive in a lot of cases.

A lot of the implementations look like they use a complete Boyer-Moore table to 
go sublinear in even more cases, which it seems we want to avoid. I'm not 
certain, but it's conceivable that the existing techniques of using just one 
delta-value or the "Bloom filter" could be thrown in there to get some of 
the same improvements. Although that could be redundant -- I'm not sure.

--
Added file: https://bugs.python.org/file49507/fastsearch.py




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-10 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I agree that skip could do 1 better.

---

> I don't know whether it "should be" applied

I don't think I'm convinced: the second check fixes only the very specific case 
when s[len(p):].startswith(p).
Perturbations of reproducer.py are still very slow with the patch:

- pattern2 = b"B" * BIG
+ pattern2 = b"B" * (BIG + 10)

In fact, it's even slower than before due to the double-checking.

---

I'd be curious how something like Crochemore and Perrin's "two-way" algorithm 
would do (constant space, compares < 2n-m):

http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm

Apparently it's been used as glibc's memmem() since 2008. There, it 
additionally computes a table, but only for long needles where it really pays 
off:

https://code.woboq.org/userspace/glibc/string/str-two-way.h.html
https://code.woboq.org/userspace/glibc/string/memmem.c.html

Although there certainly seems to be a complexity rabbithole here that would be 
easy to over-do.
I might look more into this.

--




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Indeed, this is just a very unlucky case.

>>> n = len(longer)
>>> from collections import Counter
>>> Counter(s[:n])
Counter({0: 9056995, 255: 6346813})
>>> s[n-30:n+30].replace(b'\x00', b'.').replace(b'\xff', b'@')
b'..@@'
>>> Counter(s[n:])
Counter({255: 18150624})


When checking "base", we're in this situation

pattern: 
 string: .
Algorithm says: ^ these last characters don't match.
 ^ this next character is not in the pattern
 Therefore, skip ahead a bunch:

 pattern:  
  string: .

 This is a match!


Whereas when checking "longer", we're in this situation:

pattern: @
 string: .
Algorithm says:  ^ these last characters don't match.
  ^ this next character *is* in the pattern.
  We can't jump forward.

 pattern:   
  string: .

 Start comparing at every single alignment...


I'm attaching reproducer.py, which replicates this from scratch without loading 
data from a file.

--
Added file: https://bugs.python.org/file49499/reproducer.py




[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Adding some hasty printf-debugging to fastsearch.h, I see this:

>>> with open('data.bin', 'rb') as f:
... s = f.read()
...
>>> base = 15403807 * b'\xff'
>>> longer = base + b'\xff'
>>> base in s

Bloom has 0x00: 0
Bloom has 0xff: -2147483648
skip = 0
Checking bloom filter for next char 0x00... not found in pattern. Skipping a bunch.
Candidate at 15403808

True

>>> longer in s

Bloom has 0x00: No
Bloom has 0xff: Yes
skip = 0
Checking bloom filter for next char 0xff... found in pattern; increment by only one.
Candidate at 1
Candidate at 2
Candidate at 3
Candidate at 4
Candidate at 5
Candidate at 6
Candidate at 7
(...)

Trying the same base, I also get this:

>>> with open('data.bin', 'rb') as f:
... s = f.read()
...
>>> base = 15403807 * b'\xff'
>>> s1 = s[1:]
>>> base in s1

Bloom has 0x00: No
Bloom has 0xff: Yes
skip = 0
Checking bloom filter for next char 0xff... found in pattern; increment by 
only one.
Candidate at 1
Candidate at 2
Candidate at 3
Candidate at 4
Candidate at 5
Candidate at 6
Candidate at 7
(...)

So maybe part of the issue is just that the algorithm is getting particularly
unlucky regarding how the strings line up to start with.

The fact that "skip" is 0 in these cases seems to just be a limitation of the 
algorithm:
we only skip to line up the second-to-last occurrence of the last character,
but with these strings, that's just the second-to-last character.

Computing the entire Boyer-Moore table would be better in cases like these, but
that would require dynamic memory allocation (length of pattern) which I think 
is why it is avoided.
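The simplified skip can be sketched like this (a reconstruction from the description above, not the fastsearch.h source):

```python
def fastsearch_skip(pattern: bytes) -> int:
    # Shift usable when the last character matched but an earlier one
    # didn't: the distance back to the previous occurrence of the last
    # character.  Assumes len(pattern) >= 2.
    mlast = len(pattern) - 1
    skip = mlast - 1
    for i in range(mlast):
        if pattern[i] == pattern[mlast]:
            skip = mlast - i - 1
    return skip

print(fastsearch_skip(b'abcde'))       # 'e' occurs only once: skip 3
print(fastsearch_skip(b'\xff' * 100))  # all-equal pattern: skip 0
```

For an all-equal pattern the second-to-last character already equals the last one, so the computed skip is 0 and this heuristic gives no speedup, which matches the "skip = 0" lines in the debug output above.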

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue41972>
___



[issue41356] Convert bool.__new__ to argument clinic

2020-09-29 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Superseded by https://bugs.python.org/issue41870

--
nosy:  -larry
resolution:  -> works for me
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue41356>
___



[issue41873] Add vectorcall for float()

2020-09-28 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +21463
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/22432

___
Python tracker 
<https://bugs.python.org/issue41873>
___



[issue41873] Add vectorcall for float()

2020-09-28 Thread Dennis Sweeney


New submission from Dennis Sweeney :

I got these benchmarks:

.\python.bat -m pyperf timeit "float(0)"

Before: Mean +- std dev: 79.0 ns +- 1.0 ns
After:  Mean +- std dev: 51.5 ns +- 1.6 ns

--
components: Interpreter Core
messages: 377590
nosy: Dennis Sweeney
priority: normal
severity: normal
status: open
title: Add vectorcall for float()
type: performance
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue41873>
___



[issue41850] inspect.py: access block stack

2020-09-24 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I believe the block stack is only for catching exceptions. Loops do not 
interact with the block stack. Only SETUP_FINALLY adds to the block stack:

https://docs.python.org/3/library/dis.html#opcode-SETUP_FINALLY

Meanwhile, loops and conditionals and the like are compiled into goto-like 
jumps.

There may be some existing solution, but I'm not sure what. Maybe look at the 
traceback or ast modules.
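A quick way to see this with dis (function names here are made up; exact opcode names vary by CPython version, and SETUP_FINALLY itself was removed by 3.11's zero-cost exception work):

```python
import dis

def loop():
    for i in range(3):
        pass

def cleanup():
    try:
        pass
    finally:
        pass

loop_ops = {ins.opname for ins in dis.get_instructions(loop)}
try_ops = {ins.opname for ins in dis.get_instructions(cleanup)}

# The loop compiles to plain jump instructions; it pushes nothing onto
# the block stack.
print(any('JUMP' in op for op in loop_ops))  # → True
# On 3.10 and earlier, only the try/finally emits SETUP_FINALLY.
print('SETUP_FINALLY' in try_ops)
```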

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue41850>
___



[issue41724] SQLite returns "str" instead of "datetime.datetime" with aggregate queries.

2020-09-05 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Here's a reproducer.

--
nosy: +Dennis Sweeney
Added file: https://bugs.python.org/file49447/reproducer.py

___
Python tracker 
<https://bugs.python.org/issue41724>
___



[issue41678] File-level, optionally external sorting

2020-09-01 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Attached is a proof of concept.

--
Added file: https://bugs.python.org/file49436/disksort.py

___
Python tracker 
<https://bugs.python.org/issue41678>
___



[issue41678] File-level, optionally external sorting

2020-08-31 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

If we were to do this, I think a better API might be to accept an arbitrary 
iterable, then produce a sorted iterable:

def sorted_on_disk(iterable, key=None, reverse=False) -> Iterable:
    ...

It would sort chunks of the input and store them in files as sequences of 
pickles, merging them as they got bigger, and then return an iterator over the 
resulting single sorted file.

This would be more composable with other standard Python functions and would be 
a good way of separating concerns. sorted(...) and heapq.merge(...) already 
have the correct APIs to do it this way.

Potential implementation detail: For some small fixed n, always doing a 2^n-way 
heapq.merge instead of a bunch of 2-way merges would do fewer passes over the 
data and would allow the keys to be computed 1/n as many times, assuming we 
wouldn't decorate-sort-undecorate.
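A minimal sketch of that API (the chunk size and helper names are assumptions; the point is the composition of sorted() and heapq.merge(), not the attached proof of concept):

```python
import heapq
import itertools
import pickle
import tempfile

def sorted_on_disk(iterable, key=None, reverse=False, chunk_size=100_000):
    # Sort fixed-size chunks in memory, spill each chunk to a temporary
    # file as a stream of pickles, then lazily merge the sorted runs.
    it = iter(iterable)
    runs = []
    while True:
        chunk = sorted(itertools.islice(it, chunk_size),
                       key=key, reverse=reverse)
        if not chunk:
            break
        f = tempfile.TemporaryFile()
        for item in chunk:
            pickle.dump(item, f)
        f.seek(0)
        runs.append(f)

    def iter_run(f):
        try:
            while True:
                yield pickle.load(f)
        except EOFError:
            f.close()

    return heapq.merge(*map(iter_run, runs), key=key, reverse=reverse)

print(list(sorted_on_disk([5, 3, 8, 1, 9, 2], chunk_size=2)))
# → [1, 2, 3, 5, 8, 9]
```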

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue41678>
___



[issue41621] defaultdict miss behave when using default_factory passed as kwargs

2020-08-24 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

PR 21945 changes the signature:

- defaultdict(default_factory[, ...])
+ defaultdict(default_factory=None, /, [...])

--

___
Python tracker 
<https://bugs.python.org/issue41621>
___



[issue41621] defaultdict miss behave when using default_factory passed as kwargs

2020-08-24 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
nosy: +Dennis Sweeney
nosy_count: 3.0 -> 4.0
pull_requests: +21055
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21945

___
Python tracker 
<https://bugs.python.org/issue41621>
___



[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-17 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

FWIW I forgot the gc.disable() line in the contextmanager, but what I said 
still applies.

--

___
Python tracker 
<https://bugs.python.org/issue41545>
___



[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-17 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

The save-a-boolean-for-each-context-manager approach has an issue if used with 
concurrent generators, where the lifetimes of two generator objects might be 
overlapping but not completely nested, as shown below. The same issue should 
arise when using multiple threads. The approach in no_gc.py with counting the 
times_disabled does not run into the same issue.

>>> from contextlib import contextmanager
>>> import gc
>>> @contextmanager
... def no_gc():
...     was_enabled = gc.isenabled()
...     try:
...         yield
...     finally:
...         if was_enabled:
...             gc.enable()


>>> def gen1():
...     with no_gc():
...         yield "a"
...         yield "b"
...         yield "c"


>>> def gen2():
...     with no_gc():
...         yield 1
...         yield 2
...         yield 3


>>> 
>>> g1 = gen1()
>>> g2 = gen2()
>>> next(g1)
'a'
>>> next(g2)
1
>>> list(g1)
['b', 'c']
>>> gc.isenabled() # should be False!
True
>>> list(g2)
[2, 3]

--

___
Python tracker 
<https://bugs.python.org/issue41545>
___



[issue41545] gc API requiring matching number of gc.disable - gc.enable calls

2020-08-15 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

This is exactly the motivation for context managers, no? I attached no_gc.py, 
which works when nested and should additionally be thread-safe. Usage:

from no_gc import no_gc

with no_gc():
    # collection disabled
    with no_gc():
        # collection is still disabled
    # collection is still disabled
# collection is finally re-enabled here

--
nosy: +Dennis Sweeney
Added file: https://bugs.python.org/file49394/no_gc.py

___
Python tracker 
<https://bugs.python.org/issue41545>
___



[issue41356] Convert bool.__new__ to argument clinic

2020-07-22 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

More microbenchmarks:

pyperf timeit "bool()"
Before: 63.1 ns +- 0.7 ns
After:  51.7 ns +- 1.2 ns

pyperf timeit "bool(0)"
Before: 77.4 ns +- 1.9 ns
After:  67.2 ns +- 1.3 ns

pyperf timeit "bool(17)"
Before: 77.3 ns +- 2.3 ns
After:  67.1 ns +- 1.0 ns

--

___
Python tracker 
<https://bugs.python.org/issue41356>
___



[issue41356] Convert bool.__new__ to argument clinic

2020-07-21 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +20723
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21581

___
Python tracker 
<https://bugs.python.org/issue41356>
___



[issue41356] Convert bool.__new__ to argument clinic

2020-07-21 Thread Dennis Sweeney


New submission from Dennis Sweeney :

Benchmarked on my machine (Windows 10):

.\python.bat -m pyperf timeit -s "from collections import deque; x = [[], [1]] 
* 1_000_000" "deque(map(bool, x), maxlen=0)"

--- Win32 build configuration ---
Master: 105 ms +- 2 ms
With this change:  88.2 ms +- 2.1 ms

--- x64 build configuration ---
Master:80.2 ms +- 1.3 ms
With this change:  74.2 ms +- 0.5 ms

--
components: Argument Clinic
messages: 374058
nosy: Dennis Sweeney, larry
priority: normal
severity: normal
status: open
title: Convert bool.__new__ to argument clinic
type: performance
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue41356>
___



[issue41299] Python3 threading.Event().wait time is twice as large as Python27

2020-07-15 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I reproduced something similar on Python 3.9.0b1, Windows 64-bit version:

py -m pyperf timeit -s "import threading; E = threading.Event()" 
"E.wait()"

NUMBER   Mean +- std dev
------------------------
0.0      5.79 us +- 0.13 us
0.0001   15.6 ms +- 0.1 ms
0.001    15.6 ms +- 0.1 ms
0.01     15.6 ms +- 0.6 ms
0.013    15.5 ms +- 0.6 ms
0.015    15.9 ms +- 0.9 ms
0.016    25.2 ms +- 0.5 ms
0.017    31.2 ms +- 0.2 ms
0.018    31.2 ms +- 0.4 ms
0.025    31.2 ms +- 1.0 ms
0.05     62.2 ms +- 0.8 ms
0.1      109 ms +- 2 ms
0.2      201 ms +- 3 ms
0.5      500 ms +- 0 ms
1.0      1.00 sec +- 0.00 sec

On the smaller scale, it looks quantized to multiples of ~15ms (?), but then it 
gets more accurate as the times get larger. I don't think it's a measurement 
error since the first measurement manages microseconds.

Perhaps this is just an OS-level thread-scheduling issue?
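One plausible explanation (an observation, not a confirmed diagnosis): the default Windows timer interrupt fires at 64 Hz, i.e. every 1000/64 ≈ 15.625 ms, and a wait that can only wake on a timer tick rounds up to a whole number of ticks:

```python
import math

TICK_MS = 1000 / 64  # default Windows timer period, ~15.625 ms

def quantized_ms(timeout_ms):
    # A tick-driven wait rounds the timeout up to a multiple of the tick.
    return math.ceil(timeout_ms / TICK_MS) * TICK_MS

for t in (1, 15, 17, 25):
    print(t, '->', round(quantized_ms(t), 1))
# 1 -> 15.6, 15 -> 15.6, 17 -> 31.2, 25 -> 31.2
```

This reproduces the ~15.6 ms floor and the jump to ~31.2 ms in the table, though not every intermediate value.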

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue41299>
___



[issue41166] CLASS ATTRIBUTES

2020-06-30 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I think the word "attribute" is preferred over "value" because "value" can mean 
just about anything, whereas, according to 
https://docs.python.org/3/glossary.html?highlight=glossary , an attribute is 
specifically:

"""A value associated with an object which is referenced by name using dotted 
expressions. For example, if an object `o` has an attribute `a` it would be 
referenced as `o.a`."""

The phrase "class attribute" is used throughout the documentation to mean an 
attribute of a class. To reference the class attribute called `foo` of the 
class called `MyClass`, we write `MyClass.foo`.

Unless I misunderstand you, I don't think there is an issue.

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue41166>
___



[issue40890] Dict views should be introspectable

2020-06-14 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +20061
stage: resolved -> patch review
pull_request: https://github.com/python/cpython/pull/20873

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file49229/master_perf.txt

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file49228/pushpop_perf.txt

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49230/perf_diff.txt

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49229/master_perf.txt

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I just added PR 20845, but I'm concerned about performance. I'm attaching the 
results of a run of pyperformance before and after PR 20845.

--
Added file: https://bugs.python.org/file49228/pushpop_perf.txt

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40925] Remove redundant macros used for stack manipulation in interpreter

2020-06-12 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
nosy: +Dennis Sweeney
nosy_count: 3.0 -> 4.0
pull_requests: +20038
pull_request: https://github.com/python/cpython/pull/20845

___
Python tracker 
<https://bugs.python.org/issue40925>
___



[issue40890] Dict views should be introspectable

2020-06-12 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Would it be better to have a dictview.mapping() method rather than an 
attribute, since it constructs a new object of a different type and since 
that's what keys(), values(), and items() are?

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Here's a workaround that's possible with PR 20749 applied:

>>> d = {"a":1, "b":2} # fill up the dict...
>>> DICT = object()
>>> d[DICT] = d
>>> items = d.items()
>>> del d
>>>
>>> d = items.mapping[DICT].pop(DICT)
>>> d
{'a': 1, 'b': 2}

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

PR 20749 gives each dict view access to a mappingproxy for the original dict, 
although I don't know if that defeats the original purpose.

It might be hard to sensibly make MappingProxy(d).items() return something 
other than d.items(), since this is already the behavior for user-defined 
classes:

>>> class A:
...     def __getitem__(self, key):
...         return "value"
...     def items(self):
...         return 17

>>> from types import MappingProxyType
>>> MappingProxyType(A()).items()
17

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-08 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +19954
pull_request: https://github.com/python/cpython/pull/20749

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40889] Symmetric difference on dict_views is inefficient

2020-06-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

A demo:

>>> class Int(int):
...     hash_calls = 0
...     def __hash__(self):
...         Int.hash_calls += 1
...         return super().__hash__()
...
>>> left = {Int(1): -1, Int(2): -2, Int(3): -3, Int(4): -4, Int(5): -5,
...         Int(6): -6, Int(7): -7}
>>> right = {Int(1): -1, Int(2): -2, Int(3): -3, Int(4): -4, Int(5): -5,
...          Int(8): -8, Int(9): -9}
>>> Int.hash_calls = 0
>>> left.items() ^ right.items()
{(9, -9), (7, -7), (8, -8), (6, -6)}
>>> Int.hash_calls



It looks like the same trick (searching by key and comparing values before 
maybe constructing a 2-tuple) might give similar performance improvements for 
dict_items.__or__, dict_items.__and__, and dict_items.__sub__. Is it worth 
discussing these other operators in this issue?

--

___
Python tracker 
<https://bugs.python.org/issue40889>
___



[issue40889] Symmetric difference on dict_views is inefficient

2020-06-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

PR 20718 helps somewhat by only creating and hashing the tuples that wind up in 
the final set. Here's a benchmark:

-m pyperf timeit -s "d1 = {i:i for i in range(100_000)}; d2 = {i:i|1 for i in 
range(100_000)}" "d1.items() ^ d2.items()"

  Master: 37.9 ms +- 0.4 ms
PR 20718: 25.5 ms +- 0.4 ms

--

___
Python tracker 
<https://bugs.python.org/issue40889>
___



[issue40889] Symmetric difference on dict_views is inefficient

2020-06-08 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +19928
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20718

___
Python tracker 
<https://bugs.python.org/issue40889>
___



[issue40889] Symmetric difference on dict_views is inefficient

2020-06-07 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

What about returning another dict_items instead of a set? As in (using the 
convention `d.items().mapping is d`):


dict_items = type({}.items())

def __xor__(self: dict_items, other):
    if isinstance(other, dict_items):
        new = self.mapping.copy()
        MISSING = object()
        for key, value in other:
            existing = new.get(key, MISSING)
            if existing is MISSING:
                # (key, value) in (other - self)
                new[key] = value
            elif existing == value:
                # (key, value) in (self & other)
                del new[key]
            else:
                # (key, value) in (self - other)
                new[key] = value
        return new.items()
    else:
        return set(self) ^ set(other)

I believe this wouldn't require any re-hashing. It would also allow for 
non-hashable values.

--

___
Python tracker 
<https://bugs.python.org/issue40889>
___



[issue40890] Dict views should be introspectable

2020-06-07 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Would the best way to address this be adding new KeysProxy, ValuesProxy, and 
ItemsProxy types?

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-07 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Indeed, with PR 20691 applied, the following crashes:

>>> vars(str).items().mapping.clear()
>>> "uh oh"

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-07 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
pull_requests: +19906
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20691

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-07 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I think this will also require types.MappingProxyType to change a bit, since 
it would make a proxy's underlying dict accessible:

>>> d = dict()
>>> proxy = MappingProxyType(d)
>>> type(proxy.items()) is type(d.items())  # should be False
True
>>> proxy.items().mapping is d  # should be False
???

--

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40890] Dict views should be introspectable

2020-06-07 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I'd be happy to write a PR.

Method names could be "mapping", "target", "target_mapping", "target_dict", 
"referent_dict", etc. 

I like the choice of "target_mapping":

d = dict()
assert d is d.keys().target_mapping
assert d is d.values().target_mapping
assert d is d.items().target_mapping

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue40890>
___



[issue40879] Strange regex cycle

2020-06-05 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

It looks like only the first 200 characters of the input string's repr are used 
as the compiled pattern's repr for some reason:

https://github.com/python/cpython/blob/master/Modules/_sre.c#L1294

I don't know if there is a good reason, especially since this violates 
eval(repr(pattern)) == pattern in a bad way:

>>> eval(repr(re.compile(STR_RE_URL)))
Traceback (most recent call last):
File "", line 1, in 
File "", line 1

re.compile('(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\,
 re.IGNORECASE)


^
SyntaxError: EOL while scanning string literal

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue40879>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-30 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

PR 20550 uses a linked structure like what we've been talking about.

--

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-30 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +19794
pull_request: https://github.com/python/cpython/pull/20550

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue25782] CPython hangs on error __context__ set to the error itself

2020-05-30 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +19788
pull_request: https://github.com/python/cpython/pull/20539

___
Python tracker 
<https://bugs.python.org/issue25782>
___



[issue25782] CPython hangs on error __context__ set to the error itself

2020-05-30 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

For clarification, the existing behavior on master:
When trying to raise the exception H,
F -> G -> H -> I -> NULL
becomes
H -> F -> G -> NULL

But when trying to set the exception A on top of
B -> C -> D -> E -> C -> ...,
it gets stuck in an infinite loop from the existing cycle.

My PR 20539 keeps the first behavior and resolves the infinite loop by making it
A -> B -> C -> D -> E -> NULL,
which seems consistent with the existing behavior.

So it should be strictly a bugfix. It also only changes the PyErr_SetObject 
code and not the PyException_SetContext code.
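The described rewrite can be mimicked in pure Python (an illustration only; the real logic is C code in PyErr_SetObject, and the helper name here is made up):

```python
def set_context(exc, top):
    # Raising `exc` while `top` heads the current chain: walk top's
    # __context__ links and unlink `exc` if it already appears there,
    # then chain `exc` on top.  exc's old context (its tail) is dropped.
    o = top
    while o is not None:
        context = o.__context__
        if context is exc:
            o.__context__ = None
            break
        o = context
    exc.__context__ = top

F, G, H, I = (ValueError(name) for name in "FGHI")
F.__context__, G.__context__, H.__context__ = G, H, I

set_context(H, F)   # raise H while F -> G -> H -> I is the active chain
chain = []
e = H
while e is not None:
    chain.append(str(e))
    e = e.__context__
print(chain)  # → ['H', 'F', 'G']
```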

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue25782>
___



[issue40696] exception chain cycles cause hangs (was "Exception handling with "await" can hang in Python3.9.0b1")

2020-05-30 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

> Is it related to issue25782?

Yes -- I didn't see that issue. I'm a little confused about the resolution of 
that issue though.

For clarification, the existing behavior on master:
When trying to raise the exception H,
F -> G -> H -> I -> NULL
becomes
H -> F -> G -> NULL

But when trying to set the exception A on top of
B -> C -> D -> E -> C -> ...,
it gets stuck in an infinite loop from the existing cycle.

My PR keeps the first behavior and resolves the infinite loop by making it
A -> B -> C -> D -> E -> NULL,
which seems consistent with the existing behavior.

So it should be strictly a bugfix.

--

___
Python tracker 
<https://bugs.python.org/issue40696>
___



[issue40696] exception chain cycles cause hangs (was "Exception handling with "await" can hang in Python3.9.0b1")

2020-05-30 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I believe PR 20539 solves the more general problem (using Floyd's Tortoise and 
Hare Algorithm), and I would appreciate review / some more ideas for test cases.

--

___
Python tracker 
<https://bugs.python.org/issue40696>
___



[issue40696] exception chain cycles cause hangs (was "Exception handling with "await" can hang in Python3.9.0b1")

2020-05-30 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
pull_requests: +19782
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/20539

___
Python tracker 
<https://bugs.python.org/issue40696>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-28 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

less_movement.py is my favorite so far. It still handles key and reverse,
but using instance attributes instead of the list indices I tried before.
It does this by only advancing the "key" and "leaf" attributes up toward
the root (where the key is just the value if key is None), while the value
is stored only in the leaf. Since the "value" attribute is no longer used
except for at the leaves, we can pack a leaf's value into its left slot
and reduce the number of slots.

This seems to run very well on pypy and it looks like it would be pretty
receptive to a c implementation, which I would be happy to work on.
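The tree-shaped approach can be illustrated with plain generators (a sketch of the idea only; the node layout, key caching, and reverse support of less_movement.py are not reproduced here):

```python
def _merge2(a, b):
    # Stable two-way merge of two sorted iterators.
    sentinel = object()
    x, y = next(a, sentinel), next(b, sentinel)
    while x is not sentinel and y is not sentinel:
        if y < x:
            yield y
            y = next(b, sentinel)
        else:
            yield x
            x = next(a, sentinel)
    while x is not sentinel:
        yield x
        x = next(a, sentinel)
    while y is not sentinel:
        yield y
        y = next(b, sentinel)

def merge(*iterables):
    # Pair the inputs into a balanced binary tree of two-way merges, so
    # each element rises through about log2(n) comparisons.
    its = [iter(it) for it in iterables]
    while len(its) > 1:
        its = [_merge2(its[i], its[i + 1]) if i + 1 < len(its) else its[i]
               for i in range(0, len(its), 2)]
    return its[0] if its else iter(())

print(list(merge([1, 4], [2, 5], [3])))  # → [1, 2, 3, 4, 5]
```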

--
title: Possible performance improvement for heaqq.merge() -> Possible 
performance improvement for heapq.merge()
versions: +Python 3.10 -Python 3.9
Added file: https://bugs.python.org/file49198/less_movement.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue40696] exception chain cycles cause hangs (was "Exception handling with "await" can hang in Python3.9.0b1")

2020-05-22 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Wouldn't Floyd's or Brent's cycle detection algorithms be better here than the 
allocation of a new set? I believe they might also eliminate the need to 
fast-path the first 100 or however many.

As in: https://en.wikipedia.org/wiki/Cycle_detection
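For exception chains, Floyd's algorithm would look roughly like this (a sketch of the idea, not the eventual patch): it detects a cycle in the __context__ links in O(1) space, with no set allocation.

```python
def context_chain_has_cycle(exc):
    # Tortoise-and-hare: `fast` follows __context__ two steps per
    # iteration, `slow` one step; they meet iff the chain cycles.
    slow = fast = exc
    while True:
        fast = fast.__context__
        if fast is None:
            return False
        fast = fast.__context__
        if fast is None:
            return False
        slow = slow.__context__
        if slow is fast:
            return True

a, b, c = ValueError('a'), ValueError('b'), ValueError('c')
a.__context__, b.__context__, c.__context__ = b, c, b   # b -> c -> b cycle
print(context_chain_has_cycle(a))  # → True
```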

--
nosy: +Dennis Sweeney

___
Python tracker 
<https://bugs.python.org/issue40696>
___



[issue38938] Possible performance improvement for heaqq.merge()

2020-05-22 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

key_and_reverse.py employs the same strategy as winners.py, but uses lists as 
the nodes of the tree rather than using Node instances. It also eliminates the 
recursion of treeify, and adds (with neither much of a performance hit nor much 
code duplication) support for ``key=some_func`` and/or ``reverse=True`` keyword 
arguments, so it now passes the test_heapq suite.

--
Added file: https://bugs.python.org/file49180/key_and_reverse.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue40722] test_ttk_guionly times out on Ubuntu CI

2020-05-21 Thread Dennis Sweeney


New submission from Dennis Sweeney :

One of the tests (test_ttk_guionly.test_variable_change) on the Ubuntu CI is 
intermittently hanging on this code:


https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/tkinter/test/test_ttk/test_extensions.py#L147

def test_variable_change(self):
x = ttk.LabeledScale(self.root)
x.pack()
x.wait_visibility()
^^^

The failure at https://github.com/python/cpython/runs/697113653 gave the logs:

0:28:29 load avg: 0.00 [425/425/1] test_ttk_guionly crashed (Exit code 1)
Timeout (0:20:00)!
Thread 0x7fa268d0c080 (most recent call first):
File "/home/runner/work/cpython/cpython/Lib/tkinter/__init__.py", line 697 
in wait_visibility
File 
"/home/runner/work/cpython/cpython/Lib/tkinter/test/test_ttk/test_extensions.py",
 line 147 in test_variable_change
...

The same did not happen when I re-ran the CI checks. I thought I remembered a 
similar issue in the past, but I couldn't find a bpo issue about it.

--
components: Tkinter
messages: 369549
nosy: Dennis Sweeney
priority: normal
severity: normal
status: open
title: test_ttk_guionly times out on Ubuntu CI
type: behavior
versions: Python 3.10, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue40722>
___



[issue40679] show class name in method invocation TypeError

2020-05-21 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

https://bugs.python.org/issue40706

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40706] Unreachable code in _PyEval_EvalCode

2020-05-21 Thread Dennis Sweeney


New submission from Dennis Sweeney :

When I was looking into https://bugs.python.org/issue40679, I couldn't come up 
with a test case for the following block, so I added a print statement:

--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -4179,6 +4179,7 @@ _PyEval_EvalCode(PyThreadState *tstate,
 Py_ssize_t j;

 if (keyword == NULL || !PyUnicode_Check(keyword)) {
+printf("THIS CODE WAS RUN!\n");
 _PyErr_Format(tstate, PyExc_TypeError,
   "%U() keywords must be strings",
   qualname);

I ran the entire test suite and got no such "THIS CODE WAS RUN!". It looks like 
this is a double-check of the (worse -- no function name) error message 
produced by the changes at 
https://github.com/python/cpython/commit/0567786d26348aa7eaf0ab1b5d038fdabe409d92.
 For example:

py -3.7 -c "f = lambda x: None; f(**{1:1})"
  ...
TypeError: <lambda>() keywords must be strings

py -3.8 -c "f = lambda x: None; f(**{1:1})"
  ...
TypeError: <lambda>() keywords must be strings

py -3.9 -c "f = lambda x: None; f(**{1:1})"
  ...
TypeError: keywords must be strings

So:

* Can this check be eliminated since it's unreachable from Python?

* Otherwise, is there some reason a C caller would need this check? And could 
it be replaced by an assert?

* If it shouldn't change, then is there a good way to add test coverage?

------
components: Interpreter Core
messages: 369498
nosy: Dennis Sweeney
priority: normal
severity: normal
status: open
title: Unreachable code in _PyEval_EvalCode
type: behavior
versions: Python 3.10, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue40706>
___



[issue40679] show class name in method invocation TypeError

2020-05-20 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Sure -- I'll file the issue.

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-20 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I just ran the entire test suite with:

--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -4179,6 +4179,7 @@ _PyEval_EvalCode(PyThreadState *tstate,
 Py_ssize_t j;

 if (keyword == NULL || !PyUnicode_Check(keyword)) {
+printf("THIS CODE WAS RUN!\n");
 _PyErr_Format(tstate, PyExc_TypeError,
   "%U() keywords must be strings",
   qualname);

and the line was never printed. So maybe the test coverage (or removal?) should 
be a separate issue.

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-20 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Never mind; I think you're right, and 
https://github.com/python/cpython/blob/master/Objects/call.c#L1009 is the line.

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-20 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I got this:

>>> class A:
...  def f():
...   pass
...
>>> A.f(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: A.f() takes 0 positional arguments but 1 was given
>>> A.f(**{1:2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keywords must be strings

I think this is coming from 
https://github.com/python/cpython/blob/cd8295ff758891f21084a6a5ad3403d35dda38f7/Python/getargs.c#L1636.

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-20 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

While trying to write tests, I stumbled across something interesting: 
_PyObject_FunctionString as discussed here ( https://bugs.python.org/issue37645 
) returns a string that also includes the module name where applicable.  For 
example, the module name is included behind the qualname in the following 
situation:

>>> from random import Random
>>> Random.seed(a=1, **{"a":2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Random.seed(a=1, **{"a":2})
TypeError: random.Random.seed() got multiple values for keyword argument 'a'

or:

>>> class A:
...def foo(bar):
... pass
...
>>> A().foo(bar=1, **{"bar":2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __main__.A.foo() got multiple values for keyword argument 'bar'

From what I can tell, this seems to happen mostly during the CALL_FUNCTION_EX 
instruction (unpacking *args and **kwargs), while we don't get this level of 
qualification during the actual do_call_core. Maybe _PyObject_FunctionString 
should ideally be used everywhere applicable, but there seems to be a 
different issue with that: the _PyEval_EvalCode function where the error 
handling occurs doesn't receive a reference to the function, only references 
to the function's code object and qualname (unless I'm missing something).

Should the signature of _PyEval_EvalCode be modified? Or should we be satisfied 
with the half-measure of including the qualname but not the module (at least 
for now)? Is there a good reason for the different qualifiers for functions 
when encountering different types of invalid arguments?

There's also a block that I couldn't figure out how to reach in tests from 
Python code: at 
https://github.com/python/cpython/blob/master/Python/ceval.c#L4233 ("keywords 
must be strings" when passing as two arrays), I don't know how to reach this 
code in a way that isn't a SyntaxError first.

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-19 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

The attached PR isn't exactly what you requested, but it's a very minimal code 
change that uses the existing __qualname__ functionality to change the message 
to

TypeError: A.foo() takes 1 positional argument but 2 were given

Does that address those problems?

--

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue40679] show class name in method invocation TypeError

2020-05-19 Thread Dennis Sweeney


Change by Dennis Sweeney :


--
keywords: +patch
nosy: +Dennis Sweeney
nosy_count: 2.0 -> 3.0
pull_requests: +19523
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/20236

___
Python tracker 
<https://bugs.python.org/issue40679>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-18 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

I mostly like new_merge.py too, especially the dynamic reduction of the tree.

However, it looks like ``list(merge([2],[1],[1]))`` currently fails, and I 
think what's missing is the following in the sibling-promotion:

+ if sibling.left is not None:
+ sibling.left.parent = sibling.right.parent = parent

Also for what it's worth, I think both appearances of "c1 if c1.value < 
c2.value else c2" should become "c2 if c2.value < c1.value else c1" for 
stability. 
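
The stability point can be seen with a tiny experiment (Tie is an invented
class standing in for the nodes; the real code compares .value attributes,
elided here by comparing the objects directly):

```python
class Tie:
    """Items that always compare equal, tagged so we can see which
    iterable a tie-break favors."""
    def __init__(self, tag):
        self.tag = tag
    def __lt__(self, other):
        return False

c1, c2 = Tie("iterable 1"), Tie("iterable 2")

# Original form: on a tie (neither is less), it selects c2, i.e. the
# later iterable, which would break stability.
unstable_pick = c1 if c1 < c2 else c2
# Suggested form: on a tie it selects c1, preserving input order.
stable_pick = c2 if c2 < c1 else c1

print(unstable_pick.tag)  # iterable 2
print(stable_pick.tag)    # iterable 1
```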

I've attached winners.py, which is similar to new_merge.py, but replaces the 
"while node.left is not None: node = node.source" traversal with a single "node 
= node.leaf" call and adds a fast yield-from for the last iterator. I don't 
think it's much more complex either.

--
Added file: https://bugs.python.org/file49170/winners.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49167/losers.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file49165/losers.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

It seems to me that the code sprawl mostly comes from the separate handling of 
the four keyed/unkeyed and forward/reverse cases, which as far as I can tell 
requires a branch in the innermost loop if not unrolled into separate cases. I 
think recursive_merge.py is about as concise as possible while still 
efficiently handling all four cases.

Is there any issue with recursion in this application? Even if there are 
2**64-1 iterables, this would only mean 64 nested next() calls, which should be 
within the stack on most machines, no?

I did the following benchmark:

py -3.9 -m pyperf timeit -s "from [FILENAME] import merge; from collections 
import deque" "deque(merge(open('english.txt'), open('dutch.txt'), 
open('french.txt'), open('german.txt'), open('italian.txt')), maxlen=0)"

new_merge.py:  Mean +- std dev: 773 ms +- 16 ms
tournament_heap.py:Mean +- std dev: 665 ms +- 42 ms
losers.py: Mean +- std dev: 470 ms +- 31 ms
Existing heapq:Mean +- std dev: 313 ms +- 2 ms
recursive_merge.py:Mean +- std dev: 266 ms +- 3 ms

I can look at some more diverse benchmarks (types, iterable length imbalance, 
lengths of an iterator's win-streak, etc.) soon.

--

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49166/recursive_merge.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49165/losers.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Added file: https://bugs.python.org/file49164/tournament_heap.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file48748/merge_recipe.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file49156/recursive_merge.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-17 Thread Dennis Sweeney


Change by Dennis Sweeney :


Removed file: https://bugs.python.org/file48747/iter_merge.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___



[issue38938] Possible performance improvement for heapq.merge()

2020-05-15 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

The attached recursive_merge.py should be much less ugly and still somewhat 
performant.

It should be the same algorithm as that PR, just written recursively rather 
than iteratively.

I got some text files from http://www.gwicks.net/dictionaries.htm and tried 
merging them line-by-line:

py -3.9 -m pyperf timeit -s "from heapq import merge; from collections import 
deque" "deque(merge(open('english.txt'), open('dutch.txt'), open('french.txt'), 
open('german.txt'), open('italian.txt')), maxlen=0)"

Mean +- std dev: 391 ms +- 9 ms

py -3.9 -m pyperf timeit -s "from recursive_merge import merge; from 
collections import deque" "deque(merge(open('english.txt'), open('dutch.txt'), 
open('french.txt'), open('german.txt'), open('italian.txt')), maxlen=0)"

Mean +- std dev: 262 ms +- 9 ms

Perhaps that's a more real-world benchmark.

--
Added file: https://bugs.python.org/file49156/recursive_merge.py

___
Python tracker 
<https://bugs.python.org/issue38938>
___


