Change by Tim Peters :
--
keywords: +patch
pull_requests: +13801
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/13934
___
Python tracker
<https://bugs.python.org/issu
New submission from Tim Peters :
On 64-bit Python, many object sizes essentially doubled over 32-bit Python,
because Python objects are so heavy with pointers. More recently, forcing
alignment to 16 bytes on 64-bit boxes boosted the memory requirements more
modestly.
But obmalloc's 256 KiB
Tim Peters added the comment:
I agree: perm(n) should return factorial(n).
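A quick check of what's being agreed to here (requires Python 3.8+, where math.perm() exists):

```python
import math

# The one-argument form: perm(n) counts permutations of n items,
# which is exactly n!.
for n in range(10):
    assert math.perm(n) == math.factorial(n)
print(math.perm(5))  # 120
```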
Tim Peters added the comment:
Also basic: run hardware CPU and memory stress diagnostics, and/or try running
the same thing on a different machine. Hardware isn't infallible, and can fail
in nearly arbitrary ways. For example, perhaps a smidgen of silicon has gone
flaky, so that one time
Tim Peters added the comment:
Thank you, Friedl! I appreciate the extra confirmation - and especially on
even bigger test cases :-)
Tim Peters added the comment:
I haven't looked at anything in this PR, so just popping in to confirm that the
first time I saw stuff like:
len(obj, /)
in the docs I had no idea at all what it was trying to say. I thought it was a
typo.
Also the second, third, fourth, ..., times I saw
Tim Peters added the comment:
Python needs a benevolent dictator ;-)
I'd be happy for Mark, Raymond, or me to play that role here. If it were me,
I'd drop the k <= n requirement: both arguments must be non-negative integers.
Because a combinatorial (not factorial-, and certai
Tim Peters added the comment:
Ya, I'm mostly with Raymond. `math` was originally a very thin wrapper around
C's libm, but we've been moving away from that more and more for decades, and
it's no longer the case anyway that the vast bulk of new Python programmers are
intimately familiar
Tim Peters added the comment:
I'm not fatally opposed to relaxing k <= n. David makes some good points about
it, and - as Raymond already noted - "0" is consistent with the behavior of
itertools.combinations().
The docs would need to change, though, because the factorial "
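The itertools.combinations() behavior being cited, checked against the comb() that eventually shipped in Python 3.8:

```python
from itertools import combinations
from math import comb  # Python 3.8+

# With k > n there are no k-element subsets, so combinations() yields
# nothing, and comb() returns 0 rather than raising:
print(len(list(combinations(range(3), 5))))  # 0
print(comb(3, 5))                            # 0
```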
Tim Peters added the comment:
I'm going to repeat part of an earlier comment :-)
"""
Please resist pointless feature creep. The original report was about comb(n,
k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented
appeared to agree th
Tim Peters added the comment:
I'm not convinced, although I agree relaxing k <= n is less damaging than
relaxing k >= 0.
Python isn't aimed at mathematicians (although some 3rd-party packages
certainly are, and they're free to define things however they like). We have
to tra
Tim Peters added the comment:
Strongly prefer requiring 0 <= k <= n at first. This is a programming
language: it will be applied to real problems, not to abstract proofs where
some slop can be helpful in reducing the number of cases that need to be
considered.
The Twitter
Tim Peters added the comment:
I prefer that a negative int raise ValueError, but am OK with it using the
absolute value instead (i.e., what it does now).
Change by Tim Peters :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
Tim Peters added the comment:
New changeset 1c263e39c4ed28225a7dc8ca1f24953225ac48ca by Tim Peters in branch
'master':
bpo-37029: keep usable_arenas in sorted order without searching (#13612)
https://github.com/python/cpython/commit/1c263e39c4ed28225a7dc8ca1f24953225ac48ca
Tim Peters added the comment:
Thank you so much, Inada! That's very good to hear :-)
Tim Peters added the comment:
Added file arena.py. This adds some code to the OP's original test, to print
out build time and teardown time, and display obmalloc stats.
You'll need at least 80GB of RAM to run it - I don't have that much. Building
the tree may take on the order of an hour
Tim Peters added the comment:
Created a PR and assigned myself to this bug.
--
assignee: -> tim.peters
Change by Tim Peters :
--
keywords: +patch
pull_requests: +13516
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/13612
Tim Peters added the comment:
Same kind of problem with the new upload, Bob.
Line 38 is:
print(Fore.GREEN,"Installed file ",e,Fore.WHITE)
indented 8 spaces. Lines 39 and 40 are empty. Then line 41 is:
if x>=len(files): # if x is more than or equal to the number of files in t
Tim Peters added the comment:
As Steven said, there's an obvious indentation error in the file you actually
attached. So nobody can _guess_ what your problem is. Please upload a file
showing your actual problem.
If I increase the indentation of the `print` Steven identified to match
Tim Peters added the comment:
I believe the thrust of Mark's suggestion was that it would allow using `k =
(n-1).bit_length()` even when n == 1, without special-casing n == 1. But you'd
still be adding a new "subtract 1" operation, and would still change results in
some cases.
Tim Peters added the comment:
Oops! Should be issue 8048.
Tim Peters added the comment:
Noam Yorav-Raphael, I tried to add you because this is pushing back against
your patch in issue 8408. It's been some years now, and nobody has cared
enough to pursue it, so I'll just close this if you still don't care ;-)
As doctest's original author, I
Change by Tim Peters :
--
nosy: +noam, tim.peters
Tim Peters added the comment:
It sounds like you need to read an introduction (any!) to computer
floating-point formats. The relevant part in the output you showed is this:
mant_dig=53
As on almost all other current machines, your platform's floating-point format is
restricted to 53 bits
Tim Peters added the comment:
Note that this pure Python gives the same result:
>>> "%.0f" % float(1639873214337061279)
'1639873214337061376'
That's not surprising: int -> float conversion is limited to the precision of
a Python float, which is a C double, which is al
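A sketch of what's going on, using the value from the example above:

```python
import sys

n = 1639873214337061279
f = float(n)                    # rounded to the nearest representable double

print(sys.float_info.mant_dig)  # 53 significant bits in a C double
print(n.bit_length())           # 61 bits: too many to represent exactly
print(int(f))                   # 1639873214337061376, as in the report
print(f == n)                   # False
```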
Tim Peters added the comment:
+1 from me! I'm tired of re-inventing this too :-)
Agree with every point Mark made.
Just in passing, noting a triviality: for the ceiling, `1 + isqrt(n - 1)`
fails when `n` is zero.
But I've never had a use for the ceiling here, or for "nearest"
Tim Peters added the comment:
I'm inclined to agree with Mark - the wholly naive algorithm is already
"perfect" away from extreme (whether large or small) magnitudes, and can't be
beat for speed either. This kind of thing gets much more attractive in IEEE
single precision, whe
Tim Peters added the comment:
Josh, I agree - but do read Armin's response. The string hack was done inside
ceval.c, where we have extraordinary knowledge of context. Adding similar
hacks inside Python C API functions seems to be a non-starter - they can't
magically start mutating
Tim Peters added the comment:
Serhiy, if I understand this, it _could_ pay big with even just a few
operations. For example,
[0] * 1000000 + [1] + [2] + [3]
As-is, we'll copy the million zeroes three times. With the patch, more likely
the original vector of a million zeroes would
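A sketch contrasting the repeated-copy form with the in-place alternative that's already available (sizes are illustrative):

```python
# Each "+" builds a brand-new list, so the zeroes are copied at every step:
zeroes = [0] * 1_000_000
result = zeroes + [1] + [2] + [3]

# In-place extension grows the existing list instead of copying it:
grown = [0] * 1_000_000
grown += [1]
grown += [2]
grown += [3]

assert result == grown
```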
Tim Peters added the comment:
Brandt, it's quite possible I have no real idea what this patch is aimed at. I
didn't reverse-engineer the code, the title is extremely broad, and there are
no Python examples I can find for what it _is_ aimed at. If, e.g., it's
impossible that this patch has
Tim Peters added the comment:
I'm at best -0 on this, and on the edge of -1. We already supply
mutate-in-place operations for when this is wanted, and, e.g., for lists, it's
as dead easy to spell as
L1 += L2
Half the point of the hack for string catenation was that the similar-looking
Tim Peters added the comment:
Jurjen, the errors you see in Python's sin() are _entirely_ due to your
platform C's libm. Python just calls the platform C's sin. So nothing can be
said about it in general.
The better libm trig functions today do indeed perform trig argument reduction
Tim Peters added the comment:
I have no use for this either, and agree with rejecting it - in nearly 30
years, nobody has asked for this before, and it's still the case that we don't
have an actual programming use case (some theoretical use in an abstract
mathematical model isn't
Tim Peters added the comment:
Stuff like that happens in any language supporting a tell() function for a file
opened in text mode on Windows, inherited from the platform C's ftell()
implementation:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/ftell-ftelli64?view=vs-2017
Tim Peters added the comment:
> *Clearly*, negative NaNs should be sorted to the start
> of the list, while positive NaNs are sorted to the end
> of the list. (Yes, I'm joking, but only a little bit: that's
> the result you'd get if you used IEEE 754's totalOrder
> to sort.
Tim Peters added the comment:
If we could roll back the clock, I'd impose a total ordering on floats in
Python and supply some other way to spell 754's comparison operators (which
users would have to ask for explicitly if they wanted to hassle with the
near-useless and too-often-surprising
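The 754 comparison behavior being called near-useless and too-often-surprising:

```python
nan = float("nan")

# Under IEEE-754 comparisons, a NaN is unordered with everything,
# including itself:
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# which is why list.sort(), built on __lt__, can't place NaNs sensibly.
```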
Tim Peters added the comment:
Raymond, I doubt we can do better (faster) than straightforward egcd without
heroic effort anyway. We can't even know whether a modular inverse exists
without checking whether the gcd is 1, and egcd builds on what we have to do
for the latter anyway. Even
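A minimal pure-Python sketch of the straightforward egcd route described here (function names are illustrative):

```python
def egcd(a, b):
    # Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b).
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b:
        q, a, b = a // b, b, a % b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def invmod(a, m):
    # The inverse exists iff gcd(a, m) == 1 -- and checking that is
    # inherent in computing the gcd anyway.
    g, x, _ = egcd(a % m, m)
    if g != 1:
        raise ValueError("base is not invertible for the given modulus")
    return x % m
```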
Tim Peters added the comment:
- Some form of this would be most welcome!
- If it's spelled this way, put the modulus argument last? "Everyone expects"
the modulus to come last, whether in code:
x = (a+b) % m
x = a*b % m
x = pow(a, b, m)
or in math:
a^(k*(p-1
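For reference, this modulus-last spelling is what three-argument pow() already uses, and (as of Python 3.8) a negative exponent asks for a modular inverse:

```python
# Modulus last, matching "x = a*b % m" and the usual math notation:
print(pow(7, 4, 13))        # same as 7**4 % 13, i.e. 9

# Python 3.8+: exponent -1 computes the modular inverse
# (ValueError if none exists):
inv = pow(38, -1, 97)
assert inv * 38 % 97 == 1
```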
Tim Peters added the comment:
Thanks, Steven! I'll go on to suggest that the best intro to these issues for
Python programmers is in Python's own tutorial:
https://docs.python.org/3/tutorial/floatingpoint.html
Raymond, `divmod(a, b)[0]` IS the mathematical `floor(a/b)`, where the latter
Tim Peters added the comment:
"Multiple roundings" in the float code is a red herring - that's just
implementation details trying to cheaply get the same effect as computing with
infinite precision. Here with actual unbounded precision:
>>> from fractions import Fracti
Tim Peters added the comment:
> Oh, Tim Peters succeeded somehow to
> remove Josh Rosenberg from the nosy list: I add him again ;-)
Certainly wasn't my intent! That happens too often on the tracker. Thanks for
noticing! :-(
Tim Peters added the comment:
Josh, I'd say the speed of this code doesn't matter one whit to anything. Like
you, I'd _prefer_ that the issue be fixed by building "real tuples" that own
their own references, which would also (as you showed) allow for briefer,
simpler, cl
Tim Peters added the comment:
> It's impressive *and* scary that such a 13-year-old
> bug only shows up today...
Then again, there's probably no other code in the world that compares slice
objects ;-)
--
nosy: +tim.peters -josh.r
Tim Peters added the comment:
It's probably OK, but there's no "pure win" to be had here. There's generally
more than one way to convert one string to another, and what "looks right" to
humans depends a whole lot on context.
For example, consider these strin
Tim Peters added the comment:
difflib generally synchs on the longest contiguous matching subsequence that
doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk`
argument considers blanks and tabs to be junk characters.
In the strings:
"drwxrwxr-x
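The default `charjunk` mentioned above is difflib.IS_CHARACTER_JUNK, which can be checked directly:

```python
import difflib

# ndiff's default charjunk flags blanks and tabs as junk characters:
print(difflib.IS_CHARACTER_JUNK(" "))   # True
print(difflib.IS_CHARACTER_JUNK("\t"))  # True
print(difflib.IS_CHARACTER_JUNK("x"))   # False
```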
Tim Peters added the comment:
Without re.IGNORECASE, the leading
^[_a-z0-9-]+
can't even match the first character (`val` starts with uppercase Z), so it
fails instantly.
With re.IGNORECASE, it's not "stuck", but is taking a very long time to
try an enormous number of (
Tim Peters added the comment:
Yes, it's quadratic time. If the string being searched has N characters, first
it fails to find "x" in all N of 'em, then `.*` advances by one and it fails to
find "x" in the trailing N-1 characters, then again in the trailing N-2, and so
Tim Peters added the comment:
As Mark said, the behavior of Python's floating-point math functions (not just
trig, but also log, pow, exp, ...) is inherited almost entirely from the
platform C's math libraries. Python itself guarantees nothing about what
`math.sin(x)` returns for any `x
Tim Peters added the comment:
> As far as I can tell from the discussions here, Steven
> and you stated a preference for the shortened names, and
> that's it.
Plus Mark, plus me - all backed "comb()" specifically.
> I find it hard to imagine anyone needing combinations
Tim Peters added the comment:
Precisely _how_ are you building it? As noted above, if you're using Visual
Studio, try using build.bat instead for the first time.
Tim Peters added the comment:
I'd like to divorce `prod()` from floating-point complications. The `sum()`
builtin has proved extremely handy, even for floats, in large part because it's
type-agnostic and straightforward. While I'd usually use `prod()` on ints and
Fractions, in almost all
Tim Peters added the comment:
For history, note that `bisect` doesn't always work in this context either: a
heap is NOT always in sorted order. For example, this is "a (min) heap" too:
[1, 3, 2].
More, that's "the real" problem. If we could find the element to be rem
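A small illustration of the point, using Python's heapq:

```python
import bisect
import heapq

heap = []
for v in (3, 1, 2):
    heapq.heappush(heap, v)

print(heap)  # [1, 3, 2] -- a valid min-heap, but NOT in sorted order

# bisect assumes a sorted list, so it can point at the wrong slot:
i = bisect.bisect_left(heap, 2)
print(i, heap[i])  # 1 3 -- the 2 is actually at index 2
```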
Tim Peters added the comment:
Please resist pointless feature creep. The original report was about comb(n,
k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented
appeared to agree they'd find that useful.
But nobody has said they'd find generalizing beyond
Tim Peters added the comment:
Steven, thanks for noticing the docs! I was surprised to hear it wasn't
documented, but not surprised enough to check myself ;-)
This decision was suggested by me, and endorsed by Guido, when designing
timsort looking ahead to Python 3, where __cmp__ was going
Tim Peters added the comment:
I don't know that the language needs to define this, but sticking to __lt__ was
a wholly deliberate design decision for CPython.
--
nosy: +tim.peters
Tim Peters added the comment:
Antoine, alas, it's subtler than that. The worker process (process_func())
puts _another_ `StopIteration` on the input queue in its `finally` clause. So
the first worker process to finish adds back a sentinel for the next worker to
see, and so on. At the end
Tim Peters added the comment:
Please read my answer again. Your code does not do what you _think_ it does.
It does what I said it does instead.
>>> a = input()
1010
>>> print(a)
1010
>>> print(type(a))
<class 'str'>
The input you're working with is NOT A LIST OF INTEGERS. I
Tim Peters added the comment:
`input()` returns a string, not a list. For input '1010' you're effectively
computing this:
>>> int('1' * 8) + int('1' * 2)  # = 11111111 + 11
11111122
which is the correct answer. If you want your input to be a list of integers
instead of a string,
Tim Peters added the comment:
Just for fun, here's a gonzo implementation (without arg-checking) using ideas
from the sketch. All factors of 2 are shifted out first, and all divisions are
done before any multiplies.
For large arguments, this can run much faster than a dumb loop
Tim Peters added the comment:
My two cents:
- Spell it comb(n, k).
- TypeError if args aren't ints.
- ValueError if not 0 <= k <= n.
Not concerned about speed. It's possible to do this with no divisions
involving integers bigger than `n` and `k`(*), using O(k) space, but for
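A pure-Python sketch matching that two-cents spec (not the C implementation under discussion):

```python
def comb(n, k):
    # Proposed spec: ints only; ValueError unless 0 <= k <= n.
    if not isinstance(n, int) or not isinstance(k, int):
        raise TypeError("comb() requires integers")
    if not 0 <= k <= n:
        raise ValueError("comb() requires 0 <= k <= n")
    k = min(k, n - k)          # C(n, k) == C(n, n-k)
    result = 1
    for i in range(1, k + 1):
        # Each division is exact: after this step result == C(n - k + i, i).
        result = result * (n - k + i) // i
    return result
```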
Tim Peters added the comment:
Yup, it can do some redundant comparisons; more on that here:
https://mail.python.org/pipermail/python-dev/2018-August/155020.html
I'm not inclined to make this already-difficult code even harder to understand
for something that's quite likely not to matter
Tim Peters added the comment:
To include trailing whitespace on a line in a doctest, _don't_ use raw strings.
Use a regular string, and include one (or more) trailing \x20 escapes instead
of literal spaces. For example:
    """
    >>> print("a\x20")
    a\x20
    """
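A self-contained sketch of the trick (the function name is illustrative):

```python
import doctest

def sample():
    """
    >>> print("a\x20")
    a\x20
    """

# In this non-raw docstring, \x20 is already a real space by the time
# doctest reads it, but editors won't silently strip it from the file.
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(sample):
    runner.run(test)
print(runner.failures, runner.tries)
```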
Tim Peters added the comment:
I left the code in because it was harmless (a 100%-predictable branch), and it
was easier to show that overflow was _considered_ than to explain in full why
it was impossible. In the context of CPython. For example, the Java port of
this code couldn't rely
Tim Peters added the comment:
This doesn't actually matter - the code can never trigger. It would be fine to
replace it with an assert to that effect (see below for a specific suggestion).
The reason: The indices in this code are into vectors of PyObject*. These
vectors can't contain
Tim Peters added the comment:
I don't object to spelling it out more, and your (Terry's) suggestions are
fine. On the other hand, this module has been around for a long time now,
and this is the first instance I've seen of someone surprised that it doesn't
produce overlapping matches
Tim Peters added the comment:
Sorry, I find this too hard to follow. At the end:
> ['TA', 'CA'] # Missing the substring 'CA'
the comment claims it's missing 'CA', while the output plainly shows it's _not_
missing 'CA'.
If your complaint is that it's missing 'AC', yes, it is. I
Tim Peters added the comment:
Not without more expense. Which is why it hasn't been done. But this is a
wrong place to talk about it. If you want Python to change, go to the
python-ideas mailing list?
https://mail.python.org/mailman/listinfo/python-ideas
Tim Peters added the comment:
Questions about Python should be asked, e.g., on the Python mailing list.
The short course is that it's desired that iterating over dicts be as fast as
possible, and nobody knows a way to make iteration robust in the face of
mutations that wouldn't
Tim Peters added the comment:
This won't be changed. The effect on the iteration order of adding and/or
removing keys from a dict while iterating over it has never been defined in
Python, and never will be. The Python 3 docs for dict views say this clearly:
"""
Iterat
Tim Peters added the comment:
This comes up every few years, but that's about it. Here's the iteration from
2 years ago:
https://mail.python.org/pipermail/python-ideas/2016-October/043039.html
Follow the thread. It contains easy-to-use wrappers for both "do it in
multiple simple p
Tim Peters added the comment:
>> Changes initialization to add in the length:
> What's the rationale for that change? You always asked
> me to stay as close as possible to the "official" hash function
> which adds in the length at the end. Is there an ac
Tim Peters added the comment:
We need to worry about timing too :-( I'm using this as a test. It's very
heavy on using 3-tuples of little ints as dict keys. Getting hash codes for
ints is relatively cheap, and there's not much else going on, so this is
intended to be very sensitive
Tim Peters added the comment:
Here's htest.py output from
git pull https://github.com/jdemeyer/cpython.git bpo34751
and a variation. The number of collisions in the variation appear in square
brackets at the end of each line.
32-bit build:
range(100) by 3; 32-bit hash codes; mean 116.42
Tim Peters added the comment:
Attaching htest.py so we have a common way to compare what various things do on
the tests that have been suggested. unittest sucks for that. doctest too.
Here's current code output from a 32-bit build; "ideally" we want "got" values
Tim Peters added the comment:
Thinking about the way xxHash works prompted me to try this initialization
change:
x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new
That was a pure win in my tests, slashing the number of collisions in the new
tuple test, 32-bit build, from
Tim Peters added the comment:
As a sanity check, I tried the 32-bit version of MurmurHash2 (version 3 doesn't
have a 32-bit version). All of my Python tests had collisions, and while the
new tuple test dropped to 15, product([0.5, 0.25], repeat=20) skyrocketed from
141 (32-bit xxHash
Tim Peters added the comment:
Stephane, it's not deep. People who need to write doctests that work across N
versions of Python shouldn't need to read N versions of the documentation.
This is hardly unique to doctest. We routinely add "Changed in version m.n"
blurbs all over
Tim Peters added the comment:
Add a comment along the lines you (Terry) suggested. Some people need to write
doctests that run under many versions of Python, so the info is still supremely
relevant to them.
--
nosy: +tim.peters
Tim Peters added the comment:
>> people already wrote substantial test suites dedicated
>> to that sole purpose, and we should aim to be "mere
>> consumers" of functions that pass _those_ tests.
> There are hash functions that pass those tests which
> a
Change by Tim Peters :
--
components: +Interpreter Core -XML
Tim Peters added the comment:
>> In the 64-bit build there are no collisions across my
>> tests except for 11 in the new tuple test.
> That's pretty bad actually. With 64 bits, you statistically
> expect something in the order of 10**-8 collisions. So
> what you'
Tim Peters added the comment:
> Note that I'm always considering parametrized
> versions of the hash functions that I'm testing.
> I'm replacing the fixed multiplier (all algorithms
> mentioned here have such a thing) by a random
> multiplier which is 3 mod 8.
I've explained
Tim Peters added the comment:
Here's a complete xxHash-based implementation via throwing away C++-isms from
https://github.com/RedSpah/xxhash_cpp/blob/master/xxhash/xxhash.hpp
In the 64-bit build there are no collisions across my tests except for 11 in
the new tuple test.
The 32-bit build
Tim Peters added the comment:
Thanks for looking at xxHash! An advantage is that it already comes in 32- and
64-bit versions.
> A (simplified and slightly modified version of) xxHash
> seems to work very well, much better than SeaHash.
? I've posted several SeaHash cores that
Tim Peters added the comment:
> SeaHash seems to be designed for 64 bits.
Absolutely.
> I'm guessing that replacing the shifts by
>
> x ^= ((x >> 16) >> (x >> 29))
>
> would be what you'd do for a 32-bit hash.
My guess too. But "the prime" i
Tim Peters added the comment:
FYI, this appears to be a good overview of what SMHasher is looking for:
https://github.com/aappleby/smhasher/wiki/SMHasher
Some of everything: performance, correctness, statistical measures, and
specific kinds of keysets that have proved problematic for other
Tim Peters added the comment:
> So it seems that this SMHasher test suite doesn't
> catch the problem that we're seeing with negative integers.
Seems to be so, but I've never run SMHasher myself. I believe it's focused on
statistical properties, like avalanche and bit indepe
Tim Peters added the comment:
> >>> from itertools import product
> >>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
> 32
> Good catch! Would you like me to add this to the testsuite?
It's in mine already ;-) I've added all the "bad examples"
Tim Peters added the comment:
>> For that reason, I've only been looking at those that
>> scored 10 (best possible) on Appleby's SMHasher[1] test suite
> Do you have a list of such hash functions?
A world of searching. The ones rated "Excellent" here (previously ci
Tim Peters added the comment:
>> I've noted before, e.g., that sticking to a prime
>> eliminates a world of regular bit patterns in the
>> multiplier.
> Why do you think this? 0x1fff is prime :-)
Point taken ;-) But "a world of" is not the same a
Tim Peters added the comment:
> If we, e.g., tested tuples of little floats instead ...
Speaking of which:
>>> from itertools import product
>>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
32
32 hash codes out of 1048576 distinct two-tuples isn't exactly
Tim Peters added the comment:
> So my suggestion remains
>
> for y in INPUT:
>     t = hash(y)
>     t ^= t * SOME_LARGE_EVEN_NUMBER
>     h ^= t
>     h *= MULTIPLIER
On the face of it, I'd be amazed if that passed SMHasher, because it only
propagates bits "to the lef
Tim Peters added the comment:
>> the author wants this transformation to be easily
>> invertible, so a prime is necessary
> A multiplication by any odd number modulo 2**64 is
> invertible.
Right! My mistake.
> As I argued before, the concept of primes is
Tim Peters added the comment:
A meta-comment: a great deal of work has been done on non-crypto hashes in
recent years. I'm not interested in rolling our own if one of the "winners"
from that process can be adapted. For that reason, I've only been looking at
those that score
Tim Peters added the comment:
>    Py_uhash_t t = (Py_uhash_t)y;
>    x ^= t ^ (t << 1);
>    x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
>    x ^= (x >> 32) >> (x >> 60);
>    x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
Comment out either one of the multip
Tim Peters added the comment:
If SeaHash is interesting to us (it is to me), I suggest dropping our DJB/FNV
combining part entirely and using a purer form, like so:
    Py_uhash_t t = (Py_uhash_t)y;
    x ^= t ^ (t << 1);
    x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
    x ^= (x >> 32) >> (x >> 60);
Tim Peters added the comment:
Noting that the failure of high-order bits to propagate is one form of
"avalanche" failure. A recent 64-bit hash, SeaHash, takes this approach which
is said to have provably "perfect" avalanche behavior:
Py_uhash_t t = (Py_uhash_t)y;
Tim Peters added the comment:
An "aha!" moment for me: I noted before that replacing the tuple hash loop with
    Py_uhash_t t = (Py_uhash_t)y;
    t ^= t << 1;
    x = (x * mult) + t;
does OK (64-bit build): most tests had no collisions, but the original tuple
test