Tim Peters added the comment:
[Tim]
> Perhaps worth noting that FNV-1a works great if used as
> _intended_: on a stream of unsigned bytes.
> ...
>
> Py_uhash_t t = (Py_uhash_t)y;
> for (int i = 0; i < sizeof(t); ++i) {
>     x = (x ^ (t & 0xff)) * (Py_uhash_t)1099511628211U;  /* 64-bit FNV prime */
>     t >>= 8;
> }
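The quoted byte-at-a-time loop can be sketched in Python; the constants below are the standard 64-bit FNV offset basis and prime, and the function name is mine:

```python
FNV64_OFFSET = 0xcbf29ce484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001b3        # standard 64-bit FNV prime
MASK64 = (1 << 64) - 1

def fnv1a_mix(x, y):
    """Fold the 8 bytes of y into running hash x, least-significant byte first."""
    t = y & MASK64
    for _ in range(8):
        x = ((x ^ (t & 0xff)) * FNV64_PRIME) & MASK64
        t >>= 8
    return x

h = fnv1a_mix(FNV64_OFFSET, 12345)
```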
Tim Peters added the comment:
Also worth noting: other projects need to combine hashes too. Here's a 64-bit
version of the highly regarded C++ Boost[1] library's hash_combine() function
(I replaced its 32-bit int literal with "a random" 64-bit one):
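Boost's actual code is elided above; as a sketch, its classic combining step looks like this in Python, with an illustrative 64-bit constant standing in for the "random" literal mentioned (not necessarily the one actually used):

```python
MASK64 = (1 << 64) - 1

def hash_combine(seed, h):
    # Boost's classic pattern: fold the incoming hash, an odd constant,
    # and two shifted copies of the seed into the running seed.
    return (seed ^ ((h + 0x9e3779b97f4a7c15
                     + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64)) & MASK64
```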
Tim Peters added the comment:
Perhaps worth noting that FNV-1a works great if used as _intended_: on a
stream of unsigned bytes. All tests except the new tuple hash test suffer no
collisions; the new test suffers 14. Nothing is needed to try to worm around
nested tuple catastrophes, or
Tim Peters added the comment:
I should have spelled this out before: these are all permutations, so in
general permuting the result space of `x * mult + y` (or any other permutation
involving x and y) is exactly the same as not permuting it but applying a
different permutation to y instead
Tim Peters added the comment:
>> The two-liner above with the xor in the second line is
>> exactly Bernstein 33A, followed by a permutation
>> of 33A's _output_ space.
> Not output space, but internal state
? 33A's output _is_ its internal state at the end.
Tim Peters added the comment:
High-order bit: please restore the original tuple hash test. You have the
worst case of "but I didn't write it" I've ever encountered ;-) Your new test
is valuable, but I've seen several cases now where it fails to detect any
problem
Tim Peters added the comment:
>> j is even implies (j ^ -3) == -(j ^ 3)
> This follows from what I posted before: if j is even, then
> j ^ 3 is odd, so we can apply the rule x ^ -2 = -x to x = j ^ 3
> ...
Thanks! That helps a lot. I had a blind spot there. This kind of th
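Both identities are easy to spot-check:

```python
# For odd x, x ^ -2 flips every bit but the last, so x ^ -2 == -x.
# For even j, combining that with -3 == 3 ^ -2 gives j ^ -3 == -(j ^ 3).
odd_ok = all((x ^ -2) == -x for x in range(-999, 1000, 2))
even_ok = all((j ^ -3) == -(j ^ 3) for j in range(-1000, 1001, 2))
```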
Tim Peters added the comment:
> Suppose that there is a hash collision, say hash((3, 3)) ==
> hash((-3, -3)) and you change the hashing algorithm to fix
> this collision.
There are _two_ hash functions at play in that collision: the tuple hash
function, and the integer hash
Tim Peters added the comment:
And one more:
x = (x * mult) ^ t;
also appears to work equally well. So, way back when, it appears we _could_
have wormed around the disaster du jour just by applying a shift-xor
permutation to the raw hash results.
Note the implication: if we
Tim Peters added the comment:
Just noting that this Bernstein-like variant appears to work as well as the
FNV-1a version in all the goofy ;-) endcase tests I've accumulated:
while (--len >= 0) {
    y = PyObject_Hash(*p++);
    if (y == -1)
        return -1;
Tim Peters added the comment:
> advantage of my approach is that high-order bits become more
> important:
I don't much care about high-order bits, beyond that we don't systematically
_lose_ them. The dict and set lookup routines have their own strategies for
incorporating
Tim Peters added the comment:
Jeroen, I understood the part about -2 from your initial report ;-) That's why
the last code I posted didn't use -2 at all (nor -1, which hashes to -2).
None of the very many colliding tuples contained -2 in any form. For example,
these 8 tuple
Tim Peters added the comment:
> when you do t ^= t << 7, then you are not changing
> the lower 7 bits at all.
I want to leave low-order hash bits alone. That's deliberate.
The most important tuple component types, for tuples that are hashable, are
strings and contiguous ra
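The low-bit claim is easy to verify: t << 7 has zeros in its last 7 positions, so the xor can't touch them.

```python
# t ^ (t << 7) agrees with t in the low 7 bits for any t.
low_bits_preserved = all(
    (t ^ (t << 7)) & 0x7f == t & 0x7f
    for t in (0, 1, 0x7f, 0x80, 0x12345678, (1 << 64) - 1)
)
```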
Tim Peters added the comment:
BTW, those tests were all done under a 64-bit build. Some differences in a
32-bit build:
1. The test_tuple hash test started with 6 collisions. With the change, it
went down to 4. Also changing to the FNV-1a 32-bit multiplier boosted it to 8.
The test
Tim Peters added the comment:
FYI, using this for the guts of the tuple hash works well on everything we've
discussed. In particular, no collisions in the current test_tuple hash test,
and none either in the cases mixing negative and positive little ints. This
all remains so usin
Tim Peters added the comment:
[Raymond, on boosting the multiplier on 64-bit boxes]
> Yes, that would be perfectly reasonable (though to some
> extent the objects in the tuple also share some of the
> responsibility for getting all bits into play).
It's of value independent of
Tim Peters added the comment:
Has anyone figured out the real source of the degeneration when mixing in
negative integers? I have not. XOR always permutes the hash range - it's
one-to-one. No possible outputs are lost, and XOR with a negative int isn't
"obviously degener
Tim Peters added the comment:
Oh, I don't agree that it's "broken" either. There's still no real-world test
case here demonstrating catastrophic behavior, nor even a contrived test
case demonstrating that, nor a coherent characterization of what "the proble
Tim Peters added the comment:
Raymond, I share your concerns. There's no reason at all to make gratuitous
changes (like dropping the "post-addition of a constant and incorporating
length signature"), apart from that there's no apparent reason for them
existing to begin
Tim Peters added the comment:
I strive not to believe anything in the absence of evidence ;-)
FNV-1a supplanted Bernstein's scheme in many projects because it works better.
Indeed, Python itself used FNV for string hashing before the security wonks got
exercised over collision attacks
Tim Peters added the comment:
So you don't know of any directly relevant research either. "Offhand I can't
see anything wrong" is better than nothing, but very far from "and we know it
will be OK because [see references 1 and 2]".
That Bernstein's DJBX3
Tim Peters added the comment:
Because the behavior of signed integer overflow isn't defined in C. Picture a
3-bit integer type, where the maximum value of the signed integer type is 3.
3+3 has no defined result. Cast them to the unsigned flavor of the integer
type, though, and the r
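The 3-bit example can be modeled in Python (the helper names are mine):

```python
def u3(x):
    # unsigned 3-bit arithmetic: well defined, wraps modulo 8
    return x & 0b111

def s3(x):
    # reinterpret the low 3 bits as a two's-complement value
    x = u3(x)
    return x - 8 if x >= 4 else x

wrapped = u3(3 + 3)   # the unsigned result is fully defined: 6
signed = s3(3 + 3)    # two's-complement reading of those bits: -2
```

In C, `3 + 3` on the signed 3-bit type is undefined; the unsigned computation is defined, and its bit pattern reinterpreted as signed gives -2.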
Tim Peters added the comment:
>> Why do you claim the original was "too small"? Too small for
>> what purpose?
> If the multiplier is too small, then the resulting hash values are
> small too. This causes collisions to appear for smaller numbers:
All right! An
Tim Peters added the comment:
Thank you, Vincent! I very much enjoyed - and appreciated - your paper I
referenced at the start. Way back when, I thought I had a proof of O(N log N),
but never wrote it up because some details weren't convincing - even to me ;-)
. Then I had to move
Tim Peters added the comment:
Oops!
"""
"j odd implies j^(-2) == -j, so that m*(j^(-2)) == -m"
"""
The tail end should say "m*(j^(-2)) == -m*j" instead.
Tim Peters added the comment:
For me, it's largely because you make raw assertions with extreme confidence
that the first thing you think of off the top of your head can't possibly make
anything else worse. When it turns out it does make some things worse, you're
equally con
Tim Peters added the comment:
You said it yourself: "It's not hard to come up with ...". That's not what
"real life" means. Here:
>>> len(set(hash(1 << i) for i in range(100_000)))
61
Wow! Only 61 hash codes across 100 thousand distinct int
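The mechanism: on 64-bit builds CPython hashes ints modulo the Mersenne prime 2**61 - 1, and 2 has multiplicative order 61 modulo that prime, so powers of two can produce only 61 distinct hashes. This check assumes a 64-bit build:

```python
import sys

modulus = sys.hash_info.modulus   # 2**61 - 1 on 64-bit builds
# hash(2**i) == 2**i mod (2**61 - 1), which cycles with period 61
distinct = len({hash(1 << i) for i in range(100_000)})
```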
Tim Peters added the comment:
@jdemeyer, you didn't submit a patch, or give any hint that you _might_. It
_looked_ like you wanted other people to do all the work, based on a contrived
example and a vague suggestion.
And we already knew from history that "a simple Bernstein has
Change by Tim Peters :
--
nosy: +ned.deily
___
Python tracker
<https://bugs.python.org/issue34751>
Tim Peters added the comment:
Ah! I see that the original SourceForge bug report got duplicated on this
tracker, as issue #942952. So clicking on that is a lot easier than digging thru
the mail archive.
One message there noted that replacing xor with addition made collision
statistics much
Tim Peters added the comment:
@jdemeyer, please define exactly what you mean by "Bernstein hash". Bernstein
has authored many hashes, and none on his current hash page could possibly be
called "simple":
https://cr.yp.to/hash.html
If you're talking about the
Tim Peters added the comment:
Ya, I care: `None` was always intended to be an explicit way to say "nothing
here", and using unique non-None sentinels instead for that purpose is
needlessly convoluted. `initial=None` is perfect. But then I'm old & in the
way ;
Tim Peters added the comment:
FYI, I bet I didn't see a problem with the Win32 target because I followed
instructions ;-) and did my first build using build.bat. Using that for the
x64 too target makes the problem go away.
Tim Peters added the comment:
Another runstack.py adds a bad case for 2-merge, and an even worse
(percentage-wise) bad case for timsort. powersort happens to be optimal for
both.
So they all have contrived bad cases now. powersort's bad cases are the least
bad. So far ;-) But I e
Tim Peters added the comment:
New version of runstack.py.
- Reworked code to reflect that Python's sort uses (start_offset, run_length)
pairs to record runs.
- Two unbounded-integer power implementations, one using a loop and the other
division. The loop version implies that, in Pyt
New submission from Tim Peters :
Using Visual Studio 2017 to build the current master branch of Python
(something I'm trying for the first time in about two years - maybe I'm missing
something obvious!), with the x64 target, under both the Release and Debug
builds I get a Python
Tim Peters added the comment:
No, there's no requirement that run lengths on the stack be ordered in any way
by magnitude. That's simply one rule timsort uses, as well as 2-merge and
various other schemes discussed in papers. powersort has no such rule, and
that's fine.
Re
Tim Peters added the comment:
The notion of cost is that merging runs of lengths A and B has "cost" A+B,
period. Nothing to do with logarithms. Merge runs of lengths 1 and 1000, and
it has cost 1001.
They don't care about galloping, only about how the order in which merges
Tim Peters added the comment:
A new version of the file models a version of the `powersort` merge ordering
too. It clearly dominates timsort and 2-merge in all cases tried, for this
notion of "cost".
Against it, its code is much more complex, and the algorithm is very far fro
Tim Peters added the comment:
"Galloping" is the heart & soul of Python's sorting algorithm. It's explained
in detail here:
https://github.com/python/cpython/blob/master/Objects/listsort.txt
The Java fork of the sorting code has had repeated bugs due to reducing
Tim Peters added the comment:
Looks like all sorts of academics are exercised over the run-merging order now.
Here's a paper that's unhappy because timsort's strategy, and 2-merge too,
aren't always near-optimal with respect to the entropy of the distribution of
Tim Peters added the comment:
The attached runstack.py models the relevant parts of timsort's current
merge_collapse and the proposed 2-merge. Barring conceptual or coding errors,
they appear to behave much the same with respect to "total cost", with no clear
overall win
New submission from Tim Peters :
The invariants on the run-length stack are uncomfortably subtle. There was a
flap a while back when an attempt at a formal correctness proof uncovered that
the _intended_ invariants weren't always maintained. That was easily repaired
(as the resear
Tim Peters added the comment:
Bah - the relevant thing to assert is really
assert((size_t)Py_SIZE(a) + (size_t)Py_SIZE(b) <= (size_t)PY_SSIZE_T_MAX);
C sucks ;-)
Tim Peters added the comment:
I agree there's pointless code now, but don't understand why the patch replaces
it with mysterious asserts. For example, what's the point of this?
    assert(Py_SIZE(a) <= PY_SSIZE_T_MAX / sizeof(PyObject*));
    assert(Py_SIZE(b) <= PY_SSIZE_T_MAX / sizeof(PyObject*));
Tim Peters added the comment:
Sure, if we make more assumptions. For 754 doubles, e.g., scaling isn't needed
if `1e-100 < absmax < 1e100` unless there are a truly ludicrous number of
points. Because, if that holds, the true sum is between 1e-200 and
number_of_points*1e200, bo
Tim Peters added the comment:
Thanks for doing the "real ulp" calc, Raymond! It was intended to make the
Kahan gimmick look better, and it succeeded ;-) I don't personally care
whether adding 10K things ends up with 50 ulp error, but to each their own.
Division can be most
Tim Peters added the comment:
Not that it matters: "ulp" is a measure of absolute error, but the script is
computing some notion of relative error and _calling_ that "ulp". It can
understate the true ulp error by up to a factor of 2 (the "wobble" of base 2
f
Tim Peters added the comment:
Yes, the assignment does "hide the global definition of g". But this
determination is made at compile time, not at run time: an assignment to `g`
_anywhere_ inside `f()` makes _every_ appearance of `g` within `f()` local to
`f`.
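A minimal demonstration of the compile-time rule:

```python
g = 10

def f():
    # The assignment below makes g local throughout f, so this read
    # fails even though it appears before the assignment.
    print(g)
    g = 20

try:
    f()
    caught = False
except UnboundLocalError:
    caught = True
```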
Tim Peters added the comment:
Closing as not-a-bug - not enough info to reproduce, but the regexp looked
prone to exponential-time backtracking to both MRAB and me, and there's been no
response to requests for more info.
Tim Peters added the comment:
Note: if you found a regexp like this _in_ the Python distribution, then a bug
report would be appropriate. It's certainly possible to write regexps that can
suffer catastrophic backtracking, and we've repaired a few of those, over the
years, th
Tim Peters added the comment:
Nick suggested two changes on 2018-07-15 (look above). Mark & I agreed about
the first change, so it wasn't mentioned again after that. All the rest has
been refining the second change.
Tim Peters added the comment:
@CuriousLearner, does the PR also include Nick's first suggested change? Here:
"""
1. Replace the opening paragraph of
https://docs.python.org/3/library/stdtypes.html#bitwise-operations-on-integer-types
(the one I originally quoted whe
Tim Peters added the comment:
I'm sure Guido designed the API to discourage subtly bug-ridden code relying on
the mistaken belief that it _can_ know the queue's current size. In the
general multi-threaded context Queue is intended to be used, the only thing
`.qsize()`'s cal
Tim Peters added the comment:
Note that you can consume multiple gigabytes of RAM with this simpler program
too, and for the same reasons:
"""
import concurrent.futures as cf

bucket = range(30_000_000)

def _dns_query(target):
    from time import sleep
    sleep(0.1)

def
Tim Peters added the comment:
If your `bucket` has 30 million items, then
    for element in bucket:
        executor.submit(kwargs['function']['name'], element, **kwargs)
is going to create 30 million Future objects (and all the under-the-covers
objects needed to mana
Tim Peters added the comment:
Ya, Mark's got a point there. Perhaps
s/the internal/a finite two's complement/
?
Tim Peters added the comment:
Well, all 6 operations "are calculated as though carried out in two's
complement with an infinite number of sign bits", so I'd float that part out of
the footnote and into the main text. When, e.g., you're thinking of ints _as_
bit
Tim Peters added the comment:
Nick, that seems a decent compromise. "Infinite string of sign bits" is how
Guido & I both thought of it when the semantics of longs were first defined,
and others in this report apparently find it natural enough too. It also
applies to all 6
Tim Peters added the comment:
? I expect your code to return -1 about once per 7**4 = 2401 times, which
would be about 400 times per million tries, which is what your output shows.
If you start with -5, and randint(1, 7) returns 1 four times in a row, r5 is
left at -5 + 4 = -1
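A quick simulation (a sketch; the function name is mine) reproduces the expected rate of roughly 416 hits per million:

```python
import random

def four_steps(start=-5):
    # Mirror the reported loop: start at -5 and add randint(1, 7) four times.
    r5 = start
    for _ in range(4):
        r5 += random.randint(1, 7)
    return r5

# r5 == -1 requires four 1s in a row: probability (1/7)**4 = 1/2401.
hits = sum(four_steps() == -1 for _ in range(1_000_000))
```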
Tim Peters added the comment:
Fine, Serhiy, so reword it a tiny bit: it's nice if a code object's co_consts
vector references as few distinct objects as possible. Still a matter of
pragmatics, not of correctness.
Tim Peters added the comment:
The language doesn't define anything about this - any program relying on
accidental identity is in error itself.
Still, it's nice if a code object's co_consts vector is as short as reasonably
possible. That's a matter of pragmatics
Tim Peters added the comment:
Lucas, as Mark said you're sorting _strings_ here, not sorting integers.
Please study his reply. As strings, "10" is less than "9", because "1" is less
than "9".
>>> "10
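For example:

```python
data = ["10", "9", "2"]
as_strings = sorted(data)            # lexicographic: "1" sorts before "2" and "9"
as_numbers = sorted(data, key=int)   # numeric order
```

`as_strings` is `['10', '2', '9']` while `as_numbers` is `['2', '9', '10']`.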
Tim Peters added the comment:
[Victor]
> This method [shuffle()] has a weird API. What is
> the point of passing a random function,
> ... I proposed to deprecate this argument and remove it later.
I don't care here. This is a bug report. Making backward-incompatible API
Tim Peters added the comment:
Victor, look at Raymond's patch. In Python 3, `randrange()` and friends
already use the all-integer `getrandbits()`. He's changing three other lines,
where some variant of `int(random() * someinteger)` is being used in an inner
loop for speed.
Pres
Tim Peters added the comment:
[Mark]
> If we do this, can we also persuade Guido to Pronounce that
> Python implementations assume IEEE 754 format and semantics
> for floating-point?
On its own, I don't think a change to force 53-bit precision _on_ 754 boxes
would justify that
Tim Peters added the comment:
Mark, ya, I agree it's most prudent to let sleeping dogs lie.
In the one "real" complaint we got (issue 24546) the cause was never determined
- but double rounding was ruled out in that specific case, and no _plausible_
cause was identified (sho
Tim Peters added the comment:
Mark, do you believe that 32-bit Linux uses a different libm? One that fails
if, e.g., SSE2 were used instead? I don't know, but I'd sure be surprised it
if did. Very surprised - compilers have been notoriously unpredictable in
exactly when
Tim Peters added the comment:
There are a couple bug reports here that have been open for years, and it's
about time we closed them.
My stance: if any platform still exists on which "double rounding" is still a
potential problem, Python _configuration_ should be changed to
Tim Peters added the comment:
Raymond, I'd say scaling is vital (to prevent spurious infinities), but
complications beyond that are questionable, slowing things down for an
improvement in accuracy that may be of no actual benefit.
Note that your original "simple homework problem
Tim Peters added the comment:
I'd call it a bug fix, but I'm really not anal about what people call things ;-)
Tim Peters added the comment:
Dan, your bug report is pretty much incoherent ;-) This standard Stack
Overflow advice applies here too:
https://stackoverflow.com/help/mcve
Guessing your complaint is that:
sys.getrefcount(itertools.repeat)
keeps increasing by 1 across calls to `leaks
Tim Peters added the comment:
I copy/pasted the definitions of "aware" and "naive" from the docs. Your TZ's
.utcoffset() returns None, so, yes, any datetime using an instance of that for
its tzinfo is naive.
In
print(datetime(2000,1,1).astimezone(timezone.utc))
Tim Peters added the comment:
The message isn't confusing - the definition of "aware" is confusing ;-)
"""
A datetime object d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d)
does not return None. If d.tzinfo is None, or if d.tzinfo is not None but
Tim Peters added the comment:
Berker Peksag's change (PR 5667) is very simple and, I think, helpful.
Tim Peters added the comment:
You missed my point about IPython: forget "In/Out arrays, etc". What you
suggest is inadequate for _just_ changing PS1/PS2 for IPython. Again, read
their `parse()` function. They support _more than one_ set of PS1/PS2
conventions. So the code c
Tim Peters added the comment:
Sergey, I understand that, but I don't care. The only people I've ever seen
_use_ this are people writing an entirely different shell interface. They're
rare. There's no value in complicating doctest to cater to theoretical use
cases that
Tim Peters added the comment:
doctest was intended to deal with the standard CPython terminal shell. I'd
like to keep it that way, but recognize that everyone wants to change
everything into "a framework" ;-)
How many other shells are there? As Sergey linked to, IPython alre
Tim Peters added the comment:
They both look wrong to me. Under 3.6.5 on Win10, `one` and `three` are the
same.
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit
(AMD64)] on win32
time.struct_time(tm_year=2009, tm_mon=2, tm_mday=13, tm_hour=23, tm_min=31,
tm_sec
Tim Peters added the comment:
I expect these docs date back to when ints, longs, and floats were the only
hashable language-supplied types for which mixed-type comparison could ever
return True.
They could stand some updates ;-) `fractions.Fraction` and `decimal.Decimal`
are more language
Tim Peters added the comment:
Min, you need to give a complete example other people can actually run for
themselves.
Offhand, this part of the regexp
(.|\s)*
all by itself _can_ cause exponential-time behavior. You can run this for
yourself:
>>> import re
>>> p = r"
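The danger in `(.|\s)*` is ambiguity: a space matches both `.` and `\s`, so when the rest of the pattern fails to match, the engine can retry exponentially many ways of splitting the text between the two branches. An unambiguous replacement that also matches newlines:

```python
import re

# A single dot with DOTALL matches any character, newlines included,
# leaving no ambiguity for the backtracking engine to explore.
safe = re.compile(r".*", re.DOTALL)
matched = safe.fullmatch("line one\nline two") is not None
```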
Tim Peters added the comment:
Closing because this appears to be senseless.
--
nosy: +tim.peters
resolution: -> rejected
stage: -> resolved
status: open -> closed
Tim Peters added the comment:
-1. We should stop pretending this _might_ happen ;-)
Tim Peters added the comment:
Please find a minimal example that illustrates the problem you think you've
found, and paste the plain text _into_ the bug report.
In the meantime, I'm closing this as "not a bug". The division operator
applied to integers in Python 2 defau
Tim Peters added the comment:
docstrings give brief statements intended to jog your memory; they're not
intended to be comprehensive docs. Read the actual documentation and see
whether you're still confused. When you "assumed it is irrelevant to time
zone", that w
Tim Peters added the comment:
Ned, I think this one is more the case that the OP didn't read the docs ;-)
That said, there's a level of complexity here that seemingly can't be reduced:
the distinctions between the `datetime` and `time` modules' views of the world,
and
Tim Peters added the comment:
I agree this isn't a bug (and it was right to close it). I expect the OP is
confused about what the `.timestamp()` method does, though. This note in the
docs directly address what happens in their problematic
`datetime.utcnow().timestamp()` case:
&qu
Tim Peters added the comment:
Sounds good (removing \b) to me, Terry!
Tim Peters added the comment:
I'm the wrong guy to ask about that. Since I worked at Zope Corp, my natural
inclination is to monkey-patch everything - but knowing full well that will
offend everyone else ;-)
That said, this optimization seems straightforward to me: two distinct m
Tim Peters added the comment:
I don't see anything objectionable about the class optimizing the
implementation of a private method.
I'll note that there's a speed benefit beyond just removing the two type checks
in the common case: the optimized `_randbelow()` also avoids
Tim Peters added the comment:
There's nothing in the docs I can see that implies `sample(x, n)` is a prefix
of what `sample(x, n+1)` would have returned had the latter been called
instead. If so, then - as always - it's "at your own risk" when you rely on
behavio
Tim Peters added the comment:
factorial(float) was obviously intended to work the way it does, so I'd leave
it alone in whatever changes are made to resolve _this_ issue. I view it as a
harmless-enough quirk, but, regardless, if people want to deprecate it that
should be a different
Tim Peters added the comment:
Please see the response to issue31889. Short course: you need to pass
`autojunk=False` to the SequenceMatcher constructor.
--
nosy: +tim.peters
resolution: -> duplicate
stage: -> resolved
status: open ->
Tim Peters added the comment:
Mark, how about writing a clever single-rounding dot product that merely
_detects_ when it encounters troublesome cases? If so, it can fall back to a
(presumably) much slower method. For example, like this for the latter:
def srdp(xs, ys):
"S
Tim Peters added the comment:
Mark, thanks! I'm happy with that resolution: if any argument is infinite,
return +inf; else if any argument is a NaN, return a NaN; else do something
useful ;-)
Serhiy, yes, the scaling that prevents catastrophic overflow/underflow due to
naively squ
Tim Peters added the comment:
Some notes on the hypot() code I pasted in: first, it has to special case
infinities too - it works fine if there's only one of 'em, but returns a NaN if
there's more than one (it ends up computing inf/inf, and the resulting NaN
propagates).
S
Tim Peters added the comment:
I'd be +1 on generalizing math.hypot to accept an arbitrary number of
arguments. It's the natural building block for computing distance, but the
reverse is strained. Both are useful.
Here's scaling code translated from the Fortran implementat
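The Fortran-derived code itself is elided above; the core scaling idea can be sketched in Python (the function name is mine, and this ignores the NaN subtleties discussed elsewhere in the thread):

```python
import math

def scaled_hypot(*xs):
    # Divide by the largest magnitude so every square is <= 1.0, which
    # prevents spurious overflow/underflow, then scale the root back up.
    m = max(abs(x) for x in xs)
    if m == 0.0 or math.isinf(m):
        return m
    return m * math.sqrt(math.fsum((x / m) ** 2 for x in xs))
```

Naively squaring, `scaled_hypot(1e200, 1e200)` would overflow to infinity; with scaling it returns a finite value near 1.414e200.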
Tim Peters added the comment:
This won't be changed. The dict type doesn't support efficient random choice
(neither do sets, by the way), and it's been repeatedly decided that it would
do a disservice to users to hide that. As you know, you can materialize the
keys in a
Tim Peters added the comment:
If you want to deprecate the method, bring that up on python-dev or
python-ideas. It's inappropriate on the issue tracker (unless, e.g., you open
a new issue with a patch to rip it out of the language). It's also
inappropriate to keep on demanding
Tim Peters added the comment:
Serhiy, nobody is proposing to add float.is_integer(). It already exists:
>>> (3.1).is_integer()
False
I already allowed I don't have a feel for how _generally_ useful it is, but you
have at least my and Stefan's word for that the functionali