[issue37211] obmalloc: eliminate limit on pool size

2019-06-09 Thread Tim Peters
Change by Tim Peters : -- keywords: +patch pull_requests: +13801 stage: -> patch review pull_request: https://github.com/python/cpython/pull/13934 ___ Python tracker <https://bugs.python.org/issu

[issue37211] obmalloc: eliminate limit on pool size

2019-06-09 Thread Tim Peters
New submission from Tim Peters : On 64-bit Python, many object sizes essentially doubled over 32-bit Python, because Python objects are so heavy with pointers. More recently, forcing alignment to 16 bytes on 64-bit boxes boosted the memory requirements more modestly. But obmalloc's 256 KiB

[issue37178] One argument form of math.perm()

2019-06-06 Thread Tim Peters
Tim Peters added the comment: I agree: perm(n) should return factorial(n). -- ___ Python tracker <https://bugs.python.org/issue37178> ___ ___ Python-bugs-list m

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Tim Peters
Tim Peters added the comment: Also basic: run hardware CPU and memory stress diagnostics, and/or try running the same thing on a different machine. Hardware isn't infallible, and can fail in nearly arbitrary ways. For example, perhaps a smidgen of silicon has gone flaky, so that one time

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-06-03 Thread Tim Peters
Tim Peters added the comment: Thank you, Friedl! I appreciate the extra confirmation - and especially on even bigger test cases :-) -- ___ Python tracker <https://bugs.python.org/issue37

[issue37134] [EASY] Use PEP570 syntax in the documentation

2019-06-03 Thread Tim Peters
Tim Peters added the comment: I haven't looked at anything in this PR, so just popping in to confirm that the first time I saw stuff like: len(obj, /) in the docs I had no idea at all what it was trying to say. I thought it was a typo. Also the second, third, fourth, ..., times I saw

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-03 Thread Tim Peters
Tim Peters added the comment: Python needs a benevolent dictator ;-) I'd be happy for Mark, Raymond, or me to play that role here. If it were me, I'd drop the k <= n requirement: both arguments must be non-negative integers. Because a combinatorial (not factorial-, and certai

[issue37132] Add a module for integer related math functions

2019-06-02 Thread Tim Peters
Tim Peters added the comment: Ya, I'm mostly with Raymond. `math` was originally a very thin wrapper around C's libm, but we've been moving away from that more and more for decades, and it's no longer the case anyway that the vast bulk of new Python programmers are intimately familiar

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm not fatally opposed to relaxing k <= n. David makes some good points about it, and - as Raymond already noted - "0" is consistent with the behavior of itertools.combinations(). The docs would need to change, though, because the factorial "

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm going to repeat part of an earlier comment :-) """ Please resist pointless feature creep. The original report was about comb(n, k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented appeared to agree th

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm not convinced, although I agree relaxing k <= n is less damaging than relaxing k >= 0. Python isn't aimed at mathematicians (although some 3rd-party packages certainly are, and they're free to define things however they like). We have to tra

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-01 Thread Tim Peters
Tim Peters added the comment: Strongly prefer requiring 0 <= k <= n at first. This is a programming language: it will be applied to real problems, not to abstract proofs where some slop can be helpful in reducing the number of cases that need to be considered. The Twitter

[issue29882] Add an efficient popcount method for integers

2019-06-01 Thread Tim Peters
Tim Peters added the comment: I prefer that a negative int raise ValueError, but am OK with it using the absolute value instead (i.e., what it does now). -- ___ Python tracker <https://bugs.python.org/issue29

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Change by Tim Peters : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.or

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Tim Peters added the comment: New changeset 1c263e39c4ed28225a7dc8ca1f24953225ac48ca by Tim Peters in branch 'master': bpo-37029: keep usable_arenas in sorted order without searching (#13612) https://github.com/python/cpython/commit/1c263e39c4ed28225a7dc8ca1f24953225ac48ca

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Tim Peters added the comment: Thank you so much, Inada! That's very good to hear :-) -- ___ Python tracker <https://bugs.python.org/issue37029> ___ ___ Pytho

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-30 Thread Tim Peters
Tim Peters added the comment: Added file arena.py. This adds some code to the OP's original test, to print out build time and teardown time, and display obmalloc stats. You'll need at least 80GB of RAM to run it - I don't have that much. Building the tree may take on the order of an hour

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-28 Thread Tim Peters
Tim Peters added the comment: Created a PR and assigned myself to this bug. -- assignee: -> tim.peters ___ Python tracker <https://bugs.python.org/issu

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-28 Thread Tim Peters
Change by Tim Peters : -- keywords: +patch pull_requests: +13516 stage: -> patch review pull_request: https://github.com/python/cpython/pull/13612 ___ Python tracker <https://bugs.python.org/issu

[issue37061] The strangest glitch I have ever seen - incorrect indenterror, even on commented out code.

2019-05-26 Thread Tim Peters
Tim Peters added the comment: Same kind of problem with the new upload, Bob. Line 38 is: print(Fore.GREEN,"Installed file ",e,Fore.WHITE) indented 8 spaces. Lines 39 and 40 are empty. Then line 41 is: if x>=len(files): # if x is more than or equal to the number of files in t

[issue37061] The strangest glitch I have ever seen - incorrect indenterror, even on commented out code.

2019-05-26 Thread Tim Peters
Tim Peters added the comment: As Steven said, there's an obvious indentation error in the file you actually attached. So nobody can _guess_ what your problem is. Please upload a file showing your actual problem. If I increase the indentation of the `print` Steven identified to match

[issue37000] _randbelow_with_getrandbits function inefficient with powers of two

2019-05-22 Thread Tim Peters
Tim Peters added the comment: I believe the thrust of Mark's suggestion was that it would allow using `k = (n-1).bit_length()` even when n == 1, without special-casing n == 1. But you'd still be adding a new "subtract 1" operation, and would still change results in some cases.

[issue26092] doctest should allow custom sys.displayhook

2019-05-16 Thread Tim Peters
Tim Peters added the comment: Oops! Should be issue 8048. -- ___ Python tracker <https://bugs.python.org/issue26092> ___ ___ Python-bugs-list mailing list Unsub

[issue26092] doctest should allow custom sys.displayhook

2019-05-16 Thread Tim Peters
Tim Peters added the comment: Noam Yorav-Raphael, I tried to add you because this is pushing back against your patch in issue 8408. It's been some years now, and nobody has cared enough to pursue it, so I'll just close this if you still don't care ;-) As doctest's original author, I

[issue26092] doctest should allow custom sys.displayhook

2019-05-16 Thread Tim Peters
Change by Tim Peters : -- nosy: +noam, tim.peters ___ Python tracker <https://bugs.python.org/issue26092> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue36934] C API Function PyLong_AsDouble Returning Wrong Value

2019-05-15 Thread Tim Peters
Tim Peters added the comment: It sounds like you need to read an introduction (any!) to computer floating-point formats. The relevant part in the output you showed is this: mant_dig=53 As on almost other current machines, your platform's floating point format is restricted to 53 bits

[issue36934] C API Function PyLong_AsDouble Returning Wrong Value

2019-05-15 Thread Tim Peters
Tim Peters added the comment: Note that this pure Python gives the same result: >>> "%.0f" % float(1639873214337061279) '1639873214337061376' That's not surprising: int -> float conversion is limited to the precision of a Python float, which is a C double, which is al

[issue36887] Add integer square root, math.isqrt

2019-05-13 Thread Tim Peters
Tim Peters added the comment: +1 from me! I'm tired of re-inventing this too :-) Agree with every point Mark made. Just in passing, noting a triviality: for the ceiling, `1 + isqrt(n - 1)` fails when `n` is zero. But I've never had a use for the ceiling here, or for "nearest&qu

[issue36493] Add math.midpoint(a,b) function

2019-04-01 Thread Tim Peters
Tim Peters added the comment: I'm inclined to agree with Mark - the wholly naive algorithm is already "perfect" away from extreme (whether large or small) magnitudes, and can't be beat for speed either. This kind of thing gets much more attractive in IEEE single precision, whe

[issue36229] Avoid unnecessary copies for list, set, and bytearray ops.

2019-03-13 Thread Tim Peters
Tim Peters added the comment: Josh, I agree - but do read Armin's response. The string hack was done inside ceval.c, where we have extraordinary knowledge of context. Adding similar hacks inside Python C API functions seems to be a non-starter - they can't magically start mutating

[issue36229] Avoid unnecessary copies for list, set, and bytearray ops.

2019-03-13 Thread Tim Peters
Tim Peters added the comment: Serhiy, if I understand this, it _could_ pay big with even just a few operations. For example, [0] * 100 + [1] + [2] + [3] As-is, we'll copy the million zeroes three times. WIth the patch, more likely the original vector of a million zeroes would

[issue36229] Linear-time list, set, and bytearray ops.

2019-03-13 Thread Tim Peters
Tim Peters added the comment: Brandt, it's quite possible I have no real idea what this patch is aimed at. I didn't reverse-engineer the code, the title is extremely broad, and there are no Python examples I can find for what it _is_ aimed at. If, e.g., it's impossible that this patch has

[issue36229] Linear-time list, set, and bytearray ops.

2019-03-12 Thread Tim Peters
Tim Peters added the comment: I'm at best -0 on this, and on the edge of -1. We already supply mutate-in-place operations for when this is wanted, and, e.g., for lists, it's as dead easy to spell as L1 += L2 Half the point of the hack for string catenation was that the similar-looking

[issue35880] math.sin has no backward error; this isn't documented

2019-03-08 Thread Tim Peters
Tim Peters added the comment: Jurjen, the errors you see in Python's sin() are _entirely_ due to your platform C's libm. Python just calls the platform C's sin. So nothing can be said about it in general. The better libm trig functions today do indeed perform trig argument reduction

[issue36228] Support coercion of complex to float/int

2019-03-08 Thread Tim Peters
Tim Peters added the comment: I have no use for this either, and agree with rejecting it - in nearly 30 years, nobody has asked for this before, and it's still the case that we don't have an actual programming use case (some theoretical use in an abstract mathematical model isn't

[issue36190] file object method .tell() sometimes returns large number when position is right before a line break

2019-03-04 Thread Tim Peters
Tim Peters added the comment: Stuff like that happens in any language supporting a tell() function for a file opened in text mode on Windows, inherited from the platform C's ftell() implementation: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/ftell-ftelli64?view=vs-2017

[issue36095] Better NaN sorting.

2019-02-24 Thread Tim Peters
Tim Peters added the comment: > *Clearly*, negative NaNs should be sorted to the start > of the list, while positive NaNs are sorted to the end > of the list. (Yes, I'm joking, but only a little bit: that's > the result you'd get if you used IEEE 754's totalOrder > to sort.

[issue36095] Better NaN sorting.

2019-02-23 Thread Tim Peters
Tim Peters added the comment: If we could roll back the clock, I'd impose a total ordering on floats in Python and supply some other way to spell 754's comparison operators (which users would have to ask for explicitly if they wanted to hassle with the near-useless and too-often-surprising

[issue36027] Consider adding modular multiplicative inverse to the math module

2019-02-19 Thread Tim Peters
Tim Peters added the comment: Raymond, I doubt we can do better (faster) than straightforward egcd without heroic effort anyway. We can't even know whether a modular inverse exists without checking whether the gcd is 1, and egcd builds on what we have to do for the latter anyway. Even

[issue36027] Consider adding modular multiplicative inverse to the math module

2019-02-18 Thread Tim Peters
Tim Peters added the comment: - Some form of this would be most welcome! - If it's spelled this way, put the modulus argument last? "Everyone expects" the modulus to come last, whether in code: x = (a+b) % m x = a*b % m x = pow(a, b, m) or in math: a^(k*(p-1

[issue36028] Integer Division discrepancy with float

2019-02-18 Thread Tim Peters
Tim Peters added the comment: Thanks, Steven! I'll go on to suggest that the best intro to these issues for Python programmers is in Python's own tutorial: https://docs.python.org/3/tutorial/floatingpoint.html Raymond, `divmod(a, b)[0]` IS the mathematical `floor(a/b)`, where the latter

[issue36028] Integer Division discrepancy with decimal

2019-02-18 Thread Tim Peters
Tim Peters added the comment: "Multiple roundings" in the float code is a red herring - that's just implementation details trying to cheaply get the same effect as computing with infinite precision. Here with actual unbounded precision: >>> from fractions import Fracti

[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small

2019-02-12 Thread Tim Peters
Tim Peters added the comment: > Oh, Tim Peters succeded somehow to > remove Josh Rosenberg from the nosy list: I add him again ;-) Certainly wasn't my intent! That happens too often on the tracker. Thanks for noticing! :-( -- ___ Python t

[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small

2019-02-12 Thread Tim Peters
Tim Peters added the comment: Josh, I'd say the speed of this code doesn't matter one whit to anything. Like you, I'd _prefer_ that the issue be fixed by building "real tuples" that own their own references, which would also (as you showed) allow for briefer, simpler, cl

[issue35961] test_slice: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small

2019-02-12 Thread Tim Peters
Tim Peters added the comment: > It's impressive *and* scary that such 13 years old > bug only show up today... Then again, there's probably no other code in the world that compares slice objects ;-) -- nosy: +tim.peters -josh.r ___ Python t

[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Tim Peters
Tim Peters added the comment: It's probably OK, but there's no "pure win" to be had here. There's generally more than one way to convert one string to another, and what "looks right" to humans depends a whole lot on context. For example, consider these strin

[issue35955] difflib reports incorrect location of mismatch

2019-02-11 Thread Tim Peters
Tim Peters added the comment: difflib generally synchs on the longest contiguous matching subsequence that doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk` argument considers blanks and tabs to be junk characters. In the strings: "drwxrwxr-x

[issue35932] Interpreter gets stuck while applying a regex pattern

2019-02-07 Thread Tim Peters
Tim Peters added the comment: Without re.IGNORECASE, the leading ^[_a-z0-9-]+ can't even match the first character (`val` starts with uppercase Z), so it fails instantly. With re.IGNORECASE, it's not "stuck", but is taking a verry long time to try an enormous number of (

[issue35915] re.search extreme slowness (looks like hang/livelock), searching for patterns containing .* in a large string

2019-02-06 Thread Tim Peters
Tim Peters added the comment: Yes, it's quadratic time. If the string being searched has N characters, first it fails to find "x" in all N of 'em, then `.*` advances by one and it fails to find "x" in the trailing N-1 characters, then again in the trailing N-2, and so

[issue35880] math.sin has no backward error; this isn't documented

2019-02-01 Thread Tim Peters
Tim Peters added the comment: As Mark said, the behavior of Python's floating-point math functions (not just trig, but also log, pow, exp, ...) is inherited almost entirely from the platform C's math libraries. Python itself guarantees nothing about what `math.sin(x)` returns for any `x

[issue35431] Add a function for computing binomial coefficients to the math module

2019-01-28 Thread Tim Peters
Tim Peters added the comment: > As far as I can tell from the discussions here, Steven > and you stated a preference for the shortened names, and > that's it. Plus Mark, plus me - all backed "comb()" specifically. > I find it hard to imagine anyone needing combinations

[issue34691] _contextvars missing in xmaster branch Windows build?

2019-01-16 Thread Tim Peters
Tim Peters added the comment: Precisely _how_ are you building it? As noted above, if you're using Visual Studio, try using build.bat instead for the first time. -- ___ Python tracker <https://bugs.python.org/issue34

[issue35606] Add prod() function to the math module

2019-01-05 Thread Tim Peters
Tim Peters added the comment: I'd like to divorce `prod()` from floating-point complications. The `sum()` builtin has proved extremely handy, even for floats, in large part because it's type-agnostic and straightforward. While I'd usually use `prod()` on ints and Fractions, in almost all

[issue35659] Add heapremove() function to heapq

2019-01-04 Thread Tim Peters
Tim Peters added the comment: For history, note that `bisect` doesn't always work in this context either: a heap is NOT always in sorted order. For example, this is "a (min) heap" too: [1, 3, 2]. More, that's "the real" problem. If we could find the element to be rem

[issue35431] Add a function for computing binomial coefficients to the math module

2019-01-04 Thread Tim Peters
Tim Peters added the comment: Please resist pointless feature creep. The original report was about comb(n, k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented appeared to agree they'd find that useful. But nobody has said they'd find generalizing beyond

[issue35654] Remove 'guarantee' that sorting only relies on __lt__ from sorting howto

2019-01-03 Thread Tim Peters
Tim Peters added the comment: Steven, thanks for noticing the docs! I was surprised to hear it wasn't documented, but not surprised enough to check myself ;-) This decision was suggested by me, and endorsed by Guido, when designing timsort looking ahead to Python 3, where __cmp__ was going

[issue35654] Remove 'guarantee' that sorting only relies on __lt__ from sorting howto

2019-01-03 Thread Tim Peters
Tim Peters added the comment: I don't know that the language needs to define this, but sticking to __lt__ was a wholly deliberate design decision for CPython. -- nosy: +tim.peters ___ Python tracker <https://bugs.python.org/issue35

[issue35608] python3 multiprocessing queue deadlock when use thread and process at same time

2018-12-29 Thread Tim Peters
Tim Peters added the comment: Antoine, alas, it's subtler than that. The worker process (process_func()) puts _another_ `StopIteration` on the input queue in its `finally` clause. So the first worker process to finish adds back a sentinel for the next worker to see, and so on. At the end

[issue35597] Bug in Python's compiler

2018-12-27 Thread Tim Peters
Tim Peters added the comment: Please read my answer again. Your code does not do what you _think_ it does. It does what I said it does instead. >>> a = input() 1010 >>> print(a) 1010 >>> print(type(a)) The input you're working with is NOT A LIST OF INTEGERS. I

[issue35597] Bug in Python's compiler

2018-12-27 Thread Tim Peters
Tim Peters added the comment: `input()` returns a string, not a list. For input '1010' you're effectively computing this: >>> int('1' * 8) + int('1' * 2) # = + 11 1122 which is the correct answer. If you want your input to be a list of integers instead of a string,

[issue35431] Add a function for computing binomial coefficients to the math module

2018-12-14 Thread Tim Peters
Tim Peters added the comment: Just for fun, here's a gonzo implementation (without arg-checking) using ideas from the sketch. All factors of 2 are shifted out first, and all divisions are done before any multiplies. For large arguments, this can run much faster than a dumb loop

[issue35431] Add a function for computing binomial coefficients to the math module

2018-12-12 Thread Tim Peters
Tim Peters added the comment: My two cents: - Spell it comb(n, k). - TypeError if args aren't ints. - ValueError if not 0 <= k <= n. Not concerned about speed. It's possible to do this with no divisions involving integers bigger than `n` and `k`(*), using O(k) space, but for &quo

[issue35369] List sorting makes duplicate comparisons

2018-11-30 Thread Tim Peters
Tim Peters added the comment: Yup, it can do some redundant comparisons; more on that here: https://mail.python.org/pipermail/python-dev/2018-August/155020.html I'm not inclined to make this already-difficult code even harder to understand for something that's quite likely not to matter

[issue24746] doctest 'fancy diff' formats incorrectly strip trailing whitespace

2018-11-21 Thread Tim Peters
Tim Peters added the comment: To include trailing whitespace on a line in a doctest, _don't_ use raw strings. Use a regular string, and include add a (one or more) trailing \x20 instead of a space (for example). For example: r""" >>> print("a ") a &qu

[issue35091] Objects/listobject.c: gallop functions rely on signed integer overflow

2018-10-28 Thread Tim Peters
Tim Peters added the comment: I left the code in because it was harmless (a 100%-predictable branch), and it was easier to show that overflow was _considered_ than to explain in full why it was impossible. In the context of CPython. For example, the Java port of this code couldn't rely

[issue35091] Objects/listobject.c: gallop functions rely on signed integer overflow

2018-10-28 Thread Tim Peters
Tim Peters added the comment: This doesn't actually matter - the code can never trigger. It would be fine to replace it with an assert to that effect (see below for a specific suggestion). The reason: The indices in this code are into vectors of PyObject*. These vectors can't contain

[issue35079] difflib.SequenceMatcher.get_matching_blocks omits common strings

2018-10-26 Thread Tim Peters
Tim Peters added the comment: I don't object to spelling it out more, and your (Terry's) suggestions are fine. On the other hand, this module has been around for a lng time now, and this is the first instance I've seen of someone surprised that it doesn't produce overlapping matches

[issue35079] difflib.SequenceMatcher.get_matching_blocks omits common strings

2018-10-26 Thread Tim Peters
Tim Peters added the comment: Sorry, I find this too hard to follow. At the end: > ['TA', 'CA'] # Missing the substring 'CA' the comment claims it's missing 'CA', while the output plainly shows it's _not_ missing 'CA'. If your complaint is that's missing 'AC', yes, it is. I

[issue35023] Missed a key when iterating over dictionary

2018-10-18 Thread Tim Peters
Tim Peters added the comment: Not without more expense. Which is why it hasn't been done. But this is a wrong place to talk about it. If you want Python to change, go to the python-ideas mailing list? https://mail.python.org/mailman/listinfo/python-ideas

[issue35023] Missed a key when iterating over dictionary

2018-10-18 Thread Tim Peters
Tim Peters added the comment: Questions about Python should be asked, e.g., on the Python mailing list. The short course is that it's desired that iterating over dicts be as fast as possible, and nobody knows a way to make iteration robust in the face of mutations that wouldn't

[issue35023] Missed a key when iterating over dictionary

2018-10-18 Thread Tim Peters
Tim Peters added the comment: This won't be changed. The effect on the iteration order of adding and/or removing keys from a dict while iterating over it has never been defined in Python, and never will be. The Python 3 docs for dict views say this clearly: """ Iterat

[issue35010] sort by partially reversed key tuple

2018-10-17 Thread Tim Peters
Tim Peters added the comment: This comes up every few years, but that's about it. Here's the iteration from 2 years ago: https://mail.python.org/pipermail/python-ideas/2016-October/043039.html Follow the thread. It contains easy-to-use wrappers for both "do it in multiple simple p

[issue34751] Hash collisions for tuples

2018-10-09 Thread Tim Peters
Tim Peters added the comment: >> Changes initialization to add in the length: > What's the rationale for that change? You always asked > me to stay as close as possible to the "official" hash function > which adds in the length at the end. Is there an ac

[issue34751] Hash collisions for tuples

2018-10-08 Thread Tim Peters
Tim Peters added the comment: We need to worry about timing too :-( I'm using this as a test. It's very heavy on using 3-tuples of little ints as dict keys. Getting hash codes for ints is relatively cheap, and there's not much else going on, so this is intended to be very sensitive

[issue34751] Hash collisions for tuples

2018-10-07 Thread Tim Peters
Tim Peters added the comment: Here's htest.py output from git pull https://github.com/jdemeyer/cpython.git bpo34751 and a variation. The number of collisions in the variation appear in square brackets at the end of each line. 32-bit build: range(100) by 3; 32-bit hash codes; mean 116.42

[issue34751] Hash collisions for tuples

2018-10-07 Thread Tim Peters
Tim Peters added the comment: Attaching htest.py so we have a common way to compare what various things do on the tests that have been suggested. unittest sucks for that. doctest too. Here's current code output from a 32-bit build; "ideally" we want "got" values

[issue34751] Hash collisions for tuples

2018-10-06 Thread Tim Peters
Tim Peters added the comment: Thinking about the way xxHash works prompted me to try this initialization change: x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new That was a pure win in my tests, slashing the number of collisions in the new tuple test, 32-bit build, from

[issue34751] Hash collisions for tuples

2018-10-06 Thread Tim Peters
Tim Peters added the comment: As a sanity check, I tried the 32-bit version of MurmurHash2 (version 3 doesn't have a 32-bit version). All of my Python tests had collisions, and while the new tuple test dropped to 15, product([0.5, 0.25], repeat=20) skyrocketed from 141 (32-bit xxHash

[issue34839] doctest: Change example under warnings section

2018-10-05 Thread Tim Peters
Tim Peters added the comment: Stephane, it's not deep. People who need to write doctests that work across N versions of Python shouldn't need to read N versions of the documentation. This is hardly unique to doctest. We routinely add "Changed in version m.n" blurbs all over

[issue34839] doctest: Change example under warnings section

2018-10-05 Thread Tim Peters
Tim Peters added the comment: Add a comment along the lines you (Terry) suggested. Some people need to write doctests that run under many versions of Python, so the info is still supremely relevant to them. -- nosy: +tim.peters ___ Python

[issue34751] Hash collisions for tuples

2018-10-04 Thread Tim Peters
Tim Peters added the comment: >> people already wrote substantial test suites dedicated >> to that sole purpose, and we should aim to be "mere >> consumers" of functions that pass _those_ tests. > There are hash functions that pass those tests which > a

[issue34751] Hash collisions for tuples

2018-10-04 Thread Tim Peters
Change by Tim Peters : -- components: +Interpreter Core -XML ___ Python tracker <https://bugs.python.org/issue34751> ___ ___ Python-bugs-list mailing list Unsub

[issue34751] Hash collisions for tuples

2018-10-04 Thread Tim Peters
Tim Peters added the comment: >> In the 64-bit build there are no collisions across my >> tests except for 11 in the new tuple test. > That's pretty bad actually. With 64 bits, you statistically > expect something in the order of 10**-8 collisions. So > what you'

[issue34751] Hash collisions for tuples

2018-10-04 Thread Tim Peters
Tim Peters added the comment: > Note that I'm always considering parametrized > versions of the hash functions that I'm testing. > I'm replacing the fixed multiplier (all algorithms > mentioned here have such a thing) by a random > multiplier which is 3 mod 8. I've explained

[issue34751] Hash collisions for tuples

2018-10-03 Thread Tim Peters
Tim Peters added the comment: Here's a complete xxHash-based implementation via throwing away C++-isms from https://github.com/RedSpah/xxhash_cpp/blob/master/xxhash/xxhash.hpp In the 64-bit build there are no collisions across my tests except for 11 in the new tuple test. The 32-bit build

[issue34751] Hash collisions for tuples

2018-10-03 Thread Tim Peters
Tim Peters added the comment: Thanks for looking at xxHash! An advantage is that it already comes in 32- and 64-bit versions. > A (simplified and slightly modified version of) xxHash > seems to work very well, much better than SeaHash. ? I've posted several SeaHash cores that

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: > SeaHash seems to be designed for 64 bits. Absolutely. > I'm guessing that replacing the shifts by > > x ^= ((x >> 16) >> (x >> 29)) > > would be what you'd do for a 32-bit hash. My guess too. But "the prime" i

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: FYI, this appears to be a good overview of what SMHasher is looking for: https://github.com/aappleby/smhasher/wiki/SMHasher Someg of everything: performance, correctness, statistical measures, and specific kinds of keysets that have proved problematic for other

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: > So it seems that this SMHasher test suite doesn't > catch the problem that we're seeing with negative integers. Seems to be so, but I've never run SMHasher myself. I believe it's focused on statistical properties, like avalanche and bit indepe

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: > >>> from itertools import product > >>> len(set(map(hash, product([0.5, 0.25], repeat=20 > 32 > Good catch! Would you like me to add this to the testsuite? It's in mine already ;-) I've added all the "bad examples&

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: >> For that reason, I've only been looking at those that >> scored 10 (best possible) on Appleby's SMHasher[1] test suite > Do you have a list of such hash functions? A world of searching. The ones rated "Excellent" here (previously ci

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: >> I've noted before, e.g., that sticking to a prime >> eliminates a world of regular bit patterns in the >> multiplier. > Why do you think this? 0x1fff is prime :-) Point taken ;-) But "a world of" is not the same a

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: > If we, e.g., tested tuples of little floats instead ... Speaking of which: >>> from itertools import product >>> len(set(map(hash, product([0.5, 0.25], repeat=20 32 32 hash codes out of 1048576 distinct two-tuples isn't exactly

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: > So my suggestion remains > > for y in INPUT: >t = hash(y) >t ^= t * SOME_LARGE_EVEN_NUMBER >h ^= t >h *= MULTIPLIER On the face of it, I'd be amazed if that passed SMHasher, because it only propagates bits "to the lef

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: >> the author wants this transformation to be easily >> invertible, so a prime is necessary > A multiplication by any odd number modulo 2**64 is > invertible. Right! My mistake. > As I argued before, the concept of primes is

[issue34751] Hash collisions for tuples

2018-10-02 Thread Tim Peters
Tim Peters added the comment: A meta-comment: a great deal of work has been done on non-crypto hashes in recent years. I'm not interested in rolling our own if one of the "winners" from that process can be adapted. For that reason, I've only been looking at those that score

[issue34751] Hash collisions for tuples

2018-10-01 Thread Tim Peters
Tim Peters added the comment: >Py_uhash_t t = (Py_uhash_t)y; >x ^= t ^ (t << 1); >x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL; >x ^= (x >> 32) >> (x >> 60); >x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL; Comment out either one of the multip

[issue34751] Hash collisions for tuples

2018-10-01 Thread Tim Peters
Tim Peters added the comment: If SeaHash is interesting to us (it is to me), I suggest dropping our DJB/FNV combining part entirely and using a purer form, like so: Py_uhash_t t = (Py_uhash_t)y; x ^= t ^ (t << 1); x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL; x ^= (x >&g

[issue34751] Hash collisions for tuples

2018-10-01 Thread Tim Peters
Tim Peters added the comment: Noting that the failure of high-order bits to propagate is one form of "avalanche" failure. A recent 64-bit hash, SeaHash, takes this approach which is said to have provably "perfect" avalanche behavior: Py_uhash_t t = (Py_uhash_t)y;

[issue34751] Hash collisions for tuples

2018-09-30 Thread Tim Peters
Tim Peters added the comment: An "aha!" moment for me: I noted before that replacing the tuple hash loop with Py_uhash_t t = (Py_uhash_t)y; t ^= t << 1; x = (x * mult) + t; does OK (64-bit build): most tests had no collisions, but the original tuple test

<    1   2   3   4   5   6   7   8   9   10   >