[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-21 Thread Roundup Robot
Roundup Robot added the comment: New changeset eec4758e3a45 by Victor Stinner in branch 'default': Issue #19183: Simplify test_gdb http://hg.python.org/cpython/rev/eec4758e3a45

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Christian Heimes
Christian Heimes added the comment: The problems have been resolved. -- resolution: -> fixed status: open -> closed

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot
Roundup Robot added the comment: New changeset 961d832d8734 by Christian Heimes in branch 'default': Issue #19183: too many tests depend on the sort order of repr(). http://hg.python.org/cpython/rev/961d832d8734
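The kind of test fragility this changeset addresses can be illustrated with a minimal sketch. It assumes a test that checks the contents of a set; the helper name `same_elements` and the sample data are hypothetical and are not taken from the CPython test suite.

```python
def same_elements(actual, expected_items):
    """Compare container contents without relying on iteration order."""
    return sorted(actual) == sorted(expected_items)


colors = {"red", "green", "blue"}

# Fragile: repr(colors) follows the set's iteration order, which depends on
# the string hashes and therefore on the hash algorithm and seed in use.
# Robust: compare the sorted contents instead of the textual representation.
print(same_elements(colors, ["blue", "green", "red"]))
```

The same idea applies to dict keys: a test can compare sorted keys or the mapping itself rather than the output of repr().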

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread STINNER Victor
STINNER Victor added the comment: test_gdb is not the only test that relies on the exact value of repr(); test_functools does too: http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.x/builds/6875/steps/test/logs/stdio == FAIL: test_r

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot
Roundup Robot added the comment: New changeset 11cb1c8faf11 by Victor Stinner in branch 'default': Issue #19183: Fix repr() tests of test_gdb, hash() is now platform dependent http://hg.python.org/cpython/rev/11cb1c8faf11

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Christian Heimes
Changes by Christian Heimes: -- assignee: ncoghlan -> christian.heimes resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot
Roundup Robot added the comment: New changeset 422ed27b62ce by Christian Heimes in branch 'default': Issue #19183: test_gdb's test_dict was failing on some machines as the order of dict keys has changed again. http://hg.python.org/cpython/rev/422ed27b62ce

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-20 Thread Roundup Robot
Roundup Robot added the comment: New changeset adb471b9cba1 by Christian Heimes in branch 'default': Issue #19183: Implement PEP 456 'secure and interchangeable hash algorithm'. http://hg.python.org/cpython/rev/adb471b9cba1
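On an interpreter built with this change (Python 3.4 or later), the selected algorithm can be inspected at runtime through sys.hash_info. This is a small sketch; the values printed depend on the build and Python version, so no particular output is assumed.

```python
import sys

info = sys.hash_info

# algorithm: name of the hash function used for str, bytes and memoryview
# hash_bits: internal output size of that hash function
# seed_bits: size of the seed key used by the hash function
print(info.algorithm, info.hash_bits, info.seed_bits)
```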

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-19 Thread Christian Heimes
Christian Heimes added the comment: The PEP should be ready now. I have addressed your input in http://hg.python.org/peps/rev/fbe779221a7a -- assignee: christian.heimes -> ncoghlan

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Christian Heimes
Christian Heimes added the comment: The numbers compare the CPython default tip with my feature branch. I pulled and merged all upstream changes into my feature branch yesterday. The results with "sso" in the file name are with small string optimization. Performance greatly depends on comp

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Benchmark report (without the small strings optimization): http://bpaste.net/show/UohtA8dmSREbrtsJYfTI/

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: So, amusingly, Christian's patch seems to be 4-5% faster than vanilla on many benchmarks here (Sandy Bridge Core i5, 64-bit, gcc 4.8.1). A couple of benchmarks are a couple % slower, but nothing severe. This is without the small strings optimization. On top of t

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: For the record, it's better to use a geometric mean when aggregating benchmark results into a single score.
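A minimal sketch of that aggregation, assuming each benchmark is reduced to a speed ratio (patched time divided by baseline time). The ratios below are made-up illustration values, not numbers from this issue.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical ratios: one benchmark twice as fast, one twice as slow.
ratios = [0.5, 2.0]
print(geometric_mean(ratios))      # the two changes cancel out
print(sum(ratios) / len(ratios))   # the arithmetic mean reports a slowdown
```

With ratios, a 2x speedup and a 2x slowdown should balance; the geometric mean reflects that, while the arithmetic mean is biased toward the slowdown.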

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-16 Thread Nick Coghlan
Nick Coghlan added the comment: Thanks - are those numbers with the current feature branch, and hence no small string optimization? To be completely clear, I'm happy to accept a performance penalty to fix the hash algorithm. I'd just like to know exactly how big a penalty I'm accepting, and w

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Christian Heimes
Christian Heimes added the comment: Here are benchmarks on two Linux machines. It looks like SipHash24 takes advantage of newer CPUs. I'm a bit puzzled about the results. Or maybe my super simple and naive analyzer doesn't give sensible results... https://bitbucket.org/tiran/pep-456-benchmarks/

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Nick Coghlan
Changes by Nick Coghlan: -- assignee: ncoghlan -> christian.heimes

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-15 Thread Nick Coghlan
Nick Coghlan added the comment: I reviewed the latest PEP text at http://www.python.org/dev/peps/pep-0456/ I'm almost prepared to accept the current version of the implementation, but there's one technical decision to be clarified and a few placeholders in the PEP that need to be cleaned up pr

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-13 Thread Christian Heimes
Christian Heimes added the comment: Hi Nick, I have updated the patch and the PEP text. The new version has small string hash optimization disabled. The other changes are mostly cleanup, reformatting and simplifications. Can you please do a review so I can get the patch into 3.4 before beta1

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-11-13 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32606/ac521cef665a.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-31 Thread Christian Heimes
Christian Heimes added the comment: I had to add the conversion from LE to host endianness. The missing conversion was degrading hash value dispersion.
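The effect of a missing byte-order conversion can be sketched in a few lines. This is only an illustration of reading an 8-byte block as a little-endian integer regardless of the host; the helper name `read_le64` is hypothetical, and the actual patch does this in C.

```python
import sys


def read_le64(raw: bytes) -> int:
    """Interpret exactly 8 bytes as a little-endian unsigned 64-bit integer."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(raw, "little")


block = bytes([0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])

as_le = read_le64(block)                          # same value on every host
as_be = int.from_bytes(block, "big")              # what a naive big-endian load sees
as_native = int.from_bytes(block, sys.byteorder)  # depends on the machine

print(hex(as_le), hex(as_be), sys.byteorder)
```

An algorithm specified in terms of little-endian words sees different input words on a big-endian host unless the conversion is applied.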

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-31 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32440/fb2f9c0bbca9.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32417/4756e9ed0328.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: About memcpy(). Here is a sample file. Compile it to assembler:
    gcc -O2 -S -masm=intel fnv.c
With memcpy() the main loop is compiled to:
    .L3:
        mov esi, DWORD PTR [ebx]
        imul eax, eax, 103
        add ebx, 4
        xor eax, esi
        sub ecx, 1
        mov DWORD P

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: -- Removed message: http://bugs.python.org/msg201675

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: About memcpy(). Here is a sample file. Compile it to assembler:
    gcc -O2 -S -masm=intel fnv.c
With memcpy() the main loop is compiled to:
    .L8:
        movzx ecx, BYTE PTR [ebx+edx]
        imul eax, eax, 103
        add edx, 1
        xor eax, ecx
        cmp edx, edi
        jn

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Christian Heimes
Christian Heimes added the comment: Victor: I have added the license to Doc/license.rst and created a new ticket, #19433, for PY_UINT64_T on Windows. Nick: The memory layout of the hash secret is now documented. I have renamed the members to reflect their purpose, too. http://hg.python.org/feat

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread STINNER Victor
STINNER Victor added the comment: To support 32-bit Windows, the following code in PC/pyconfig.h can be modified to use __int64 or _W64 (see the ssize_t definition further down in the same file):
    #ifndef PY_UINT64_T
    #if SIZEOF_LONG_LONG == 8
    #define HAVE_UINT64_T 1
    #define PY_UINT64_T unsigned PY_LONG_LONG

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread STINNER Victor
STINNER Victor added the comment:
    + The above copyright notice and this permission notice shall be included in
    + all copies or substantial portions of the Software.
You should copy the license into Doc/license.rst. -- nosy: +haypo

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-29 Thread Nick Coghlan
Nick Coghlan added the comment: Christian's general approach looks fine to me - consolidating the "kind" hashes (i.e. byte sequences, numbers and pointers) into one place independent of any particular type implementation makes sense to me, and the clear abstraction of "What is a hash function?

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Arfrever Frehtes Taifersar Arahesis
Changes by Arfrever Frehtes Taifersar Arahesis: -- nosy: +Arfrever

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a simplified version of the patch. 19427e9cc500.diff: 22 files changed, 746 insertions(+), 211 deletions(-), 28 modifications(!) 19427e9cc500-simplified.diff: 21 files changed, 486 insertions(+), 67 deletions(-), 27 modifications(!) -- Added f

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Antoine Pitrou
Changes by Antoine Pitrou: Added file: http://bugs.python.org/file32398/b8d39bf9ca4a.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Antoine Pitrou
Antoine Pitrou added the comment: I think Christian is right here. Hashing unaligned memory areas will happen quite rarely. It should work, but it doesn't have to be as fast as the common case.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: The code in your example uses volatile. That prevents lots of compiler optimizations. In my experience compilers and CPUs do a better optimization job than humans until the human factor interferes with the compiler. Even 40% might not be slower than calling m

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: On 28.10.2013 18:15, Serhiy Storchaka wrote: > _PyHash_Fini() should be moved out too. _Py_HashBytes() is the only function which > should be customized. You still haven't convinced me to scatter hash-related functions over multiple C files. And it won't work wit

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Charles-François Natali
Charles-François Natali added the comment: >> Well, unaligned memory access is usually slower on all architectures :-) >> Also, I think some ARM architectures don't support unaligned access, so >> it's not really a thing of the past... > > On modern computers it's either not slower or just a tiny

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Because you can't simply replace the files. Why not? This looks like the simplest option when you build a heavily customized CPython. > It also contains _Py_HashBytes() and _PyHash_Fini(). _PyHash_Fini() should be moved out too. _Py_HashBytes() is the only function

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: I have added an optimization for hashing of small strings. It uses an inline version of DJBX33A for small strings [1, 7) on 64bit and [1, 5) on 32bit. Nick, please use "create patch" before you do your review.
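For readers unfamiliar with DJBX33A ("Daniel J. Bernstein, times 33 with addition"), here is a sketch of the textbook form together with the length check described above. The starting value 5381 is the classic Bernstein constant and is an assumption; the patch's inline C version may differ, for example in how it mixes in the hash secret. Both function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def djbx33a(data: bytes, start: int = 5381) -> int:
    """Textbook DJBX33A: h = h * 33 + byte, truncated to 64 bits."""
    h = start
    for byte in data:
        h = (h * 33 + byte) & MASK64
    return h


def uses_small_string_path(length: int, cutoff: int = 7) -> bool:
    """True for lengths in [1, cutoff); 7 is the 64-bit value from the comment."""
    return 1 <= length < cutoff


for sample in (b"a", b"abcdef", b"abcdefg"):
    print(sample, uses_small_string_path(len(sample)), djbx33a(sample))
```

The trade-off discussed in the thread is speed on very short inputs versus the weaker mixing of a simple multiplicative hash.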

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: On 28.10.2013 16:59, Charles-François Natali wrote: > Well, unaligned memory access is usually slower on all architectures :-) > Also, I think some ARM architectures don't support unaligned access, so > it's not really a thing of the past... On modern comput

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Charles-François Natali
Charles-François Natali added the comment: > Seriously, nobody gives a ... about SPARC and MIPS. :) It's nice that > Python still works on these CPU architectures. But I neither want to > deviate from the SipHash24 implementation nor make the code slower on > all relevant platforms such as X86 an

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: Serhiy, I would like to land my patch before beta 1 hits the fan. We can always improve the code during beta. Right now I don't want to mess around with SipHash24 code. That includes non-64bit platforms as well as architectures that enforce aligned memory fo

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Christian Heimes
Christian Heimes added the comment: On 28.10.2013 16:18, Serhiy Storchaka wrote: > Christian, why is PY_HASH_EXTERNAL here? Do you plan to use it in any official > build? I think that in a custom build of Python the whole files pyhash.c and > pyhash.h can be replaced. Because you can't simply replace th

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Christian, why is PY_HASH_EXTERNAL here? Do you plan to use it in any official build? I think that in a custom build of Python the whole files pyhash.c and pyhash.h can be replaced. Once you get rid of PY_HASH_EXTERNAL, you could also get rid of PyHash_FuncDef

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-28 Thread Nick Coghlan
Nick Coghlan added the comment: On 27 Oct 2013 23:46, "Christian Heimes" wrote: > > > Christian Heimes added the comment: > > Nick, please review the latest patch. I have addressed Antoine's review in 257597d20fa8.diff. I'll update the PEP as soon as you are happy with the patch. Comments from

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32392/19427e9cc500.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes
Christian Heimes added the comment: PyObject_Hash() and PyObject_HashNotImplemented() should not have been moved to pyhash.h. But the other internal helper methods should be kept together.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I suggest moving only _Py_HashBytes() and the constants related to the string hash algorithm to Include/pyhash.h and Python/pyhash.c, and not touching PyObject_Hash(), _Py_HashDouble(), etc. So if anybody wants to change the string hashing algorithm, they need only replace these tw

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes
Christian Heimes added the comment: Nick, please review the latest patch. I have addressed Antoine's review in 257597d20fa8.diff. I'll update the PEP as soon as you are happy with the patch. -- assignee: -> ncoghlan

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32388/257597d20fa8.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I can no longer find the configuration for custom path. It's still documented > but there is no field for "repo path". http://buildbot.python.org/all/builders/PPC64%20PowerLinux%20custom (usually, just replace "3.x" with "custom" in the URL)

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Christian Heimes
Christian Heimes added the comment: I can no longer find the configuration for custom path. It's still documented but there is no field for "repo path". http://buildbot.python.org/all/buildslaves/edelsohn-powerlinux-ppc64

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: > I'm still looking for a 64bit big endian box Have you tried the PPC64 PowerLinux box? It's in the stable buildbots for a reason :-)

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes
Christian Heimes added the comment: Nick, can you do another review? All tests should pass on common boxes. The latest code hides the struct with the hash function. I have added a configure block that detects platforms that don't support unaligned memory access. It works correctly on the SPARC

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32384/31ce9488be1c.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-26 Thread Christian Heimes
Changes by Christian Heimes: Removed file: http://bugs.python.org/file32365/38b3ad4287ef.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-25 Thread Christian Heimes
Changes by Christian Heimes: Added file: http://bugs.python.org/file32365/38b3ad4287ef.diff

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-24 Thread Christian Heimes
Christian Heimes added the comment: I have created a clone for PEP 456 and applied your suggestions. I'm still looking for a nice API to handle the hash definition. Do you have some suggestions? -- hgrepos: +212

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-14 Thread Nick Coghlan
Nick Coghlan added the comment: Added some structural comments to the patch. I'll defer to Serhiy when it comes to assessing the algorithm details :)

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes
Christian Heimes added the comment: Sure it does. The test for unaligned hashing passes without an error or a segfault.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > The test for unaligned hashing passes without an error or a segfault. On some platforms it can work without a segfault.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: And note that the quality of the FNV hash function is reduced (msg186403). We need to "shuffle" the result's bits.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Since the hash algorithm is determined at compile time, the _Py_HashSecret_t structure and the _Py_HashSecret function are redundant. We need to define only the _Py_HashBytes function. Currently the SipHash algorithm doesn't work with unaligned data.

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I propose extracting all hash-related code from Include/object.h into a separate file, Include/pyhash.h, and perhaps moving Objects/hash.c to Python/pyhash.c. -- nosy: +serhiy.storchaka

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Roundup Robot
Roundup Robot added the comment: New changeset c960bed22bf6 by Christian Heimes in branch 'default': Make Nick BDFL delegate http://hg.python.org/peps/rev/c960bed22bf6 -- nosy: +python-dev

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes
Christian Heimes added the comment:
unmodified Python:
    1000 loops, best of 3: 307 usec per loop (unicode)
    1000 loops, best of 3: 930 usec per loop (memoryview)
SipHash:
    1000 loops, best of 3: 300 usec per loop (unicode)
    1000 loops, best of 3: 906 usec per loop (memoryview)

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Your benchmark is a bit unrealistic because it times the hash cache > most of the time. Here is a better benchmark (but bytes-only): > > $ ./python -m timeit -s "words=[w.encode('utf-8') for line in > open('../LICENSE') for w in line.split()]; import collectio

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-07 Thread Christian Heimes
Christian Heimes added the comment: Your benchmark is a bit unrealistic because it times the hash cache most of the time. Here is a better benchmark (but bytes-only): $ ./python -m timeit -s "words=[w.encode('utf-8') for line in open('../LICENSE') for w in line.split()]; import collections" --

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: Microbenchmarking hash computation (Linux, gcc 4.7.3):
* Short strings: python -m timeit -s "b=b'x'*20" "hash(memoryview(b))"
  - 64-bit build, before: 0.263 usec per loop
  - 64-bit build, after: 0.263 usec per loop
  - 32-bit build, before: 0.303 usec per loop
  -

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Antoine Pitrou
Antoine Pitrou added the comment: Here is a simple benchmark (Linux, gcc 4.7.3):
    $ ./python -m timeit -s "words=[w for line in open('LICENSE') for w in line.split()]; import collections" "c = collections.Counter(words); c.most_common(10)"
- 64-bit build, before: 313 usec per loop
- 64-bit bui
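The same kind of measurement can be scripted with the timeit module instead of the command line. This is a self-contained sketch: it uses a synthetic word list rather than the LICENSE file, and the loop counts are arbitrary choices, so the timings are not comparable with the figures quoted in this thread.

```python
import collections
import timeit

# Synthetic input standing in for the words of the LICENSE file.
words = ("the quick brown fox jumps over the lazy dog " * 200).split()


def count_words():
    """Build a Counter over the words and return the most common entries."""
    counter = collections.Counter(words)
    return counter.most_common(10)


loops = 100
timings = timeit.repeat(count_words, number=loops, repeat=3)

# Like the timeit command line, report the best run, per loop.
best_usec = min(timings) / loops * 1e6
print("%d loops, best of 3: %.1f usec per loop" % (loops, best_usec))
```

Because str objects cache their hash after the first computation, a loop like this mostly measures dict operations on cached hashes, which is the limitation Christian points out in his reply.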

[issue19183] PEP 456 Secure and interchangeable hash algorithm

2013-10-06 Thread Christian Heimes
New submission from Christian Heimes: The patch implements the current state of PEP 456 plus a configure option to select the hash algorithm. I have tested it only on 64bit Linux so far. -- components: Interpreter Core files: pep-0456-1.patch keywords: patch messages: 199078 nosy: chris