Raymond Hettinger added the comment:
> Serhiy, Gregory, Raymond, Antoine: so what is your feeling on
> this issue? Is it worth it?
I don't think it is worth it. There may be some cases that benefit, but it
adds extra branching code to the common cases (sets and dicts) that already
have the
STINNER Victor added the comment:
If it's hard to see a real speedup, it's probably not interesting to use the
hash in string comparison.
--
resolution:  -> invalid
status: open -> closed
___
Python tracker rep...@bugs.python.org
STINNER Victor added the comment:
I ran pybench with the patch. I don't understand this result (10% slower with
the patch):
DictWithStringKeys:    28ms    25ms   +10.7%    28ms    26ms   +10.5%
This test doesn't use unicode_compare_eq() from Objects/unicodeobject.c but
unicode_eq() from
STINNER Victor added the comment:
(oops, I didn't want to close the issue, it's a mistake)
--
resolution: invalid ->
status: closed -> open
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue16286
STINNER Victor added the comment:
I recently added a new _PyUnicode_CompareWithId() function: changeset
77bebcf5c4cf (issue #19512).
This function can be used instead of PyUnicode_CompareWithASCIIString() when
the right parameter is a common string. It is interesting when the right string
is
STINNER Victor added the comment:
Serhiy, Gregory, Raymond, Antoine: so what is your feeling on this issue? Is it
worth it?
Roundup Robot added the comment:
New changeset 536a7c09c7fd by Victor Stinner in branch 'default':
Issue #16286: write a new subfunction bytes_compare_eq()
http://hg.python.org/cpython/rev/536a7c09c7fd
--
nosy: +python-dev
Roundup Robot added the comment:
New changeset 5fa291435740 by Victor Stinner in branch 'default':
Issue #16286: optimize PyUnicode_RichCompare() for identical strings (same
http://hg.python.org/cpython/rev/5fa291435740
Roundup Robot added the comment:
New changeset da9c6e4ef301 by Victor Stinner in branch 'default':
Issue #16286: remove duplicated identity check from unicode_compare()
http://hg.python.org/cpython/rev/da9c6e4ef301
STINNER Victor added the comment:
I applied changes unrelated to the hash.
--
Added file: http://bugs.python.org/file32493/compare_hash-3.patch
STINNER Victor added the comment:
Results of benchmarks using compare_hash-3.patch:
$ time ../benchmarks/perf.py -r -b default ./pythonorig ./pythonhash
INFO:root:Skipping benchmark slowspitfire; not compatible with Python 3.4
INFO:root:Skipping benchmark slowpickle; not compatible with Python
STINNER Victor added the comment:
Updated patch.
--
Added file: http://bugs.python.org/file32445/compare_hash-2.patch
Antoine Pitrou added the comment:
> That raises the question of what strings ever have had their hash
> already computed if the string hasn't been interned or has been used
> in a dict or set?
Currently, none, I think.
--
nosy: +pitrou
STINNER Victor added the comment:
>> That raises the question of what strings ever have had their hash
>> already computed if the string hasn't been interned or has been used
>> in a dict or set?
> Currently, none, I think.
Strings are used (and compared) outside dict and set.
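
A small illustration of the point above, offered as a sketch rather than as anything from the tracker: list membership compares strings with == and never hashes them. CountingStr and its counters are hypothetical names introduced only for this example.

```python
# Sketch only: CountingStr is a hypothetical helper, not part of CPython.
class CountingStr(str):
    """str subclass that counts equality tests and hash computations."""
    eq_calls = 0
    hash_calls = 0

    def __eq__(self, other):
        CountingStr.eq_calls += 1
        return str.__eq__(self, other)

    def __hash__(self):
        CountingStr.hash_calls += 1
        return str.__hash__(self)


names = ["getenv", "environ", "putenv"]
needle = CountingStr("putenv")

# "x in list" walks the list and compares each item with ==.
found = needle in names
```

Here every list element is compared against needle and no hash is requested, so a hash-based shortcut can only help in such code if both strings already carry a cached hash from earlier use.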
--
STINNER Victor added the comment:
Let's try to identify some use cases in the Python test suite using gdb:
(gdb) b unicode_compare_eq
(gdb) condition 1 ((PyASCIIObject*)str1)->hash != -1 &&
((PyASCIIObject*)str2)->hash != -1 && ((PyASCIIObject*)str1)->hash !=
((PyASCIIObject*)str2)->hash
(gdb) run
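
The breakpoint condition above fires when both strings have a cached hash and the two hashes differ. As a rough Python model of the shortcut under discussion (a sketch, not the code from any of the patches), the same test can be written as follows. CachedStr and fast_eq are hypothetical names, and -1 stands for "hash not computed yet" as in the hash field inspected by the gdb condition.

```python
# Toy model of the proposed shortcut; not CPython's actual implementation.
# CachedStr and fast_eq are hypothetical names used for illustration.
class CachedStr:
    """Wraps a str and caches its hash; -1 means 'not computed yet'."""

    def __init__(self, value):
        self.value = value
        self.hash = -1

    def compute_hash(self):
        if self.hash == -1:
            # CPython's hash() never returns -1, so -1 stays a safe sentinel.
            self.hash = hash(self.value)
        return self.hash


def fast_eq(a, b):
    """Equality test that uses cached hashes only as an early exit."""
    if a is b:
        return True
    if len(a.value) != len(b.value):
        return False
    # Same test as the gdb condition: both hashes known and different.
    if a.hash != -1 and b.hash != -1 and a.hash != b.hash:
        return False
    # Equal or unknown hashes prove nothing; fall back to a full comparison.
    return a.value == b.value
```

Note that the hash check can only reject a pair early. Equal hashes still require the full comparison, and the extra branch is paid on every call, which is the cost raised elsewhere in this thread.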
STINNER Victor added the comment:
(4) str in __all__ (list of str):
os.py:
if "putenv" not in __all__:
    __all__.append("putenv")
For this example: "putenv" is probably interned by "def putenv(...)". The "putenv"
string and the name of the function are the same constant. When a function is
STINNER Victor added the comment:
I will benchmark the overhead of memcmp() on short strings. We may
check the first and last characters before calling memcmp() to limit
the overhead of calling a function.
I created issue #17628 for this point.
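
The idea of checking the first and last characters before calling memcmp() can be modelled in Python as below. This is a sketch under stated assumptions: quick_bytes_eq is a hypothetical name, and the final a == b stands in for the memcmp() call. It says nothing about whether the pre-check is faster in C, which is what the benchmark is meant to find out.

```python
# Sketch only: quick_bytes_eq is a hypothetical helper, and "a == b"
# stands in for the memcmp() call in the C implementation.
def quick_bytes_eq(a: bytes, b: bytes) -> bool:
    """Compare two byte strings, rejecting cheap mismatches first."""
    if len(a) != len(b):
        return False
    if not a:
        # Two empty strings are equal; nothing to index.
        return True
    # Cheap pre-check: differing first or last byte means not equal.
    if a[0] != b[0] or a[-1] != b[-1]:
        return False
    # Full comparison only when the pre-check cannot decide.
    return a == b
```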
--
Changes by STINNER Victor <victor.stin...@gmail.com>:
--
title: Optimize a==b and a!=b for bytes and str -> Use hash if available to
optimize a==b and a!=b for bytes and str