[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-05-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 39f475aa0163 by Raymond Hettinger in branch 'default':
Issue 21101:  Internal API for dict getitem and setitem where the hash value is 
known.
http://hg.python.org/cpython/rev/39f475aa0163

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-05-03 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 592a57682ced by Raymond Hettinger in branch 'default':
Issue #21101:  Eliminate double hashing in the C code for collections.Counter().
http://hg.python.org/cpython/rev/592a57682ced

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-05-03 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
resolution:  - fixed
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-16 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Antoine, do you support adding these as part of the public API?  If not, I can 
make them private.

I think the functions are broadly useful, but no one has ever asked for this 
functionality either.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-16 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I think we can start with making them private. Do you know of any third-party 
code bases which may be interested in the speedup?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Terry J. Reedy

Terry J. Reedy added the comment:

While the question is reasonable, I agree with Raymond's answer. As a python 
programmer, I would not like to see
   d.setitem_known_hash(key, hash, d.getitem_known_hash(key, hash) + 1)

Of course, d[key] += 1 already solves the double lookup issue at the Python 
level. Moreover, it abbreviates the form, rather than expanding it, which is 
appropriate since it abbreviates the computation.

You could optimize get-set even more than the current proposal by saving a 
reference to the slot corresponding to a key rather than the hash that leads to 
a slot. Exposing a slot reference probably breaks encapsulation too much. This 
could be avoided by another alternative: add PyDict_Mod(ify)Item(mapping, key, 
func). It would combine get and set: find slot, get item, set func(item), and 
return whatever SetItem does on success/failure.

--
nosy: +terry.reedy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Alex Gaynor

Alex Gaynor added the comment:

d[key] += 1 still does two dict lookups, and invokes the hash function twice:

 class X(object):
...   def __hash__(self):
... print hashed
... return 0
...   def __eq__(self, other):
... return True
...
 d = {X(): 0}
hashed
 d[X()]
hashed
0
 d[X()] = 3
hashed
 d[X()] += 1
hashed
hashed

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Of course, d[key] += 1 already solves the double lookup issue at the
 Python level.

What the hell are you talking about?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Terry J. Reedy

Terry J. Reedy added the comment:

What the hell I am talking about is what the doc says. 'd[key]' is written 
just once and is evaluated just once.
https://docs.python.org/3/reference/simple_stmts.html#augmented-assignment-statements

PS: Try being a bit more polite.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 PS: Try being a bit more polite.

You could definitely do some research before posting erroneous 
statements (this one isn't difficult to check, as Alex showed). 
Especially when the other posters (Alex and Raymond) are a lot more 
competent than you on the topic at hand.

If you actually try to *reason* about it, there is no other way for:
x[k] += some_expr

to work in the general case than to execute
x.__setitem__(k, x.__getitem__(k) + some_expr)

So, yes, the lookup is done twice, because it currently can't work 
otherwise.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread R. David Murray

R. David Murray added the comment:

Antoine, being polite never hurts.  Terry is a valuable member of the community 
and sure, he sometimes makes mistakes (or trusts the docs too much?).  So do 
the the rest of us.

--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-04-04 Thread Terry J. Reedy

Terry J. Reedy added the comment:

Raymond identified a need and a possible solution. The important part of my 
post was suggesting another possible solution. Please focus on that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-03-31 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-03-31 Thread Josh Rosenberg

Changes by Josh Rosenberg shadowranger+pyt...@gmail.com:


--
nosy: +josh.rosenberg

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
components: Interpreter Core
files: known_hash.diff
keywords: patch
nosy: rhettinger
priority: normal
severity: normal
stage: patch review
status: open
title: Extend the PyDict C API to handle cases where the hash value in known
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file34666/known_hash.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Added file: http://bugs.python.org/file34667/applied_known_hash.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Raymond Hettinger

New submission from Raymond Hettinger:

Propose adding two functions, PyDict_GetItem_KnownHash() and 
PyDict_SetItem_KnownHash().

It is reasonably common to make two successive dictionary accesses with the 
same key.  This results in calling the hash function twice to compute the same 
result.

For example, the technique can be used to speed-up collections.Counter (see the 
attached patch to show how).  In that patch, the hash is computed once, then 
used twice (to retrieve the prior count and to store the new count.

There are many other places in the standard library that could benefit:

Modules/posixmodule.c 1254
Modules/pyexpat.c 343 and 1788 and 1798
Modules/_json.c 628 and 1446 and 1515 and 1697
Modules/selectmodule.c 465
Modules/zipmodule.c 137
Objects/typeobject.c 6678 and 6685
Objects/unicodeobject.c 14997
Python/_warnings.c 195
Python/compile.c 1134
Python/import.c 1046 and 1066
Python/symtable 671 and 687 and 1068

A similar technique has been used for years in the Objects/setobject.c 
internals as a way to eliminate unnecessary calls to PyObject_Hash() during 
set-to-set and set-to-dict operations.

The benefit is biggest for objects such as tuples or user-defined classes that 
have to recompute the hash on every call on PyObject_Hash().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


Added file: http://bugs.python.org/file34668/double_counter_hash.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Is there any benefit in making them public API functions?

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 Is there any benefit in making them public API functions?

Originally, I was going to suggest them as internal functions, but the variety 
of use cases in the standard library suggested that third-party C extensions 
would benefit as well.  

Since this isn't functionality a user can create themselves, a public function 
is the only way to expose this service.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value in known

2014-03-30 Thread Alex Gaynor

Alex Gaynor added the comment:

Would it be reasonable to develop a Python API for this? If C functions have a 
need to do this, surely Python code does as well.

--
nosy: +alex

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-03-30 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
title: Extend the PyDict C API to handle cases where the hash value in known - 
Extend the PyDict C API to handle cases where the hash value is known

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21101] Extend the PyDict C API to handle cases where the hash value is known

2014-03-30 Thread Raymond Hettinger

Raymond Hettinger added the comment:

 Would it be reasonable to develop a Python API for this? 

I suspect that in pure Python, the overhead would exceed the benefit.

Current code:

   d[key] = d[key] + 1

Effort to save a double hash:

   h = hash(key)
   c = d.getitem_known_hash(key, hash)
   d.setitem_known_hash(key, hash, c + 1)

In PyPy, the second code sample might actually be faster that the first, but 
all the other pythons suffer from interpreter overhead, a double dict lookup 
for the hash() function, two dict lookups just to find the new methods, 
allocating a bound method, and forming two argument tuples.

There is also an API design issue.  The pure python core datatype APIs are 
designed in a way to encourage higher level thinking (that is why we can't 
directly manage hash table sparsity or list over-allocation for example).

In contrast, the C API is aimed at users who seek additional control and  
optimization at a lower level than the core language provides.  We tend to 
expose a number of tools at the C level that would be inappropriate for 
higher-level code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21101
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com