[issue28199] Compact dict resizing is doing too much work

2016-11-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Re-committed. It might be dangerous to commit this in 3.6 at that stage.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-11-06 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 39f33c15243b by Serhiy Storchaka in branch 'default':
Issue #28199: Microoptimized dict resizing.  Based on patch by Naoki Inada.
https://hg.python.org/cpython/rev/39f33c15243b

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-11-04 Thread Xiang Zhang

Xiang Zhang added the comment:

#28580 and #28583 are resolved now. I think dictresize4 can be recommited now.

--
stage: needs patch -> commit review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-11-02 Thread INADA Naoki

INADA Naoki added the comment:

Thanks, Xiang.

Shard-key dict is very hard to maintain...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-11-01 Thread Xiang Zhang

Xiang Zhang added the comment:

Open #28583 and #28580 to tackle this.

--
dependencies: +Optimize _PyDict_Next for split table, PyDict_SetDefault doesn't 
combine split table when needed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-11-01 Thread Xiang Zhang

Xiang Zhang added the comment:

I use gdb to run setuptools test suite and find the assumption, split tables 
are always dense is broken for both dictresize3 and dictresize4.

#0  0x771171c7 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x77118e2a in __GI_abort () at abort.c:89
#2  0x771100bd in __assert_fail_base (fmt=0x77271f78 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", 
assertion=assertion@entry=0x5e4b90 "oldvalues[i] != ((void *)0)", 
file=file@entry=0x5e4aa0 "Objects/dictobject.c", line=line@entry=1270, 
function=function@entry=0x5e59f0 <__PRETTY_FUNCTION__.12083> "dictresize") 
at assert.c:92
#3  0x77110172 in __GI___assert_fail 
(assertion=assertion@entry=0x5e4b90 "oldvalues[i] != ((void *)0)", 
file=file@entry=0x5e4aa0 "Objects/dictobject.c", line=line@entry=1270, 
function=function@entry=0x5e59f0 <__PRETTY_FUNCTION__.12083> "dictresize")
at assert.c:101
#4  0x0048bddc in dictresize (mp=mp@entry=0x7219d2b0, 
minused=) at Objects/dictobject.c:1270
#5  0x0048bf93 in insertion_resize (mp=mp@entry=0x7219d2b0) at 
Objects/dictobject.c:1100
#6  0x0048c5fd in insertdict (mp=mp@entry=0x7219d2b0, 
key=key@entry=0x7579c3c0, hash=-3681610201421769281, 
value=value@entry=0x707f56e8) at Objects/dictobject.c:1136
#7  0x0048fdfd in PyDict_SetItem (op=op@entry=0x7219d2b0, 
key=key@entry=0x7579c3c0, value=value@entry=0x707f56e8)
at Objects/dictobject.c:1572
#8  0x00492cb5 in _PyObjectDict_SetItem (tp=tp@entry=0xd52548, 
dictptr=0x7080cbd8, key=key@entry=0x7579c3c0, 
value=value@entry=0x707f56e8) at Objects/dictobject.c:4274
#9  0x0049df8a in _PyObject_GenericSetAttrWithDict (obj=0x7080cbb8, 
name=0x7579c3c0, value=0x707f56e8, dict=dict@entry=0x0)
at Objects/object.c:1172
#10 0x0049e0cf in PyObject_GenericSetAttr (obj=, 
name=, value=) at Objects/object.c:1194
#11 0x0049d80e in PyObject_SetAttr (v=v@entry=0x7080cbb8, 
name=name@entry=0x7579c3c0, value=value@entry=0x707f56e8)
at Objects/object.c:932

Thanks to Victor's _PyDict_CheckConsistency, it's easy then to find even 
without dictresize3 and dictresize4 (the current version), the test suite still 
fails (#define DEBUG_PYDICT).

#0  0x771171c7 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:55
#1  0x77118e2a in __GI_abort () at abort.c:89
#2  0x771100bd in __assert_fail_base (fmt=0x77271f78 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", 
assertion=assertion@entry=0x5e53a0 "mp->ma_values[i] != ((void *)0)", 
file=file@entry=0x5e4d00 "Objects/dictobject.c", line=line@entry=498, 
function=function@entry=0x5e5dd0 <__PRETTY_FUNCTION__.11869> 
"_PyDict_CheckConsistency") at assert.c:92
#3  0x77110172 in __GI___assert_fail 
(assertion=assertion@entry=0x5e53a0 "mp->ma_values[i] != ((void *)0)", 
file=file@entry=0x5e4d00 "Objects/dictobject.c", line=line@entry=498, 
function=function@entry=0x5e5dd0 <__PRETTY_FUNCTION__.11869> 
"_PyDict_CheckConsistency") at assert.c:101
#4  0x0048ba17 in _PyDict_CheckConsistency (mp=mp@entry=0x70806e68) 
at Objects/dictobject.c:498
#5  0x004927a3 in PyDict_SetDefault (d=d@entry=0x70806e68, 
key=0x72ffcdd8, defaultobj=0x8abf20 <_Py_NoneStruct>)
at Objects/dictobject.c:2807
#6  0x00492854 in dict_setdefault (mp=0x70806e68, args=) at Objects/dictobject.c:2824
#7  0x00499469 in _PyCFunction_FastCallDict 
(func_obj=func_obj@entry=0x70f2f8c8, args=args@entry=0x105afe8, 
nargs=nargs@entry=2, 
kwargs=kwargs@entry=0x0) at Objects/methodobject.c:234
#8  0x00499815 in _PyCFunction_FastCallKeywords 
(func=func@entry=0x70f2f8c8, stack=stack@entry=0x105afe8, 
nargs=nargs@entry=2, 
kwnames=kwnames@entry=0x0) at Objects/methodobject.c:295
#9  0x00537b6f in call_function 
(pp_stack=pp_stack@entry=0x7fff5cd0, oparg=oparg@entry=2, 
kwnames=kwnames@entry=0x0)
at Python/ceval.c:4793

>From the backtrace we can see PyDict_SetDefault breaks the invariant. And 
>reading the code, yes, it doesn't handle split table separately.

I simply replace the logic in PyDict_SetDefault with insertdict to make a test. 
It doesn't fail, even with dictresize4.

An easy example to reproduce:

>>> class C:
... pass
... 
>>> c1, c2 = C(), C()
>>> c1.a, c1.b = 1, 2
>>> c2.__dict__.setdefault('b', None)
python: Objects/dictobject.c:498: _PyDict_CheckConsistency: Assertion 
`mp->ma_values[i] != ((void *)0)' failed.
Aborted (core dumped)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily

Ned Deily added the comment:

Excellent, thanks everyone!  I'll leave this open for re-evaluation for 3.7.

--
priority: release blocker -> 
resolution: duplicate -> 
stage: test needed -> needs patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs

Jason R. Coombs added the comment:

Testing now...

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs

Jason R. Coombs added the comment:

Confirmed. Tests are now passing where they were failing before. Thanks for the 
quick response!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily

Ned Deily added the comment:

Thanks, Serhiy!  Jason, can you verify that there is no longer a 3.6 regression 
with your tests?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I have rolled back the changes.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Xiang Zhang

Changes by Xiang Zhang :


--
Removed message: http://bugs.python.org/msg279811

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Xiang Zhang

Xiang Zhang added the comment:

The bug seems to lies in 
https://hg.python.org/cpython/file/tip/Objects/dictobject.c#l1291. We should 
use oldkeys->dk_nentries instead of numentries.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily

Ned Deily added the comment:

Thanks, Jason, for the heads-up.  Serhiy, can you take a look at this quickly?  
I'm going to hold 360b3 until we have a better idea what's going on.

--
priority: normal -> release blocker
resolution: fixed -> duplicate
stage: resolved -> test needed
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs

Jason R. Coombs added the comment:

In https://github.com/pypa/setuptools/issues/836, I've pinpointed this commit 
as implicated in dictionaries spontaneously losing keys. I have not yet 
attempted to replicate the issue in a standalone environment, and I'm hoping 
someone with a better understanding of the implementation can devise a 
reproduction that distills the issue that setuptools seems only to hit in very 
specialized conditions.

Please let me know if I can help by providing more detail in the environment 
where this occurs or by filing another ticket. Somewhere we should capture that 
this is a regression pending release in 3.6.0b3 today, and for that reason, I'm 
adding Ned to this ticket.

--
nosy: +jason.coombs, ned.deily

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I don't have strong preferences, but at this moment dictresize4.patch looks a 
little better to me. Maybe I wrong and in future optimizations we will returned 
to insert_index().

Yet one opportunity for future optimization -- inline dk_get_index() and 
dk_set_index() and move check for indices width out of the loop. I don't know 
whether there is significant effect of this.

--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed
type:  -> performance
versions: +Python 3.7 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 6b88dfc7b25d by Serhiy Storchaka in branch '3.6':
Issue #28199: Microoptimized dict resizing.  Based on patch by Naoki Inada.
https://hg.python.org/cpython/rev/6b88dfc7b25d

New changeset f0fbc6071d7e by Serhiy Storchaka in branch 'default':
Issue #28199: Microoptimized dict resizing.  Based on patch by Naoki Inada.
https://hg.python.org/cpython/rev/f0fbc6071d7e

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread INADA Naoki

INADA Naoki added the comment:

Serhiy, would you commit it by 3.6b3?

-- sent from mobile

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Xiang Zhang

Xiang Zhang added the comment:

If you inline insert_index, dictresize3 is not that bad.

./python3 -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)'
dictresize3: Median +- std dev: 43.9 us +- 0.7 us
dictresize3(insert_index inlined): Median +- std dev: 41.6 us +- 0.6 us
dictresize4: Median +- std dev: 41.7 us +- 1.2 us

But don't bother on microbenchmark, just move on. I just think the logic is not 
as clear as dictresize3. But the easiness for future modification makes sense.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Yes, the difference is pretty small. The largest difference I got is 7% in 
following microbenchmark:

$ ./python -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)'
Unpatched:  Median +- std dev: 80.6 us +- 0.5 us
dictresize3.patch:  Median +- std dev: 80.9 us +- 2.8 us
dictresize4.patch:  Median +- std dev: 75.4 us +- 2.1 us

I suppose this is due to using memcpy. In other microbenchmarks the difference 
usually is 1-2%.

Xiang, your microbenchmark don't work for this patch because dict(d) make only 
one resizing.

> Cons of Serhiy's patch is it's two pass. If entries are larger than L2 cache,
fetch from L3 cache may be larger.

Agree. But on other hand, dictresize3.patch needs to fit in L2 cache all 
indices table and prefetch two entries arrays: old and new. dictresize4.patch 
in every loop works with smaller memory: two entries arrays in the first loop 
and an indices table and new entries arrays in the second loop.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread STINNER Victor

STINNER Victor added the comment:

Remark about perf timeout: you can use --compare-to with the patched
Python binary to check if the result is significant and compute the
"x.xx faster" number.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Xiang Zhang

Xiang Zhang added the comment:

I doubt how many memcpy could benefit. Two pass does not necessarily make 
faster. I make a simple test:

With dictresize3, (I make insert_index inline):

[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(6)}' 'dict(d)'

Median +- std dev: 441 ns +- 21 ns
[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(60)}' 'dict(d)'

Median +- std dev: 2.02 us +- 0.10 us
[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(600)}' 'dict(d)'

Median +- std dev: 18.1 us +- 0.9 us

With dictresize4:

[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(6)}' 'dict(d)'

Median +- std dev: 448 ns +- 33 ns
[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(60)}' 'dict(d)'

Median +- std dev: 2.04 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(600)}' 'dict(d)'

Median +- std dev: 18.2 us +- 0.1 us

Just like INAKA states, there is hardly any difference. And you need 2 pass for 
dicts with deleted member.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread INADA Naoki

INADA Naoki added the comment:

Current code and my patch called insertdict_clean() or insert_index() for each 
entry.
On the other hand, Serhiy's patch calls build_indices() once.
This may be faster when compiler doesn't inlining the helper function.
As a bonus, we can use memcpy to copy entries.

Cons of Serhiy's patch is it's two pass. If entries are larger than L2 cache,
fetch from L3 cache may be larger.
So I can't declare that Serhiy's patch is faster until benchmark.

(My forecast is no performance difference between my patch and Serhiy's on amd64
machine, and Serhiy's patch is faster on more poor CPU.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The advantage is using memcpy in case of combined table without deletions. This 
is common case of creating dict without pre-resizing: dict(list), 
dict(iterator), etc.

In future, when will be possible to reuse old entries array, this also could 
help in case of multiple modifications without significant size growth. First 
squeeze entries (skipping NULL values), then clear and rebuild index table.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread INADA Naoki

INADA Naoki added the comment:

LGTM.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Xiang Zhang

Xiang Zhang added the comment:

Hmm, what's the advantage?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

What if split copying entries from inserting indices? First fill a continuous 
array of entries (could use memcpy if there were not deletions), then build a 
hashtable of continuous indices. Following patch implements this idea.

--
Added file: http://bugs.python.org/file45252/dictresize4.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-25 Thread INADA Naoki

INADA Naoki added the comment:

@haypo, could you review this?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread INADA Naoki

INADA Naoki added the comment:

Since entries array is embedded in PyDictKeysObject, we can't realloc entries.
And while values are split array, dictresize() convert split table into combine 
table.

Split table may have enough size of ma_values at first in typical case.
And in not typical case, split table may not be used.
So I think realloc ma_values is premature optimization, unless profiler says 
make_keys_shared()
is slow.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread Raymond Hettinger

Raymond Hettinger added the comment:

For the simple case with no dummy entries, I was expecting a fast path that 
just realloced the keys/values/hashes arrays and then updated the index table 
with reinsertion logic that only touches the indices.   Use realloc() is nice 
because it makes it possible that the keys/values/hashes don't have to be 
recopied and if they did, it would use a fast memcpy to move them to the newly 
resized array.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread INADA Naoki

Changes by INADA Naoki :


Added file: http://bugs.python.org/file44983/dictresize3.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread INADA Naoki

INADA Naoki added the comment:

Ah, I'm sorry.
I forget to remove some changes relating to inplace compaction (reusing oldkeys 
when oldsize==newsize).

--
Added file: http://bugs.python.org/file44893/dictresize2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread STINNER Victor

STINNER Victor added the comment:

dictresize.patch is quite big, it seems like half of the patch is more 
cleanup/refactoring. Can you please split the patch into two parts, to only a 
smaller patch just for the optimization part?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread STINNER Victor

STINNER Victor added the comment:

Oh, your message reminded me that I always wanted an option in the timeit 
module to run the benchmark on two Python versions and then directly compare 
the result. I just added the feature to perf and then I released perf 0.7.11, 
enjoy! The output is more compact and it's more reliable because the comparison 
ensures that the difference is significant.

---
$ export PYTHONPATH=~/prog/GIT/perf
$ ./python-resize -m perf timeit --inherit-environ=PYTHONPATH 
--compare-to=./python-ref -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del 
d[i];' --rigorous
python-ref:  77.6 us +- 1.8 us
python-resize:  74.8 us +- 1.9 us

Median +- std dev: [python-ref] 77.6 us +- 1.8 us -> [python-resize] 74.8 us +- 
1.9 us: 1.04x faster
---

I can reproduced the 4% speedup.

(I didn't review the patch yet.)

--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread INADA Naoki

INADA Naoki added the comment:

$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 
'for i in x: d[i]=i; del d[i];'

Median +- std dev: 363 us +- 11 us
$ ./install/bin/python3.6 -m perf timeit -s 'x = range(1000); d={}' 'for i in 
x: d[i]=i; del d[i];'  # patched

Median +- std dev: 343 us +- 17 us
$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 
'for i in x: d[i]=i; del d[i];'

Median +- std dev: 362 us +- 11 us
$ ./install/bin/python3.6 -m perf timeit -s 'x = range(1000); d={}' 'for i in 
x: d[i]=i; del d[i];'  # patched

Median +- std dev: 342 us +- 14 us
$ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 
'for i in x: d[i]=i; del d[i];'

Median +- std dev: 364 us +- 11 us

--
assignee:  -> inada.naoki

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-29 Thread INADA Naoki

INADA Naoki added the comment:

As written in comment, reusing keys object breaks odict implementation.
I think we can't avoid copy for Python 3.6.

--
keywords: +patch
Added file: http://bugs.python.org/file44888/dictresize.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> (After OrderedDict implementation is improved, functools.lru_cache can use 
> OrderedDict and remove doubly linked list too.)

functools.lru_cache can use just ordered dict. But simple implementation is 1.5 
times slower. I'm working on this.

I think that changing implementation of lru_cache and OrderedDict is a new 
feature and can came only in 3.7, when new dict implementation will be more 
tested.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread INADA Naoki

INADA Naoki added the comment:

> We can still clean this up for Python 3.6.  We're in feature freeze, not 
> development freeze.

Does it mean there is a chance to improve OrderedDict to use new dict 
implementation, if it seems safe enough?
Is new implementation a feature?

(After OrderedDict implementation is improved, functools.lru_cache can use 
OrderedDict and remove doubly linked list too.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread INADA Naoki

INADA Naoki added the comment:

Current compact ordered dict implementation is bit different from yours.
When there was removed item, there are NULL entries in ma_entries, instead of 
swapping last item and deleted item.
It's important to keep insertion order.

But it's easy to detect clean dict. Your suggestion can be used when:

* dk_lookup == lookdict_unicode_nodummy: There are no dummies, all keys are 
unicode, and no NULL entries.
* ma_used == dk_nentries: It means there are no NULL entries. (All deletion 
except .popitem() is allowed)

I think dictresize for split table can be split function. But I don't know it 
can improve performance or readability.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Xiang Zhang

Xiang Zhang added the comment:

Then how about entries(key, value pairs)? The size of entries does not match 
the available hash slots. For example, an empty dict allocates a hash array 
with 5 available slots and 5 entries. Now we resize the hash array to size 16 
and it can afford 10 entries but you only get room for 5 entries. How could we 
insert the 6th entry?

--
nosy: +xiang.zhang

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Just before the re-insertion, we should also do a compaction-in-place for the 
keys/values/hashes array if it has a significant number of holes for previously 
deleted entries.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Raymond Hettinger

New submission from Raymond Hettinger:

The dictresize() method unnecessarily copies the keys, values, and hashes as 
well as making insertions in the index table.   Only the latter step is 
necessary or desirable.

Here in the pure Python code for resizing taking from the original 
proof-of-concept code at https://code.activestate.com/recipes/578375

def _resize(self, n):
'''Reindex the existing hash/key/value entries.
   Entries do not get moved, they only get new indices.
   No calls are made to hash() or __eq__().

'''
n = 2 ** n.bit_length() # round-up to power-of-two
self.indices = self._make_index(n)
for index, hashvalue in enumerate(self.hashlist):
for i in Dict._gen_probes(hashvalue, n-1):
if self.indices[i] == FREE:
break
self.indices[i] = index
self.filled = self.used

And here is a rough sketch of what it would look like in the C code (very 
rough, not yet compileable):

static void
insert_indices_clean(PyDictObject *mp, Py_hash_t hash)
{
size_t i, perturb;
PyDictKeysObject *k = mp->ma_keys;
size_t mask = (size_t)DK_SIZE(k)-1;

i = hash & mask;
for (perturb = hash; dk_get_index(k, i) != DKIX_EMPTY;
 perturb >>= PERTURB_SHIFT) {
i = mask & ((i << 2) + i + perturb + 1);
}
dk_set_index(k, i, k->dk_nentries);
}

static int
dictresize(PyDictObject *mp, Py_ssize_t minused)
{
Py_ssize_t i, newsize;
PyDictKeyEntry *ep0;

/* Find the smallest table size > minused. */
for (newsize = PyDict_MINSIZE;
 newsize <= minused && newsize > 0;
 newsize <<= 1)
;
if (newsize <= 0) {
PyErr_NoMemory();
return -1;
}

/* Resize and zero-out the indices array */
realloc(dk->dk_indices, es * newsize);
memset(>dk_indices.as_1[0], 0xff, es * size);
dk->dk_size = size;

/* Loop over hashes, skipping NULLs, inserting new indices */
for (i = 0; i < mp->dk_nentries; i++) {
PyDictKeyEntry *ep = [i];
if (ep->me_value != NULL) {
insert_indices_clean(mp, ep->me_hash);
}
}
return 0;
}

--
components: Interpreter Core
messages: 276921
nosy: methane, rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Compact dict resizing is doing too much work
versions: Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com