[issue28509] dict.update allocates too much

2017-10-23 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
pull_requests:  -839

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2017-03-31 Thread Donald Stufft

Changes by Donald Stufft :


--
pull_requests: +839

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-27 Thread INADA Naoki

Changes by INADA Naoki :


--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-27 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 8c2615decd2e by INADA Naoki in branch '3.6':
Issue #28509: dict.update() no longer allocate unnecessary large memory
https://hg.python.org/cpython/rev/8c2615decd2e

New changeset deb3e5857d8c by INADA Naoki in branch 'default':
Issue #28509: dict.update() no longer allocate unnecessary large memory
https://hg.python.org/cpython/rev/deb3e5857d8c

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

28509-smaller-update2.patch LGTM.

Your idea in msg279480 looks worth, but needs benchmarking different corner 
cases to check that it doesn't cause to a regression. I think we will have 
enough time for this at 3.7 developing stage.

--
assignee:  -> inada.naoki
stage: patch review -> commit review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-26 Thread INADA Naoki

INADA Naoki added the comment:

OK, I won't change it to allow additional resize while merging, after 
pre-resize.
But current code has two problem:

* Pre-resize happen even when resizing is not necessary. (ex. two dict has same 
keys).
* Pre-resize allocates too much memory which doesn't make sense.

Next patch (28509-smaller-update2.patch) seems better because:

* When pre-resize doesn't happen, at most one resize while merging.
* When pre-resize happens, no resize happens while merging.
* Doesn't make surprisingly sparse dict when two dicts have same keys.

PoC code:

import sys
b = {}
for i in range(16):
b[i] = i
a = b.copy()
a.update(b)  # No need to resize
print(i, sys.getsizeof(a), sys.getsizeof(b))

Current:

0 256 256
1 256 256
2 256 256
3 664 256
4 664 256
5 384 384  # !!!
6 664 384
7 664 384
8 664 384
9 664 384
10 664 664
11 664 664
12 1200 664
13 1200 664
14 1200 664
15 1200 664


With second patch:

0 256 256
1 256 256
2 256 256
3 256 256
4 256 256
5 384 384
6 384 384
7 384 384
8 384 384
9 384 384
10 664 664
11 664 664
12 664 664
13 664 664
14 664 664
15 664 664

--
Added file: http://bugs.python.org/file45229/28509-smaller-update2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

FWIW, I prefer the existing code be left as is.   We've already greatly 
compacted the dictionaries.  There is no need to be ultra-aggressive in shaving 
off every little corner.  There is some advantage to having the dict be more 
sparse (fewer collisions, quicker insertion time for the update, quicker 
lookups etc).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-25 Thread INADA Naoki

INADA Naoki added the comment:

I feel that accept one resize while merging is better.
How about this?

/* Do one big resize at the start, rather than incrementally
 * resizing.  At most one resize happen while merging.
 */
if (USABLE_FRACTION(mp->ma_keys->dk_size) < other->ma_used) {
assert(mp->ma_used < other->ma_used);
if (dictresize(mp, ESTIMATE_SIZE(other->ma_used))) {
   return -1;
}
}

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-25 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

LGTM.

Here is a script that produces more compact output. The first column is a size 
after which the dict is resized.

import sys
p = 1
b = {}
for i in range(1):
a = {}
b[i] = i
a.update(b)
s = sys.getsizeof(a)
if s > p:
print(i, p)
p = s

Unpatched:
5 128
7 196
15 344
31 628
63 1208
127 2612
255 5176
511 10292
1023 20536
2047 41012
4095 81976
8191 163892

Patched:
5 128
10 196
21 344
42 628
85 1208
170 2612
341 5176
682 10292
1365 20536
2730 41012
5461 81976

But I suggest instead the condition

mp->ma_keys->dk_usable < other->ma_used

use the condition

mp->ma_used + mp->ma_keys->dk_usable < other->ma_used

If there are overlapping keys this can allow to avoid resizing. In worst keys 
one resizing will be happened in dictinsert().

Yes one estimation is:

USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used

Dict size is at least doubled after resizing. No need to make preliminary 
resizing if the final result is the same. The benefit is that if there are many 
overlapping keys the final size can be ESTIMATE_SIZE(other->ma_used) instead of 
ESTIMATE_SIZE(mp->ma_used + other->ma_used).

All this conditions can be combined (because they have different computational 
cost):

mp->ma_keys->dk_usable < other->ma_used &&
mp->ma_used + mp->ma_keys->dk_usable < other->ma_used &&
USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-25 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
stage: needs patch -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-25 Thread INADA Naoki

INADA Naoki added the comment:

script:
import sys

for i in range(25):
a = {}
b = {j: j for j in range(i)}
a.update(b)
print(i, sys.getsizeof(a))

before:
0 256
1 256
2 256
3 256
4 256
5 256
6 384
7 384
8 664
9 664
10 664
11 664
12 664
13 664
14 664
15 664
16 1200
17 1200
18 1200
19 1200
20 1200
21 1200
22 1200
23 1200
24 1200

patched:
0 256
1 256
2 256
3 256
4 256
5 256
6 384
7 384
8 384
9 384
10 384
11 664
12 664
13 664
14 664
15 664
16 664
17 664
18 664
19 664
20 664
21 664
22 1200
23 1200
24 1200

--
keywords: +patch
Added file: http://bugs.python.org/file45223/28509-smaller-update.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28509] dict.update allocates too much

2016-10-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Dict is resized. And since __dict__.update() caused the resizing,
both are normal (combined) dict.

Ah, my bad, I missed this point. The original issue now disappears. Thanks 
Naoki.

But dict.update (and maybe inserting) can use more economical allocation 
strategy.

--
stage:  -> needs patch
title: Key-sharing dictionaries can inrease the memory consumption -> 
dict.update allocates too much
versions: +Python 3.6, Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com