INADA Naoki added the comment:

I also feel set created from iterator is too large sometimes.
Especially, frozenset is used for "unordered tuple" dict key sometimes.

$ python -c 'N = 6000; from sys import getsizeof; s = 
[getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in 
range(N) if not n or s[n] != s[n-1]] )'
[(0, 224), (5, 736), (21, 2272), (85, 8416), (341, 32992), (1365, 131296), 
(5461, 524512)]

$ python -c 'N = 6000; from sys import getsizeof; s = 
[getsizeof(dict.fromkeys(range(n))) for n in range(N)]; print( [(n, s[n]) for n 
in range(N) if not n or s[n] != s[n-1]] )'
[(0, 240), (6, 368), (11, 648), (22, 1184), (43, 2280), (86, 4704), (171, 
9320), (342, 18528), (683, 36968), (1366, 73824), (2731, 147560), (5462, 
295008)]

set size increases by *4 each resize.
This is code of set_add_entry():

    if ((size_t)so->fill*5 < mask*3)
        return 0;
    return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);

If we simply replace `so->used>50000 ? so->used*2 : so->used*4` with 
`so->used*2`, there may be not enough room when there are some dummy entries. 
(used + dummy = fill).

But how about replacing `*4` with `*3`?
In general, load factor after resize will be changed from 12.5% ~ 25% to 16.6% 
~ 33.3%. It seems OK to me.
In case of no removal (e.g. constructor called with iterator), set size 
increases by *2 each time:

>>> for n in range(3, 20):
...     print(math.ceil(2 ** n * 3/5) * 3, 2 ** (n+1))
15 16
30 32
60 64
117 128
231 256
462 512
924 1024
1845 2048
3687 4096
7374 8192
14748 16384
29493 32768
58983 65536
117966 131072
235932 262144
471861 524288
943719 1048576

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29961>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to