INADA Naoki added the comment:
I also feel set created from iterator is too large sometimes.
Especially, frozenset is used for "unordered tuple" dict key sometimes.
$ python -c 'N = 6000; from sys import getsizeof; s =
[getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in
range(N) if not n or s[n] != s[n-1]] )'
[(0, 224), (5, 736), (21, 2272), (85, 8416), (341, 32992), (1365, 131296),
(5461, 524512)]
$ python -c 'N = 6000; from sys import getsizeof; s =
[getsizeof(dict.fromkeys(range(n))) for n in range(N)]; print( [(n, s[n]) for n
in range(N) if not n or s[n] != s[n-1]] )'
[(0, 240), (6, 368), (11, 648), (22, 1184), (43, 2280), (86, 4704), (171,
9320), (342, 18528), (683, 36968), (1366, 73824), (2731, 147560), (5462,
295008)]
set size increases by *4 each resize.
This is code of set_add_entry():
if ((size_t)so->fill*5 < mask*3)
return 0;
return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);
If we simply replace `so->used>50000 ? so->used*2 : so->used*4` with
`so->used*2`, there may be not enough room when there are some dummy entries.
(used + dummy = fill).
But how about replacing `*4` with `*3`?
In general, load factor after resize will be changed from 12.5% ~ 25% to 16.6%
~ 33.3%. It seems OK to me.
In case of no removal (e.g. constructor called with iterator), set size
increases by *2 each time:
>>> for n in range(3, 20):
... print(math.ceil(2 ** n * 3/5) * 3, 2 ** (n+1))
15 16
30 32
60 64
117 128
231 256
462 512
924 1024
1845 2048
3687 4096
7374 8192
14748 16384
29493 32768
58983 65536
117966 131072
235932 262144
471861 524288
943719 1048576
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue29961>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com