INADA Naoki added the comment: I also feel set created from iterator is too large sometimes. Especially, frozenset is used for "unordered tuple" dict key sometimes.
$ python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )' [(0, 224), (5, 736), (21, 2272), (85, 8416), (341, 32992), (1365, 131296), (5461, 524512)] $ python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(dict.fromkeys(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )' [(0, 240), (6, 368), (11, 648), (22, 1184), (43, 2280), (86, 4704), (171, 9320), (342, 18528), (683, 36968), (1366, 73824), (2731, 147560), (5462, 295008)] set size increases by *4 each resize. This is code of set_add_entry(): if ((size_t)so->fill*5 < mask*3) return 0; return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4); If we simply replace `so->used>50000 ? so->used*2 : so->used*4` with `so->used*2`, there may be not enough room when there are some dummy entries. (used + dummy = fill). But how about replacing `*4` with `*3`? In general, load factor after resize will be changed from 12.5% ~ 25% to 16.6% ~ 33.3%. It seems OK to me. In case of no removal (e.g. constructor called with iterator), set size increases by *2 each time: >>> for n in range(3, 20): ... print(math.ceil(2 ** n * 3/5) * 3, 2 ** (n+1)) 15 16 30 32 60 64 117 128 231 256 462 512 924 1024 1845 2048 3687 4096 7374 8192 14748 16384 29493 32768 58983 65536 117966 131072 235932 262144 471861 524288 943719 1048576 ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue29961> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com