[issue30040] new empty dict can be more small

2019-03-15 Thread Inada Naoki


Inada Naoki  added the comment:

> I do not like how much code is needed for such minor optimization.

PR 12307 is +46 lines.  Benefit is about 10ns when first insertion.

$ cpython/python.opt -m perf timeit --compare-to master/python.opt 
--python-names master:empty-dict2 --duplicate 100 'd={};d["a"]=1'
master: . 62.6 ns +- 1.1 ns
empty-dict2: . 52.5 ns +- 1.0 ns

Mean +- std dev: [master] 62.6 ns +- 1.1 ns -> [empty-dict2] 52.5 ns +- 1.0 ns: 
1.19x faster (-16%)


While "1.19x faster" is significant, I agree that this optimization is "minor".
_PyDict_NewPresized() is used for important cases.  So this optimization 
doesn't work in most case.


But I came up with an idea.  _PyDict_NewPresized(n) skips preallocation when n 
fit in minsize table.
https://github.com/python/cpython/pull/12307/commits/93906f1568b08c59e0afddfc98070827c65cd0f9

With this idea, "first insertion optimization" works.  Additionally, it may 
help branch prediction
for `while (newsize < minsize) { newsize <<= 1 }` loop.

Total diff is still +46 lines.  But it affects more cases.


$ alias compare='cpython/python.opt -m perf timeit --compare-to 
master/python.opt --python-names master:empty-dict2'

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4)'  # No table
Mean +- std dev: [master] 65.4 ns +- 2.3 ns -> [empty-dict2] 64.5 ns +- 1.5 ns: 
1.01x faster (-1%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1)'  # minsize 
table is allocated
Mean +- std dev: [master] 152 ns +- 3 ns -> [empty-dict2] 144 ns +- 4 ns: 1.05x 
faster (-5%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2)'
Mean +- std dev: [master] 211 ns +- 3 ns -> [empty-dict2] 186 ns +- 3 ns: 1.13x 
faster (-12%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3)'
Mean +- std dev: [master] 248 ns +- 3 ns -> [empty-dict2] 223 ns +- 3 ns: 1.11x 
faster (-10%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, 
a=1,b=2,c=3,d=4,e=5)'
Mean +- std dev: [master] 327 ns +- 6 ns -> [empty-dict2] 301 ns +- 6 ns: 1.09x 
faster (-8%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, 
a=1,b=2,c=3,d=4,e=5,f=6)'  # minsize*2 table is allocated
Mean +- std dev: [master] 431 ns +- 8 ns -> [empty-dict2] 406 ns +- 8 ns: 1.06x 
faster (-6%)


Now I think PR 12307 is not so minor.  Of course, same idea can be applied to 
PR 12308.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-14 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset 3fe7fa316f74ed630fbbcdf54564f15cda7cb045 by Inada Naoki in branch 
'master':
bpo-30040: update news entry (GH-12324)
https://github.com/python/cpython/commit/3fe7fa316f74ed630fbbcdf54564f15cda7cb045


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-14 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +12297

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-14 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I do not like how much code is needed for such minor optimization.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-14 Thread Inada Naoki


Inada Naoki  added the comment:

Benchmark result is here.

https://github.com/methane/sandbox/tree/master/2019.01/empty-dict

notes:

* Overhead introduced by PR 1080 (https://bugs.python.org/issue30040#msg337778) 
is cancelled by first insert optimization.  It is now faster than before.

* PR 12307 (first insert optimization) is about 2x faster on creating & destroy 
empty dict.  Other performance diff is small.

* PR 12308 (use keys=NULL instead of shared empty keys) is more faster than PR 
12307 on some cases, while it requires much `if (md->ma_keys == NULL)` checks.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-13 Thread Inada Naoki


Inada Naoki  added the comment:

I created two PRs.  PR 12307 just optimize inserting to empty dict.
PR 12308 has same optimization and removes shared empty key (ma_keys = NULL).

I confirmed PR 12307 is faster than before about `d = {}; d["a"]=None`.
I'll benchmark them later.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-13 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +12284

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-13 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +12283
stage: resolved -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Inada Naoki  added the comment:

On Wed, Mar 13, 2019 at 4:46 AM Raymond Hettinger
 wrote:
>
> Raymond Hettinger  added the comment:
>
> I hate to see you working so hard to make this work.  IMO, the effort is 
> futile.  Dicts were intentionally designed to do a single memory allocation 
> and memset(0) to be fully ready to store data.  This PR is undoing that 
> decision and will make Python worse rather than better.

Please that this PR is not do it.  From Python 3.6, dict uses two
allocations.  No embedded small table.
This PR just move 2nd allocation from dict creation to first insertion.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

Some of my thoughts on this.

Conceptually, I expected that clearing a normal dict should make it an empty 
normal dict.  I presume that making it instead an empty shared key dict is a 
matter of efficiency and code simplicity.  If the 'anomaly' is to be corrected, 
changing .clear would be an alternative.

The fact that this patch 'rescues' people who use .setdefault when 
collections.defaultdict would be better does not especially persuade me 
(msg291817).  The dict method doc and docstring could refer to defaultdict for 
such situations.

In 3.8.0a2, empty sets, like empty dicts, are ready to add. Empty lists in a2 
are *not*, so pre-allocation is not universal in CPython for mutable 
collections.

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Mark Shannon


Mark Shannon  added the comment:

Serhiy, for {'a': None} the dict is created using _PyDict_NewPresized() so this 
makes no difference.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Mark Shannon


Mark Shannon  added the comment:

In general, I agree with Raymond that this is likely to counter-productive.

But let's not guess, let's measure :)

I expect that there are few live empty dicts at any time for most programs. In 
which case there is no point in any change that attempts to save memory use for 
empty dicts.

But I could be wrong. If there commonly are lots of live empty dicts,
then some sort of optimisation could be appropriate.


I should also add that dict.clear() uses a key-sharing dict to avoid 
allocation, because PyDict_Clear() is a void function so there is no way to 
handle an allocation failure.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> Hmm, while 2x faster temporal empty dict is nice, 
> 10% slower case can be mitigated.
> I will try more micro benchmarks and optimizations.

I hate to see you working so hard to make this work.  IMO, the effort is 
futile.  Dicts were intentionally designed to do a single memory allocation and 
memset(0) to be fully ready to store data.  This PR is undoing that decision 
and will make Python worse rather than better.  The PR is optimizing for the 
wrong thing.  The space used by empty dicts is something we care much less 
about than the cost of the first assignment.

No "mitigations" are going to help.  If there is a second malloc and we incur 
the cost of switching from sharing-to-non-sharing, that is additional work that 
will have be done by every single dictionary that actually gets used.  Nothing 
will get that lost work back.

FWIW, if we were going to optimize for space used by empty dicts, the table 
pointer could just be set to NULL.  That would be better than key sharing which 
is not only slower but also conceptually wrong (outside the context of instance 
creation, dicts will never be shared).

> For the record, legitimate case when many empty dicts are 
> created, and few are populated, is the collections-free 
> approach to defaultdict(dict):

While it is helpful to know that there is some possible application that would 
benefit, we don't optimize for the rare case, we optimize for the common case 
where dicts get used.  A substantial fraction of the language is implemented 
using dicts. For the most part, we use NULL values when the dict isn't actually 
needed; so, the normal case is that dicts do get used.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

What about creating small dict using a dict display? E.g. {'a': None}.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Inada Naoki  added the comment:

Another micro benchmark:

$ ./py.edict.opt -m perf timeit --compare-to ./py.master.opt '{}' --duplicate=10
py.master.opt: . 26.3 ns +- 0.5 ns
py.edict.opt: . 13.0 ns +- 0.1 ns

Mean +- std dev: [py.master.opt] 26.3 ns +- 0.5 ns -> [py.edict.opt] 13.0 ns +- 
0.1 ns: 2.02x faster (-51%)

$ ./py.edict.opt -m perf timeit --compare-to ./py.master.opt 'd={}; 
d["a"]=None' --duplicate=10
py.master.opt: . 58.1 ns +- 0.9 ns
py.edict.opt: . 64.1 ns +- 0.9 ns

Mean +- std dev: [py.master.opt] 58.1 ns +- 0.9 ns -> [py.edict.opt] 64.1 ns +- 
0.9 ns: 1.10x slower (+10%)


Hmm, while 2x faster temporal empty dict is nice, 10% slower case can be 
mitigated.
I will try more micro benchmarks and optimizations.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +Mark.Shannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Inada Naoki  added the comment:

> I don't think this should have been done.  Conceptually, there is no basis 
> for presuming key-sharing for new empty dicts -- you can't know what they 
> would share with.  
This patch essentially undoes the entire reason for having a pre-allocated 
minsize dict.  If it were deemed to be the norm that applications typically had 
huge numbers of empty dicts that were never populated, then the correct 
solution would be a NULL pointer to the table field (as dicts do).

It increases massive NULL checks, and possibly cause bugs.

> FWIW, the macro benchmarks aren't very informative here because they don't 
> exercise much of this code.  I think there is an over-prioritization of small 
> space savings at the expense of the intended use cases for dicts.  This patch 
> just forces every dict that gets used to have to convert back from a 
> key-sharing dict and do a memory allocation and memset(0) in the process.  
> The whole point of the minsize dict was to avoid that cost in the common case 
> of small dicts (i.e. passing keyword arguments).

Note that there is _PyDict_NewPresized().

When kwargs is empty, this patch increase dict creation speed.
When kwargs is not empty, temporary key-sharing table is not used because we 
use _PyDict_NewPresized() is used.

$ ./py.edict bm_edict.py --compare-to ./py.master
py.master: . 192 ns +- 7 ns
py.edict: . 175 ns +- 4 ns
Mean +- std dev: [py.master] 192 ns +- 7 ns -> [py.edict] 175 ns +- 4 ns: 1.10x 
faster (-9%)

py.master: . 269 ns +- 4 ns
py.edict: . 273 ns +- 2 ns
Mean +- std dev: [py.master] 269 ns +- 4 ns -> [py.edict] 273 ns +- 2 ns: 1.02x 
slower (+2%)

So I don't think net performance of kwargs doesn't get worse.

--
nosy:  -Mark.Shannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
nosy: +Mark.Shannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I don't think this should have been done.  Conceptually, there is no basis for 
presuming key-sharing for new empty dicts -- you can't know what they would 
share with.  This patch essentially undoes the entire reason for having a 
pre-allocated minsize dict.  If it were deemed to be the norm that applications 
typically had huge numbers of empty dicts that were never populated, then the 
correct solution would be a NULL pointer to the table field (as dicts do).

FWIW, the macro benchmarks aren't very informative here because they don't 
exercise much of this code.  I think there is an over-prioritization of small 
space savings at the expense of the intended use cases for dicts.  This patch 
just forces every dict that gets used to have to convert back from a 
key-sharing dict and do a memory allocation and memset(0) in the process.  The 
whole point of the minsize dict was to avoid that cost in the common case of 
small dicts (i.e. passing keyword arguments).

--
nosy: +rhettinger, tim.peters
status: closed -> open

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Change by Inada Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.8 -Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2019-03-12 Thread Inada Naoki


Inada Naoki  added the comment:


New changeset f2a186712bfe726d54723eba37d80c7f0303a50b by Inada Naoki in branch 
'master':
bpo-30040: new empty dict uses key-sharing dict (GH-1080)
https://github.com/python/cpython/commit/f2a186712bfe726d54723eba37d80c7f0303a50b


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-17 Thread Josh Rosenberg

Josh Rosenberg added the comment:

For the record, legitimate case when many empty dicts are created, and few are 
populated, is the collections-free approach to defaultdict(dict):

mydict.setdefault(key1, {})[key2] = val

For, say, 100 unique key1s, and 10,000 total key1/key2 pairs, you'd create 
10,000 empty dicts, discarding 9,900 of them. Granted, 
collections.defaultdict(dict) is even better (avoids the 9,900 unused dicts 
entirely), but I see the setdefault pattern enough, usually with list or dict, 
that it's not totally unreasonable to account for it.

--
nosy: +josh.r

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-17 Thread Louie Lu

Louie Lu added the comment:

I'm testing[1] that if we make a fast path that detect if keys is 
`empty_keys_struct` inside `dictresize`. It can be faster than original patch, 
but still slower than default (unpatch) in most case.


➜  cpython git:(compact_empty_dict) ✗ ./python.default -m perf compare_to -G 
--min-speed=1 default.json lpatch.json  
Slower (14):
- pickle_dict: 91.1 us +- 2.2 us -> 98.0 us +- 3.3 us: 1.08x slower (+8%)
- xml_etree_parse: 534 ms +- 29 ms -> 565 ms +- 27 ms: 1.06x slower (+6%)
- crypto_pyaes: 679 ms +- 22 ms -> 708 ms +- 22 ms: 1.04x slower (+4%)
- regex_effbot: 12.1 ms +- 0.2 ms -> 12.6 ms +- 0.2 ms: 1.04x slower (+4%)
- tornado_http: 678 ms +- 21 ms -> 704 ms +- 31 ms: 1.04x slower (+4%)
- pidigits: 432 ms +- 7 ms -> 447 ms +- 18 ms: 1.03x slower (+3%)
- spectral_norm: 869 ms +- 21 ms -> 898 ms +- 22 ms: 1.03x slower (+3%)
- unpickle_list: 20.6 us +- 0.6 us -> 21.2 us +- 0.8 us: 1.03x slower (+3%)
- pathlib: 87.9 ms +- 3.0 ms -> 90.6 ms +- 3.5 ms: 1.03x slower (+3%)
- pickle_list: 13.0 us +- 0.3 us -> 13.3 us +- 0.4 us: 1.03x slower (+3%)
- meteor_contest: 367 ms +- 13 ms -> 378 ms +- 14 ms: 1.03x slower (+3%)
- scimark_sor: 991 ms +- 28 ms -> 1.02 sec +- 0.03 sec: 1.03x slower (+3%)
- sympy_expand: 1.73 sec +- 0.05 sec -> 1.77 sec +- 0.04 sec: 1.02x slower (+2%)
- python_startup: 29.5 ms +- 1.1 ms -> 30.1 ms +- 1.9 ms: 1.02x slower (+2%)

Faster (8):
- sympy_integrate: 84.3 ms +- 8.3 ms -> 78.3 ms +- 5.0 ms: 1.08x faster (-7%)
- call_simple: 30.6 ms +- 1.7 ms -> 29.0 ms +- 1.4 ms: 1.06x faster (-5%)
- pickle: 43.2 us +- 3.2 us -> 41.1 us +- 1.9 us: 1.05x faster (-5%)
- call_method_unknown: 36.4 ms +- 1.6 ms -> 35.0 ms +- 1.5 ms: 1.04x faster 
(-4%)
- scimark_lu: 781 ms +- 42 ms -> 752 ms +- 34 ms: 1.04x faster (-4%)
- sympy_sum: 385 ms +- 21 ms -> 372 ms +- 17 ms: 1.03x faster (-3%)
- logging_silent: 1.30 us +- 0.04 us -> 1.26 us +- 0.04 us: 1.03x faster (-3%)
- django_template: 665 ms +- 20 ms -> 643 ms +- 18 ms: 1.03x faster (-3%)

Benchmark hidden because not significant (42)

[1]: 
https://github.com/lulouie/cpython/blob/compact_empty_dict/Objects/dictobject.c#L1247

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-16 Thread Louie Lu

Louie Lu added the comment:

forgive my words, I trace the wrong code, sorry about that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-16 Thread Louie Lu

Louie Lu added the comment:

Inada's patch version act different inside `PyObject_SetItem`, 

when running this code: 'x = {}; x['a'] = 123'

at PyObject_SetItem,

patch version goes to this line:
  >│179 if (m && m->mp_ass_subscript)
   │180 return m->mp_ass_subscript(o, key, value);

but original version goes to:
  >│182 if (o->ob_type->tp_as_sequence) {
   │183 if (PyIndex_Check(key)) {

I think that's why the performance issue came out, still digging why this 
happened.

--
nosy: +louielu

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I got following result:

$ ./python.patched -m perf timeit --compare-to=./python.default --duplicate=100 
-- '{}'
python.default: . 203 ns +- 5 ns
python.patched: . 97.1 ns +- 0.7 ns

Mean +- std dev: [python.default] 203 ns +- 5 ns -> [python.patched] 97.1 ns +- 
0.7 ns: 2.09x faster (-52%)

$ ./python.patched -m perf timeit --compare-to=./python.default --duplicate=100 
-- 'x={}; x[1]=1'
python.default: . 494 ns +- 5 ns
python.patched: . 592 ns +- 7 ns

Mean +- std dev: [python.default] 494 ns +- 5 ns -> [python.patched] 592 ns +- 
7 ns: 1.20x slower (+20%)

Seems something is wrong with resizing an empty dict. It shouldn't take such 
much time.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

> I mean creating a solo empty dict doesn't seem to make much sense. Although 
> it saves memory, but when it's populated, it's resized and the memory 
> occupation comes back.

But sometimes it's not populated.

class A:
def __init__(self, **kwargs):
self._extra = kwargs

xa = [A() for _ in range(1000)]

So problem is (a) how many empty dicts, and (b) how much memory this patch 
saves.

> And this makes PyDict_New() hard to understand. :-(

Yes, but it is not new complexity because it's same to d.clear().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread R. David Murray

R. David Murray added the comment:

Sorry, but I no longer have access to that application (I'm a consultant, and 
the owner is no longer a client).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

Thank you for your reply.
Would you try to check how the patch [1] affects memory usage of your 
application?
I think the patch can be applied to 3.6 easily.

[1] https://patch-diff.githubusercontent.com/raw/python/cpython/pull/1080.patch

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread R. David Murray

R. David Murray added the comment:

I've worked on an application (proprietary, unfortunately) that created a lot 
of empty dictionaries that only sometimes got populated.  It involved 
sqlalchemy, but I don't remember if the dicts came from sqlalchemy itself or 
from the code that used it.  That application did care about memory, and the 
shared-key dicts were a big benefit to it.

--
nosy: +r.david.murray

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

While I think it's preferable that {} and d.clear() have same memory footprint,
I need real world example which empty dict affects overall memory usage.

I'll check memory usage difference with application I used in this ML thread.
https://mail.python.org/pipermail/python-dev/2017-January/147194.html

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Use "--duplicate 100" when making microbenchmarks for such fast operations. The 
overhead of iterating can be significant and comparable with the time of the 
operation.

--
nosy: +serhiy.storchaka
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread Xiang Zhang

Xiang Zhang added the comment:

I mean creating a solo empty dict doesn't seem to make much sense. Although it 
saves memory, but when it's populated, it's resized and the memory occupation 
comes back.

And this makes PyDict_New() hard to understand. :-(

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

macro bench result:

$ ./python.default -m perf compare_to -G --min-speed=1 default.json patched.json
Slower (11):
- scimark_lu: 362 ms +- 13 ms -> 383 ms +- 22 ms: 1.06x slower (+6%)
- unpickle_pure_python: 882 us +- 18 us -> 924 us +- 18 us: 1.05x slower (+5%)
- regex_v8: 45.4 ms +- 0.6 ms -> 46.7 ms +- 3.1 ms: 1.03x slower (+3%)
- mako: 40.4 ms +- 0.4 ms -> 41.4 ms +- 0.4 ms: 1.03x slower (+3%)
- meteor_contest: 200 ms +- 1 ms -> 204 ms +- 2 ms: 1.02x slower (+2%)
- genshi_text: 88.8 ms +- 1.2 ms -> 90.1 ms +- 1.6 ms: 1.01x slower (+1%)
- scimark_monte_carlo: 255 ms +- 6 ms -> 258 ms +- 7 ms: 1.01x slower (+1%)
- richards: 176 ms +- 4 ms -> 178 ms +- 8 ms: 1.01x slower (+1%)
- pickle: 24.2 us +- 0.5 us -> 24.4 us +- 0.7 us: 1.01x slower (+1%)
- sympy_str: 438 ms +- 3 ms -> 442 ms +- 3 ms: 1.01x slower (+1%)
- genshi_xml: 196 ms +- 3 ms -> 198 ms +- 2 ms: 1.01x slower (+1%)

Faster (7):
- logging_silent: 746 ns +- 12 ns -> 722 ns +- 11 ns: 1.03x faster (-3%)
- xml_etree_generate: 272 ms +- 4 ms -> 264 ms +- 4 ms: 1.03x faster (-3%)
- telco: 20.7 ms +- 0.7 ms -> 20.2 ms +- 0.4 ms: 1.02x faster (-2%)
- xml_etree_parse: 311 ms +- 13 ms -> 305 ms +- 12 ms: 1.02x faster (-2%)
- nqueens: 266 ms +- 4 ms -> 262 ms +- 2 ms: 1.02x faster (-2%)
- unpack_sequence: 123 ns +- 1 ns -> 122 ns +- 2 ns: 1.01x faster (-1%)
- raytrace: 1.27 sec +- 0.01 sec -> 1.25 sec +- 0.01 sec: 1.01x faster (-1%)

Benchmark hidden because not significant (46)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

> Isn't the latter case the more common one? Creating an empty dict and then 
> populate it.

This is memory usage optimization, not performance optimization.
(But I think memory efficiency makes multi process application faster because 
L3 cache size is limited resource.)
Later case shows how performance penalty is large.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread Xiang Zhang

Xiang Zhang added the comment:

Isn't the latter case the more common one? Creating an empty dict and then 
populate it.

--
nosy: +xiang.zhang

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

Changes by INADA Naoki :


--
components: +Interpreter Core
type:  -> resource usage
versions: +Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

INADA Naoki added the comment:

performance impact

best case:
$ ./python.patched -m perf timeit --compare-to=`pwd`/python.default  -- '{}'
python.default: . 36.9 ns +- 0.9 ns
python.patched: . 25.3 ns +- 0.7 ns
Mean +- std dev: [python.default] 36.9 ns +- 0.9 ns -> [python.patched] 25.3 ns 
+- 0.7 ns: 1.46x faster (-31%)

worst case:
$ ./python.patched -m perf timeit --compare-to=`pwd`/python.default  -- 'x={}; 
x["a"]=1'
python.default: . 73.3 ns +- 1.2 ns
python.patched: . 82.8 ns +- 1.8 ns
Mean +- std dev: [python.default] 73.3 ns +- 1.2 ns -> [python.patched] 82.8 ns 
+- 1.8 ns: 1.13x slower (+13%)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

Changes by INADA Naoki :


--
pull_requests: +1223

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30040] new empty dict can be more small

2017-04-11 Thread INADA Naoki

New submission from INADA Naoki:

dict.clear() make the dict to empty key-sharing dict to reduce it's size.
New dict can use same technique.

$ ./python.default 
Python 3.7.0a0 (heads/master:6dfcc81, Apr 10 2017, 19:55:52) 
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> d = {}
>>> sys.getsizeof(d)
240
>>> d.clear()
>>> sys.getsizeof(d)
72

$ ./python.patched 
Python 3.7.0a0 (heads/master-dirty:6dfcc81, Apr 11 2017, 18:11:02) 
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof({})
72

--
messages: 291470
nosy: inada.naoki
priority: normal
severity: normal
status: open
title: new empty dict can be more small

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com