[issue31179] Speed-up dict.copy() up to 5.5 times.

2018-01-22 Thread Yury Selivanov

Change by Yury Selivanov :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2018-01-22 Thread Yury Selivanov

Yury Selivanov  added the comment:


New changeset b0a7a037b8fde56b62f886d5188bced7776777b4 by Yury Selivanov in 
branch 'master':
bpo-31179: Make dict.copy() up to 5.5 times faster. (#3067)
https://github.com/python/cpython/commit/b0a7a037b8fde56b62f886d5188bced7776777b4


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2018-01-22 Thread Yury Selivanov

Yury Selivanov  added the comment:

Victor: done; https://bugs.python.org/issue32623

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2018-01-22 Thread STINNER Victor

STINNER Victor  added the comment:

Yury: Would you mind to open an issue to investigate why dict are not 
compatected automatically?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2018-01-21 Thread Yury Selivanov

Yury Selivanov  added the comment:

I've pushed a new version of the patch that I intend to merge tomorrow.

The last version has only one minor change: it uses fast-path for "slightly" 
non-compact dicts too (dicts don't use *at most* 3 entries).  This protects us 
from pathological cases when a huge dict being almost emptied with pop/del and 
then gets copied -- we indeed want the copy to be compact.

Although I believe that the real issue is that del and pop don't compact dicts 
from time to time, but I don't want that issue to hold off this patch in any 
way.

> If you will make dict copying removing holes and extend your patch to dict 
> constructor, it could be more useful.

It's already useful -- I'm supporting a large code base (>0.5M LOC) which uses 
dict.copy() extensively, and it shows up in profile.  I've seen it in many 
other places (particularly ORMs love to store information in dicts and use 
dict.copy() to track dirty state/changes).  Please don't say that dict.copy() 
is not a common operation or that dict(other_dict) is more common than 
other_dict.copy() -- that's simply incorrect.

> The side effect of this patch is making dict.copy() atomic. This is a worthy 
> feature if extent it to dict constructor.

I agree.  I'll work on that later in a follow-up PR.  Let's move in small steps.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The side effect of this patch is making dict.copy() atomic. This is a worthy 
feature if extent it to dict constructor. For now the only way of making an 
atomic (or almost atomic) copy of a dict is dict(list(d.itemview())). It isn't 
very time and memory efficient.

If you will make dict copying removing holes and extend your patch to dict 
constructor, it could be more useful.

Look at the set implementation. It doesn't just use memcpy, but it contains 
specialized insertion implementation for the case if all items are unique. Fast 
copying is more important for dicts since the copying is more common for sets. 
It is a part of set operations and it is common to convert a set to a frozenset.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-11 Thread Yury Selivanov

Yury Selivanov added the comment:

> I like idea.
> One worrying point is how to deal with dirty dict.
> How about do it only when ma_used == keys->dk_nentries?

I've added this check.  See the updated PR.

> The PR changes the behavior. Currently the effect of copying is compacting 
> the dict.

The check that INADA suggested enables compacting on copy, if it is needed.

> The PR adds over 50 lines of code for optimising not very often used feature.

I started to look into the problem because I need this for my upcoming PEP, so 
please don't dismiss this idea right away.

I also think that copying a dict isn't a "not very often used feature", it 
depends on your frame of references.  In some applications you do copy dict a 
lot.  50 lines of code speeding up one of the core methods 5.5x is a fair price 
to pay.

> There are two obvious ways of copying, dict(d) and d.copy()

That can also be easily optimized, btw.  I'll see if I can do that without 
impacting the performance of creating new dicts.

> The PR duplicates the low-level code, that increases maintainability cost.

FWIW, the PR doesn't duplicate any of the code.  It provides a new 
implementation that is more efficient than the old approach.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-11 Thread Yury Selivanov

Yury Selivanov added the comment:

> Why "del" doesn't compact the dict?

This is a good question, btw.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-11 Thread STINNER Victor

STINNER Victor added the comment:

 d = dict.fromkeys(range(2000))
 for i in range(1999): del d[i]
> ...
 sys.getsizeof(d)
> 41020
 sys.getsizeof(d.copy())
> 136

Why "del" doesn't compact the dict?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-11 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The PR adds over 50 lines of code for optimising not very often used feature. 
There are two obvious ways of copying, dict(d) and d.copy(), the PR optimises 
just the latter one, and I'm not sure this is the most used way. The PR 
duplicates the low-level code, that increases maintainability cost.

The PR changes the behavior. Currently the effect of copying is compacting the 
dict.

>>> import sys
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> sys.getsizeof(dict(d))
41020
>>> for i in range(1000): del d[i]
... 
>>> sys.getsizeof(dict(d))
20544
>>> sys.getsizeof(d.copy())
20544
>>> sys.getsizeof(d)
41020
>>> import sys
>>> d = dict.fromkeys(range(2000))
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
41020
>>> d = dict.fromkeys(range(2000))
>>> for i in range(1999): del d[i]
... 
>>> sys.getsizeof(d)
41020
>>> sys.getsizeof(d.copy())
136

The PR preserves non compact layout in the copy.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-10 Thread INADA Naoki

INADA Naoki added the comment:

I like idea.
One worrying point is how to deal with dirty dict.
How about do it only when ma_used == keys->dk_nentries?

Slightly off topic. Copy on write can be implemented via dk_refcnt.
Functions just passing `**kwargs` has temporal copy of dict.
And CoW will reduce the cost of temporal dict.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-10 Thread Yury Selivanov

Yury Selivanov added the comment:

>> PyDict_Copy creates a new empty dict object and then inserts key/values into 
>> it one by one.

> Why not creating a "preallocated" dict in that case? _PyDict_NewPresized()

I don't think it's related to the proposed patch.  Please take a look at the 
PR.  `_PyDict_NewPresized` and inserting entries one by one will not be faster 
than a `memcpy`.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-10 Thread STINNER Victor

STINNER Victor added the comment:

> PyDict_Copy creates a new empty dict object and then inserts key/values into 
> it one by one.

Why not creating a "preallocated" dict in that case? _PyDict_NewPresized()

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-10 Thread Yury Selivanov

New submission from Yury Selivanov:

It's possible to significantly improve performance of shallow dict copy.  
Currently, PyDict_Copy creates a new empty dict object and then inserts 
key/values into it one by one.

My idea is to simply memcpy the whole keys/items region and do the necessary 
increfs after it.  This works just fine for non-key-sharing dicts.

With the following simple microbenchmark:

import time

N = 100

for size in [0, 1, 10, 20, 50, 100, 500, 1000]:
d = dict([(str(i), i) for i in range(size)])

t = time.monotonic()
for i in range(N):
d.copy()
e = time.monotonic() - t

print(f'dict(size={size}).copy() x {N} times:\t {e:.4f}')


Output for 3.7 master:

dict(size=0).copy() x 100 times: 0.1299
dict(size=1).copy() x 100 times: 0.1499
dict(size=10).copy() x 100 times:0.3758
dict(size=20).copy() x 100 times:0.7722
dict(size=50).copy() x 100 times:1.2784
dict(size=100).copy() x 100 times:   2.5128
dict(size=500).copy() x 100 times:   12.8968
dict(size=1000).copy() x 100 times:  25.4276


Output for patched 3.7:

dict(size=0).copy() x 100 times: 0.1352
dict(size=1).copy() x 100 times: 0.1285
dict(size=10).copy() x 100 times:0.1632
dict(size=20).copy() x 100 times:0.3076
dict(size=50).copy() x 100 times:0.3663
dict(size=100).copy() x 100 times:   0.5140
dict(size=500).copy() x 100 times:   2.3419
dict(size=1000).copy() x 100 times:  4.6176

--
components: Interpreter Core
messages: 300117
nosy: haypo, inada.naoki, yselivanov
priority: normal
severity: normal
stage: patch review
status: open
title: Speed-up dict.copy() up to 5.5 times.
type: performance
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31179] Speed-up dict.copy() up to 5.5 times.

2017-08-10 Thread Yury Selivanov

Changes by Yury Selivanov :


--
pull_requests: +3104

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com