New submission from Никита Сметанин <nikitozzz...@gmail.com>:

All collections.Counter in-place operators (+=, -=, |= and &=) would naturally 
be expected to have O(b) time complexity, as in the following example:

a = Counter(...)  # e.g. 1M elements
b = Counter(...)  # e.g. 10 elements
a += b

But in fact, all of them have O(a + b) time complexity because of the 
inefficient implementation of the _keep_positive method, which checks ALL of 
the elements of counter "a" after modification. Only the CHANGED elements 
(no more than the size of "b") need to be checked, which would take O(b) time.
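
A minimal sketch of the "check only the changed elements" idea, for 
illustration. The helper name iadd_changed_only is hypothetical and is not 
part of collections. It assumes that "a" holds only positive counts before 
the call, so unlike the current +=, it does not remove non-positive entries 
of "a" that "b" never touches.

```python
from collections import Counter

def iadd_changed_only(a, b):
    """Add b into a in place, visiting only the keys of b (O(len(b)))."""
    for elem, count in b.items():
        # Counter returns 0 for a missing key without inserting it.
        new_count = a[elem] + count
        if new_count > 0:
            a[elem] = new_count
        else:
            # Drop the entry, mirroring what _keep_positive does for this key.
            a.pop(elem, None)
    return a

a = Counter("aab")              # a: 2, b: 1
b = Counter(a=-2, c=1)
iadd_changed_only(a, b)         # 'a' drops out, 'c' is added

expected = Counter("aab")
expected += b                   # the existing operator, for comparison
```

Under the stated assumption, both approaches should leave the same contents, 
but the sketch never iterates over the untouched part of "a".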

See 
https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L819 

It is also unclear whether there is any need to check for non-positives with 
_keep_positive in ops like __iadd__, __iand__ and __ior__ (as opposed to 
__isub__), since they expect Counters whose counts are always positive.

This non-obvious inefficiency leads to unnecessarily long execution times when, 
for example, iteratively accumulating many small Counters into a large one 
(a frequent case). In this case the .update method is much faster, but for some 
reason it doesn't check for zeros or negatives (is this a bug too?).
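
The difference between .update and += can be seen in a small example. The 
variable names are illustrative only. As far as I can tell, the Counter 
documentation describes update() and subtract() as allowing zero and negative 
counts, so the behaviour shown here may be intentional rather than a bug.

```python
from collections import Counter

via_update = Counter(x=1)
via_update.update(Counter(x=-1))   # count becomes 0, the key is kept

via_iadd = Counter(x=1)
via_iadd += Counter(x=-1)          # count becomes 0, the key is removed

kept_by_update = "x" in via_update
kept_by_iadd = "x" in via_iadd
```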

----------
components: Library (Lib)
messages: 338461
nosy: Никита Сметанин
priority: normal
severity: normal
status: open
title: collections.Counter in-place operators are unexpectedly slow
type: performance
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue36380>
_______________________________________