[issue23509] Speed up Counter operators

2015-05-29 Thread Roundup Robot

Roundup Robot added the comment:

New changeset fe4efc0032b5 by Raymond Hettinger in branch '3.5':
Issue #23509: Speed up Counter operators
https://hg.python.org/cpython/rev/fe4efc0032b5

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23509
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23509] Speed up Counter operators

2015-05-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps correct __pos__ docstring?

--




[issue23509] Speed up Counter operators

2015-05-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
Removed message: http://bugs.python.org/msg244128




[issue23509] Speed up Counter operators

2015-05-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Perhaps update __pos__ docstring?

--




[issue23509] Speed up Counter operators

2015-05-25 Thread Raymond Hettinger

Raymond Hettinger added the comment:

The change to __neg__ looked like a nice improvement, and the same 
technique can be applied to __pos__.  Attaching a patch for those two.

--
versions: +Python 3.6 -Python 3.5
Added file: http://bugs.python.org/file39497/counter_pos_neg.diff




[issue23509] Speed up Counter operators

2015-05-23 Thread Jörn Hees

Jörn Hees added the comment:

> I'm closing this because the OP's original concern about wanting an in-place
> operation was already solved

Was it? Are you referring to http://bugs.python.org/issue13121 ?

My main concern was that += is considerably slower than .update(), which 
caught me off-guard. As you closed this, I'd be very happy if you could 
add a note to the docs 
https://docs.python.org/3/_sources/library/collections.txt that points this 
behavior out. Maybe by changing this:

* The multiset methods are designed only for use cases with positive values.
  The inputs may be negative or zero, but only outputs with positive values
  are created.  There are no type restrictions, but the value type needs to
  support addition, subtraction, and comparison.

* The :meth:`elements` method requires integer counts.  It ignores zero and
  negative counts.


to this:

* The multiset methods (``+``, ``-`` and ``+=``, ``-=``) are designed only
  for use cases with positive values.
  The inputs may be negative or zero, but only outputs with positive values
  are created.  There are no type restrictions, but the value type needs to
  support addition, subtraction, and comparison.

* Because of the necessary additional checks for positive values, a
  ``c += d`` can be considerably slower than a ``c.update(d)``.

* The :meth:`elements` method requires integer counts.  It ignores zero and
  negative counts.
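
A small sketch of the behavioral difference behind that suggestion, using only the standard collections.Counter (the variable names are illustrative). In-place addition strips non-positive counts after adding, while update() keeps them; that stripping pass is the extra work the proposed note refers to.

```python
from collections import Counter

base = Counter(a=3, b=1)
delta = Counter(a=-3, b=2, c=-1)

updated = base.copy()
updated.update(delta)   # plain addition of counts; zero and negative results are kept

added = base.copy()
added += delta          # multiset addition; non-positive results are removed afterwards

print(dict(updated))
print(dict(added))
```

No timings are shown here on purpose: how much slower += is depends on the data and the interpreter version.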

--




[issue23509] Speed up Counter operators

2015-05-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The OrderedCounter recipe doesn't support multiset operations well, because the 
result type is hardcoded to Counter. The order is already scrambled. update() 
and subtract() don't work well with an overloaded __missing__().

Proposed implementations of __add__ and __or__ simplify the code. If you don't 
want an overloaded in-place operation to affect non-in-place operations (I 
consider this rather a benefit), Counter.__iadd__(result, other) can be used 
instead of result += other.
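
A rough sketch of that last point, not the code from the patch. NoisyCounter and add_via_inplace are hypothetical names introduced only for illustration; the sketch shows that calling Counter.__iadd__ explicitly bypasses an in-place operator overridden in a subclass.

```python
from collections import Counter

class NoisyCounter(Counter):
    """Hypothetical subclass that overrides only the in-place operator."""
    calls = 0

    def __iadd__(self, other):
        NoisyCounter.calls += 1
        return super().__iadd__(other)

def add_via_inplace(a, b):
    """Hypothetical non-in-place addition built on the base-class __iadd__."""
    result = a.copy()
    # Explicit base-class call: the subclass override is not invoked.
    return Counter.__iadd__(result, b)

x = NoisyCounter(a=1)
y = Counter(a=2, b=-5)

total = add_via_inplace(x, y)
calls_after_add = NoisyCounter.calls

x += y                      # this does go through the override
calls_after_iadd = NoisyCounter.calls
```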

The optimized __neg__ just contains inlined subtraction (note that the current 
implementations of __neg__ and __pos__ violate the open-closed principle), with 
the no-op code removed.

My second step would be to add a C implementation of _keep_positive(), because 
this function is used in a number of multiset methods. It could be more 
efficient and preserve the order.

In any case, thank you for spending your time, Raymond.

--




[issue23509] Speed up Counter operators

2015-05-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Sorry, I don't want to make any of these changes (though it is a close call on a 
couple of them).

Before the particulars, here are some high-level thoughts (not definitive).  I 
would like to confine the optimizations and complexities to the more important 
parts of the API (actually counting as opposed to counter-to-counter 
operations).  Also, I don't want to preclude some of the future possibilities 
under consideration (for example, I am leaning toward guaranteeing the order of 
updates so that the OrderedCounter recipe has guaranteed behavior).  Also, I'm 
considering removing the existing self.get(elem, 0) in update() and subtract() 
so that subclassers can usefully override/extend the __missing__ method to 
return other types (like decimals, fractions, etc) or have other behaviors like 
logging missing entries, etc.  And the self.get optimization doesn't seem to 
perform well under PyPy in contrast to an inherited __getitem__.  The current 
code choices were biased towards simplicity, space-over-speed, and keeping a 
predictable operation order where possible.
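
To illustrate the __missing__ point, here is a minimal sketch that relies only on standard dict semantics. LoggingCounter and the missed list are hypothetical, made up for this example. Indexing an absent key falls back to __missing__, while get() with a default never calls it, so code that uses self.get(elem, 0) hides the override from subclassers.

```python
from collections import Counter

missed = []  # hypothetical log of keys that were looked up but absent

class LoggingCounter(Counter):
    def __missing__(self, key):
        missed.append(key)
        return 0

c = LoggingCounter()
c["seen"] = 2

by_index = c["absent"]        # falls back to __missing__, so the key is logged
by_get = c.get("absent", 0)   # dict.get never calls __missing__
```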

Particulars:

1) get() bound method optimization:  This is a close call.  We already use this 
in update() and subtract() though I'm thinking of removing those two cases.  
Switching from c[k] to c.get(k, 0) is a semantic change that affects subclasses 
that define __getitem__(), get(), or __missing__().  Speedwise, c.get() is 
faster than a fallback to __missing__() for missing keys; conversely, the 
inherited __getitem__() is faster than c.get(k, 0) when the keys are present.  
There is some room for debate about which is the common case (it really depends 
on what your application is) and I would prefer at this point not to shift 
assumptions about which is more common.  Clarity-wise:  The square brackets are 
clearer than the boundmethod trick which I would like to use only where it 
really matters.

2)  The current _keep_positive() is shorter, clearer, maintains order for the 
OrderedCounter use case, and is more space-efficient (never using more space 
than the original counter and intentionally choosing to remove elements rather 
than building a new one from scratch).  This is how it was done in setobject.c 
for the same reasons.

3) Other than brevity, I don't see any advantage to __add__ and __or__ being 
defined via inplace operations.  That is a semantic change that can affect 
subclassers, violating the open-closed-principle (I want people to be able to 
override/extend the in-place methods without unintentionally breaking add/or 
methods).  Also, the current approach has a space saving bias (not storing 
negative counts in the first place rather than using a follow-on call to 
_keep_positive pass to eliminate the negatives after they have been stored).

4) The code expansion for __pos__ and __neg__ grows the code and is less clear 
(IMO).  The change for __pos__ scrambles the order, interfering with the 
OrderedCounter example.  Also, I want the meaning of +c to be the same as c 
plus an empty counter (at least, that is how I think of the operation).  FWIW, 
the unary plus operation was intended to be a trigger for _keep_positive.  It 
was modeled after the unary plus in Decimal which serves the purpose of 
triggering rounding.
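
The analogy in point 4 can be sketched with the standard library alone (the variable names are illustrative). Unary plus on a Counter drops non-positive counts, matching addition with an empty counter, and unary plus on a Decimal applies the current context's precision.

```python
from collections import Counter
from decimal import Decimal, localcontext

c = Counter(a=3, b=0, c=-2)

positive = +c               # keeps only counts greater than zero
via_add = c + Counter()     # addition with an empty counter
negated = -c                # keeps only negative counts, with the sign flipped

with localcontext() as ctx:
    ctx.prec = 3
    # The constructor is exact; unary plus rounds to the context precision.
    rounded = +Decimal("1.23456")
```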

I'm sure there is room for argument about any one of the points above; some are 
just judgment calls.  I'm closing this because the OP's original concern about 
wanting an in-place operation was already solved and because the proposed 
optimizations slightly change semantics, aren't really the important part of 
the API, snarl the code a bit, and interfere with some future 
directions I want to keep open.  Also, I've already spent several hours 
reviewing this patch and need to return attention to other matters.

--
resolution:  -> rejected
status: open -> closed




[issue23509] Speed up Counter operators

2015-05-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Could you please look at the patch, Raymond? There are only a few days left 
until the feature freeze.

--
keywords: +needs review




[issue23509] Speed up Counter operators

2015-03-27 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Added explanatory comments to address Victor's comment.

--
Added file: http://bugs.python.org/file38708/counter_faster_2.patch




[issue23509] Speed up Counter operators

2015-02-26 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
assignee:  -> rhettinger




[issue23509] Speed up Counter operators

2015-02-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Applied optimizations:

1) Used a cached get() method instead of indexing. This optimization was already 
used in update() and subtract().
2) _keep_positive() is optimized for the case when most counts are not positive 
(the common case for subtraction and intersection).
3) __add__ and __or__ are defined via in-place operations, which are faster (due 
to fast copying and _keep_positive()).
4) Inlined and simplified the code for __pos__ and __neg__.

Maybe a further optimization can be made by implementing _keep_positive() in C.
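
For readers unfamiliar with the trick in item 1, a minimal sketch of the cached bound-method pattern follows. merge_counts is a hypothetical helper written for illustration, not the code from the patch.

```python
from collections import Counter

def merge_counts(target, other):
    """Add the counts in `other` to `target` (hypothetical helper)."""
    target_get = target.get   # look the bound method up once, outside the loop
    for elem, count in other.items():
        # get() with a default avoids the __missing__ fallback for absent keys
        target[elem] = target_get(elem, 0) + count
    return target

totals = merge_counts(Counter(a=1), Counter(a=2, b=5))
```

Whether this beats plain indexing depends on how often keys are absent and on the interpreter.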

--




[issue23509] Speed up Counter operators

2015-02-24 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> - in the given patch __add__ uses __iadd__, but __sub__ doesn't use
> __isub__, which seems a bit weird.

If Counters are positive (the common case), then the result of addition is not 
smaller than either summand: a + b is a plus possibly some additional elements 
from b. In the case of subtraction, a - b can be smaller than a, and may be much 
smaller. In that case it is cheaper to create an empty Counter and copy only 
those elements of a that are not in b than to copy all of a and then remove 
almost all of its elements.

Relative efficiency depends on input data, and for some input data implementing 
__sub__ via __isub__ can be more efficient.
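
The two strategies being compared can be sketched as follows. Both function names are hypothetical and the bodies are a simplified illustration of the idea, not the patch itself; they give the same result and differ only in how much gets copied and then thrown away.

```python
from collections import Counter

def sub_build_new(a, b):
    """Start empty and store only the positive results."""
    result = Counter()
    for elem, count in a.items():
        newcount = count - b[elem]
        if newcount > 0:
            result[elem] = newcount
    for elem, count in b.items():
        if elem not in a and count < 0:
            result[elem] = 0 - count
    return result

def sub_copy_then_remove(a, b):
    """Copy all of a, subtract in place, then strip non-positive counts."""
    result = Counter(a)
    for elem, count in b.items():
        result[elem] -= count
    for elem in [e for e, c in result.items() if c <= 0]:
        del result[elem]
    return result

a = Counter(a=5, b=1, c=2)
b = Counter(b=1, c=5, d=-2)
new_style = sub_build_new(a, b)
copy_style = sub_copy_then_remove(a, b)
```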

> - is there place for a non multi-set centric Stats object which is like
> Counter but with + and - actually behaving without the (in my use cases of
> Counter often counter intuitive) > 0 stuff? (pun intended ;) ) Counter
> feels like a sub-class of Stats with the added _keep_positive(self).

I'm sure there is such a class in third-party modules. Counter wouldn't 
benefit much from inheriting from Stats, because it would need to override 
almost all methods.

--




[issue23509] Speed up Counter operators

2015-02-24 Thread Jörn Hees

Jörn Hees added the comment:

cool

minor question:
- in the given patch __add__ uses __iadd__, but __sub__ doesn't use __isub__, 
which seems a bit weird.

maybe off-topic, but maybe not, because of _keep_positive(self):
- is there a place for a non multi-set centric Stats object which is like 
Counter but with + and - actually behaving without the (in my use cases of 
Counter often counter intuitive) > 0 stuff? (pun intended ;) ) Counter feels 
like a sub-class of Stats with the added _keep_positive(self).

--
