[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-28 Thread Brandt Bucher


Brandt Bucher  added the comment:

Perhaps. I've opened a PR to update the comment with more info.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-28 Thread Brandt Bucher


Change by Brandt Bucher :


--
pull_requests: +17619
pull_request: https://github.com/python/cpython/pull/18240

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-23 Thread edk


edk  added the comment:

It's not only sequences. But I didn't know this had come up before, sorry. 
Maybe the nature of the warning in the source code s a bit misleading?

--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-23 Thread Brandt Bucher


Change by Brandt Bucher :


--
resolution:  -> duplicate

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-23 Thread Brandt Bucher


Brandt Bucher  added the comment:

Prior discussion at https://bugs.python.org/issue18305. Note the final comment.

In short, this is a breaking semantic change, and the general consensus is that 
using sum for sequence concatenation is an anti-pattern.

--
nosy: +brandtbucher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-23 Thread Ammar Askar


Change by Ammar Askar :


--
keywords: +patch
pull_requests: +17542
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/18156

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39440] Use PyNumber_InPlaceAdd in sum() for the second iteration onward

2020-01-23 Thread edk


New submission from edk :

The C implementation of sum() contains this comment:

/* It's tempting to use PyNumber_InPlaceAdd instead of
   PyNumber_Add here, to avoid quadratic running time
   when doing 'sum(list_of_lists, [])'.  However, this
   would produce a change in behaviour: a snippet like
 empty = []
 sum([[x] for x in range(10)], empty)
   would change the value of empty. */

But that doesn't hold after PyNumber_Add has been called once,
because from that point onward the accumulator value is something
we got from an __add__, and the caller can't know if we reuse it.

The in-place version is substantially faster for some workloads--
the example in the comment is an obvious one, but the context in
which I ran into this was using a sum of Counters in a one-liner
a bit like this:

sum((Counter({line.split("|", 3): len(line)}) for line in sys.stdin), 
Counter())

in which significant time seems to be spent adding the contents
of the previous accumulator value to the new one.

# before
; ./python -m timeit 'sum(([x] for x in range(1000)), [])'
500 loops, best of 5: 888 usec per loop

# after
; ./python -m timeit 'sum(([x] for x in range(1000)), [])'
5000 loops, best of 5: 65.3 usec per loop


; cat test_.py 
from collections import Counter
import timeit
import random

data = [Counter({random.choice(['foo', 'bar', 'baz', 'qux']): 
random.randint(1,100)}) for _ in range(1)]

print(min(timeit.repeat('sum(data, Counter())', 'from collections import 
Counter', number=100, globals={'data': data})))
print(min(timeit.repeat('reduce(Counter.__iadd__, data, Counter())', 'from 
collections import Counter; from functools import reduce', number=100, 
globals={'data': data})))

# before
; ./python test_.py
1.8981186050223187
0.7094596439856105

# after
; ./python test_.py 
0.715508968976792
0.7050370009965263

--
messages: 360583
nosy: edk
priority: normal
severity: normal
status: open
title: Use PyNumber_InPlaceAdd in sum() for the second iteration onward
type: performance

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com