Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004
test_codecs is not happy. Looking at the subject lines of commit emails from the past day I don't see any obvious cause.

On Thu Aug 07 2014 at 4:35:05 AM solip...@pitrou.net wrote:

> results for 09f56fdcacf1 on branch "default"
>
> test_codecs leaked [5825, 5825, 5825] references, sum=17475
> test_codecs leaked [1172, 1174, 1174] memory blocks, sum=3520
> test_collections leaked [0, 2, 0] references, sum=2
> test_functools leaked [0, 0, 3] memory blocks, sum=3
> test_site leaked [0, 2, 0] references, sum=2
> test_site leaked [0, 2, 0] memory blocks, sum=2
>
> Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R', '3:3:/home/antoine/cpython/refleaks/reflogdA4OO6', '-x']

_______________________________________________
Python-checkins mailing list
python-check...@python.org
https://mail.python.org/mailman/listinfo/python-checkins

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] pathlib handling of trailing slash (Issue #21039)
Hm. I personally consider a trailing slash significant. It feels semantically different (and in some cases it is), so I don't think it should be normalized. The behavior of os.path.split() here feels right.

On Wed, Aug 6, 2014 at 7:30 PM, Antoine Pitrou anto...@python.org wrote:

> On 06/08/2014 22:12, Ben Finney wrote:
>> You seem to be saying that 'pathlib' is not intended to be helpful for
>> constructing a shell command.
>
> pathlib lets you do operations on paths. It also gives you a string
> representation of the path that's expected to designate that path when
> talking to operating system APIs. It doesn't give you the possibility
> to store other semantic variations (such as whether a new directory
> level must be created); that's up to you to add. (Similarly, it
> doesn't have separate classes to represent a file, a directory, a
> non-existing file, etc.)
>
> Regards
>
> Antoine.

--
--Guido van Rossum (python.org/~guido)
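[The difference under discussion can be seen directly in a quick session; a minimal sketch, using PurePosixPath so the result is platform-independent: pathlib normalizes the trailing slash away when parsing, while os.path.split() preserves the distinction as an empty last component.]

```python
import os.path
from pathlib import PurePosixPath

# pathlib drops the trailing slash during parsing -- the two spellings
# collapse to the same path object
print(PurePosixPath("usr/local/"))   # usr/local
print(PurePosixPath("usr/local"))    # usr/local

# os.path.split() keeps the distinction: a trailing slash yields an
# empty final component
print(os.path.split("usr/local/"))   # ('usr/local', '')
print(os.path.split("usr/local"))    # ('usr', 'local')
```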
Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004
On Thu, Aug 7, 2014 at 9:04 AM, Brett Cannon bcan...@gmail.com wrote:
> test_codecs is not happy. Looking at the subject lines of commit
> emails from the past day I don't see any obvious cause.

Looks like this was caused by the change I made to regrtest in [1] to fix refleak testing in test_asyncio [2]. I'm looking into it, but haven't found any kind of reason for it yet.

--
Zach

[1] http://hg.python.org/cpython/rev/7bc53cf8b2df
[2] http://bugs.python.org/issue22104
Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004
On Thu, Aug 7, 2014 at 12:16 PM, Zachary Ware zachary.ware+py...@gmail.com wrote:
> On Thu, Aug 7, 2014 at 9:04 AM, Brett Cannon bcan...@gmail.com wrote:
>> test_codecs is not happy. Looking at the subject lines of commit
>> emails from the past day I don't see any obvious cause.
>
> Looks like this was caused by the change I made to regrtest in [1] to
> fix refleak testing in test_asyncio [2]. I'm looking into it, but
> haven't found any kind of reason for it yet.

I've created http://bugs.python.org/issue22166 to keep track of this and report my findings thus far.

--
Zach
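[For readers unfamiliar with the -R option: regrtest detects leaks by running each test several times and comparing interpreter-wide counters between runs. A toy sketch of the idea follows; the names check_leaks and noop are illustrative only, and the real implementation in Lib/test/regrtest.py is considerably more careful about warm-up runs and cleanup. Debug builds additionally track sys.gettotalrefcount for reference leaks.]

```python
import sys

def check_leaks(func, warmups=1, runs=3):
    """Run func repeatedly and report the change in allocated memory
    blocks across each measured run (the 'memory blocks' counter in
    the daily reports)."""
    for _ in range(warmups):          # warm up caches and imports first
        func()
    deltas = []
    for _ in range(runs):
        before = sys.getallocatedblocks()
        func()
        deltas.append(sys.getallocatedblocks() - before)
    return deltas

def noop():
    pass

# A leak-free function should show deltas near zero, e.g. [0, 0, 0]
print(check_leaks(noop))
```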
Re: [Python-Dev] sum(...) limitation
On Mon, Aug 4, 2014 at 11:10 AM, Steven D'Aprano st...@pearwood.info wrote:
> On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:
>> Good point -- I was trying to make the point about .join() vs + for
>> strings in an intro python class last year, and made the mistake of
>> having the students test the performance. You need to concatenate a
>> LOT of strings to see any difference at all
>
> If only that were the case, but it isn't. Here's a cautionary tale for
> how using string concatenation can blow up in your face: Chris Withers
> asks for help debugging HTTP slowness:
> https://mail.python.org/pipermail/python-dev/2009-August/091125.html

Thanks for that -- interesting story. Note that that case was not using sum(), though, which is really the issue at hand.

> It shouldn't be hard to demonstrate the difference between repeated
> string concatenation and join; all you need do is defeat sum()'s
> prohibition against strings. Run this bit of code, and you'll see a
> significant difference in performance, even with CPython's optimized
> concatenation:

Well, that does look compelling, but what it shows is that sum(a_list_of_strings) is slow compared to ''.join(a_list_of_strings). That doesn't surprise me a bit -- this is really similar to why:

    a_numpy_array.sum()

is going to be a lot faster than:

    sum(a_numpy_array)

and why I'll tell everyone who is working with lots of numbers to use numpy. ndarray.sum knows what data type it's dealing with, and can do the loop in C. Similarly with ''.join() (though not as optimized).

But I'm not sure we're seeing the big-O difference here at all -- rather, the extra cost of calling each element's __add__ method. In the case where you already HAVE a big list of strings, then yes, ''.join() is the clear winner. But I think the case we're often talking about, and that I've tested with students, is when you are building up a long string on the fly out of little strings.
In that case, you need to profile the full append-to-list plus the join() call, not just the join() call:

# continued adding of strings ( O(n^2)? )
In [6]: def add_strings(l):
   ...:     s = ''
   ...:     for i in l:
   ...:         s += i
   ...:     return s

# using append and then join ( O(n)? )
In [14]: def join_strings(list_of_strings):
   ....:     l = []
   ....:     for i in list_of_strings:
   ....:         l.append(i)
   ....:     return ''.join(l)

In [23]: timeit add_strings(strings)
100 loops, best of 3: 831 ns per loop

In [24]: timeit join_strings(strings)
10 loops, best of 3: 1.87 µs per loop

## hmm -- concatenating is faster for a small list of tiny strings

In [31]: strings = list('Hello World') * 1000
   ....: strings *= 1000

In [26]: timeit add_strings(strings)
1000 loops, best of 3: 932 µs per loop

In [27]: timeit join_strings(strings)
1000 loops, best of 3: 967 µs per loop

## now about the same.

In [31]: strings = list('Hello World') * 1

In [29]: timeit add_strings(strings)
100 loops, best of 3: 9.44 ms per loop

In [30]: timeit join_strings(strings)
100 loops, best of 3: 10.1 ms per loop

## still about the same?

In [31]: strings = list('Hello World') * 100

In [32]: timeit add_strings(strings)
1 loops, best of 3: 1.27 s per loop

In [33]: timeit join_strings(strings)
1 loops, best of 3: 1.05 s per loop

## there we go -- slight advantage to joining.

So this is why we've said that the common wisdom about string concatenation isn't really a practical issue. But if you already have the strings all in a list, then yes, join() is a major win over sum(). In fact, I tried the above with sum() -- and it was really, really slow. So slow I didn't have the patience to wait for it. Here is a smaller example:

In [22]: strings = list('Hello World') * 1

In [23]: timeit add_strings(strings)
100 loops, best of 3: 9.61 ms per loop

In [24]: timeit sum(strings, Faker())
1 loops, best of 3: 246 ms per loop

So why is sum() so darn slow with strings compared to a simple loop with += ?
(And if I try it with a list 10 times as long, it takes forever.) Perhaps the HTTP issue cited was from before some nifty optimizations in current CPython?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

chris.bar...@noaa.gov
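[The Faker class used in the timings above is never shown in the thread; presumably it is a wrapper along the following lines -- the class name is taken from the quoted session, but the body here is a guess. It defeats sum()'s explicit rejection of str start values by absorbing the first addition.]

```python
class Faker:
    """Start value that lets sum() concatenate strings anyway:
    Faker() + first_string simply returns first_string, after which
    sum() proceeds with ordinary str + str additions."""
    def __add__(self, other):
        return other

# sum() refuses an actual str start value...
try:
    sum(["spam", "eggs"], "")
except TypeError as e:
    print(e)   # sum() can't sum strings [use ''.join(seq) instead]

# ...but the wrapper sneaks past the check
print(sum(["spam", "eggs"], Faker()))   # spameggs
```

As to the closing question: even with the wrapper, sum() performs a plain `a + b` at each step, rebinding only its internal accumulator, so every step copies the whole string built so far -- O(n^2) overall. The `+=` loop looks linear largely because CPython can resize the string in place when it holds the only reference to it, an optimization sum()'s C loop never triggers.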
Re: [Python-Dev] sum(...) limitation
On 08/07/2014 03:06 PM, Chris Barker wrote:
> [snip timings, etc.]

I don't remember where, but I believe that CPython has an optimization built in for repeated string concatenation, which is probably why you aren't seeing big differences between the + and the sum(). A little testing shows how to defeat that optimization:

blah = ''
for string in ['booyah'] * 10:
    blah = string + blah

Note the reversed order of the addition.

--> timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
[0.021117210388183594, 0.013692855834960938, 0.00768280029296875]

--> timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
[15.301048994064331, 15.343288898468018, 15.268463850021362]

--
~Ethan~
Re: [Python-Dev] sum(...) limitation
On 08/07/2014 04:01 PM, Ethan Furman wrote:
> On 08/07/2014 03:06 PM, Chris Barker wrote:
>
> the + and the sum().

Yeah, that 'sum' should be 'join'. :/

--
~Ethan~
Re: [Python-Dev] sum(...) limitation
On 08/07/2014 04:01 PM, Ethan Furman wrote:
> --> timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
> [0.021117210388183594, 0.013692855834960938, 0.00768280029296875]
>
> --> timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
> [15.301048994064331, 15.343288898468018, 15.268463850021362]

Oh, and the join() timings:

--> timeit.Timer("blah = ''.join(['booya'] * 10)", "blah = ''").repeat(3, 1)
[0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]

So, + is three orders of magnitude slower than join().

--
~Ethan~
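[For anyone who wants to reproduce the effect, here is a self-contained sketch. The function names are mine, the list size is scaled down so it finishes quickly, and absolute timings will vary by machine and CPython version; the point is the relative gap between forward concatenation, reversed concatenation, and join.]

```python
import timeit

N = 2000  # number of chunks; the quoted timings used a far larger list

def add_forward(chunks):
    s = ""
    for c in chunks:
        s = s + c        # eligible for CPython's in-place resize trick
    return s

def add_reversed(chunks):
    s = ""
    for c in chunks:
        s = c + s        # defeats the optimization: a new string every time
    return s

def join(chunks):
    return "".join(chunks)

chunks = ["booya"] * N   # identical chunks, so all three build the same string
for f in (add_forward, add_reversed, join):
    t = timeit.timeit(lambda: f(chunks), number=5)
    print(f"{f.__name__:12s} {t:.4f}s")
```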
Re: [Python-Dev] Surely nullable is a reasonable name?
On 08/05/2014 08:13 AM, Martin v. Löwis wrote:
> For the feature in question, I find both allow_none and nullable
> acceptable; noneable is not.

Well! It's rare that the core dev community is so consistent in its opinion. I still think nullable is totally appropriate, but I'll change it to allow_none.

//arry/