Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004

2014-08-07 Thread Brett Cannon
test_codecs is not happy. Looking at the subject lines of commit emails
from the past day I don't see any obvious cause.

On Thu Aug 07 2014 at 4:35:05 AM solip...@pitrou.net wrote:

 results for 09f56fdcacf1 on branch default
 

 test_codecs leaked [5825, 5825, 5825] references, sum=17475
 test_codecs leaked [1172, 1174, 1174] memory blocks, sum=3520
 test_collections leaked [0, 2, 0] references, sum=2
 test_functools leaked [0, 0, 3] memory blocks, sum=3
 test_site leaked [0, 2, 0] references, sum=2
 test_site leaked [0, 2, 0] memory blocks, sum=2


 Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R',
 '3:3:/home/antoine/cpython/refleaks/reflogdA4OO6', '-x']
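
As a cross-check on the report above, the per-test sums add up to the sum=21004 in the subject line. This is a small sketch, assuming report lines keep the `NAME leaked [a, b, c] KIND, sum=N` shape shown here; the regular expression and the `parse_report` helper are illustrative and not part of regrtest.

```python
import re

# Report lines copied from the message above.
REPORT = """\
test_codecs leaked [5825, 5825, 5825] references, sum=17475
test_codecs leaked [1172, 1174, 1174] memory blocks, sum=3520
test_collections leaked [0, 2, 0] references, sum=2
test_functools leaked [0, 0, 3] memory blocks, sum=3
test_site leaked [0, 2, 0] references, sum=2
test_site leaked [0, 2, 0] memory blocks, sum=2
"""

# Hypothetical pattern for one report line (an assumption about the format).
LINE_RE = re.compile(
    r"^(\S+) leaked \[([\d, ]+)\] (references|memory blocks), sum=(\d+)$"
)


def parse_report(text):
    """Return (test, kind, counts, reported_sum) for each recognised line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        counts = [int(n) for n in match.group(2).split(",")]
        rows.append((match.group(1), match.group(3), counts, int(match.group(4))))
    return rows


rows = parse_report(REPORT)
total = sum(sum(counts) for _, _, counts, _ in rows)
print(total)
```

Each run of `-R 3:3` yields three measurements per test, which is why every bracketed list has three entries.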
 ___
 Python-checkins mailing list
 python-check...@python.org
 https://mail.python.org/mailman/listinfo/python-checkins

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] pathlib handling of trailing slash (Issue #21039)

2014-08-07 Thread Guido van Rossum
Hm. I personally consider a trailing slash significant. It feels
semantically different (and in some cases it is) so I don't think it should
be normalized. The behavior of os.path.split() here feels right.
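
To illustrate the difference under discussion, here is a small sketch using `posixpath` (the POSIX flavour of `os.path`) and `PurePosixPath`, chosen so the result does not depend on the platform. The example paths are made up.

```python
import posixpath
from pathlib import PurePosixPath

# os.path.split() treats the trailing slash as significant: the tail is empty.
with_slash = posixpath.split("foo/bar/")
without_slash = posixpath.split("foo/bar")
print(with_slash)     # ('foo/bar', '')
print(without_slash)  # ('foo', 'bar')

# pathlib drops the trailing slash, so both spellings become the same path.
p = PurePosixPath("foo/bar/")
print(str(p))                         # foo/bar
print(p == PurePosixPath("foo/bar"))  # True
```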


On Wed, Aug 6, 2014 at 7:30 PM, Antoine Pitrou anto...@python.org wrote:


 On 06/08/2014 22:12, Ben Finney wrote:

  You seem to be saying that ‘pathlib’ is not intended to be helpful for
 constructing a shell command.


 pathlib lets you do operations on paths. It also gives you a string
 representation of the path that's expected to designate that path when
 talking to operating system APIs. It doesn't give you the possibility to
 store other semantic variations (whether a new directory level must be
 created); that's up to you to add those.

 (similarly, it doesn't have separate classes to represent a file, a
 directory, a non-existing file, etc.)

 Regards

 Antoine.







-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004

2014-08-07 Thread Zachary Ware
On Thu, Aug 7, 2014 at 9:04 AM, Brett Cannon bcan...@gmail.com wrote:
 test_codecs is not happy. Looking at the subject lines of commit emails from
 the past day I don't see any obvious cause.

Looks like this was caused by the change I made to regrtest in [1] to
fix refleak testing in test_asyncio [2].  I'm looking into it, but
haven't found any kind of reason for it yet.

-- 
Zach

[1] http://hg.python.org/cpython/rev/7bc53cf8b2df
[2] http://bugs.python.org/issue22104


Re: [Python-Dev] [Python-checkins] Daily reference leaks (09f56fdcacf1): sum=21004

2014-08-07 Thread Zachary Ware
On Thu, Aug 7, 2014 at 12:16 PM, Zachary Ware
zachary.ware+py...@gmail.com wrote:
 On Thu, Aug 7, 2014 at 9:04 AM, Brett Cannon bcan...@gmail.com wrote:
 test_codecs is not happy. Looking at the subject lines of commit emails from
 the past day I don't see any obvious cause.

 Looks like this was caused by the change I made to regrtest in [1] to
 fix refleak testing in test_asyncio [2].  I'm looking into it, but
 haven't found any kind of reason for it yet.

I've created http://bugs.python.org/issue22166 to keep track of this
and report my findings thus far.

-- 
Zach


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Chris Barker
On Mon, Aug 4, 2014 at 11:10 AM, Steven D'Aprano st...@pearwood.info
wrote:

 On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:

  Good point -- I was trying to make the point about .join() vs + for
 strings
  in an intro python class last year, and made the mistake of having the
  students test the performance.
 
  You need to concatenate a LOT of strings to see any difference at all



 If only that were the case, but it isn't. Here's a cautionary tale for
 how using string concatenation can blow up in your face:

 Chris Withers asks for help debugging HTTP slowness:
 https://mail.python.org/pipermail/python-dev/2009-August/091125.html



Thanks for that -- interesting story. Note that it was not using sum() in
that case, though, which is really the issue at hand.

 It shouldn't be hard to demonstrate the difference between repeated
 string concatenation and join, all you need do is defeat sum()'s
 prohibition against strings. Run this bit of code, and you'll see a
 significant difference in performance, even with CPython's optimized
 concatenation:


Well, that does look compelling, but what it shows is that
sum(a_list_of_strings) is slow compared to ''.join(a_list_of_strings). That
doesn't surprise me a bit -- this is really similar to why:

a_numpy_array.sum()

is going to be a lot faster than:

sum(a_numpy_array)

and why I'll tell everyone who is working with lots of numbers to use
numpy. ndarray.sum knows what data type it's dealing with, and can do the
loop in C. Similarly with ''.join() (though not as optimized).

But I'm not sure we're seeing the big O difference here at all, but
rather the extra calls through each element in the list's __add__ method.

In the case where you already HAVE a big list of strings, then yes, ''.join
is the clear winner.

But I think the case we're often talking about, and I've tested with
students, is when you are building up a long string on the fly out of
little strings. In that case, you need to profile the full append to list,
then call join(), not just the join() call:

# continued adding of strings ( O(n^2)? )
In [6]: def add_strings(l):
   ...:     s = ''
   ...:     for i in l:
   ...:         s += i
   ...:     return s

# Using append and then join ( O(n)? )
In [14]: def join_strings(list_of_strings):
   ....:     l = []
   ....:     for i in list_of_strings:
   ....:         l.append(i)
   ....:     return ''.join(l)

In [23]: timeit add_strings(strings)
100 loops, best of 3: 831 ns per loop

In [24]: timeit join_strings(strings)
10 loops, best of 3: 1.87 µs per loop

## hmm -- concatenating is faster for a small list of tiny strings

In [31]: strings = list('Hello World')* 1000

strings *= 1000
In [26]: timeit add_strings(strings)
1000 loops, best of 3: 932 µs per loop

In [27]: timeit join_strings(strings)
1000 loops, best of 3: 967 µs per loop

## now about the same.

In [31]: strings = list('Hello World')* 1

In [29]: timeit add_strings(strings)
100 loops, best of 3: 9.44 ms per loop

In [30]: timeit join_strings(strings)
100 loops, best of 3: 10.1 ms per loop

still about the same?

In [31]: strings = list('Hello World')* 100

In [32]: timeit add_strings(strings)
1 loops, best of 3: 1.27 s per loop

In [33]: timeit join_strings(strings)
1 loops, best of 3: 1.05 s per loop

there we go -- slight advantage to joining.

So this is why we've said that the common wisdom about string concatenation
isn't really a practical issue.
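
For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The `benchmark` helper and its `n_repeats` and `number` defaults are assumptions chosen to run quickly; they are not the sizes used for the timings above.

```python
import timeit


def add_strings(pieces):
    """Build the result by repeated += on a str."""
    s = ''
    for piece in pieces:
        s += piece
    return s


def join_strings(pieces):
    """Build the result by appending to a list, then joining once."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    return ''.join(parts)


def benchmark(n_repeats=1000, number=20):
    """Return the best per-call time in seconds for each approach."""
    strings = list('Hello World') * n_repeats
    results = {}
    for func in (add_strings, join_strings):
        timer = timeit.Timer(lambda: func(strings))
        best = min(timer.repeat(repeat=3, number=number))
        results[func.__name__] = best / number
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

The numbers depend heavily on the interpreter and build, so treat any single run as indicative only.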

But if you already have the strings all in a list, then yes, join() is a
major win over sum()

In fact, I tried the above with sum() -- and it was really, really slow. So
slow I didn't have the patience to wait for it.

Here is a smaller example:

In [22]: strings = list('Hello World')* 1

In [23]: timeit add_strings(strings)
100 loops, best of 3: 9.61 ms per loop

In [24]: timeit sum( strings, Faker() )
1 loops, best of 3: 246 ms per loop

So why is sum() so darn slow with strings compared to a simple loop with +=?

(and if I try it with a list 10 times as long it takes forever)
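
The `Faker` helper used in the sum() timing is not defined in this message. A minimal sketch, assuming it is simply a start value whose `__add__` returns the other operand (the class body here is a guess at the snipped code, not a quote):

```python
class Faker:
    """Hypothetical start value: adding anything to it returns the other operand."""

    def __add__(self, other):
        return other


strings = list('Hello World') * 100

# sum() rejects a str start value outright ...
try:
    sum(strings, '')
except TypeError as exc:
    print("sum() refuses a str start value:", exc)

# ... but a non-str start value slips past that check.
joined = sum(strings, Faker())
print(joined == ''.join(strings))
```

One plausible explanation for the slowdown, offered as an assumption rather than something established in this thread: the in-place concatenation shortcut applies to `+` and `+=` executed by the bytecode interpreter, while sum() performs its additions from C and does not benefit from it, so every step copies the accumulated string.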

Perhaps the HTTP issue cited predates some nifty optimizations in current
CPython?

-Chris





-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 03:06 PM, Chris Barker wrote:

[snip timings, etc.]

I don't remember where, but I believe that CPython has an optimization built in for repeated string concatenation, which
is probably why you aren't seeing big differences between the + and the sum().


A little testing shows how to defeat that optimization:

  blah = ''
  for string in ['booyah'] * 10:
      blah = string + blah

Note the reversed order of the addition.

-- timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
[0.021117210388183594, 0.013692855834960938, 0.00768280029296875]

-- timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
[15.301048994064331, 15.343288898468018, 15.268463850021362]

--
~Ethan~


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 04:01 PM, Ethan Furman wrote:

On 08/07/2014 03:06 PM, Chris Barker wrote:

 the + and the sum().


Yeah, that 'sum' should be 'join'  :/

--
~Ethan~


Re: [Python-Dev] sum(...) limitation

2014-08-07 Thread Ethan Furman

On 08/07/2014 04:01 PM, Ethan Furman wrote:

On 08/07/2014 03:06 PM, Chris Barker wrote:

-- timeit.Timer("for string in ['booya'] * 10: blah = blah + string", "blah = ''").repeat(3, 1)
[0.021117210388183594, 0.013692855834960938, 0.00768280029296875]

-- timeit.Timer("for string in ['booya'] * 10: blah = string + blah", "blah = ''").repeat(3, 1)
[15.301048994064331, 15.343288898468018, 15.268463850021362]


Oh, and the join() timings:

-- timeit.Timer("blah = ''.join(['booya'] * 10)", "blah = ''").repeat(3, 1)
[0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]

So, with the optimization defeated, + is about four orders of magnitude slower than join.
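
The three measurements can be rerun side by side with a short script. This is a sketch only: the `time_all` helper and its default `n` are assumptions picked so the script finishes quickly, and `n` is not the repeat count behind the timings quoted above.

```python
import timeit

SETUP = "blah = ''"


def time_all(n=20_000):
    """Time append-style +, prepend-style +, and join for n short strings."""
    statements = {
        # blah on the left: CPython can often extend the string in place.
        "append": f"for string in ['booya'] * {n}: blah = blah + string",
        # blah on the right: every step builds a brand new string.
        "prepend": f"for string in ['booya'] * {n}: blah = string + blah",
        # a single join over the whole list.
        "join": f"blah = ''.join(['booya'] * {n})",
    }
    return {
        name: min(timeit.Timer(stmt, SETUP).repeat(3, 1))
        for name, stmt in statements.items()
    }


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name:8s} {seconds:.6f} s")
```

How far apart the three results land will vary with `n` and with the interpreter version.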

--
~Ethan~


Re: [Python-Dev] Surely nullable is a reasonable name?

2014-08-07 Thread Larry Hastings

On 08/05/2014 08:13 AM, Martin v. Löwis wrote:

For the feature in question,
I find both allow_none and nullable acceptable; noneable is not.


Well!  It's rare that the core dev community is so consistent in its 
opinion.  I still think nullable is totally appropriate, but I'll 
change it to allow_none.



//arry/