[Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Schmitt Uwe (ID SIS)
Dear all,

I discovered a problem using cPickle.loads from CPython 2.7.6.

The last line in the following code raises an infinite recursion

class T(object):

def __init__(self):
self.item = list()

def __getattr__(self, name):
return getattr(self.item, name)


import cPickle

t = T()

l = cPickle.dumps(t)
cPickle.loads(l)


loads triggers T.__getattr__ via getattr(inst, '__setstate__', None) while
looking up a __setstate__ method, which T does not implement. As the item
attribute is missing at this point, the infinite recursion starts.

The infinite recursion disappears if I attach a default implementation for 
__setstate__ to T:

def __setstate__(self, dd):
self.__dict__ = dd

This could be fixed by using "hasattr" in pickle before trying to call
"getattr".

Is this a bug or did I miss something ?

Kind Regards,
Uwe

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Terry Reedy

On 8/11/2014 5:10 AM, Schmitt Uwe (ID SIS) wrote:

Python usage questions should be directed to python-list, for instance.


I discovered a problem using cPickle.loads from CPython 2.7.6.


The problem is your code having infinite recursion. You only discovered 
it with pickle.




The last line in the following code raises an infinite recursion

 class T(object):

 def __init__(self):
 self.item = list()

 def __getattr__(self, name):
 return getattr(self.item, name)


This is a (common) bug in your program.  __getattr__ should call 
self.__dict__[name] to avoid the recursion.


--
Terry Jan Reedy



Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Peter Otten
Terry Reedy wrote:

 On 8/11/2014 5:10 AM, Schmitt Uwe (ID SIS) wrote:
 
 Python usage questions should be directed to python-list, for instance.
 
 I discovered a problem using cPickle.loads from CPython 2.7.6.
 
 The problem is your code having infinite recursion. You only discovered
 it with pickle.
 
 
 The last line in the following code raises an infinite recursion

  class T(object):

  def __init__(self):
  self.item = list()

  def __getattr__(self, name):
  return getattr(self.item, name)
 
 This is a (common) bug in your program.  __getattr__ should call
 self.__dict__[name] to avoid the recursion.

Read again. The OP tries to delegate attribute lookup to an (existing) 
attribute.

IMO the root cause of the problem is that pickle looks up __dunder__ methods 
in the instance rather than the class.




Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Chris Angelico
On Mon, Aug 11, 2014 at 9:40 PM, Peter Otten __pete...@web.de wrote:
 Read again. The OP tries to delegate attribute lookup to an (existing)
 attribute.

 IMO the root cause of the problem is that pickle looks up __dunder__ methods
 in the instance rather than the class.

The recursion comes from the attempted lookup of self.item, when
__init__ hasn't been called.
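That failure mode is easy to reproduce without pickle at all. A minimal sketch, assuming (as the unpickler effectively does) that the instance is created via __new__ and then probed for __setstate__:

```python
class T(object):
    def __init__(self):
        self.item = list()

    def __getattr__(self, name):
        return getattr(self.item, name)

# pickle creates the instance without calling __init__, roughly like this:
t = object.__new__(T)

# 'item' is not in t.__dict__, so __getattr__('item') is invoked, which
# reads self.item, which invokes __getattr__('item') again, and so on
# until the interpreter hits the recursion limit.
try:
    getattr(t, '__setstate__', None)  # what pickle's load step does
except RuntimeError:                  # RecursionError in newer Pythons
    print('maximum recursion depth exceeded')
```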

ChrisA


Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread R. David Murray
On Mon, 11 Aug 2014 21:43:00 +1000, Chris Angelico ros...@gmail.com wrote:
 On Mon, Aug 11, 2014 at 9:40 PM, Peter Otten __pete...@web.de wrote:
  Read again. The OP tries to delegate attribute lookup to an (existing)
  attribute.
 
  IMO the root cause of the problem is that pickle looks up __dunder__ methods
  in the instance rather than the class.
 
 The recursion comes from the attempted lookup of self.item, when
 __init__ hasn't been called.

Indeed, and this is what the OP missed.  With a class like this, it is
necessary to *make* it pickleable, since the pickle protocol doesn't
call __init__.
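One way to do that (a sketch, not the OP's actual application code) is to make __getattr__ itself safe during unpickling by raising AttributeError while the instance dict is still empty:

```python
import pickle

class T(object):
    def __init__(self):
        self.item = list()

    def __getattr__(self, name):
        # Read 'item' straight out of the instance dict; if it isn't
        # there yet (e.g. during unpickling, before the state is
        # restored), raise AttributeError so pickle's
        # getattr(inst, '__setstate__', None) probe falls through to
        # its default behaviour instead of recursing.
        try:
            item = self.__dict__['item']
        except KeyError:
            raise AttributeError(name)
        return getattr(item, name)

t = T()
t.append(42)                        # delegated to t.item
u = pickle.loads(pickle.dumps(t))   # round trip now succeeds
print(u.item)                       # [42]
```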

--David


Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Peter Otten
Chris Angelico wrote:

 On Mon, Aug 11, 2014 at 9:40 PM, Peter Otten __pete...@web.de wrote:
 Read again. The OP tries to delegate attribute lookup to an (existing)
 attribute.

 IMO the root cause of the problem is that pickle looks up __dunder__
 methods in the instance rather than the class.
 
 The recursion comes from the attempted lookup of self.item, when
 __init__ hasn't been called.

You are right. Sorry for the confusion.




Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Ben Hoyt
It seems to me this is something of a pointless discussion -- I highly
doubt the current situation is going to change, and it works very well.
Even if not perfect, sum() is for numbers, sep.join() for strings. However,
I will add one comment:

I'm overall -1 on trying to change the current situation (except for
 adding a join() builtin or str.join class method).


Did you know there actually is a str.join class method? I've never
actually seen it used this way, but for people who just can't stand
sep.join(seq), you can always call str.join(sep, seq) -- works in Python 2
and 3:

>>> str.join('.', ['abc', 'def', 'ghi'])
'abc.def.ghi'

This works as a side effect of the fact that you can call methods as
cls.method(instance, args).

-Ben


Re: [Python-Dev] python2.7 infinite recursion when loading pickled object

2014-08-11 Thread Akira Li
Schmitt  Uwe (ID SIS) uwe.schm...@id.ethz.ch writes:

 I discovered a problem using cPickle.loads from CPython 2.7.6.

 The last line in the following code raises an infinite recursion

 class T(object):

 def __init__(self):
 self.item = list()

 def __getattr__(self, name):
 return getattr(self.item, name)

 import cPickle

 t = T()

 l = cPickle.dumps(t)
 cPickle.loads(l)
...
 Is this a bug or did I miss something ?

The issue is that your __getattr__ raises RuntimeError (due to infinite
recursion) for non-existing attributes instead of AttributeError. To fix
it, you could use object.__getattribute__:

  class C(object):
      def __init__(self):
          self.item = []
      def __getattr__(self, name):
          return getattr(object.__getattribute__(self, 'item'), name)

There were issues in the past due to {get,has}attr silencing
non-AttributeError exceptions; therefore it is good that pickle breaks
when it gets RuntimeError instead of AttributeError.
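With that change, the probe for a missing attribute raises AttributeError as getattr expects, and the pickle round trip succeeds. A quick check (assuming nothing else about the OP's class):

```python
import pickle

class C(object):
    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # object.__getattribute__ raises AttributeError when 'item'
        # itself is absent, instead of re-entering __getattr__.
        return getattr(object.__getattribute__(self, 'item'), name)

c = C()
c.append(1)                         # delegated to c.item
d = pickle.loads(pickle.dumps(c))   # no recursion: the __setstate__
                                    # probe gets a clean AttributeError
print(d.item)                       # [1]
```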


--
Akira



Re: [Python-Dev] os.walk() is going to be *fast* with scandir

2014-08-11 Thread Akira Li
Armin Rigo ar...@tunes.org writes:

 On 10 August 2014 08:11, Larry Hastings la...@hastings.org wrote:
 A small tip from my bzr days - cd into the directory before scanning it

 I doubt that's permissible for a library function like os.scandir().

 Indeed, chdir() is notably not compatible with multithreading.  There
 would be a non-portable but clean way to do that: the functions
 openat() and fstatat().  They only exist on relatively modern Linuxes,
 though.

There is os.fwalk(), which can be both safer and faster than
os.walk(). It yields a rootdir fd that can be used by functions that
support the dir_fd parameter (see the os.supports_dir_fd set). They use
the *at() functions under the hood.

os.fwalk() could be implemented in terms of os.scandir() if the latter
supported an fd parameter the way os.listdir() does (i.e. were in the
os.supports_fd set; note this is different from os.supports_dir_fd).

Victor Stinner suggested [1] allowing scandir(fd), but I don't see it
mentioned in PEP 471 [2]: the PEP neither supports nor rejects the
idea.

[1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
[2] http://legacy.python.org/dev/peps/pep-0471/
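A sketch of the pattern being described (POSIX-only; dir_fd support varies by platform, which is exactly what os.supports_dir_fd reports):

```python
import os

# os.fwalk is like os.walk but also yields a file descriptor for each
# directory; dir_fd-aware functions can then operate relative to that
# fd, which maps to openat()/fstatat() and avoids re-resolving paths.
for dirpath, dirnames, filenames, rootfd in os.fwalk('.'):
    for name in filenames:
        st = os.stat(name, dir_fd=rootfd)  # fstatat(rootfd, name, ...)
        print(dirpath, name, st.st_size)
    break  # top level only, for this sketch
```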


--
Akira



Re: [Python-Dev] os.walk() is going to be *fast* with scandir

2014-08-11 Thread Ben Hoyt
 Victor Stinner suggested [1] to allow scandir(fd) but I don't see it
 being mentioned in the pep 471 [2]: it neither supports nor rejects the
 idea.

 [1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
 [2] http://legacy.python.org/dev/peps/pep-0471/

Yes, listdir() supports fd, and I think scandir() probably will too to
parallel that, if not for v1.0 then soon after. Victor and I want to
focus on getting the PEP 471 (string path only) version working first.

-Ben


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Chris Barker - NOAA Federal
 I'm very sympathetic to Steven's explanation that we
 wouldn't be having this discussion if we used a different operator for
 string concatenation.

Sure -- but just imagine the conversations we could be having instead:
what does bit-wise and of a string mean? A bytes object? I could see
it as a character-wise and, for instance  ;-)

My confusion is still this:

Repeated summation of strings has been optimized in CPython even
though it's not the recommended way to solve that problem.

So why not special-case optimize sum() for strings? We already
special-case strings to raise an exception.

It seems pretty pedantic to say: we could make this work well, but we'd
rather chide you for not knowing the proper way to do it.

Practicality beats purity?
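The special case in question, for reference (the exact wording of the error message is CPython's and may differ in other implementations):

```python
parts = ['spam', 'and', 'eggs']

# CPython refuses to sum strings at all...
try:
    sum(parts, '')
except TypeError as err:
    print(err)   # sum() can't sum strings [use ''.join(seq) instead]

# ...and the error message points at the linear-time idiom instead:
print(''.join(parts))   # spamandeggs
```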

-Chris




 Although that's not the whole story: in
 practice even numerical sums get split into multiple functions because
 floating point addition isn't associative, and so needs careful
 treatment to preserve accuracy.  At that point I'm strongly +1 on
 abandoning attempts to rationalize summation.

 I'm not sure how I'd feel about raising an exception if you try to sum
 any iterable containing misbehaved types like float.  But not only
 would that be a Python 4 effort due to backward incompatibility, but
 it sorta contradicts the main argument of proponents (any type
 implementing __add__ should be sum()-able).



Re: [Python-Dev] sum(...) limitation - temporary elision take 2

2014-08-11 Thread Julian Taylor
On 04.08.2014 22:22, Jim J. Jewett wrote:
 
 
 
 Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in
 https://mail.python.org/pipermail/python-dev/2014-August/135623.html ):
 
 
 Andrea Griffini agriff at tin.it wrote:
 
However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
 
 hm could this be a pure python case that would profit from temporary
 elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html 
 ]?
 
 lists could declare the tp_can_elide slot and call list.extend on the
 temporary during its tp_add slot instead of creating a new temporary.
 extend/realloc can avoid the copy if there is free memory available
 after the block.
 
 Yes, with all the same problems.
 
 When dealing with a complex object, how can you be sure that __add__
 won't need access to the original values during the entire computation?
 It works with matrix addition, but not with matrix multiplication.
 Depending on the details of the implementation, it could even fail for
 a sort of sliding-neighbor addition similar to the original justification.

The C-extension object knows what its add slot does. An object that
cannot elide would simply always return 0, indicating to Python not to
call the in-place variant.
E.g. the numpy __matmul__ operator would never tell Python that it can
work in place, but __add__ would (if the arguments allow it).

Though we may have found a way to do it without the direct help of
Python: it involves reading and storing the current instruction of
the frame object to figure out whether it is called directly from the
interpreter.
Unfinished patch to numpy, see the can_elide_temp function:
https://github.com/numpy/numpy/pull/4322.diff
Probably not the best way, as this is hardly intended Python C-API, but
assuming there is no overlooked issue with this approach it could be a
good workaround for known-good Python versions.


Re: [Python-Dev] Reviving restricted mode?

2014-08-11 Thread matsjoyce
Yup, I read that post. However, those specific issues do not exist in my 
module, as there is a module whitelist and a method whitelist. Builtins are 
now proxied, and all types going into functions are checked for 
modification. There may be some holes in my approach, but I can't find them.



Re: [Python-Dev] Reviving restricted mode?

2014-08-11 Thread Mark Lawrence

On 11/08/2014 18:42, matsjoyce wrote:

Yup, I read that post. However, those specific issues do not exist in my
module, as there is a module whitelist and a method whitelist. Builtins are
now proxied, and all types going into functions are checked for
modification. There may be some holes in my approach, but I can't find them.



Any chance of giving us some context, or do I have to retrieve my 
crystal ball from the menders?


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Python-Dev] Reviving restricted mode?

2014-08-11 Thread Skip Montanaro
On Mon, Aug 11, 2014 at 12:42 PM, matsjoyce matsjo...@gmail.com wrote:
 There may be some holes in my approach, but I can't find them.

There's the rub. Given time, I suspect someone will discover a hole or two.

Skip


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Terry Reedy

On 8/11/2014 8:26 AM, Ben Hoyt wrote:

It seems to me this is something of a pointless discussion -- I highly
doubt the current situation is going to change, and it works very well.
Even if not perfect, sum() is for numbers, sep.join() for strings.
However, I will add one comment:

I'm overall -1 on trying to change the current situation (except for
adding a join() builtin or str.join class method).


Did you know there actually is a str.join class method?


A 'method' is a function accessed as an attribute of a class.
An 'instance method' is a method whose first parameter is an instance of 
the class. str.join is an instance method.  A 'class method', wrapped as 
such with classmethod(), usually by decorating it with @classmethod, 
would take the class as a parameter.



I've never
actually seen it used this way, but for people who just can't stand
sep.join(seq), you can always call str.join(sep, seq) -- works in Python
2 and 3:

>>> str.join('.', ['abc', 'def', 'ghi'])
'abc.def.ghi'


One could even put 'join = str.join' at the top of a file.

All this is true of *every* instance method.  For instance

>>> int.__add__(1, 2) == 1 .__add__(2) == 1 + 2
True

However, your point holds: people who cannot stand the abbreviation 
*could* use the full form that it abbreviates.



In ancient Python, when strings did not have methods, the current string 
methods were functions in the string module. The functions were removed 
in 3.0.  Their continued use in 2.x code is bad for 3.x compatibility, 
so I would not encourage it.


>>> help(string.join)  # 2.7.8
Help on function join in module string:

join(words, sep=' ')
    join(list [,sep]) -> string

    Return a string composed of the words in list, with
    intervening occurrences of sep.  The default separator is a
    single space.

'List' is obsolete.  Since sometime before 2.7, 'words' meant an 
iterable of strings.


>>> def digits():
...     for i in range(10):
...         yield str(i)
...
>>> string.join(digits(), '')
'0123456789'

Of all the string functions, I believe the conversion of join (and its 
synonym 'joinfields') to a method has been the most contentious.


--
Terry Jan Reedy



[Python-Dev] pathlib handling of trailing slash (Issue #21039)

2014-08-11 Thread Isaac Schwabacher
I see this as a parallel to the question of `pathlib.PurePath.resolve()`, about 
which `pathlib` is (rightly!) very opinionated. Just as `foo/../bar` shouldn't 
resolve to `bar`, `foo/` shouldn't be truncated to `foo`. And if `PurePath` 
doesn't do this, `Path` shouldn't either, because the difference between a 
`Path` and a `PurePath` is the availability of filesystem operations, not the 
identities of the objects involved.

On another level, I think that this is a simple decision: `PosixPath` claims 
right there in the name to implement POSIX behavior, and POSIX specifies that 
`foo` and `foo/` refer (in some cases) to different directory entries. 
Therefore, `foo` and `foo/` can't be the same path. Moreover, `PosixPath` 
implements several methods that have the same name as syscalls that POSIX 
specifies to depend on whether their path arguments end in trailing slashes. 
(Even `stat` 
[http://pubs.opengroup.org/onlinepubs/9699919799/functions/stat.html], which 
explicitly follows symbolic links regardless of the presence of a trailing 
slash, fails with ENOTDIR if given path/to/existing/file/.) It feels 
pathological for `pathlib.PosixPath` to be so almost-compliant.

-ijs


Re: [Python-Dev] Reviving restricted mode?

2014-08-11 Thread Victor Stinner
2014-08-11 19:42 GMT+02:00 matsjoyce matsjo...@gmail.com:
 Yup, I read that post. However, those specific issues do not exist in my
 module, as there is a module whitelist, and a method whitelist. Builtins are
 now proxied, and all types going in to functions are checked for
 modification. There maybe some holes in my approach, but I can't find them.

I took a look at your code and it looks like almost everything is blocked.

Right now, I'm not sure that your sandbox is useful. For example, for
a simple IRC bot, it would help to have access to some modules like
math, time or random. The problem is to provide a way to allow these
modules and ensure that the policy doesn't introduce a new hole.
Allowing more functions increases the risk of new holes.

Even if your sandbox is strong, CPython contains a lot of code written
in C (50% of CPython is written in C), and the C code usually takes
shortcuts which ignore your sandbox. The CPython source code is huge
(210k+ lines of C just for the core). Bugs are common, and your sandbox
is vulnerable to all of them. See for example the Lib/test/crashers/
directory of CPython.

For my pysandbox project, I wrote some proxies, and many
vulnerabilities were found in these proxies. They can be explained by
the nature of Python: you can introspect everything, modify
everything, etc. It's very hard to design such a proxy in Python.
Implementing the proxy in C helps a little bit.

The rule is always the same: your sandbox is as strong as its weakest
function. A very minor bug is enough to break the whole sandbox. See
the history of pysandbox for examples of such bugs (called
vulnerabilities in the case of a sandbox).

Victor


[Python-Dev] Multiline with statement line continuation

2014-08-11 Thread Allen Li
This is a problem I sometimes run into when working with a lot of files
simultaneously, where I need three or more `with` statements:

with open('foo') as foo:
    with open('bar') as bar:
        with open('baz') as baz:
            pass

Thankfully, support for multiple items was added in 3.1:

with open('foo') as foo, open('bar') as bar, open('baz') as baz:
    pass

However, this creates a need for a multiline form, especially when
working with three or more items:

with open('foo') as foo, \
     open('bar') as bar, \
     open('baz') as baz, \
     open('spam') as spam, \
     open('eggs') as eggs:
    pass

Currently, this works with explicit line continuation, but as all style
guides favor implicit line continuation over explicit, it would be nice
if you could do the following:

with (open('foo') as foo,
      open('bar') as bar,
      open('baz') as baz,
      open('spam') as spam,
      open('eggs') as eggs):
    pass

Currently, this is a syntax error, since the language specification for
`with` is

with_stmt ::=  "with" with_item ("," with_item)* ":" suite
with_item ::=  expression ["as" target]

as opposed to something like

with_stmt ::=  "with" with_expr ":" suite
with_expr ::=  with_item ("," with_item)*
               | "(" with_item ("," with_item)* ")"

This is really just a style issue, and furthermore a style issue that
requires a change to the language grammar (probably; someone who knows
for sure please confirm), so at first I thought it wasn't worth
mentioning, but I'd like to hear what everyone else thinks.




Re: [Python-Dev] Multiline ‘with’ statement line continuation

2014-08-11 Thread Ben Finney
Allen Li cyberdup...@gmail.com writes:

 Currently, this works with explicit line continuation, but as all
 style guides favor implicit line continuation over explicit, it would
 be nice if you could do the following:

 with (open('foo') as foo,
       open('bar') as bar,
       open('baz') as baz,
       open('spam') as spam,
       open('eggs') as eggs):
     pass

 Currently, this is a syntax error

Even if it weren't a syntax error, the syntax would be ambiguous. How
will you discern the meaning of::

with (
foo,
bar,
baz):
pass

Is that three separate context managers? Or is it one tuple with three
items?

I am definitely sympathetic to the desire for a good solution to
multi-line ‘with’ statements, but I also don't want to see a special
case to make it even more difficult to understand when a tuple literal
is being specified in code. I admit I don't have a good answer to
satisfy both those simultaneously.

-- 
 \   “We have met the enemy and he is us.” —Walt Kelly, _Pogo_ |
  `\1971-04-22 |
_o__)  |
Ben Finney



Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Nick Coghlan
On 12 Aug 2014 03:03, Chris Barker - NOAA Federal chris.bar...@noaa.gov
wrote:

 My confusion is still this:

 Repeated summation of strings has been optimized in cpython even
 though it's not the recommended way to solve that problem.

The quadratic behaviour of repeated str summation is a subtle, silent
error. It *is* controversial that CPython silently optimises some cases of
it away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

 So why not special-case optimize sum() for strings? We already
 special-case strings to raise an exception.

 It seems pretty pedantic to say: we could make this work well, but we'd
 rather chide you for not knowing the proper way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour. We *want* people
to learn that distinction, not sweep it under the rug. That's the other
reason the implicit optimisation is controversial - it hides an important
difference in algorithmic complexity from users.

 Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Other containers don't tend to be concatenated like that in the first
place, so there's no such check pushing other iterables towards
itertools.chain.
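For the container case the linear-time tool already exists, it just isn't advertised by an error message. A small comparison, reusing the list-of-lists example from earlier in the thread:

```python
import itertools

lists = [[1, 2, 3], [4], [], [5, 6]]

# Quadratic: every + builds a fresh list and recopies all prior elements
flat_slow = sum(lists, [])

# Linear: chain.from_iterable visits each element exactly once
flat_fast = list(itertools.chain.from_iterable(lists))

assert flat_slow == flat_fast == [1, 2, 3, 4, 5, 6]
```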

Regards,
Nick.


 -Chris




  Although that's not the whole story: in
  practice even numerical sums get split into multiple functions because
  floating point addition isn't associative, and so needs careful
  treatment to preserve accuracy.  At that point I'm strongly +1 on
  abandoning attempts to rationalize summation.
 
  I'm not sure how I'd feel about raising an exception if you try to sum
  any iterable containing misbehaved types like float.  But not only
  would that be a Python 4 effort due to backward incompatibility, but
  it sorta contradicts the main argument of proponents (any type
  implementing __add__ should be sum()-able).
 


Re: [Python-Dev] Multiline with statement line continuation

2014-08-11 Thread Nick Coghlan
On 12 Aug 2014 09:09, Allen Li cyberdup...@gmail.com wrote:

 This is a problem I sometimes run into when working with a lot of files
 simultaneously, where I need three or more `with` statements:

 with open('foo') as foo:
     with open('bar') as bar:
         with open('baz') as baz:
             pass

 Thankfully, support for multiple items was added in 3.1:

 with open('foo') as foo, open('bar') as bar, open('baz') as baz:
     pass

 However, this creates a need for a multiline form, especially when
 working with three or more items:

 with open('foo') as foo, \
      open('bar') as bar, \
      open('baz') as baz, \
      open('spam') as spam, \
      open('eggs') as eggs:
     pass

I generally see this kind of construct as a sign that refactoring is
needed. For example, contextlib.ExitStack offers a number of ways to manage
multiple context managers dynamically rather than statically.
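A sketch of the ExitStack version of the example above (temp files stand in for 'foo', 'bar', etc. just to keep the snippet self-contained):

```python
import contextlib
import os
import tempfile

# Stand-ins for the five filenames in the original example.
paths = [tempfile.NamedTemporaryFile(delete=False).name for _ in range(5)]

with contextlib.ExitStack() as stack:
    files = [stack.enter_context(open(p)) for p in paths]
    # All five files are open here...
    assert all(not f.closed for f in files)
# ...and ExitStack closes every one on exit, even if an open() partway
# through the list had raised.
assert all(f.closed for f in files)

for p in paths:
    os.unlink(p)  # clean up the throwaway files
```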

Regards,
Nick.


Re: [Python-Dev] Multiline 'with' statement line continuation

2014-08-11 Thread Ben Hoyt
 Even if it weren't a syntax error, the syntax would be ambiguous. How
 will you discern the meaning of::

 with (
 foo,
 bar,
 baz):
 pass

 Is that three separate context managers? Or is it one tuple with three
 items?

Is it meaningful to use with with a tuple, though? Because a tuple
isn't a context manager with __enter__ and __exit__ methods. For
example:

>>> with (1,2,3): pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __exit__

So -- although I'm not arguing for it here -- you'd be turning invalid
code (a runtime AttributeError) into valid syntax.

-Ben


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Alexander Belopolsky
On Mon, Aug 11, 2014 at 8:19 PM, Nick Coghlan ncogh...@gmail.com wrote:

 Teaching users the difference between linear time operations and quadratic
 ones isn't about purity, it's about passing along a fundamental principle
 of algorithm scalability.


I would understand if this was done in reduce(operator.add, ...), which
indeed spells out the choice of an algorithm, but why should sum() be O(N)
for numbers and O(N**2) for containers?  Would a Python implementation
that, for example, optimizes away 0's in sum(list_of_numbers) be
non-compliant with some fundamental principle?


Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Chris Barker - NOAA Federal
Sorry for the bike shedding here, but:

The quadratic behaviour of repeated str summation is a subtle, silent error.

OK, fair enough. I suppose it would be hard and ugly to catch those
instances and raise an exception pointing users to .join.

It *is* controversial that CPython silently optimises some cases of it
away, since it can cause problems when porting affected code to other
interpreters that don't use refcounting and thus have a harder time
implementing such a trick.

Is there anything in the language spec that says string concatenation is
O(n^2)? Or for that matter any of the performance characteristics of
built-in types? Those strike me as implementation details that SHOULD be
particular to the implementation.

Should we cripple the performance of some operation in CPython so that it
won't work better than Jython? That seems an odd choice. Then how dare PyPy
make scalar computation faster? People might switch to CPython and not know
they should have been using numpy all along...

It's considered worth the cost, since it dramatically improves the
performance of common naive code in a way that doesn't alter the semantics.

Seems the same argument could be made for sum(list_of_strings).

  It seems pretty pedantic to say: we could make this work well, but we'd
 rather chide you for not knowing the proper way to do it.

Yes, that's exactly what this is - a nudge towards the right way to
concatenate strings without incurring quadratic behaviour.

But if it were optimized, it wouldn't incur quadratic behavior.

We *want* people to learn that distinction, not sweep it under the rug.

But sum() is not inherently quadratic -- that's a limitation of the
implementation. I agree that disallowing it is a good idea given that
behavior, but if it were optimized, there would be no reason to steer
people away.

.join _could_ be naively written with the same poor performance -- why
should users need to understand why one was optimized and one was not?

That's the other reason the implicit optimisation is controversial - it
hides an important difference in algorithmic complexity from users.

It doesn't hide it -- it eliminates it. I suppose it's good for folks to
understand the implications of string immutability for when they write
their own algorithms, but this wouldn't be considered a good argument for a
poorly performing sort() for instance.
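The point can be sketched directly; naive_join below is invented for
illustration, and its `+=` is quadratic in the general case (on interpreters
without the CPython refcount trick mentioned above), while str.join sizes
its result once and stays linear:

```python
def naive_join(sep, parts):
    # Quadratic in general: strings are immutable, so each `+=` may
    # copy everything accumulated so far into a fresh string.
    out = ""
    for i, part in enumerate(parts):
        if i:
            out += sep
        out += part
    return out

parts = ["spam", "eggs", "ham"]
assert naive_join(", ", parts) == ", ".join(parts) == "spam, eggs, ham"
```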

 Practicality beats purity?

Teaching users the difference between linear time operations and quadratic
ones isn't about purity, it's about passing along a fundamental principle
of algorithm scalability.

That is a very important lesson to learn, sure, but Python is not only a
teaching language. People will need to learn those lessons at some point;
this one feature makes little difference.

We do it specifically for strings because they *do* have an optimised
algorithm available that we can point users towards, and concatenating
multiple strings is common.

Sure, but I think all that does is teach people about a CPython-specific
implementation -- and I doubt naive users get any closer to understanding
algorithmic complexity -- all they learn is you should use string.join().

Oh well, not really that big a deal.

-Chris


Re: [Python-Dev] Multiline 'with' statement line continuation

2014-08-11 Thread Ben Finney
Ben Hoyt benh...@gmail.com writes:

 So -- although I'm not arguing for it here -- you'd be turning an error
 (a runtime AttributeError) into valid syntax.

Exactly what I'd want to avoid, especially because it *looks* like a
tuple. There are IMO too many pieces of code that look confusingly
similar to tuples but actually mean something else.

-- 
 \ “I have an answering machine in my car. It says, ‘I'm home now. |
  `\  But leave a message and I'll call when I'm out.’” —Steven Wright |
_o__)  |
Ben Finney



Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes:

  Is there anything in the language spec that says string concatenation is
  O(n^2)? Or for that matter any of the performance characteristics of built-in
  types? Those strike me as implementation details that SHOULD be particular to
  the implementation.

Container concatenation isn't quadratic in Python at all.  The naive
implementation of sum() as a loop repeatedly calling __add__ is
quadratic for them.  Strings (and immutable containers in general) are
particularly horrible, as they don't have __iadd__.
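Spelled out, the naive loop in question is essentially this (a sketch; the
names are illustrative, not CPython source):

```python
def naive_sum(iterable, start=0):
    # One __add__ per element.  For immutable containers each addition
    # copies the entire running total, so the total work is quadratic;
    # for plain numbers each addition is O(1), so the loop is linear.
    total = start
    for item in iterable:
        total = total + item
    return total

assert naive_sum([1, 2, 3]) == 6
assert naive_sum([[1], [2], [3]], []) == [1, 2, 3]
```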

You could argue that sum() being a function of an iterable isn't just
a calling convention for a loop encapsulated in a function, but rather
a completely different kind of function that doesn't imply anything
about the implementation, and therefore that it should dispatch on
type(it).  But explicitly dispatching on type(x) is yucky (what if
somebody wants to sum a different type not currently recognized by the
sum() builtin?) so, obviously, we should define a standard __sum__
dunder!  IMO we'd also want a homogeneous_iterable ABC, and a concrete
homogeneous_iterable_of_TYPE for each sum()-able TYPE to help users
catch bugs injecting the wrong type into an iterable_of_TYPE.
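No such protocol exists; purely as a sketch of the dispatch being described,
with every name here invented for illustration:

```python
def protocol_sum(iterable, start):
    # Hypothetical: let the start value's type supply its own algorithm
    # via an (invented) __sum__ hook, falling back to the naive loop.
    hook = getattr(type(start), "__sum__", None)
    if hook is not None:
        return hook(start, iterable)
    total = start
    for item in iterable:
        total = total + item
    return total

class Text(str):
    def __sum__(self, iterable):
        # Linear override for a string-like type.
        return type(self)(self + "".join(iterable))

assert protocol_sum([1, 2, 3], 0) == 6
assert protocol_sum(["a", "b", "c"], Text("")) == "abc"
```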

But this still sucks.  Why?  Because obviously we'd want the
attractive nuisance of if you have __add__, there's a default
definition of __sum__ (AIUI, this is what bothers Alexander most
about the current situation, at least of the things he's mentioned, I
can really sympathize with his dislike).  And new Pythonistas and lazy
programmers who only intend to use sum() on small enough iterables
will use the default, and their programs will appear to hang on a
somewhat larger iterable, or a realtime requirement will go
unsatisfied when least expected, or ...  If we *don't* have that
property for sum(), ugh!  Yuck!  Same old same old!  (IMHO, YMMV of
course)

It's possible that Python could provide some kind of feature that
would allow an optimized sum function for every type that has __add__,
but I think this will take a lot of thinking.  *Somebody* will do it
(I don't think anybody is +1 on restricting sum() to a subset of types
with __add__).  I just think we should wait until that somebody appears.

  Should we cripple the performance of some operation in CPython so that it
  won't work better than Jython?

Nobody is crippling operations.  We're prohibiting use of a *name* for
an operation that is associated (strongly so, in my mind) with an
inefficient algorithm in favor of the *same operation* by a different
name (which has no existing implementation, and therefore Python
implementers are responsible for implementing it efficiently).  Note:
the inefficient algorithm isn't inefficient for integers, and it
isn't inefficient for numbers in general (although it's inaccurate for
some classes of numbers).
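The inaccuracy aside is about floats: naive left-to-right addition
accumulates rounding error at every step, which is why the stdlib grew
math.fsum:

```python
import math

values = [0.1] * 10
# Left-to-right binary-float addition rounds after every step...
assert sum(values) != 1.0
# ...while math.fsum tracks exact partial sums and rounds only once.
assert math.fsum(values) == 1.0
```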

  Seems the same argument [that Python language doesn't prohibit
  optimizations in particular implementations just because they
  aren't made in others] could be made for sum(list_of_strings).

It could.  But then we have to consider special-casing every builtin
type that provides __add__, and we impose an unobvious burden on user
types that provide __add__.

   It seems pretty pedantic to say: we could make this work well,
   but we'd rather chide you for not knowing the proper way to do
   it.

Nobody disagrees.  But backward compatibility gets in the way.

  But sum() is not inherently quadratic -- that's a limitation of the
  implementation.

But the faulty implementation is the canonical implementation, the
only one that can be defined directly in terms of __add__, and it is
efficient for non-container types.[1]

  .join _could_ be naively written with the same poor performance
  -- why should users need to understand why one was optimized and
  one was not?

Good question.  They shouldn't -- thus the prohibition on sum()ing
strings.
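That prohibition is visible at the prompt: a str start value makes sum()
raise, and the error message itself points at the linear spelling:

```python
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    # CPython's message suggests ''.join(...) instead.
    assert "join" in str(exc)
else:
    raise AssertionError("expected TypeError for a str start value")
```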

  That is a very import a lesson to learn, sure, but python is not
  only a teaching language. People will need to learn those lessons
  at some point, this one feature makes little difference.

No, it makes a big difference.  If you can do something, then it's OK
to do it, is something Python tries to implement.  If sum() works for
everything with an __add__, given current Python language features
some people are going to end up with very inefficient code and it will
bite some of them (and not necessarily the authors!) at some time.

If it doesn't work for every type with __add__, why not?  You'll end
up playing whack-a-mole with type prohibitions.  Ugh.

  Sure, but I think all that does is teach people about a CPython-specific
  implementation -- and I doubt naive users get any closer to understanding
  algorithmic complexity -- all they learn is you should use string.join().
  
  Oh well, not really that big a deal.

Not to Python.  Maybe not to you.  But I've learned a lot about
Pythonic ways of doing things trying to channel the folks who
implemented this 

Re: [Python-Dev] sum(...) limitation

2014-08-11 Thread Ethan Furman

On 08/11/2014 08:50 PM, Stephen J. Turnbull wrote:

Chris Barker - NOAA Federal writes:


It seems pretty pedantic to say: we could make this work well,
but we'd rather chide you for not knowing the proper way to do
it.


Nobody disagrees.  But backward compatibility gets in the way.


Something that currently doesn't work, starts to.  How is that a backward 
compatibility problem?

--
~Ethan~


[Python-Dev] Commit-ready patches in need of review

2014-08-11 Thread Nikolaus Rath
Hello,

The following commit-ready patches have been waiting for review since
May and earlier. It'd be great if someone could find the time to take a
look. I'll be happy to incorporate feedback as necessary:

* http://bugs.python.org/issue1738 (filecmp.dircmp does exact match
  only)

* http://bugs.python.org/issue15955 (gzip, bz2, lzma: add option to
  limit output size)

* http://bugs.python.org/issue20177 (Derby #8: Convert 28 sites to
  Argument Clinic across 2 files)

  I only wrote the patch for one file because I'd like to have feedback
  before tackling the second. However, the patches are independent so
  unless there are other problems this is ready for commit.


Best,
Nikolaus


-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«