[Python-Dev] python2.7 infinite recursion when loading pickled object
Dear all,

I discovered a problem using cPickle.loads from CPython 2.7.6. The last line in the following code raises an infinite recursion:

    class T(object):
        def __init__(self):
            self.item = list()
        def __getattr__(self, name):
            return getattr(self.item, name)

    import cPickle
    t = T()
    l = cPickle.dumps(t)
    cPickle.loads(l)

loads triggers T.__getattr__ via getattr(inst, '__setstate__', None) while looking up a __setstate__ method, which T does not implement. As the item attribute is missing at this point, the infinite recursion starts.

The infinite recursion disappears if I attach a default implementation of __setstate__ to T:

    def __setstate__(self, dd):
        self.__dict__ = dd

This could be fixed by using "hasattr" in pickle before trying to call "getattr". Is this a bug, or did I miss something?

Kind Regards, Uwe

___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
On 8/11/2014 5:10 AM, Schmitt Uwe (ID SIS) wrote:

> I discovered a problem using cPickle.loads from CPython 2.7.6.

Python usage questions should be directed to python-list, for instance.

The problem is your code having infinite recursion. You only discovered it with pickle.

> The last line in the following code raises an infinite recursion
>
>     class T(object):
>         def __init__(self):
>             self.item = list()
>         def __getattr__(self, name):
>             return getattr(self.item, name)

This is a (common) bug in your program. __getattr__ should look the name up in self.__dict__ to avoid the recursion.

-- Terry Jan Reedy
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
Terry Reedy wrote:
> On 8/11/2014 5:10 AM, Schmitt Uwe (ID SIS) wrote:
>> I discovered a problem using cPickle.loads from CPython 2.7.6.
>> The last line in the following code raises an infinite recursion
>>
>>     class T(object):
>>         def __init__(self):
>>             self.item = list()
>>         def __getattr__(self, name):
>>             return getattr(self.item, name)
>
> Python usage questions should be directed to python-list, for instance. The problem is your code having infinite recursion. You only discovered it with pickle. This is a (common) bug in your program. __getattr__ should look the name up in self.__dict__ to avoid the recursion.

Read again. The OP tries to delegate attribute lookup to an (existing) attribute.

IMO the root cause of the problem is that pickle looks up __dunder__ methods in the instance rather than the class.
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
On Mon, Aug 11, 2014 at 9:40 PM, Peter Otten <__pete...@web.de> wrote:
> Read again. The OP tries to delegate attribute lookup to an (existing) attribute. IMO the root cause of the problem is that pickle looks up __dunder__ methods in the instance rather than the class.

The recursion comes from the attempted lookup of self.item, when __init__ hasn't been called.

ChrisA
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
On Mon, 11 Aug 2014 21:43:00 +1000, Chris Angelico <ros...@gmail.com> wrote:
> The recursion comes from the attempted lookup of self.item, when __init__ hasn't been called.

Indeed, and this is what the OP missed. With a class like this, it is necessary to *make* it pickleable, since the pickle protocol doesn't call __init__.

--David
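David's point can be reproduced without pickle at all. The sketch below (Python 3, where the runaway recursion surfaces as RecursionError rather than Python 2's RuntimeError) uses `T.__new__` to skip `__init__`, exactly as the pickle protocol does:

```python
class T(object):
    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # Called only when normal lookup fails.  If 'item' itself is
        # missing, looking up self.item re-enters __getattr__ forever.
        return getattr(self.item, name)

t = T.__new__(T)          # bypass __init__, just like unpickling does
try:
    t.__setstate__        # the very probe pickle performs
except RecursionError:
    print("recursed")     # pickle expected AttributeError here, not this
```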
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
Chris Angelico wrote:
> The recursion comes from the attempted lookup of self.item, when __init__ hasn't been called.

You are right. Sorry for the confusion.
Re: [Python-Dev] sum(...) limitation
It seems to me this is something of a pointless discussion -- I highly doubt the current situation is going to change, and it works very well. Even if not perfect, sum() is for numbers, sep.join() for strings. However, I will add one comment:

> I'm overall -1 on trying to change the current situation (except for adding a join() builtin or str.join class method).

Did you know there actually is a str.join class method? I've never actually seen it used this way, but for people who just can't stand sep.join(seq), you can always call str.join(sep, seq) -- works in Python 2 and 3:

    >>> str.join('.', ['abc', 'def', 'ghi'])
    'abc.def.ghi'

This works as a side effect of the fact that you can call methods as cls.method(instance, args).

-Ben
Re: [Python-Dev] python2.7 infinite recursion when loading pickled object
Schmitt Uwe (ID SIS) <uwe.schm...@id.ethz.ch> writes:
> I discovered a problem using cPickle.loads from CPython 2.7.6. The last line in the following code raises an infinite recursion
>
>     class T(object):
>         def __init__(self):
>             self.item = list()
>         def __getattr__(self, name):
>             return getattr(self.item, name)
>
>     import cPickle
>     t = T()
>     l = cPickle.dumps(t)
>     cPickle.loads(l)
>
> Is this a bug or did I miss something?

The issue is that your __getattr__ raises RuntimeError (due to infinite recursion) for non-existing attributes instead of AttributeError. To fix it, you could use object.__getattribute__:

    class C:
        def __init__(self):
            self.item = []
        def __getattr__(self, name):
            return getattr(object.__getattribute__(self, 'item'), name)

There were issues in the past due to {get,has}attr silencing non-AttributeError exceptions; therefore it is good that pickle breaks when it gets RuntimeError instead of AttributeError.

-- Akira
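The practical difference in Akira's version can be seen without pickle: on an instance whose `__init__` never ran, the rewritten `__getattr__` fails with the AttributeError that pickle's `getattr(obj, '__setstate__', None)` probe is prepared to swallow. A minimal Python 3 sketch:

```python
class C:
    def __init__(self):
        self.item = []

    def __getattr__(self, name):
        # object.__getattribute__ raises AttributeError when 'item' is
        # absent, instead of re-entering __getattr__ recursively.
        return getattr(object.__getattribute__(self, 'item'), name)

c = C.__new__(C)                         # no __init__, as during unpickling
print(getattr(c, '__setstate__', None))  # None -- the probe fails cleanly

c2 = C()                                 # with __init__, delegation works
c2.item.append(1)
print(c2.count(1))                       # 1, via the delegated list method
```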
Re: [Python-Dev] os.walk() is going to be *fast* with scandir
Armin Rigo <ar...@tunes.org> writes:
> On 10 August 2014 08:11, Larry Hastings <la...@hastings.org> wrote:
>> A small tip from my bzr days - cd into the directory before scanning it
>
> I doubt that's permissible for a library function like os.scandir(). Indeed, chdir() is notably not compatible with multithreading. There would be a non-portable but clean way to do that: the functions openat() and fstatat(). They only exist on relatively modern Linuxes, though.

There is os.fwalk(), which could be both safer and faster than os.walk(). It yields a rootdir fd that can be used by functions that support the dir_fd parameter (see the os.supports_dir_fd set). They use the *at() functions under the hood.

os.fwalk() could be implemented in terms of os.scandir() if the latter supported an fd parameter like os.listdir() does (i.e., were in the os.supports_fd set -- note: this is different from os.supports_dir_fd).

Victor Stinner suggested [1] allowing scandir(fd), but I don't see it mentioned in PEP 471 [2]: it neither supports nor rejects the idea.

[1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
[2] http://legacy.python.org/dev/peps/pep-0471/

-- Akira
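A short sketch of the pattern Akira describes -- consuming the rootdir fd that os.fwalk() yields with a dir_fd-aware call. This is POSIX-only: os.stat must appear in os.supports_dir_fd for the dir_fd argument to work, so this is a sketch rather than portable code:

```python
import os
import tempfile

# Make a tiny tree so the example is self-contained.
root = tempfile.mkdtemp()
with open(os.path.join(root, 'data.txt'), 'w') as f:
    f.write('hello')

# os.fwalk() yields an open fd for each visited directory; entry names
# can then be resolved relative to that fd instead of re-walking paths.
for dirpath, dirnames, filenames, rootfd in os.fwalk(root):
    for name in filenames:
        st = os.stat(name, dir_fd=rootfd)
        print(name, st.st_size)        # data.txt 5
```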
Re: [Python-Dev] os.walk() is going to be *fast* with scandir
> Victor Stinner suggested [1] allowing scandir(fd), but I don't see it mentioned in PEP 471 [2]: it neither supports nor rejects the idea.
>
> [1] https://mail.python.org/pipermail/python-dev/2014-July/135283.html
> [2] http://legacy.python.org/dev/peps/pep-0471/

Yes, listdir() supports fd, and I think scandir() probably will too to parallel that, if not for v1.0 then soon after. Victor and I want to focus on getting the PEP 471 (string path only) version working first.

-Ben
Re: [Python-Dev] sum(...) limitation
> I'm very sympathetic to Steven's explanation that we wouldn't be having this discussion if we used a different operator for string concatenation.

Sure -- but just imagine the conversations we could be having instead: what does bitwise and of a string mean? A bytes object? I could see it as a character-wise and, for instance ;-)

My confusion is still this: repeated summation of strings has been optimized in CPython even though it's not the recommended way to solve that problem. So why not special-case optimize sum() for strings? We already special-case strings to raise an exception. It seems pretty pedantic to say: we could make this work well, but we'd rather chide you for not knowing the proper way to do it.

Practicality beats purity?

-Chris

> Although that's not the whole story: in practice even numerical sums get split into multiple functions because floating point addition isn't associative, and so needs careful treatment to preserve accuracy. At that point I'm strongly +1 on abandoning attempts to rationalize summation. I'm not sure how I'd feel about raising an exception if you try to sum any iterable containing misbehaved types like float. But not only would that be a Python 4 effort due to backward incompatibility, but it sorta contradicts the main argument of proponents (any type implementing __add__ should be sum()-able).
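The special case Chris mentions is easy to see interactively: sum() rejects str specifically and points at join(), while other sequence types are quietly accepted (quadratic or not). A quick Python 3 demonstration:

```python
# Strings are explicitly rejected, with a pointer to the right tool.
try:
    sum(['abc', 'def'], '')
except TypeError as e:
    print(e)   # CPython says: sum() can't sum strings [use ''.join(seq) instead]

# Lists, however, sail through -- concatenation works, just slowly.
print(sum([[1, 2], [3]], []))   # [1, 2, 3]
```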
Re: [Python-Dev] sum(...) limitation - temporary elision take 2
On 04.08.2014 22:22, Jim J. Jewett wrote:
> Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in https://mail.python.org/pipermail/python-dev/2014-August/135623.html):
>> Andrea Griffini <agriff at tin.it> wrote:
>>> However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists.
>>
>> hm, could this be a pure python case that would profit from temporary elision [https://mail.python.org/pipermail/python-dev/2014-June/134826.html]? lists could declare the tp_can_elide slot and call list.extend on the temporary during its tp_add slot instead of creating a new temporary. extend/realloc can avoid the copy if there is free memory available after the block.
>
> Yes, with all the same problems. When dealing with a complex object, how can you be sure that __add__ won't need access to the original values during the entire computation? It works with matrix addition, but not with matrix multiplication. Depending on the details of the implementation, it could even fail for a sort of sliding-neighbor addition similar to the original justification.

The C-extension object knows what its add slot does. An object that cannot elide would simply always return 0, indicating to Python not to call the inplace variant. E.g. the numpy __matmul__ operator would never tell Python that it can work inplace, but __add__ would (if the arguments allow it).

Though we may have found a way to do it without the direct help of Python, it involves reading and storing the current instruction of the frame object to figure out if it is called directly from the interpreter. Unfinished patch to numpy, see the can_elide_temp function: https://github.com/numpy/numpy/pull/4322.diff

Probably not the best way, as this is hardly intended Python C-API, but assuming there is no overlooked issue with this approach it could be a good workaround for known good Python versions.
Re: [Python-Dev] Reviving restricted mode?
Yup, I read that post. However, those specific issues do not exist in my module, as there is a module whitelist and a method whitelist. Builtins are now proxied, and all types going into functions are checked for modification. There may be some holes in my approach, but I can't find them.
Re: [Python-Dev] Reviving restricted mode?
On 11/08/2014 18:42, matsjoyce wrote:
> Yup, I read that post. However, those specific issues do not exist in my module, as there is a module whitelist and a method whitelist. Builtins are now proxied, and all types going into functions are checked for modification. There may be some holes in my approach, but I can't find them.

Any chance of giving us some context, or do I have to retrieve my crystal ball from the menders?

-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
Re: [Python-Dev] Reviving restricted mode?
On Mon, Aug 11, 2014 at 12:42 PM, matsjoyce <matsjo...@gmail.com> wrote:
> There may be some holes in my approach, but I can't find them.

There's the rub. Given time, I suspect someone will discover a hole or two.

Skip
Re: [Python-Dev] sum(...) limitation
On 8/11/2014 8:26 AM, Ben Hoyt wrote:
> Did you know there actually is a str.join class method?

A 'method' is a function accessed as an attribute of a class. An 'instance method' is a method whose first parameter is an instance of the class. str.join is an instance method. A 'class method', wrapped as such with classmethod(), usually by decorating it with @classmethod, would take the class as a parameter.

> I've never actually seen it used this way, but for people who just can't stand sep.join(seq), you can always call str.join(sep, seq) -- works in Python 2 and 3:
>
>     >>> str.join('.', ['abc', 'def', 'ghi'])
>     'abc.def.ghi'

One could even put 'join = str.join' at the top of a file. All this is true of *every* instance method. For instance:

    >>> int.__add__(1, 2) == 1 .__add__(2) == 1 + 2
    True

However, your point stands that people who cannot stand the abbreviation *could* use the full form that is being abbreviated.

In ancient Python, when strings did not have methods, the current string methods were functions in the string module. The functions were removed in 3.0. Their continued use in 2.x code is bad for 3.x compatibility, so I would not encourage it.

    >>> help(string.join)  # 2.7.8
    Help on function join in module string:
    join(words, sep=' ')
        join(list [,sep]) -> string
        Return a string composed of the words in list, with intervening
        occurrences of sep. The default separator is a single space.

'List' is obsolete. Since sometime before 2.7, 'words' has meant an iterable of strings:

    >>> def digits():
    ...     for i in range(10):
    ...         yield str(i)
    >>> string.join(digits(), '')
    '0123456789'

Of all the string functions, I believe the conversion of join (and its synonym 'joinfields') to a method has been the most contentious.

-- Terry Jan Reedy
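Terry's distinction can be demonstrated in a few lines. A small Python 3 illustration (the class name here is made up for the example):

```python
class Demo:
    def imeth(self):          # instance method: first parameter is an instance
        return 'instance'

    @classmethod
    def cmeth(cls):           # class method: first parameter is the class
        return cls.__name__

# An instance method accessed on the class is a plain function, so the
# instance can be passed explicitly -- exactly the str.join(sep, seq) trick.
assert Demo.imeth(Demo()) == 'instance'
assert Demo.cmeth() == 'Demo'
assert str.join('.', ['a', 'b']) == '.'.join(['a', 'b'])
```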
[Python-Dev] pathlib handling of trailing slash (Issue #21039)
I see this as a parallel to the question of `pathlib.PurePath.resolve()`, about which `pathlib` is (rightly!) very opinionated. Just as `foo/../bar` shouldn't resolve to `bar`, `foo/` shouldn't be truncated to `foo`. And if `PurePath` doesn't do this, `Path` shouldn't either, because the difference between a `Path` and a `PurePath` is the availability of filesystem operations, not the identities of the objects involved.

On another level, I think that this is a simple decision: `PosixPath` claims right there in the name to implement POSIX behavior, and POSIX specifies that `foo` and `foo/` refer (in some cases) to different directory entries. Therefore, `foo` and `foo/` can't be the same path. Moreover, `PosixPath` implements several methods that have the same name as syscalls that POSIX specifies to depend on whether their path arguments end in trailing slashes. (Even `stat` [http://pubs.opengroup.org/onlinepubs/9699919799/functions/stat.html], which explicitly follows symbolic links regardless of the presence of a trailing slash, fails with ENOTDIR if given path/to/existing/file/.) It feels pathological for `pathlib.PosixPath` to be so almost-compliant.

-ijs
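The asymmetry being criticized is easy to verify with Python 3's pathlib:

```python
from pathlib import PurePosixPath

# pathlib refuses to collapse '..' components, since doing so would be
# wrong in the presence of symlinks...
assert str(PurePosixPath('foo/../bar')) == 'foo/../bar'

# ...yet it silently drops a trailing slash, which POSIX also treats
# as significant for some directory entries.
assert str(PurePosixPath('foo/')) == 'foo'
assert PurePosixPath('foo/') == PurePosixPath('foo')
```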
Re: [Python-Dev] Reviving restricted mode?
2014-08-11 19:42 GMT+02:00 matsjoyce <matsjo...@gmail.com>:
> Yup, I read that post. However, those specific issues do not exist in my module, as there is a module whitelist and a method whitelist. Builtins are now proxied, and all types going into functions are checked for modification. There may be some holes in my approach, but I can't find them.

I took a look at your code, and it looks like almost everything is blocked. Right now, I'm not sure that your sandbox is useful. For example, for a simple IRC bot, it would help to have access to some modules like math, time or random. The problem is to provide a way to allow these modules and ensure that the policy doesn't introduce a new hole. Allowing more functions increases the risk of new holes.

Even if your sandbox is strong, CPython contains a lot of code written in C (50% of CPython is written in C), and the C code usually takes shortcuts which ignore your sandbox. CPython source code is huge (+210k lines of C just for the core). Bugs are common; your sandbox is vulnerable to all these bugs. See for example the Lib/test/crashers/ directory of CPython.

For my pysandbox project, I wrote some proxies, and many vulnerabilities were found in these proxies. They can be explained by the nature of Python: you can introspect everything, modify everything, etc. It's very hard to design such a proxy in Python. Implementing such a proxy in C helps a little bit.

The rule is always the same: your sandbox is as strong as its weakest function. A very minor bug is enough to break the whole sandbox. See the history of pysandbox for examples of such bugs (called vulnerabilities in the case of a sandbox).

Victor
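Two classic examples of the introspection Victor alludes to -- innocuous objects that hand back references to powerful internals, which is what makes pure-Python proxying so fragile:

```python
def harmless():
    return 42

# Any plain function carries a live reference to its module namespace...
assert isinstance(harmless.__globals__, dict)

# ...and from the base object type you can enumerate every class loaded
# in the interpreter, a well-known starting point for sandbox escapes.
subclasses = object.__subclasses__()
assert len(subclasses) > 0
print(len(subclasses), "classes reachable from 'object' alone")
```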
[Python-Dev] Multiline with statement line continuation
This is a problem I sometimes run into when working with a lot of files simultaneously, where I need three or more `with` statements:

    with open('foo') as foo:
        with open('bar') as bar:
            with open('baz') as baz:
                pass

Thankfully, support for multiple items was added in 3.1:

    with open('foo') as foo, open('bar') as bar, open('baz') as baz:
        pass

However, this begs the need for a multiline form, especially when working with three or more items:

    with open('foo') as foo, \
         open('bar') as bar, \
         open('baz') as baz, \
         open('spam') as spam, \
         open('eggs') as eggs:
        pass

Currently, this works with explicit line continuation, but as all style guides favor implicit line continuation over explicit, it would be nice if you could do the following:

    with (open('foo') as foo,
          open('bar') as bar,
          open('baz') as baz,
          open('spam') as spam,
          open('eggs') as eggs):
        pass

Currently, this is a syntax error, since the language specification for `with` is

    with_stmt ::= "with" with_item ("," with_item)* ":" suite
    with_item ::= expression ["as" target]

as opposed to something like

    with_stmt ::= "with" with_expr ":" suite
    with_expr ::= with_item ("," with_item)*
                | "(" with_item ("," with_item)* ")"

This is really just a style issue, furthermore a style issue that requires a change to the language grammar (probably -- someone who knows for sure, please confirm), so at first I thought it wasn't worth mentioning, but I'd like to hear what everyone else thinks.
Re: [Python-Dev] Multiline ‘with’ statement line continuation
Allen Li <cyberdup...@gmail.com> writes:
> Currently, this works with explicit line continuation, but as all style guides favor implicit line continuation over explicit, it would be nice if you could do the following:
>
>     with (open('foo') as foo,
>           open('bar') as bar,
>           open('baz') as baz,
>           open('spam') as spam,
>           open('eggs') as eggs):
>         pass
>
> Currently, this is a syntax error

Even if it weren't a syntax error, the syntax would be ambiguous. How will you discern the meaning of::

    with (
            foo,
            bar,
            baz):
        pass

Is that three separate context managers? Or is it one tuple with three items?

I am definitely sympathetic to the desire for a good solution to multi-line ‘with’ statements, but I also don't want to see a special case to make it even more difficult to understand when a tuple literal is being specified in code. I admit I don't have a good answer to satisfy both those simultaneously.

-- “We have met the enemy and he is us.” —Walt Kelly, _Pogo_, 1971-04-22
Ben Finney
Re: [Python-Dev] sum(...) limitation
On 12 Aug 2014 03:03, Chris Barker - NOAA Federal <chris.bar...@noaa.gov> wrote:
> My confusion is still this: repeated summation of strings has been optimized in CPython even though it's not the recommended way to solve that problem.

The quadratic behaviour of repeated str summation is a subtle, silent error. It *is* controversial that CPython silently optimises some cases of it away, since it can cause problems when porting affected code to other interpreters that don't use refcounting and thus have a harder time implementing such a trick. It's considered worth the cost, since it dramatically improves the performance of common naive code in a way that doesn't alter the semantics.

> So why not special-case optimize sum() for strings? We already special-case strings to raise an exception. It seems pretty pedantic to say: we could make this work well, but we'd rather chide you for not knowing the proper way to do it.

Yes, that's exactly what this is -- a nudge towards the right way to concatenate strings without incurring quadratic behaviour. We *want* people to learn that distinction, not sweep it under the rug. That's the other reason the implicit optimisation is controversial -- it hides an important difference in algorithmic complexity from users.

> Practicality beats purity?

Teaching users the difference between linear time operations and quadratic ones isn't about purity, it's about passing along a fundamental principle of algorithm scalability. We do it specifically for strings because they *do* have an optimised algorithm available that we can point users towards, and concatenating multiple strings is common. Other containers don't tend to be concatenated like that in the first place, so there's no such check pushing other iterables towards itertools.chain.

Regards, Nick.
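Nick's point about itertools.chain in concrete terms: sum() does concatenate lists, but chain.from_iterable produces the same result in linear time:

```python
import itertools

lists = [[1, 2, 3], [4], [], [5, 6]]

# Works, but builds a fresh list at every step: quadratic overall.
assert sum(lists, []) == [1, 2, 3, 4, 5, 6]

# The linear-time spelling for general iterables (''.join is the
# analogous tool for strings).
assert list(itertools.chain.from_iterable(lists)) == [1, 2, 3, 4, 5, 6]
```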
Re: [Python-Dev] Multiline with statement line continuation
On 12 Aug 2014 09:09, Allen Li <cyberdup...@gmail.com> wrote:
> However, this begs the need for a multiline form, especially when working with three or more items:
>
>     with open('foo') as foo, \
>          open('bar') as bar, \
>          open('baz') as baz, \
>          open('spam') as spam, \
>          open('eggs') as eggs:
>         pass

I generally see this kind of construct as a sign that refactoring is needed. For example, contextlib.ExitStack offers a number of ways to manage multiple context managers dynamically rather than statically.

Regards, Nick.
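A minimal sketch of the refactoring Nick suggests, using contextlib.ExitStack to hold an arbitrary number of open files (temporary files are created here so the example runs anywhere):

```python
import contextlib
import os
import tempfile

# Create some throwaway files to stand in for 'foo', 'bar', 'baz'.
tmpdir = tempfile.mkdtemp()
paths = [os.path.join(tmpdir, name) for name in ('foo', 'bar', 'baz')]
for p in paths:
    with open(p, 'w') as f:
        f.write(os.path.basename(p))

# ExitStack enters each context manager and guarantees every one of them
# is closed when the block exits, even if an open() in the loop fails.
with contextlib.ExitStack() as stack:
    files = [stack.enter_context(open(p)) for p in paths]
    contents = [f.read() for f in files]

print(contents)                        # ['foo', 'bar', 'baz']
assert all(f.closed for f in files)    # all closed on exit
```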
Re: [Python-Dev] Multiline 'with' statement line continuation
> Even if it weren't a syntax error, the syntax would be ambiguous. How will you discern the meaning of::
>
>     with (
>             foo,
>             bar,
>             baz):
>         pass
>
> Is that three separate context managers? Or is it one tuple with three items?

Is it meaningful to use "with" with a tuple, though? Because a tuple isn't a context manager with __enter__ and __exit__ methods. For example:

    >>> with (1, 2, 3):
    ...     pass
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: __exit__

So -- although I'm not arguing for it here -- you'd be turning invalid code (a runtime AttributeError) into valid syntax.

-Ben
Re: [Python-Dev] sum(...) limitation
On Mon, Aug 11, 2014 at 8:19 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> Teaching users the difference between linear time operations and quadratic ones isn't about purity, it's about passing along a fundamental principle of algorithm scalability.

I would understand if this were done in reduce(operator.add, ...), which indeed spells out the choice of an algorithm, but why should sum() be O(N) for numbers and O(N**2) for containers? Would a Python implementation that, for example, optimizes away 0's in sum(list_of_numbers) be non-compliant with some fundamental principle?
Re: [Python-Dev] sum(...) limitation
Sorry for the bike-shedding here, but:

> The quadratic behaviour of repeated str summation is a subtle, silent error.

OK, fair enough. I suppose it would be hard and ugly to catch those instances and raise an exception pointing users to ''.join.

> It *is* controversial that CPython silently optimises some cases of it away, since it can cause problems when porting affected code to other interpreters that don't use refcounting and thus have a harder time implementing such a trick.

Is there anything in the language spec that says string concatenation is O(n^2)? Or, for that matter, any of the performance characteristics of built-in types? Those strike me as implementation details that SHOULD be particular to the implementation. Should we cripple the performance of some operation in CPython so that it won't work better than in Jython? That seems an odd choice. Then how dare PyPy make scalar computation faster? People might switch to CPython and not know they should have been using numpy all along...

> It's considered worth the cost, since it dramatically improves the performance of common naive code in a way that doesn't alter the semantics.

Seems the same argument could be made for sum(list_of_strings).

> Yes, that's exactly what this is -- a nudge towards the right way to concatenate strings without incurring quadratic behaviour.

But if it were optimized, it wouldn't incur quadratic behavior.

> We *want* people to learn that distinction, not sweep it under the rug.

But sum() is not inherently quadratic -- that's a limitation of the implementation. I agree that disallowing it is a good idea given that behavior, but if it were optimized, there would be no reason to steer people away. join() _could_ be naively written with the same poor performance -- why should users need to understand why one was optimized and one was not?

> That's the other reason the implicit optimisation is controversial -- it hides an important difference in algorithmic complexity from users.

It doesn't hide it -- it eliminates it. I suppose it's good for folks to understand the implications of string immutability for when they write their own algorithms, but this wouldn't be considered a good argument for a poorly performing sort(), for instance.

> Teaching users the difference between linear time operations and quadratic ones isn't about purity, it's about passing along a fundamental principle of algorithm scalability.

That is a very important lesson to learn, sure, but Python is not only a teaching language. People will need to learn those lessons at some point; this one feature makes little difference.

> We do it specifically for strings because they *do* have an optimised algorithm available that we can point users towards, and concatenating multiple strings is common.

Sure, but I think all that does is teach people about a CPython-specific implementation -- and I doubt naive users get any closer to understanding algorithmic complexity. All they learn is that you should use str.join().

Oh well, not really that big a deal.

-Chris
Re: [Python-Dev] Multiline 'with' statement line continuation
Ben Hoyt <benh...@gmail.com> writes:

> So -- although I'm not arguing for it here -- you'd be turning invalid code (a runtime AttributeError) into valid syntax.

Exactly what I'd want to avoid, especially because it *looks* like a tuple. There are IMO too many pieces of code that look confusingly similar to tuples but actually mean something else.

--
\ “I have an answering machine in my car. It says, ‘I'm home now. |
`\ But leave a message and I'll call when I'm out.’” —Steven Wright |
_o__) |
Ben Finney
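[A quick illustration of the "runtime AttributeError" in the exchange above: a parenthesized pair of context managers is just a tuple, and a tuple is not itself a context manager, so `with pair:` fails when the interpreter looks for `__enter__`.]

```python
import io

# An ordinary tuple of two context managers -- not a context manager itself.
pair = (io.StringIO(), io.StringIO())

# `with pair:` would raise AttributeError, because tuples have no __enter__.
print(hasattr(pair, "__enter__"))        # the tuple: False
print(hasattr(io.StringIO(), "__enter__"))  # each element: True
```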
Re: [Python-Dev] sum(...) limitation
Chris Barker - NOAA Federal writes:

> Is there anything in the language spec that says string concatenation is O(n^2)? Or, for that matter, any of the performance characteristics of built-in types? Those strike me as implementation details that SHOULD be particular to the implementation.

Container concatenation isn't quadratic in Python at all. The naive implementation of sum() as a loop repeatedly calling __add__ is quadratic for them. Strings (and immutable containers in general) are particularly horrible, as they don't have __iadd__.

You could argue that sum() being a function of an iterable isn't just a calling convention for a loop encapsulated in a function, but rather a completely different kind of function that doesn't imply anything about the implementation, and therefore that it should dispatch on type(it). But explicitly dispatching on type(x) is yucky (what if somebody wants to sum a different type not currently recognized by the sum() builtin?) so, obviously, we should define a standard __sum__ dunder!

IMO we'd also want a homogeneous_iterable ABC, and a concrete homogeneous_iterable_of_TYPE for each sum()-able TYPE, to help users catch bugs injecting the wrong type into an iterable_of_TYPE. But this still sucks. Why? Because obviously we'd want the attractive nuisance of "if you have __add__, there's a default definition of __sum__" (AIUI, this is what bothers Alexander most about the current situation -- at least of the things he's mentioned, I can really sympathize with his dislike). And new Pythonistas and lazy programmers who only intend to use sum() on small enough iterables will use the default, and their programs will appear to hang on somewhat larger iterables, or a realtime requirement will go unsatisfied when least expected, or... If we *don't* have that property for sum(), ugh! Yuck! Same old same old!
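[For concreteness, here is a rough sketch of the __sum__ dispatch being floated above. Everything in it is hypothetical -- `msum`, the `__sum__` dunder, and the `S` subclass are inventions for illustration; no such protocol exists in Python.]

```python
import functools
import operator

def msum(iterable, start=0):
    """Hypothetical sum() that dispatches on a __sum__ dunder.

    Types that know an efficient algorithm supply __sum__; everything
    else falls back to the canonical repeated-__add__ loop.
    """
    items = list(iterable)
    if items and hasattr(type(items[0]), "__sum__"):
        return type(items[0]).__sum__(items, start)
    return functools.reduce(operator.add, items, start)

class S(str):
    # An O(n) implementation for strings, via the join machinery.
    def __sum__(items, start):
        return start + "".join(items)

assert msum([1, 2, 3]) == 6                    # fallback: repeated __add__
assert msum([S("a"), S("b"), S("c")], "") == "abc"  # dispatched to S.__sum__
```

The "attractive nuisance" objection in the text applies directly: if the fallback branch existed by default, naive callers would silently get the quadratic loop for container types.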
(IMHO; YMMV, of course.)

It's possible that Python could provide some kind of feature that would allow an optimized sum function for every type that has __add__, but I think this will take a lot of thinking. *Somebody* will do it (I don't think anybody is +1 on restricting sum() to a subset of types with __add__). I just think we should wait until that somebody appears.

> Should we cripple the performance of some operation in CPython so that it won't work better than Jython?

Nobody is crippling operations. We're prohibiting use of a *name* for an operation that is associated (strongly so, in my mind) with an inefficient algorithm, in favor of the *same operation* by a different name (which has no existing implementation, and therefore Python implementers are responsible for implementing it efficiently). Note: the inefficient algorithm isn't inefficient for integers, and it isn't inefficient for numbers in general (although it's inaccurate for some classes of numbers).

> Seems the same argument [that the Python language doesn't prohibit optimizations in particular implementations just because they aren't made in others] could be made for sum(list_of_strings).

It could. But then we have to consider special-casing every builtin type that provides __add__, and we impose an unobvious burden on user types that provide __add__.

> It seems pretty pedantic to say: we could make this work well, but we'd rather chide you for not knowing the proper way to do it.

Nobody disagrees. But backward compatibility gets in the way.

> But sum() is not inherently quadratic -- that's a limitation of the implementation.

But the faulty implementation is the canonical implementation: the only one that can be defined directly in terms of __add__, and it is efficient for non-container types.[1]

> .join _could_ be naively written with the same poor performance -- why should users need to understand why one was optimized and one was not?

Good question. They shouldn't -- thus the prohibition on sum()ing strings.
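[The prohibition discussed here is observable directly, and it is worth noting that it applies only to str -- summing lists is allowed and pays the quadratic price, with itertools.chain as the linear spelling. A small sketch:]

```python
import itertools

# sum() special-cases str and points the user at join.
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    print(exc)  # e.g. "sum() can't sum strings [use ''.join(seq) instead]"

# Lists slip through and use the canonical repeated-__add__ loop:
# each + builds a brand-new list, so this is O(n**2).
lists = [[i] for i in range(1000)]
flat_slow = sum(lists, [])

# The linear equivalent walks every element exactly once.
flat_fast = list(itertools.chain.from_iterable(lists))

assert flat_slow == flat_fast == list(range(1000))
```

This is the whack-a-mole problem in miniature: str is blocked by name, but the same inefficiency is reachable through other __add__-bearing types.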
> That is a very important lesson to learn, sure, but Python is not only a teaching language. People will need to learn those lessons at some point; this one feature makes little difference.

No, it makes a big difference. "If you can do something, then it's OK to do it" is something Python tries to implement. If sum() works for everything with an __add__, then given current Python language features some people are going to end up with very inefficient code, and it will bite some of them (and not necessarily the authors!) at some time. If it doesn't work for every type with __add__, why not? You'll end up playing whack-a-mole with type prohibitions. Ugh.

> Sure, but I think all that does is teach people about a CPython-specific implementation -- and I doubt naive users get any closer to understanding algorithmic complexity -- all they learn is you should use string.join(). Oh well, not really that big a deal.

Not to Python. Maybe not to you. But I've learned a lot about Pythonic ways of doing things trying to channel the folks who implemented this
Re: [Python-Dev] sum(...) limitation
On 08/11/2014 08:50 PM, Stephen J. Turnbull wrote:

> Chris Barker - NOAA Federal writes:
>> It seems pretty pedantic to say: we could make this work well, but we'd rather chide you for not knowing the proper way to do it.
>
> Nobody disagrees. But backward compatibility gets in the way.

Something that currently doesn't work starts to work. How is that a backward compatibility problem?

--
~Ethan~
[Python-Dev] Commit-ready patches in need of review
Hello,

The following commit-ready patches have been waiting for review since May and earlier. It'd be great if someone could find the time to take a look. I'll be happy to incorporate feedback as necessary:

* http://bugs.python.org/issue1738 (filecmp.dircmp does exact match only)
* http://bugs.python.org/issue15955 (gzip, bz2, lzma: add option to limit output size)
* http://bugs.python.org/issue20177 (Derby #8: Convert 28 sites to Argument Clinic across 2 files)

For issue 20177 I only wrote the patch for one file because I'd like to have feedback before tackling the second. However, the patches are independent, so unless there are other problems this one is ready for commit.

Best,
Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«