[Python-Dev] Suggestion for a new built-in - flatten
Hello all,

I have a suggestion for a new Python built-in function: 'flatten'.

This would (as if it needs explanation) take a single sequence, where each element can itself be a sequence (or iterable?) nested to an arbitrary depth. It would return a flattened list. A useful restriction could be that it wouldn't expand strings :-)

I've needed this several times, and recently twice at work. There are several implementations in the Python Cookbook. When I posted on my blog recently asking for one-liners to flatten a list of lists (only 1 level of nesting), I had 26 responses, several of them saying it was a problem they had encountered before. There are also numerous places on the web bewailing the lack of this as a built-in.

All of this points to the fact that it is something that would be appreciated as a built-in.

There is an implementation already in Tkinter:

    from _tkinter import _flatten as flatten

There are several different possible approaches in pure Python, but is this an idea that has legs?

All the best,

Michael Foord
http://www.voidspace.org.uk/python/index.shtml

--
No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.405 / Virus Database: 268.12.7/454 - Release Date: 21/09/2006
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
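The behaviour described in this message (arbitrary nesting, a list as the result, strings left unexpanded) could be sketched roughly as follows. This is only one possible reading of the proposal, written for current Python 3 rather than the 2006 interpreter; treating bytes as a leaf alongside str is an added assumption, and this is not the Tkinter or Cookbook implementation.

```python
from collections.abc import Iterable


def flatten(nested):
    """Return a flat list of the leaves of an arbitrarily nested iterable."""
    flat = []
    for item in nested:
        # Strings and bytes are iterable, but are treated as leaves here
        # (the "wouldn't expand strings" restriction; bytes is an assumption).
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten([1, [2, [3, "four"]], (5, 6)]))  # [1, 2, 3, 'four', 5, 6]
```

Even this small version already fixes several policies (dicts contribute their keys, sets come out in iteration order), which is the kind of choice the rest of the thread argues about.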
Re: [Python-Dev] Suggestion for a new built-in - flatten
Michael Foord [EMAIL PROTECTED] wrote:
> Hello all,
>
> I have a suggestion for a new Python built in function: 'flatten'.

This has been brought up many times. I'm -1 on its inclusion, if only because it's a fairly simple 9-line function (at least the trivial version I came up with), and not all X-line functions should be in the standard library. Also, while I have had need for such a function in the past, I have found that I haven't needed it in a few years.

- Josiah
Re: [Python-Dev] Suggestion for a new built-in - flatten
On 9/22/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
>     Michael> There are several different possible approaches in pure Python,
>     Michael> but is this an idea that has legs ?
>
> Why not add it to itertools? Then, if you need a true list, just call
> list() on the returned iterator.

Yeah, this is a better solution. flatten() just doesn't scream "built-in!" to me.

-Brett
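For the one-level case, the "return an iterator and call list() on it" pattern can be illustrated with `itertools.chain.from_iterable`, which was added to itertools in Python 2.6, after this thread. The wrapper name `flatten_one_level` is made up for the example, and it is not the arbitrary-depth function being proposed.

```python
from itertools import chain


def flatten_one_level(iterables):
    """Lazily yield the items of each inner iterable, one level deep."""
    return chain.from_iterable(iterables)


lazy = flatten_one_level([[1, 2], [3], [], [4, 5]])
print(list(lazy))  # [1, 2, 3, 4, 5]

# Deeper nesting is left untouched: only one level is removed.
print(list(flatten_one_level([[1, [2, 3]], [4]])))  # [1, [2, 3], 4]
```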
Re: [Python-Dev] Suggestion for a new built-in - flatten
On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
> Michael Foord [EMAIL PROTECTED] wrote:
> > Hello all,
> >
> > I have a suggestion for a new Python built in function: 'flatten'.
>
> This has been brought up many times. I'm -1 on its inclusion, if only because it's a fairly simple 9-line function (at least the trivial version I came up with), and not all X-line functions should be in the standard library. Also, while I have had need for such a function in the past, I have found that I haven't needed it in a few years.

I think instead of adding a flatten function perhaps we should think about adding something like Erlang's iolist support. The idea is that methods like writelines should be able to take nested iterators and consume any object they find that implements the buffer protocol.

-bob
Re: [Python-Dev] Suggestion for a new built-in - flatten
On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
> On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
> > Michael Foord [EMAIL PROTECTED] wrote:
> > > Hello all,
> > >
> > > I have a suggestion for a new Python built in function: 'flatten'.
> >
> > This has been brought up many times. I'm -1 on its inclusion, if only because it's a fairly simple 9-line function (at least the trivial version I came up with), and not all X-line functions should be in the standard library. Also, while I have had need for such a function in the past, I have found that I haven't needed it in a few years.
>
> I think instead of adding a flatten function perhaps we should think about adding something like Erlang's iolist support. The idea is that methods like writelines should be able to take nested iterators and consume any object they find that implements the buffer protocol.

Which is no different than just passing in a generator/iterator that does flattening.

Don't much see the point in gumming up the file protocol with this special casing; still will have requests for a flattener elsewhere.

If flattening was added, should definitely be a general obj, not a special casing in one method in my opinion.

~harring
Re: [Python-Dev] Suggestion for a new built-in - flatten
On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord [EMAIL PROTECTED] wrote:
> I have a suggestion for a new Python built in function: 'flatten'.

This seems superficially like a good idea, but I think adding it to Python anywhere would do a lot more harm than good. I can see that consensus is already strongly against a builtin, but I think it would be bad to add to itertools too.

Flattening always *seems* to be a trivial and obvious operation. "I just need something that takes a group of deeply structured data and turns it into a group of shallowly structured data." Everyone that has this requirement assumes that their list of implicit requirements for flattening is the obviously correct one.

This wouldn't be a problem except that everyone has a different idea of those requirements :).

Here are a few issues.

What do you do when you encounter a dict? You can treat it as its keys(), its values(), or its items().

What do you do when you encounter an iterable object?

What order do you flatten set()s in? (and, ha ha, do you Set the same?)

How are user-defined flattening behaviors registered? Is it a new special method, a registration API?

How do you pass information about the flattening in progress to the user-defined behaviors?

If you do something special to iterables, do you special-case strings? Why or why not?

What do you do if you encounter a function? This is kind of a trick question, since Nevow's flattener *calls* functions as it encounters them, then treats the *result* of calling them as further input.

If you don't think that functions are special, what about *generator* functions? How do you tell the difference? What about functions that return generators but aren't themselves generators? What about functions that return non-generator iterators? What about pre-generated generator objects (if you don't want to treat iterables as special, are generators special?).

Do you produce the output as a structured list or an iterator that works incrementally?

Also, at least Nevow uses "flatten" to mean "serialize to bytes", not "produce a flat list", and I imagine at least a few other web frameworks do as well. That starts to get into encoding issues.

If you make a decision one way or another on any of these questions of policy, you are going to make flatten() useless to a significant portion of its potential userbase. The only difference between having it in the standard library and not is that if it's there, they'll spend an hour being confused by the weird way that it's dealing with <insert your favorite data type here> rather than just doing the obvious thing, and then they'll take a minute to write the 10-line function that they need. Without the standard library, they'll skip to step 2 and save a lot of time.

I would love to see a unified API that figured out all of these problems, and put them together into a (non-stdlib) library that anyone interested could use for a few years to work the kinks out. Although it might be nice to have a simple flatten interface, I don't think that it would ever be simple enough to stick into a builtin; it would just be the default instance of the IncrementalDestructuringProcess class with the most popular (as determined by polling users of the library after a year or so) IncrementalDestructuringTypePolicy.
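To make the dict question above concrete, here is a small sketch in which the dict policy is passed in as a parameter. The function name `flatten_with_dict_policy` and the choice to recurse only into lists, tuples and dicts are assumptions made for this illustration; it is not how Nevow or any other library mentioned in the thread works.

```python
def flatten_with_dict_policy(nested, dict_view):
    """Yield leaves of nested lists/tuples; dicts are expanded via dict_view."""
    for item in nested:
        if isinstance(item, dict):
            # The caller decides what a dict "contains": keys, values or items.
            yield from flatten_with_dict_policy(dict_view(item), dict_view)
        elif isinstance(item, (list, tuple)):
            yield from flatten_with_dict_policy(item, dict_view)
        else:
            yield item


data = [1, {"a": 2, "b": [3, 4]}]

print(list(flatten_with_dict_policy(data, dict.keys)))    # [1, 'a', 'b']
print(list(flatten_with_dict_policy(data, dict.values)))  # [1, 2, 3, 4]
print(list(flatten_with_dict_policy(data, dict.items)))   # [1, 'a', 2, 'b', 3, 4]
```

The same input gives three different "flat" results, each of which could be the obviously correct one for some caller.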
Re: [Python-Dev] Suggestion for a new built-in - flatten
On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
> On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
> > On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
> > > Michael Foord [EMAIL PROTECTED] wrote:
> > > > Hello all,
> > > >
> > > > I have a suggestion for a new Python built in function: 'flatten'.
> > >
> > > This has been brought up many times. I'm -1 on its inclusion, if only because it's a fairly simple 9-line function (at least the trivial version I came up with), and not all X-line functions should be in the standard library. Also, while I have had need for such a function in the past, I have found that I haven't needed it in a few years.
> >
> > I think instead of adding a flatten function perhaps we should think about adding something like Erlang's iolist support. The idea is that methods like writelines should be able to take nested iterators and consume any object they find that implements the buffer protocol.
>
> Which is no different than just passing in a generator/iterator that does flattening.
>
> Don't much see the point in gumming up the file protocol with this special casing; still will have requests for a flattener elsewhere.
>
> If flattening was added, should definitely be a general obj, not a special casing in one method in my opinion.

I disagree, the reason for iolist is performance and convenience; the required indirection of having to explicitly call a flattener function removes some optimization potential and makes it less convenient to use.

While there certainly should be a general mechanism available to perform the task (easily accessible from C), the user would be better served by not having to explicitly call itertools.iterbuffers every time they want to write recursive iterables of stuff.

-bob
Re: [Python-Dev] Suggestion for a new built-in - flatten
[EMAIL PROTECTED] wrote:
> On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord [EMAIL PROTECTED] wrote:
> > I have a suggestion for a new Python built in function: 'flatten'.
>
> This seems superficially like a good idea, but I think adding it to Python anywhere would do a lot more harm than good. I can see that consensus is already strongly against a builtin, but I think it would be bad to add to itertools too.
>
> Flattening always *seems* to be a trivial and obvious operation. "I just need something that takes a group of deeply structured data and turns it into a group of shallowly structured data." Everyone that has this requirement assumes that their list of implicit requirements for flattening is the obviously correct one.
>
> This wouldn't be a problem except that everyone has a different idea of those requirements :).
>
> Here are a few issues.
>
> What do you do when you encounter a dict? You can treat it as its keys(), its values(), or its items().
>
> What do you do when you encounter an iterable object?
>
> What order do you flatten set()s in? (and, ha ha, do you Set the same?)
>
> How are user-defined flattening behaviors registered? Is it a new special method, a registration API?
>
> How do you pass information about the flattening in progress to the user-defined behaviors?
>
> If you do something special to iterables, do you special-case strings? Why or why not?

If you consume iterables, and only special case strings - then none of the issues you raise above seem to be a problem.

Sets and dictionaries are both iterable.

If it's not iterable it's an element.

I'd prefer to see this as a built-in, lots of people seem to want it. IMHO having it in itertools is a good compromise.

> What do you do if you encounter a function? This is kind of a trick question, since Nevow's flattener *calls* functions as it encounters them, then treats the *result* of calling them as further input.

Sounds like not what anyone would normally expect.

> If you don't think that functions are special, what about *generator* functions? How do you tell the difference? What about functions that return generators but aren't themselves generators? What about functions that return non-generator iterators? What about pre-generated generator objects (if you don't want to treat iterables as special, are generators special?).

What does the list constructor do with these? Do the same.

> Do you produce the output as a structured list or an iterator that works incrementally?

Either would be fine. I had in mind a list, but converting an iterator into a list is trivial.

> Also, at least Nevow uses "flatten" to mean "serialize to bytes", not "produce a flat list", and I imagine at least a few other web frameworks do as well. That starts to get into encoding issues.

Not a use of the term I've come across. On the other hand I've heard of flatten in the context of nested data-structures many times.

> If you make a decision one way or another on any of these questions of policy, you are going to make flatten() useless to a significant portion of its potential userbase. The only difference between having it in the standard library and not is that if it's there, they'll spend an hour being confused by the weird way that it's dealing with <insert your favorite data type here> rather than just doing the obvious thing, and they'll take a minute to write the 10-line function that they need. Without the standard library, they'll skip to step 2 and save a lot of time.

I think that you're overcomplicating it and that the term flatten is really fairly straightforward. Especially if it's clearly documented in terms of consuming iterables.

All the best,

Michael Foord
http://www.voidspace.org.uk

> I would love to see a unified API that figured out all of these problems, and put them together into a (non-stdlib) library that anyone interested could use for a few years to work the kinks out. Although it might be nice to have a simple flatten interface, I don't think that it would ever be simple enough to stick into a builtin; it would just be the default instance of the IncrementalDestructuringProcess class with the most popular (as determined by polling users of the library after a year or so) IncrementalDestructuringTypePolicy.
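One reason a rule of "consume iterables, special-case only strings" needs the special case at all: in Python, iterating a string yields strings, and a one-character string yields itself, so a flattener that simply descends into anything iterable never bottoms out. The helper below, `nesting_depth`, is a name invented for this sketch; it follows first elements downward with a cap so the demonstration terminates.

```python
def nesting_depth(obj, limit=50):
    """Follow the first element of obj downward and return the depth reached.

    The walk stops at `limit` so that values which contain themselves
    (such as one-character strings) cannot loop forever.
    """
    depth = 0
    while depth < limit:
        try:
            obj = next(iter(obj))
        except (TypeError, StopIteration):
            # Not iterable, or empty: nothing further to descend into.
            break
        depth += 1
    return depth


print(next(iter("a")) == "a")  # True: a 1-char string yields itself
print(nesting_depth([[1]]))    # 2
print(nesting_depth("hi"))     # 50 (the cap, not a real depth)
```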
Re: [Python-Dev] Suggestion for a new built-in - flatten
Bob Ippolito [EMAIL PROTECTED] wrote:
> On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
> > On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
> > > I think instead of adding a flatten function perhaps we should think about adding something like Erlang's iolist support. The idea is that methods like writelines should be able to take nested iterators and consume any object they find that implements the buffer protocol.
> >
> > Which is no different than just passing in a generator/iterator that does flattening.
> >
> > Don't much see the point in gumming up the file protocol with this special casing; still will have requests for a flattener elsewhere.
> >
> > If flattening was added, should definitely be a general obj, not a special casing in one method in my opinion.
>
> I disagree, the reason for iolist is performance and convenience; the required indirection of having to explicitly call a flattener function removes some optimization potential and makes it less convenient to use.

Sorry Bob, but I disagree. In the few times where I've needed to 'write a list of buffers to a file handle', I have found iterating over the buffers to be sufficient. And honestly, in all of my time dealing with socket and file IO, I've never needed to write a list of iterators of buffers. Not to say that YAGNI, but I'd like to see an example where 1) it was being used in the wild, and 2) where it would be a measurable speedup.

- Josiah
Re: [Python-Dev] Suggestion for a new built-in - flatten
[Michael Foord]
> I have a suggestion for a new Python built in function: 'flatten'.
> ...
> There are several different possible approaches in pure Python, but is this an idea that has legs ?

No legs.

It has been discussed ad nauseam on comp.lang.python. People seem to enjoy writing their own versions of flatten more than finding legitimate use cases that don't already have trivial solutions.

A general purpose flattener needs some way to be told what is atomic and what can be further subdivided. Also, it is not obvious how the algorithm should be extended to cover inputs with tree-like data structures with data at nodes as well as the leaves (preorder, postorder, inorder traversal, etc.)

I say use your favorite cookbook approach and leave it out of the language.

Raymond
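The "some way to be told what is atomic" requirement can be sketched as a caller-supplied predicate. The names `flatten_until`, `scalars_and_strings` and `tuples_are_records` are invented for this example; it shows how the same nested data flattens differently under two equally reasonable definitions of "atomic", and it does not address the tree-traversal question raised above.

```python
def flatten_until(nested, is_atomic):
    """Yield items depth-first, descending only where is_atomic(item) is false."""
    for item in nested:
        if is_atomic(item):
            yield item
        else:
            yield from flatten_until(item, is_atomic)


def scalars_and_strings(item):
    # Policy 1: strings/bytes and anything non-iterable are leaves.
    return isinstance(item, (str, bytes)) or not hasattr(item, "__iter__")


def tuples_are_records(item):
    # Policy 2: as above, but a tuple is a record that must stay whole.
    return isinstance(item, tuple) or scalars_and_strings(item)


points = [[(1, 2), (3, 4)], [(5, 6)]]
print(list(flatten_until(points, scalars_and_strings)))  # [1, 2, 3, 4, 5, 6]
print(list(flatten_until(points, tuples_are_records)))   # [(1, 2), (3, 4), (5, 6)]
```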
Re: [Python-Dev] Suggestion for a new built-in - flatten
On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
> Bob Ippolito [EMAIL PROTECTED] wrote:
> > On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
> > > On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
> > > > I think instead of adding a flatten function perhaps we should think about adding something like Erlang's iolist support. The idea is that methods like writelines should be able to take nested iterators and consume any object they find that implements the buffer protocol.
> > >
> > > Which is no different than just passing in a generator/iterator that does flattening.
> > >
> > > Don't much see the point in gumming up the file protocol with this special casing; still will have requests for a flattener elsewhere.
> > >
> > > If flattening was added, should definitely be a general obj, not a special casing in one method in my opinion.
> >
> > I disagree, the reason for iolist is performance and convenience; the required indirection of having to explicitly call a flattener function removes some optimization potential and makes it less convenient to use.
>
> Sorry Bob, but I disagree. In the few times where I've needed to 'write a list of buffers to a file handle', I find that iterating over the buffers to be sufficient. And honestly, in all of my time dealing with socket and file IO, I've never needed to write a list of iterators of buffers. Not to say that YAGNI, but I'd like to see an example where 1) it was being used in the wild, and 2) where it would be a measurable speedup.

The primary use for this is structured data, mostly file formats, where you can't write the beginning until you have a bunch of information about the entire structure such as the number of items or the count of bytes when serialized. An efficient way to do that is just to build a bunch of nested lists that you can use to calculate the size (iolist_size(...) in Erlang) instead of having to write a visitor that constructs a new flat list or writes to StringIO first. I suppose in the most common case, for performance reasons, you would want to restrict this to sequences only (as in PySequence_Fast) because iolist_size(...) should be non-destructive (or else it has to flatten into a new list anyway).

I've definitely done this before in Python, most recently here:
http://svn.red-bean.com/bob/flashticle/trunk/flashticle/

The flatten function in this case is flashticle.util.iter_only, and it's used in flashticle.actions, flashticle.amf, flashticle.flv, flashticle.swf, and flashticle.remoting.

-bob
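A rough Python rendering of the length-prefix use case described above might look like this. Python has no built-in iolist support; `iolist_size` here merely borrows the Erlang name, `write_iolist` is hypothetical, and the restriction to lists and tuples of bytes/bytearray is an assumption in the spirit of the "sequences only" remark. It is not the flashticle code.

```python
import io


def iolist_size(iolist):
    """Total number of bytes in a nested list/tuple of bytes objects.

    Only re-iterable sequences are accepted, so measuring does not
    consume the structure.
    """
    if isinstance(iolist, (bytes, bytearray)):
        return len(iolist)
    if isinstance(iolist, (list, tuple)):
        return sum(iolist_size(part) for part in iolist)
    raise TypeError(f"not an iolist element: {type(iolist).__name__}")


def write_iolist(stream, iolist):
    """Write every chunk of the nested structure to stream, in order."""
    if isinstance(iolist, (bytes, bytearray)):
        stream.write(iolist)
    else:
        for part in iolist:
            write_iolist(stream, part)


# Build the body first, then prepend a 4-byte big-endian length header
# without ever joining the body into one flat buffer.
body = [b"abc", [b"de", [b"f"]]]
header = iolist_size(body).to_bytes(4, "big")

out = io.BytesIO()
write_iolist(out, [header, body])
print(out.getvalue())  # b'\x00\x00\x00\x06abcdef'
```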
Re: [Python-Dev] Suggestion for a new built-in - flatten
On Fri, 22 Sep 2006 20:55:18 +0100, Michael Foord [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
> > On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord [EMAIL PROTECTED] wrote:
> >
> > This wouldn't be a problem except that everyone has a different idea of those requirements :).

You didn't really address this, and it was my main point. In fact, you more or less made my point for me. You just assume that the type of application you have in mind right now is the only one that wants to use a flatten function, and dismiss out of hand any uses that I might have in mind.

> If you consume iterables, and only special case strings - then none of the issues you raise above seem to be a problem.

You have just made two major policy decisions about the flattener without presenting a specific use case or set of use cases it is meant to be restricted to.

For example, you suggest special casing strings. Why? Your guideline otherwise is to follow what the iter() or list() functions do. What about user-defined classes which subclass str and implement __iter__?

> Sets and dictionaries are both iterable.
>
> If it's not iterable it's an element.
>
> I'd prefer to see this as a built-in, lots of people seem to want it. IMHO

Can you give specific examples? The only significant use of a flattener I'm intimately familiar with (Nevow) works absolutely nothing like what you described.

> Having it in itertools is a good compromise.

No need to compromise with me. I am not in a position to reject your change. No particular reason for me to make any concessions either: I'm simply trying to communicate the fact that I think this is a terrible idea, not come to an agreement with you about how progress might be made. Absolutely no changes on this front are A-OK by me :).

You have made a case for the fact that, perhaps, you should have a utility library which all your projects could use for consistency and to avoid repeating yourself, since you have a clearly defined need for what a flattener should do. I haven't read anything that indicates there's a good reason for this function to be in the standard library. What are the use cases?

It's definitely better for the core language to define lots of basic types so that you can say something in a library like "returns a dict mapping strings to ints" without having a huge argument about what "dict" and "string" and "int" mean. What's the benefit to having everyone flatten things the same way, though? Flattening really isn't that common of an operation, and in the cases where it's needed, a unified approach would only help if you had two flattenable data-structures from different libraries which needed to be combined. I can't say I've ever seen a case where that would happen, let alone for it to be common enough that there should be something in the core language to support it.

> > What do you do if you encounter a function? This is kind of a trick question, since Nevow's flattener *calls* functions as it encounters them, then treats the *result* of calling them as further input.
>
> Sounds like not what anyone would normally expect.

Of course not. My point is that there is nothing that anyone would normally expect from a flattener except a few basic common features. Bob's use-case is completely different from yours, for example: he's talking about flattening to support high-performance I/O.

> What does the list constructor do with these ? Do the same.

    >>> list('hello')
    ['h', 'e', 'l', 'l', 'o']

What more can I say?

> > Do you produce the output as a structured list or an iterator that works incrementally?
>
> Either would be fine. I had in mind a list, but converting an iterator into a list is trivial.

There are applications where this makes a big difference. Bob, for example, suggested that this should only work on structures that support the PySequence_Fast operations.

> > Also, at least Nevow uses "flatten" to mean "serialize to bytes", not "produce a flat list", and I imagine at least a few other web frameworks do as well. That starts to get into encoding issues.
>
> Not a use of the term I've come across. On the other hand I've heard of flatten in the context of nested data-structures many times.

Nevertheless the only respondent even mildly in favor of your proposal so far also mentions flattening sequences of bytes, although not quite as directly.

> I think that you're over complicating it and that the term flatten is really fairly straightforward. Especially if it's clearly documented in terms of consuming iterables.

And I think that you're over-simplifying. If you can demonstrate that there is really a broad consensus that this sort of thing is useful in a wide variety of applications, then sure, I wouldn't complain too much. But I've spent a LOT of time thinking about what flattening is, and several applications that I've worked on have very different ideas about how it should work, and I see very little benefit to unifying them. That's just the