[Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Michael Foord
Hello all,

I have a suggestion for a new Python built in function: 'flatten'.

This would (as if it needs explanation) take a single sequence, where 
each element can be a sequence (or iterable ?) nested to an arbitrary 
depth. It would return a flattened list. A useful restriction could be 
that it wouldn't expand strings :-)
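A minimal sketch of the function described above, assuming Python 3 and one particular policy: str and bytes are atomic, and anything iter() rejects is a leaf. The name flatten and these choices are illustrative only, not a proposed API.

```python
def flatten(seq):
    """Recursively flatten nested iterables into a single list.

    Assumed policy: str and bytes are atomic; anything that iter()
    rejects is treated as a leaf element.
    """
    result = []
    for item in seq:
        if isinstance(item, (str, bytes)):
            result.append(item)  # don't expand strings
            continue
        try:
            iter(item)
        except TypeError:
            result.append(item)  # not iterable: it's an element
        else:
            result.extend(flatten(item))  # recurse into the nested iterable
    return result


print(flatten([1, [2, [3, (4, 5)]], "abc", [[["x"]]]]))
# [1, 2, 3, 4, 5, 'abc', 'x']
```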

I've needed this several times, and recently twice at work. There are 
several implementations in the Python cookbook. When I posted on my blog 
recently asking for one liners to flatten a list of lists (only 1 level 
of nesting), I had 26 responses, several of them saying it was a problem 
they had encountered before.
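For the single-level case, two common idioms are shown below for illustration (they are not reproduced from the blog responses). Note that itertools.chain.from_iterable only arrived in Python 2.6, after this thread.

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]

# Nested list comprehension: the outer loop is written first.
flat_a = [x for sub in nested for x in sub]

# chain.from_iterable lazily concatenates the sublists.
flat_b = list(chain.from_iterable(nested))

print(flat_a)            # [1, 2, 3, 4, 5]
print(flat_a == flat_b)  # True
```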

There are also numerous  places on the web bewailing the lack of this as 
a built-in. All of this points to the fact that it is something that 
would be appreciated as a built in.

There is an implementation already in Tkinter:

from _tkinter import _flatten as flatten

There are several different possible approaches in pure Python, but is 
this an idea that has legs ?

All the best,


Michael Foord
http://www.voidspace.org.uk/python/index.shtml


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.405 / Virus Database: 268.12.7/454 - Release Date: 21/09/2006

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Josiah Carlson

Michael Foord [EMAIL PROTECTED] wrote:
 
 Hello all,
 
 I have a suggestion for a new Python built in function: 'flatten'.

This has been brought up many times.  I'm -1 on its inclusion, if only
because it's a fairly simple 9-line function (at least the trivial
version I came up with), and not all X-line functions should be in the
standard library.  Also, while I have had need for such a function in
the past, I have found that I haven't needed it in a few years.


 - Josiah



Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Brett Cannon
On 9/22/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Michael There are several different possible approaches in pure Python,
 Michael but is this an idea that has legs ?

 Why not add it to itertools?  Then, if you need a true list, just call
 list() on the returned iterator.

Yeah, this is a better solution. flatten() just doesn't "scream built-in!" to me.

-Brett
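An itertools-style variant might return a lazy iterator instead of a list. This is a sketch assuming Python 3; there is no flatten in itertools, and the name iflatten is hypothetical. It keeps an explicit stack of iterators rather than recursing, so deep nesting does not hit the recursion limit.

```python
def iflatten(iterable):
    """Lazily yield leaves from arbitrarily nested iterables.

    Uses an explicit stack of iterators instead of recursion.
    Assumed policy: str and bytes are atomic.
    """
    stack = [iter(iterable)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()  # this level is exhausted; resume the parent level
            continue
        if isinstance(item, (str, bytes)):
            yield item
            continue
        try:
            child = iter(item)
        except TypeError:
            yield item  # not iterable: a leaf
        else:
            stack.append(child)  # descend into the nested iterable


print(list(iflatten([1, [2, [3]], "ab"])))  # [1, 2, 3, 'ab']

# 10,000 levels deep: fine without recursion.
deep = [0]
for _ in range(10_000):
    deep = [deep]
print(list(iflatten(deep)))  # [0]
```

As Brett suggests, calling list() on the result gives a true list when one is needed.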


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Bob Ippolito
On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:

 Michael Foord [EMAIL PROTECTED] wrote:
 
  Hello all,
 
  I have a suggestion for a new Python built in function: 'flatten'.

 This has been brought up many times.  I'm -1 on its inclusion, if only
 because it's a fairly simple 9-line function (at least the trivial
 version I came up with), and not all X-line functions should be in the
 standard library.  Also, while I have had need for such a function in
 the past, I have found that I haven't needed it in a few years.

I think instead of adding a flatten function perhaps we should think
about adding something like Erlang's iolist support. The idea is
that methods like writelines should be able to take nested iterators
and consume any object they find that implements the buffer protocol.
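Python file objects have no such support. The following is a hypothetical sketch of what an iolist-aware write could do, assuming Python 3, where memoryview() is a convenient test for the buffer protocol. The name write_iolist is made up for illustration.

```python
import io


def write_iolist(f, iolist):
    """Hypothetical iolist-style writer: walk nested iterables and write
    every object that supports the buffer protocol straight to f."""
    for item in iolist:
        if isinstance(item, str):
            # Text has no single byte representation; require encoding first.
            raise TypeError("encode str to bytes before writing")
        try:
            view = memoryview(item)  # works for bytes, bytearray, array.array, ...
        except TypeError:
            write_iolist(f, item)  # not a buffer: treat it as a nested iterable
        else:
            f.write(view)


buf = io.BytesIO()
write_iolist(buf, [b"HEAD", [b"-", bytearray(b"body"), (b"-",)], b"TAIL"])
print(buf.getvalue())  # b'HEAD-body-TAIL'
```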

-bob


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Brian Harring
On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
 On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
 
  Michael Foord [EMAIL PROTECTED] wrote:
  
   Hello all,
  
   I have a suggestion for a new Python built in function: 'flatten'.
 
  This has been brought up many times.  I'm -1 on its inclusion, if only
  because it's a fairly simple 9-line function (at least the trivial
  version I came up with), and not all X-line functions should be in the
  standard library.  Also, while I have had need for such a function in
  the past, I have found that I haven't needed it in a few years.
 
 I think instead of adding a flatten function perhaps we should think
 about adding something like Erlang's iolist support. The idea is
 that methods like writelines should be able to take nested iterators
 and consume any object they find that implements the buffer protocol.

Which is no different than just passing in a generator/iterator that 
does flattening.

Don't much see the point in gumming up the file protocol with this 
special casing; still will have requests for a flattener elsewhere.

If flattening was added, should definitely be a general obj, not a 
special casing in one method in my opinion.
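Brian's alternative can be sketched with the existing file API, assuming Python 3: writelines() already accepts any iterable of bytes-like objects, so a flattening generator needs no special support from the file object. The helper name iter_buffers is hypothetical.

```python
import io


def iter_buffers(nested):
    """Hypothetical helper: yield bytes-like leaves from nested iterables."""
    for item in nested:
        if isinstance(item, (bytes, bytearray, memoryview)):
            yield item
        elif isinstance(item, str):
            raise TypeError("encode str to bytes first")
        else:
            yield from iter_buffers(item)  # assume a nested iterable


out = io.BytesIO()
# The generator does the flattening; writelines() just consumes it.
out.writelines(iter_buffers([b"a", [b"b", [bytearray(b"c")]], (b"d",)]))
print(out.getvalue())  # b'abcd'
```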
~harring




Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread glyph
On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord [EMAIL PROTECTED] wrote:

I have a suggestion for a new Python built in function: 'flatten'.

This seems superficially like a good idea, but I think adding it to Python 
anywhere would do a lot more harm than good.  I can see that consensus is 
already strongly against a builtin, but I think it would be bad to add to 
itertools too.

Flattening always *seems* to be a trivial and obvious operation.  "I just need
something that takes a group of deeply structured data and turns it into a
group of shallowly structured data."  Everyone that has this requirement 
assumes that their list of implicit requirements for flattening is the 
obviously correct one.

This wouldn't be a problem except that everyone has a different idea of those 
requirements:).

Here are a few issues.

What do you do when you encounter a dict?  You can treat it as its keys(), its 
values(), or its items().
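To make the dict question concrete: each line below is a plausible "flattening" of the same dict, and they disagree. This assumes Python 3.7+, where dicts preserve insertion order.

```python
d = {"a": 1, "b": 2}

# Three equally defensible ways a flattener could expand a dict:
as_keys = list(d)                                   # ['a', 'b']
as_values = list(d.values())                        # [1, 2]
as_items = [x for pair in d.items() for x in pair]  # ['a', 1, 'b', 2]
```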

What do you do when you encounter an iterable object?

What order do you flatten set()s in?  (and, ha ha, do you Set the same?)

How are user-defined flattening behaviors registered?  Is it a new special 
method, a registration API?
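One possible answer to the registration question, sketched with functools.singledispatch (Python 3.4+) as the registry. The names expand and flatten, and the choice to expand dicts as items, are hypothetical policy decisions made for this illustration only.

```python
from functools import singledispatch


@singledispatch
def expand(obj):
    """Default policy: anything unregistered is a leaf."""
    return None  # None means "don't expand this object"


@expand.register(list)
@expand.register(tuple)
def _(obj):
    return obj


@expand.register(dict)
def _(obj):
    return obj.items()  # this registry's (arbitrary) choice for dicts


def flatten(obj):
    """Yield leaves, asking the registry how to expand each object."""
    children = expand(obj)
    if children is None:
        yield obj
        return
    for child in children:
        yield from flatten(child)


print(list(flatten([1, (2, [3]), {"k": "v"}])))  # [1, 2, 3, 'k', 'v']
```

Strings come out whole here only because str is not registered; a user type opts in by registering its own expand implementation, which also leaves questions like set ordering to whoever registers set.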

How do you pass information about the flattening in progress to the 
user-defined behaviors?

If you do something special to iterables, do you special-case strings?  Why or 
why not?

What do you do if you encounter a function?  This is kind of a trick question, 
since Nevow's flattener *calls* functions as it encounters them, then treats 
the *result* of calling them as further input.
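This is not Nevow's implementation or API. It is a hypothetical sketch of the behavior described: callables are called and their result is flattened further, lists and tuples are expanded, and everything else is a leaf.

```python
def call_flatten(obj):
    """Hypothetical Nevow-style flattener: call callables, flatten results."""
    if callable(obj):
        yield from call_flatten(obj())  # treat the result as further input
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from call_flatten(item)
    else:
        yield obj


def greeting():
    return ["Hello, ", lambda: "world"]


print("".join(call_flatten(["<p>", greeting, "</p>"])))  # <p>Hello, world</p>
```

Note that classes are callable too, so under this policy a class found in the input would be instantiated, which is one more decision a general flattener would have to make.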

If you don't think that functions are special, what about *generator* 
functions?  How do you tell the difference?  What about functions that return 
generators but aren't themselves generators?  What about functions that return 
non-generator iterators?  What about pre-generated generator objects (if you 
don't want to treat iterables as special, are generators special?).
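The standard inspect module answers some of these questions, and the answers show why they are policy decisions rather than facts. A function that returns a generator is not itself a generator function, and only calling it reveals what it returns.

```python
import inspect


def gen_func():
    yield 1


def returns_gen():
    return gen_func()  # a plain function that happens to return a generator


def returns_iter():
    return iter([1])  # returns a non-generator iterator


print(inspect.isgeneratorfunction(gen_func))     # True
print(inspect.isgeneratorfunction(returns_gen))  # False
print(inspect.isgenerator(returns_gen()))        # True
print(inspect.isgenerator(returns_iter()))       # False
```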

Do you produce the output as a structured list or an iterator that works 
incrementally?
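One case where the choice matters: an infinite or very large input. A list-returning flattener never finishes, while a lazy one can be sliced. A sketch assuming Python 3; lazy_flatten is a hypothetical name and treats str and bytes as atomic.

```python
from itertools import count, islice


def lazy_flatten(iterable):
    """Lazy recursive flatten; str and bytes are atomic (assumed policy)."""
    for item in iterable:
        if isinstance(item, (str, bytes)):
            yield item
            continue
        try:
            it = iter(item)
        except TypeError:
            yield item
        else:
            yield from lazy_flatten(it)


# An infinite stream of small lists: [0, 0], [1, 1], [2, 2], ...
stream = ([n, n] for n in count())

print(list(islice(lazy_flatten(stream), 5)))  # [0, 0, 1, 1, 2]
```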

Also, at least Nevow uses "flatten" to mean "serialize to bytes", not "produce
a flat list", and I imagine at least a few other web frameworks do as well.  
That starts to get into encoding issues.

If you make a decision one way or another on any of these questions of policy, 
you are going to make flatten() useless to a significant portion of its 
potential userbase.  The only difference between having it in the standard 
library and not is that if it's there, they'll spend an hour being confused by 
the weird way that it's dealing with <insert your favorite data type here> 
rather than just doing the obvious thing, and they'll take a minute to write 
the 10-line function that they need.  Without the standard library, they'll 
skip to step 2 and save a lot of time.

I would love to see a unified API that figured out all of these problems, and 
put them together into a (non-stdlib) library that anyone interested could use 
for a few years to work the kinks out.  Although it might be nice to have a 
simple flatten interface, I don't think that it would ever be simple enough 
to stick into a builtin; it would just be the default instance of the 
IncrementalDestructuringProcess class with the most popular (as determined by 
polling users of the library after a year or so) 
IncrementalDestructuringTypePolicy.


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Bob Ippolito
On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
 On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
  On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:
  
   Michael Foord [EMAIL PROTECTED] wrote:
   
Hello all,
   
I have a suggestion for a new Python built in function: 'flatten'.
  
   This has been brought up many times.  I'm -1 on its inclusion, if only
   because it's a fairly simple 9-line function (at least the trivial
   version I came up with), and not all X-line functions should be in the
   standard library.  Also, while I have had need for such a function in
   the past, I have found that I haven't needed it in a few years.
 
  I think instead of adding a flatten function perhaps we should think
  about adding something like Erlang's iolist support. The idea is
  that methods like writelines should be able to take nested iterators
  and consume any object they find that implements the buffer protocol.

 Which is no different than just passing in a generator/iterator that
 does flattening.

 Don't much see the point in gumming up the file protocol with this
 special casing; still will have requests for a flattener elsewhere.

 If flattening was added, should definitely be a general obj, not a
 special casing in one method in my opinion.

I disagree, the reason for iolist is performance and convenience; the
required indirection of having to explicitly call a flattener function
removes some optimization potential and makes it less convenient to
use.

While there certainly should be a general mechanism available to
perform the task (easily accessible from C), the user would be better
served by not having to explicitly call itertools.iterbuffers every
time they want to write recursive iterables of stuff.

-bob


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Michael Foord
[EMAIL PROTECTED] wrote:
 On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord [EMAIL PROTECTED] wrote:

   
 I have a suggestion for a new Python built in function: 'flatten'.
 

 This seems superficially like a good idea, but I think adding it to Python 
 anywhere would do a lot more harm than good.  I can see that consensus is 
 already strongly against a builtin, but I think it would be bad to add to 
 itertools too.

 Flattening always *seems* to be a trivial and obvious operation.  "I just
 need something that takes a group of deeply structured data and turns it into
 a group of shallowly structured data."  Everyone that has this requirement 
 assumes that their list of implicit requirements for flattening is the 
 obviously correct one.

 This wouldn't be a problem except that everyone has a different idea of those 
 requirements:).

 Here are a few issues.

 What do you do when you encounter a dict?  You can treat it as its keys(), 
 its values(), or its items().

 What do you do when you encounter an iterable object?

 What order do you flatten set()s in?  (and, ha ha, do you Set the same?)

 How are user-defined flattening behaviors registered?  Is it a new special 
 method, a registration API?

 How do you pass information about the flattening in progress to the 
 user-defined behaviors?

 If you do something special to iterables, do you special-case strings?  Why 
 or why not?

   
If you consume iterables, and only special case strings - then none of 
the issues you raise above seem to be a problem.

Sets and dictionaries are both iterable.

If it's not iterable it's an element.

I'd prefer to see this as a built-in, lots of people seem to want it. IMHO

Having it in itertools is a good compromise.

 What do you do if you encounter a function?  This is kind of a trick 
 question, since Nevow's flattener *calls* functions as it encounters them, 
 then treats the *result* of calling them as further input.
   
Sounds like not what anyone would normally expect.


 If you don't think that functions are special, what about *generator* 
 functions?  How do you tell the difference?  What about functions that return 
 generators but aren't themselves generators?  What about functions that 
 return non-generator iterators?  What about pre-generated generator objects 
 (if you don't want to treat iterables as special, are generators special?).

   
What does the list constructor do with these ? Do the same.

 Do you produce the output as a structured list or an iterator that works 
 incrementally?
   
Either would be fine. I had in mind a list, but converting an iterator 
into a list is trivial.

 Also, at least Nevow uses "flatten" to mean "serialize to bytes", not
 "produce a flat list", and I imagine at least a few other web frameworks do 
 as well.  That starts to get into encoding issues.

   
Not a use of the term I've come across. On the other hand I've heard of 
flatten in the context of nested data-structures many times.

 If you make a decision one way or another on any of these questions of 
 policy, you are going to make flatten() useless to a significant portion of 
 its potential userbase.  The only difference between having it in the 
 standard library and not is that if it's there, they'll spend an hour being 
 confused by the weird way that it's dealing with <insert your favorite data
 type here> rather than just doing the obvious thing, and they'll take a 
 minute to write the 10-line function that they need.  Without the standard 
 library, they'll skip to step 2 and save a lot of time.
   
I think that you're overcomplicating it and that the term flatten is 
really fairly straightforward. Especially if it's clearly documented in 
terms of consuming iterables.

All the best,


Michael Foord
http://www.voidspace.org.uk


 I would love to see a unified API that figured out all of these problems, and 
 put them together into a (non-stdlib) library that anyone interested could 
 use for a few years to work the kinks out.  Although it might be nice to have 
 a simple flatten interface, I don't think that it would ever be simple 
 enough to stick into a builtin; it would just be the default instance of the 
 IncrementalDestructuringProcess class with the most popular (as determined by 
 polling users of the library after a year or so) 
 IncrementalDestructuringTypePolicy.

Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Josiah Carlson

Bob Ippolito [EMAIL PROTECTED] wrote:
 On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
  On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
   I think instead of adding a flatten function perhaps we should think
   about adding something like Erlang's iolist support. The idea is
   that methods like writelines should be able to take nested iterators
   and consume any object they find that implements the buffer protocol.
 
  Which is no different than just passing in a generator/iterator that
  does flattening.
 
  Don't much see the point in gumming up the file protocol with this
  special casing; still will have requests for a flattener elsewhere.
 
  If flattening was added, should definitely be a general obj, not a
  special casing in one method in my opinion.
 
 I disagree, the reason for iolist is performance and convenience; the
 required indirection of having to explicitly call a flattener function
 removes some optimization potential and makes it less convenient to
 use.

Sorry Bob, but I disagree.  In the few times where I've needed to 'write
a list of buffers to a file handle', I find iterating over the
buffers to be sufficient.  And honestly, in all of my time dealing
with socket and file IO, I've never needed to write a list of iterators
of buffers.  Not to say that YAGNI, but I'd like to see an example where
1) it was being used in the wild, and 2) where it would be a measurable
speedup.

 - Josiah



Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Raymond Hettinger
[Michael Foord]
I have a suggestion for a new Python built in function: 'flatten'.
 ...
 There are several different possible approaches in pure Python, 
 but is this an idea that has legs ?

No legs.

It has been discussed ad nauseam on comp.lang.python.  People seem to
enjoy writing their own versions of flatten more than finding legitimate
use cases that don't already have trivial solutions.

A general-purpose flattener needs some way to be told what is atomic and
what can be further subdivided.  Also, it is not obvious how the algorithm
should be extended to cover inputs with tree-like data structures with
data at nodes as well as the leaves (preorder, postorder, inorder
traversal, etc.)
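To illustrate the tree point: once interior nodes carry data as well as leaves, "flatten" has no single answer. A minimal sketch, assuming nodes are (value, children) tuples, shows two traversal orders disagreeing on the same tree; inorder is omitted because it only applies naturally to binary trees.

```python
def preorder(node):
    """Yield values root-first. A node is (value, children) by assumption."""
    value, children = node
    yield value
    for child in children:
        yield from preorder(child)


def postorder(node):
    """Yield values children-first."""
    value, children = node
    for child in children:
        yield from postorder(child)
    yield value


tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])
print(list(preorder(tree)))   # ['root', 'a', 'a1', 'a2', 'b']
print(list(postorder(tree)))  # ['a1', 'a2', 'a', 'b', 'root']
```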

I say use your favorite cookbook approach and leave it out of the
language.


Raymond


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread Bob Ippolito
On 9/22/06, Josiah Carlson [EMAIL PROTECTED] wrote:

 Bob Ippolito [EMAIL PROTECTED] wrote:
  On 9/22/06, Brian Harring [EMAIL PROTECTED] wrote:
   On Fri, Sep 22, 2006 at 12:05:19PM -0700, Bob Ippolito wrote:
I think instead of adding a flatten function perhaps we should think
about adding something like Erlang's iolist support. The idea is
that methods like writelines should be able to take nested iterators
and consume any object they find that implements the buffer protocol.
  
   Which is no different than just passing in a generator/iterator that
   does flattening.
  
   Don't much see the point in gumming up the file protocol with this
   special casing; still will have requests for a flattener elsewhere.
  
   If flattening was added, should definitely be a general obj, not a
   special casing in one method in my opinion.
 
  I disagree, the reason for iolist is performance and convenience; the
  required indirection of having to explicitly call a flattener function
  removes some optimization potential and makes it less convenient to
  use.

 Sorry Bob, but I disagree.  In the few times where I've needed to 'write
 a list of buffers to a file handle', I find iterating over the
 buffers to be sufficient.  And honestly, in all of my time dealing
 with socket and file IO, I've never needed to write a list of iterators
 of buffers.  Not to say that YAGNI, but I'd like to see an example where
 1) it was being used in the wild, and 2) where it would be a measurable
 speedup.

The primary use for this is structured data, mostly file formats,
where you can't write the beginning until you have a bunch of
information about the entire structure such as the number of items or
the count of bytes when serialized. An efficient way to do that is
just to build a bunch of nested lists that you can use to calculate
the size (iolist_size(...) in Erlang) instead of having to write a
visitor that constructs a new flat list or writes to StringIO first. I
suppose in the most common case, for performance reasons, you would
want to restrict this to sequences only (as in PySequence_Fast)
because iolist_size(...) should be non-destructive (or else it has to
flatten into a new list anyway).
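A rough Python analogue of the pattern Bob describes, assuming Python 3: compute the total size of a nested structure without flattening it, write a length header, then write the body. Restricting containers to lists and tuples keeps the size walk non-destructive, so the same structure can be written afterwards. The names iolist_size and write_iolist echo Erlang but are hypothetical here, and the 4-byte big-endian header is an illustrative format choice.

```python
import io
import struct


def iolist_size(iolist):
    """Total byte length of a nested list/tuple of bytes-like objects."""
    if isinstance(iolist, (list, tuple)):
        return sum(iolist_size(item) for item in iolist)
    return memoryview(iolist).nbytes  # TypeError if not a buffer


def write_iolist(f, iolist):
    """Write the leaves in order; lists and tuples are the only containers."""
    if isinstance(iolist, (list, tuple)):
        for item in iolist:
            write_iolist(f, item)
    else:
        f.write(iolist)


body = [b"chunk1", [b"chunk2", (b"!",)]]
out = io.BytesIO()
out.write(struct.pack(">I", iolist_size(body)))  # length header comes first
write_iolist(out, body)
print(out.getvalue())  # b'\x00\x00\x00\rchunk1chunk2!'
```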

I've definitely done this before in Python, most recently here:
http://svn.red-bean.com/bob/flashticle/trunk/flashticle/

The flatten function in this case is flashticle.util.iter_only, and
it's used in flashticle.actions, flashticle.amf, flashticle.flv,
flashticle.swf, and flashticle.remoting.

-bob


Re: [Python-Dev] Suggestion for a new built-in - flatten

2006-09-22 Thread glyph
On Fri, 22 Sep 2006 20:55:18 +0100, Michael Foord [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] wrote:
On Fri, 22 Sep 2006 18:43:42 +0100, Michael Foord 
[EMAIL PROTECTED] wrote:

This wouldn't be a problem except that everyone has a different idea of 
those requirements:).

You didn't really address this, and it was my main point.  In fact, you more or 
less made my point for me.  You just assume that the type of application you 
have in mind right now is the only one that wants to use a flatten function, 
and dismiss out of hand any uses that I might have in mind.

If you consume iterables, and only special case strings - then none of the 
issues you raise above seem to be a problem.

You have just made two major policy decisions about the flattener without 
presenting a specific use case or set of use cases it is meant to be restricted 
to.

For example, you suggest special casing strings.  Why?  Your guideline 
otherwise is to follow what the iter() or list() functions do.  What about 
user-defined classes which subclass str and implement __iter__?
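A concrete instance of that question: list() honours a subclass's __iter__, while an isinstance(item, str) check would treat the same object as atomic, so the two guidelines give different answers. The class Words is a made-up example.

```python
class Words(str):
    """A str subclass that iterates over words instead of characters."""

    def __iter__(self):
        return iter(self.split())


s = Words("flatten me please")
print(list(s))             # ['flatten', 'me', 'please']
print(isinstance(s, str))  # True
```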

Sets and dictionaries are both iterable.

If it's not iterable it's an element.

I'd prefer to see this as a built-in, lots of people seem to want it. IMHO

Can you give specific examples?  The only significant use of a flattener I'm 
intimately familiar with (Nevow) works absolutely nothing like what you 
described.

Having it in itertools is a good compromise.

No need to compromise with me.  I am not in a position to reject your change.  
No particular reason for me to make any concessions either: I'm simply trying 
to communicate the fact that I think this is a terrible idea, not come to an 
agreement with you about how progress might be made.  Absolutely no changes on 
this front are A-OK by me :).

You have made a case for the fact that, perhaps, you should have a utility 
library that you could use in all your projects for consistency and to 
avoid repeating yourself, since you have a clearly defined need for what a 
flattener should do.  I haven't read anything that indicates there's a good 
reason for this function to be in the standard library.  What are the use cases?

It's definitely better for the core language to define lots of basic types so 
that you can say something in a library like "returns a dict mapping strings to
ints" without having a huge argument about what "dict" and "string" and "int"
mean.  What's the benefit to having everyone flatten things the same way, 
though?  Flattening really isn't that common of an operation, and in the cases 
where it's needed, a unified approach would only help if you had two 
flattenable data-structures from different libraries which needed to be 
combined.  I can't say I've ever seen a case where that would happen, let alone 
for it to be common enough that there should be something in the core language 
to support it.

What do you do if you encounter a function?  This is kind of a trick 
question, since Nevow's flattener *calls* functions as it encounters 
them, then treats the *result* of calling them as further input.

Sounds like not what anyone would normally expect.

Of course not.  My point is that there is nothing that anyone would normally 
expect from a flattener except a few basic common features.  Bob's use-case is 
completely different from yours, for example: he's talking about flattening to 
support high-performance I/O.

What does the list constructor do with these ? Do the same.

>>> list('hello')
['h', 'e', 'l', 'l', 'o']

What more can I say?

Do you produce the output as a structured list or an iterator that works 
incrementally?

Either would be fine. I had in mind a list, but converting an iterator into 
a list is trivial.

There are applications where this makes a big difference.  Bob, for example, 
suggested that this should only work on structures that support the 
PySequence_Fast operations.

Also, at least Nevow uses "flatten" to mean "serialize to bytes", not
"produce a flat list", and I imagine at least a few other web frameworks do 
as well.  That starts to get into encoding issues.

Not a use of the term I've come across. On the other hand I've heard of 
flatten in the context of nested data-structures many times.

Nevertheless the only respondent even mildly in favor of your proposal so far 
also mentions flattening sequences of bytes, although not quite as directly.

I think that you're overcomplicating it and that the term flatten is really
fairly straightforward. Especially if it's clearly documented in terms of 
consuming iterables.

And I think that you're over-simplifying.  If you can demonstrate that there is 
really a broad consensus that this sort of thing is useful in a wide variety of 
applications, then sure, I wouldn't complain too much.  But I've spent a LOT of 
time thinking about what flattening is, and several applications that I've 
worked on have very different ideas about how it should work, and I see very 
little benefit to unifying them.  That's just the