date:20180703

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Michael Selik

On Tue, Jul 3, 2018, 6:32 PM Steven D'Aprano  wrote:

> On Tue, Jul 03, 2018 at 10:33:55AM -0700, Chris Barker wrote:
> > On Tue, Jul 3, 2018 at 8:33 AM, Steven D'Aprano wrote:
> >
> > > but why are we using key values by hand when grouping ought to do it
> > > for us, as Michael Selik's version does?
> > >
> > > grouping(words, key=len)
> >
> >
> > because supplying a key function is sometimes cleaner, and sometimes
> > uglier than building up a comprehension -- which I think comes down to:
> >
> > 1) taste (style?)
> >
> > 2) whether the key function is as simple as the expression
> >
> > 3) whether you need to transform the value in any way.
>
>
> Of course you can prepare the sequence any way you like, but these are
> not equivalent:
>
> grouping(words, keyfunc=len)
>
> grouping((len(word), word) for word in words)
>
> The first groups words by their length; the second groups pairs of
> (length, word) tuples by equality.
>
>
> py> grouping("a bb ccc d ee fff".split(), keyfunc=len)
> {1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}
>
> py> grouping((len(w), w) for w in "a bb ccc d ee fff".split())
> {(3, 'ccc'): [(3, 'ccc')], (1, 'd'): [(1, 'd')], (2, 'ee'): [(2, 'ee')],
> (3, 'fff'): [(3, 'fff')], (1, 'a'): [(1, 'a')], (2, 'bb'): [(2, 'bb')]}
>

This handles the case that someone is passing in n-tuple rows and wants to
keep the rows unchanged.
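A minimal sketch of that case, assuming a hypothetical `grouping` helper with a `keyfunc` parameter (the name and signature are illustrative, not a settled API). Each row is stored whole under the key computed from it:

```python
from collections import defaultdict

def grouping(iterable, keyfunc=None):
    """Hypothetical helper: group items by keyfunc(item), or by the item itself."""
    groups = defaultdict(list)
    for item in iterable:
        key = item if keyfunc is None else keyfunc(item)
        groups[key].append(item)  # the item is kept unchanged
    return dict(groups)

# Rows are n-tuples; group them by their second field without altering them.
rows = [("alice", "admin"), ("bob", "user"), ("carol", "admin")]
by_role = grouping(rows, keyfunc=lambda row: row[1])

by_length = grouping("a bb ccc d ee fff".split(), keyfunc=len)
```

Here `by_role["admin"]` holds the two complete admin rows, so nothing has to be unpacked and re-packed.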

>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Re: [Python-ideas] Where should grouping() live

2018-07-03 Thread Chris Barker via Python-ideas

This ended up being a long post, so here is the TL;DR:

* There are types of data well suited to the key function approach, and
other data not so well suited to it. If you want to support the not as well
suited use cases, you should have a value function as well and/or take a
(key, value) pair.

* There are some nice advantages in flexibility to having a Grouping class,
rather than simply a function.

So: I propose a best of all worlds version: a Grouping class (subclass of
dict):

* The constructor takes an iterable of (key, value) pairs by default.

* The constructor takes an optional key_func -- when not None, it is used
to determine the keys in the iterable instead.

* The constructor also takes a value_func -- when specified, it processes
the items to determine the values.

* a_grouping[key] = value

  adds the value to the list corresponding to the key.

* a_grouping.add(item) -- applies the key_func and value_func to add a new
value to the appropriate group.
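The list above could be sketched roughly as follows. This is an illustration written for this summary, not the code in the linked prototype; in particular, the keyword-only arguments and the choice to apply value_func to the value of a (key, value) pair when no key_func is given are assumptions.

```python
class Grouping(dict):
    """Illustrative sketch only: a dict whose values are lists of grouped items."""

    def __init__(self, iterable=(), *, key_func=None, value_func=None):
        super().__init__()
        self.key_func = key_func
        self.value_func = value_func
        self.update(iterable)

    def __setitem__(self, key, value):
        # Assignment appends to the group instead of replacing it.
        # dict.setdefault stores the new list directly, bypassing this method.
        self.setdefault(key, []).append(value)

    def add(self, item):
        if self.key_func is None:
            key, value = item            # assume a (key, value) pair
        else:
            key, value = self.key_func(item), item
        if self.value_func is not None:
            value = self.value_func(value)
        self[key] = value

    def update(self, iterable):
        for item in iterable:
            self.add(item)

words = "a bb ccc d ee fff".split()
by_len = Grouping(words, key_func=len)
shouted = Grouping(words, key_func=len, value_func=str.upper)

pairs = Grouping([("x", 1), ("y", 2), ("x", 3)])
pairs["y"] = 5   # appends 5 to the "y" group
```

One consequence of this design is that `a_grouping[key] = value` no longer behaves like ordinary dict assignment, which is the trade-off the proposal accepts.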

Prototype code here:

https://github.com/PythonCHB/grouper

Now the lengthy commentary and examples:

On Tue, Jul 3, 2018 at 5:21 PM, Steven D'Aprano  wrote:

> On Wed, Jul 04, 2018 at 10:44:17AM +1200, Greg Ewing wrote:
> > Steven D'Aprano wrote:
>
> > Unless we *make* it a data type. Then not only would it fit
> > well in collections, it would also make it fairly easy to do
> > incremental grouping if you really wanted that.

indeed -- one of the motivations for my prototype:

https://github.com/PythonCHB/grouper

(Did none of my messages get to this list??)

> > Usual case:
> >
> >    g = groupdict((key(val), val) for val in things)
>
>
> How does groupdict differ from regular defaultdicts, aside from the
> slightly different constructor?
>

* You don't need to declare the defaultdict (and what the default is) first

* You don't need to call .append() yourself

* It can have a custom .__init__() and .update()

* It can have a .add() method

* It can (optionally) use a key function.

* And you can have other methods that do useful things with the groupings.

> >    g = groupdict()
> >    for key(val), val in things:
> >        g.add(key, val)
> >        process_partial_grouping(g)
>
> I don't think that syntax works. I get:
>
> SyntaxError: can't assign to function call
>

looks like untested code :-)

with my prototype it would be:

g = groupdict()
for key, val in things:
    g[key] = val
    process_partial_grouping(g)

(this assumes your things are (key, value) pairs)

Again, IF your data are a sequence of items, and the value is the item
itself, and the key is a simple function of the item, THEN the key function
method makes more sense, which for the incremental adding of data would be:

g = groupdict(key_func=a_fun)
for thing in things:
    g.add(thing)
    process_partial_grouping(g)

> Even if it did work, it's hardly any simpler than
>
> d = defaultdict(list)
> for val in things:
>     d[key(val)].append(val)
>
> But then Counter is hardly any simpler than a regular dict too.
>

exactly -- and counter is actually a little annoyingly too much like a
regular dict, in my mind :-)

In the latest version of my prototype, the __init__ expects (key, value)
pairs by default, but you can also pass in a key_func, and then it will
process the iterable passed in as (key_func(item), item) pairs.

And the update() method will also use the key_func if one was provided.

So a best of both worlds -- pick your API.

In this thread, and in the PEP, there are various ways of accomplishing this
task presented -- none of them (except using a raw itertools.groupby in
some cases) is all that onerous.

But I do think a custom function or even better, custom class, would create
a "one obvious" way to do a common manipulation.

A final (repeated) point:

Some data are better suited to a (key, value) pair style, and some to a key
function style. All of the examples in the PEP are well suited to the key
function style. But the example that kicked off this discussion was about
data already in (key, value) pairs (actually, in that case, (value, key) pairs).

And there are other examples. Here's a good one for how one might want to
use a Grouping dict more like a regular dict -- or maybe like a simple
function constructor:

(code in: https://github.com/PythonCHB/grouper/blob/master/examples/trigrams.py)

#!/usr/bin/env python3

"""
Demo of processing "trigrams" from Dave Thomas' Coding Kata site:

http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/

This is only addressing the part of the problem of building up the trigrams.

This is showing various ways of doing it with the Grouping object.
"""

from grouper import Grouping
from operator import itemgetter

words = "I wish I may I wish I might".split()

# using setdefault with a regular dict:
# how I might do it without a Grouping class
trigrams = {}
for i in range(len(words) - 2):
    pair = tuple(words[i:i + 2])
    follower = words[i + 2]
    trigrams.setdefault(pair, []).append(follower)

print(trigrams)

# using a Grouping

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Steven D'Aprano

On Tue, Jul 03, 2018 at 10:33:55AM -0700, Chris Barker wrote:
> On Tue, Jul 3, 2018 at 8:33 AM, Steven D'Aprano  wrote:
> 
> > but why are we using key values by hand when grouping ought to do it for
> >> us, as Michael Selik's version does?
> >
> > grouping(words, key=len)
> 
> 
> because supplying a key function is sometimes cleaner, and sometimes uglier
> than building up a comprehension -- which I think comes down to:
> 
> 1) taste (style?)
> 
> 2) whether the key function is as simple as the expression
> 
> 3) whether you need to transform the value in any way.


Of course you can prepare the sequence any way you like, but these are 
not equivalent:

grouping(words, keyfunc=len)

grouping((len(word), word) for word in words)

The first groups words by their length; the second groups pairs of 
(length, word) tuples by equality.


py> grouping("a bb ccc d ee fff".split(), keyfunc=len)
{1: ['a', 'd'], 2: ['bb', 'ee'], 3: ['ccc', 'fff']}

py> grouping((len(w), w) for w in "a bb ccc d ee fff".split())
{(3, 'ccc'): [(3, 'ccc')], (1, 'd'): [(1, 'd')], (2, 'ee'): [(2, 'ee')], 
(3, 'fff'): [(3, 'fff')], (1, 'a'): [(1, 'a')], (2, 'bb'): [(2, 'bb')]}


Don't worry, it wasn't obvious to me at 1am (my local time) either :-)



-- 
Steve

Re: [Python-ideas] Where should grouping() live

2018-07-03 Thread Steven D'Aprano

On Wed, Jul 04, 2018 at 10:44:17AM +1200, Greg Ewing wrote:
> Steven D'Aprano wrote:
> >I propose that a better name which indicates the non-lazy nature of this 
> >function is *grouped* rather than grouping, like sorted().
> 
> +1
> 
> >As for where it belongs, perhaps the collections module is the least 
> >worst fit.
> 
> But then there's the equally strong purist argument that it's
> not a data type, just a function.

Yes, I realised that after I posted my earlier comment.


> Unless we *make* it a data type. Then not only would it fit
> well in collections, it would also make it fairly easy to do
> incremental grouping if you really wanted that.
> 
> Usual case:
> 
>    g = groupdict((key(val), val) for val in things)


How does groupdict differ from regular defaultdicts, aside from the 
slightly different constructor?


> Incremental case:
> 
>    g = groupdict()
>    for key(val), val in things:
>        g.add(key, val)
>        process_partial_grouping(g)

I don't think that syntax works. I get:

SyntaxError: can't assign to function call


Even if it did work, it's hardly any simpler than

d = defaultdict(list)
for val in things:
    d[key(val)].append(val)

But then Counter is hardly any simpler than a regular dict too.




-- 
Steve

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Greg Ewing


MRAB wrote:
> I think that building an iterable of 2-tuples to pass to 'grouped' is
> much like following a decorate-sort-undecorate pattern when sorting, or
> something similar when using 'min' or 'max'. Passing an iterable of
> items and optionally a key function is simpler, IMHO.


It should certainly be an option, but I don't think it
should be the only one. Like with map() vs. comprehensions,
sometimes one way is more convenient, sometimes the other.

--
Greg

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread MRAB


On 2018-07-03 23:20, Greg Ewing wrote:

> Nicolas Rolin wrote:
>
> > grouping(((len(word), word) for word in words))
>
> That actually has one more level of parens than are needed,
> you can just write
>
>     grouping((len(word), word) for word in words)


FWIW, here's my opinion.

I much prefer something like:

grouped(words, key=len)

I think that building an iterable of 2-tuples to pass to 'grouped' is 
much like following a decorate-sort-undecorate pattern when sorting, or 
something similar when using 'min' or 'max'. Passing an iterable of 
items and optionally a key function is simpler, IMHO.


Why would you pass 2-tuples, anyway?

Maybe it's because 'grouped' returns a dict and a dict can be built from 
an iterable of 2-tuples, but that's OK because a dict needs key/value pairs.


When 'Counter' was being proposed, it was suggested that one could be 
created from an iterable of 2-tuples, which sort of made sense because a 
Counter is like a dict, but, then, how would you count 2-tuples?


Fortunately, Counter counts items, so you can do things like:

counts = Counter(list_of_words)

I think it's the same thing here.

'grouped' returns a dict, so passing 2-tuples initially seems 
reasonable, but, as in the case with Counter, I think it would be a mistake.
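A short illustration of the Counter behaviour referred to above, using only the standard library. The sample data is made up for the example:

```python
from collections import Counter

words = ["spam", "eggs", "spam"]
counts = Counter(words)          # counts the items themselves

# An iterable of 2-tuples is counted like any other iterable of items:
# each tuple is an item, not a (key, count) pair.
pairs = [("spam", 2), ("eggs", 1), ("spam", 2)]
pair_counts = Counter(pairs)

# To start from existing counts, a mapping is passed instead.
preset = Counter({"spam": 2, "eggs": 1})
```

So `counts` and `preset` compare equal, while `pair_counts` has the tuples as its keys.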


It would be nice to be able to say:

grouped(words, key=str.casefold)

rather than:

grouped((word.casefold(), word) for word in words)

It would match the pattern of sorted, min and max.
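A minimal sketch of what such a signature might look like, assuming a hypothetical `grouped(iterable, key=None)` that mirrors `sorted`. Nothing with this name exists in the standard library:

```python
def grouped(iterable, key=None):
    """Hypothetical: eager grouping with a sorted()-style key parameter."""
    groups = {}
    for item in iterable:
        k = item if key is None else key(item)
        groups.setdefault(k, []).append(item)
    return groups

words = ["Apple", "apple", "BANANA", "banana", "Cherry"]

by_folded = grouped(words, key=str.casefold)

# The same key function plugs straight into the existing built-ins.
in_order = sorted(words, key=str.casefold)
first = min(words, key=str.casefold)
```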

Re: [Python-ideas] Where should grouping() live

2018-07-03 Thread Greg Ewing


Steven D'Aprano wrote:
> I propose that a better name which indicates the non-lazy nature of this
> function is *grouped* rather than grouping, like sorted().


+1

> As for where it belongs, perhaps the collections module is the least
> worst fit.


But then there's the equally strong purist argument that it's
not a data type, just a function.

Unless we *make* it a data type. Then not only would it fit
well in collections, it would also make it fairly easy to do
incremental grouping if you really wanted that.

Usual case:

   g = groupdict((key(val), val) for val in things)

Incremental case:

   g = groupdict()
   for key(val), val in things:
       g.add(key, val)
       process_partial_grouping(g)

--
Greg

Re: [Python-ideas] Where should grouping() live

2018-07-03 Thread Greg Ewing


David Mertz wrote:
> Just make grouping() a generator function rather than a plain function.  This
> lets us get an incremental grouping of an iterable.  This can be useful
> if the iterable is slow or infinite, but the partial groupings are
> useful in themselves.


Do you have any real-world examples? I'm having trouble
thinking of any.

Even if there are a few, it seems like the vast majority of
the time you *won't* want the intermediate groupings, just
the final one, and then what do you do? It would be annoying
to have to write code to exhaust the iterator just to get the
result you're after.

Also, were you intending it to return a series of independent
objects, or does it just return the same object every time,
adding stuff to it? The former would be very inefficient for
the majority of uses, whereas the latter doesn't seem to
be in keeping with the spirit of itertools.

This idea seems like purity beating practicality to me.
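To make the two objections concrete, here is a sketch of the "same object every time" variant. `grouping_iter` is a made-up name and this is only one possible reading of the generator proposal:

```python
from collections import deque

def grouping_iter(iterable, key):
    """Hypothetical generator: yield the running grouping after each item."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups  # the same dict object every time, mutated in place

words = "a bb ccc d ee fff".split()

# Callers who only want the final result still have to drain the generator.
final = deque(grouping_iter(words, len), maxlen=1).pop()

# Because one object is reused, saved "snapshots" all alias the final state.
snapshots = list(grouping_iter(words, len))
all_same_object = all(s is snapshots[0] for s in snapshots)
```

Yielding a fresh copy each time would avoid the aliasing, at the cost of copying the whole grouping for every input item.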

--
Greg

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Greg Ewing


Nicolas Rolin wrote:


> grouping(((len(word), word) for word in words))


That actually has one more level of parens than are needed,
you can just write

   grouping((len(word), word) for word in words)

--
Greg

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas

On Tue, Jul 3, 2018 at 12:01 PM, David Mertz  wrote:

> ... but I STILL like a new collections.Grouping (or collections.Grouper)
> the best.
>

me too.


> It might overcome Guido's reluctance... and what goes there is really
> delegated by him, not his own baby.
>

Is collections anyone's baby in particular, the way itertools "belongs" to
Raymond?

-CHB




> On Tue, Jul 3, 2018 at 12:19 PM Chris Barker via Python-ideas <
> python-ideas@python.org> wrote:
>
>> On Tue, Jul 3, 2018 at 8:24 AM, Steven D'Aprano 
>> wrote:
>>
>>> On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:
>>>
>>
>>
>>> > My problem with the second idea is that *I* find it very wrong to have
>>> > something in itertools that does not return an iterator.  It wrecks the
>>> > combinatorial algebra of the module.
>>>
>>
>> hmm -- that seems to be a pretty pedantic approach -- practicality beats
>> purity, after all :-)
>>
>> I think we should first decide if a grouping() function is a useful
>> addition to the standard library (after all:  "not every two line function
>> needs to in the stdlib"), and f so, then we can find a home for it.
>>
>> personally, I'm wondering if a "dicttools" or something module would make
>> sense -- I imagine there are all sorts of other handy utilities for working
>> with dicts that could go there. (though, yeah, we'd want to actually have a
>> handful of these before creating a new module :-) )
>>
>> > That said, it's easy to fix... and I believe independently useful.  Just
>>> > make grouping() a generator function rather than a plain function.
>>> This
>>> > lets us get an incremental grouping of an iterable.
>>>
>>> We already have something which lazily groups an iterable, returning
>>> groups as they are seen: groupby.
>>>
>>> What makes grouping() different from groupby() is that it accumulates
>>> ALL of the subgroups rather than just consecutive subgroupings.
>>
>>
>> well, yeah, but it wont actually get you those until you exhaust the
>> iterator -- so while it's different than itertools.groupby, it is different
>> than itertools.groupby(sorted(iterable))?
>>
>> In short, this wouldn't really solve the problems that itertools.groupby
>> has for this sort of task -- so what's the point?
>>
>>  > As for where it belongs, perhaps the collections module is the least
>> worst fit.
>>
>> That depends some on whether we go with a simple function, in which case
>> collections is a pretty bad fit (but maybe still the least worse).
>>
>> Personally I still like the idea of having this be special type of dict,
>> rather than "just a function" -- and then it's really obvious where to put
>> it :-)
>>
>> -CHB
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz

I admit a hypothetical itertools.grouping that returned incrementally built
dictionaries doesn't fill any simple need I have often encountered.  I can
be hand-wavy about "stateful bucketing of streams" and looking at
windowing/tails, but I don't have a clean and simple example where I need
this.  The "run to exhaustion" interface has more obvious uses (albeit,
they *must* be technically a subset of the incremental ones).

I think I will also concede that an incrementally built and yielded
dictionary isn't *really* in the spirit of itertools either.  I suppose
tee() can grow unboundedly if only one tine is utilized... but in general,
itertools is meant to provide iterators that keep memory usage limited to a
few elements in memory at a time (yes, groupby, takewhile, or dropwhile
have pathological cases that could be unbounded... but usually they're not).

So maybe we really do need a dicttools or mappingtools module, with this as
the first function to put inside it.

... but I STILL like a new collections.Grouping (or collections.Grouper)
the best.  It might overcome Guido's reluctance... and what goes there is
really delegated by him, not his own baby.

On Tue, Jul 3, 2018 at 12:19 PM Chris Barker via Python-ideas <
python-ideas@python.org> wrote:

> On Tue, Jul 3, 2018 at 8:24 AM, Steven D'Aprano 
> wrote:
>
>> On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:
>>
>
>
>> > My problem with the second idea is that *I* find it very wrong to have
>> > something in itertools that does not return an iterator.  It wrecks the
>> > combinatorial algebra of the module.
>>
>
> hmm -- that seems to be a pretty pedantic approach -- practicality beats
> purity, after all :-)
>
> I think we should first decide if a grouping() function is a useful
> addition to the standard library (after all:  "not every two line function
> needs to in the stdlib"), and f so, then we can find a home for it.
>
> personally, I'm wondering if a "dicttools" or something module would make
> sense -- I imagine there are all sorts of other handy utilities for working
> with dicts that could go there. (though, yeah, we'd want to actually have a
> handful of these before creating a new module :-) )
>
> > That said, it's easy to fix... and I believe independently useful.  Just
>> > make grouping() a generator function rather than a plain function.  This
>> > lets us get an incremental grouping of an iterable.
>>
>> We already have something which lazily groups an iterable, returning
>> groups as they are seen: groupby.
>>
>> What makes grouping() different from groupby() is that it accumulates
>> ALL of the subgroups rather than just consecutive subgroupings.
>
>
> well, yeah, but it wont actually get you those until you exhaust the
> iterator -- so while it's different than itertools.groupby, it is different
> than itertools.groupby(sorted(iterable))?
>
> In short, this wouldn't really solve the problems that itertools.groupby
> has for this sort of task -- so what's the point?
>
>  > As for where it belongs, perhaps the collections module is the least
> worst fit.
>
> That depends some on whether we go with a simple function, in which case
> collections is a pretty bad fit (but maybe still the least worse).
>
> Personally I still like the idea of having this be special type of dict,
> rather than "just a function" -- and then it's really obvious where to put
> it :-)
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas

It seems a really stupid reason to make this choice, but:

If we make a Grouping class, it has an obvious home in the collections
module

If we make a grouping (or grouped) function, we don't know where to put it

But since I like the Grouping class idea anyway, it's one more reason...

-CHB


On Tue, Jul 3, 2018 at 9:15 AM, Chris Barker  wrote:

> On Tue, Jul 3, 2018 at 8:24 AM, Steven D'Aprano 
> wrote:
>
>> On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:
>>
>
>
>> > My problem with the second idea is that *I* find it very wrong to have
>> > something in itertools that does not return an iterator.  It wrecks the
>> > combinatorial algebra of the module.
>>
>
> hmm -- that seems to be a pretty pedantic approach -- practicality beats
> purity, after all :-)
>
> I think we should first decide if a grouping() function is a useful
> addition to the standard library (after all:  "not every two line function
> needs to in the stdlib"), and f so, then we can find a home for it.
>
> personally, I'm wondering if a "dicttools" or something module would make
> sense -- I imagine there are all sorts of other handy utilities for working
> with dicts that could go there. (though, yeah, we'd want to actually have a
> handful of these before creating a new module :-) )
>
> > That said, it's easy to fix... and I believe independently useful.  Just
>> > make grouping() a generator function rather than a plain function.  This
>> > lets us get an incremental grouping of an iterable.
>>
>> We already have something which lazily groups an iterable, returning
>> groups as they are seen: groupby.
>>
>> What makes grouping() different from groupby() is that it accumulates
>> ALL of the subgroups rather than just consecutive subgroupings.
>
>
> well, yeah, but it wont actually get you those until you exhaust the
> iterator -- so while it's different than itertools.groupby, it is different
> than itertools.groupby(sorted(iterable))?
>
> In short, this wouldn't really solve the problems that itertools.groupby
> has for this sort of task -- so what's the point?
>
>  > As for where it belongs, perhaps the collections module is the least
> worst fit.
>
> That depends some on whether we go with a simple function, in which case
> collections is a pretty bad fit (but maybe still the least worse).
>
> Personally I still like the idea of having this be special type of dict,
> rather than "just a function" -- and then it's really obvious where to put
> it :-)
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Chris Barker via Python-ideas

On Tue, Jul 3, 2018 at 8:33 AM, Steven D'Aprano  wrote:

> but why are we using key values by hand when grouping ought to do it for
> us, as Michael Selik's version does?
>
> grouping(words, key=len)

because supplying a key function is sometimes cleaner, and sometimes uglier
than building up a comprehension -- which I think comes down to:

1) taste (style?)

2) whether the key function is as simple as the expression

3) whether you need to transform the value in any way.

This argument is pretty much the same as whether you should use a
comprehension or map:

map(len, words)

vs

(len(word) for word in words)

In that case, map() looks cleaner and easier, but when you have something
less simple:

map(operator.attrgetter('something'), some_objects)

vs

(object.something for object in some_objects)

I like the comprehension better.

add a filter, and comps really get nicer -- after all they were added to
the language for a reason.

Then when you add the additional complication of needing to "transform" the
value as well, it's easy to do with the comprehension, but there is no way
to do it with only a key function.

I think the "conflict" here is that Michael started with a bunch of
examples that are all well suited to the key_function approach, and Nicolas
started with a use-case that is better suited to the comprehension /
(key, value) approach.

However, while the (key, value) approach can reasonably (if a bit clunkily)
be used everywhere the key function approach can, the opposite is not true
(for example, when the value needs to be transformed as well).

But in the spirit of "Python has both map and comprehensions", I say let's
use both!

* The default behavior is to process (key, value) pairs.

* A key function can be provided in which case it is used, and the value is
the full item.

* A value function can be provided, in which case, it is used to "process"
the value.
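As a rough sketch of those three behaviours in function form: the parameter names `key` and `value` are assumptions made for this example, and applying `value` to the second element of a pair when no key function is given is one possible reading of the proposal.

```python
def grouping(iterable, key=None, value=None):
    """Hypothetical: (key, value) pairs by default, optional key/value functions."""
    groups = {}
    for item in iterable:
        if key is None:
            k, v = item              # default: item is a (key, value) pair
        else:
            k, v = key(item), item   # key function: the value is the full item
        if value is not None:
            v = value(v)             # value function: transform what is stored
        groups.setdefault(k, []).append(v)
    return groups

names = ["ann", "bob", "amy"]

# Key function plus value function...
with_funcs = grouping(names, key=lambda n: n[0], value=str.upper)

# ...is equivalent to spelling out the pairs in a generator expression.
with_pairs = grouping((n[0], n.upper()) for n in names)

plain_pairs = grouping([(1, "a"), (2, "b"), (1, "c")])
```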

If this is too confusing an interface, we could forget the value function,
and folks would have to use the (key, value) interface if they need to
transform the value.

What makes no sense to me is having the identity function as the default
key (and yes, it is the identity function: it would return the actual
object, or not be there at all -- the grouping would be done by the hash
of the key after passing through the key function).

That's because having a default that is (almost) completely useless  makes
no sense -- it might as well be a required parameter.

(unless there was a value function as well, in which case, it's not a
completely useless default).

- CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas

On Tue, Jul 3, 2018 at 8:24 AM, Steven D'Aprano  wrote:

> On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:
>

> > My problem with the second idea is that *I* find it very wrong to have
> > something in itertools that does not return an iterator.  It wrecks the
> > combinatorial algebra of the module.
>

hmm -- that seems to be a pretty pedantic approach -- practicality beats
purity, after all :-)

I think we should first decide if a grouping() function is a useful
addition to the standard library (after all:  "not every two line function
needs to be in the stdlib"), and if so, then we can find a home for it.

personally, I'm wondering if a "dicttools" or something module would make
sense -- I imagine there are all sorts of other handy utilities for working
with dicts that could go there. (though, yeah, we'd want to actually have a
handful of these before creating a new module :-) )

> > That said, it's easy to fix... and I believe independently useful.  Just
> > make grouping() a generator function rather than a plain function.  This
> > lets us get an incremental grouping of an iterable.
>
> We already have something which lazily groups an iterable, returning
> groups as they are seen: groupby.
>
> What makes grouping() different from groupby() is that it accumulates
> ALL of the subgroups rather than just consecutive subgroupings.

well, yeah, but it won't actually get you those until you exhaust the
iterator -- so while it's different than itertools.groupby, is it different
than itertools.groupby(sorted(iterable))?

In short, this wouldn't really solve the problems that itertools.groupby
has for this sort of task -- so what's the point?
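For reference, a small demonstration of the itertools.groupby behaviour under discussion, using the word list from elsewhere in the thread. Only standard-library calls are used:

```python
from itertools import groupby

words = "a bb ccc d ee fff".split()

# groupby only merges *consecutive* items that share a key,
# so unsorted input yields the same key more than once.
consecutive = [(k, list(g)) for k, g in groupby(words, key=len)]

# Sorting by the same key first gives one group per key,
# but requires the whole input in memory and an O(n log n) sort.
full = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}
```

`consecutive` contains six single-item groups, while `full` contains the three groups most people expect.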

> As for where it belongs, perhaps the collections module is the least
> worst fit.

That depends some on whether we go with a simple function, in which case
collections is a pretty bad fit (but maybe still the least worst).

Personally I still like the idea of having this be a special type of dict,
rather than "just a function" -- and then it's really obvious where to put
it :-)

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Steven D'Aprano

On Fri, Jun 29, 2018 at 10:53:34AM -0700, Michael Selik wrote:
> Hello,
> 
> I've drafted a PEP for an easier way to construct groups of elements from a
> sequence. https://github.com/selik/peps/blob/master/pep-.rst

Seems useful, but I suggest that since it has to process the entire data 
set eagerly, the name ought to be grouped() following the precedent set by 
sorted().

I also suggest using keyfunc as the second parameter, following the same 
convention as itertools.groupby. That gives this possible implementation:

    def grouped(iterable, keyfunc=None):
        groups = {}
        for k, g in itertools.groupby(iterable, keyfunc):
            groups.setdefault(k, []).extend(g)
        return groups
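
A minimal, self-contained sketch of how that implementation behaves, assuming the grouped() name and keyfunc parameter suggested above (neither is an existing stdlib API). groupby() only yields consecutive runs, and setdefault(...).extend(...) merges runs that share a key:

```python
import itertools

def grouped(iterable, keyfunc=None):
    """Eagerly group items into a dict of lists, keyed by keyfunc(item)."""
    groups = {}
    # groupby yields one (key, run) pair per consecutive run of equal keys;
    # extending the per-key list merges runs that are not adjacent.
    for k, g in itertools.groupby(iterable, keyfunc):
        groups.setdefault(k, []).extend(g)
    return groups

words = "a bb ccc d ee fff".split()
result = grouped(words, keyfunc=len)
print(result)
```

No sorting is needed, and items keep their original relative order within each group.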

Since Guido has ruled out making this a built-in, there's no really 
comfortable place in the standard library for it:

- it doesn't return an iterator (since it is eager, it would
  be pointless to yield key/items pairs instead of just 
  returning the dict), so itertools is not a good fit;

- it doesn't return a specialist class, so collections is not
  a good fit;

- there's currently no "useful utilities which aren't useful
  enough to be built-in" module.

I fear that this proposal will fall into that awkward position of being 
doomed by not having somewhere to put it.

(Your suggestion to consider this an alternate constructor of dicts 
seems more sensible all the time... but again Guido disagrees.)

-- 
Steve

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Steven D'Aprano

On Tue, Jul 03, 2018 at 04:12:14PM +0200, Nicolas Rolin wrote:

> I agree the examples have lisp-level of brackets. However by using the fact
> tuples don't need brackets and the fact we can use a list instead of an
> iterable (the grouper will have to store the whole object in memory anyway,
> and if it is really big, use itertools.groupby which is designed exactly for
> that)
> For example
> grouping(((len(word), word) for word in words))
> can be written
> grouping([len(word), word for word in words])
> 
> which is less "bracket issue prone".

Did you try this? It is a syntax error. Generator expressions must be 
surrounded by round brackets:

grouping([len(word), (word for word in words)])

Or perhaps you meant this:

grouping([(len(word), word) for word in words])

but now it seems pointless to use a list comprehension instead of a 
generator expression:

grouping((len(word), word) for word in words)

but why are we using key values by hand when grouping ought to do it for 
us, as Michael Selik's version does?

grouping(words, key=len)



-- 
Steve

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Steven D'Aprano

On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:

> But before putting it on auto-archive, the BDFL said (1) NO GO on getting a
> new builtin; (2) NO OBJECTION to putting it in itertools.
> 
> My problem with the second idea is that *I* find it very wrong to have
> something in itertools that does not return an iterator.  It wrecks the
> combinatorial algebra of the module.

That seems like a reasonable objection to me.

> That said, it's easy to fix... and I believe independently useful.  Just
> make grouping() a generator function rather than a plain function.  This
> lets us get an incremental grouping of an iterable.

We already have something which lazily groups an iterable, returning 
groups as they are seen: groupby.

What makes grouping() different from groupby() is that it accumulates 
ALL of the subgroups rather than just consecutive subgroupings. To make 
it clear with a simulated example (ignoring the keys for brevity):

groupby("aaAAbbCaAB", key=str.upper)
=> groups "aaAA", "bb", "C", "aA", "B"

grouping("aaAAbbCaAB", key=str.upper)
=> groups "aaAAaA", "bbB", "C"
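
The difference can be checked directly. This sketch uses only itertools.groupby and a plain dict; the full-grouping loop is a stand-in for the proposed grouping() function, which does not exist in the standard library:

```python
from itertools import groupby

data = "aaAAbbCaAB"

# groupby only merges *consecutive* items that share a key.
runs = ["".join(g) for _, g in groupby(data, key=str.upper)]

# A full grouping accumulates every item with the same key,
# no matter where it appears in the input.
groups = {}
for ch in data:
    groups.setdefault(ch.upper(), []).append(ch)
merged = {k: "".join(v) for k, v in groups.items()}

print(runs)
print(merged)
```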

So grouping() cannot even begin returning values until it has processed 
the entire data set. In that regard, it is like sorted() -- it cannot be 
lazy, it is a fundamentally eager operation.

I propose that a better name which indicates the non-lazy nature of this 
function is *grouped* rather than grouping, like sorted().

As for where it belongs, perhaps the collections module is the least 
worst fit.

> This can be useful if
> the iterable is slow or infinite, but the partial groupings are useful in
> themselves.

Under what circumstances would the partial groupings be useful? Given 
the example above:

grouping("aaAAbbCaAB", key=str.upper)

when would you want to see the accumulated partial groups?

# again, ignoring the keys for brevity
"aaAA"
"aaAA", "bb"
"aaAA", "bb", "C"
"aaAAaA", "bb", "C"
"aaAAaA", "bbB", "C"

I don't see any practical use for this -- if you start processing the 
partial groupings immediately, you end up double-processing some 
of the items; if you wait until the last, what's the point of the 
intermediate values?

As you say yourself:

> This isn't so useful for the concrete sequence, but for this it would be
> great:
> 
> for grouped in grouping(data_over_wire()):
>     process_partial_groups(grouped)

And that demonstrates exactly why this would be a terrible bug magnet, 
suckering people into doing what you just did, and ending up processing 
values more than once.

To avoid that, your process_partial_groups would need to remember which 
values it has seen before for each key it has seen before.
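
To make that bookkeeping concrete, here is a rough sketch. The generator follows David's proposed implementation; process_new_items, seen_counts and handle are hypothetical names invented for this illustration:

```python
def grouping(iterable, key=None):
    """Incremental grouping: yields the (growing) dict after every item."""
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups

def process_new_items(partial_groups, seen_counts, handle):
    """Call handle(key, item) only for items not handled on an earlier pass."""
    for k, items in partial_groups.items():
        start = seen_counts.get(k, 0)   # how many items of this key were done
        for item in items[start:]:
            handle(k, item)
        seen_counts[k] = len(items)

handled = []
seen = {}
for partial in grouping("aaAAbbCaAB", key=str.upper):
    process_new_items(partial, seen, lambda k, item: handled.append((k, item)))
```

Without the seen_counts dict, every pass would revisit all earlier items, which is the double processing described above.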

-- 
Steve

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Nicolas Rolin

2018-07-03 14:58 GMT+02:00 David Mertz :

> On Tue, Jul 3, 2018 at 2:52 AM Chris Barker via Python-ideas  
>
> What you've missed, in *several* examples is the value part of the tuple
> in your API. You've pulled out the key, and forgotten to include anything
> in the actual groups.  I have a hunch that if your API were used, this
> would be a common pitfall.
>
> I think this argues against your API and for Michael's that simply deals
> with "sequences of groupable things."  That's much more like what one deals
> with in SQL, and is familiar that way.  If the things grouped are compound
> object such as dictionaries, objects with common attributes, named tuples,
> etc. then the list of things in a group usually *does not* want the
> grouping attribute removed.
>


I agree the examples have lisp-level of brackets. However by using the fact
tuples don't need brackets and the fact we can use a list instead of an
iterable (the grouper will have to store the whole object in memory anyway,
and if it is really big, use itertools.groupby which is designed exactly for
that)
For example
grouping(((len(word), word) for word in words))
can be written
grouping([len(word), word for word in words])

which is less "bracket issue prone".
The main advantage of having this syntax is that it gives a definition very
close to that of a dict comprehension, which is nice considering what we
obtain is a dict (without that feature I'm not sure I would ever attempt to
use this function).
And that allows us to have the same construction syntax as a dictionary,
with an iterable of (key, value) pairs (
https://docs.python.org/3.7/library/stdtypes.html#dict).



>  So that was an interesting exercise -- many of those are a bit clearer
>> (or more compact) with the key function. But I also notice a pattern -- all
>> those examples fit very well into the key function pattern:
>>
>
> Yep.
>

Well those were the examples used to showcase the keyfunction in the PEP.
This is as bad as it gets for the "initialization by comprehension" syntax.



> I agree still (after all, I proposed it to Michael).  But this seems
> minor, and Guido seems not to like `collections` that much (or at least he
> commented on not using Counter ... which I personally love to use and to
> teach).
>

Actually Counter is very, very close to grouping (replace the append with
starting value [] in the for loop by a += with starting value 0 and
grouping becomes a Counter), so adding it to collections makes the most
sense by a long shot.
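
The parallel can be seen by writing both loops side by side. This is only an illustration with plain dicts; it is not how collections.Counter is implemented:

```python
from collections import Counter

words = "a bb ccc d ee fff".split()

groups = {}   # grouping: start from [] and append
counts = {}   # counting: start from 0 and +=
for word in words:
    k = len(word)
    groups.setdefault(k, []).append(word)
    counts[k] = counts.get(k, 0) + 1

# The counting loop computes what Counter computes.
same_as_counter = counts == dict(Counter(len(w) for w in words))
# And each count is just the size of the corresponding group.
sizes = {k: len(v) for k, v in groups.items()}
```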

As far as I'm concerned, CHB's semantics and syntax for the grouper object
do everything that is needed, and even a little bit too much.
It could be called AppendDict and just accept a (key, value) iterable as
input, and instead of doing dict[key] = value as a dict does, do
dict[key] = [value] if key not in dict else dict[key] + [value]  (and
should be coded in C I guess).
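
A pure-Python sketch of that idea, for illustration only. AppendDict is the hypothetical name from the paragraph above, and this version uses setdefault(...).append(...) rather than dict[key] + [value] so that each insertion does not copy the list:

```python
class AppendDict(dict):
    """Hypothetical dict whose constructor appends values under repeated keys."""

    def __init__(self, pairs=()):
        super().__init__()
        for key, value in pairs:
            # A plain dict would overwrite here; this collects into a list.
            self.setdefault(key, []).append(value)

words = "a bb ccc d ee fff".split()
by_length = AppendDict((len(word), word) for word in words)
print(by_length)
```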

-- 
Nicolas Rolin

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Michael Selik

I'd prefer to simply write an example for the documentation or clarify the
existing ones, then add good answers to StackOverflow questions.


On Tue, Jul 3, 2018, 6:23 AM David Mertz  wrote:

> Guido said he has mooted this discussion, so it's probably not reaching
> him.  It took one thousand fewer messages for him to stop following this
> than with PEP 572, for some reason :-).
>
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.
>

Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz

On Tue, Jul 3, 2018 at 9:23 AM David Mertz  wrote:

> Guido said he has mooted this discussion, so it's probably not reaching
> him.
>

I meant 'muted'.  Hopefully he hasn't 'mooted' it.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

[Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz

Guido said he has mooted this discussion, so it's probably not reaching
him.  It took one thousand fewer messages for him to stop following this
than with PEP 572, for some reason :-).

But before putting it on auto-archive, the BDFL said (1) NO GO on getting a
new builtin; (2) NO OBJECTION to putting it in itertools.

My problem with the second idea is that *I* find it very wrong to have
something in itertools that does not return an iterator.  It wrecks the
combinatorial algebra of the module.

That said, it's easy to fix... and I believe independently useful.  Just
make grouping() a generator function rather than a plain function.  This
lets us get an incremental grouping of an iterable.  This can be useful if
the iterable is slow or infinite, but the partial groupings are useful in
themselves.

Python 3.7.0 (default, Jun 28 2018, 07:39:16)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from grouping import grouping
>>> grouped = grouping('AbBa', key=str.casefold)
>>> for dct in grouped: print(dct)
...
{'a': ['A']}
{'a': ['A'], 'b': ['b']}
{'a': ['A'], 'b': ['b', 'B']}
{'a': ['A', 'a'], 'b': ['b', 'B']}


This isn't so useful for the concrete sequence, but for this it would be
great:

    for grouped in grouping(data_over_wire()):
        process_partial_groups(grouped)


The implementation need not and should not rely on "pre-grouping" with
itertools.groupby:

    def grouping(iterable, key=None):
        groups = {}
        key = key or (lambda x: x)
        for item in iterable:
            groups.setdefault(key(item), []).append(item)
            yield groups
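
One detail of this generator worth seeing in running code: it yields the same dict object each time, so the intermediate states are only visible while iterating. A small sketch that reuses the function above unchanged:

```python
def grouping(iterable, key=None):
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups   # the same dict object on every iteration

snapshots = list(grouping("AbBa", key=str.casefold))

# Every element of the list is the one shared dict, already in its final state.
all_same = all(s is snapshots[0] for s in snapshots)
final = snapshots[-1]
```

A caller who wants to keep independent partial results would have to copy each one, for example with {k: list(v) for k, v in partial.items()}.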



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread David Mertz

On Tue, Jul 3, 2018 at 2:52 AM Chris Barker via Python-ideas <
python-ideas@python.org> wrote:

> I'd write:
>>
> map(len, words)
>>
>> But I'd also write
>> [len(fullname) for fullname in contacts]
>>
>
> map(lambda name: name.first_name, all_names)
> vs
> [name.first_name for name in all_names]
>
> I really like the comprehension form much better when what you really want
> is a simple expression like an attribute access or index or simple
> calculation, or 
>

Why not `map(attrgetter('first_name'), all_names)`?
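
For comparison, the three spellings side by side. The Name record and the sample data are made up for this illustration; operator.attrgetter is in the standard library:

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type and data, just to have something to map over.
Name = namedtuple("Name", ["first_name", "last_name"])
all_names = [Name("Fred", "Smith"), Name("Mary", "Jones")]

via_lambda = list(map(lambda name: name.first_name, all_names))
via_attrgetter = list(map(attrgetter("first_name"), all_names))
via_comprehension = [name.first_name for name in all_names]
```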

> In [56]: grouping(school_student_list)
> Out[56]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
> 'SchoolC': ['Nancy']}
>

This one case is definitely nice. However...

And here are the examples from the PEP:
> (untested -- I may have missed some brackets, etc)
>

What you've missed, in *several* examples is the value part of the tuple in
your API. You've pulled out the key, and forgotten to include anything in
the actual groups.  I have a hunch that if your API were used, this would
be a common pitfall.

I think this argues against your API and for Michael's that simply deals
with "sequences of groupable things."  That's much more like what one deals
with in SQL, and is familiar that way.  If the things grouped are compound
object such as dictionaries, objects with common attributes, named tuples,
etc. then the list of things in a group usually *does not* want the
grouping attribute removed.

> grouping(((len(word), word) for word in words))
> grouping((name[0], name) for name in names)
> grouping((contact.city, contact) for contact in contacts)
>

Good so far, but a lot of redundancy in always spelling tuple of
`(derived-key, object)`.

> grouping(employee['department'] for employee in employees)
> grouping((os.path.splitext(filepath)[1] for filepath in os.listdir('.')))
> grouping(('debit' if v > 0 else 'credit' for v in transactions))
>

And here you forget about the object itself 3 times in a row (or also
forget some derived "value" that you might want in your other comments).

> grouping(((v, k) for v, k in d.items()))
>

This is nice, and spelled correctly.

> So that was an interesting exercise -- many of those are a bit clearer (or
> more compact) with the key function. But I also notice a pattern -- all
> those examples fit very well into the key function pattern:
>

Yep.

I also think that the row-style "list of data" where you want to discard
the key from the values is nicely spelled (in the PEP) as:

INDEX = 0
grouping(sequence, key=lambda row: row.pop(INDEX))
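
A sketch of what that spelling does, using a minimal stand-in for the proposed grouping() function (not a stdlib API). Note the assumptions: the rows must be mutable (lists, not tuples), and the key function modifies them in place:

```python
INDEX = 0

def grouping(iterable, key=None):
    """Minimal stand-in for the proposed function: dict of lists by key."""
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        # key(item) runs first, so the row has already lost its key column
        # by the time it is appended to the group.
        groups.setdefault(key(item), []).append(item)
    return groups

rows = [["SchoolA", "Fred"], ["SchoolB", "Bob"], ["SchoolA", "Mary"]]
result = grouping(rows, key=lambda row: row.pop(INDEX))
print(result)
print(rows)   # the original rows were mutated as a side effect
```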

> groups = {}
> for item in iterable:
>     groups.setdefault(key(item), []).append(item)
>

I agree this seems better as an implementation.

> I still prefer the class idea over a utility function, because:
> * with a class, you can add stuff to the grouping later:
>
> a_grouping['key'] = value
>
> or maybe a_grouping.add(item)
> * with a class, you can add utility methods -- I kinda liked that in your
> original PEP.
>

I agree still (after all, I proposed it to Michael).  But this seems minor,
and Guido seems not to like `collections` that much (or at least he
commented on not using Counter ... which I personally love to use and to
teach).

That said, a 'grouping()' function seems fine to me also... with a couple
utility functions (that need not be builtin, or even standard library
necessarily) in place of methods.  A lot of what methods would do can
easily be done using comprehensions as well, some examples are shown in the
PEP.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Chris Barker via Python-ideas

On Mon, Jul 2, 2018 at 11:50 PM, Chris Barker  wrote:

> - keep the key function optional parameter.
> - add a value function optional parameter. -- it really makes any case
> where you don't want to store the whole item a lot easier.
>
> - Have the default key function be itemgetter(0) and the default value
> function be itemgetter(1) (or some similar way to implement default support
> for processing an iterable of (key, value) pairs).
>
> Having no value function and an equality default for the key function may
> be "common", but it's also pretty much useless -- why have a useless
> default?
>
> Thinking this through now I do see that having key and value default to
> the pair method means that if you specify a key function, you will probably
> have to specify a value function as well -- so maybe that's not ideal.
>

OK, I prototyped a class solution that defaults to key, value pairs, but
you can specify a key and/or value function, and with convoluted logic, if
you specify just a key, then the value defaults to the entire item. So
folks will pretty much get what they expect.

I think it's actually pretty slick -- best of both worlds?

Code here:

https://github.com/PythonCHB/grouper/blob/master/grouper/grouper.py
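
The defaulting rules described in this message might look roughly like the following. This is a function-based sketch written for illustration, not the code at the linked repository, and the behaviour when only a value function is given is an assumption (the message does not say):

```python
from operator import itemgetter

def grouping(iterable, key=None, value=None):
    """Sketch of the described defaults; not the linked implementation."""
    if key is None and value is None:
        # No functions given: treat the input as (key, value) pairs.
        key, value = itemgetter(0), itemgetter(1)
    elif value is None:
        # Only a key function: store the entire item.
        value = lambda item: item
    elif key is None:
        # Only a value function: ASSUMPTION, take the key from item[0].
        key = itemgetter(0)
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(value(item))
    return groups

pairs = [("SchoolA", "Fred"), ("SchoolB", "Bob"), ("SchoolA", "Mary")]
from_pairs = grouping(pairs)
from_key = grouping("a bb ccc d ee fff".split(), key=len)
```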

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] grouping / dict of lists

2018-07-03 Thread Chris Barker via Python-ideas

On Mon, Jul 2, 2018 at 9:39 AM, Michael Selik  wrote:

> On Mon, Jul 2, 2018 at 2:32 AM Nicolas Rolin 
>>> wrote:
>>>
>>>> For example the default could be such that grouping unpacks tuples (key,
>>>> value) from the iterator and does what's expected with it (group value by
>>>> key). It is quite reasonable, and you have one example with (key, value) in
>>>> your example, and no example with the current default.

>>>
I agree, the default should do something that has some chance of being
useful on its own, and ideally, the most "common" use, if we can identify
that.

> On Mon, Jul 2, 2018 at 3:22 AM Nicolas Rolin 
> wrote:
>
>> My question would be: does it have to be a key function? Can't we just
>> remove the "key" argument?
>>
>
> In the examples or from the parameters? A key function is necessary to
> support a wide variety of uses.
>

not if you have the expectation of an iterable of (key, value) pairs as the
input -- then any processing required to get a different key happens before
hand, allowing folks to use comprehension syntax.

as so: :-)

>> Because for pretty much all the given examples, I would find my default as
>> readable and nearly as short as the "key" syntax :
>>
>> > grouping(words, key=len)
>> grouping((len(word), word for word in words))
>>
>
> I think the fact that you misplaced a closing parenthesis demonstrates how
> the key-function pattern can be more clear.
>

I think it demonstrates that you shouldn't post untested code :-) -- the
missing paren is a syntax error -- it would be caught right away in real
life.

>
>> The code is slightly more verbose, but it is akin to filter(function,
>> iterable) vs (i for i in iterable if function(i)).
>>
>
> Sometimes I prefer ``map`` and sometimes I prefer a list comprehension.
>

That is a "problem" with python: there are two ways to do things like map
and filter, and one way is not always the clearest choice.

But I wonder if map and filter would exist if they didn't pre-date
comprehensions. That aside, the comprehension approach is pretty well
liked and useful. And I almost always prefer it -- an expression is simple on
the eyes to me :-)

But when it's really a win is when you don't have a handy built-in function
to do what you want, even though it's a simple expression.

With the map, filter, key approach, you have to write a bunch of little
utility functions or lambdas, which can really clutter up the code.

> If so, I like to write out the comprehension to provide that extra variable
> name for clarity.
>
> I'd write:
> map(len, words)
>
> But I'd also write
> [len(fullname) for fullname in contacts]
>

A key (pun intended) issue is that passing functions around looks so neat
and clean when it's a simple built-in function like "len" -- but if not,
then it gets uglier, like:

map(lambda name: name.first_name, all_names)

vs

[name.first_name for name in all_names]

I really like the comprehension form much better when what you really want
is a simple expression like an attribute access or index or simple
calculation, or 

> I appreciate that defaulting the grouping key-function to ``itemgetter(0)``
> would enable a pleasant flexibility for people to make that same choice for
> each use. I haven't fully come around to that, yet, because so many other
> tools use the equality function as the default.
>

well, kinda, but maybe those aren't "pythonic" :-)

(and yes, itemgetter() is probably a better option than lambda in many
cases -- but I always forget that exists)

I started out in this topic answering a question about how to do a grouping
for a list of tuples, in that case the OP wanted a comprehension. I don't
think there's a good way to get a direct comprehension, but I do think we
can make a class or function that takes something you could build with a
comprehension.

And I took a look at itertools.groupby, and found it very, very awkward,
ultimately arriving at:

student_school_list = [('Fred', 'SchoolA'),
                       ('Bob', 'SchoolB'),
                       ('Mary', 'SchoolA'),
                       ('Jane', 'SchoolB'),
                       ('Nancy', 'SchoolC')]

grouped = {a: [t[0] for t in b]
           for a, b in groupby(sorted(student_school_list, key=lambda t: t[1]),
                               key=lambda t: t[1])}

{'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC':
['Nancy']}
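
For reference, a runnable version of that groupby spelling next to the plain loop it replaces. Both use only the standard library; the variable names are chosen for this illustration:

```python
from itertools import groupby
from operator import itemgetter

student_school_list = [('Fred', 'SchoolA'),
                       ('Bob', 'SchoolB'),
                       ('Mary', 'SchoolA'),
                       ('Jane', 'SchoolB'),
                       ('Nancy', 'SchoolC')]

by_school = itemgetter(1)

# groupby needs sorted input, and the same key function is passed twice.
via_groupby = {school: [name for name, _ in rows]
               for school, rows in groupby(
                   sorted(student_school_list, key=by_school), key=by_school)}

# The plain-loop equivalent needs neither sorting nor a key function.
via_loop = {}
for name, school in student_school_list:
    via_loop.setdefault(school, []).append(name)
```

Because sorted() is stable, both versions keep the students in their original order within each school.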

So why is that so painful?

("it" is itertools.groupby)

a) it  returns an iterable of tuples, so to get a dict you need to do the
dict comp
b) it requires the data to be sorted -- so you need to sort it first
c) I need to provide a key function to sort by
d) I need to provide (the same, always?) key function to group by
e) when I make the dict, I need to make the list, and use an expression to
get the value I want.
f) because I need those key functions, I need to use lambda for what could
be a simple expression

So the current proposal in the PEP makes that a lot better:

a) It makes a dict, so that step is done
b) It doesn't require the data to
