Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-09 Thread Franklin? Lee
(Fixing quote and attribution.)

On Fri, Jul 6, 2018, 11:32 Chris Barker - NOAA Federal via
Python-ideas  wrote:
>
> On Jul 6, 2018, at 2:10 AM, Steven D'Aprano  wrote:
>
> > On Fri, Jul 06, 2018 at 09:49:37AM +0100, Cammil Taank wrote:
> > > I would consider statistics
>
> > > to have similarities - median, mean etc are aggregate functions.
>
>
> Not really, more like reduce, actually -- you get a single result.
>
> > > Histograms are also doing something similar to grouping.
>
> > (Yes, a few statistics apply to nominal and ordinal data too,
>
>
> And for that, a generic grouping function could be used.
>
> In fact, allowing Counter to be used as the accumulator was one suggestion in
> this thread, and would build a histogram.
>
> Now that I think about it, you could write a key function that built a 
> histogram for continuous data as well.
>
> Though that might be a bit clunky.
>
> But if someone thinks that’s a good idea, a PR for an example would be 
> accepted:
>
> https://github.com/PythonCHB/grouper

+1 for `collections`, because it's where you look for something
similar to Counter.

-1 for `statistics`, because the need isn't specific to statistics.
It'd be like putting `commonprefix`, which is a general string
operation, into `os.path`. It's hacky to import a domain-specific
module to use one of its non-domain-specific helpers for a different
domain.

Someone can argue for functools, as that's the functional programming
module, containing `reduce`.
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-06 Thread Chris Barker - NOAA Federal via Python-ideas
On Jul 6, 2018, at 2:10 AM, Steven D'Aprano  wrote:

> On Fri, Jul 06, 2018 at 09:49:37AM +0100, Cammil Taank wrote:
> > I would consider statistics
> > to have similarities - median, mean etc are aggregate functions.

Not really, more like reduce, actually -- you get a single result.

> > Histograms are also doing something similar to grouping.

> (Yes, a few statistics apply to nominal and ordinal data too,

And for that, a generic grouping function could be used.

In fact, allowing Counter to be used as the accumulator was one suggestion
in this thread, and would build a histogram.

Now that I think about it, you could write a key function that built a
histogram for continuous data as well.

Though that might be a bit clunky.

But if someone thinks that’s a good idea, a PR for an example would be
accepted:

https://github.com/PythonCHB/grouper
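A minimal sketch of both ideas, using only the standard library: `collections.Counter` already acts as a counting accumulator for nominal data, and a binning key function turns continuous values into histogram buckets. The `bin_key` helper, the 0.5 bin width, and the sample data are assumptions for illustration; this is not code from the grouper repository.

```python
import math
from collections import Counter

def bin_key(width):
    """Hypothetical helper: return a key function mapping a number to the
    lower edge of its histogram bin of the given width."""
    def key(x):
        return math.floor(x / width) * width
    return key

# Nominal data: counting items per key is already a histogram.
colors = ["red", "blue", "red", "green", "blue", "red"]
color_hist = Counter(colors)

# Continuous data: bin each value with a key function, then count the bins.
readings = [0.2, 0.7, 1.1, 1.4, 1.9, 2.5, 0.9]
reading_hist = Counter(map(bin_key(0.5), readings))
```

Here `reading_hist` maps each bin's lower edge to how many readings fell into it; empty bins simply do not appear.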

-CHB








Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-06 Thread Steven D'Aprano
On Fri, Jul 06, 2018 at 09:49:37AM +0100, Cammil Taank wrote:

> I would consider statistics
> to have similarities - median, mean etc are aggregate functions. Histograms
> are also doing something similar to grouping.

I was thinking the same thing, but I don't think it is a good fit. 
Grouping records with arbitrary structure is very different from the 
numerically-focused statistics module. (Yes, a few statistics apply to 
nominal and ordinal data too, but the primary focus is on numbers.)



-- 
Steve


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-06 Thread Cammil Taank
The way I see grouping is as an aggregation operation. As such, in my head,
grouping is similar to min/max. However, if builtins are a no-go, then I
feel I need to think a little outside the box:

Is it likely that many more aggregate functions will be wanted in the near
future? Is there a case for collecting aggregate functions into another
top-level module? Also, I would consider statistics
to have similarities - median, mean etc are aggregate functions. Histograms
are also doing something similar to grouping.
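To make the "group, then aggregate" view concrete, here is a sketch that groups records into a plain dict of lists and then reduces each group with aggregate functions (`min`, `max`, `statistics.mean`). The `group_by` helper and the `records` data are made up for illustration.

```python
import statistics

def group_by(iterable, key):
    """Hypothetical helper: return a dict mapping key(item) -> list of items."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups

# Illustrative (made-up) records: (city, temperature)
records = [("Oslo", 3), ("Lima", 19), ("Oslo", -1), ("Lima", 23), ("Oslo", 4)]

by_city = group_by(records, key=lambda rec: rec[0])

# Each aggregate reduces one group to a single value, much like reduce().
summary = {
    city: {
        "min": min(t for _, t in recs),
        "max": max(t for _, t in recs),
        "mean": statistics.mean(t for _, t in recs),
    }
    for city, recs in by_city.items()
}
```

Grouping keeps every record; the aggregation step is what collapses each group to one number.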

Apologies that I have not offered any concrete suggestions; I just thought I
should offer my thoughts.



Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-05 Thread Chris Barker via Python-ideas
On Thu, Jul 5, 2018 at 3:26 AM, David Mertz  wrote:

> Yes, he said a definite no to a built-in. But he expressed a less specific
> lack of enthusiasm for collections classes (including Counter, which exists
> and which I personally use often).
>

And a Grouping class would do more than Counter, which I find trivial
enough that I generally don't bother to use it.
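A sketch of the difference, assuming a grouping is simply a dict of lists: `Counter` keeps only how many items share a key, while a grouping keeps the items themselves, so counts can be derived from a grouping but not the other way round. The `words` data is made up.

```python
from collections import Counter, defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry", "apricot"]

def first_letter(word):
    return word[0]

# Counter: only the number of items per key survives.
counts = Counter(map(first_letter, words))

# Grouping (dict of lists): the items themselves are kept per key.
groups = defaultdict(list)
for w in words:
    groups[first_letter(w)].append(w)
groups = dict(groups)

# The counts are recoverable from the grouping, but not vice versa.
derived_counts = {k: len(v) for k, v in groups.items()}
```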

-CHB
-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-05 Thread David Mertz
Yes, he said a definite no to a built-in. But he expressed a less specific
lack of enthusiasm for collections classes (including Counter, which exists
and which I personally use often).

On Thu, Jul 5, 2018, 1:16 AM Chris Barker  wrote:

> On Tue, Jul 3, 2018 at 6:23 AM, David Mertz  wrote:
>
>> Guido said he has muted this discussion
>>
>
> ...
>
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.
>
> I don't recall him offering an opinion on a class in collections, did he?
>
> -CHB


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Chris Barker via Python-ideas
On Tue, Jul 3, 2018 at 6:23 AM, David Mertz  wrote:

> Guido said he has muted this discussion
>

...

But before putting it on auto-archive, the BDFL said (1) NO GO on getting a
new builtin; (2) NO OBJECTION to putting it in itertools.

I don't recall him offering an opinion on a class in collections, did he?

-CHB





Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Chris Barker via Python-ideas
On Wed, Jul 4, 2018 at 3:53 AM, INADA Naoki  wrote:

> But if it happens, I'm -1 on functools and collections.
> They are used very much.  Every Python tool imports them regardless of how
> much of their contents are used.
>

really? collections?  what for? I'm guessing namedtuple and maybe deque.

But collections already has 9 classes (well, things) in it so we'd be
adding a bit less than 10% more to it.

what is the concern? import time, memory?

In either case, it seems like the wrong driver for deciding where to put
new things.

> If you really want to add it in collections, I suggest
> `from collections.groupdict import GroupDict`.

Perhaps the stdlib should have deeper namespaces in general -- if that is
established as a policy, then this could be the first thing to follow that
policy. But I thought "flat is better than nested" -- sigh.

So maybe we need to bite the bullet and solve the problem at another level:

1) if, say, namedtuple has gotten very popular, maybe it should move to
builtins.

2) Whatever happened to the proposals to make it easier to lazy-load stuff
in modules? If that gets implemented, then we can speed up startup in
general, and not have to be too worried about adding "too much" to a module
because one thing in it is common use.
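For point 2, one mechanism that does exist as of Python 3.7 is a module-level `__getattr__` (PEP 562), which lets a module defer importing a rarely used name until it is first accessed. A minimal sketch follows; a real module would just define `def __getattr__(name):` at top level, whereas here the function is attached to a throwaway `types.ModuleType` so the example is self-contained. The module name `lazy_demo` and the `_LAZY` table are hypothetical, and this says nothing about how the stdlib itself would use it.

```python
import importlib
import types

# Hypothetical module object; in a real file this would be the module itself.
lazy_demo = types.ModuleType("lazy_demo")

# Attribute name -> module that actually provides it (assumed mapping).
_LAZY = {"OrderedDict": "collections", "dedent": "textwrap"}

def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails (PEP 562)."""
    if name in _LAZY:
        value = getattr(importlib.import_module(_LAZY[name]), name)
        setattr(lazy_demo, name, value)  # cache it so later lookups skip this hook
        return value
    raise AttributeError(f"module {lazy_demo.__name__!r} has no attribute {name!r}")

# Stored in the module's __dict__, which is where PEP 562 looks for the hook.
lazy_demo.__getattr__ = _module_getattr
```

The first access to `lazy_demo.OrderedDict` triggers the import; after that the name sits in the module namespace like any other attribute.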

-CHB



Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Michael Selik
On Wed, Jul 4, 2018, 3:11 AM Ivan Levkivskyi  wrote:

> Replying to the question in subject, I think it would be better in
> collections as a class.
> Having it just as a function doesn't  buy much, because one can do the
> same with three lines and a defaultdict.
>

Four lines. You'll need to convert from defaultdict back to a basic dict to
avoid mistaken inserts, at least for some use cases.
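A sketch of the "four lines" in question, assuming the goal is a dict of lists keyed by `key(item)`: build with a `defaultdict`, then return a plain `dict`, because a `defaultdict` silently inserts an empty group whenever a missing key is looked up. The function name `group` is illustrative only.

```python
from collections import defaultdict

def group(iterable, key):
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)  # plain dict: a missing key raises KeyError instead of being inserted

# Why the conversion matters: a defaultdict "mistaken insert".
dd = defaultdict(list)
dd["a"].append(1)
_ = dd["typo"]  # only a lookup, but it silently adds an empty group
```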


> However, if this is a class it can support adding new elements, merge the
> groupeddicts, etc.
>
> --
> Ivan
>


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread INADA Naoki
I'm -1 on adding it to the stdlib.

But if it happens, I'm -1 on functools and collections.
They are used very much.  Every Python tool imports them regardless of how much
of their contents are used.

On the other hand, itertools contains miscellaneous tools that are very rarely used.

If you really want to add it in collections, I suggest
`from collections.groupdict import GroupDict`.

Regards,

On Tue, Jul 3, 2018 at 10:23 PM David Mertz  wrote:

> Guido said he has muted this discussion, so it's probably not reaching
> him.  It took one thousand fewer messages for him to stop following this
> than with PEP 572, for some reason :-).
>
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.
>
> My problem with the second idea is that *I* find it very wrong to have
> something in itertools that does not return an iterator.  It wrecks the
> combinatorial algebra of the module.
>
> That said, it's easy to fix... and I believe independently useful.  Just
> make grouping() a generator function rather than a plain function.  This
> lets us get an incremental grouping of an iterable.  This can be useful if
> the iterable is slow or infinite, but the partial groupings are useful in
> themselves.
>
> Python 3.7.0 (default, Jun 28 2018, 07:39:16)
> [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from grouping import grouping
> >>> grouped = grouping('AbBa', key=str.casefold)
> >>> for dct in grouped: print(dct)
> ...
> {'a': ['A']}
> {'a': ['A'], 'b': ['b']}
> {'a': ['A'], 'b': ['b', 'B']}
> {'a': ['A', 'a'], 'b': ['b', 'B']}
>
>
> This isn't so useful for the concrete sequence, but for this it would be
> great:
>
>     for grouped in grouping(data_over_wire()):
>         process_partial_groups(grouped)
>
>
> The implementation need not and should not rely on "pre-grouping" with
> itertools.groupby:
>
>     def grouping(iterable, key=None):
>         groups = {}
>         key = key or (lambda x: x)  # default key: the item itself
>         for item in iterable:
>             groups.setdefault(key(item), []).append(item)
>             yield groups  # the same dict, grown by one item
>
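One property of the generator above is worth spelling out: it yields the same dict object every time, mutated in place. Printing each value as it arrives (as in the REPL session) shows snapshots, but collecting the values with `list()` gives several references to the final dict. The `grouping_snapshots` variant below, which copies each partial result, is a hypothetical illustration, not something proposed in the thread.

```python
def grouping(iterable, key=None):
    # The generator sketch from the message, reproduced so this block runs alone.
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups

def grouping_snapshots(iterable, key=None):
    # Hypothetical variant: yield an independent copy after each item.
    for groups in grouping(iterable, key):
        yield {k: list(v) for k, v in groups.items()}

aliased = list(grouping("AbBa", key=str.casefold))
snapshots = list(grouping_snapshots("AbBa", key=str.casefold))
```

All four entries of `aliased` are the one final dict, while `snapshots` preserves the intermediate states.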


-- 
INADA Naoki  


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Ivan Levkivskyi
On 4 July 2018 at 11:25, Steven D'Aprano  wrote:

> On Wed, Jul 04, 2018 at 11:08:05AM +0100, Ivan Levkivskyi wrote:
> > Replying to the question in subject, I think it would be better in
> > collections as a class.
> > Having it just as a function doesn't buy much, because one can do the same
> > with three lines and a defaultdict.
> > However, if this is a class it can support adding new elements, merge the
> > groupeddicts, etc.
>
> defaultdicts support adding new elements, and they have an update method
> same as regular dicts :-)
>

Except that updating will not do what I want: merging two groupeddicts is
not just `one.update(other)`.
Moreover, using just an update as with regular dicts is bug-prone: it will
add every group from `other` as an element of the corresponding group in `one`.
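A small sketch of the distinction, assuming groupings are plain dicts of lists: `dict.update` replaces a shared key's whole group, a naive "add each value as an element" merge nests the other group as a single element, and a correct merge extends the lists. Both `naive_merge` and `merge_groups` are hypothetical helpers.

```python
def naive_merge(one, other):
    # Treats each group in `other` as a single new element: the nesting bug.
    for key, group in other.items():
        one.setdefault(key, []).append(group)
    return one

def merge_groups(one, other):
    # Concatenates groups that share a key; new keys get their own fresh list.
    for key, group in other.items():
        one.setdefault(key, []).extend(group)
    return one

def fresh():
    return {"x": [1, 2], "y": [3]}

other = {"x": [4], "z": [5]}

replaced = fresh()
replaced.update(other)  # {'x': [4], 'y': [3], 'z': [5]}: items 1 and 2 are lost
nested = naive_merge(fresh(), other)
merged = merge_groups(fresh(), other)
```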

--
Ivan


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Steven D'Aprano
On Wed, Jul 04, 2018 at 11:08:05AM +0100, Ivan Levkivskyi wrote:
> Replying to the question in subject, I think it would be better in
> collections as a class.
> Having it just as a function doesn't  buy much, because one can do the same
> with three lines and a defaultdict.
> However, if this is a class it can support adding new elements, merge the
> groupeddicts, etc.

defaultdicts support adding new elements, and they have an update method 
same as regular dicts :-)



-- 
Steve


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-04 Thread Ivan Levkivskyi
Replying to the question in subject, I think it would be better in
collections as a class.
Having it just as a function doesn't buy much, because one can do the same
with three lines and a defaultdict.
However, if this is a class it can support adding new elements, merging the
groupeddicts, etc.
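To illustrate what a class could offer over a function, here is a minimal sketch of a hypothetical `Grouping` dict subclass with an `add` method and a list-extending `merge`. The class name, methods, and behavior are assumptions for illustration, not a settled design from this thread.

```python
class Grouping(dict):
    """Hypothetical dict of lists: maps key(item) -> list of items."""

    def __init__(self, iterable=(), key=None):
        super().__init__()
        self.key = key if key is not None else (lambda x: x)
        for item in iterable:
            self.add(item)

    def add(self, item):
        """Put one item into the group selected by the key function."""
        self.setdefault(self.key(item), []).append(item)

    def merge(self, other):
        """Combine another Grouping (or dict of lists) group by group."""
        for k, items in other.items():
            self.setdefault(k, []).extend(items)
        return self

fruit = Grouping(["apple", "banana", "avocado"], key=lambda w: w[0])
fruit.add("cherry")
fruit.merge({"b": ["blueberry"]})
```

Because the key function is stored on the instance, later `add` calls group new items consistently with the original ones.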

--
Ivan


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas
On Tue, Jul 3, 2018 at 12:01 PM, David Mertz  wrote:

> ... but I STILL like a new collections.Grouping (or collections.Grouper)
> the best.
>

me too.


> It might overcome Guido's reluctance... and what goes there is really
> delegated by him, not his own baby.
>

Is collections anyone in particular's baby, the way itertools "belongs" to
Raymond?

-CHB









Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz
I admit a hypothetical itertools.grouping that returned incrementally built
dictionaries doesn't fill any simple need I have often encountered.  I can
be hand-wavy about "stateful bucketing of streams" and looking at
windowing/tails, but I don't have a clean and simple example where I need
this.  The "run to exhaustion" interface has more obvious uses (albeit,
they *must* be technically a subset of the incremental ones).

I think I will also concede that an incrementally built and yielded
dictionary isn't *really* in the spirit of itertools either.  I suppose
tee() can grow unboundedly if only one tine is utilized... but in general,
itertools is meant to provide iterators that keep memory usage limited to a
few elements at a time (yes, groupby, takewhile, or dropwhile
have pathological cases that could be unbounded... but usually they're not).

So maybe we really do need a dicttools or mappingtools module, with this as
the first function to put inside it.

... but I STILL like a new collections.Grouping (or collections.Grouper)
the best.  It might overcome Guido's reluctance... and what goes there is
really delegated by him, not his own baby.



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas
It seems a really stupid reason to make this choice, but:

If we make a Grouping class, it has an obvious home in the collections
module

If we make a grouping (or grouped) function, we don't know where to put it

But since I like the Grouping class idea anyway, it's one more reason...

-CHB







Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Chris Barker via Python-ideas
On Tue, Jul 3, 2018 at 8:24 AM, Steven D'Aprano  wrote:

> On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:
>


> > My problem with the second idea is that *I* find it very wrong to have
> > something in itertools that does not return an iterator.  It wrecks the
> > combinatorial algebra of the module.
>

hmm -- that seems to be a pretty pedantic approach -- practicality beats
purity, after all :-)

I think we should first decide if a grouping() function is a useful
addition to the standard library (after all: "not every two-line function
needs to be in the stdlib"), and if so, then we can find a home for it.

Personally, I'm wondering if a "dicttools" module or something similar would make
sense -- I imagine there are all sorts of other handy utilities for working
with dicts that could go there. (though, yeah, we'd want to actually have a
handful of these before creating a new module :-) )

> > That said, it's easy to fix... and I believe independently useful.  Just
> > make grouping() a generator function rather than a plain function.  This
> > lets us get an incremental grouping of an iterable.
>
> We already have something which lazily groups an iterable, returning
> groups as they are seen: groupby.
>
> What makes grouping() different from groupby() is that it accumulates
> ALL of the subgroups rather than just consecutive subgroupings.


Well, yeah, but it won't actually get you those until you exhaust the
iterator -- so while it's different from itertools.groupby, is it different
from itertools.groupby(sorted(iterable))?

In short, this wouldn't really solve the problems that itertools.groupby
has for this sort of task -- so what's the point?
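A sketch of the comparison, assuming sortable keys: `itertools.groupby(sorted(data, key=key), key=key)` produces the same groups as an accumulate-everything grouping, but it costs a sort, orders groups by key rather than by first appearance, and fails outright when keys cannot be ordered. The `grouped` helper and the sample data are hypothetical.

```python
from itertools import groupby

def grouped(iterable, key):
    """Hypothetical eager grouping: dict of lists in first-seen key order."""
    groups = {}
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
    return groups

data = ["pear", "fig", "plum", "kiwi", "apple"]

via_sort = {k: list(g) for k, g in groupby(sorted(data, key=len), key=len)}
via_dict = grouped(data, key=len)

# Keys that cannot be ordered break the sorted() route but not the dict one.
mixed = [1, "1", 2, "2"]
try:
    sorted(mixed, key=type)
    sortable = True
except TypeError:
    sortable = False  # type objects do not support "<"
by_type = grouped(mixed, key=type)
```

The two dicts hold the same groups, but `via_sort` iterates keys as 3, 4, 5 while `via_dict` keeps first-seen order 4, 3, 5.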

> As for where it belongs, perhaps the collections module is the least
> worst fit.

That depends some on whether we go with a simple function, in which case
collections is a pretty bad fit (but maybe still the least bad).

Personally I still like the idea of having this be a special type of dict,
rather than "just a function" -- and then it's really obvious where to put
it :-)

-CHB




Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Steven D'Aprano
On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:

> But before putting it on auto-archive, the BDFL said (1) NO GO on getting a
> new builtin; (2) NO OBJECTION to putting it in itertools.
> 
> My problem with the second idea is that *I* find it very wrong to have
> something in itertools that does not return an iterator.  It wrecks the
> combinatorial algebra of the module.

That seems like a reasonable objection to me.


> That said, it's easy to fix... and I believe independently useful.  Just
> make grouping() a generator function rather than a plain function.  This
> lets us get an incremental grouping of an iterable.

We already have something which lazily groups an iterable, returning 
groups as they are seen: groupby.

What makes grouping() different from groupby() is that it accumulates 
ALL of the subgroups rather than just consecutive subgroupings. To make 
it clear with a simulated example (ignoring the keys for brevity):

groupby("aaAAbbCaAB", key=str.upper)
=> groups "aaAA", "bb", "C", "aA", "B"

grouping("aaAAbbCaAB", key=str.upper)
=> groups "aaAAaA", "bbB", "C"

So grouping() cannot even begin returning values until it has processed 
the entire data set. In that regard, it is like sorted() -- it cannot be 
lazy; it is a fundamentally eager operation.
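The simulated example can be checked with the real itertools.groupby. As a sketch (not part of any proposal in the thread): groupby() alone gives the consecutive runs, while sorting first, an eager step just like sorted(), yields the full grouping. Because sorted() is stable, items keep their original relative order inside each group. Note this sort-based trick requires orderable keys, which a dict-based grouping does not.

```python
from itertools import groupby

data = "aaAAbbCaAB"

# groupby() is lazy: it only merges *consecutive* items with equal keys.
consecutive = ["".join(g) for _, g in groupby(data, key=str.upper)]
print(consecutive)  # ['aaAA', 'bb', 'C', 'aA', 'B']

# Sorting first consumes the whole input (eager), but since sorted() is
# stable, each group preserves the original order of its items.
all_groups = [
    "".join(g)
    for _, g in groupby(sorted(data, key=str.upper), key=str.upper)
]
print(all_groups)  # ['aaAAaA', 'bbB', 'C']
```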

I propose that a better name which indicates the non-lazy nature of this 
function is *grouped* rather than grouping, like sorted().
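A minimal sketch of what an eager, sorted()-style `grouped()` might look like, assuming it returns a plain dict of lists. The name follows the proposal above, but the signature and return type here are illustrative assumptions, not an agreed API.

```python
from collections import defaultdict


def grouped(iterable, key=None):
    """Hypothetical eager grouping helper, named by analogy with sorted().

    Consumes the whole iterable and returns a plain dict mapping each key
    to the list of items with that key, in their original order.
    """
    if key is None:
        key = lambda x: x  # default: group equal items together
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)  # hand back a plain dict, not the defaultdict


print(grouped("aaAAbbCaAB", key=str.upper))
# {'A': ['a', 'a', 'A', 'A', 'a', 'A'], 'B': ['b', 'b', 'B'], 'C': ['C']}
```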

As for where it belongs, perhaps the collections module is the least 
worst fit.


> This can be useful if
> the iterable is slow or infinite, but the partial groupings are useful in
> themselves.

Under what circumstances would the partial groupings be useful? Given 
the example above:

grouping("aaAAbbCaAB", key=str.upper)

when would you want to see the accumulated partial groups?

# again, ignoring the keys for brevity
"aaAA"
"aaAA", "bb"
"aaAA", "bb", "C"
"aaAAaA", "bb", "C"
"aaAAaA", "bbB", "C"


I don't see any practical use for this -- if you start processing the 
partial groupings immediately, you end up double-processing some 
of the items; if you wait until the last, what's the point of the 
intermediate values?
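To make the double-processing concrete, here is a small sketch that reuses David Mertz's generator version of grouping() (reproduced so the block runs on its own) and counts how many items a naive consumer handles if it processes every accumulated snapshot. Since each snapshot is one item larger than the last, the work grows quadratically with the input.

```python
def grouping(iterable, key=None):
    # Generator version from the thread: yields the accumulated groups
    # after every item.
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups


data = "aaAAbbCaAB"
processed = 0
for snapshot in grouping(data, key=str.upper):
    # A naive consumer re-handles every item in every snapshot.
    processed += sum(len(items) for items in snapshot.values())

print(len(data), processed)  # 10 55  (1 + 2 + ... + 10)
```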

As you say yourself:

> This isn't so useful for the concrete sequence, but for this it would be
> great:
> 
> for grouped in grouping(data_over_wire()):
>     process_partial_groups(grouped)

And that demonstrates exactly why this would be a terrible bug magnet, 
suckering people into doing what you just did, and ending up processing 
values more than once.

To avoid that, your process_partial_groups would need to remember which 
values it has seen before for each key it has seen before.
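A sketch of that bookkeeping, assuming each group's list is only ever appended to (true for the generator version of grouping() in this thread, reproduced here). The helper name `new_items_only` is hypothetical. It remembers, per key, how many items it has already handled, and emits only the new ones. Note that it still scans every key on every snapshot, and what it recovers is essentially the original stream of (key, item) pairs, which rather supports the point that the intermediate snapshots add little.

```python
def grouping(iterable, key=None):
    # Generator version from the thread: yields the accumulated groups.
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups


def new_items_only(snapshots):
    """Hypothetical helper: yield (key, item) only for items not yet seen."""
    seen = {}  # key -> number of items already handled for that key
    for snapshot in snapshots:
        for k, items in snapshot.items():
            start = seen.get(k, 0)
            for item in items[start:]:
                yield k, item
            seen[k] = len(items)


pairs = list(new_items_only(grouping("aaAAbbCaAB", key=str.upper)))
print("".join(item for _, item in pairs))  # aaAAbbCaAB
```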


-- 
Steve


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread Michael Selik
I'd prefer to simply write an example for the documentation or clarify the
existing ones, then add good answers to StackOverflow questions.


On Tue, Jul 3, 2018, 6:23 AM David Mertz  wrote:

> Guido said he has mooted this discussion, so it's probably not reaching
> him.  It took one thousand fewer messages for him to stop following this
> than with PEP 572, for some reason :-).
>
> But before putting it on auto-archive, the BDFL said (1) NO GO on getting
> a new builtin; (2) NO OBJECTION to putting it in itertools.
>


Re: [Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz
On Tue, Jul 3, 2018 at 9:23 AM David Mertz  wrote:

> Guido said he has mooted this discussion, so it's probably not reaching
> him.
>

I meant 'muted'.  Hopefully he hasn't 'mooted' it.

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


[Python-ideas] Where should grouping() live (was: grouping / dict of lists)

2018-07-03 Thread David Mertz
Guido said he has mooted this discussion, so it's probably not reaching
him.  It took one thousand fewer messages for him to stop following this
than with PEP 572, for some reason :-).

But before putting it on auto-archive, the BDFL said (1) NO GO on getting a
new builtin; (2) NO OBJECTION to putting it in itertools.

My problem with the second idea is that *I* find it very wrong to have
something in itertools that does not return an iterator.  It wrecks the
combinatorial algebra of the module.

That said, it's easy to fix... and I believe independently useful.  Just
make grouping() a generator function rather than a plain function.  This
lets us get an incremental grouping of an iterable.  This can be useful if
the iterable is slow or infinite, but the partial groupings are useful in
themselves.

Python 3.7.0 (default, Jun 28 2018, 07:39:16)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from grouping import grouping
>>> grouped = grouping('AbBa', key=str.casefold)
>>> for dct in grouped: print(dct)
...
{'a': ['A']}
{'a': ['A'], 'b': ['b']}
{'a': ['A'], 'b': ['b', 'B']}
{'a': ['A', 'a'], 'b': ['b', 'B']}


This isn't so useful for the concrete sequence, but for this it would be
great:

for grouped in grouping(data_over_wire()):
    process_partial_groups(grouped)


The implementation need not and should not rely on "pre-grouping" with
itertools.groupby:

def grouping(iterable, key=None):
    groups = {}
    key = key or (lambda x: x)    # default: group items by themselves
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups              # note: yields the same dict object each time
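One subtlety worth flagging in the generator above: it yields the *same* dict every time, so printing inside the loop (as in the session shown) works, but collecting the snapshots with list() gives several references to the final state. The sketch below demonstrates this; `grouping_copies` is a hypothetical variant, not something proposed in the thread, that yields independent snapshots at the cost of copying on every item (quadratic overall).

```python
def grouping(iterable, key=None):
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield groups


snapshots = list(grouping("AbBa", key=str.casefold))
# Every element is the same dict object, so all show the final state.
print(all(s is snapshots[-1] for s in snapshots))  # True
print(snapshots[0])  # {'a': ['A', 'a'], 'b': ['b', 'B']}


def grouping_copies(iterable, key=None):
    """Hypothetical variant: yield an independent copy of each snapshot."""
    groups = {}
    key = key or (lambda x: x)
    for item in iterable:
        groups.setdefault(key(item), []).append(item)
        yield {k: list(v) for k, v in groups.items()}


print(list(grouping_copies("AbBa", key=str.casefold))[0])  # {'a': ['A']}
```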


