On Tue, Jul 03, 2018 at 09:23:07AM -0400, David Mertz wrote:

> But before putting it on auto-archive, the BDFL said (1) NO GO on
> getting a new builtin; (2) NO OBJECTION to putting it in itertools.
>
> My problem with the second idea is that *I* find it very wrong to have
> something in itertools that does not return an iterator. It wrecks the
> combinatorial algebra of the module.
That seems like a reasonable objection to me.

> That said, it's easy to fix... and I believe independently useful. Just
> make grouping() a generator function rather than a plain function. This
> lets us get an incremental grouping of an iterable.

We already have something which lazily groups an iterable, returning
groups as they are seen: groupby. What makes grouping() different from
groupby() is that it accumulates ALL of the subgroups, not just the
consecutive ones.

To make it clear with a simulated example (ignoring the keys for
brevity):

    groupby("aaAAbbCaAB", key=str.upper)
    => groups "aaAA", "bb", "C", "aA", "B"

    grouping("aaAAbbCaAB", key=str.upper)
    => groups "aaAAaA", "bbB", "C"

So grouping() cannot even begin returning values until it has processed
the entire data set. In that regard it is like sorted() -- it cannot be
lazy, it is a fundamentally eager operation.

I propose that a better name, one which indicates the non-lazy nature
of this function, would be *grouped* rather than grouping, by analogy
with sorted(). As for where it belongs, perhaps the collections module
is the least-worst fit.

> This can be useful if the iterable is slow or infinite, but the
> partial groupings are useful in themselves.

Under what circumstances would the partial groupings be useful? Given
the example above:

    grouping("aaAAbbCaAB", key=str.upper)

when would you want to see the accumulated partial groups?

    # again, ignoring the keys for brevity
    "aaAA"
    "aaAA", "bb"
    "aaAA", "bb", "C"
    "aaAAaA", "bb", "C"
    "aaAAaA", "bbB", "C"

I don't see any practical use for this -- if you start processing the
partial groupings immediately, you end up double-processing some of the
items; if you wait until the last one, what's the point of the
intermediate values? As you say yourself:

> This isn't so useful for the concrete sequence, but for this it would
> be great:
>
>     for grouped in grouping(data_over_wire()):
>         process_partial_groups(grouped)

And that demonstrates exactly why this would be a terrible bug magnet,
suckering people into doing exactly what you just did, and ending up
processing values more than once. To avoid that, your
process_partial_groups would need to remember which values it has
already seen for each key it has seen before.

--
Steve
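To make the difference concrete, here is a rough sketch of the eager
behaviour described above. The name grouped() and the dict-of-lists
return value are only illustrative, not a settled design:

    from collections import defaultdict
    from itertools import groupby

    def grouped(iterable, key=None):
        # Eager: consume the whole iterable, then return every group.
        key = key if key is not None else (lambda x: x)
        groups = defaultdict(list)
        for item in iterable:
            groups[key(item)].append(item)
        return dict(groups)

    data = "aaAAbbCaAB"

    # groupby is lazy and only merges *consecutive* equal keys:
    print(["".join(g) for _, g in groupby(data, key=str.upper)])
    # -> ['aaAA', 'bb', 'C', 'aA', 'B']

    # grouped() cannot return anything until it has seen everything:
    print(["".join(v) for v in grouped(data, key=str.upper).values()])
    # -> ['aaAAaA', 'bbB', 'C']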
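And a sketch of why the incremental version is a trap: if grouping()
yields the accumulated groups after each item (one plausible reading of
the proposal, not necessarily what David intends), a naive consumer
ends up handling the early items over and over:

    from collections import defaultdict

    def grouping(iterable, key=None):
        # Incremental sketch: yield the accumulated groups after each item.
        key = key if key is not None else (lambda x: x)
        groups = defaultdict(list)
        for item in iterable:
            groups[key(item)].append(item)
            yield groups

    processed = 0
    for partial in grouping("aaAAbbCaAB", key=str.upper):
        for values in partial.values():
            processed += len(values)   # "process" every value seen so far

    # processed is 55, not 10: each early item was reprocessed on every
    # later yield, unless the consumer tracks what it has already seen.
    print(processed)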