> It is still unclear to me why we should remember all these tricks (or add
> lots of extra little functions) when this elegantly can be expressed in a
> reduce operation with a simple one line lambda function.

I think you can do that too. KeyValueGroupedDataset has a reduceGroups function. This probably won't be as fast, though, because you end up creating objects, whereas the version I gave will get codegened to operate on binary data the whole way through.
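A minimal sketch of what the reduceGroups route could look like, for anyone following along. The `Sale` case class, the sample data, and the "largest sale per shop" task are hypothetical and chosen only for illustration; they are not from the thread. The lambda is bound to a typed `val` so that the Scala overload of `reduceGroups` is picked unambiguously.

```scala
import org.apache.spark.sql.SparkSession

object ReduceGroupsSketch {
  // Hypothetical record type, used only for this illustration.
  case class Sale(shop: String, amount: Double)

  // Keep the larger of two sales; this is the "one line lambda" of the reduce.
  val pickLarger: (Sale, Sale) => Sale =
    (x, y) => if (x.amount >= y.amount) x else y

  def largestSalePerShop(spark: SparkSession): Map[String, Double] = {
    import spark.implicits._

    // Hypothetical sample data.
    val sales = Seq(Sale("a", 1.0), Sale("a", 3.0), Sale("b", 2.0)).toDS()

    sales
      .groupByKey(_.shop)      // KeyValueGroupedDataset[String, Sale]
      .reduceGroups(pickLarger) // Dataset[(String, Sale)]
      .collect()
      .map { case (shop, sale) => shop -> sale.amount }
      .toMap
  }
}
```

Because the reduce function works on `Sale` objects, each row is deserialized before the lambda runs, which is the object-creation cost mentioned above.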
> The same applies to these Window functions. I had to read it 3 times to
> understand what it all means. Maybe it makes sense for someone who has been
> forced to use such limited tools in sql for many years but that's not
> necessarily what we should aim for. Why can I not just have the sortBy and
> then an Iterator[X] => Iterator[Y] to express what I want to do?

We also have orderBy and mapPartitions.

> All these functions (rank etc.) can be trivially expressed in this, plus I
> can add other operations if needed, instead of being locked in like this
> Window framework.

I agree that window functions would probably not be my first choice for many problems, but for people coming from SQL it was a very popular feature. My real goal is to give as many paradigms as possible in a single unified framework. Let people pick the right mode of expression for any given job :)
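A rough sketch of the "sort, then Iterator[X] => Iterator[Y]" style on a Dataset, for comparison with a window function. Note one substitution: it uses `repartition` plus `sortWithinPartitions` rather than a global `orderBy`, on the assumption that each group should sit entirely inside one partition before the iterator runs. The `Score` and `Ranked` case classes and the sample data are hypothetical, and the numbering is row-number style (ties are not given equal rank).

```scala
import org.apache.spark.sql.SparkSession

object SortedIteratorSketch {
  // Hypothetical record types, used only for this illustration.
  case class Score(team: String, name: String, points: Int)
  case class Ranked(team: String, name: String, points: Int, position: Int)

  def positionsPerTeam(spark: SparkSession): Set[Ranked] = {
    import spark.implicits._

    // Hypothetical sample data.
    val scores = Seq(
      Score("red", "x", 10),
      Score("red", "y", 30),
      Score("blue", "z", 20)
    ).toDS()

    scores
      .repartition($"team")                          // all rows of a team in one partition
      .sortWithinPartitions($"team", $"points".desc) // best score first within each team
      .mapPartitions { rows =>
        // Plain Iterator[Score] => Iterator[Ranked]; state is local to the partition.
        var currentTeam: Option[String] = None
        var position = 0
        rows.map { s =>
          if (!currentTeam.contains(s.team)) {
            currentTeam = Some(s.team)
            position = 0 // restart numbering when a new team begins
          }
          position += 1
          Ranked(s.team, s.name, s.points, position)
        }
      }
      .collect()
      .toSet
  }
}
```

The iterator body is ordinary Scala, so other per-group logic can be swapped in where the position counter is.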