Re: GroupByKey with sorted values within key

2018-05-31 Thread Etienne Chauchot
> > > tensions SortValues, but it doesn’t have
> > > sufficient abstraction for runners.
> > >
> > > I noticed that in DataflowRunner there is a translation of batch GroupByKey
> > > to GroupByKeyAndSortValuesOnly, but is it
> > > being considered to have it in Beam core so that, for example, SparkRunner could
> > > translate “GroupByKey with sorted values
> > > within key” with their internals such as
> > > repartitionAndSortWithinPartitions.
> > > Thank you.
> > > Marek Simunek
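For readers following the thread, the semantics being asked about can be illustrated outside Beam. The following is a minimal plain-Python sketch, not Beam code and not any runner's translation: it groups (key, (secondary key, value)) records by key and orders each group by the secondary key. The function name `group_by_key_sorted` and the sample records are hypothetical.

```python
from collections import defaultdict
from operator import itemgetter


def group_by_key_sorted(records):
    """Group (key, (sort_key, value)) records by key and sort each group by sort_key.

    Hypothetical helper for illustration only; it is not part of any Beam SDK.
    """
    groups = defaultdict(list)
    for key, pair in records:
        groups[key].append(pair)
    # Sort each group's (sort_key, value) pairs by the secondary key.
    return {key: sorted(pairs, key=itemgetter(0)) for key, pairs in groups.items()}


# Made-up sample input: primary key, then (secondary key, value).
records = [
    ("user1", (3, "c")),
    ("user2", (1, "x")),
    ("user1", (1, "a")),
    ("user1", (2, "b")),
]
result = group_by_key_sorted(records)
```

A runner with native support could produce the same per-key ordering during the shuffle instead of sorting each group in memory afterwards, which is the motivation stated in the question.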

Re: GroupByKey with sorted values within key

2018-05-30 Thread Lukasz Cwik
>>>>> specific use case. Users should rely on SortValues as it is the public
>>>>> implementation for sorting.
>>>>>
>>>>> 1:
>>>>> https://github.com/apache/beam/blob/85dcab56268fbac923ffd5885489ee154f097fc5/runners/spark/s

Re: GroupByKey with sorted values within key

2018-05-30 Thread Kenneth Knowles
>>>> bac923ffd5885489ee154f097fc5/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L200
>>>>
>>>> As a side note, it's uncommon that you need to sort all values; usually
>>>> top 100 suffices and can be implemented

Re: GroupByKey with sorted values within key

2018-05-30 Thread David Morávek
>>> m/apache/beam/blob/85dcab56268fbac923ffd5885489ee154f097fc5/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L200
>>>
>>> As a side note, it's uncommon that you need to sort all values; usually
>>> to

Re: GroupByKey with sorted values within key

2018-05-30 Thread Lukasz Cwik
>> much more efficiently with a
>> combiner when compared to sorting.
>>
>> On Wed, May 30, 2018 at 3:38 AM wrote:
>>
>>> Hi,
>>> I have a question. I am trying to do a translation in dsl-euphoria for
>>> “GroupByKey with sorted values withi
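The point about combiners can be sketched in plain Python. This is an illustration only, assuming a combiner shaped as add-input, merge, and extract steps; the function names and N = 3 (standing in for "top 100") are hypothetical, and this is not the Beam API. Each partial accumulator keeps at most N values in a min-heap, so no group is ever fully sorted.

```python
import heapq

N = 3  # hypothetical stand-in for the "top 100" mentioned in the thread


def add_input(acc, value):
    """Fold one value into a bounded min-heap accumulator of size at most N."""
    if len(acc) < N:
        heapq.heappush(acc, value)
    elif value > acc[0]:
        # The smallest retained value is evicted in favour of the larger one.
        heapq.heapreplace(acc, value)
    return acc


def merge_accumulators(accs):
    """Merge partial accumulators, keeping only the N largest values overall."""
    merged = []
    for acc in accs:
        for value in acc:
            add_input(merged, value)
    return merged


def extract_output(acc):
    """Return the retained values, largest first."""
    return sorted(acc, reverse=True)


# Two partial accumulators, as if built on separate workers from made-up values.
part_a = []
for v in [5, 1, 9, 3]:
    add_input(part_a, v)
part_b = []
for v in [7, 8, 2]:
    add_input(part_b, v)

top = extract_output(merge_accumulators([part_a, part_b]))
```

Because each accumulator is bounded by N, memory and shuffle volume per key stay small regardless of how many values the key has, which is where the efficiency gain over a full sort comes from.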

Re: GroupByKey with sorted values within key

2018-05-30 Thread Kenneth Knowles
> sorting.
>
> On Wed, May 30, 2018 at 3:38 AM wrote:
>
>> Hi,
>> I have a question. I am trying to do a translation in dsl-euphoria for
>> “GroupByKey with sorted values within key” to Beam. I am aware of java sdk
>> extensions SortValues, but it doesn’t have sufficient ab

Re: GroupByKey with sorted values within key

2018-05-30 Thread Lukasz Cwik
> Hi,
> I have a question. I am trying to do a translation in dsl-euphoria for
> “GroupByKey with sorted values within key” to Beam. I am aware of java sdk
> extensions SortValues, but it doesn’t have sufficient abstraction for
> runners.
>
> I noticed that in DataflowRunner there is

GroupByKey with sorted values within key

2018-05-30 Thread marek-simunek
Hi, I have a question. I am trying to do a translation in dsl-euphoria for “GroupByKey with sorted values within key” to Beam. I am aware of the Java SDK extension SortValues, but it doesn’t have sufficient abstraction for runners. I noticed that in DataflowRunner there is a translation of batch