Re: [PySpark] order of values in GroupByKey()

2014-08-22 Thread Matthew Farrellee
You can use kv.mapValues(sorted), but that's definitely less efficient than sorting during the groupBy.

You could try using combineByKey directly with heapq...

    from heapq import heapify, heappush, merge

    def createCombiner(x):
        return [x]

    def mergeValues(xs, x):
        heappush(xs, x)
        return xs

    def
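The combineByKey suggestion in this message is cut off in the archive, so what follows is only a sketch of one way such combiners could look, not the author's code. It swaps heappush for bisect.insort, on the assumption that each per-key list should stay fully sorted (a heap-ordered list is not a sorted list, so heapq.merge would not be safe on it). The names create_combiner, merge_value, merge_combiners and the local helper simulate_combine_by_key are hypothetical; the helper only imitates the per-partition-then-merge flow so the example runs without a Spark cluster.

```python
from bisect import insort
from heapq import merge


def create_combiner(x):
    # First value seen for a key within a partition: start a one-element list.
    return [x]


def merge_value(xs, x):
    # Insert x into the already-sorted list xs, keeping it sorted.
    insort(xs, x)
    return xs


def merge_combiners(a, b):
    # Both inputs are sorted lists, so heapq.merge yields a sorted result.
    return list(merge(a, b))


def simulate_combine_by_key(partitions):
    # Hypothetical local stand-in for
    #   rdd.combineByKey(create_combiner, merge_value, merge_combiners)
    # Step 1: combine values within each partition.
    per_partition = []
    for part in partitions:
        combined = {}
        for key, value in part:
            if key in combined:
                combined[key] = merge_value(combined[key], value)
            else:
                combined[key] = create_combiner(value)
        per_partition.append(combined)
    # Step 2: merge the per-partition combiners for each key.
    result = {}
    for combined in per_partition:
        for key, xs in combined.items():
            result[key] = merge_combiners(result[key], xs) if key in result else xs
    return result


# Two made-up partitions of (key, value) pairs.
partitions = [[("a", 3), ("b", 2), ("a", 1)], [("a", 2), ("b", 1)]]
result = simulate_combine_by_key(partitions)
```

With a real RDD of pairs, the same three functions would be passed to combineByKey in that order. Note that insort costs linear time per insertion, so for large groups it may be no faster than sorting once at the end.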

Re: [PySpark] order of values in GroupByKey()

2014-08-22 Thread Arpan Ghosh
I was grouping time series data by a key. I want the values to be sorted by timestamp after the grouping.

On Fri, Aug 22, 2014 at 7:26 PM, Matthew Farrellee wrote:
> On 08/22/2014 04:32 PM, Arpan Ghosh wrote:
>
>> Is there any way to control the ordering of values for each key during a
>> group
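For the time-series case described in this message, a minimal sketch of sorting each key's values by timestamp after grouping could look like the following. It assumes each value is a (timestamp, reading) tuple, which the thread does not state; the helper name sort_by_timestamp and the sample data are hypothetical.

```python
from operator import itemgetter


def sort_by_timestamp(values):
    # Assumes each value is a (timestamp, reading) tuple; sort on the timestamp.
    return sorted(values, key=itemgetter(0))


# Made-up readings for a single key, in arbitrary arrival order.
readings = [(1408750000, 2.5), (1408740000, 1.0), (1408745000, 3.0)]
ordered = sort_by_timestamp(readings)

# With a PySpark RDD of (key, (timestamp, reading)) pairs, the same helper
# could be applied per key after grouping:
#   rdd.groupByKey().mapValues(sort_by_timestamp)
```

This is the simple sort-after-grouping approach; it materializes and sorts each group in full, so it is only reasonable when every key's values fit comfortably in memory.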

Re: [PySpark] order of values in GroupByKey()

2014-08-22 Thread Matthew Farrellee
On 08/22/2014 04:32 PM, Arpan Ghosh wrote:
> Is there any way to control the ordering of values for each key during a groupByKey() operation? Is there some sort of implicit ordering in place already?
>
> Thanks
>
> Arpan

There's no implicit ordering in place. The same holds for the order of keys, unle

[PySpark] order of values in GroupByKey()

2014-08-22 Thread Arpan Ghosh
Is there any way to control the ordering of values for each key during a groupByKey() operation? Is there some sort of implicit ordering in place already?

Thanks

Arpan