you can do kv.mapValues(sorted) on the grouped rdd, but that's definitely less
efficient than sorting during the groupByKey
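here's a rough local model of the group-then-sort approach, i.e. rdd.groupByKey().mapValues(sorted), written in plain python so it runs without spark. group_then_sort and the sample records are hypothetical. the values are assumed to be (timestamp, reading) tuples, and putting the timestamp first is what makes plain sorted() order each key's values by time.

```python
from collections import defaultdict


def group_then_sort(pairs):
    """Hypothetical local model of rdd.groupByKey().mapValues(sorted).

    Values are first collected per key in arrival order (groupByKey gives
    no ordering guarantee), then each key's list is sorted afterwards.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # sorted() on (timestamp, reading) tuples compares the timestamp first
    return {key: sorted(values) for key, values in groups.items()}


# hypothetical time-series records: (series_id, (timestamp, reading))
records = [
    ("cpu", (3, 0.9)),
    ("mem", (2, 512)),
    ("cpu", (1, 0.4)),
    ("cpu", (2, 0.7)),
    ("mem", (1, 480)),
]

print(group_then_sort(records))
# {'cpu': [(1, 0.4), (2, 0.7), (3, 0.9)], 'mem': [(1, 480), (2, 512)]}
```

if the values are records without the timestamp in first position, sorted(values, key=...) with a function that pulls out the timestamp does the same job.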
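you could try using combineByKey directly, keeping each key's values in a
sorted list as they arrive (bisect.insort) and merging the per-partition
lists w/ heapq.merge. note that a heap built w/ heappush is not a sorted
list, so heapq.merge can't be used to combine two heaps...
from bisect import insort
from heapq import merge

def createCombiner(x):
    # first value seen for a key in a partition: start a sorted list
    return [x]

def mergeValues(xs, x):
    # later values in the same partition: insert in sorted position
    insort(xs, x)
    return xs

def mergeCombiners(a, b):
    # sorted lists from different partitions: heapq.merge returns an
    # iterator, so materialize it as a list
    return list(merge(a, b))

rdd.combineByKey(createCombiner, mergeValues, mergeCombiners)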
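to see how the three functions get called, here's a rough local model of combineByKey in plain python: values are first combined within each partition (createCombiner for a key's first value, mergeValues for the rest), then the per-partition results for each key are merged with mergeCombiners. local_combine_by_key and the two sample partitions are hypothetical; this is not spark's implementation, just the calling pattern. heapq.merge expects already-sorted inputs, which is why the combiners here are kept as sorted lists rather than heaps.

```python
from bisect import insort
from heapq import merge


def createCombiner(x):
    # first value for a key in a partition: start a sorted list
    return [x]


def mergeValues(xs, x):
    # another value in the same partition: insert it in sorted position
    insort(xs, x)
    return xs


def mergeCombiners(a, b):
    # two sorted lists from different partitions -> one sorted list
    return list(merge(a, b))


def local_combine_by_key(partitions, create, merge_value, merge_combiners):
    """Hypothetical local model of combineByKey's calling pattern."""
    # step 1: combine values inside each partition
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create(value)
        per_partition.append(combiners)
    # step 2: merge the per-partition combiners for each key
    result = {}
    for combiners in per_partition:
        for key, comb in combiners.items():
            if key in result:
                result[key] = merge_combiners(result[key], comb)
            else:
                result[key] = comb
    return result


# hypothetical (series_id, (timestamp, reading)) pairs split over 2 partitions
partitions = [
    [("cpu", (3, 0.9)), ("mem", (2, 512)), ("cpu", (1, 0.4))],
    [("cpu", (2, 0.7)), ("mem", (1, 480))],
]

print(local_combine_by_key(partitions, createCombiner, mergeValues, mergeCombiners))
# {'cpu': [(1, 0.4), (2, 0.7), (3, 0.9)], 'mem': [(1, 480), (2, 512)]}
```

since merging two sorted lists gives the same sorted list whichever side comes first, the order in which partitions get merged after the shuffle doesn't change the result.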
best,
matt
On 08/22/2014 10:41 PM, Arpan Ghosh wrote:
I was grouping time series data by a key. I want the values to be sorted
by timestamp after the grouping.
On Fri, Aug 22, 2014 at 7:26 PM, Matthew Farrellee <m...@redhat.com> wrote:
On 08/22/2014 04:32 PM, Arpan Ghosh wrote:
Is there any way to control the ordering of values for each key during a
groupByKey() operation? Is there some sort of implicit ordering in place
already?
Thanks
Arpan
there's no implicit ordering in place. the same holds for the order
of keys, unless you use sortByKey.
what are you trying to achieve?
best,
matt
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org