[PySpark] order of values in GroupByKey()

2014-08-22 Thread Arpan Ghosh

Is there any way to control the ordering of values for each key during a
groupByKey() operation? Is there some sort of implicit ordering in place
already?

Thanks

Arpan

Re: [PySpark] order of values in GroupByKey()

2014-08-22 Thread Matthew Farrellee


On 08/22/2014 04:32 PM, Arpan Ghosh wrote:

Is there any way to control the ordering of values for each key during a
groupByKey() operation? Is there some sort of implicit ordering in place
already?

Thanks

Arpan


there's no implicit ordering in place. the same holds for the order of 
keys, unless you use sortByKey.


what are you trying to achieve?

best,


matt

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: [PySpark] order of values in GroupByKey()

2014-08-22 Thread Matthew Farrellee

you can sort after the grouping with kv.mapValues(sorted), but that's
definitely less efficient than sorting during the groupBy
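
for the time series case, sorting after the grouping might look like the sketch below. it emulates groupByKey followed by mapValues(sorted) in plain Python so it can be run without a cluster. the sample records and the helper name group_then_sort are made up for illustration, and the values are assumed to be (timestamp, reading) tuples. in PySpark the equivalent would be roughly rdd.groupByKey().mapValues(lambda vs: sorted(vs, key=itemgetter(0))).

```python
from collections import defaultdict
from operator import itemgetter


def group_then_sort(records):
    """Local stand-in for groupByKey() followed by a per-key sort (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in records:
        # values are collected in arrival order, which carries no guarantee
        groups[key].append(value)
    # sort each key's (timestamp, reading) tuples by timestamp
    return {k: sorted(vs, key=itemgetter(0)) for k, vs in groups.items()}


# made-up sample data: (key, (timestamp, reading))
records = [
    ("sensor-a", (3, 0.7)),
    ("sensor-b", (1, 1.5)),
    ("sensor-a", (1, 0.2)),
    ("sensor-a", (2, 0.4)),
    ("sensor-b", (0, 1.1)),
]
result = group_then_sort(records)
```

the sort happens only once every value for a key has been collected, which is the extra pass that sorting during the grouping would avoid.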


you could try using combineByKey directly w/ heapq...

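from bisect import insort
from heapq import merge

def createCombiner(x):
    return [x]

def mergeValues(xs, x):
    # insort keeps xs fully sorted; heappush would only keep heap order
    insort(xs, x)
    return xs

def mergeCombiners(a, b):
    # a and b are sorted lists; merge() returns an iterator, so materialise it
    return list(merge(a, b))

rdd.combineByKey(createCombiner, mergeValues, mergeCombiners)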
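
the three functions can be checked locally before running them on a cluster. note that heapq.merge is only valid on inputs that are already fully sorted (a list built with heappush is only partially ordered), so this sketch keeps each per-key list sorted with bisect.insort. the helper simulate_combine_by_key is hypothetical: it only imitates how combineByKey applies the functions within each partition and then across partitions, and the sample data is made up.

```python
from bisect import insort
from heapq import merge


def create_combiner(x):
    # first value seen for a key within a partition
    return [x]


def merge_value(xs, x):
    # insert x so that xs stays fully sorted
    insort(xs, x)
    return xs


def merge_combiners(a, b):
    # both inputs are sorted lists; materialise the lazy merge
    return list(merge(a, b))


def simulate_combine_by_key(partitions):
    """Hypothetical local stand-in for rdd.combineByKey(...)."""
    per_partition = []
    for part in partitions:
        combined = {}
        for key, value in part:
            if key in combined:
                combined[key] = merge_value(combined[key], value)
            else:
                combined[key] = create_combiner(value)
        per_partition.append(combined)

    result = {}
    for combined in per_partition:
        for key, values in combined.items():
            if key in result:
                result[key] = merge_combiners(result[key], values)
            else:
                result[key] = values
    return result


# made-up sample data: two partitions of (key, (timestamp, payload))
partitions = [
    [("a", (3, "x")), ("a", (1, "y")), ("b", (2, "z"))],
    [("a", (2, "w")), ("b", (1, "v"))],
]
combined = simulate_combine_by_key(partitions)
```

tuples compare element by element, so (timestamp, payload) values end up ordered by timestamp first.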

best,


matt

On 08/22/2014 10:41 PM, Arpan Ghosh wrote:

I was grouping time series data by a key. I want the values to be sorted
by timestamp after the grouping.


