There's no way to avoid a shuffle entirely, since the first and last elements of each partition need to be combined with elements from neighboring partitions, but I wonder if there is a way to do a minimal shuffle.
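One way to keep the shuffle minimal might be to move only the boundary elements between partitions: collect the first and last element of each partition (a tiny array, one pair per partition), then pair every element with its neighbors locally. Here is a rough sketch in plain Scala standing in for Spark — `withNeighbors` and the `partitions` argument are hypothetical names, and a `Seq[Vector[Int]]` simulates a sorted RDD's partitions; in Spark the `firsts`/`lasts` arrays could be gathered with `mapPartitionsWithIndex` plus `collect` and then broadcast.

```scala
object BoundaryExchange {
  // For each element, produce (Option[previous], element, Option[next]).
  // Assumes non-empty partitions, as a sorted RDD would typically have.
  def withNeighbors(partitions: Seq[Vector[Int]]): Seq[Vector[(Option[Int], Int, Option[Int])]] = {
    // The only data that crosses partition boundaries: one element per boundary.
    val firsts = partitions.map(_.headOption) // first element of each partition
    val lasts  = partitions.map(_.lastOption) // last element of each partition
    partitions.zipWithIndex.map { case (part, i) =>
      val prevBoundary = if (i > 0) lasts(i - 1) else None
      val nextBoundary = if (i < partitions.size - 1) firsts(i + 1) else None
      part.zipWithIndex.map { case (x, j) =>
        // Interior elements find neighbors locally; edges use the boundary values.
        val prev = if (j > 0) Some(part(j - 1)) else prevBoundary
        val next = if (j < part.size - 1) Some(part(j + 1)) else nextBoundary
        (prev, x, next)
      }
    }
  }
}
```

With partitions `[1,2,3]` and `[4,5]`, element 3 picks up 4 as its next neighbor and element 4 picks up 3 as its previous, without any full join.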
On Thu, Aug 21, 2014 at 6:13 PM, cjwang <c...@cjwang.us> wrote:
> One way is to do zipWithIndex on the RDD. Then use the index as a key. Add
> or subtract 1 for the previous or next element. Then use cogroup or join to
> bind them together.
>
> val idx = input.zipWithIndex
> val previous = idx.map(x => (x._2 + 1, x._1))
> val current = idx.map(x => (x._2, x._1))
> val next = idx.map(x => (x._2 - 1, x._1))
>
> val joined = current leftOuterJoin previous leftOuterJoin next
>
> The code looks clean to me, but I feel uneasy about the performance of the
> joins.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------
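For reference, the quoted join approach can be simulated in plain Scala to see the shape of the result and the left-outer-join semantics. `neighborsByJoin` is a hypothetical name, and `Map` lookups stand in for Spark's `leftOuterJoin`: shifting the index key by +1 or -1 lines each element up with its neighbors, and a missing key becomes `None` at the ends.

```scala
object JoinApproach {
  // For each element, produce (element, Option[previous], Option[next]),
  // mimicking: current leftOuterJoin previous leftOuterJoin next.
  def neighborsByJoin(input: Seq[String]): Seq[(String, Option[String], Option[String])] = {
    val idx      = input.zipWithIndex                            // (elem, index)
    val current  = idx.map { case (v, i) => i -> v }.toMap
    val previous = idx.map { case (v, i) => (i + 1) -> v }.toMap // key shifted forward
    val next     = idx.map { case (v, i) => (i - 1) -> v }.toMap // key shifted back
    current.toSeq.sortBy(_._1).map { case (i, v) =>
      (v, previous.get(i), next.get(i))                          // left-outer semantics: absent key => None
    }
  }
}
```

The concern about performance seems fair: on an RDD this is three shuffled datasets and two joins, whereas only one element per partition boundary actually needs to move.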