There's no way to avoid a shuffle entirely, since the first and last elements of each partition need to be combined with elements from neighboring partitions, but I wonder if there is a way to do a minimal shuffle.
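One way to keep the shuffle minimal might be to move only the boundary elements between partitions: collect the first and last element of each partition (a tiny array, one pair per partition), then pair every element with its neighbors locally. Here is a rough sketch in plain Scala standing in for Spark — `withNeighbors` and the `partitions` argument are hypothetical names, and a `Seq[Vector[Int]]` simulates a sorted RDD's partitions; in Spark the `firsts`/`lasts` arrays could be gathered with `mapPartitionsWithIndex` plus `collect` and then broadcast.

```scala
object BoundaryExchange {
  // For each element, produce (Option[previous], element, Option[next]).
  // Assumes non-empty partitions, as a sorted RDD would typically have.
  def withNeighbors(partitions: Seq[Vector[Int]]): Seq[Vector[(Option[Int], Int, Option[Int])]] = {
    // The only data that crosses partition boundaries: one element per boundary.
    val firsts = partitions.map(_.headOption) // first element of each partition
    val lasts  = partitions.map(_.lastOption) // last element of each partition
    partitions.zipWithIndex.map { case (part, i) =>
      val prevBoundary = if (i > 0) lasts(i - 1) else None
      val nextBoundary = if (i < partitions.size - 1) firsts(i + 1) else None
      part.zipWithIndex.map { case (x, j) =>
        // Interior elements find neighbors locally; edges use the boundary values.
        val prev = if (j > 0) Some(part(j - 1)) else prevBoundary
        val next = if (j < part.size - 1) Some(part(j + 1)) else nextBoundary
        (prev, x, next)
      }
    }
  }
}
```

With partitions `[1,2,3]` and `[4,5]`, element 3 picks up 4 as its next neighbor and element 4 picks up 3 as its previous, without any full join.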
On Thu, Aug 21, 2014 at 6:13 PM, cjwang <c...@cjwang.us> wrote:
> One way is to do zipWithIndex on the RDD. Then use the index as a key. Add
> or subtract 1 for the previous or next element. Then use cogroup or join to
> bind them together.
>
> val idx = input.zipWithIndex
> val previous = idx.map(x => (x._2 + 1, x._1))
> val current = idx.map(x => (x._2, x._1))
> val next = idx.map(x => (x._2 - 1, x._1))
>
> val joined = current leftOuterJoin previous leftOuterJoin next
>
> The code looks clean to me, but I feel uneasy about the performance of the
> joins.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------
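For reference, the quoted join approach can be simulated in plain Scala to see the shape of the result and the left-outer-join semantics. `neighborsByJoin` is a hypothetical name, and `Map` lookups stand in for Spark's `leftOuterJoin`: shifting the index key by +1 or -1 lines each element up with its neighbors, and a missing key becomes `None` at the ends.

```scala
object JoinApproach {
  // For each element, produce (element, Option[previous], Option[next]),
  // mimicking: current leftOuterJoin previous leftOuterJoin next.
  def neighborsByJoin(input: Seq[String]): Seq[(String, Option[String], Option[String])] = {
    val idx      = input.zipWithIndex                            // (elem, index)
    val current  = idx.map { case (v, i) => i -> v }.toMap
    val previous = idx.map { case (v, i) => (i + 1) -> v }.toMap // key shifted forward
    val next     = idx.map { case (v, i) => (i - 1) -> v }.toMap // key shifted back
    current.toSeq.sortBy(_._1).map { case (i, v) =>
      (v, previous.get(i), next.get(i))                          // left-outer semantics: absent key => None
    }
  }
}
```

The concern about performance seems fair: on an RDD this is three shuffled datasets and two joins, whereas only one element per partition boundary actually needs to move.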