Re: Finding previous and next element in a sorted RDD

2014-08-23 Thread Victor Tso-Guillen
Using mapPartitions, you could get the neighbors within a partition, but
it's much more difficult to accomplish this for the complete dataset,
because the first and last elements of each partition have their neighbors
in other partitions.
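
As a rough illustration of the within-partition case, here is a minimal sketch. The helper name neighborsWithinPartition is hypothetical, and the code assumes each partition fits in memory, since it materializes the partition before pairing elements up.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper: pairs each element with its previous and next element,
// looking only inside the element's own partition.
def neighborsWithinPartition[T: ClassTag](
    sorted: RDD[T]): RDD[(Option[T], T, Option[T])] =
  sorted.mapPartitions { iter =>
    // Assumption: the partition is small enough to hold in memory.
    val elems = iter.toVector
    elems.indices.iterator.map { i =>
      // lift returns None for an out-of-range index, i.e. at partition edges.
      (elems.lift(i - 1), elems(i), elems.lift(i + 1))
    }
  }
```

The first element of every partition gets None as its previous element, and the last gets None as its next, even when a real neighbor exists in an adjacent partition. That gap is the hard part for the complete dataset.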


On Fri, Aug 22, 2014 at 11:24 AM, cjwang c...@cjwang.us wrote:

> It would be nice if an RDD that was massaged by OrderedRDDFunctions could
> know its neighbors.




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12664.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Finding previous and next element in a sorted RDD

2014-08-22 Thread Evan Chan
There's no way to avoid a shuffle, since the first and last elements
of each partition have to be combined with elements from other
partitions, but I wonder if there is a way to do a minimal shuffle.
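
One possible way to keep the data movement small, sketched here under assumptions and not taken from the thread: collect only the first and last element of each partition to the driver, broadcast that small map, and resolve everything else locally. The helper name withNeighbors is hypothetical, and the sketch assumes the RDD is already sorted, that each partition fits in memory, and that the number of partitions is small enough for the boundary map to be collected.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper, not from the thread: attaches previous/next to every
// element of an already-sorted RDD without joining the full dataset.
def withNeighbors[T: ClassTag](sorted: RDD[T]): RDD[(Option[T], T, Option[T])] = {
  // Step 1: bring only the first and last element of each non-empty
  // partition to the driver (at most two elements per partition).
  val bounds: Map[Int, (T, T)] = sorted
    .mapPartitionsWithIndex { (pid, iter) =>
      if (iter.isEmpty) Iterator.empty
      else {
        val first = iter.next()
        var last = first
        while (iter.hasNext) last = iter.next()
        Iterator((pid, (first, last)))
      }
    }
    .collect()
    .toMap
  val shared = sorted.sparkContext.broadcast(bounds)

  // Step 2: resolve neighbors locally, consulting the broadcast map only
  // at the edges of each partition.
  sorted.mapPartitionsWithIndex { (pid, iter) =>
    val b = shared.value
    // Last element of the closest non-empty partition before this one.
    val before = b.keys.filter(_ < pid).reduceOption(_ max _).map(k => b(k)._2)
    // First element of the closest non-empty partition after this one.
    val after = b.keys.filter(_ > pid).reduceOption(_ min _).map(k => b(k)._1)
    // Assumption: the partition is small enough to hold in memory.
    val elems = iter.toVector
    elems.indices.iterator.map { i =>
      val prev = if (i == 0) before else Some(elems(i - 1))
      val next = if (i == elems.size - 1) after else Some(elems(i + 1))
      (prev, elems(i), next)
    }
  }
}
```

This trades the join for two passes over the sorted RDD plus a small collect, so caching the sorted RDD first is probably worthwhile. Whether it beats the join in practice would need measuring.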

On Thu, Aug 21, 2014 at 6:13 PM, cjwang c...@cjwang.us wrote:
> One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
> or subtract 1 for previous or next element.  Then use cogroup or join to
> bind them together.
>
> val idx = input.zipWithIndex
> val previous = idx.map(x => (x._2+1, x._1))
> val current = idx.map(x => (x._2, x._1))
> val next = idx.map(x => (x._2-1, x._1))
>
> val joined = current leftOuterJoin previous leftOuterJoin next
>
> Code looks clean to me, but I feel uneasy about the performance of join.



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Finding previous and next element in a sorted RDD

2014-08-22 Thread cjwang
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know
its neighbors.







Re: Finding previous and next element in a sorted RDD

2014-08-21 Thread cjwang
One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
or subtract 1 for previous or next element.  Then use cogroup or join to
bind them together.

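import org.apache.spark.SparkContext._  // pair RDD functions (needed before Spark 1.3)

// Pair each element with its position: (value, index)
val idx = input.zipWithIndex
// Key the value at index i under i+1, so it joins as the "previous" of i+1
val previous = idx.map(x => (x._2 + 1, x._1))
val current = idx.map(x => (x._2, x._1))
// Key the value at index i under i-1, so it joins as the "next" of i-1
val next = idx.map(x => (x._2 - 1, x._1))

// Result: (index, ((current, Option[previous]), Option[next]))
val joined = current leftOuterJoin previous leftOuterJoin next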

Code looks clean to me, but I feel uneasy about the performance of join.  
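
For reference, the same idea can be wrapped into a function that flattens the nested join result into (previous, current, next) triples. This is only a sketch: the name neighborsByJoin is hypothetical, and the final sortByKey is an added assumption to restore the original order, since the join itself does not preserve it (it also costs one more shuffle).

```scala
import org.apache.spark.SparkContext._ // pair RDD implicits, needed before Spark 1.3
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical wrapper around the zipWithIndex + join approach.
def neighborsByJoin[T: ClassTag](input: RDD[T]): RDD[(Option[T], T, Option[T])] = {
  val idx = input.zipWithIndex()
  val current  = idx.map { case (v, i) => (i, v) }
  val previous = idx.map { case (v, i) => (i + 1, v) } // value at i is previous of i+1
  val next     = idx.map { case (v, i) => (i - 1, v) } // value at i is next of i-1
  current
    .leftOuterJoin(previous)
    .leftOuterJoin(next)
    .sortByKey() // assumption: the caller wants the original order back
    .map { case (_, ((cur, prev), nxt)) => (prev, cur, nxt) }
}
```

The first element has no previous neighbor and the last has no next one, which is why both come back as Option values from leftOuterJoin.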


