Re: Pairwise Processing of a List

2015-01-26 Thread Sean Owen
AFAIK ordering is not strictly guaranteed unless the RDD is the product of a sort. I think that in practice, you'll never find elements of a file read in some random order, for example (although see the recent issue about partition ordering potentially depending on how the local file system lists t

Re: Pairwise Processing of a List

2015-01-25 Thread Tobias Pfeiffer
Sean, On Mon, Jan 26, 2015 at 10:28 AM, Sean Owen wrote: > Note that RDDs don't really guarantee anything about ordering though, > so this only makes sense if you've already sorted some upstream RDD by > a timestamp or sequence number. > Speaking of order, is there some reading on guarantees an

Re: Pairwise Processing of a List

2015-01-25 Thread Sean Owen
stance (x1,y1) and (x2,y2) and > distance (x2,y2) and (x3,y3) > > Imagine that the list of coordinate points comes from a GPS and describes a > trip. > > - Steve > > From: Joseph Lust > Date: Sunday, January 25, 2015 at 17:17 > To: Steve Nunez , "user@spark.apache
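For the trip case quoted above (distance from each point to the next one only), here is a minimal sketch in plain Scala. It assumes the points already sit in a local List in trip order; `TripLength`, `distance` and `totalDistance` are illustrative names, not anything from the thread or from Spark.

```scala
object TripLength {
  type Point = (Double, Double)

  // Euclidean distance between two points.
  def distance(a: Point, b: Point): Double =
    math.hypot(b._1 - a._1, b._2 - a._2)

  // Sum of the distances between each point and its successor.
  // sliding(2) yields List(p1, p2), List(p2, p3), ...; a one-element list
  // yields a single window of size 1, which the pattern below skips.
  def totalDistance(points: List[Point]): Double =
    points.sliding(2).collect { case List(a, b) => distance(a, b) }.sum
}
```

For example, `TripLength.totalDistance(List((0.0, 0.0), (3.0, 4.0), (3.0, 8.0)))` adds a leg of length 5 and a leg of length 4. The result is only meaningful if the list order is the trip order, which is the same caveat raised about RDD ordering elsewhere in this thread.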

Re: Pairwise Processing of a List

2015-01-25 Thread Sean Owen
If this is really about just Scala Lists, then a simple answer (using tuples of doubles) is:
    val points: List[(Double,Double)] = ...
    val distances = for (p1 <- points; p2 <- points) yield {
      val dx = p1._1 - p2._1
      val dy = p1._2 - p2._2
      math.sqrt(dx*dx + dy*dy)
    }
    distances.sum / 2
It's "/ 2"
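The double loop in the message above visits every ordered pair, so each unordered pair of points is counted twice and each point is also paired with itself at distance 0; that is presumably what the "/ 2" accounts for. A small sketch that checks this against an index-based loop visiting each unordered pair once (`AllPairs`, `sumOrdered` and `sumUnordered` are illustrative names, not from the thread):

```scala
object AllPairs {
  type Point = (Double, Double)

  // Euclidean distance between two points.
  def distance(a: Point, b: Point): Double =
    math.hypot(a._1 - b._1, a._2 - b._2)

  // Every ordered pair, including (p, p): each unordered pair appears twice.
  def sumOrdered(points: Seq[Point]): Double =
    (for (p1 <- points; p2 <- points) yield distance(p1, p2)).sum

  // Each unordered pair exactly once, selected by index so that
  // duplicate points in the input are still counted correctly.
  def sumUnordered(points: Seq[Point]): Double = {
    val ps = points.toVector
    (for (i <- ps.indices; j <- (i + 1) until ps.length)
      yield distance(ps(i), ps(j))).sum
  }
}
```

Up to floating-point rounding, `AllPairs.sumOrdered(pts) / 2` and `AllPairs.sumUnordered(pts)` should agree for any input. Both are all-pairs (N^2) sums, which is a different quantity from the consecutive-point trip length asked about later in the thread.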

Re: Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
...@mc10inc.com> Date: Sunday, January 25, 2015 at 17:17 To: Steve Nunez <snu...@hortonworks.com>, "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Pairwise Processing of a List So you've got a point A and you w

Re: Pairwise Processing of a List

2015-01-25 Thread Joseph Lust
ache.org>> Subject: Pairwise Processing of a List Spark Experts, I’ve got a list of points: List[(Float, Float)] that represent (x,y) coordinate pairs and need to sum the distance. It’s easy enough to compute the distance: case class Point(x: Float, y: Float) { def distance(other: Point)

Re: Pairwise Processing of a List

2015-01-25 Thread Tobias Pfeiffer
Hi, On Mon, Jan 26, 2015 at 9:32 AM, Steve Nunez wrote: > I’ve got a list of points: List[(Float, Float)] that represent (x,y) > coordinate pairs and need to sum the distance. It’s easy enough to compute > the distance: > Are you saying you want all combinations (N^2) of distances? That shoul

Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
Spark Experts, I've got a list of points: List[(Float, Float)] that represent (x,y) coordinate pairs and need to sum the distance. It's easy enough to compute the distance: case class Point(x: Float, y: Float) { def distance(other: Point): Float = sqrt(pow(x - other.x, 2) + pow(y - other