Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-16 Thread Yanzhe Chen
Hi Eugen, Sorry if I haven't caught your point. In the second example:

    val result = data.mapPartitions(points => skyline(points.toArray).iterator)
                     .reduce { case (left, right) => skyline(left ++ right) }

In my understanding, if the data is of type RDD, then both left and right …
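To make the type question concrete, here is a sketch, assuming skyline has the signature Array[Array[Double]] => Array[Array[Double]] that the rest of the thread implies:

    // After mapPartitions, the local skyline points are flattened back into
    // the RDD one by one, so the element type is a single point:
    val partials: org.apache.spark.rdd.RDD[Array[Double]] =
      data.mapPartitions(points => skyline(points.toArray).iterator)
    // reduce has signature (T, T) => T with T = Array[Double]: left and right
    // are individual points, so left ++ right is an Array[Double], while
    // skyline expects an Array[Array[Double]]; the call does not type-check.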

Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-16 Thread Cheng Lian
Hmm… The good part of reduce is that it performs local combining within a single partition automatically, but since you turned each partition into a single-value one, local combining is not applicable, and reduce simply degrades to a collect followed by a final skyline over all the collected partial results.
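A sketch of what the degraded plan is equivalent to (observable behaviour only, not Spark's internals):

    // With exactly one pre-computed skyline per partition, reduce's local
    // phase has nothing to combine, so the job behaves like collecting the
    // partial skylines and merging them on the driver:
    val partial: Array[Array[Array[Double]]] =
      data.mapPartitions(ps => Iterator.single(skyline(ps.toArray))).collect()
    val result: Array[Array[Double]] =
      partial.reduce((left, right) => skyline(left ++ right))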

Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-15 Thread Eugen Cepoi
Yes, the second example does that. It transforms all the points of a partition into a single element, the skyline, thus reduce will run on the skylines of two partitions and not on single points.

On 16 Apr 2014 06:47, "Yanzhe Chen" wrote:
> Eugen,
>
> Thanks for your tip and I do want to merge the result of a partition with
> another one …
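Spelled out, the per-partition version Eugen describes might look like this sketch; Iterator.single is the detail that collapses a partition into one element:

    // Each partition becomes a single element: its local skyline.
    // reduce therefore merges whole skylines, never individual points.
    val result = data
      .mapPartitions(points => Iterator.single(skyline(points.toArray)))
      .reduce((left, right) => skyline(left ++ right))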

Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-15 Thread Yanzhe Chen
Eugen, Thanks for your tip and I do want to merge the result of a partition with another one, but I am still not quite clear how to do it. Say the original data RDD has 32 partitions; since mapPartitions won't change the number of partitions, the result will still have 32 partitions, each of which contains …
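For intuition, a hypothetical local picture of the merge across those 32 partitions (the per-partition skylines below are stand-in values):

    // 32 stand-in partial skylines, one per partition; reduce folds them
    // pairwise into the global skyline.
    val partials: Seq[Array[Array[Double]]] =
      Seq.fill(32)(Array(Array(1.0, 2.0), Array(2.0, 1.0)))
    val global: Array[Array[Double]] =
      partials.reduce((left, right) => skyline(left ++ right))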

Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-15 Thread Eugen Cepoi
It depends on your algorithm, but I guess you should probably use reduce (the code probably doesn't compile, but it shows you the idea):

    val result = data.reduce { case (left, right) => skyline(left ++ right) }

Or, in case you want to merge the result of a partition with another one, you can compute the skyline of each partition first and then reduce the partial skylines:

    val result = data.mapPartitions(points => skyline(points.toArray).iterator)
                     .reduce { case (left, right) => skyline(left ++ right) }
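If one wants the first one-liner to type-check on an RDD[Array[Double]], glom() is one option (a sketch, not from the original mail; glom packs each partition into a single array, so left and right become collections of points):

    // glom(): RDD[Array[Double]] becomes RDD[Array[Array[Double]]], one
    // array per partition; skyline runs locally per partition, then reduce
    // merges the partial skylines.
    val result = data.glom()
      .map(points => skyline(points))
      .reduce((left, right) => skyline(left ++ right))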

Re: Why are these operations slower than the equivalent on Hadoop?

2014-04-15 Thread Cheng Lian
Your Spark solution first reduces partial results into a single partition, computes the final result, and then collects it to the driver side. This involves a shuffle and two waves of network traffic. Instead, you can directly collect the partial results to the driver and then compute the final result there.
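That suggestion as a sketch (only the per-partition skyline points travel over the network, in a single wave):

    // Local skyline per partition, collect just those points, and finish
    // the merge on the driver: no shuffle, no single-partition bottleneck.
    val partial: Array[Array[Double]] =
      data.mapPartitions(points => skyline(points.toArray).iterator).collect()
    val result: Array[Array[Double]] = skyline(partial)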

Why are these operations slower than the equivalent on Hadoop?

2014-04-15 Thread Yanzhe Chen
Hi all, Following up on a previous thread, I am asking how to implement a divide-and-conquer algorithm (skyline) in Spark. Here is my current solution:

    val data = sc.textFile(…).map(line => line.split(",").map(_.toDouble))

    val result = data.mapPartitions(points => skyline(points.toArray).iterator)
                     .coalesce(1, true)
                     .mapPartitions(points => skyline(points.toArray).iterator)
                     .collect()
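The skyline helper itself never appears in the thread; for reference, a minimal sketch under an assumed dominance rule (minimize every coordinate) that matches the Array[Array[Double]] => Array[Array[Double]] signature used above:

    // Assumed helper, not from the original thread: keep each point that is
    // dominated by no other point (q dominates p if q is no worse in every
    // coordinate and strictly better in at least one, minimizing).
    def skyline(points: Array[Array[Double]]): Array[Array[Double]] =
      points.filter { p =>
        !points.exists { q =>
          q.zip(p).forall { case (a, b) => a <= b } &&
          q.zip(p).exists { case (a, b) => a < b }
        }
      }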