Locality aware tree reduction

2016-05-07 Thread Ayman Khalil
Hello,

Is there a way to instruct treeReduce() to reduce RDD partitions on the
same node locally?

In my case, I'm using treeReduce() to reduce map results in parallel. My
reduce function is just arithmetically adding map results (i.e. no notion
of aggregation by key). As far as I understand, a shuffle happens at
each treeReduce() stage using a hash partitioner keyed on the RDD
partition index. I would like partitions residing on the same node to
be reduced locally (i.e. no shuffling), with a shuffle only once each
node is down to a single partition and before the results are sent to
the driver.
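The two-stage scheme I have in mind can be sketched in plain Python
(this is only a simulation of the idea, not Spark API — the node names
and the local_reduce helper are illustrative):

```python
# Simulated locality-aware reduction: partitions are first reduced
# locally within each node (no data movement), and only one value per
# node is combined at the end (the single "shuffle" step).
from functools import reduce
from operator import add

# Illustrative data: 3 nodes, each holding several partitions of numbers.
partitions_by_node = {
    "node1": [[1, 2], [3, 4], [5]],
    "node2": [[6, 7, 8], [9]],
    "node3": [[10], [11, 12]],
}

def local_reduce(partitions, op):
    """Reduce all partitions held by one node, entirely locally."""
    return reduce(op, (reduce(op, p) for p in partitions))

# Stage 1: per-node local reduction, no shuffle.
per_node = [local_reduce(parts, add) for parts in partitions_by_node.values()]

# Stage 2: combine one value per node -- the only cross-node exchange.
total = reduce(add, per_node)
print(total)  # sum of 1..12 -> 78
```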

I have a few nodes and lots of partitions, so I think this will give
better performance.

Thank you,
Ayman

