, 2015 6:00 PM
To: Shuai Zheng
Cc: Shao, Saisai; user@spark.apache.org
Subject: Re: Union and reduceByKey will trigger shuffle even same partition?
I think you're getting tripped up by lazy evaluation and the way stage boundaries
work (admittedly it's pretty confusing in this case).
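As a rough illustration of the question in the subject line, here is a minimal sketch in plain Python (not Spark). It only models the data layout: two key/value datasets are split with the same toy hash partitioner, unioned partition by partition, and then reduced inside each partition. All names here (partition_of, partition_by, union_copartitioned, pairs_a, pairs_b, NUM_PARTITIONS) are hypothetical and are not Spark APIs.

```python
NUM_PARTITIONS = 4  # assumed partition count for this sketch


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Toy stand-in for a hash partitioner: same key -> same partition index."""
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    """Place every (key, value) pair into the partition chosen by its key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_of(key, num_partitions)].append((key, value))
    return parts


def union_copartitioned(parts_a, parts_b):
    """Union two datasets partition by partition; no record changes partition."""
    return [a + b for a, b in zip(parts_a, parts_b)]


def reduce_pairs(pairs, func):
    """Combine all values that share a key, using func."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def reduce_by_key_locally(parts, func):
    """Reduce each partition on its own, without moving any data."""
    return [reduce_pairs(part, func) for part in parts]


def add(x, y):
    return x + y


pairs_a = [("a", 1), ("b", 2), ("c", 3)]
pairs_b = [("a", 10), ("c", 30), ("d", 40)]

local = reduce_by_key_locally(
    union_copartitioned(partition_by(pairs_a), partition_by(pairs_b)), add
)
merged = {key: value for part in local for key, value in part.items()}

# Because equal keys always share a partition, the per-partition result
# matches a reduce over the whole unioned dataset.
assert merged == reduce_pairs(pairs_a + pairs_b, add)
```

This says nothing about what Spark's scheduler does. It only shows that when both inputs share a partitioner, a per-partition reduce is enough to get the right answer. Whether Spark actually skips the shuffle depends on whether it knows the partitioner of each input.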
It is
ugh put in memory), how can I access the other RDD's local partition in the
> mapPartitions
> method? Is there any way to do this in Spark?
>
>
>
> *From:* Shao, Saisai [mailto:saisai.s...@intel.com]
> *Sent:* Monday, February 23, 2015 3:13 PM
> *To:* Shuai Zheng
> *Cc:* user@spar
ai
Cc: user@spark.apache.org
Subject: RE: Union and reduceByKey will trigger shuffle even same partition?
In the book Learning Spark:
[cid:image002.jpg@01D04F74.28C9F870]
So does this mean only that no shuffle happens across the network, but a shuffle
still happens locally? Even if that is the case, why union
apache.org<mailto:user@spark.apache.org>
Subject: RE: Union and reduceByKey will trigger shuffle even same partition?
If you call reduceByKey(), Spark will internally introduce a shuffle
operation. No matter whether the data is already partitioned locally, Spark itself
does not know the data is already
educeByKey will trigger shuffle even same partition?
If you call reduceByKey(), Spark will internally introduce a shuffle
operation. No matter whether the data is already partitioned locally, Spark itself
does not know the data is already well partitioned.
So if you want to avoid the shuffle, you have to
: Monday, February 23, 2015 3:13 PM
To: Shuai Zheng
Cc: user@spark.apache.org
Subject: RE: Union and reduceByKey will trigger shuffle even same partition?
If you call reduceByKey(), Spark will internally introduce a shuffle
operation. No matter whether the data is already partitioned locally, Spark itself
does not know the data is already well partitioned.
So if you want to avoid the shuffle, you have to write the code explicitly to
avoid this, from my unde