Dear Ted,
You are right. reduceByKey is a transformation. My fault.
Let me rephrase my question using the following code snippet.
import org.apache.spark.{SparkConf, SparkContext}

object ScalaApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalaApp").setMaster("local")
    val sc = new SparkContext(conf)
    val file = sc.textFile("/home/usr/test.dat")
    // Transformations only: these lines just build the lineage.
    val output = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    output.persist()
    output.count()   // first action
    output.collect() // second action on the same persisted RDD
    sc.stop()
  }
}
It's a simple code snippet.
I realize that the first action, count(), triggers the execution
starting from the HadoopRDD and MapPartitionsRDD, and that the count is
performed on the ShuffledRDD produced by reduceByKey.
The second action, collect(), is just performed over the same ShuffledRDD.
I think that if one partition of the persisted output is missing, the
recomputation will also be carried out from the ShuffledRDD instead of
re-executing the preceding HadoopRDD and MapPartitionsRDD.
Am I right?
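One way to look at the lineage I am asking about is to print toDebugString on the persisted RDD after the first action. The following is only a minimal sketch: the object name LineageCheck and the helper wordCounts are made up for illustration, and it assumes the same local master and input path as above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LineageCheck {
  // Hypothetical helper: the same word-count pipeline as in ScalaApp.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LineageCheck").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val output = wordCounts(sc.textFile("/home/usr/test.dat"))
      output.persist()
      output.count() // first action: materializes and caches the partitions
      // Prints the chain of RDDs (ShuffledRDD, MapPartitionsRDD, HadoopRDD)
      // that Spark would fall back on if a cached partition were lost.
      println(output.toDebugString)
    } finally {
      sc.stop()
    }
  }
}
```

The printout only shows which RDDs are in the lineage. It does not by itself tell which of them would be re-executed when a partition is missing.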
Thanks and Regards,
Weiping
On 25.04.2016 17:46, Ted Yu wrote:
Can you show a snippet of your code that demonstrates what you observed?
Thanks
On Mon, Apr 25, 2016 at 8:38 AM, Weiping Qu <q...@informatik.uni-kl.de> wrote:
Thanks.
I read that from the specification.
I thought actions and transformations are distinguished by
whether they are lazily executed or not.
As far as I could see from my code, the reduceByKey was executed
without any operation from the Action category being called.
Please correct me if I am wrong.
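A small check of this can be sketched as follows. It is only an illustration under assumptions: the object LazinessCheck, the counter mapCalls and the method probe are hypothetical names, and the JVM-side counter is only meaningful with master "local", where the tasks run in the same JVM as the driver.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.{SparkConf, SparkContext}

object LazinessCheck {
  // Counts how often the map function is really invoked.
  // Assumption: master is "local", so tasks share this JVM.
  val mapCalls = new AtomicInteger(0)

  // Returns (calls seen after defining reduceByKey, calls seen after count()).
  def probe(sc: SparkContext): (Int, Int) = {
    mapCalls.set(0)
    val counts = sc.parallelize(Seq("a b", "a"))
      .flatMap(line => line.split(" "))
      .map { word => LazinessCheck.mapCalls.incrementAndGet(); (word, 1) }
      .reduceByKey(_ + _)
    val beforeAction = mapCalls.get() // no action has been called yet
    counts.count()                    // the action that submits the job
    (beforeAction, mapCalls.get())
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LazinessCheck").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val (before, after) = probe(sc)
      println(s"map calls before action: $before, after count(): $after")
    } finally {
      sc.stop()
    }
  }
}
```

If reduceByKey were executed eagerly, the first number would already be greater than zero before count() is called.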
Thanks,
Regards,
Weiping
On 25.04.2016 17:20, Chadha Pooja wrote:
reduceByKey is a transformation.
http://spark.apache.org/docs/latest/programming-guide.html#transformations
Thanks
_________________________________________________________________________________________________
Pooja Chadha
Senior Architect
THE BOSTON CONSULTING GROUP
Mobile +1 617 794 3862
_________________________________________________________________________________________________
-----Original Message-----
From: Weiping Qu [mailto:q...@informatik.uni-kl.de]
Sent: Monday, April 25, 2016 11:05 AM
To: u...@spark.incubator.apache.org
Subject: reduceByKey as Action or Transformation
Hi,
I'd just like to verify whether reduceByKey is a transformation or
an action.
As written in the RDD paper, a Spark flow is triggered only when an
action is reached.
I tried it and saw that my flow is executed as soon as there is a
reduceByKey, although it is categorized as a transformation in the
Spark 1.6.1 specification.
Thanks and Regards,
Weiping
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
<mailto:user-unsubscr...@spark.apache.org>
For additional commands, e-mail: user-h...@spark.apache.org
<mailto:user-h...@spark.apache.org>