Dear Ted,

You are right: reduceByKey is a transformation. My mistake.
Let me rephrase my question using the following code snippet.
import org.apache.spark.{SparkConf, SparkContext}

object ScalaApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalaApp").setMaster("local")
    val sc = new SparkContext(conf)
    val file = sc.textFile("/home/usr/test.dat")
    // Word count: flatMap, map and reduceByKey are all transformations.
    val output = file.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    output.persist()
    output.count()    // first action
    output.collect()  // second action
  }
}

It's a simple code snippet.
I realize that the first action, count(), triggers the execution starting from the HadoopRDD and MapPartitionsRDDs, and that the count is then performed on the ShuffledRDD produced by reduceByKey.
The second action, collect(), just performs the collect over the same ShuffledRDD.
I think that if one partition of the persisted output is missing, the recomputation will also be carried out over the ShuffledRDD instead of re-executing the preceding HadoopRDD and MapPartitionsRDDs.
Am I right?
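
One way to look at the lineage in question is to print it. The following is only a minimal sketch, not a definitive answer: it assumes a small in-memory input via sc.parallelize in place of /home/usr/test.dat, and the object name LineageCheck is hypothetical. It prints output.toDebugString, which lists the RDDs that output depends on, and checks that both actions return the expected word counts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical object name; the input data is an assumption made so the example runs anywhere.
object LineageCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LineageCheck").setMaster("local")
    val sc = new SparkContext(conf)

    // Stand-in for sc.textFile("/home/usr/test.dat").
    val lines = sc.parallelize(Seq("a b a", "b c"))
    val output = lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // persist() with no arguments marks the RDD for MEMORY_ONLY storage.
    output.persist()
    assert(output.getStorageLevel == StorageLevel.MEMORY_ONLY)

    // Print the chain of RDDs that output is derived from.
    val lineage = output.toDebugString
    println(lineage)
    assert(lineage.contains("ShuffledRDD"))

    val distinctWords = output.count()     // first action
    val counts = output.collect().toMap    // second action on the same RDD

    assert(distinctWords == 3)
    assert(counts == Map("a" -> 2, "b" -> 2, "c" -> 1))

    sc.stop()
  }
}
```

Printing toDebugString again after the first action, or looking at the stages in the web UI, should indicate which parts of the lineage were actually run for the second action.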

Thanks and Regards,
Weiping

On 25.04.2016 17:46, Ted Yu wrote:
Can you show a snippet of your code which demonstrates what you observed?

Thanks

On Mon, Apr 25, 2016 at 8:38 AM, Weiping Qu <q...@informatik.uni-kl.de> wrote:

    Thanks.
    I read that from the specification.
    I thought that people distinguish actions from transformations
    by whether they are lazily executed or not.
    As far as I could see from my code, the reduceByKey is executed
    without any operation from the Action category.
    Please correct me if I am wrong.
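
A small experiment can show whether reduceByKey runs on its own. This is a sketch under stated assumptions: the input is a tiny in-memory collection, the names MapCalls and LazyCheck are hypothetical, and the counter trick only works because the master is "local", so tasks run in the same JVM as the driver. It counts how often the map function is called before and after the first action.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper: a JVM-wide counter, usable here only because master is "local".
object MapCalls {
  val count = new AtomicInteger(0)
}

object LazyCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LazyCheck").setMaster("local")
    val sc = new SparkContext(conf)

    val words = sc.parallelize(Seq("a b a", "b c")).flatMap(line => line.split(" "))
    val counts = words
      .map { word => MapCalls.count.incrementAndGet(); (word, 1) }
      .reduceByKey(_ + _)

    // Only transformations so far, so the map function has not been called.
    val callsBeforeAction = MapCalls.count.get()

    // count() is an action and triggers the job.
    val distinctWords = counts.count()
    val callsAfterAction = MapCalls.count.get()

    println(s"before action: $callsBeforeAction, after action: $callsAfterAction")
    assert(callsBeforeAction == 0)
    assert(distinctWords == 3)
    assert(callsAfterAction == 5) // one call per input word

    sc.stop()
  }
}
```

If a flow appears to run as soon as reduceByKey is reached, it may be worth checking whether an action such as count(), collect() or a save call follows somewhere later in the same program.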

    Thanks,
    Regards,
    Weiping

    On 25.04.2016 17:20, Chadha Pooja wrote:

        reduceByKey is a transformation:

        http://spark.apache.org/docs/latest/programming-guide.html#transformations

        Thanks
        
_________________________________________________________________________________________________

        Pooja Chadha
        Senior Architect
        THE BOSTON CONSULTING GROUP
        Mobile +1 617 794 3862

        
_________________________________________________________________________________________________




        -----Original Message-----
        From: Weiping Qu [mailto:q...@informatik.uni-kl.de]
        Sent: Monday, April 25, 2016 11:05 AM
        To: u...@spark.incubator.apache.org
        Subject: reduceByKey as Action or Transformation

        Hi,

        I'd just like to verify whether reduceByKey is a
        transformation or an action.
        As written in the RDD papers, a Spark flow is triggered only
        when an action is reached.
        I tried it and saw that my flow is executed once there is a
        reduceByKey, although it is categorized as a transformation in
        the Spark 1.6.1 specification.

        Thanks and Regards,
        Weiping

        ---------------------------------------------------------------------
        To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
        For additional commands, e-mail: user-h...@spark.apache.org

        





