Hi, Can you give me more details, or point me to a tutorial, on "You'd have to intercept execution events and correlate them. Not an easy task yet doable"?
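For context, here is a minimal sketch (my own understanding, not something from the thread) of what "intercepting execution events" might look like with Spark's SparkListener API; the class name TaskTimingListener and the exact fields printed are illustrative choices:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Prints one line per finished task and one per finished stage.
class TaskTimingListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    println(s"stage=${taskEnd.stageId} task=${info.taskId} " +
      s"type=${taskEnd.taskType} duration=${info.duration} ms host=${info.host}")
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val stage = stageCompleted.stageInfo
    println(s"stage=${stage.stageId} name='${stage.name}' tasks=${stage.numTasks} completed")
  }
}

// Register the listener before running the job under test, e.g.:
//   spark.sparkContext.addSparkListener(new TaskTimingListener())

The same events can also be collected after the fact from the event log (spark.eventLog.enabled=true) instead of a live listener.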
Thanks

On Wed, Apr 12, 2023 at 21:04 Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> tl;dr it's not possible to "reverse-engineer" tasks to functions.
>
> In essence, Spark SQL is an abstraction layer over the RDD API that is made up
> of partitions and tasks. Tasks are Scala functions (possibly with some Python
> for PySpark). A simple-looking high-level operator like DataFrame.join can end
> up with multiple RDDs, each with a set of partitions (and hence tasks). What
> the tasks do is an implementation detail that you'd have to learn by reading
> the source code of Spark SQL that produces the "bytecode".
>
> Just looking at the DAG or the task screenshots won't give you that level of
> detail. You'd have to intercept execution events and correlate them. Not an
> easy task yet doable. HTH.
>
> Best regards,
> Jacek Laskowski
> ----
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> On Tue, Apr 11, 2023 at 6:53 PM Trường Trần Phan An <truong...@vlute.edu.vn> wrote:
>
>> Hi all,
>>
>> I am conducting a study comparing the execution time of the Bloom Filter Join
>> operation in two environments: Apache Spark Cluster and Apache Spark. I have
>> compared the overall time of the two environments, but I want to compare
>> specific "tasks on each stage" to see which computation has the most
>> significant difference.
>>
>> I have taken a screenshot of the DAG of Stage 0 and the list of tasks
>> executed in Stage 0:
>> - DAG.png
>> - Task.png
>>
>> *My questions:*
>> 1. Can we determine which tasks are responsible for executing each step
>> scheduled on the DAG during processing?
>> 2. Is it possible to know the function of each task (e.g., what is task ID 0
>> responsible for? What is task ID 1 responsible for? ...)?
>>
>> Best regards,
>> Truong
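Building on the "correlate them" advice in the quoted reply: a hedged sketch of tying each task back to the SQL query (execution id) that spawned it, using the "spark.sql.execution.id" property that Spark SQL sets on the jobs it submits. The listener name SqlCorrelationListener and the in-memory map are my own illustrative choices:

import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart, SparkListenerTaskEnd}

// Records, for each stage, the SQL execution id of the query that triggered it,
// then tags every finished task with that execution id.
class SqlCorrelationListener extends SparkListener {
  private val stageToExecution = mutable.Map[Int, String]()

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // Spark SQL tags the jobs it submits with this local property.
    val executionId = Option(jobStart.properties)
      .flatMap(p => Option(p.getProperty("spark.sql.execution.id")))
      .getOrElse("non-SQL job")
    jobStart.stageIds.foreach(stageId => stageToExecution(stageId) = executionId)
  }

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val executionId = stageToExecution.getOrElse(taskEnd.stageId, "unknown")
    println(s"execution=$executionId stage=${taskEnd.stageId} " +
      s"task=${taskEnd.taskInfo.taskId} duration=${taskEnd.taskInfo.duration} ms")
  }
}

Comparing these per-task lines between the cluster run and the local run (for the same execution id and stage) should show which part of the Bloom Filter Join differs most; mapping a stage to an individual operator in the DAG still means reading the physical plan, as the quoted reply notes.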