Hi,

Can you give me more details, or point me to a tutorial, on "You'd have to
intercept execution events and correlate them. Not an easy task, yet doable"?

Thanks

On Wed, Apr 12, 2023 at 21:04 Jacek Laskowski <ja...@japila.pl>
wrote:

> Hi,
>
> tl;dr it's not possible to "reverse-engineer" tasks to functions.
>
> In essence, Spark SQL is an abstraction layer over the RDD API, which is
> made up of partitions and tasks. Tasks are Scala functions (possibly with
> some Python for PySpark). A simple-looking high-level operator like
> DataFrame.join can end up as multiple RDDs, each with its own set of
> partitions (and hence tasks). What the tasks do is an implementation detail
> that you can only learn by reading the source code of Spark SQL that
> produces this "bytecode".
>
> Just looking at the DAG or the task screenshots won't give you that level
> of detail. You'd have to intercept execution events and correlate them. Not
> an easy task, yet doable. HTH.
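>
> As a minimal sketch of such an interceptor (the listener class name and
> log format are mine, for illustration only), you can register a custom
> SparkListener and correlate tasks to stages via stageId and taskId:
>
>   import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
>
>   class TaskTimingListener extends SparkListener {
>     // Fires once per finished task with its metrics attached
>     override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>       val info = taskEnd.taskInfo
>       val metrics = taskEnd.taskMetrics  // may be null for failed tasks
>       if (metrics != null) {
>         println(s"stage=${taskEnd.stageId} task=${info.taskId} " +
>           s"host=${info.host} runTime=${metrics.executorRunTime}ms")
>       }
>     }
>   }
>
>   spark.sparkContext.addSparkListener(new TaskTimingListener)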
>
> Regards,
> Jacek Laskowski
> ----
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
>
> On Tue, Apr 11, 2023 at 6:53 PM Trường Trần Phan An <
> truong...@vlute.edu.vn> wrote:
>
>> Hi all,
>>
>> I am conducting a study comparing the execution time of the Bloom Filter
>> Join operation in two environments: Apache Spark Cluster and Apache Spark.
>> I have compared the overall time of the two environments, but I want to
>> compare specific "tasks in each stage" to see which computation has the
>> most significant difference.
>>
>> I have taken a screenshot of the DAG of Stage 0 and the list of tasks
>> executed in Stage 0.
>> - DAG.png
>> - Task.png
>>
>> *I have questions:*
>> 1. Can we determine which tasks are responsible for executing each step
>> scheduled in the DAG during processing?
>> 2. Is it possible to know the function of each task (e.g., what is task
>> ID 0 responsible for? What is task ID 1 responsible for? ...)?
>>
>> Best regards,
>> Truong
>>
>
>
