Re: Spark DAG scheduler

2020-04-16 Reynold Xin
If you are talking about a tree, then the RDDs are nodes, and the dependencies are the edges. If you are talking about a DAG, then the partitions in the RDDs are the nodes, and the dependencies between the partitions are the edges.

On Thu, Apr 16, 2020 at 4:02 PM, Mania Abdi <
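To make the two granularities concrete, here is a minimal Python sketch of a toy model, not Spark's API. A one-to-one dependency (typically what `map` produces) links partition i of the parent to partition i of the child. A shuffle dependency (typically what `reduceByKey` produces) links every parent partition to every child partition. The function `partition_edges` and the kind strings `"one_to_one"` and `"shuffle"` are hypothetical names, only loosely modeled on Spark's `OneToOneDependency` and `ShuffleDependency` classes.

```python
def partition_edges(parent, child, parent_parts, child_parts, kind):
    """Expand one RDD-level dependency into partition-level edges (toy model).

    Each edge is ((parent_name, parent_partition), (child_name, child_partition)).
    kind="one_to_one": child partition i reads only parent partition i (e.g. map).
    kind="shuffle": every child partition reads every parent partition (e.g. reduceByKey).
    """
    if kind == "one_to_one":
        if parent_parts != child_parts:
            raise ValueError("a one-to-one dependency needs equal partition counts")
        return [((parent, i), (child, i)) for i in range(child_parts)]
    if kind == "shuffle":
        return [
            ((parent, p), (child, c))
            for c in range(child_parts)
            for p in range(parent_parts)
        ]
    raise ValueError(f"unknown dependency kind: {kind!r}")


# RDD-level view: three RDDs (nodes) and two dependencies (edges).
rdd_edges = [("lines", "pairs", "one_to_one"), ("pairs", "counts", "shuffle")]
num_partitions = {"lines": 3, "pairs": 3, "counts": 2}

# Partition-level view: each dependency expands into edges between partitions.
partition_dag = []
for parent, child, kind in rdd_edges:
    partition_dag += partition_edges(
        parent, child, num_partitions[parent], num_partitions[child], kind
    )

print(len(rdd_edges), "RDD-level edges")            # 2 RDD-level edges
print(len(partition_dag), "partition-level edges")  # 9 partition-level edges (3 + 3*2)
```

In this toy model the shuffle edges show why the partition-level graph is not a tree: each `pairs` partition feeds both `counts` partitions, and each `counts` partition has three parents.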

Re: Spark DAG scheduler

2020-04-16 Mania Abdi
Is it correct to say that the nodes in the DAG are RDDs and the edges are computations?

On Thu, Apr 16, 2020 at 6:21 PM Reynold Xin wrote:
> The RDD is the DAG.
>
> On Thu, Apr 16, 2020 at 3:16 PM, Mania Abdi wrote:
>> Hello everyone,
>>
>> I am implementing a caching mechanism for analytic

Re: Spark DAG scheduler

2020-04-16 Reynold Xin
The RDD is the DAG.

On Thu, Apr 16, 2020 at 3:16 PM, Mania Abdi <abdi...@husky.neu.edu> wrote:
> Hello everyone,
>
> I am implementing a caching mechanism for analytic workloads running on
> top of Spark and I need to retrieve the Spark DAG right after it is
> generated and the DAG
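One way to read "the RDD is the DAG": each RDD keeps references to its parent RDDs, so the whole lineage graph can be recovered by starting from the final RDD and following those references. The Python sketch below uses a hypothetical `ToyRDD` class and a hypothetical `lineage_edges` helper rather than Spark itself; in Spark's Scala API the parents are exposed through `RDD.dependencies`, and `RDD.toDebugString` prints the lineage.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in a set
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name plus references to its parents."""
    name: str
    parents: list = field(default_factory=list)


def lineage_edges(final):
    """Return (parent, child) name pairs for every RDD reachable from `final`.

    Each RDD is expanded once, so an RDD shared by two children (a diamond
    in the DAG) contributes its incoming edges only once.
    """
    edges, seen, stack = [], set(), [final]
    while stack:
        rdd = stack.pop()
        if rdd in seen:
            continue
        seen.add(rdd)
        for parent in rdd.parents:
            edges.append((parent.name, rdd.name))
            stack.append(parent)
    return edges


# A small lineage with a diamond: `pairs` feeds both `counts` and `joined`.
lines = ToyRDD("lines")
words = ToyRDD("words", [lines])
pairs = ToyRDD("pairs", [words])
counts = ToyRDD("counts", [pairs])
joined = ToyRDD("joined", [counts, pairs])

for parent, child in lineage_edges(joined):
    print(parent, "->", child)
```

The walk is iterative with a `seen` set rather than plainly recursive, so shared ancestors are not re-expanded and very long lineages do not hit Python's recursion limit.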

Spark DAG scheduler

2020-04-16 Mania Abdi
Hello everyone,

I am implementing a caching mechanism for analytic workloads running on top of Spark, and I need to retrieve the Spark DAG right after it is generated by the DAG scheduler. I would appreciate it if you could give me some hints or point me to some documents about where the DAG