Hi, can you please share the Spark code?

Regards,
Gourav

On Sun, Jun 28, 2020 at 12:58 AM Sanjeev Mishra <sanjeev.mis...@gmail.com> wrote:

> I have a large number of JSON files that Spark 2.4 can read in 36 seconds,
> but Spark 3.0 takes almost 33 minutes to read the same files. On closer
> analysis, it looks like Spark 3.0 is choosing a different DAG than
> Spark 2.4. Does anyone have any idea what is going on? Is there a
> configuration problem with Spark 3.0?
>
> Here are the details:
>
> *Spark 2.4*
>
> Summary Metrics for 2203 Completed Tasks
> <http://10.0.0.8:4040/stages/stage/?id=0&attempt=0#tasksTitle>
>
> Metric   | Min    | 25th percentile | Median | 75th percentile | Max
> Duration | 0.0 ms | 0.0 ms          | 0.0 ms | 1.0 ms          | 62.0 ms
> GC Time  | 0.0 ms | 0.0 ms          | 0.0 ms | 0.0 ms          | 11.0 ms
>
> Aggregated Metrics by Executor
>
> Executor ID | Logs | Address        | Task Time | Total Tasks | Failed Tasks | Killed Tasks | Succeeded Tasks | Blacklisted
> driver      |      | 10.0.0.8:49159 | 36 s      | 2203        | 0            | 0            | 2203            | false
>
> *Spark 3.0*
>
> Summary Metrics for 8 Completed Tasks
> <http://10.0.0.8:4040/stages/stage/?id=1&attempt=0&task.eventTimelinePageNumber=1&task.eventTimelinePageSize=47#tasksTitle>
>
> Metric               | Min              | 25th percentile  | Median           | 75th percentile  | Max
> Duration             | 3.8 min          | 4.0 min          | 4.1 min          | 4.4 min          | 5.0 min
> GC Time              | 3 s              | 3 s              | 3 s              | 4 s              | 4 s
> Input Size / Records | 15.6 MiB / 51028 | 16.2 MiB / 53303 | 16.8 MiB / 55259 | 17.8 MiB / 58148 | 20.2 MiB / 71624
>
> Aggregated Metrics by Executor
>
> Executor ID | Logs | Address        | Task Time | Total Tasks | Failed Tasks | Killed Tasks | Succeeded Tasks | Blacklisted | Input Size / Records
> driver      |      | 10.0.0.8:50224 | 33 min    | 8           | 0            | 0            | 8               | false       | 136.1 MiB / 451999
>
> The DAG is also different.
>
> Spark 2.4 DAG
>
> [image: Screenshot 2020-06-27 16.30.26.png]
>
> Spark 3.0 DAG
>
> [image: Screenshot 2020-06-27 16.32.32.png]