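Look at the DAG for the job. Are you reading lots of CSV files? Does the
DataFrame you read from CSV have a lot of partitions to start with? Bear in
mind that a cross join pairs every row with every other row, so the result is
much larger than the input. Expect the task count to follow how both sides
are partitioned rather than the 3.4GB input size.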
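
To put rough numbers on that, here is a minimal sketch. It assumes Spark
plans the join as a Cartesian product, which runs one task per pair of input
partitions. If your plan shows a different join strategy, the arithmetic will
not apply. The helper names are made up for illustration and are not part of
any Spark API.

```python
import math


def cartesian_task_count(left_partitions: int, right_partitions: int) -> int:
    """Tasks in the cross-join stage, assuming one task per pair of partitions."""
    return left_partitions * right_partitions


def partitions_per_side(observed_tasks: int) -> int:
    """For a self cross join, estimate the input partition count from the task count."""
    return round(math.sqrt(observed_tasks))


# In PySpark the real input partition count comes from df.rdd.getNumPartitions().
print(cartesian_task_count(224, 224))  # 50176
print(partitions_per_side(50_000))     # 224
```

So if the CSV DataFrame turns out to have a couple of hundred partitions,
that alone would account for roughly 50K tasks.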

On Fri, 30 Aug 2019 at 14:11, Rishi Shah wrote:
> Hi All,
>
> I am scratching my head over this weird behavior, where a df (read from
> .csv) of size ~3.4GB gets cross joined with itself and creates 50K tasks!
> How do I correlate input size with the number of tasks in this case?
>
> --
> Regards,
>
> Rishi Shah
>
--
Chris