Hello,

I'm using Spark 2.3.1.

I have a job that reads 5,000 small parquet files from S3.

When I do a mapPartitions followed by a collect, only *278* tasks are used
(I would have expected 5,000). Does Spark group small files? If so, what is
the threshold for grouping? Is it configurable? Any link to the
corresponding source code?
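
For reference, here is a minimal sketch of the job (the bucket path, app
name, and the per-partition count are made up; the rest is the plain
Spark 2.3 API):

import org.apache.spark.sql.SparkSession

object SmallFilesRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("small-parquet-partitions")
      .getOrCreate()

    // Read ~5,000 small parquet files under a single S3 prefix.
    val df = spark.read.parquet("s3a://my-bucket/small-files/")

    // mapPartitions over the underlying RDD, then collect:
    // one output record per task/partition.
    val counts = df.rdd
      .mapPartitions(iter => Iterator(iter.size))
      .collect()

    // This prints 278, not the 5,000 I expected.
    println(s"number of partitions/tasks: ${counts.length}")

    spark.stop()
  }
}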

Rgds,

Yann.
