Hello, I'm using Spark 2.3.1.
I have a job that reads 5,000 small Parquet files from S3. When I do a mapPartitions followed by a collect, only *278* tasks are used (I would have expected 5,000). Does Spark group small files? If so, what is the threshold for grouping, and is it configurable? Any link to the corresponding source code would be appreciated.

Rgds, Yann.
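PS: in case it helps, here is a sketch of what I plan to try. I'm assuming (not yet confirmed) that the grouping is driven by spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes; the bucket path is just a placeholder:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("small-files-test")
      // Max bytes packed into one partition when reading files (default 128 MB).
      // Lowering it should yield more, smaller partitions.
      .config("spark.sql.files.maxPartitionBytes", 16 * 1024 * 1024)
      // Estimated cost, in bytes, of opening a file (default 4 MB). A higher
      // value discourages packing many small files into a single partition.
      .config("spark.sql.files.openCostInBytes", 8 * 1024 * 1024)
      .getOrCreate()

    // Hypothetical path; substitute the real bucket/prefix.
    val df = spark.read.parquet("s3a://my-bucket/my-prefix/")
    // Expecting more partitions (and thus more tasks) after the change.
    println(df.rdd.getNumPartitions)

If that assumption is wrong, a pointer to the actual mechanism in the source would be great.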