When reading a large number of small files with spark.read, they are
automatically coalesced into fewer partitions. The only documentation I can
find on this is in the Spark 2.0.0 release notes
(http://spark.apache.org/releases/spark-release-2-0-0.html), where it simply
says:

"Automatic file coalescing for native data sources"

Can anyone point me to documentation explaining what triggers this feature,
how it decides how many partitions to coalesce to, and what counts as a
"native data source"? I couldn't find any mention of this feature in the
SQL Programming Guide and Google was not helpful.
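For context, here is a minimal sketch of the kind of read where I observe
this (the path is just a hypothetical placeholder for a directory containing
thousands of small Parquet files):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("small-file-partitioning")
      .getOrCreate()

    // Hypothetical directory containing many small files.
    val df = spark.read.parquet("/data/many-small-files")

    // The reported partition count ends up far smaller than the number of
    // input files, which is the coalescing behavior I'm asking about.
    println(s"Number of partitions: ${df.rdd.getNumPartitions}")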

--
Daniel Siegmann
Senior Software Engineer
SecurityScorecard Inc.
214 W 29th Street, 5th Floor
New York, NY 10001
