Hey all, a reply on this would be great!
Thanks,
A.B.

On 17-May-2017 1:43 AM, "Daniel Siegmann" <dsiegm...@securityscorecard.io> wrote:

> When using spark.read on a large number of small files, these are
> automatically coalesced into fewer partitions. The only documentation I can
> find on this is in the Spark 2.0.0 release notes, where it simply says
> (http://spark.apache.org/releases/spark-release-2-0-0.html):
>
> "Automatic file coalescing for native data sources"
>
> Can anyone point me to documentation explaining what triggers this
> feature, how it decides how many partitions to coalesce to, and what counts
> as a "native data source"? I couldn't find any mention of this feature in
> the SQL Programming Guide and Google was not helpful.
>
> --
> Daniel Siegmann
> Senior Software Engineer
> *SecurityScorecard Inc.*
> 214 W 29th Street, 5th Floor
> New York, NY 10001
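Not an authoritative answer, but as far as I can tell the coalescing applies to the built-in file-based sources (Parquet, JSON, CSV, text, ORC) and is driven by two configs that do exist in Spark 2.x: spark.sql.files.maxPartitionBytes (default 128 MB) and spark.sql.files.openCostInBytes (default 4 MB). Roughly, each file is padded with the "open cost" and files are then bin-packed into partitions up to a target split size. The pure-Python sketch below simulates that packing so you can estimate partition counts; the helper `pack_files` is illustrative, not Spark's actual code, and it ignores that Spark also splits large splittable files:

```python
def pack_files(file_sizes, max_partition_bytes=128 * 1024 * 1024,
               open_cost_in_bytes=4 * 1024 * 1024, default_parallelism=8):
    """Estimate how many read partitions Spark creates for the given files.

    Loosely modeled on FileSourceScanExec's non-bucketed read logic; treat
    the result as an approximation, not a guarantee.
    """
    # Each file is padded with an "open cost" so tiny files aren't free.
    padded = [size + open_cost_in_bytes for size in file_sizes]
    bytes_per_core = sum(padded) // default_parallelism
    # Target split size: capped by maxPartitionBytes, floored by openCostInBytes.
    max_split_bytes = min(max_partition_bytes,
                          max(open_cost_in_bytes, bytes_per_core))
    # Greedily pack padded files (largest first) into partitions up to the
    # target size. Note: real Spark would also split files larger than the
    # target; this sketch does not.
    partitions, current = [], 0
    for size in sorted(padded, reverse=True):
        if current + size > max_split_bytes and current > 0:
            partitions.append(current)
            current = 0
        current += size
    if current > 0:
        partitions.append(current)
    return len(partitions)

# 1,000 files of 10 KB each collapse into a few dozen partitions
# instead of 1,000.
print(pack_files([10 * 1024] * 1000))
```

In a live session you can check the real behavior directly with `spark.read.parquet(path).rdd.getNumPartitions()` before and after changing those two configs.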