Sorry, please ignore the above. I now see I called coalesce on a different reference, than I used to register the table.
On Sun, Jun 26, 2016 at 6:34 PM, Randy Gelhausen <rgel...@gmail.com> wrote: > <code> > val enriched_web_logs = sqlContext.sql(""" > select web_logs.datetime, web_logs.node as app_host, source_ip, b.node as > source_host, log > from web_logs > left outer join (select distinct node, address from nodes) b on source_ip > = address > """) > > enriched_web_logs.coalesce(1).write.format("parquet").mode("overwrite").save(bucket+"derived/enriched_web_logs") > enriched_web_logs.registerTempTable("enriched_web_logs") > sqlContext.cacheTable("enriched_web_logs") > </code> > > There are only 524 records in the resulting table, and I have explicitly > attempted to coalesce into 1 partition. > > Yet my Spark UI shows 200 (mostly empty) partitions: > RDD NameStorage LevelCached PartitionsFraction CachedSize in MemorySize > in ExternalBlockStoreSize on Disk > In-memory table enriched_web_logs > <http://localhost:4040/storage/rdd?id=86> Memory Deserialized 1x > Replicated 200 100% 22.0 KB 0.0 B 0.0 BWhy would there be 200 partitions > despite the coalesce call? >