On 16 Nov 2017, at 10:22, Michael Shtelma <mshte...@gmail.com> wrote:

> You call repartition(1) before starting to process your files. This
> will ensure that you end up with just one partition.
One question and one remark.

Q)

    val ds = sqlContext.read.parquet(path).repartition(1)

Am I guaranteed that the file here is read by a single executor, and that no
data shuffling takes place afterwards to produce that single partition?

R) This approach did not work for me:

    val ds = sqlContext.read.parquet(path).repartition(1)  // ds on a single partition
    ds.createOrReplaceTempView("ds")
    val result = sqlContext.sql("... from ds")  // result on 166 partitions...
    result.write.csv(...)                       // 166 files :-/

How do I force the processing onto a single executor?

Jeroen

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
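[Editor's note, a sketch rather than a reply from the thread: the SQL query above triggers a shuffle, and the partition count of that shuffle is taken from spark.sql.shuffle.partitions, so any repartition(1) applied to the *input* is overridden. Repartitioning the *result* just before the write is what produces a single output file. The input path, output path, and query below are placeholders, not from the original mail.]

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup; the original mail uses an older sqlContext handle instead.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val ds = spark.read.parquet("path/to/input") // no repartition needed here
ds.createOrReplaceTempView("ds")

// Placeholder query standing in for the "... from ds" in the mail.
// Its shuffle repartitions the data to spark.sql.shuffle.partitions.
val result = spark.sql("SELECT key, count(*) AS n FROM ds GROUP BY key")

// coalesce(1) merges the shuffle output into one partition without a
// further full shuffle, so the write emits a single CSV part file.
result.coalesce(1).write.csv("path/to/output")
```

Setting spark.sql.shuffle.partitions to 1 before running the query would also give one partition, but it throttles every shuffle in the job; coalescing only the final result keeps the query itself parallel.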