Re: How to force Spark to honor Parquet partitioning

2019-05-03 Thread Gourav Sengupta
So you want the data from one physical partition on disk to go to only one executor?

On Fri, May 3, 2019 at 5:38 PM Tomas Bartalos wrote:
> Hello,
>
> I have partitioned parquet files based on "event_hour" column.
> After reading parquet files to spark:
>

How to force Spark to honor Parquet partitioning

2019-05-03 Thread Tomas Bartalos
Hello,

I have partitioned parquet files based on the "event_hour" column. After reading the parquet files into Spark:

    spark.read.format("parquet").load("...")

files from the same parquet partition are scattered across many Spark partitions.

Example of the mapping Spark partition -> parquet partition: Spark
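
For readers who want to reproduce the mapping described in the question, here is a minimal PySpark sketch. It is not taken from the thread. It assumes Hive-style paths of the form .../event_hour=<value>/..., and the helper names (parquet_partition_of, spark_to_parquet_mapping, collect_pairs, regroup_by_event_hour) are made up for illustration. The last helper shows one possible workaround, repartitioning on the column, which costs a shuffle and is not necessarily what the poster ended up doing.

```python
import re
from collections import defaultdict

# Assumption: files live under Hive-style directories such as
# .../event_hour=<value>/part-....parquet
_EVENT_HOUR = re.compile(r"event_hour=([^/]+)")


def parquet_partition_of(path):
    """Return the event_hour value encoded in a file path, or None."""
    match = _EVENT_HOUR.search(path)
    return match.group(1) if match else None


def spark_to_parquet_mapping(pairs):
    """Map each Spark partition id to the sorted event_hour values it reads.

    `pairs` is an iterable of (spark_partition_id, file_path) tuples.
    Paths without an event_hour segment are ignored.
    """
    mapping = defaultdict(set)
    for partition_id, path in pairs:
        hour = parquet_partition_of(path)
        if hour is not None:
            mapping[partition_id].add(hour)
    return {pid: sorted(hours) for pid, hours in mapping.items()}


def collect_pairs(df):
    """Collect distinct (spark_partition_id, file_path) pairs from a
    DataFrame that was read directly from parquet files."""
    # Imported here so the pure-Python helpers above work without pyspark.
    from pyspark.sql.functions import input_file_name, spark_partition_id

    rows = (
        df.select(
            spark_partition_id().alias("pid"),
            input_file_name().alias("path"),
        )
        .distinct()
        .collect()
    )
    return [(row["pid"], row["path"]) for row in rows]


def regroup_by_event_hour(df):
    """Hash-partition rows on event_hour (this triggers a shuffle).

    All rows sharing an event_hour end up in the same Spark partition,
    but a single Spark partition may still hold several event_hour values.
    """
    return df.repartition("event_hour")
```

Calling spark_to_parquet_mapping(collect_pairs(df)) on the freshly loaded DataFrame should show whether a single Spark partition reads files from several event_hour directories.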