So you want the data from one physical partition on disk to go to only one
executor?
On Fri, May 3, 2019 at 5:38 PM Tomas Bartalos wrote:
Hello,
I have Parquet files partitioned by the "event_hour" column.
After reading the Parquet files into Spark:
spark.read.format("parquet").load("...")
Files from the same Parquet partition are scattered across many Spark
partitions.
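For anyone who wants to reproduce this kind of mapping, here is a minimal sketch in plain Python. It assumes Hive-style directory names of the form event_hour=<value> in the file paths. The helper names, the sample paths, and the sample hour values are all hypothetical and only there for illustration.

```python
import re
from collections import defaultdict


def partition_value(path, column="event_hour"):
    """Extract a Hive-style partition value (column=value) from a file path.

    Returns None when the path has no such directory component.
    """
    match = re.search(r"(?:^|/)" + re.escape(column) + r"=([^/]+)", path)
    return match.group(1) if match else None


def map_spark_to_parquet(pairs, column="event_hour"):
    """Group (spark_partition_id, file_path) pairs into
    {spark_partition_id: sorted distinct partition values}."""
    mapping = defaultdict(set)
    for spark_pid, path in pairs:
        value = partition_value(path, column)
        if value is not None:
            mapping[spark_pid].add(value)
    return {pid: sorted(values) for pid, values in sorted(mapping.items())}


# Hypothetical sample: Spark partition 0 reads files from two Parquet partitions.
sample_pairs = [
    (0, "/data/event_hour=2019050301/part-0.parquet"),
    (0, "/data/event_hour=2019050302/part-1.parquet"),
    (1, "/data/event_hour=2019050301/part-2.parquet"),
]

for spark_pid, hours in map_spark_to_parquet(sample_pairs).items():
    print(spark_pid, "->", hours)
```

In PySpark, the (partition id, file path) pairs could probably be collected by selecting spark_partition_id() and input_file_name() from pyspark.sql.functions on the loaded DataFrame and taking the distinct rows. That part is not shown here because it needs a running Spark session.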
Example of mapping spark partition -> parquet partition:
Spark