Yeachan Park created SPARK-39763:
------------------------------------

             Summary: Executor memory footprint substantially increases while reading zstd compressed parquet files
                 Key: SPARK-39763
                 URL: https://issues.apache.org/jira/browse/SPARK-39763
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.0
            Reporter: Yeachan Park
Hi all,

While transitioning from the default snappy compression to zstd, we noticed a substantial increase in executor memory while reading and processing zstd compressed parquet files. The memory footprint increased nearly 3-fold in some cases. Reading and processing snappy-compressed files and writing to zstd did not result in this behaviour.

To reproduce:
# Set "spark.sql.parquet.compression.codec" to "zstd"
# Write some parquet files; the compression will default to zstd after setting the option above
# Read the zstd-compressed files back and run some transformations. Compare the executor memory usage against running the same transformations on snappy-compressed parquet files.