Yeachan Park created SPARK-39763:
------------------------------------

             Summary: Executor memory footprint substantially increases while 
reading zstd compressed parquet files
                 Key: SPARK-39763
                 URL: https://issues.apache.org/jira/browse/SPARK-39763
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.0
            Reporter: Yeachan Park


Hi all,

 

While transitioning from the default snappy compression to zstd, we noticed a 
substantial increase in executor memory while reading and processing 
zstd-compressed parquet files.

Memory footprint increased nearly threefold in some cases.

Reading and processing snappy-compressed files while writing the output as zstd 
did not result in this behaviour, so the increase appears to be on the read path.

To reproduce (a minimal PySpark sketch follows the steps below):
 # Set "spark.sql.parquet.compression.codec" to "zstd"
 # Write some parquet files; compression will default to zstd after setting 
the option above
 # Read the zstd-compressed files back and run some transformations. Compare 
the executor memory usage against running the same transformations on parquet 
files written with snappy compression.
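
The steps above can be scripted roughly as follows. This is a minimal sketch, 
not the exact job we ran: the paths /tmp/parquet_zstd and /tmp/parquet_snappy 
and the groupBy transformation are placeholders, and executor memory should be 
compared via the Spark UI / executor metrics while each read job runs.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Step 1: default the parquet codec to zstd
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")

# Step 2: write some parquet files; they are zstd-compressed due to the conf above
df = spark.range(0, 100_000_000).withColumn("v", F.rand())
df.write.mode("overwrite").parquet("/tmp/parquet_zstd")  # placeholder path

# Write the same data with snappy for comparison (per-write codec override)
df.write.mode("overwrite").option("compression", "snappy").parquet("/tmp/parquet_snappy")

# Step 3: read each copy back and run the same transformation; compare the
# executor memory usage between the two runs (e.g. in the Spark UI)
spark.read.parquet("/tmp/parquet_zstd") \
    .groupBy((F.col("id") % 1000).alias("k")).agg(F.sum("v")).count()
spark.read.parquet("/tmp/parquet_snappy") \
    .groupBy((F.col("id") % 1000).alias("k")).agg(F.sum("v")).count()
{code}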


