[ https://issues.apache.org/jira/browse/SPARK-48359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-48359:
--------------------------------------

    Assignee:     (was: Apache Spark)

> Built-in functions for Zstd compression and decompression
> ---------------------------------------------------------
>
>                 Key: SPARK-48359
>                 URL: https://issues.apache.org/jira/browse/SPARK-48359
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Xi Lyu
>            Priority: Major
>              Labels: pull-request-available
>
> Some users currently rely on UDFs for Zstd compression and decompression, 
> which performs poorly. Providing native built-in functions improves 
> performance by doing the compression and decompression entirely within the JVM.
>  
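> The UDF approach this replaces might look like the following PySpark sketch 
> (the third-party zstandard package, the DataFrame `df`, and its `payload` 
> column are illustrative, not part of this proposal):
> {code:python}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import BinaryType
> import zstandard  # third-party codec running in Python worker processes
> 
> # Every row is serialized out of the JVM to a Python worker and back,
> # which is the overhead native built-in functions would avoid.
> compress_udf = udf(lambda b: zstandard.ZstdCompressor(level=3).compress(b), BinaryType())
> decompress_udf = udf(lambda b: zstandard.ZstdDecompressor().decompress(b), BinaryType())
> 
> df = df.withColumn("payload_z", compress_udf("payload"))
> {code}
> 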
> Now, we are introducing three new built-in functions:
> {code:java}
> zstd_compress(input: binary [, level: int [, streaming_mode: bool]])
> zstd_decompress(input: binary)
> try_zstd_decompress(input: binary)
> {code}
> where
>  * `input`: The binary value to compress or decompress.
>  * `level`: Optional integer argument that sets the compression level. The 
> compression level controls the trade-off between compression speed and 
> compression ratio. The default level is 3; valid values are 1 to 22, inclusive.
>  * `streaming_mode`: Optional boolean argument that indicates whether to 
> compress in streaming mode.
> Examples:
> {code:sql}
> > SELECT base64(zstd_compress(repeat("Apache Spark ", 10)));
>   KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=
> > SELECT base64(zstd_compress(repeat("Apache Spark ", 10), 3, true));
>   KLUv/QBYpAAAaEFwYWNoZSBTcGFyayABABLS+QUBAAA=
> > SELECT string(zstd_decompress(unbase64("KLUv/SCCpQAAaEFwYWNoZSBTcGFyayABABLS+QU=")));
>   Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark Apache Spark
> > SELECT zstd_decompress(zstd_compress("Apache Spark"));
>   Apache Spark
> > SELECT try_zstd_decompress("invalid input");
>   NULL
> {code}
> These three built-in functions are also available in Python and Scala.
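> For example, from PySpark the new functions can be invoked through SQL 
> expressions (a sketch; dedicated helpers in pyspark.sql.functions may also be 
> exposed, and the DataFrame `df` with its `payload` column is illustrative):
> {code:python}
> from pyspark.sql.functions import expr
> 
> # Compress a binary column at level 3 and round-trip it, all inside the JVM.
> df = (df
>       .withColumn("payload_z", expr("zstd_compress(payload, 3)"))
>       .withColumn("payload_back", expr("zstd_decompress(payload_z)")))
> {code}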


