Re: [PR] [SPARK-48359][SQL] Built-in functions for Zstd compression and decompression [spark]

via GitHub Tue, 21 May 2024 01:36:29 -0700


xi-db commented on PR #46672:
URL: https://github.com/apache/spark/pull/46672#issuecomment-2122077693


   > Instead of adding (de)compression functions for different codecs, how 
about adding the `compression` and `decompression` directly, like,
   > 
   > * https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress
   > * https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16
   
   Hi @yaooqinn, yes, that would be one way to implement them. However, we 
chose codec-specific functions for the following reasons:
   * The `compress` functions in MySQL and SQL Server accept only one argument, 
so users can't specify the compression algorithm or compression level. 
In addition, the compression algorithm used in [MySQL's `compress` is not 
specified](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress:~:text=a%20binary%20string.-,This%20function%20requires%20MySQL%20to%20have%20been%20compiled%20with%20a%20compression%20library%20such%20as%20zlib.%20Otherwise%2C%20the%20return%20value%20is%20always%20NULL,-.%20The%20return%20value),
 and [SQL Server only uses 
gzip](https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16#:~:text=using%20the%20Gzip%20algorithm),
 which differs from our case. Reusing the name `compress` in Apache Spark 
could confuse users who know the function from other databases. 
   * Looking at our [SQL Function 
Reference](https://spark.apache.org/docs/latest/api/sql/#built-in-functions), 
there is no precedent for integrating multiple algorithms into one SQL 
function, and doing so might make the function more complicated to use. 
Following the naming convention of `aes_encrypt`, `url_encode`, and 
`regexp_replace`, the function name `zstd_compress` includes the algorithm name.
   
   Thus, the functions in this PR are named `zstd_compress`, `zstd_decompress`, 
and `try_zstd_decompress`. Each name states the algorithm explicitly, which 
keeps the functions simple to understand and use.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

