xi-db commented on PR #46672: URL: https://github.com/apache/spark/pull/46672#issuecomment-2122077693
> Instead of adding (de)compression functions for different codecs, how about adding the `compression` and `decompression` directly, like,
>
> * https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress
> * https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16

Hi @yaooqinn, yes, that would be one way to implement them. However, consider the following points:

* The `compress` functions in MySQL and SQL Server accept only one argument, so users can't specify the compression algorithm or compression level. Besides, the compression algorithm used in [MySQL's `compress` is not specified](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress:~:text=a%20binary%20string.-,This%20function%20requires%20MySQL%20to%20have%20been%20compiled%20with%20a%20compression%20library%20such%20as%20zlib.%20Otherwise%2C%20the%20return%20value%20is%20always%20NULL,-.%20The%20return%20value), and [SQL Server only uses gzip](https://learn.microsoft.com/en-us/sql/t-sql/functions/compress-transact-sql?view=sql-server-ver16#:~:text=using%20the%20Gzip%20algorithm), which differs from our case. If we reuse the same name, users who are familiar with other databases may be confused when they use the `compress` function in Apache Spark.
* Looking at our [SQL Function Reference](https://spark.apache.org/docs/latest/api/sql/#built-in-functions), there is no precedent for integrating multiple algorithms into one SQL function, and doing so might make the functions more complicated to use. Existing functions such as `aes_encrypt`, `url_encode` and `regexp_replace` follow a naming convention that puts the algorithm or format in the name.

Thus, the functions are named `zstd_compress`, `zstd_decompress`, and `try_zstd_decompress` in this PR. The names explicitly show the algorithm they use, which keeps them simple to understand and use.

--
This is an automated message from the Apache Git Service.
