[https://issues.apache.org/jira/browse/FLINK-32565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751138#comment-17751138]
Hanyu Zheng commented on FLINK-32565:
-
[~twalthr], based on my research, it seems that other vendors use CAST rather than CONVERT.
> Support cast from NUMBER to BYTES
> -
>
> Key: FLINK-32565
> URL: https://issues.apache.org/jira/browse/FLINK-32565
> Project: Flink
> Issue Type: Sub-task
> Reporter: Hanyu Zheng
> Assignee: Hanyu Zheng
> Priority: Major
> Labels: pull-request-available
>
> We are undertaking a task that requires casting from numeric types (such as
> INTEGER or DOUBLE) to BYTES. In particular, we have the INTEGER 1234, and our
> current approach is to convert it to BYTES using the following SQL query:
> {code:sql}
> SELECT CAST(1234 AS BYTES);
> {code}
> However, executing this query fails, possibly because of an error in the
> conversion between INTEGER and BYTES. Our goal is to identify and correct
> this issue so that the query executes successfully. The tasks involved are:
> # Investigate and pinpoint the specific reason for the conversion failure
> from INTEGER to BYTES.
> # Design and implement a solution that enables our query to function
> correctly.
> # Test this solution across all required scenarios to ensure its robustness.
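The description does not state which bytes `CAST(1234 AS BYTES)` should return. A minimal sketch follows, assuming the cast encodes each integral type as fixed-width, big-endian two's complement (the default of Java's `ByteBuffer`). The class `IntegerToBytes` and its methods are hypothetical and are not part of Flink's code base:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical helper illustrating one possible encoding; not Flink's cast implementation.
final class IntegerToBytes {
    // TINYINT: a single byte, stored as-is.
    static byte[] fromTinyInt(byte v) {
        return new byte[] {v};
    }

    // SMALLINT: 2 bytes, big-endian two's complement.
    static byte[] fromSmallInt(short v) {
        return ByteBuffer.allocate(Short.BYTES).putShort(v).array();
    }

    // INTEGER: 4 bytes, big-endian two's complement.
    static byte[] fromInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // BIGINT: 8 bytes, big-endian two's complement.
    static byte[] fromBigInt(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Renders bytes as uppercase hex, e.g. {0x04, (byte) 0xD2} -> "04D2".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(fromInt(1234)));     // 000004D2
        System.out.println(toHex(fromInt(-1)));       // FFFFFFFF
        System.out.println(toHex(fromBigInt(1234L))); // 00000000000004D2
    }
}
{code}

Fixed widths keep the result decodable without extra metadata. A reader that knows the source type can read back exactly 1, 2, 4, or 8 bytes.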
>
> See also how other databases support this conversion:
> 1. PostgreSQL: PostgreSQL supports casting from NUMBER types (INTEGER,
> BIGINT, DECIMAL, etc.) to the BYTES type (BYTEA). In PostgreSQL, you can use
> CAST or the TO_BINARY function to perform the conversion. URL:
> [https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS]
> 2. MySQL: MySQL supports casting from NUMBER types (INTEGER, BIGINT, DECIMAL,
> etc.) to the BYTES type (BINARY or BLOB). In MySQL, you can use the CAST or
> CONVERT functions to perform the conversion. URL:
> [https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html]
> 3. Microsoft SQL Server: SQL Server supports casting from NUMBER types (INT,
> BIGINT, NUMERIC, etc.) to the BYTES type (VARBINARY or IMAGE). You can use the
> CAST or CONVERT functions to perform the conversion. URL:
> [https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql]
> 4. Oracle Database: Oracle supports casting from NUMBER types (NUMBER,
> INTEGER, FLOAT, etc.) to the BYTES type (RAW). You can use the
> UTL_RAW.CAST_TO_RAW function to perform the conversion. URL:
> [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_BINARY_DOUBLE.html]
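The vendors listed above also accept non-integral types such as DECIMAL and FLOAT. The sketch below shows one possible encoding for those types. It assumes well-known conventions can be reused: IEEE 754 bit patterns for FLOAT/DOUBLE, and, for DECIMAL, the unscaled value as big-endian two's complement (the layout Avro's `decimal` logical type uses for its bytes). `FractionalToBytes` is a hypothetical name, not Flink code:

{code:java}
import java.math.BigDecimal;
import java.nio.ByteBuffer;

// Hypothetical helper; not Flink's cast implementation.
final class FractionalToBytes {
    // FLOAT: IEEE 754 single-precision bit pattern, 4 bytes, big-endian.
    static byte[] fromFloat(float v) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(v).array();
    }

    // DOUBLE: IEEE 754 double-precision bit pattern, 8 bytes, big-endian.
    static byte[] fromDouble(double v) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(v).array();
    }

    // DECIMAL: minimal big-endian two's complement of the unscaled value.
    // The scale is NOT encoded; a reader must take it from the column type.
    static byte[] fromDecimal(BigDecimal v) {
        return v.unscaledValue().toByteArray();
    }

    // Renders bytes as uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(fromFloat(1.0f)));                       // 3F800000
        System.out.println(toHex(fromDouble(1.0)));                       // 3FF0000000000000
        System.out.println(toHex(fromDecimal(new BigDecimal("12.34")))); // 04D2
    }
}
{code}

Because the scale is dropped, DECIMAL 12.34 (scale 2) and 1234 (scale 0) produce the same bytes. Whether that is acceptable is a design question for the cast.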
>
> For the byte-order problem that may arise (little-endian vs. big-endian), see
> how other systems handle it:
>
> 1. Apache Hadoop: Hadoop is an open-source framework that runs across
> different platforms and architectures, so it has to deal with byte order
> issues. The Hadoop Distributed File System (HDFS) uses a technique called
> "sequence files," which include metadata to describe the byte order of the
> data. This metadata ensures that data is read and written correctly,
> regardless of the endianness of the platform.
> 2. Apache Avro: Avro is a data serialization system used by various big data
> frameworks like Hadoop and Apache Kafka. Avro uses a compact binary encoding
> format that includes a marker for the byte order. This allows Avro to handle
> endianness issues seamlessly when data is exchanged between systems with
> different byte orders.
> 3. Apache Parquet: Parquet is a columnar storage format used in big data
> processing frameworks like Apache Spark. Parquet uses a little-endian format
> for encoding numeric values, which is the most common format on modern
> systems. When reading or writing Parquet data, data processing engines
> typically handle any necessary byte order conversions transparently.
> 4. Apache Spark: Spark is a popular big data processing engine for
> distributed systems. It relies on the underlying data formats it reads
> (e.g., Avro, Parquet, ORC) to manage byte order. These formats are designed
> to handle byte order correctly, so Spark reads the data correctly on
> different platforms.
> 5. Google Cloud BigQuery: BigQuery is a serverless data warehouse offered by
> Google Cloud. When dealing with binary data and endianness, BigQuery relies
> on the data encoding format. For example, when loading data in Avro or
> Parquet formats, these formats already include byte order information,
> allowing BigQuery to handle data across different platforms correctly.
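Whatever a given file format does, the byte order of a SQL-level cast should probably be fixed and documented. Otherwise the same BYTES value decodes to different numbers on different readers. The minimal sketch below uses `java.nio.ByteBuffer` to show the effect for the INTEGER 1234 from the example query. `ByteBuffer` defaults to big-endian on every platform, independent of `ByteOrder.nativeOrder()`. The class name `ByteOrderDemo` is hypothetical:

{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of how byte order changes the encoded and decoded values.
final class ByteOrderDemo {
    // Encodes an INTEGER as 4 bytes in the given byte order.
    static byte[] encode(int v, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(v).array();
    }

    // Decodes 4 bytes back into an INTEGER using the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Renders bytes as uppercase hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] big = encode(1234, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(1234, ByteOrder.LITTLE_ENDIAN);
        System.out.println(toHex(big));    // 000004D2
        System.out.println(toHex(little)); // D2040000
        // Reading with the matching order round-trips the value.
        System.out.println(decode(little, ByteOrder.LITTLE_ENDIAN)); // 1234
        // Reading with the wrong order silently yields a different number.
        System.out.println(decode(little, ByteOrder.BIG_ENDIAN));    // -771489792
    }
}
{code}

The mismatched read does not fail; it returns a valid but wrong INTEGER. A documented, platform-independent order avoids this silent corruption.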
--
This message was sent by Atlassian Jira
(v8.20.10#820010)