[jira] [Created] (CALCITE-6375) Fix failing quidem tests for FORMAT in CAST
Jerin John created CALCITE-6375: --- Summary: Fix failing quidem tests for FORMAT in CAST Key: CALCITE-6375 URL: https://issues.apache.org/jira/browse/CALCITE-6375 Project: Calcite Issue Type: Bug Reporter: Jerin John CALCITE-2980 implemented the FORMAT clause used inside the CAST operator. We had imported a large number of quidem tests from Apache Impala for the same, stored in the file: [cast-with-format.iq|https://github.com/apache/calcite/blob/main/core/src/test/resources/sql/cast-with-format.iq]. Many of these tests are currently disabled as they're either unsupported or incorrectly implemented, some tests outputs may also need to be changed as per dialect behavior. Creating this placeholder issue to track and fix the pending tests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)
Jerin John created CALCITE-6367: --- Summary: Add timezone support for FORMAT clause in CAST (enabled in BigQuery) Key: CALCITE-6367 URL: https://issues.apache.org/jira/browse/CALCITE-6367 Project: Calcite Issue Type: Bug Reporter: Jerin John This issue is a followup on CALCITE-6269 that fixes some of Calcite's existing format elements implementation to be aligned to BQ functionality. Two major formats that might require a bit more rework is adding support for the TZH/TZM elements along with time zone areas as described below: * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with timezones as used by BQ does not seem to be supported yet (format element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets) (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which would require additional parser changes and time zone parameter to be plumbed in to the cast operator call. One important thing to consider, is that the {{SimpleDateFormat}} class which currently stores the datetime object, may not fully support timezone features as described and might warrant a broader refactoring of this code. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6270) Add CAST from Numeric and BYTES to String FORMAT (Enabled in BigQuery)
Jerin John created CALCITE-6270: --- Summary: Add CAST from Numeric and BYTES to String FORMAT (Enabled in BigQuery) Key: CALCITE-6270 URL: https://issues.apache.org/jira/browse/CALCITE-6270 Project: Calcite Issue Type: Bug Reporter: Jerin John -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6269) Fix missing/broken BQ date-time format elements
Jerin John created CALCITE-6269: --- Summary: Fix missing/broken BQ date-time format elements Key: CALCITE-6269 URL: https://issues.apache.org/jira/browse/CALCITE-6269 Project: Calcite Issue Type: Bug Reporter: Jerin John Assignee: Jerin John Add support for missing/broken DATE-TIME format elements: * [YYY / Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - last three or 1 digits of year * [|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string] - supports four or more digits in the year, Calcite using [DateString|https://github.com/apache/calcite/blob/3326475c766267d521330006cc80730c4e456191/core/src/main/java/org/apache/calcite/util/DateString.java] util throws: {{java.lang.IllegalArgumentException: Year out of range: [12018]}} * [S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - five digit seconds precision, only SS two digit precision is available * [FFn (n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string] - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, calcite format models supports n=1-6, should we expand this range? * [AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string] - Meridian formats not available * [Parsing timestamp literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] with timezones as used by BQ does not seem to be supported yet (format element TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets) (eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD HH:MI:SSTZH')}} * BQ format [timezone as string |https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string] can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which would require additional parser changes and time zone parameter to be plumbed in to the cast operator call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-6092) Invalid test cases in CAST String to Time
Jerin John created CALCITE-6092: --- Summary: Invalid test cases in CAST String to Time Key: CALCITE-6092 URL: https://issues.apache.org/jira/browse/CALCITE-6092 Project: Calcite Issue Type: Bug Reporter: Jerin John Encountered some [tests|https://github.com/apache/calcite/blob/590ec85f0fcff7173c288c350c3f60e640976a34/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L1237-L1238] within SqlOperatorTest file for the CAST operator on String to Datetime conversions, which are found to be invalid time strings on some tested instances of DBs like BQ, MySql, Postgres. It seems these tests were originally ignored as `BAD_DATETIME_MESSAGE` but then updated to verify conversion performed by the CAST operator (refer to JIRA ticket: CALCITE-5554 and [commit)|https://github.com/apache/calcite/commit/625a2e03c4c5583279350bf04e3db2a31b1ec411#diff-fdd0c725fc7c6fd56965f59b1d51e4c7a9b5b5be27da2e54b8b8273dc980cd64R1118-R1119] Considering that multiple dialects catch these cases as invalid time strings for conversion, code should be updated to handle them as exceptions and the tests to be corrected to reflect this behavior. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5979) Add REGEXP_REPLACE function (enabled in BigQuery library)
Jerin John created CALCITE-5979: --- Summary: Add REGEXP_REPLACE function (enabled in BigQuery library) Key: CALCITE-5979 URL: https://issues.apache.org/jira/browse/CALCITE-5979 Project: Calcite Issue Type: Task Reporter: Jerin John Assignee: Jerin John Fix For: 1.36.0 Add support for [REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all] function from BigQuery. *{{REGEXP_EXTRACT_ALL(value, regexp)}}* Returns an array of all substrings of value that match the re2 regular expression, regexp. Returns an empty array if there is no match. * If the regular expression contains a capturing group ((...)), and there is a match for that capturing group, that match is added to the results. If there are multiple matches for a capturing group, the last match is added to the results. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5978) Add REGEXP_INSTR function (enabled in BigQuery library)
Jerin John created CALCITE-5978: --- Summary: Add REGEXP_INSTR function (enabled in BigQuery library) Key: CALCITE-5978 URL: https://issues.apache.org/jira/browse/CALCITE-5978 Project: Calcite Issue Type: Task Reporter: Jerin John Assignee: Jerin John Fix For: 1.36.0 Add support for [REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all] function from BigQuery. *{{REGEXP_EXTRACT_ALL(value, regexp)}}* Returns an array of all substrings of value that match the re2 regular expression, regexp. Returns an empty array if there is no match. * If the regular expression contains a capturing group ((...)), and there is a match for that capturing group, that match is added to the results. If there are multiple matches for a capturing group, the last match is added to the results. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5911) Implement BigQuery function REGEXP_EXTRACT_ALL
Jerin John created CALCITE-5911: --- Summary: Implement BigQuery function REGEXP_EXTRACT_ALL Key: CALCITE-5911 URL: https://issues.apache.org/jira/browse/CALCITE-5911 Project: Calcite Issue Type: Task Reporter: Jerin John Assignee: Jerin John Add support for [REGEXP_EXTRACT|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract] and [REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all] functions from BigQuery. *{{REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}}* Returns the substring in {{value}} that matches the regular expression {{{}regexp{}}}. Returns {{NULL}} if there is no match. * If the regular expression contains a capturing group ({{{}(...){}}}), and there is a match for that capturing group, that match is returned. If there are multiple matches for a capturing group, the last match is returned. * If {{position}} is specified, the search starts at this position in {{{}value{}}}, otherwise it starts at the beginning of {{{}value{}}}. * If {{occurrence}} is specified, the search returns a specific occurrence of the {{regexp}} in {{{}value{}}}, otherwise returns the first match. *{{REGEXP_EXTRACT_ALL(value, regexp)}}* Returns an array of all substrings of value that match the re2 regular expression, regexp. Returns an empty array if there is no match. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5910) Implement BigQuery functions REGEXP_EXTRACT and REGEXP_EXTRACT_ALL
Jerin John created CALCITE-5910: --- Summary: Implement BigQuery functions REGEXP_EXTRACT and REGEXP_EXTRACT_ALL Key: CALCITE-5910 URL: https://issues.apache.org/jira/browse/CALCITE-5910 Project: Calcite Issue Type: Task Reporter: Jerin John Assignee: Jerin John Add support for REGEXP_CONTAINS function from BigQuery. Function returns TRUE if input value is a partial match for the regular expression. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5873) Implement BigQuery functions REGEXP_CONTAINS
Jerin John created CALCITE-5873: --- Summary: Implement BigQuery functions REGEXP_CONTAINS Key: CALCITE-5873 URL: https://issues.apache.org/jira/browse/CALCITE-5873 Project: Calcite Issue Type: Task Reporter: Jerin John Assignee: Jerin John -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5858) Implement BigQuery functions REGEXP_CONTAINS, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL, REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR
Jerin John created CALCITE-5858: --- Summary: Implement BigQuery functions REGEXP_CONTAINS, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL, REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR Key: CALCITE-5858 URL: https://issues.apache.org/jira/browse/CALCITE-5858 Project: Calcite Issue Type: New Feature Reporter: Jerin John Add support for REGEX_* string functions in BigQuery. Function descriptions: * REGEXP_CONTAINS: Returns TRUE if input value is a partial match for the regular expression. * REGEXP_EXTRACT: Returns the substring in input value that matches the regular expression. Returns NULL if there is no match. * REGEXP_EXTRACT_ALL: Returns an array of all substrings of input value that match the regular expression. Returns an empty array if there is no match. * REGEXP_INSTR: Returns the lowest 1-based position of a regular expression in an input value. * REGEXP_REPLACE: Returns a STRING where all substrings of input value that match regular expression are replaced with the input replacement. * REGEXP_SUBSTR: Synonym for REGEXP_EXTRACT, returns the substring in input value that matches the regular expression. Returns NULL if there is no match. More on these functions and their syntax may be found here: [BigQuery Doc|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains] -- This message was sent by Atlassian Jira (v8.20.10#820010)