[jira] [Created] (CALCITE-6375) Fix failing quidem tests for FORMAT in CAST

2024-04-18 Thread Jerin John (Jira)
Jerin John created CALCITE-6375:
---

 Summary: Fix failing quidem tests for FORMAT in CAST
 Key: CALCITE-6375
 URL: https://issues.apache.org/jira/browse/CALCITE-6375
 Project: Calcite
  Issue Type: Bug
Reporter: Jerin John


CALCITE-2980 implemented the FORMAT clause used inside the CAST operator. We 
had imported a large number of quidem tests from Apache Impala for the same, 
stored in the file: 
[cast-with-format.iq|https://github.com/apache/calcite/blob/main/core/src/test/resources/sql/cast-with-format.iq].
 
Many of these tests are currently disabled as they're either unsupported or 
incorrectly implemented, some tests outputs may also need to be changed as per 
dialect behavior. Creating this placeholder issue to track and fix the pending 
tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6367) Add timezone support for FORMAT clause in CAST (enabled in BigQuery)

2024-04-15 Thread Jerin John (Jira)
Jerin John created CALCITE-6367:
---

 Summary: Add timezone support for FORMAT clause in CAST (enabled 
in BigQuery)
 Key: CALCITE-6367
 URL: https://issues.apache.org/jira/browse/CALCITE-6367
 Project: Calcite
  Issue Type: Bug
Reporter: Jerin John


This issue is a followup on CALCITE-6269 that fixes some of Calcite's existing 
format elements implementation to be aligned to BQ functionality. Two major 
formats that might require a bit more rework is adding support for the TZH/TZM 
elements along with time zone areas as described below:
 * [Parsing timestamp 
literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
 with timezones as used by BQ does not seem to be supported yet (format element 
TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets)
(eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD 
HH:MI:SSTZH')}}

 * BQ format [timezone as string 
|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
 can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which 
would require additional parser changes and time zone parameter to be plumbed 
in to the cast operator call.

One important thing to consider, is that the {{SimpleDateFormat}} class which 
currently stores the datetime object, may not fully support timezone features 
as described and might warrant a broader refactoring of this code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6270) Add CAST from Numeric and BYTES to String FORMAT (Enabled in BigQuery)

2024-02-16 Thread Jerin John (Jira)
Jerin John created CALCITE-6270:
---

 Summary: Add CAST from Numeric and BYTES to String FORMAT (Enabled 
in BigQuery)
 Key: CALCITE-6270
 URL: https://issues.apache.org/jira/browse/CALCITE-6270
 Project: Calcite
  Issue Type: Bug
Reporter: Jerin John






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6269) Fix missing/broken BQ date-time format elements

2024-02-16 Thread Jerin John (Jira)
Jerin John created CALCITE-6269:
---

 Summary: Fix missing/broken BQ date-time format elements
 Key: CALCITE-6269
 URL: https://issues.apache.org/jira/browse/CALCITE-6269
 Project: Calcite
  Issue Type: Bug
Reporter: Jerin John
Assignee: Jerin John


Add support for missing/broken DATE-TIME format elements:

* [YYY / 
Y|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string]
 - last three or 1 digits of year
 * 
[|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_year_as_string]
 - supports four or more digits in the year, Calcite using 
[DateString|https://github.com/apache/calcite/blob/3326475c766267d521330006cc80730c4e456191/core/src/main/java/org/apache/calcite/util/DateString.java]
 util throws:
{{java.lang.IllegalArgumentException: Year out of range: [12018]}}

 * 
[S|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string]
 - five digit seconds precision, only SS two digit precision is available
 * [FFn 
(n=1/2)|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_second_as_string]
 - always returns seconds with precision 3 (=FF3); also BQ supports n=1-9, 
calcite format models supports n=1-6, should we expand this range?
 * 
[AM/PM|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_meridian_as_string]
 - Meridian formats not available

 * [Parsing timestamp 
literals|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
 with timezones as used by BQ does not seem to be supported yet (format element 
TZR is unimplemented, BQ has TZH, TZM for hour and minute offsets)
(eg: {{cast('2020.06.03 00:00:53+00' as timestamp format '.MM.DD 
HH:MI:SSTZH')}}

 * BQ format [timezone as string 
|https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_tz_as_string]
 can take an additional argument {{{}AT TIME ZONE 'Asia/Kolkata'{}}}, which 
would require additional parser changes and time zone parameter to be plumbed 
in to the cast operator call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6092) Invalid test cases in CAST String to Time

2023-11-03 Thread Jerin John (Jira)
Jerin John created CALCITE-6092:
---

 Summary: Invalid test cases in CAST String to Time
 Key: CALCITE-6092
 URL: https://issues.apache.org/jira/browse/CALCITE-6092
 Project: Calcite
  Issue Type: Bug
Reporter: Jerin John


Encountered some 
[tests|https://github.com/apache/calcite/blob/590ec85f0fcff7173c288c350c3f60e640976a34/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L1237-L1238]
 within SqlOperatorTest file for the CAST operator on String to Datetime 
conversions, which are found to be invalid time strings on some tested 
instances of DBs like BQ, MySql, Postgres.

It seems these tests were originally ignored as `BAD_DATETIME_MESSAGE` but then 
updated to verify conversion performed by the CAST operator (refer to JIRA 
ticket: CALCITE-5554 and 
[commit)|https://github.com/apache/calcite/commit/625a2e03c4c5583279350bf04e3db2a31b1ec411#diff-fdd0c725fc7c6fd56965f59b1d51e4c7a9b5b5be27da2e54b8b8273dc980cd64R1118-R1119]

Considering that multiple dialects catch these cases as invalid time strings 
for conversion, code should be updated to handle them as exceptions and the 
tests to be corrected to reflect this behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5979) Add REGEXP_REPLACE function (enabled in BigQuery library)

2023-09-05 Thread Jerin John (Jira)
Jerin John created CALCITE-5979:
---

 Summary: Add REGEXP_REPLACE function (enabled in BigQuery library)
 Key: CALCITE-5979
 URL: https://issues.apache.org/jira/browse/CALCITE-5979
 Project: Calcite
  Issue Type: Task
Reporter: Jerin John
Assignee: Jerin John
 Fix For: 1.36.0


Add support for 
[REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all]
 function from BigQuery.

*{{REGEXP_EXTRACT_ALL(value, regexp)}}*
Returns an array of all substrings of value that match the re2 regular 
expression, regexp. Returns an empty array if there is no match.

* If the regular expression contains a capturing group ((...)), and there is a 
match for that capturing group, that match is added to the results. If there 
are multiple matches for a capturing group, the last match is added to the 
results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5978) Add REGEXP_INSTR function (enabled in BigQuery library)

2023-09-05 Thread Jerin John (Jira)
Jerin John created CALCITE-5978:
---

 Summary: Add REGEXP_INSTR function (enabled in BigQuery library)
 Key: CALCITE-5978
 URL: https://issues.apache.org/jira/browse/CALCITE-5978
 Project: Calcite
  Issue Type: Task
Reporter: Jerin John
Assignee: Jerin John
 Fix For: 1.36.0


Add support for 
[REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all]
 function from BigQuery.

*{{REGEXP_EXTRACT_ALL(value, regexp)}}*
Returns an array of all substrings of value that match the re2 regular 
expression, regexp. Returns an empty array if there is no match.

* If the regular expression contains a capturing group ((...)), and there is a 
match for that capturing group, that match is added to the results. If there 
are multiple matches for a capturing group, the last match is added to the 
results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5911) Implement BigQuery function REGEXP_EXTRACT_ALL

2023-08-08 Thread Jerin John (Jira)
Jerin John created CALCITE-5911:
---

 Summary: Implement BigQuery function REGEXP_EXTRACT_ALL
 Key: CALCITE-5911
 URL: https://issues.apache.org/jira/browse/CALCITE-5911
 Project: Calcite
  Issue Type: Task
Reporter: Jerin John
Assignee: Jerin John


Add support for 
[REGEXP_EXTRACT|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract]
 and 
[REGEXP_EXTRACT_ALL|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_extract_all]
 functions from BigQuery.

*{{REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}}*
Returns the substring in {{value}} that matches the regular expression 
{{{}regexp{}}}. Returns {{NULL}} if there is no match.

* If the regular expression contains a capturing group ({{{}(...){}}}), and 
there is a match for that capturing group, that match is returned. If there are 
multiple matches for a capturing group, the last match is returned.
* If {{position}} is specified, the search starts at this position in 
{{{}value{}}}, otherwise it starts at the beginning of {{{}value{}}}.
* If {{occurrence}} is specified, the search returns a specific occurrence of 
the {{regexp}} in {{{}value{}}}, otherwise returns the first match.

*{{REGEXP_EXTRACT_ALL(value, regexp)}}*
Returns an array of all substrings of value that match the re2 regular 
expression, regexp. Returns an empty array if there is no match.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5910) Implement BigQuery functions REGEXP_EXTRACT and REGEXP_EXTRACT_ALL

2023-08-08 Thread Jerin John (Jira)
Jerin John created CALCITE-5910:
---

 Summary: Implement BigQuery functions REGEXP_EXTRACT and 
REGEXP_EXTRACT_ALL
 Key: CALCITE-5910
 URL: https://issues.apache.org/jira/browse/CALCITE-5910
 Project: Calcite
  Issue Type: Task
Reporter: Jerin John
Assignee: Jerin John


Add support for REGEXP_CONTAINS function from BigQuery.

Function returns TRUE if input value is a partial match for the regular 
expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5873) Implement BigQuery functions REGEXP_CONTAINS

2023-07-25 Thread Jerin John (Jira)
Jerin John created CALCITE-5873:
---

 Summary: Implement BigQuery functions REGEXP_CONTAINS
 Key: CALCITE-5873
 URL: https://issues.apache.org/jira/browse/CALCITE-5873
 Project: Calcite
  Issue Type: Task
Reporter: Jerin John
Assignee: Jerin John






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5858) Implement BigQuery functions REGEXP_CONTAINS, REGEXP_EXTRACT, REGEXP_EXTRACT_ALL, REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR

2023-07-18 Thread Jerin John (Jira)
Jerin John created CALCITE-5858:
---

 Summary: Implement BigQuery functions REGEXP_CONTAINS, 
REGEXP_EXTRACT, REGEXP_EXTRACT_ALL, REGEXP_INSTR, REGEXP_REPLACE and 
REGEXP_SUBSTR
 Key: CALCITE-5858
 URL: https://issues.apache.org/jira/browse/CALCITE-5858
 Project: Calcite
  Issue Type: New Feature
Reporter: Jerin John


Add support for REGEX_* string functions in BigQuery.

Function descriptions:
 * REGEXP_CONTAINS: Returns TRUE if input value is a partial match for the 
regular expression.

 * REGEXP_EXTRACT: Returns the substring in input value that matches the 
regular expression. Returns NULL if there is no match.
 * REGEXP_EXTRACT_ALL: Returns an array of all substrings of input value that 
match the regular expression. Returns an empty array if there is no match.
 * REGEXP_INSTR: Returns the lowest 1-based position of a regular expression in 
an input value.
 * REGEXP_REPLACE: Returns a STRING where all substrings of input value that 
match regular expression are replaced with the input replacement.
 * REGEXP_SUBSTR: Synonym for REGEXP_EXTRACT, returns the substring in input 
value that matches the regular expression. Returns NULL if there is no match.

More on these functions and their syntax may be found here: [BigQuery 
Doc|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)