[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713327#comment-17713327 ] ASF GitHub Bot commented on DRILL-8424: --- cgivre merged PR #2794: URL: https://github.com/apache/drill/pull/2794 > Accommodate RexBuilder changes made for SAFE_CAST > - > > Key: DRILL-8424 > URL: https://issues.apache.org/jira/browse/DRILL-8424 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.22.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.22.0 > > > The introduction of SAFE_CAST support in CALCITE-5575 made method signature > changes in RexBuilder that broke a needed override in DrillRexBuilder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering
[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713153#comment-17713153 ] ASF GitHub Bot commented on DRILL-8421: --- jnturton commented on PR #2793: URL: https://github.com/apache/drill/pull/2793#issuecomment-1511592469 > Thanks for the contribution and welcome to Drill! Would you mind rebasing once https://github.com/apache/drill/pull/2794 is merged? Heh, I just came here to type exactly this. I reviewed the code changes and they look great so really we just need the CI run after rebasing. > Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to > milliseconds before filtering > > > Key: DRILL-8421 > URL: https://issues.apache.org/jira/browse/DRILL-8421 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.21.0 >Reporter: Peter Franzen >Priority: Major > Fix For: 1.21.1 > > > When using Drill with parquet files where the timestamp columns are in > microseconds, Drill converts the microsecond values to milliseconds when > displayed. However, when using a timestamp column in WHERE clauses it looks > like the original microsecond value is used instead of the adjusted > millisecond value when filtering records. > *To Reproduce* > Assume a parquet file in a directory "Test" with a column _timestampCol_ > having the type > {{{}org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS{}}}. > Assume there are two records with the values 1673981999806149 and > 1674759597743552, respectively, in that column (i.e. the UTC dates > 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552) > # Execute the query > {{SELECT timestampCol FROM dfs.Test;}} > The result includes both records, as expected. > # Execute the query > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol < > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > This produces an empty result although both records have a value less than > the argument. 
> # Execute > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > The result includes both records although neither has a value greater than > the argument. > *Expected behavior* > The query in 2) above should produce a result with both records, and the > query in 3) should produce an empty result. > *Additional context* > Even timestamps long into the future produce results with both records, e.g.: > {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > > TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > Manually converting the timestamp column to milliseconds produces the > expected result: > {{SELECT timestampCol FROM dfs.Test WHERE > TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), > 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}} > produces a result with both records. -- This message was sent by Atlassian Jira (v8.20.10#820010)
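The unit mismatch suspected in the DRILL-8421 report can be checked with plain arithmetic. The following is a minimal sketch, not Drill's code: it assumes the filter compares the raw stored microsecond value against a millisecond cutoff, which is what the report guesses. The class `MicrosFilterDemo` and the helper `microsToMillis` are hypothetical names; the two stored values and the cutoff date come from the report.

```java
// Hypothetical illustration of the suspected unit mismatch; not Drill's implementation.
class MicrosFilterDemo {
  /** Truncates a microsecond epoch value to milliseconds. */
  static long microsToMillis(long micros) {
    return micros / 1000L;
  }

  public static void main(String[] args) {
    // The two stored values from the report, in microseconds since the epoch.
    long[] storedMicros = {1673981999806149L, 1674759597743552L};
    // The cutoff used in the WHERE clause, in milliseconds since the epoch.
    long cutoffMillis = java.time.Instant.parse("2023-02-01T00:00:00Z").toEpochMilli();

    for (long micros : storedMicros) {
      // Assumed buggy comparison: microseconds on the left, milliseconds on the right.
      boolean rawLessThanCutoff = micros < cutoffMillis;
      // Comparison after converting both sides to the same unit.
      boolean convertedLessThanCutoff = microsToMillis(micros) < cutoffMillis;
      System.out.println(micros + ": raw < cutoff = " + rawLessThanCutoff
          + ", converted < cutoff = " + convertedLessThanCutoff);
    }
  }
}
```

Because a microsecond value is about a thousand times larger than the millisecond value for the same instant, the raw comparison treats both January 2023 records as later than the cutoff, which matches the empty result for `<` and the full result for `>` described in the report.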
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713150#comment-17713150 ] ASF GitHub Bot commented on DRILL-8424: --- cgivre commented on code in PR #2794: URL: https://github.com/apache/drill/pull/2794#discussion_r1168893372 ## exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java: ## @@ -65,9 +65,9 @@ public RexNode ensureType( * @return Call to CAST operator */ @Override - public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) { + public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) { Review Comment: 🤦
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713137#comment-17713137 ] ASF GitHub Bot commented on DRILL-8424: --- jnturton commented on code in PR #2794: URL: https://github.com/apache/drill/pull/2794#discussion_r1168863776 ## exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java: ## @@ -65,9 +65,9 @@ public RexNode ensureType( * @return Call to CAST operator */ @Override - public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) { + public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) { Review Comment: They did do this and only deprecated the original method so our build wasn't broken but our subclass DrillRexBuilder was broken in terms of runtime logic because our method override no longer took effect when it needed to.
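The failure mode described in this review comment can be reproduced in miniature. The sketch below uses hypothetical stand-in classes (`BaseBuilder`, `StaleBuilder`, `FixedBuilder`), not Calcite's RexBuilder or Drill's DrillRexBuilder, and simplified String arguments instead of RelDataType and RexNode. It assumes the base class keeps the old three-argument method as a deprecated delegate while its own internal callers move to the new four-argument overload.

```java
// Hypothetical stand-ins for illustration only; not the real RexBuilder / DrillRexBuilder.
class BaseBuilder {
  /** Old entry point, kept for source compatibility; it only delegates to the new overload. */
  @Deprecated
  String makeCast(String type, String exp, boolean matchNullability) {
    return makeCast(type, exp, matchNullability, false);
  }

  /** New entry point with the extra "safe" flag. */
  String makeCast(String type, String exp, boolean matchNullability, boolean safe) {
    return "base:" + (safe ? "SAFE_CAST" : "CAST") + "(" + exp + " AS " + type + ")";
  }

  /** Library-internal code path that now calls the new overload directly. */
  String ensureType(String type, String exp) {
    return makeCast(type, exp, true, false);
  }
}

/** Written against the old API: it still compiles, but ensureType never reaches its override. */
class StaleBuilder extends BaseBuilder {
  @Override
  String makeCast(String type, String exp, boolean matchNullability) {
    return "custom:CAST(" + exp + " AS " + type + ")";
  }
}

/** Overrides the new signature instead, as the diff under review does. */
class FixedBuilder extends BaseBuilder {
  @Override
  String makeCast(String type, String exp, boolean matchNullability, boolean safe) {
    return "custom:CAST(" + exp + " AS " + type + ")";
  }
}

class OverrideDemo {
  public static void main(String[] args) {
    System.out.println(new StaleBuilder().ensureType("INT", "x")); // base behaviour
    System.out.println(new FixedBuilder().ensureType("INT", "x")); // custom behaviour
  }
}
```

The compiler accepts `StaleBuilder` without complaint (at most a deprecation warning), which is why a change like this shows up as failing tests rather than as a broken build.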
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713133#comment-17713133 ] ASF GitHub Bot commented on DRILL-8424: --- jnturton commented on code in PR #2794: URL: https://github.com/apache/drill/pull/2794#discussion_r1168859598 ## exec/java-exec/src/main/codegen/templates/Parser.jj: ## @@ -7727,6 +7764,8 @@ SqlPostfixOperator PostfixRowOperator() : | < DATETIME_INTERVAL_CODE: "DATETIME_INTERVAL_CODE" > | < DATETIME_INTERVAL_PRECISION: "DATETIME_INTERVAL_PRECISION" > | < DAY: "DAY" > +| < DAYOFWEEK: "DAYOFWEEK" > +| < DAYOFYEAR: "DAYOFYEAR" > Review Comment: Yes we should, thanks, added.
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713132#comment-17713132 ] ASF GitHub Bot commented on DRILL-8424: --- jnturton commented on code in PR #2794: URL: https://github.com/apache/drill/pull/2794#discussion_r1168857297 ## exec/java-exec/src/main/codegen/templates/Parser.jj: ## @@ -15,9 +15,11 @@ * limitations under the License. */ -// TODO: Delete this file to reinstate its extraction from calcite-core.jar -// once CALCITE-5579 is resolved and the incompatible grammar changes introduced -// by CALCITE-5469 have been backed out. Also see: exec/java-exec/pom.xml. Review Comment: Thanks, resolved.
[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering
[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713127#comment-17713127 ] ASF GitHub Bot commented on DRILL-8421: --- handmadecode commented on PR #2793: URL: https://github.com/apache/drill/pull/2793#issuecomment-1511492806 @cgivre thanks, happy to contribute. I will rebase when 8424 is merged.
[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering
[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713123#comment-17713123 ] ASF GitHub Bot commented on DRILL-8421: --- cgivre commented on PR #2793: URL: https://github.com/apache/drill/pull/2793#issuecomment-1511473455 @handmadecode Thanks for the contribution and welcome to Drill! Would you mind rebasing once [DRILL-8424] (https://github.com/apache/drill/pull/2794) is merged? There are some CI issues which will be fixed by that PR. Thanks!
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713119#comment-17713119 ] ASF GitHub Bot commented on DRILL-8424: --- cgivre commented on code in PR #2794: URL: https://github.com/apache/drill/pull/2794#discussion_r1168776897 ## exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java: ## @@ -65,9 +65,9 @@ public RexNode ensureType( * @return Call to CAST operator */ @Override - public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) { + public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) { Review Comment: This really highlights an issue with Calcite. They really could have added an additional function something like below and nothing would have broken... ``` makeCast(RelDataType type, RexNode exp, boolean matchNullability) { return makeCast(type, exp, matchNullability, false); } ``` ## exec/java-exec/src/main/codegen/templates/Parser.jj: ## @@ -15,9 +15,11 @@ * limitations under the License. */ -// TODO: Delete this file to reinstate its extraction from calcite-core.jar -// once CALCITE-5579 is resolved and the incompatible grammar changes introduced -// by CALCITE-5469 have been backed out. Also see: exec/java-exec/pom.xml. Review Comment: Do we want to leave the original info here just so that we know which Calcite PRs we're waiting for? ## exec/java-exec/src/main/codegen/templates/Parser.jj: ## @@ -7727,6 +7764,8 @@ SqlPostfixOperator PostfixRowOperator() : | < DATETIME_INTERVAL_CODE: "DATETIME_INTERVAL_CODE" > | < DATETIME_INTERVAL_PRECISION: "DATETIME_INTERVAL_PRECISION" > | < DAY: "DAY" > +| < DAYOFWEEK: "DAYOFWEEK" > +| < DAYOFYEAR: "DAYOFYEAR" > Review Comment: Should we add a unit test for these synonyms? 
[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
[ https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713121#comment-17713121 ] ASF GitHub Bot commented on DRILL-8417: --- cgivre commented on PR #2783: URL: https://github.com/apache/drill/pull/2783#issuecomment-1511468213 Once https://github.com/apache/drill/pull/2794 is merged, I'll rebase and merge this, pending @jnturton's approval. > Allow Excel Reader to Ignore Formula Errors > --- > > Key: DRILL-8417 > URL: https://issues.apache.org/jira/browse/DRILL-8417 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Excel >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > If Drill encounters an Excel formula which is invalid somehow, such as a > DIV/0, Drill is unable to proceed and throws a number format exception. > This PR adds a config parameter called ignoreErrors which allows Drill to > skip such records and returns null for that cell. Drill will also output a > log warning. When set to false, original behavior is retained. -- This message was sent by Atlassian Jira (v8.20.10#820010)
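The behaviour that the DRILL-8417 description attributes to `ignoreErrors` can be sketched as follows. This is a hypothetical illustration, not the Excel reader's actual code: the class `FormulaCellDemo` and the helper `readNumericCell` are invented for the example, and the cell is modelled as plain text such as "#DIV/0!". Only the option name `ignoreErrors` and the described outcomes (null plus a logged warning when enabled, a number format exception when disabled) come from the issue text.

```java
// Hypothetical sketch of the described behaviour; not Drill's Excel reader.
class FormulaCellDemo {
  /**
   * Converts the text of an evaluated numeric cell to a Double.
   * With ignoreErrors set, an unparseable value such as "#DIV/0!" yields null and a
   * warning is printed; with it unset, the NumberFormatException propagates.
   */
  static Double readNumericCell(String cellText, boolean ignoreErrors) {
    try {
      return Double.valueOf(cellText);
    } catch (NumberFormatException e) {
      if (!ignoreErrors) {
        throw e; // original behaviour: the query fails
      }
      System.err.println("WARN: ignoring formula error in cell: " + cellText);
      return null; // the cell is read as null instead
    }
  }

  public static void main(String[] args) {
    System.out.println(readNumericCell("2.5", true));
    System.out.println(readNumericCell("#DIV/0!", true));
  }
}
```

Returning null keeps the rest of the row readable, at the cost of hiding the error from the query result, which is presumably why the option is opt-in.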
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713077#comment-17713077 ] ASF GitHub Bot commented on DRILL-8424: --- jnturton commented on PR #2794: URL: https://github.com/apache/drill/pull/2794#issuecomment-1511297568 I botched the "Move distro tarball to the Maven install phase" commit but that's a one-liner and unrelated to any unit tests so I'll let the test runs here complete before pushing its fix.
[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST
[ https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713048#comment-17713048 ] ASF GitHub Bot commented on DRILL-8424: --- jnturton opened a new pull request, #2794: URL: https://github.com/apache/drill/pull/2794 # [DRILL-8424](https://issues.apache.org/jira/browse/DRILL-8424): Accommodate RexBuilder changes made for SAFE_CAST ## Description Resolves the current CI test failures affecting decimal and empty-literal-to-null casting. Also incorporates upstream syntax additions in Drill's Parser.jj which can only be dropped when [CALCITE-5579](https://issues.apache.org/jira/browse/CALCITE-5579) is resolved. * Incorporate method signature changes in RexBuilder made by CALCITE-5557. * Fix float rounding error in TestCastFunctions.testCastFloatDecimalOverflow. * Incorporate Calcite parser changes. * [CALCITE-5557] Add SAFE_CAST function (enabled in BigQuery library) * [CALCITE-5548] Add MSSQL-style CONVERT function (enabled in MSSql library) * [CALCITE-5554] In EXTRACT function, add DAYOFWEEK and DAYOFYEAR as synonyms for DOW, DOY * Ignore .mvn/maven.config. * Upgrade Apache RAT plugin. * Upgrade os-maven-plugin. * Move distro tarball to the Maven install phase. ## Documentation SAFE_CAST to be documented and relevant syntax additions to be documented. ## Testing Failing tests now pass.
[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering
[ https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712433#comment-17712433 ] ASF GitHub Bot commented on DRILL-8421: --- handmadecode opened a new pull request, #2793: URL: https://github.com/apache/drill/pull/2793 # [DRILL-8421](https://issues.apache.org/jira/browse/DRILL-8421): Truncate parquet microsecond columns ## Description The metadata min and max values of parquet microsecond columns are truncated to milliseconds, which is the time unit expected by the initial file pruning during filtering. Also, `TIME_MICROS` columns are read as 64-bit values before they are truncated to 32-bit milliseconds values. Previously they were read as 32-bit values, causing values > `Integer.MAX_VALUE` to be incorrect. The second fix also addresses [DRILL-8423](https://issues.apache.org/jira/browse/DRILL-8423). ## Documentation Bugfix only, no documentation changes ## Testing Unit tests added in new test class `org.apache.drill.exec.store.parquet.TestMicrosecondColumns`.
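The second fix in the PR #2793 description, reading `TIME_MICROS` as 64-bit before truncating to 32-bit milliseconds, comes down to the order of two operations. The sketch below is an illustration under assumptions, not the reader code from the PR: the class `TimeMicrosDemo` and both helpers are hypothetical, and the old 32-bit read is modelled as a narrowing cast that keeps only the low 32 bits.

```java
// Hypothetical illustration of the ordering issue; not the code from the pull request.
class TimeMicrosDemo {
  /** Keeps the full 64-bit microsecond value, divides, then narrows to int milliseconds. */
  static int readWideThenTruncate(long timeMicros) {
    return (int) (timeMicros / 1000L);
  }

  /** Assumed model of the old behaviour: narrowing to 32 bits first discards the high bits. */
  static int narrowFirst(long timeMicros) {
    return ((int) timeMicros) / 1000;
  }

  public static void main(String[] args) {
    // Noon expressed as microseconds since midnight; this exceeds Integer.MAX_VALUE.
    long noonMicros = 12L * 60 * 60 * 1_000_000;
    System.out.println("wide read, then truncate: " + readWideThenTruncate(noonMicros));
    System.out.println("narrow first:             " + narrowFirst(noonMicros));
  }
}
```

A full day is 86,400,000,000 microseconds but only 86,400,000 milliseconds, so the millisecond result always fits in an int as long as the division happens before the narrowing.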
[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
[ https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712254#comment-17712254 ] ASF GitHub Bot commented on DRILL-8417: --- jnturton commented on PR #2783: URL: https://github.com/apache/drill/pull/2783#issuecomment-1508106845 Reviewer's note: all format-excel tests do pass; the CI test failures here are a result of as-yet-unfixed breakage brought in by Calcite 1.35-SNAPSHOT.
[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
[ https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712148#comment-17712148 ] ASF GitHub Bot commented on DRILL-8417: --- cgivre commented on PR #2783: URL: https://github.com/apache/drill/pull/2783#issuecomment-1507822172 > @jnturton I updated the PR to default to `false` and updated the README as well.
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710173#comment-17710173 ] ASF GitHub Bot commented on DRILL-8412: --- jnturton merged PR #2776: URL: https://github.com/apache/drill/pull/2776 > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > 1. The CI tests that run automatically upon commits to Drill master will > exercise Drill with present day Calcite. > 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us > to deal with them in order to proceed. > 3. Regressions in Calcite would (mostly) break the Drill CI and force us to > report them in order to proceed. > 4. If Drill master becomes too unstable when it is based on Calcite snapshots > then this change is trivially undoable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8420) Remove Guava shading and patching
[ https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709249#comment-17709249 ] ASF GitHub Bot commented on DRILL-8420: --- jnturton commented on PR #2786: URL: https://github.com/apache/drill/pull/2786#issuecomment-1498616790 @cgivre > Thanks for this. Aside from imports, which files were actually modified and I'll do a review? Yes, I'll split the import statement changes into a separate commit. > Do we want to add this to back port to stable? I don't think so. It's not a fix and it definitely carries a risk of breakage with it. Currently the Hadoop 2 build is broken because Drill's Guava patches (but not shading) are still needed in that case so I'll set the PR to draft for the moment. > Remove Guava shading and patching > - > > Key: DRILL-8420 > URL: https://issues.apache.org/jira/browse/DRILL-8420 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.22.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8420) Remove Guava shading and patching
[ https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709074#comment-17709074 ] ASF GitHub Bot commented on DRILL-8420: --- cgivre commented on PR #2786: URL: https://github.com/apache/drill/pull/2786#issuecomment-1497991025 Do we want to add this to back port to stable?
[jira] [Commented] (DRILL-8420) Remove Guava shading and patching
[ https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709072#comment-17709072 ] ASF GitHub Bot commented on DRILL-8420: --- cgivre commented on PR #2786: URL: https://github.com/apache/drill/pull/2786#issuecomment-1497990433 @jnturton Thanks for this. Aside from imports, which files were actually modified and I'll do a review?
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708439#comment-17708439 ] ASF GitHub Bot commented on DRILL-8412: --- jnturton commented on PR #2776: URL: https://github.com/apache/drill/pull/2776#issuecomment-1496047719 Can someone do the formality of approving so that we can give this a try? > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > 1. The CI tests that run automatically upon commits to Drill master will > exercise Drill with present day Calcite. > 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us > to deal with them in order to proceed. > 3. Regressions in Calcite would (mostly) break the Drill CI and force us to > report them in order to proceed. > 4. If Drill master becomes too unstable when it is based on Calcite snapshots > then this change is trivially undoable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8416) Memory leak when the async Parquet reader skips empty pages
[ https://issues.apache.org/jira/browse/DRILL-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708400#comment-17708400 ] ASF GitHub Bot commented on DRILL-8416: --- jnturton merged PR #2784: URL: https://github.com/apache/drill/pull/2784 > Memory leak when the async Parquet reader skips empty pages > --- > > Key: DRILL-8416 > URL: https://issues.apache.org/jira/browse/DRILL-8416 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.21.0 >Reporter: Matthias Rosenthaler >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > Attachments: example.parquet, meta_steps.parquet > > > If I try to query ( > {code:java} > SELECT * FROM > `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code} > ) the following parquet file which is stored on hadoop file system I am > getting the following error: > {code:java} > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalStateException: Memory was leaked by query. Memory leaked: (64) > Allocator(op:0:0:1:ParquetRowGroupScan) 100/64/34688/100 > (res/actual/peak/limit){code} > Everything is working fine with drill version 1.19. > If I select only columns without NULL values, the query also works in 1.21.0: > {code:java} > SELECT `name`,`type` FROM > `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code} > Generated a new example.parquet with pyarrow 8.0.0 and a float column with > NULL values and the same error happened. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8420) Remove Guava shading and patching, and the conjars repo
[ https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707995#comment-17707995 ] ASF GitHub Bot commented on DRILL-8420: --- jnturton opened a new pull request, #2786: URL: https://github.com/apache/drill/pull/2786 # [DRILL-8420](https://issues.apache.org/jira/browse/DRILL-8420): Remove Guava shading and patching, and the conjars repo ## Description - Remove shaded Guava. - Drop conjars repository. - Drop Guava patches. - Upgrade guava to 31.1-jre. - Upgrade parquet to 1.12.3 and parquet-format to 2.9.0. - Move Splunk Maven repository declaration to contrib/storage-splunk/pom.xml. ## Documentation N/A ## Testing Existing unit tests. > Remove Guava shading and patching, and the conjars repo > --- > > Key: DRILL-8420 > URL: https://issues.apache.org/jira/browse/DRILL-8420 > Project: Apache Drill > Issue Type: Improvement > Components: Tools, Build & Test >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8416) Memory leak when the async Parquet reader skips empty pages
[ https://issues.apache.org/jira/browse/DRILL-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706953#comment-17706953 ] ASF GitHub Bot commented on DRILL-8416: --- jnturton opened a new pull request, #2784: URL: https://github.com/apache/drill/pull/2784 # [DRILL-8416](https://issues.apache.org/jira/browse/DRILL-8416): Memory leak when the async Parquet reader skips empty pages ## Description A regression introduced by the Parquet reader clean-up released in Drill 1.20 has meant that buffers used for (non-empty) compressed data holding _empty_ dictionary or data pages which are skipped are not freed. Because empty pages are uncommon in real data this bug went undetected for a long time. ## Documentation N/A ## Testing New unit test. > Memory leak when the async Parquet reader skips empty pages > --- > > Key: DRILL-8416 > URL: https://issues.apache.org/jira/browse/DRILL-8416 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.21.0 >Reporter: Matthias Rosenthaler >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > Attachments: example.parquet, meta_steps.parquet > > > If I try to query ( > {code:java} > SELECT * FROM > `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code} > ) the following parquet file which is stored on hadoop file system I am > getting the following error: > {code:java} > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalStateException: Memory was leaked by query. Memory leaked: (64) > Allocator(op:0:0:1:ParquetRowGroupScan) 100/64/34688/100 > (res/actual/peak/limit){code} > Everything is working fine with drill version 1.19. 
> If I select only columns without NULL values, the query also works in 1.21.0: > {code:java} > SELECT `name`,`type` FROM > `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code} > Generated a new example.parquet with pyarrow 8.0.0 and a float column with > NULL values and the same error happened. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors
[ https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706895#comment-17706895 ] ASF GitHub Bot commented on DRILL-8417: --- cgivre opened a new pull request, #2783: URL: https://github.com/apache/drill/pull/2783 # [DRILL-8417](https://issues.apache.org/jira/browse/DRILL-8417): Allow Excel Reader to Ignore Formula Errors ## Description If Drill encounters an Excel formula which is invalid somehow, such as a `DIV/0`, Drill is unable to proceed and throws a number format exception. This PR adds a config parameter called `ignoreErrors` which allows Drill to skip such records and returns `null` for that cell. Drill will also output a log warning. When set to `false`, original behavior is retained. ## Documentation Updated README * `ignoreErrors`: Defaults to `true`. When set to `true` Drill will return `null` for any formulas or any values that are unparseable. ## Testing Added two unit tests. > Allow Excel Reader to Ignore Formula Errors > --- > > Key: DRILL-8417 > URL: https://issues.apache.org/jira/browse/DRILL-8417 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Excel >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > If Drill encounters an Excel formula which is invalid somehow, such as a > DIV/0, Drill is unable to proceed and throws a number format exception. > This PR adds a config parameter called ignoreErrors which allows Drill to > skip such records and returns null for that cell. Drill will also output a > log warning. When set to false, original behavior is retained. -- This message was sent by Atlassian Jira (v8.20.10#820010)
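The `ignoreErrors` behaviour described in the message above can be pictured with a small Java sketch. This is only an illustration under stated assumptions, not the Excel reader's actual code: the class `IgnoreErrorsSketch`, the method `parseCell`, and the sample cell text `#DIV/0!` are hypothetical names chosen here, and `java.util.logging` stands in for whatever logger Drill uses.

```java
import java.util.logging.Logger;

// Hypothetical illustration of an "ignoreErrors" style switch; not Drill's Excel reader.
class IgnoreErrorsSketch {

  private static final Logger LOG = Logger.getLogger(IgnoreErrorsSketch.class.getName());

  // Parses a cell's text as a number. When ignoreErrors is true, an unparseable
  // value yields null plus a log warning; when false, the exception propagates,
  // which mirrors the "original behavior" described in the PR text.
  public static Double parseCell(String raw, boolean ignoreErrors) {
    try {
      return Double.parseDouble(raw);
    } catch (NumberFormatException e) {
      if (ignoreErrors) {
        LOG.warning("Unparseable cell value, returning null: " + raw);
        return null;
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseCell("1.5", true));
    System.out.println(parseCell("#DIV/0!", true));
  }
}
```

With the flag set to `false`, the second call would throw a `NumberFormatException` instead of returning `null`.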
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706814#comment-17706814 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton commented on PR #2777: URL: https://github.com/apache/drill/pull/2777#issuecomment-1490042762 > @jnturton Sorry for the late review. The only thing that I would add is a mention of `drill.exec.rpc.bind_addr` and `drill.exec.http.bind_addr` in [`drill-override-example.conf`](https://github.com/apache/drill/blob/master/distribution/src/main/resources/drill-override-example.conf) if this file is still maintainable of course. In addition to our documentation. See #2782 . > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706813#comment-17706813 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton commented on PR #2777: URL: https://github.com/apache/drill/pull/2777#issuecomment-1490042016 > LGTM +1. Do we need any doc updates for this? Documented. > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706528#comment-17706528 ] ASF GitHub Bot commented on DRILL-8409: --- rymarm commented on PR #2777: URL: https://github.com/apache/drill/pull/2777#issuecomment-1489063820 @jnturton Sorry for the late review. The only thing that I would add is a mention of `drill.exec.rpc.bind_addr` and `drill.exec.http.bind_addr` in [`drill-override-example.conf`](https://github.com/apache/drill/blob/master/distribution/src/main/resources/drill-override-example.conf) if this file is still maintainable of course. In addition to our documentation. > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706437#comment-17706437 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton merged PR #2777: URL: https://github.com/apache/drill/pull/2777 > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706436#comment-17706436 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton commented on PR #2777: URL: https://github.com/apache/drill/pull/2777#issuecomment-1488817097 > LGTM +1. Do we need any doc updates for this? Yes I need to document the two new boot options, thanks. > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702884#comment-17702884 ] ASF GitHub Bot commented on DRILL-8412: --- jnturton commented on PR #2776: URL: https://github.com/apache/drill/pull/2776#issuecomment-1476733563 > @jnturton I think we should definitely run this experiment. I'm also curious as to how to run this with a specific PR from Calcite. That way I can contribute to the review process over there and find things that break drill quickly. They won't build and publish artefacts for unmerged PRs (would be a security problem) so we'll have to run our own Calcite builds for these. Probably the best is for the developer to pull the PR into a local branch and build Calcite putting the results in their local Maven repo, then build Drill. > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > 1. The CI tests that run automatically upon commits to Drill master will > exercise Drill with present day Calcite. > 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us > to deal with them in order to proceed. > 3. Regressions in Calcite would (mostly) break the Drill CI and force us to > report them in order to proceed. > 4. If Drill master becomes too unstable when it is based on Calcite snapshots > then this change is trivially undoable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702873#comment-17702873 ] ASF GitHub Bot commented on DRILL-8412: --- cgivre commented on PR #2776: URL: https://github.com/apache/drill/pull/2776#issuecomment-1476706574 @jnturton I think we should definitely run this experiment. I'm also curious as to how to run this with a specific PR from Calcite. That way I can contribute to the review process over there and find things that break drill quickly. > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > 1. The CI tests that run automatically upon commits to Drill master will > exercise Drill with present day Calcite. > 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us > to deal with them in order to proceed. > 3. Regressions in Calcite would (mostly) break the Drill CI and force us to > report them in order to proceed. > 4. If Drill master becomes too unstable when it is based on Calcite snapshots > then this change is trivially undoable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8414) Index Paginator Not Working When Provided URL
[ https://issues.apache.org/jira/browse/DRILL-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702872#comment-17702872 ] ASF GitHub Bot commented on DRILL-8414: --- cgivre merged PR #2779: URL: https://github.com/apache/drill/pull/2779 > Index Paginator Not Working When Provided URL > - > > Key: DRILL-8414 > URL: https://issues.apache.org/jira/browse/DRILL-8414 > Project: Apache Drill > Issue Type: Bug > Components: Storage - HTTP >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > The index paginator offers two options: One where the API returns an index > or offset and the other is when it returns a URL. The second was not fully > implemented. This PR also adds functionality in the case where the API > returns a path rather than a URL. In that case, the path will replace the > pre-existing path segments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702475#comment-17702475 ] ASF GitHub Bot commented on DRILL-8412: --- jnturton commented on PR #2776: URL: https://github.com/apache/drill/pull/2776#issuecomment-1475747749 @cgivre, @vvysotskyi, @rymarm, @luocooong are you up for running this experiment for a while? It's trivial to revert but I personally feel there's a good chance that we won't want to. > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > 1. The CI tests that run automatically upon commits to Drill master will > exercise Drill with present day Calcite. > 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us > to deal with them in order to proceed. > 3. Regressions in Calcite would (mostly) break the Drill CI and force us to > report them in order to proceed. > 4. If Drill master becomes too unstable when it is based on Calcite snapshots > then this change is trivially undoable. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions
[ https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702300#comment-17702300 ] ASF GitHub Bot commented on DRILL-8413: --- cgivre merged PR #2778: URL: https://github.com/apache/drill/pull/2778 > Add DNS Lookup Functions > > > Key: DRILL-8413 > URL: https://issues.apache.org/jira/browse/DRILL-8413 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.22 > > > This PR adds additional DNS lookup functions to Drill: > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions
[ https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702281#comment-17702281 ] ASF GitHub Bot commented on DRILL-8413: --- jnturton commented on code in PR #2778: URL: https://github.com/apache/drill/pull/2778#discussion_r1141347355 ## contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestDNSFunctions.java: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.udfs; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.categories.UnlikelyTest; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterFixtureBuilder; +import org.apache.drill.test.ClusterTest; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +@Category({UnlikelyTest.class, SqlFunctionTest.class}) +public class TestDNSFunctions extends ClusterTest { + + @BeforeClass + public static void setup() throws Exception { +ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher); +startCluster(builder); + } + + @Test + public void testGetHostAddress() throws Exception { +String query = "select get_host_address('apache.org') as hostname from (values(1))"; + testBuilder().sqlQuery(query).ordered().baselineColumns("hostname").baselineValues("151.101.2.132").go(); Review Comment: I guess these tests are technically nondeterministic but so seldom that it's nothing to worry about. ## contrib/udfs/README.md: ## @@ -436,3 +436,11 @@ The functions are: [1]: https://github.com/target/huntlib + +# DNS Functions Review Comment: It would be nice to mention that the JRE caches DNS records for their TTL which should mean that these functions can scale to big datasets if the number of distinct domains that need to be looked up is not big. ## contrib/udfs/src/main/java/org/apache/drill/exec/udfs/DNSUtils.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.udfs; + +import io.netty.buffer.DrillBuf; +import org.apache.commons.lang3.StringUtils; +import org.apache.commons.net.whois.WhoisClient; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.expr.holders.VarCharHolder; +import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter; +import org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter; +import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.xbill.DNS.Lookup; +import org.xbill.DNS.Record; +import org.xbill.DNS.SimpleResolver; +import org.xbill.DNS.TextParseException; +import org.xbill.DNS.Type; + +import java.io.IOException; +import java.net.SocketException; +import java.net.UnknownHostException; +import java.util.HashMap; +import java.util.Map; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Utility class which contains various methods for performing DNS resolution and WHOIS lookups in Drill UDFs. + */ +public class DNSUtils { + + private static final Logger logger = LoggerFactory.getLogger(DNSUtils.class); + /** + * A list of known DNS resolvers. + */ + private static final Map KNOWN_RESOLVERS = new HashMap<>();
[jira] [Commented] (DRILL-8414) Index Paginator Not Working When Provided URL
[ https://issues.apache.org/jira/browse/DRILL-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702194#comment-17702194 ] ASF GitHub Bot commented on DRILL-8414: --- cgivre opened a new pull request, #2779: URL: https://github.com/apache/drill/pull/2779 # [DRILL-8414](https://issues.apache.org/jira/browse/DRILL-8414): Index Paginator Not Working When Provided URL ## Description The index paginator offers two options: One where the API returns an index or offset and the other is when it returns a URL. The second was not fully implemented. This PR also adds functionality in the case where the API returns a path rather than a URL. In that case, the path will replace the pre-existing path segments. ## Documentation No user facing changes. ## Testing Added three additional unit tests and verified URL generation manually. > Index Paginator Not Working When Provided URL > - > > Key: DRILL-8414 > URL: https://issues.apache.org/jira/browse/DRILL-8414 > Project: Apache Drill > Issue Type: Bug > Components: Storage - HTTP >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > The index paginator offers two options: One where the API returns an index > or offset and the other is when it returns a URL. The second was not fully > implemented. This PR also adds functionality in the case where the API > returns a path rather than a URL. In that case, the path will replace the > pre-existing path segments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
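The two cases described in the message above (the API returns a full URL, or only a path that replaces the existing path segments) can be sketched with `java.net.URI`. This is a minimal illustration, not the HTTP storage plugin's code: `NextPageUrlSketch`, `nextPage`, and the `api.example.com` URLs are hypothetical, and the sketch assumes the query string of the current URL is dropped unless the returned value carries its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of URL-or-path pagination; not Drill's paginator.
class NextPageUrlSketch {

  // Builds the URL of the next page from the value an API returned.
  public static URI nextPage(URI current, String next) {
    URI candidate = URI.create(next);
    if (candidate.isAbsolute()) {
      // Case 1: the API returned a complete URL, so use it as-is.
      return candidate;
    }
    // Case 2: the API returned only a path; keep scheme and authority of the
    // current URL and replace its path segments with the returned path.
    String path = candidate.getPath();
    if (!path.startsWith("/")) {
      path = "/" + path;
    }
    try {
      return new URI(current.getScheme(), current.getAuthority(), path,
          candidate.getQuery(), null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot build next page URL from: " + next, e);
    }
  }

  public static void main(String[] args) {
    URI current = URI.create("https://api.example.com/v1/items?page=1");
    System.out.println(nextPage(current, "https://api.example.com/v1/items?page=2"));
    System.out.println(nextPage(current, "/v2/items/page/2"));
  }
}
```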
[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions
[ https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701782#comment-17701782 ] ASF GitHub Bot commented on DRILL-8413: --- cgivre opened a new pull request, #2778: URL: https://github.com/apache/drill/pull/2778 # [DRILL-8413](https://issues.apache.org/jira/browse/DRILL-8413): Add DNS Lookup Functions ## Description See below ## Documentation These functions enable DNS research using Drill. * `getHostName()`: Returns the host name associated with an IP address. * `getHostAddress()`: Returns an IP address associated with a host name. * `dnsLookup(, [])`: Performs a DNS lookup on a given host. You can optionally provide a resolver. Possible resolver values are: `cloudflare`, `cloudflare_secondary`, `google`, `google_secondary`, `verisign`, `verisign_secondary`, `yandex`, `yandex_secondary`. * `whois(, [])`: Performs a whois lookup on the given host name. You can optionally provide a resolver URL. Note that not all providers allow bulk automated whois lookups, so please follow the terms of service for your provider. ## Testing Added unit tests. > Add DNS Lookup Functions > > > Key: DRILL-8413 > URL: https://issues.apache.org/jira/browse/DRILL-8413 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.22 > > > This PR adds additional DNS lookup functions to Drill: > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
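For readers unfamiliar with forward and reverse lookups, the following Java sketch shows what functions like the host name and host address lookups above do conceptually, using only the JDK's `java.net.InetAddress`. It is an assumption-laden illustration rather than the UDF implementation from the PR: the class `DnsLookupSketch` and its methods are hypothetical, and returning `null` for an unknown host is a choice made here for the sketch. In SQL, the unit test quoted in the review calls the function as `get_host_address('apache.org')`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration of forward and reverse lookups; not the Drill UDF code.
class DnsLookupSketch {

  // Forward lookup: host name (or literal IP) to the textual IP address.
  public static String hostAddress(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return null; // assumption for this sketch: unknown hosts yield null
    }
  }

  // Reverse lookup: IP address to host name. The JDK falls back to the
  // textual IP when no name can be resolved.
  public static String hostName(String ip) {
    try {
      return InetAddress.getByName(ip).getHostName();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // A literal IP is only validated, so this line needs no network access.
    System.out.println(hostAddress("127.0.0.1"));
    System.out.println(hostName("127.0.0.1"));
  }
}
```

The result of a lookup for a real domain depends on the resolver and can change over time, which is the nondeterminism the review comment on the unit test alludes to.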
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701684#comment-17701684 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton commented on PR #2777: URL: https://github.com/apache/drill/pull/2777#issuecomment-1473742795 I've added two unrelated minor changes implementing safe calls to close() methods. Currently when these calls fail due to some earlier error they drown interesting messages out in unhelpful NPE noise. > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services
[ https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701208#comment-17701208 ] ASF GitHub Bot commented on DRILL-8409: --- jnturton opened a new pull request, #2777: URL: https://github.com/apache/drill/pull/2777 # [DRILL-8409](https://issues.apache.org/jira/browse/DRILL-8409): Support the configuration of bind addresses for network services ## Description Drill provides the DRILL_HOST_NAME env var which determines what Drillbit host name will be exchanged over RPC for later look up by a remote client or Drillbit. This host name is used to check whether Drill is being asked to bind to the loopback address in distributed mode ``` if (isDistributedMode && InetAddress.getByName(hostName).isLoopbackAddress()) { throw new DrillbitStartupException("Drillbit is disallowed to bind to loopback address in distributed mode."); } ``` but is never used to set the bind address used for the Drillbit's RPC and web ports! This PR adds new boot options ``` drill.exec.rpc.bind_addr drill.exec.http.bind_addr ``` and uses them to set the bind addresses used for RPC services and the HTTP service respectively. ## Documentation Document all three of DRILL_HOST_NAME and the two new bind address options. ## Testing Provide no bind addresses and confirm that the effective previous default (0.0.0.0) is applied. Manually set bind addresses and test that Drill is not accessible on other local addresses. > Support the configuration of bind addresses for network services > > > Key: DRILL-8409 > URL: https://issues.apache.org/jira/browse/DRILL-8409 > Project: Apache Drill > Issue Type: Bug > Components: Server >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Drill provides the DRILL_HOST_NAME env var which determines what Drillbit > host name will be exchanged over RPC for later look up by a remote client or > Drillbit. 
This host name is used to check whether Drill is being asked to > bind to the loopback address in distributed mode > {code:java} > if (isDistributedMode && > InetAddress.getByName(hostName).isLoopbackAddress()) { > throw new DrillbitStartupException("Drillbit is disallowed to bind to > loopback address in distributed mode."); > }{code} > but is not subsequently used to set the bind address used for the Drillbit's RPC > and web ports! This issue proposes that the Drillbit network services bind > address is determined by DRILL_HOST_NAME. -- This message was sent by Atlassian Jira (v8.20.10#820010)
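To make the difference between checking a host name and actually binding to it concrete, here is a minimal sketch using plain JDK sockets. It is not Drill's RPC or HTTP server code; the class and method names are hypothetical, and port 0 (any free port) is used only so the example can run anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper, not Drill code.
final class BindAddressSketch {
  /** Same check as in the quoted snippet: does this name resolve to a loopback address? */
  static boolean isLoopback(String hostName) {
    try {
      return InetAddress.getByName(hostName).isLoopbackAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /**
   * Binds a listening socket to the given address only and reports the address
   * it ended up bound to. Binding to "0.0.0.0" would instead listen on all interfaces.
   */
  static String bindAndReport(String bindAddr) {
    try (ServerSocket socket = new ServerSocket()) {
      // Port 0 asks the OS for any free port; a real service would use its configured port.
      socket.bind(new InetSocketAddress(InetAddress.getByName(bindAddr), 0));
      return socket.getInetAddress().getHostAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A socket bound this way is reachable only through the chosen address, which is the property the manual test in the PR description checks.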
[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT
[ https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701147#comment-17701147 ] ASF GitHub Bot commented on DRILL-8412: --- jnturton opened a new pull request, #2776: URL: https://github.com/apache/drill/pull/2776 # [DRILL-8412](https://issues.apache.org/jira/browse/DRILL-8412): Upgrade to Calcite 1.35-SNAPSHOT ## Description If we're willing to try basing Drill master on snapshot builds of the upcoming version of Calcite for a while then here's a PR to do that. Please see the related discussion in Drill and Calcite mailing lists this week for more information. Important notes. 1. The CI tests that run automatically upon commits to Drill master will exercise Drill with present day Calcite. 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us to deal with them in order to proceed. 3. Regressions in Calcite would (mostly) break the Drill CI and force us to report them in order to proceed. 4. If Drill master becomes too unstable when it is based on Calcite snapshots then this PR is trivially undoable. ## Documentation N/A ## Testing Existing unit test suite. > Upgrade to Calcite 1.35-SNAPSHOT > > > Key: DRILL-8412 > URL: https://issues.apache.org/jira/browse/DRILL-8412 > Project: Apache Drill > Issue Type: Task > Components: SQL Parser >Reporter: James Turton >Assignee: James Turton >Priority: Major > > This issue proposes that we try basing Drill master on snapshot builds of the > upcoming version of Calcite so that the CI tests that run automatically upon > commits to master will exercise Drill with present day Calcite. > Breaking changes in Calcite would (mostly) break the Drill CI and force us to > deal with them in order to proceed. > Regressions in Calcite would (mostly) break the Drill CI and force us to report > them in order to proceed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34
[ https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701139#comment-17701139 ] ASF GitHub Bot commented on DRILL-8410: --- jnturton merged PR #2775: URL: https://github.com/apache/drill/pull/2775 > Upgrade to Calcite 1.34 > --- > > Key: DRILL-8410 > URL: https://issues.apache.org/jira/browse/DRILL-8410 > Project: Apache Drill > Issue Type: Improvement > Components: library >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Calcite 1.34 includes > # a fix for the currently broken date_trunc function in Drill > # support for a new QUALIFY clause in window functions > # incompatible core parser grammar changes that break date_diff in Drill. > Because of (3), Drill needs to make temporary use of a modified Parser.jj > until Calcite backs out the mentioned parser changes. See the linked Calcite > issues for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34
[ https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701101#comment-17701101 ] ASF GitHub Bot commented on DRILL-8410: --- jnturton commented on PR #2775: URL: https://github.com/apache/drill/pull/2775#issuecomment-1471784565 It would probably be possible to create a patch-based version of this PR that makes use of [maven-patch-plugin](https://maven.apache.org/plugins/maven-patch-plugin/) and has a much lower line count. On the other hand it's not inconceivable that we decide to maintain our Parser.jj. > Upgrade to Calcite 1.34 > --- > > Key: DRILL-8410 > URL: https://issues.apache.org/jira/browse/DRILL-8410 > Project: Apache Drill > Issue Type: Improvement > Components: library >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Calcite 1.34 includes > # a fix for the currently broken date_trunc function in Drill > # support for a new QUALIFY clause in window functions > # incompatible core parser grammar changes that break date_diff in Drill. > Because of (3), Drill needs to make temporary use of a modified Parser.jj > until Calcite backs out the mentioned parser changes. See the linked Calcite > issues for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700914#comment-17700914 ] ASF GitHub Bot commented on DRILL-8393: --- LYCJeff commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1471106650 > @LYCJeff Thanks for making these changes. I have a few questions: > > 1. Are you certain that these filters are in fact being pushed down as intended? > 2. I'm really concerned about what would happen if a user aliased a data source as `header` or `tail`. > > IE: > > ```sql > SELECT ... > FROM api.foo > INNER JOIN dfs.`tail.csv` AS tail > ON tail.id = foo.id > WHERE tail.name = 'something' > ``` > > Do we know how this would be interpreted? Well, we actually need to recognize `header.xxx` as a whole parameter name, so we need to use back quotes. Only then can it be pushed down normally, so these prefixes are not confused with data source aliases. If the `name` in your example above is an argument to the `foo` api, it should be written as follows. ```sql SELECT ... FROM api.foo INNER JOIN dfs.`tail.csv` AS tail ON tail.id=foo.id WHERE `tail.name` = 'something' ``` > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time. So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. 
> For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
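The proposal in the issue description, where `body.`-prefixed filters go into the POST body and `header.`-prefixed filters become request headers, can be sketched as follows. This is only an illustration of the routing idea under the assumption that filters arrive as a simple name-to-value map; it is not the code from PR #2747, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the HTTP storage plugin code.
final class ParamRoutingSketch {
  /**
   * Returns the filters whose name starts with "<prefix>.", with that prefix removed.
   * For prefix "header", the filter `header.header1` becomes the header "header1".
   */
  static Map<String, String> withPrefix(Map<String, String> filters, String prefix) {
    Map<String, String> out = new LinkedHashMap<>();
    String dotted = prefix + ".";
    for (Map.Entry<String, String> e : filters.entrySet()) {
      if (e.getKey().startsWith(dotted)) {
        out.put(e.getKey().substring(dotted.length()), e.getValue());
      }
    }
    return out;
  }
}
```

Because the whole back-quoted name (for example `header.header1`) is treated as one filter name, the prefix is matched on the name itself rather than on a table alias.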
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700907#comment-17700907 ] ASF GitHub Bot commented on DRILL-8393: --- cgivre commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1471002379 @LYCJeff Thanks for making these changes. I have a few questions: 1. Are you certain that these filters are in fact being pushed down as intended? 2. I'm really concerned about what would happen if a user aliased a data source as `header` or `tail`. IE: ```sql SELECT ... FROM api.foo INNER JOIN dfs.`tail.csv` AS tail ON tail.id = foo.id WHERE tail.name = 'something' ``` Do we know how this would be interpreted? > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time. So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. > For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8411) GoogleSheets Reader Will Not Read More than 1K Rows
[ https://issues.apache.org/jira/browse/DRILL-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700797#comment-17700797 ] ASF GitHub Bot commented on DRILL-8411: --- cgivre merged PR #2774: URL: https://github.com/apache/drill/pull/2774 > GoogleSheets Reader Will Not Read More than 1K Rows > --- > > Key: DRILL-8411 > URL: https://issues.apache.org/jira/browse/DRILL-8411 > Project: Apache Drill > Issue Type: Bug > Components: Storage - GoogleSheets >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of > 1000 rows and stops. This PR fixes that. > It also fixes a minor but annoying issue whereby the GoogleSheets reader > determines a column is a date/time, but is then unable to parse it because it > is in a non-standard format. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34
[ https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700673#comment-17700673 ] ASF GitHub Bot commented on DRILL-8410: --- jnturton opened a new pull request, #2775: URL: https://github.com/apache/drill/pull/2775 # [DRILL-8410](https://issues.apache.org/jira/browse/DRILL-8410): Upgrade to Calcite 1.34 ## Description Calcite 1.34 includes 1. a fix for the currently broken date_trunc function in Drill 2. support for a new QUALIFY clause in window functions 3. incompatible core parser grammar changes that break date_diff in Drill. Because of (3), Drill needs to make temporary use of a modified Parser.jj until Calcite backs out the mentioned parser changes. See the linked Calcite issues for more details. Normally it would be undesirable to backport the new QUALIFY clause but, short of setting up cherry picking from Calcite, getting the fix for the regression in DATE_TRUNC forces the addition of support for QUALIFY. Calcite does not do separate bugfix releases. ## Documentation Document the new QUALIFY clause. ## Testing - Existing unit tests of DATE_TRUNC. - Existing unit tests of DATE_DIFF. - New unit test of QUALIFY. > Upgrade to Calcite 1.34 > --- > > Key: DRILL-8410 > URL: https://issues.apache.org/jira/browse/DRILL-8410 > Project: Apache Drill > Issue Type: Improvement > Components: library >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.1 > > > Calcite 1.34 includes > # a fix for the currently broken date_trunc function in Drill > # support for a new QUALIFY clause in window functions > # incompatible core parser grammar changes that break date_diff in Drill. > Because of (3), Drill needs to make temporary use of a modified Parser.jj > until Calcite backs out the mentioned parser changes. See the linked Calcite > issues for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8411) GoogleSheets Reader Will Not Read More than 1K Rows
[ https://issues.apache.org/jira/browse/DRILL-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700476#comment-17700476 ] ASF GitHub Bot commented on DRILL-8411: --- cgivre opened a new pull request, #2774: URL: https://github.com/apache/drill/pull/2774 # [DRILL-8411](https://issues.apache.org/jira/browse/DRILL-8411): GoogleSheets Reader Will Not Read More than 1K Rows ## Description The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of 1000 rows and stops. This PR fixes that. It also fixes a minor but annoying issue whereby the GoogleSheets reader determines a column is a date/time, but is then unable to parse it because it is in a non-standard format. ## Documentation N/A ## Testing Ran existing unit tests and tested manually. > GoogleSheets Reader Will Not Read More than 1K Rows > --- > > Key: DRILL-8411 > URL: https://issues.apache.org/jira/browse/DRILL-8411 > Project: Apache Drill > Issue Type: Bug > Components: Storage - GoogleSheets >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of > 1000 rows and stops. This PR fixes that. > It also fixes a minor but annoying issue whereby the GoogleSheets reader > determines a column is a date/time, but is then unable to parse it because it > is in a non-standard format. -- This message was sent by Atlassian Jira (v8.20.10#820010)
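The failure mode described here, stopping after the first batch of 1000 rows, is a common pagination bug. The sketch below shows the general shape of a loop that keeps requesting batches until a short one comes back. It is not the GoogleSheets reader code; the `fetch` function is a hypothetical stand-in for whatever call returns one batch of rows, and the batch size of 1000 is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper, not the Drill GoogleSheets reader.
final class BatchReadSketch {
  static final int BATCH_SIZE = 1000;

  /**
   * Reads every row by requesting consecutive batches.
   * @param fetch takes (offset, limit) and returns at most `limit` rows starting at `offset`
   */
  static <T> List<T> readAll(BiFunction<Integer, Integer, List<T>> fetch) {
    List<T> all = new ArrayList<>();
    int offset = 0;
    while (true) {
      List<T> batch = fetch.apply(offset, BATCH_SIZE);
      all.addAll(batch);
      if (batch.size() < BATCH_SIZE) {
        // A short or empty batch means there are no more rows to read.
        return all;
      }
      offset += batch.size();
    }
  }
}
```

When the total row count is an exact multiple of the batch size, the loop makes one extra request that returns an empty batch and then stops.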
[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join
[ https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699293#comment-17699293 ] ASF GitHub Bot commented on DRILL-8408: --- cgivre merged PR #2772: URL: https://github.com/apache/drill/pull/2772 > Allow Implicit Casts on Join > > > Key: DRILL-8408 > URL: https://issues.apache.org/jira/browse/DRILL-8408 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > Currently, Drill does not allow implicit casts on joins. With DRILL-8136, > this has been significantly improved, and it might make sense to do so. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join
[ https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699116#comment-17699116 ] ASF GitHub Bot commented on DRILL-8408: --- cgivre commented on PR #2772: URL: https://github.com/apache/drill/pull/2772#issuecomment-1464476318 > I agree that it would be nice to be able to switch this on or off using a session option. And I wonder if we should begin with that option defaulted to false so that we can > > 1. include this in 1.21.x and > 2. collect some experience from opt-ins (like ourselves) about whether such joins turn out to be badly behaved, before exposing out-of-the-box users to it. I added a new exec option defaulted to `false`. > Allow Implicit Casts on Join > > > Key: DRILL-8408 > URL: https://issues.apache.org/jira/browse/DRILL-8408 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > Currently, Drill does not allow implicit casts on joins. With DRILL-8136, > this has been significantly improved, and it might make sense to do so. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join
[ https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699024#comment-17699024 ] ASF GitHub Bot commented on DRILL-8408: --- jnturton commented on PR #2772: URL: https://github.com/apache/drill/pull/2772#issuecomment-1464068067 I agree that it would be nice to be able to switch this on or off using a session option. And I wonder if we should begin with that option defaulted to false so that we can 1. include this in 1.21.x and 2. collect some experience from opt-ins (like ourselves) about whether such joins turn out to be badly behaved, before exposing out-of-the-box users to it. > Allow Implicit Casts on Join > > > Key: DRILL-8408 > URL: https://issues.apache.org/jira/browse/DRILL-8408 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > Currently, Drill does not allow implicit casts on joins. With DRILL-8136, > this has been significantly improved, and it might make sense to do so. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join
[ https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698368#comment-17698368 ] ASF GitHub Bot commented on DRILL-8408: --- cgivre commented on PR #2772: URL: https://github.com/apache/drill/pull/2772#issuecomment-1462040949 @jnturton @vvysotskyi I don't know if these checks were there for a reason or not, but with the improved implicit casting from DRILL-8136, this PR seems to work. If there's a performance reason we shouldn't do this, I was thinking that we could add an exec option to enable/disable this functionality. > Allow Implicit Casts on Join > > > Key: DRILL-8408 > URL: https://issues.apache.org/jira/browse/DRILL-8408 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > Currently, Drill does not allow implicit casts on joins. With DRILL-8136, > this has been significantly improved, and it might make sense to do so. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join
[ https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698170#comment-17698170 ] ASF GitHub Bot commented on DRILL-8408: --- cgivre opened a new pull request, #2772: URL: https://github.com/apache/drill/pull/2772 # [DRILL-8408](https://issues.apache.org/jira/browse/DRILL-8408): Allow Implicit Casts on Join ## Description With the revision of Drill's implicit casting rules as a part of DRILL-8136, Drill now supports much improved implicit casting logic. However, that does not carry over to joins. This PR allows the implicit casting to carry through to joins as well. ## Documentation N/A ## Testing Ran existing unit tests > Allow Implicit Casts on Join > > > Key: DRILL-8408 > URL: https://issues.apache.org/jira/browse/DRILL-8408 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > Currently, Drill does not allow implicit casts on joins. With DRILL-8136, > this has been significantly improved, and it might make sense to do so. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8407) Add Support for SFTP File Systems
[ https://issues.apache.org/jira/browse/DRILL-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696939#comment-17696939 ] ASF GitHub Bot commented on DRILL-8407: --- cgivre merged PR #2770: URL: https://github.com/apache/drill/pull/2770 > Add Support for SFTP File Systems > - > > Key: DRILL-8407 > URL: https://issues.apache.org/jira/browse/DRILL-8407 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - File >Affects Versions: 1.20.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: Future > > > Add support for SFTP File Systems. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8407) Add Support for SFTP File Systems
[ https://issues.apache.org/jira/browse/DRILL-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696593#comment-17696593 ] ASF GitHub Bot commented on DRILL-8407: --- cgivre opened a new pull request, #2770: URL: https://github.com/apache/drill/pull/2770 # [DRILL-8407](https://issues.apache.org/jira/browse/DRILL-): Add Support for SFTP File Systems ## Description This PR enables Drill to query files stored in SFTP file systems. ## Documentation An SFTP file system behaves exactly as any other file system. ## Configuration To query data from an SFTP file system, follow the instructions for any other file system. For the URL, provide the host as shown below: ```json { "type": "file", "connection": "sftp://", "workspaces": { "test": { "location": "", "writable": true, "defaultInputFormat": null, "allowAccessOutsideWorkspace": false }, ... ``` ### Authentication The SFTP plugin requires a username and password to authenticate. The best way to do this is to provide the information via a `credentialsProvider` as shown below. SFTP file systems can be used with `USER_TRANSLATION` enabled, but not `USER_IMPERSONATION`. ```json "credentialsProvider": { "credentialsProviderType": "PlainCredentialsProvider", "credentials": { "username": "", "password": "" }, "userCredentials": {} }, ``` If you need to pass additional configuration variables to the SFTP server, you can do so in the `config` parameter in the file system. You will need to prefix any parameters with `fs.sftp`. ## Testing Manually Tested > Add Support for SFTP File Systems > - > > Key: DRILL-8407 > URL: https://issues.apache.org/jira/browse/DRILL-8407 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - File >Affects Versions: 1.20.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: Future > > > Add support for SFTP File Systems. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695578#comment-17695578 ] ASF GitHub Bot commented on DRILL-8393: --- LYCJeff commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1451555417 > Two ideas > > 1. Since we won't backport this PR and it will only go out in the next major release, some breakage inside a plugin is probably something that can be swallowed. > 2. If it is still desired to preserve the ability to use the existing syntax in Drill 1.22 and beyond then a storage config option like `"useLegacyRequestParmSyntax": true` could be added for users who want it. @jnturton @cgivre That's a good idea without confusing old and new syntax, although it requires existing users to make small additions to the configuration. If it is acceptable to you, I will take some time to add a configuration item in the near future. > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time.So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. 
> For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695531#comment-17695531 ] ASF GitHub Bot commented on DRILL-8393: --- jnturton commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1451459573 Two ideas 1. Since we won't backport this PR and it will only go out in the next major release, some breakage inside a plugin is probably something that can be swallowed. 2. If it is still desired to preserve the ability to use the existing syntax in Drill 1.22 and beyond then a storage config option like `"useLegacyRequestParmSyntax": true` could be added for users who want it. > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time. So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. > For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695365#comment-17695365 ] ASF GitHub Bot commented on DRILL-8405: --- cgivre merged PR #2767: URL: https://github.com/apache/drill/pull/2767 > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695303#comment-17695303 ] ASF GitHub Bot commented on DRILL-8405: --- cgivre commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1450778034 @pjfanning I'll keep an eye on it, but it looks good. I'll restart if it times out. > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695302#comment-17695302 ] ASF GitHub Bot commented on DRILL-8405: --- pjfanning commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1450776742 @cgivre one of the CI subtasks is taking a bit longer to complete but it looks like using the new liquibase jar has fixed this general issue > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695235#comment-17695235 ] ASF GitHub Bot commented on DRILL-8405: --- pjfanning commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-145069 https://github.com/liquibase/liquibase/issues/3617#issuecomment-1450560162 > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694879#comment-17694879 ] ASF GitHub Bot commented on DRILL-8393: --- LYCJeff commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1449446242 > @LYCJeff I really like the functionality here, but I am concerned that this is a breaking change and will affect existing Drill users. Also, it adds effectively new syntax to the SQL queries. @cgivre At this point, I can pass the unprefixed parameters in their place by default, the way they were. This minimizes the impact on existing users, except in the following cases. For example, the argument that the user passed into the request body was called `header.xxx`, but now needs to be rewritten as `body.header.xxx`, otherwise the argument will be passed into the request header. In addition, a problem that had been fixed would reappear. The argument that is passed to the url path is also passed to the end of the url, which has been clearly distinguished since I changed it. Let me know if you think this is more friendly to existing users, then I'll move in this direction. > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time.So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. 
> For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
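The prefix convention discussed in this thread (`body.` and `header.` prefixes on filter names, with unprefixed names keeping their previous location) can be sketched roughly as follows. This is only an illustration of the routing idea, not the code in PR #2747: the class `ParamRouterSketch` and its `route` method are hypothetical names, and the sketch assumes the legacy location for unprefixed parameters is the request body.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split WHERE-clause filters into request headers and body fields.
final class ParamRouterSketch {
  final Map<String, Object> headers = new LinkedHashMap<>();
  final Map<String, Object> body = new LinkedHashMap<>();

  static ParamRouterSketch route(Map<String, Object> filters) {
    ParamRouterSketch routed = new ParamRouterSketch();
    for (Map.Entry<String, Object> filter : filters.entrySet()) {
      String name = filter.getKey();
      if (name.startsWith("header.")) {
        // `header.header1` becomes the request header `header1`.
        routed.headers.put(name.substring("header.".length()), filter.getValue());
      } else if (name.startsWith("body.")) {
        // Only the first prefix is stripped, so `body.header.xxx`
        // becomes a body field literally named `header.xxx`.
        routed.body.put(name.substring("body.".length()), filter.getValue());
      } else {
        // Assumption: unprefixed names stay where they went before (the body here).
        routed.body.put(name, filter.getValue());
      }
    }
    return routed;
  }

  public static void main(String[] args) {
    Map<String, Object> filters = new LinkedHashMap<>();
    filters.put("body.lat", 36.72016);
    filters.put("header.header1", "value1");
    ParamRouterSketch routed = route(filters);
    System.out.println("headers=" + routed.headers + " body=" + routed.body);
  }
}
```

Stripping only the leading prefix is what makes the `body.header.xxx` escape described above work: a body field that really is called `header.xxx` remains expressible.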
[jira] [Commented] (DRILL-8406) Enable implicit casting of VARCHAR and BIT args in aggregate functions
[ https://issues.apache.org/jira/browse/DRILL-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694707#comment-17694707 ] ASF GitHub Bot commented on DRILL-8406: --- cgivre merged PR #2768: URL: https://github.com/apache/drill/pull/2768 > Enable implicit casting of VARCHAR and BIT args in aggregate functions > -- > > Key: DRILL-8406 > URL: https://issues.apache.org/jira/browse/DRILL-8406 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Minor > Fix For: 1.21.1 > > > Default function implementations that throw unsupported operation > exceptions in the class AggregateErrorFunctions prevent the implicit casting > of VARCHAR and BIT arguments to neighbouring types. E.g. > {code:java} > apache drill> select sum('1'); > Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate > functions supported for VarChar type{code} > This issue proposes to remove AggregateErrorFunctions so that implicit > casting works, the example above changing as follows. > {code:java} > apache drill> select sum('1'); > EXPR$0 1 > 1 row selected (2.346 seconds) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause
[ https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694615#comment-17694615 ] ASF GitHub Bot commented on DRILL-8393: --- cgivre commented on PR #2747: URL: https://github.com/apache/drill/pull/2747#issuecomment-1448388913 @LYCJeff I really like the functionality here, but I am concerned that this is a breaking change and will affect existing Drill users. Also, it adds effectively new syntax to the SQL queries. > Allow parameters to be passed to headers through SQL in WHERE clause > > > Key: DRILL-8393 > URL: https://issues.apache.org/jira/browse/DRILL-8393 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - HTTP >Affects Versions: 1.20.0 >Reporter: Yuchen Liang >Priority: Major > > Some APIs require parameters (e.g. digital signature) in the headers to be > generated at access time. So I'm wondering if we can pass it in through filter > statement. > Perhaps we could design it like the params field in connections parameter. > For example: > > Config: > { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, > "params": ["body.lat", "body.lng", "body.date", "header.header1"], > "parameterLocation": "json_body" } > > SQL Query: > SELECT * FROM api.sunrise > WHERE `body.lat` = 36.7201600 > AND `body.lng` = -4.4203400 > AND `body.date` = '2019-10-02' > AND `header.header1` = 'value1'; > > Post body: > { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"} > > Headers: > { "header1": "value1", ……} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694613#comment-17694613 ] ASF GitHub Bot commented on DRILL-8405: --- cgivre commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1448370911 @pjfanning I'm going to convert this to draft status until we can update liquibase. > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694549#comment-17694549 ] ASF GitHub Bot commented on DRILL-8405: --- pjfanning commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1448205614 Need to wait for liquibase to upgrade their lib. I don't know if snakeyaml is used elsewhere in Drill. If it is, it may be possible to upgrade snakeyaml in some places and keep the old version where liquibase is used. > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694531#comment-17694531 ] ASF GitHub Bot commented on DRILL-8405: --- cgivre commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1448156621 @pjfanning Is there any workaround for this? > Upgrade to snakeyaml 2.0 due to CVE > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8406) Enable implicit casting of VARCHAR and BIT args in aggregate functions
[ https://issues.apache.org/jira/browse/DRILL-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694439#comment-17694439 ] ASF GitHub Bot commented on DRILL-8406: --- jnturton opened a new pull request, #2768: URL: https://github.com/apache/drill/pull/2768 # [DRILL-8406](https://issues.apache.org/jira/browse/DRILL-8406): Enable implicit casting of VARCHAR and BIT args in aggregate functions ## Description Default function implementations that throw unsupported operation exceptions in the class AggregateErrorFunctions prevent the implicit casting of VARCHAR and BIT arguments to neighbouring types. E.g. ``` apache drill> select sum('1'); Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions supported for VarChar type ``` This PR removes AggregateErrorFunctions so that implicit casting works, the example above changing as follows. ``` apache drill> select sum('1'); EXPR$0 1 1 row selected (2.346 seconds) ``` ## Documentation N/A ## Testing New unit test. > Enable implicit casting of VARCHAR and BIT args in aggregate functions > -- > > Key: DRILL-8406 > URL: https://issues.apache.org/jira/browse/DRILL-8406 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: James Turton >Priority: Minor > Fix For: 1.21.1 > > > Default function implementations that throw unsupported operation > exceptions in the class AggregateErrorFunctions prevent the implicit casting > of VARCHAR and BIT arguments to neighbouring types. E.g. > {code:java} > apache drill> select sum('1'); > Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate > functions supported for VarChar type{code} > This issue proposes to remove AggregateErrorFunctions so that implicit > casting works, the example above changing as follows. 
> {code:java} > apache drill> select sum('1'); > EXPR$0 1 > 1 row selected (2.346 seconds) > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs
[ https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693724#comment-17693724 ] ASF GitHub Bot commented on DRILL-8158: --- cgivre merged PR #2766: URL: https://github.com/apache/drill/pull/2766 > Remove non-reproducible build outputs > - > > Key: DRILL-8158 > URL: https://issues.apache.org/jira/browse/DRILL-8158 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.20.0 >Reporter: Herve Boutemy >Assignee: James Turton >Priority: Major > Fix For: 1.20.2 > > > For context see [1] and [2]. The git-commit-id plugin includes information > like build host, email and time which is not compatible with a reproducible > build. Drill's built in sys.version table will return the build email and > time if they are present in the build's git.properties file so these columns > must be deprecated. Other useful Git-related information is retained. > In accompanying commits, some Kerberos unit test fixes are applied, and the > tests reenabled, and some updates to Release.md are included. > [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html] > [2] > [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693700#comment-17693700 ] ASF GitHub Bot commented on DRILL-8405: --- pjfanning commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1445455789 It looks like Liquibase uses a snakeyaml 1.0 API call that is not supported in snakeyaml 2.0. ``` 2023-02-26T15:12:21.4680779Z Caused by: java.lang.NoSuchMethodError: org.yaml.snakeyaml.constructor.SafeConstructor: method ()V not found 2023-02-26T15:12:21.4681347Z at liquibase.parser.core.yaml.YamlChangeLogParser.parse(YamlChangeLogParser.java:23) 2023-02-26T15:12:21.4681830Z at liquibase.Liquibase.getDatabaseChangeLog(Liquibase.java:369) ``` > upgrade to snakeyaml 2.0 due to cve > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
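The `()V` descriptor in the trace above suggests that the caller was compiled against a no-argument `SafeConstructor` constructor which is absent from the jar actually on the classpath. A rough way to check for this kind of binary incompatibility ahead of time is to probe for the constructor reflectively. The sketch below uses only the JDK; `ConstructorProbe` is a hypothetical helper and not part of Drill, Liquibase or snakeyaml.

```java
// Hypothetical helper: reports whether a class on the classpath
// exposes a public no-argument constructor.
final class ConstructorProbe {
  static boolean hasPublicNoArgConstructor(String className) {
    try {
      // Load without running static initializers; we only inspect the class shape.
      Class<?> type = Class.forName(className, false, ConstructorProbe.class.getClassLoader());
      type.getConstructor(); // throws NoSuchMethodException if there is none
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Prints false when snakeyaml is missing from the classpath or when the
    // no-arg constructor is not present in the version that was loaded.
    System.out.println(hasPublicNoArgConstructor("org.yaml.snakeyaml.constructor.SafeConstructor"));
  }
}
```

A check like this only detects the mismatch; as the thread goes on to say, the resolution here was to pick up a Liquibase release built against the newer snakeyaml API.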
[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs
[ https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693675#comment-17693675 ] ASF GitHub Bot commented on DRILL-8158: --- hboutemy commented on PR #2766: URL: https://github.com/apache/drill/pull/2766#issuecomment-1445406594 I'd love for that to be feasible, but I don't think CI is able to check reproducibility. Another aspect is that we currently have no regressions, just fixes that are done step by step: once we have fixed one issue that creates a lot of noise, the next release shows issues that are less noisy and so were not very visible before. IMHO, we just need to accept that for such a big project, having a build that is fully reproducible requires multiple iterations: that's not unexpected. I'm confident that once this PR is merged, the remaining issues will impact much less content. > Remove non-reproducible build outputs > - > > Key: DRILL-8158 > URL: https://issues.apache.org/jira/browse/DRILL-8158 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.20.0 >Reporter: Herve Boutemy >Assignee: James Turton >Priority: Major > Fix For: 1.20.2 > > > For context see [1] and [2]. The git-commit-id plugin includes information > like build host, email and time which is not compatible with a reproducible > build. Drill's built in sys.version table will return the build email and > time if they are present in the build's git.properties file so these columns > must be deprecated. Other useful Git-related information is retained. > In accompanying commits, some Kerberos unit test fixes are applied, and the > tests reenabled, and some updates to Release.md are included. > [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html] > [2] > [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693667#comment-17693667 ] ASF GitHub Bot commented on DRILL-8405: --- cgivre commented on PR #2767: URL: https://github.com/apache/drill/pull/2767#issuecomment-1445393983 Ugh.. it looks like the new library broke something. Disregard approval. :-( > upgrade to snakeyaml 2.0 due to cve > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs
[ https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693650#comment-17693650 ] ASF GitHub Bot commented on DRILL-8158: --- cgivre commented on PR #2766: URL: https://github.com/apache/drill/pull/2766#issuecomment-1445371514 @hboutemy Should we add this as a CI check? > Remove non-reproducible build outputs > - > > Key: DRILL-8158 > URL: https://issues.apache.org/jira/browse/DRILL-8158 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.20.0 >Reporter: Herve Boutemy >Assignee: James Turton >Priority: Major > Fix For: 1.20.2 > > > For context see [1] and [2]. The git-commit-id plugin includes information > like build host, email and time which is not compatible with a reproducible > build. Drill's built in sys.version table will return the build email and > time if they are present in the build's git.properties file so these columns > must be deprecated. Other useful Git-related information is retained. > In accompanying commits, some Kerberos unit test fixes are applied, and the > tests reenabled, and some updates to Release.md are included. > [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html] > [2] > [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693647#comment-17693647 ] ASF GitHub Bot commented on DRILL-8405: --- pjfanning opened a new pull request, #2767: URL: https://github.com/apache/drill/pull/2767 ## Description upgrade to snakeyaml 2.0 due to CVE ## Testing CI build > upgrade to snakeyaml 2.0 due to cve > --- > > Key: DRILL-8405 > URL: https://issues.apache.org/jira/browse/DRILL-8405 > Project: Apache Drill > Issue Type: Task >Reporter: PJ Fanning >Priority: Major > > https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs
[ https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693064#comment-17693064 ] ASF GitHub Bot commented on DRILL-8158: --- hboutemy opened a new pull request, #2766: URL: https://github.com/apache/drill/pull/2766 see #2590 for initial improvements check of release 1.21.0 shows that there are still a few issues https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/apache/drill/README.md > Remove non-reproducible build outputs > - > > Key: DRILL-8158 > URL: https://issues.apache.org/jira/browse/DRILL-8158 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.20.0 >Reporter: Herve Boutemy >Assignee: James Turton >Priority: Major > Fix For: 1.20.2 > > > For context see [1] and [2]. The git-commit-id plugin includes information > like build host, email and time which is not compatible with a reproducible > build. Drill's built in sys.version table will return the build email and > time if they are present in the build's git.properties file so these columns > must be deprecated. Other useful Git-related information is retained. > In accompanying commits, some Kerberos unit test fixes are applied, and the > tests reenabled, and some updates to Release.md are included. > [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html] > [2] > [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692904#comment-17692904 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre merged PR #2762: URL: https://github.com/apache/drill/pull/2762 > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
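The two behaviours described for regexp_extract (all capturing groups as an array, or the text of one specific group) can be illustrated with plain `java.util.regex`. This is a sketch of the semantics only, not the Drill UDF: `RegexpExtractSketch`, `extractAll` and `extractGroup` are hypothetical names, and returning `null` for a missing match or an out-of-range group index is an assumption rather than documented Drill behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the two regexp_extract variants.
final class RegexpExtractSketch {
  // Two-argument form: every capturing group of the first match, or an empty list.
  static List<String> extractAll(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (matcher.find()) {
      // Group 0 is the whole match; capturing groups start at index 1.
      for (int i = 1; i <= matcher.groupCount(); i++) {
        groups.add(matcher.group(i));
      }
    }
    return groups;
  }

  // Three-argument form: the text captured by one group of the first match.
  static String extractGroup(String input, String regex, int index) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (matcher.find() && index >= 0 && index <= matcher.groupCount()) {
      return matcher.group(index);
    }
    return null; // assumption: no match or bad index yields null
  }

  public static void main(String[] args) {
    String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
    System.out.println(extractAll("2023-02-26", regex));
    System.out.println(extractGroup("2023-02-26", regex, 2));
  }
}
```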
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692805#comment-17692805 ] ASF GitHub Bot commented on DRILL-8402: --- vvysotskyi commented on code in PR #2762: URL: https://github.com/apache/drill/pull/2762#discussion_r1116028724 ## exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java: ## @@ -293,6 +293,109 @@ public void eval() { } } + /* + * This function returns the capturing groups from a regex. + */ + @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE, + outputWidthCalculatorType = OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT) + public static class RegexpExtract implements DrillSimpleFunc { + +@Param VarCharHolder input; +@Param(constant=true) VarCharHolder pattern; +@Inject +DrillBuf buffer; +@Workspace +java.util.regex.Matcher matcher; +@Workspace +org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper; +@Output +ComplexWriter out; + +@Override +public void setup() { + matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer)).matcher(""); + charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper(); + matcher.reset(charSequenceWrapper); +} + +@Override +public void eval() { + charSequenceWrapper.setBuffer(input.start, input.end, input.buffer); + + // Reusing same charSequenceWrapper, no need to pass it in. + matcher.reset(); + boolean result = matcher.find(); + + // Start the list here. If there are no matches, we return an empty list. 
+ org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter = out.rootAsList(); + listWriter.startList(); + + if (result) { +org.apache.drill.exec.vector.complex.writer.VarCharWriter varCharWriter = listWriter.varChar(); + +for(int i = 1; i <= matcher.groupCount(); i++) { + final byte[] strBytes = matcher.group(i).getBytes(com.google.common.base.Charsets.UTF_8); Review Comment: `matcher.group(i)` creates and returns string > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692668#comment-17692668 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on PR #2762: URL: https://github.com/apache/drill/pull/2762#issuecomment-1441742721 @vvysotskyi Thanks for the review. I refactored the functions so that they are not creating extra String objects. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT
[ https://issues.apache.org/jira/browse/DRILL-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692586#comment-17692586 ] ASF GitHub Bot commented on DRILL-8403: --- jnturton merged PR #2765: URL: https://github.com/apache/drill/pull/2765 > Generate aggregate function calls are missing required filters when used with > PIVOT > --- > > Key: DRILL-8403 > URL: https://issues.apache.org/jira/browse/DRILL-8403 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.21.1 > > > The following query should generate aggregates grouped by education_level and > containing filters on marital_status but the requisite filters are lost > during function rewriting. > apache drill> SELECT > 2..semicolon> * > 3..semicolon> FROM > 4..semicolon> (SELECT > 5..)> education_level, > 6..)> salary, > 7..)> marital_status, > 8..)> extract(year from age(birth_date)) age > 9..)> FROM > 10.)> cp.`employee.json`) > 11.semicolon> PIVOT ( > 12.)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN > ('M' married, 'S' single) > 13.)> ); > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > |education_level|married_avg_salary|married_avg_age|single_avg_salary|single_avg_age| > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > |Graduate > Degree|4392.823529411765|100.32352941176471|4392.823529411765|100.32352941176471| > |Bachelors > Degree|4492.404181184669|102.22996515679442|4492.404181184669|102.22996515679442| > |Partial > College|4047.11807|100.100694|4047.11807|100.100694| > |High School > Degree|3516.1565836298932|103.12811387900356|3516.1565836298932|103.12811387900356| > |Partial High > School|3511.0852713178297|102.30232558139535|3511.0852713178297|102.30232558139535| > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > 5 rows selected (0.285 seconds) > > 00-00 Screen : rowType = RecordType(ANY education_level, ANY > married_min_salary, DOUBLE married_avg_age, 
ANY single_min_salary, DOUBLE > single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, > 35748.2296 cpu, 474630.0 io, 0.0 network, 8148.8001 memory}, > id = 812 > 00-01 Project(education_level=[$0], married_min_salary=[$1], > married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType > = RecordType(ANY education_level, ANY married_min_salary, DOUBLE > married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = > 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 811 > 00-02 Project(education_level=[$0], > married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], > married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], > single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], > single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : > rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE > married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = > 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 808 > 00-03 HashAgg(group=[\\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], > agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY > education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = > 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 807 > 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], > age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS > TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY > marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = > 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, > 0.0 memory}, id = 806 > 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan > [selectionRoot=classpath:/employee.json, 
numFiles=1, > columns=[`education_level`, `marital_status`, `salary`, `birth_date`], > files=[classpath:/employee.json], usedMetastore=false, limit=-1, > formatConfig=JSONFormatConfig [extensions=[json) : rowType = > RecordType(ANY education_level, ANY marital_status, ANY salary, ANY > birth_date): rowcount = 463.0, cumulative cos
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692464#comment-17692464 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on code in PR #2762: URL: https://github.com/apache/drill/pull/2762#discussion_r1115197954 ## exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java: ## @@ -293,6 +293,115 @@ public void eval() { } } + /* + * This function returns the capturing groups from a regex. + */ + @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE, + outputWidthCalculatorType = OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT) + public static class RegexpExtract implements DrillSimpleFunc { + +@Param VarCharHolder input; +@Param(constant=true) VarCharHolder pattern; +@Inject +DrillBuf buffer; +@Workspace +java.util.regex.Matcher matcher; +@Workspace +org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper; +@Output +ComplexWriter out; + +@Override +public void setup() { + matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer)).matcher(""); + charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper(); + matcher.reset(charSequenceWrapper); +} + +@Override +public void eval() { + charSequenceWrapper.setBuffer(input.start, input.end, input.buffer); + + // Reusing same charSequenceWrapper, no need to pass it in. + matcher.reset(); + boolean result = matcher.find(); + + // Start the list here. If there are no matches, we return an empty list. 
+ org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter = out.rootAsList(); + listWriter.startList(); + + if (result) { +org.apache.drill.exec.vector.complex.writer.VarCharWriter varCharWriter = listWriter.varChar(); +String extractedResult; +for(int i = 1; i <= matcher.groupCount(); i++) { + extractedResult = matcher.group(i); Review Comment: Fixed. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692456#comment-17692456 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on code in PR #2762: URL: https://github.com/apache/drill/pull/2762#discussion_r1115193957 ## exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/CharSequenceWrapper.java: ## @@ -90,7 +90,10 @@ public char charAt(int index) { */ @Override public CharSequence subSequence(int start, int end) { -throw new UnsupportedOperationException(); +// throw new UnsupportedOperationException(); Review Comment: Fixed. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692455#comment-17692455 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on code in PR #2762: URL: https://github.com/apache/drill/pull/2762#discussion_r1115193471 ## NOTICE: ## @@ -1,5 +1,5 @@ Apache Drill -Copyright 2013-2022 The Apache Software Foundation +Copyright 2013-2023 The Apache Software Foundation Review Comment: Fixed. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT
[ https://issues.apache.org/jira/browse/DRILL-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692373#comment-17692373 ] ASF GitHub Bot commented on DRILL-8403: --- vvysotskyi opened a new pull request, #2765: URL: https://github.com/apache/drill/pull/2765 # [DRILL-8403](https://issues.apache.org/jira/browse/DRILL-8403): Generate aggregate function calls are missing required filters when used with PIVOT ## Description Passing filters to agg calls when applying agg reduce rule. ## Documentation NA ## Testing Added UT. > Generate aggregate function calls are missing required filters when used with > PIVOT > --- > > Key: DRILL-8403 > URL: https://issues.apache.org/jira/browse/DRILL-8403 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.21.0 >Reporter: James Turton >Assignee: Vova Vysotskyi >Priority: Major > Fix For: 1.21.1 > > > The following query should generate aggregates grouped by education_level and > containing filters on marital_status but the requisite filters are lost > during function rewriting. 
> apache drill> SELECT > 2..semicolon> * > 3..semicolon> FROM > 4..semicolon> (SELECT > 5..)> education_level, > 6..)> salary, > 7..)> marital_status, > 8..)> extract(year from age(birth_date)) age > 9..)> FROM > 10.)> cp.`employee.json`) > 11.semicolon> PIVOT ( > 12.)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN > ('M' married, 'S' single) > 13.)> ); > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > |education_level|married_avg_salary|married_avg_age|single_avg_salary|single_avg_age| > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > |Graduate > Degree|4392.823529411765|100.32352941176471|4392.823529411765|100.32352941176471| > |Bachelors > Degree|4492.404181184669|102.22996515679442|4492.404181184669|102.22996515679442| > |Partial > College|4047.11807|100.100694|4047.11807|100.100694| > |High School > Degree|3516.1565836298932|103.12811387900356|3516.1565836298932|103.12811387900356| > |Partial High > School|3511.0852713178297|102.30232558139535|3511.0852713178297|102.30232558139535| > {+}{-}{-}{+}--{-}++{-}--{-}{-}--{-}++{-}--- > 5 rows selected (0.285 seconds) > > 00-00 Screen : rowType = RecordType(ANY education_level, ANY > married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE > single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, > 35748.2296 cpu, 474630.0 io, 0.0 network, 8148.8001 memory}, > id = 812 > 00-01 Project(education_level=[$0], married_min_salary=[$1], > married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType > = RecordType(ANY education_level, ANY married_min_salary, DOUBLE > married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = > 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 811 > 00-02 Project(education_level=[$0], > married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], > married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], > 
single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], > single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : > rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE > married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = > 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 808 > 00-03 HashAgg(group=[\\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], > agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY > education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = > 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, > 8148.8001 memory}, id = 807 > 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], > age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS > TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY > marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = > 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, > 0.0 memory}, id = 806 > 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan > [selectionRoot=classpath:/employee.json, numFiles=1, > co
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692334#comment-17692334 ] ASF GitHub Bot commented on DRILL-8402: --- vvysotskyi commented on code in PR #2762: URL: https://github.com/apache/drill/pull/2762#discussion_r1114857738 ## exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java: ## @@ -293,6 +293,115 @@ public void eval() { } } + /* + * This function returns the capturing groups from a regex. + */ + @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE, + outputWidthCalculatorType = OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT) + public static class RegexpExtract implements DrillSimpleFunc { + +@Param VarCharHolder input; +@Param(constant=true) VarCharHolder pattern; +@Inject +DrillBuf buffer; +@Workspace +java.util.regex.Matcher matcher; +@Workspace +org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper; +@Output +ComplexWriter out; + +@Override +public void setup() { + matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer)).matcher(""); + charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper(); + matcher.reset(charSequenceWrapper); +} + +@Override +public void eval() { + charSequenceWrapper.setBuffer(input.start, input.end, input.buffer); + + // Reusing same charSequenceWrapper, no need to pass it in. + matcher.reset(); + boolean result = matcher.find(); + + // Start the list here. If there are no matches, we return an empty list. 
+ org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter = out.rootAsList(); + listWriter.startList(); + + if (result) { +org.apache.drill.exec.vector.complex.writer.VarCharWriter varCharWriter = listWriter.varChar(); +String extractedResult; +for(int i = 1; i <= matcher.groupCount(); i++) { + extractedResult = matcher.group(i); Review Comment: It is better to avoid creating extra objects in UDFs to reduce the load on the garbage collector. Matcher has `Matcher.start(int group)` and `Matcher.end(int group)`, so please use them to obtain the bytes that correspond to the matching subsequence. ## NOTICE: ## @@ -1,5 +1,5 @@ Apache Drill -Copyright 2013-2022 The Apache Software Foundation +Copyright 2013-2023 The Apache Software Foundation Review Comment: Looks like this PR should be rebased on the latest master. Probably these changes are present because of the force push to master. ## exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/CharSequenceWrapper.java: ## @@ -90,7 +90,10 @@ public char charAt(int index) { */ @Override public CharSequence subSequence(int start, int end) { -throw new UnsupportedOperationException(); +// throw new UnsupportedOperationException(); Review Comment: Please remove commented code. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
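The review comment above asks for `Matcher.start(int group)` and `Matcher.end(int group)` instead of `Matcher.group(int)`, so that no `String` is allocated per group. The following is a minimal sketch of that idea in plain Java, not the code that was merged: the class name `GroupOffsets` and the method `groupOffsets` are hypothetical, and the `int[]` pairs exist only to make the offsets visible (a real UDF would presumably copy the range straight into its output buffer). Note that these are character offsets; mapping them back to UTF-8 byte positions in a `DrillBuf` is a Drill-specific step that is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsets {

  // Hypothetical helper: returns {start, end} character offsets for each
  // capturing group of the first match, or an empty array if nothing matches.
  public static int[][] groupOffsets(Pattern pattern, CharSequence input) {
    Matcher matcher = pattern.matcher(input);
    if (!matcher.find()) {
      return new int[0][];
    }
    int[][] offsets = new int[matcher.groupCount()][];
    for (int i = 1; i <= matcher.groupCount(); i++) {
      // start(i) and end(i) return -1 when group i did not take part in the match.
      offsets[i - 1] = new int[] {matcher.start(i), matcher.end(i)};
    }
    return offsets;
  }

  public static void main(String[] args) {
    Pattern p = Pattern.compile("([0-9]{3})-([0-9]{3})-([0-9]{3})");
    for (int[] range : groupOffsets(p, "123-456-789")) {
      // Prints 0..3, 4..7 and 8..11, one range per line.
      System.out.println(range[0] + ".." + range[1]);
    }
  }
}
```

The offsets let the caller read the matched range from the original input without materialising a substring, which is the garbage-collector saving the reviewer is after.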
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692318#comment-17692318 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on PR #2762: URL: https://github.com/apache/drill/pull/2762#issuecomment-1440613580 > @vvysotskyi Should we proceed with this? Is that a LGTM +1? > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692141#comment-17692141 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton merged PR #2763: URL: https://github.com/apache/drill/pull/2763 > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692075#comment-17692075 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton commented on code in PR #2763: URL: https://github.com/apache/drill/pull/2763#discussion_r1114070404 ## docs/dev/ClusterFixture.md: ## @@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. Rather than writing o Again, you can pass a Java value which the test code will convert to a string, then will build the `ALTER SESSION` command. +# Try-with-resource Style of Creating Single-use Client Fixtures. + +The benefit of Cluster Fixture framework is to define specific config for specific clusterFixture and clientFixture as needed flexibly. + +In some cases, clusterFixture has been initialized, and we need to create several different config clients for different test cases, Review Comment: ```suggestion In some cases, a clusterFixture has been initialized and we need to create several different config clients for different test cases. ``` > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692073#comment-17692073 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton commented on code in PR #2763: URL: https://github.com/apache/drill/pull/2763#discussion_r1114071613 ## docs/dev/ClusterFixture.md: ## @@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. Rather than writing o Again, you can pass a Java value which the test code will convert to a string, then will build the `ALTER SESSION` command. +# Try-with-resource Style of Creating Single-use Client Fixtures. + +The benefit of Cluster Fixture framework is to define specific config for specific clusterFixture and clientFixture as needed flexibly. Review Comment: ```suggestion A benefit of the Cluster Fixture framework is the ability to define specific configs for specific clusterFixtures and clientFixtures as needed flexibly. ``` ## docs/dev/ClusterFixture.md: ## @@ -156,6 +177,28 @@ It is often very handy, during development, to accumulate a collection of test f * The (local) file system location * The default format +# Exception Matcher + +The `QueryBuilder` provides a clean and concise way to handle Exception match which includes type match and pattern match: Review Comment: ```suggestion The `QueryBuilder` provides a clean and concise way to handle UserException matching which includes error type matching and error message pattern matching: ``` ## docs/dev/ClusterFixture.md: ## @@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. Rather than writing o Again, you can pass a Java value which the test code will convert to a string, then will build the `ALTER SESSION` command. +# Try-with-resource Style of Creating Single-use Client Fixtures. + +The benefit of Cluster Fixture framework is to define specific config for specific clusterFixture and clientFixture as needed flexibly. 
+ +In some cases, clusterFixture has been initialized, and we need to create several different config clients for different test cases, + +We could use try-with-resource style to creating single-use clientFixture. Review Comment: ```suggestion Using Java's try-with-resources syntax to create a single-use clientFixture is a convenient way to ensure that the clientFixture will automatically be closed once we've finished with it. ``` ## docs/dev/ClusterFixture.md: ## @@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. Rather than writing o Again, you can pass a Java value which the test code will convert to a string, then will build the `ALTER SESSION` command. +# Try-with-resource Style of Creating Single-use Client Fixtures. + +The benefit of Cluster Fixture framework is to define specific config for specific clusterFixture and clientFixture as needed flexibly. + +In some cases, clusterFixture has been initialized, and we need to create several different config clients for different test cases, Review Comment: ```suggestion In some cases, clusterFixture has been initialized and we need to create several different config clients for different test cases. ``` ```suggestion In some cases, a clusterFixture has been initialized and we need to create several different config clients for different test cases. 
``` ## docs/dev/ClusterFixture.md: ## @@ -156,6 +177,28 @@ It is often very handy, during development, to accumulate a collection of test f * The (local) file system location * The default format +# Exception Matcher + +The `QueryBuilder` provides a clean and concise way to handle Exception match which includes type match and pattern match: + +``` +@Test +public void unsupportedLiteralValidation() throws Exception { + String query = "ALTER session SET `%s` = %s"; + + client.queryBuilder() +.sql(query, ENABLE_VERBOSE_ERRORS_KEY, "DATE '1995-01-01'") +.userExceptionMatcher() +.expectedType(ErrorType.VALIDATION) +.include("Drill doesn't support assigning literals of type") +.match(); +} +``` +* Use `.userExceptionMatcher` to call UserExceptionMatcher +* Use `.expectedType` to define expected Error type +* Use `.include` to define expected Error pattern Review Comment: ```suggestion * Use `.include` to define an expected error message regex pattern * Use `.exclude` to define an unexpected error message regex pattern ``` > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Repo
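The try-with-resources style discussed in the review above can be illustrated without any Drill classes. This is a minimal sketch under stated assumptions: `DemoClient` is a hypothetical stand-in for a client fixture and is not part of Drill; the only thing taken from the discussion is the pattern itself, namely that each single-use client is closed automatically when its `try` block ends.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleUseClientSketch {

  // Hypothetical stand-in for a client fixture; NOT a Drill class.
  static class DemoClient implements AutoCloseable {
    private final String name;
    private final List<String> log;

    DemoClient(String name, List<String> log) {
      this.name = name;
      this.log = log;
      log.add("open " + name);
    }

    void run(String sql) {
      log.add(name + " ran: " + sql);
    }

    @Override
    public void close() {
      // Called automatically at the end of the try block, even on exceptions.
      log.add("close " + name);
    }
  }

  // Creates two differently named single-use clients, one after the other.
  public static List<String> runWithTwoClients() {
    List<String> log = new ArrayList<>();
    try (DemoClient client = new DemoClient("verbose", log)) {
      client.run("SELECT 1");
    }
    try (DemoClient client = new DemoClient("quiet", log)) {
      client.run("SELECT 2");
    }
    return log;
  }

  public static void main(String[] args) {
    runWithTwoClients().forEach(System.out::println);
  }
}
```

The first client is closed before the second is opened, so differently configured clients never overlap within a test.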
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692063#comment-17692063 ] ASF GitHub Bot commented on DRILL-8117: --- kingswanwho opened a new pull request, #2763: URL: https://github.com/apache/drill/pull/2763 # [MINOR UPDATE]: /docs update for DRILL-8117 ## Description Update /docs based on the discussion in https://github.com/apache/drill/pull/2756 ## Documentation This is a doc update. ## Testing N/A > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691944#comment-17691944 ] ASF GitHub Bot commented on DRILL-8402: --- vvysotskyi commented on PR #2762: URL: https://github.com/apache/drill/pull/2762#issuecomment-1439487042 Ok, in this case, we can add this UDF. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691724#comment-17691724 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre commented on PR #2762: URL: https://github.com/apache/drill/pull/2762#issuecomment-1438871668 > Wouldn't this change introduce ReDoS vulnerability? Potentially, but we already allow `REGEXP_REPLACE` and `REGEXP_MATCHES`, so I don't know that this actually makes anything worse. I did try adding a validator with this `saferegex`[1] but that library is not suitable for inclusion in Drill. (It prints all kinds of stuff to STDOUT.) [1]: https://github.com/jkutner/saferegex > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.1 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691471#comment-17691471 ] ASF GitHub Bot commented on DRILL-8402: --- vvysotskyi commented on PR #2762: URL: https://github.com/apache/drill/pull/2762#issuecomment-1438117260 Wouldn't this change introduce ReDoS vulnerability? > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.0 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function
[ https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691361#comment-17691361 ] ASF GitHub Bot commented on DRILL-8402: --- cgivre opened a new pull request, #2762: URL: https://github.com/apache/drill/pull/2762 # [DRILL-8402](https://issues.apache.org/jira/browse/DRILL-8402): Add REGEXP_EXTRACT Function ## Description Adds `regexp_extract` functions to Drill. ## Documentation This PR adds support for `regexp_extract(, )` which returns an array of text corresponding with the capturing groups in the regex. It also includes `regexp_extract(, , )` which returns the text of a specific capturing group. ```sql SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})'); +-+ | EXPR$0| +-+ | ["123","456","789"] | +-+ SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})', 0); +-+ | EXPR$0| +-+ | 123-456-789 | +-+ SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})', 3); ++ | EXPR$0 | ++ | 789| ++ ``` ## Testing Added unit tests. > Add REGEXP_EXTRACT Function > --- > > Key: DRILL-8402 > URL: https://issues.apache.org/jira/browse/DRILL-8402 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.21.0 >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Fix For: 1.21.0 > > > This PR adds two UDFs to Drill: > regexp_extract(, ) which returns an array of strings which > were captured by capturing groups in the regex. > regexp_extract(, , ) returns the text captured by a > specific capturing group. -- This message was sent by Atlassian Jira (v8.20.10#820010)
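The behaviour shown in the SQL examples of the PR description above can be sketched in plain Java with `java.util.regex`. This is an illustration of the described semantics, not Drill's implementation: the class `RegexpExtractSketch` and both method names are hypothetical. The empty list for a non-matching input follows the code comment quoted in the review thread; returning an empty string from the indexed form when there is no match or the index is out of range is purely an assumption, since the PR description does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractSketch {

  // Two-argument form: text of every capturing group in the first match.
  // Returns an empty list when the pattern is not found.
  public static List<String> extractAll(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (matcher.find()) {
      for (int i = 1; i <= matcher.groupCount(); i++) {
        groups.add(matcher.group(i));
      }
    }
    return groups;
  }

  // Three-argument form: text of one capturing group; index 0 is the whole match.
  // ASSUMPTION: "" is returned on no match, a bad index, or an unused group.
  public static String extractGroup(String input, String regex, int index) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (index < 0 || index > matcher.groupCount() || !matcher.find()) {
      return "";
    }
    String text = matcher.group(index);
    return text == null ? "" : text;
  }

  public static void main(String[] args) {
    String regex = "([0-9]{3})-([0-9]{3})-([0-9]{3})";
    System.out.println(extractAll("123-456-789", regex));      // [123, 456, 789]
    System.out.println(extractGroup("123-456-789", regex, 0)); // 123-456-789
    System.out.println(extractGroup("123-456-789", regex, 3)); // 789
  }
}
```

The three calls in `main` mirror the three queries in the PR description, including group 0 standing for the entire match.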
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688812#comment-17688812 ] ASF GitHub Bot commented on DRILL-8117: --- cgivre merged PR #2756: URL: https://github.com/apache/drill/pull/2756 > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688803#comment-17688803 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430707391 Message to whoever squashes and merges here, in case it's not me: when cleaning up the squashed commit detail message please retain the co-author footer so that the repo will reflect @kingswanwho's contribution. ``` - Co-authored-by: kingswanwho ``` > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688802#comment-17688802 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430704625 Okay, I think the penny's finally dropped. I was also thinking about the markdown in /docs but couldn't fathom what we'd add. But the new UserExceptionMatcher usage can be described and also the try-with-resources style of creating single-use client fixtures. > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688779#comment-17688779 ] ASF GitHub Bot commented on DRILL-8117: --- kingswanwho commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430637603 > > > One other question. Should we document this in the developer documentation? > > > > > > I think we do have developer documentation describing cluster fixture tests, or do you mean something else? > > I was referring to the markdown files in the `/docs` folder. With this PR do those need to be updated? (It doesn't have to be a part of this PR.) Hi Charles, I have checked the developer documentation in /docs. This PR migrates tests from the BaseTestQuery framework to ClusterTest and doesn't change the test logic of ClusterTest. James helped find a clean way to create a new ClientFixture and to handle UserException. I can update that information in /docs in a new PR. > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8401) Skip nested MAP column without children when creating parquet tables
[ https://issues.apache.org/jira/browse/DRILL-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688685#comment-17688685 ] ASF GitHub Bot commented on DRILL-8401: --- cgivre merged PR #2757: URL: https://github.com/apache/drill/pull/2757 > Skip nested MAP column without children when creating parquet tables > > > Key: DRILL-8401 > URL: https://issues.apache.org/jira/browse/DRILL-8401 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.20.3 >Reporter: James Turton >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > This extends the work of DRILL-8272 in order to handle nested empty MAPs > which currently also break the Parquet writer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688630#comment-17688630 ] ASF GitHub Bot commented on DRILL-8117: --- cgivre commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430116439 > > One other question. Should we document this in the developer documentation? > > I think we do have developer documentation describing cluster fixture tests, or do you mean something else? I was referring to the markdown files in the `/docs` folder. With this PR do those need to be updated? (It doesn't have to be a part of this PR.) > Upgrade unit tests to the cluster fixture framework > --- > > Key: DRILL-8117 > URL: https://issues.apache.org/jira/browse/DRILL-8117 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.1 >Reporter: Jingchuan Hu >Assignee: James Turton >Priority: Major > Fix For: 1.21.0 > > > Upgrade various unit tests to the cluster fixture framework and replace other > instances of deprecated code usage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688628#comment-17688628 ] ASF GitHub Bot commented on DRILL-8117: --- jnturton commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430114758 > One other question. Should we document this in the developer documentation? I think we do have developer documentation describing cluster tests, or do you mean something else?
[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework
[ https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688612#comment-17688612 ] ASF GitHub Bot commented on DRILL-8117: --- kingswanwho commented on PR #2756: URL: https://github.com/apache/drill/pull/2756#issuecomment-1430077393 > @jnturton @kingswanwho Should we close the other PR? Yes, I closed the other PR.