[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713327#comment-17713327
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

cgivre merged PR #2794:
URL: https://github.com/apache/drill/pull/2794




> Accommodate RexBuilder changes made for SAFE_CAST
> -------------------------------------------------
>
> Key: DRILL-8424
> URL: https://issues.apache.org/jira/browse/DRILL-8424
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Affects Versions: 1.22.0
> Reporter: James Turton
> Assignee: James Turton
> Priority: Major
> Fix For: 1.22.0
>
> The introduction of SAFE_CAST support in CALCITE-5557 made method signature 
> changes in RexBuilder that broke a needed override in DrillRexBuilder.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713153#comment-17713153
 ] 

ASF GitHub Bot commented on DRILL-8421:
---

jnturton commented on PR #2793:
URL: https://github.com/apache/drill/pull/2793#issuecomment-1511592469

   > Thanks for the contribution and welcome to Drill! Would you mind rebasing 
once https://github.com/apache/drill/pull/2794 is merged?
   
   Heh, I just came here to type exactly this. I reviewed the code changes and 
they look great so really we just need the CI run after rebasing.




> Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to 
> milliseconds before filtering
> ---------------------------------------------------------------------
>
> Key: DRILL-8421
> URL: https://issues.apache.org/jira/browse/DRILL-8421
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.21.0
> Reporter: Peter Franzen
> Priority: Major
> Fix For: 1.21.1
>
>
> When using Drill with parquet files where the timestamp columns are in 
> microseconds, Drill converts the microsecond values to milliseconds when 
> displayed. However, when using a timestamp column in WHERE clauses it looks 
> like the original microsecond value is used instead of the adjusted 
> millisecond value when filtering records.
> *To Reproduce*
> Assume a parquet file in a directory "Test" with a column _timestampCol_ 
> having the type {{org.apache.parquet.schema.OriginalType.TIMESTAMP_MICROS}}.
> Assume there are two records with the values 1673981999806149 and 
> 1674759597743552, respectively, in that column (i.e. the UTC dates 
> 2023-01-17T18:59:59.806149 and 2023-01-26T18:59:57.743552).
>  1. Execute the query
> {{SELECT timestampCol FROM dfs.Test;}}
> The result includes both records, as expected.
>  2. Execute the query
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> This produces an empty result although both records have a value less than 
> the argument.
>  3. Execute the query
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> The result includes both records although neither has a value greater than 
> the argument.
> *Expected behavior*
> The query in 2) above should produce a result with both records, and the 
> query in 3) should produce an empty result.
> *Additional context*
> Even timestamps long into the future produce results with both records, e.g.:
> {{SELECT timestampCol FROM dfs.Test WHERE timestampCol > TO_TIMESTAMP('2502-04-04 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> Manually converting the timestamp column to milliseconds produces the 
> expected result. The query
> {{SELECT timestampCol FROM dfs.Test WHERE TO_TIMESTAMP(CONVERT_FROM(CONVERT_TO(timestampCol, 'TIMESTAMP_EPOCH'), 'BIGINT')/1000) < TO_TIMESTAMP('2023-02-01 00:00:00', 'yyyy-MM-dd HH:mm:ss')}}
> returns both records.
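
The unit mismatch described in the report can be reproduced with plain arithmetic. The sketch below is only an illustration and is not Drill code: the class and method names are hypothetical, and it assumes the `TO_TIMESTAMP` argument is interpreted as UTC. It compares the two microsecond values from the report against the 2023-02-01 threshold expressed in milliseconds, first without and then with the division by 1000.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical illustration of the unit mismatch; not taken from Drill.
class MicrosFilterSketch {
    // The two column values from the report, in microseconds since the epoch.
    static final long[] MICROS = {1673981999806149L, 1674759597743552L};

    // The WHERE-clause threshold 2023-02-01 00:00:00, in milliseconds since the epoch (UTC assumed).
    static long thresholdMillis() {
        return LocalDateTime.of(2023, 2, 1, 0, 0, 0).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Counts the values that fall below the threshold after dividing by 'divisor'.
    // A divisor of 1 compares raw microseconds; 1000 converts them to milliseconds first.
    static int countBelow(long[] values, long divisor, long thresholdMillis) {
        int count = 0;
        for (long value : values) {
            if (value / divisor < thresholdMillis) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long threshold = thresholdMillis();
        System.out.println("raw micros below threshold:  " + countBelow(MICROS, 1, threshold));
        System.out.println("micros/1000 below threshold: " + countBelow(MICROS, 1000, threshold));
    }
}
```

Raw microsecond values are about 1.67 × 10^15, while even a threshold in the year 2502 is only roughly 1.7 × 10^13 milliseconds. That is consistent with the report that every `<` filter comes back empty and every `>` filter matches both records.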





[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713150#comment-17713150
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

cgivre commented on code in PR #2794:
URL: https://github.com/apache/drill/pull/2794#discussion_r1168893372


##
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java:
##
@@ -65,9 +65,9 @@ public RexNode ensureType(
    * @return Call to CAST operator
    */
   @Override
-  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) {
+  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) {

Review Comment:
   🤦 







[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713137#comment-17713137
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

jnturton commented on code in PR #2794:
URL: https://github.com/apache/drill/pull/2794#discussion_r1168863776


##
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java:
##
@@ -65,9 +65,9 @@ public RexNode ensureType(
    * @return Call to CAST operator
    */
   @Override
-  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) {
+  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) {

Review Comment:
   They did do this and only deprecated the original method, so our build wasn't 
broken. But our subclass DrillRexBuilder was broken in terms of runtime logic, 
because our method override no longer took effect when it needed to.
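
The failure mode described in this comment can be shown with a small standalone sketch. Everything in it is hypothetical: `BaseBuilder`, `StaleBuilder`, `FixedBuilder` and `plan` are stand-ins invented for illustration, and plain strings replace `RelDataType` and `RexNode`. It assumes, as the comment says, that the old signature was kept as a deprecated method delegating to the new one, and that callers inside the library moved to the new signature.

```java
// Hypothetical stand-ins; not Calcite or Drill classes.
class OverloadSketch {
    // Base class after upstream added a parameter to makeCast.
    static class BaseBuilder {
        // Old signature, kept for source compatibility; it only delegates now.
        @Deprecated
        String makeCast(String type, boolean matchNullability) {
            return makeCast(type, matchNullability, false);
        }

        // New signature with the extra 'safe' flag.
        String makeCast(String type, boolean matchNullability, boolean safe) {
            return "base:" + type;
        }

        // A caller inside the library that now goes straight to the new signature.
        String plan(String type) {
            return makeCast(type, true, false);
        }
    }

    // Subclass written against the old API: it still compiles, but plan() bypasses it.
    static class StaleBuilder extends BaseBuilder {
        @Override
        String makeCast(String type, boolean matchNullability) {
            return "custom:" + type;
        }
    }

    // Subclass updated to override the new signature.
    static class FixedBuilder extends BaseBuilder {
        @Override
        String makeCast(String type, boolean matchNullability, boolean safe) {
            return "custom:" + type;
        }
    }

    public static void main(String[] args) {
        System.out.println(new StaleBuilder().plan("DECIMAL")); // prints "base:DECIMAL"
        System.out.println(new FixedBuilder().plan("DECIMAL")); // prints "custom:DECIMAL"
    }
}
```

The stale subclass produces no compile error, only a deprecation warning at most, which is why a change like this tends to surface as test failures rather than as a broken build.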







[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713133#comment-17713133
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

jnturton commented on code in PR #2794:
URL: https://github.com/apache/drill/pull/2794#discussion_r1168859598


##
exec/java-exec/src/main/codegen/templates/Parser.jj:
##
@@ -7727,6 +7764,8 @@ SqlPostfixOperator PostfixRowOperator() :
 |   < DATETIME_INTERVAL_CODE: "DATETIME_INTERVAL_CODE" >
 |   < DATETIME_INTERVAL_PRECISION: "DATETIME_INTERVAL_PRECISION" >
 |   < DAY: "DAY" >
+|   < DAYOFWEEK: "DAYOFWEEK" >
+|   < DAYOFYEAR: "DAYOFYEAR" >

Review Comment:
   Yes we should, thanks, added.







[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713132#comment-17713132
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

jnturton commented on code in PR #2794:
URL: https://github.com/apache/drill/pull/2794#discussion_r1168857297


##
exec/java-exec/src/main/codegen/templates/Parser.jj:
##
@@ -15,9 +15,11 @@
  * limitations under the License.
  */
 
-// TODO: Delete this file to reinstate its extraction from calcite-core.jar
-// once CALCITE-5579 is resolved and the incompatible grammar changes introduced
-// by CALCITE-5469 have been backed out. Also see: exec/java-exec/pom.xml.

Review Comment:
   Thanks, resolved.







[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713127#comment-17713127
 ] 

ASF GitHub Bot commented on DRILL-8421:
---

handmadecode commented on PR #2793:
URL: https://github.com/apache/drill/pull/2793#issuecomment-1511492806

   @cgivre thanks, happy to contribute. I will rebase when 8424 is merged. 






[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713123#comment-17713123
 ] 

ASF GitHub Bot commented on DRILL-8421:
---

cgivre commented on PR #2793:
URL: https://github.com/apache/drill/pull/2793#issuecomment-1511473455

   @handmadecode 
   Thanks for the contribution and welcome to Drill!  Would you mind rebasing 
once [DRILL-8424] (https://github.com/apache/drill/pull/2794) is merged?   
There are some CI issues which will be fixed by that PR.
   Thanks!






[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713119#comment-17713119
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

cgivre commented on code in PR #2794:
URL: https://github.com/apache/drill/pull/2794#discussion_r1168776897


##
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/conversion/DrillRexBuilder.java:
##
@@ -65,9 +65,9 @@ public RexNode ensureType(
    * @return Call to CAST operator
    */
   @Override
-  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) {
+  public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability, boolean safe) {

Review Comment:
   This really highlights an issue with Calcite.  They really could have added 
an additional function something like below and nothing would have broken...
   
   
   ```
   // Keep the old three-argument signature and delegate to the new one.
   public RexNode makeCast(RelDataType type, RexNode exp, boolean matchNullability) {
     return makeCast(type, exp, matchNullability, false);
   }
   ```



##
exec/java-exec/src/main/codegen/templates/Parser.jj:
##
@@ -15,9 +15,11 @@
  * limitations under the License.
  */
 
-// TODO: Delete this file to reinstate its extraction from calcite-core.jar
-// once CALCITE-5579 is resolved and the incompatible grammar changes introduced
-// by CALCITE-5469 have been backed out. Also see: exec/java-exec/pom.xml.

Review Comment:
   Do we want to leave the original info here just so that we know which 
Calcite PRs we're waiting for?



##
exec/java-exec/src/main/codegen/templates/Parser.jj:
##
@@ -7727,6 +7764,8 @@ SqlPostfixOperator PostfixRowOperator() :
 |   < DATETIME_INTERVAL_CODE: "DATETIME_INTERVAL_CODE" >
 |   < DATETIME_INTERVAL_PRECISION: "DATETIME_INTERVAL_PRECISION" >
 |   < DAY: "DAY" >
+|   < DAYOFWEEK: "DAYOFWEEK" >
+|   < DAYOFYEAR: "DAYOFYEAR" >

Review Comment:
   Should we add a unit test for these synonyms?







[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713121#comment-17713121
 ] 

ASF GitHub Bot commented on DRILL-8417:
---

cgivre commented on PR #2783:
URL: https://github.com/apache/drill/pull/2783#issuecomment-1511468213

   Once https://github.com/apache/drill/pull/2794 is merged, I'll rebase and 
merge this, pending @jnturton's approval.




> Allow Excel Reader to Ignore Formula Errors
> -------------------------------------------
>
> Key: DRILL-8417
> URL: https://issues.apache.org/jira/browse/DRILL-8417
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Excel
> Affects Versions: 1.21.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.1
>
>
> If Drill encounters an Excel formula which is invalid somehow, such as a 
> DIV/0, Drill is unable to proceed and throws a number format exception. 
> This PR adds a config parameter called ignoreErrors which allows Drill to 
> skip such records and returns null for that cell.  Drill will also output a 
> log warning.  When set to false, original behavior is retained.
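
As a rough sketch of the behaviour described above, and not the Excel reader's actual code, the following shows how an `ignoreErrors` flag could switch between failing on an unparseable formula result and returning null with a warning. The class name, the `readNumericCell` method and the use of a plain string for the evaluated cell are all assumptions made for illustration.

```java
// Hypothetical illustration of the ignoreErrors switch; not the Drill Excel reader.
class IgnoreErrorsSketch {
    // 'evaluated' stands in for the text of an evaluated formula cell.
    // An error marker such as "#DIV/0!" cannot be parsed as a number.
    static Double readNumericCell(String evaluated, boolean ignoreErrors) {
        try {
            return Double.parseDouble(evaluated);
        } catch (NumberFormatException e) {
            if (!ignoreErrors) {
                throw e; // original behaviour: the number format exception propagates
            }
            // With ignoreErrors set, warn and hand back null for this cell.
            System.err.println("WARN: ignoring formula error in cell: " + evaluated);
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readNumericCell("42.5", false));
        System.out.println(readNumericCell("#DIV/0!", true));
    }
}
```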





[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713077#comment-17713077
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

jnturton commented on PR #2794:
URL: https://github.com/apache/drill/pull/2794#issuecomment-1511297568

   I botched the "Move distro tarball to the Maven install phase" commit, but 
that's a one-liner and unrelated to any unit tests, so I'll let the test runs 
here complete before pushing its fix.






[jira] [Commented] (DRILL-8424) Accommodate RexBuilder changes made for SAFE_CAST

2023-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713048#comment-17713048
 ] 

ASF GitHub Bot commented on DRILL-8424:
---

jnturton opened a new pull request, #2794:
URL: https://github.com/apache/drill/pull/2794

   # [DRILL-8424](https://issues.apache.org/jira/browse/DRILL-8424): 
Accommodate RexBuilder changes made for SAFE_CAST
   
   ## Description
   
   Resolves the current CI test failures affecting decimal and 
empty-literal-to-null casting. Also incorporates upstream syntax additions in 
Drill's Parser.jj, which can only be dropped when 
[CALCITE-5579](https://issues.apache.org/jira/browse/CALCITE-5579) is resolved.
   
   * Incorporate method signature changes in RexBuilder made by CALCITE-5557.
   * Fix float rounding error in TestCastFunctions.testCastFloatDecimalOverflow.
   * Incorporate Calcite parser changes.
     * [CALCITE-5557] Add SAFE_CAST function (enabled in BigQuery library)
     * [CALCITE-5548] Add MSSQL-style CONVERT function (enabled in MSSql library)
     * [CALCITE-5554] In EXTRACT function, add DAYOFWEEK and DAYOFYEAR as synonyms for DOW, DOY
   * Ignore .mvn/maven.config.
   * Upgrade Apache RAT plugin.
   * Upgrade os-maven-plugin.
   * Move distro tarball to the Maven install phase.
   
   ## Documentation
   SAFE_CAST to be documented and relevant syntax additions to be documented.
   
   ## Testing
   Failing tests now pass.
   






[jira] [Commented] (DRILL-8421) Parquet TIMESTAMP_MICROS columns in WHERE clauses are not converted to milliseconds before filtering

2023-04-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712433#comment-17712433
 ] 

ASF GitHub Bot commented on DRILL-8421:
---

handmadecode opened a new pull request, #2793:
URL: https://github.com/apache/drill/pull/2793

   # [DRILL-8421](https://issues.apache.org/jira/browse/DRILL-8421): Truncate 
parquet microsecond columns
   
   ## Description
   
   The metadata min and max values of parquet microsecond columns are truncated 
to milliseconds, which is the time unit expected by the initial file pruning 
during filtering. Also, `TIME_MICROS` columns are read as 64-bit values before 
they are truncated to 32-bit milliseconds values. Previously they were read as 
32-bit values, causing values > `Integer.MAX_VALUE` to be incorrect.
   
   The second fix also addresses 
[DRILL-8423](https://issues.apache.org/jira/browse/DRILL-8423).
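
The `TIME_MICROS` part of the description comes down to the order of narrowing and unit conversion. The sketch below is an assumption-laden illustration, not the reader's real code: the thread does not show how the 32-bit read was performed, so a plain narrowing cast stands in for it, and the class and method names are hypothetical.

```java
// Hypothetical illustration; the narrowing cast stands in for the 32-bit read.
class TimeMicrosSketch {
    // Narrow to 32 bits first, then convert to milliseconds (loses high bits).
    static int narrowThenTruncate(long timeMicros) {
        return ((int) timeMicros) / 1000;
    }

    // Keep 64 bits, convert to milliseconds, then narrow (safe: a day has 86,400,000 ms).
    static int truncateThenNarrow(long timeMicros) {
        return (int) (timeMicros / 1000);
    }

    public static void main(String[] args) {
        // 12:00:00 expressed as microseconds since midnight; larger than Integer.MAX_VALUE.
        long noonMicros = 12L * 60 * 60 * 1_000_000;
        System.out.println("narrow first:   " + narrowThenTruncate(noonMicros));
        System.out.println("truncate first: " + truncateThenNarrow(noonMicros));
    }
}
```

Any time of day later than about 00:35:47 exceeds `Integer.MAX_VALUE` microseconds, so under this assumption most real-world values would be affected.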
   
   ## Documentation
   Bugfix only, no documentation changes
   
   ## Testing
   Unit tests added in new test class 
`org.apache.drill.exec.store.parquet.TestMicrosecondColumns`.
   






[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors

2023-04-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712254#comment-17712254
 ] 

ASF GitHub Bot commented on DRILL-8417:
---

jnturton commented on PR #2783:
URL: https://github.com/apache/drill/pull/2783#issuecomment-1508106845

   Reviewer's note: all format-excel tests do pass, the CI test failures here 
are a result of as yet unfixed breakage brought in by Calcite 1.35-SNAPSHOT.






[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors

2023-04-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712148#comment-17712148
 ] 

ASF GitHub Bot commented on DRILL-8417:
---

cgivre commented on PR #2783:
URL: https://github.com/apache/drill/pull/2783#issuecomment-1507822172

   @jnturton
   I updated the PR to default to `false` and updated the README as well. 
   






[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-04-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710173#comment-17710173
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

jnturton merged PR #2776:
URL: https://github.com/apache/drill/pull/2776




> Upgrade to Calcite 1.35-SNAPSHOT
> --------------------------------
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
> Issue Type: Task
> Components: SQL Parser
> Reporter: James Turton
> Assignee: James Turton
> Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> 1. The CI tests that run automatically upon commits to Drill master will 
> exercise Drill with present day Calcite.
> 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us 
> to deal with them in order to proceed.
> 3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
> report them in order to proceed.
> 4. If Drill master becomes too unstable when it is based on Calcite snapshots 
> then this change is trivially undoable.
>  





[jira] [Commented] (DRILL-8420) Remove Guava shading and patching

2023-04-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709249#comment-17709249
 ] 

ASF GitHub Bot commented on DRILL-8420:
---

jnturton commented on PR #2786:
URL: https://github.com/apache/drill/pull/2786#issuecomment-1498616790

   @cgivre 
   
   > Thanks for this. Aside from imports, which files were actually modified 
and I'll do a review?
   
   Yes, I'll split the import statement changes into a separate commit.
   
   > Do we want to add this to back port to stable?
   
   I don't think so. It's not a fix and it definitely carries a risk of 
breakage with it. Currently the Hadoop 2 build is broken because Drill's Guava 
patches (but not shading) are still needed in that case so I'll set the PR to 
draft for the moment.




> Remove Guava shading and patching
> -
>
> Key: DRILL-8420
> URL: https://issues.apache.org/jira/browse/DRILL-8420
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.22.0
>
>






[jira] [Commented] (DRILL-8420) Remove Guava shading and patching

2023-04-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709074#comment-17709074
 ] 

ASF GitHub Bot commented on DRILL-8420:
---

cgivre commented on PR #2786:
URL: https://github.com/apache/drill/pull/2786#issuecomment-1497991025

   Do we want to add this to back port to stable?




> Remove Guava shading and patching
> -
>
> Key: DRILL-8420
> URL: https://issues.apache.org/jira/browse/DRILL-8420
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.22.0
>
>






[jira] [Commented] (DRILL-8420) Remove Guava shading and patching

2023-04-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709072#comment-17709072
 ] 

ASF GitHub Bot commented on DRILL-8420:
---

cgivre commented on PR #2786:
URL: https://github.com/apache/drill/pull/2786#issuecomment-1497990433

   @jnturton Thanks for this.  Aside from imports, which files were actually 
modified and I'll do a review?




> Remove Guava shading and patching
> -
>
> Key: DRILL-8420
> URL: https://issues.apache.org/jira/browse/DRILL-8420
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.22.0
>
>






[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708439#comment-17708439
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

jnturton commented on PR #2776:
URL: https://github.com/apache/drill/pull/2776#issuecomment-1496047719

   Can someone do the formality of approving so that we can give this a try?




> Upgrade to Calcite 1.35-SNAPSHOT
> 
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
>  Issue Type: Task
>  Components: SQL Parser
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> 1. The CI tests that run automatically upon commits to Drill master will 
> exercise Drill with present day Calcite.
> 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us 
> to deal with them in order to proceed.
> 3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
> report them in order to proceed.
> 4. If Drill master becomes too unstable when it is based on Calcite snapshots 
> then this change is trivially undoable.
>  





[jira] [Commented] (DRILL-8416) Memory leak when the async Parquet reader skips empty pages

2023-04-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708400#comment-17708400
 ] 

ASF GitHub Bot commented on DRILL-8416:
---

jnturton merged PR #2784:
URL: https://github.com/apache/drill/pull/2784




> Memory leak when the async Parquet reader skips empty pages
> ---
>
> Key: DRILL-8416
> URL: https://issues.apache.org/jira/browse/DRILL-8416
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.21.0
>Reporter: Matthias Rosenthaler
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
> Attachments: example.parquet, meta_steps.parquet
>
>
> If I try to query (
> {code:java}
> SELECT * FROM 
> `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> ) the following parquet file which is stored on hadoop file system I am 
> getting the following error:
> {code:java}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (64) 
> Allocator(op:0:0:1:ParquetRowGroupScan) 100/64/34688/100 
> (res/actual/peak/limit){code}
> Everything is working fine with drill version 1.19.
> If I select only columns without NULL values, the query also works in 1.21.0:
> {code:java}
> SELECT `name`,`type` FROM 
> `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> I generated a new example.parquet with pyarrow 8.0.0 containing a float column 
> with NULL values, and the same error occurred.





[jira] [Commented] (DRILL-8420) Remove Guava shading and patching, and the conjars repo

2023-04-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707995#comment-17707995
 ] 

ASF GitHub Bot commented on DRILL-8420:
---

jnturton opened a new pull request, #2786:
URL: https://github.com/apache/drill/pull/2786

   # [DRILL-8420](https://issues.apache.org/jira/browse/DRILL-8420): Remove 
Guava shading and patching, and the conjars repo
   
   ## Description
   
   - Remove shaded Guava.
   - Drop conjars repository.
   - Drop Guava patches.
   - Upgrade guava to 31.1-jre.
   - Upgrade parquet to 1.12.3 and parquet-format to 2.9.0.
   - Move Splunk Maven repository declaration to contrib/storage-splunk/pom.xml.
   
   ## Documentation
   
   N/A
   
   ## Testing
   
   Existing unit tests.
   




> Remove Guava shading and patching, and the conjars repo
> ---
>
> Key: DRILL-8420
> URL: https://issues.apache.org/jira/browse/DRILL-8420
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>






[jira] [Commented] (DRILL-8416) Memory leak when the async Parquet reader skips empty pages

2023-03-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706953#comment-17706953
 ] 

ASF GitHub Bot commented on DRILL-8416:
---

jnturton opened a new pull request, #2784:
URL: https://github.com/apache/drill/pull/2784

   # [DRILL-8416](https://issues.apache.org/jira/browse/DRILL-8416): Memory 
leak when the async Parquet reader skips empty pages
   
   ## Description
   
   A regression introduced by the Parquet reader clean-up released in Drill 
1.20 has meant that buffers used for (non-empty) compressed data holding 
_empty_ dictionary or data pages which are skipped are not freed. Because empty 
pages are uncommon in real data this bug went undetected for a long time.
   
   ## Documentation
   N/A
   
   ## Testing
   New unit test.
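
   The leak described above can be pictured with a small, self-contained sketch. It is not the Drill Parquet reader: `EmptyPageSkipSketch`, `Allocator` and `Page` are hypothetical names invented for illustration, and the allocator only counts outstanding bytes. The point it shows is that the buffer holding a page's compressed data has to be released on the skip path too, which a `try`/`finally` guarantees.

```java
import java.util.List;

// Hypothetical illustration only; none of these types exist in Drill.
final class EmptyPageSkipSketch {

  /** Toy allocator that only tracks how many bytes are still outstanding. */
  static final class Allocator {
    private long outstanding;

    int allocate(int bytes) {
      outstanding += bytes;
      return bytes;
    }

    void release(int bytes) {
      outstanding -= bytes;
    }

    long outstanding() {
      return outstanding;
    }
  }

  /** A page with non-empty compressed data that may still decode to zero values. */
  static final class Page {
    final int compressedSize;
    final int valueCount;

    Page(int compressedSize, int valueCount) {
      this.compressedSize = compressedSize;
      this.valueCount = valueCount;
    }
  }

  /** Reads every page, skipping empty ones, and returns the number of values seen. */
  static int readAll(List<Page> pages, Allocator allocator) {
    int values = 0;
    for (Page page : pages) {
      int buffer = allocator.allocate(page.compressedSize);
      try {
        if (page.valueCount == 0) {
          continue; // empty page is skipped; the finally block still frees its buffer
        }
        values += page.valueCount;
      } finally {
        allocator.release(buffer);
      }
    }
    return values;
  }
}
```

   Without the `finally`, the `continue` branch would leave the skipped page's bytes outstanding, which is the shape of error the allocator reports in the issue.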
   




> Memory leak when the async Parquet reader skips empty pages
> ---
>
> Key: DRILL-8416
> URL: https://issues.apache.org/jira/browse/DRILL-8416
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.21.0
>Reporter: Matthias Rosenthaler
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
> Attachments: example.parquet, meta_steps.parquet
>
>
> If I try to query (
> {code:java}
> SELECT * FROM 
> `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> ) the following parquet file which is stored on hadoop file system I am 
> getting the following error:
> {code:java}
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> IllegalStateException: Memory was leaked by query. Memory leaked: (64) 
> Allocator(op:0:0:1:ParquetRowGroupScan) 100/64/34688/100 
> (res/actual/peak/limit){code}
> Everything is working fine with drill version 1.19.
> If I select only columns without NULL values, the query also works in 1.21.0:
> {code:java}
> SELECT `name`,`type` FROM 
> `hdfs.data`.`./v2/meta_steps/me-2023-03-20-13-15-30-inv230021-kontrollsystemf39st9qrx20-03-2/meta_steps.parquet`{code}
> I generated a new example.parquet with pyarrow 8.0.0 containing a float column 
> with NULL values, and the same error occurred.





[jira] [Commented] (DRILL-8417) Allow Excel Reader to Ignore Formula Errors

2023-03-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706895#comment-17706895
 ] 

ASF GitHub Bot commented on DRILL-8417:
---

cgivre opened a new pull request, #2783:
URL: https://github.com/apache/drill/pull/2783

   # [DRILL-8417](https://issues.apache.org/jira/browse/DRILL-8417): Allow 
Excel Reader to Ignore Formula Errors
   
   ## Description
   If Drill encounters an Excel formula which is invalid somehow, such as a 
`DIV/0`, Drill is unable to proceed and throws a number format exception. 
   This PR adds a config parameter called `ignoreErrors` which allows Drill to 
skip such records and return `null` for that cell.  Drill will also output a 
log warning.  When set to `false`, the original behavior is retained.
   
   ## Documentation
   Updated README
   
   * `ignoreErrors`:  Defaults to `true`.  When set to `true` Drill will return 
`null` for any
 formulas or any values that are unparseable.
   
   
   ## Testing
   Added two unit tests.
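
   As a rough sketch of the behavior described above, and not the Excel reader's actual code: `FormulaCellReader` and `readNumeric` are hypothetical names, and a plain string stands in for the cell value that the real reader would obtain from the spreadsheet library. Only the `ignoreErrors` semantics are taken from the description.

```java
// Hypothetical illustration of the ignoreErrors switch; not Drill's implementation.
final class FormulaCellReader {

  /**
   * Parses a cell's evaluated value as a number.
   * With ignoreErrors set, an unparseable value yields null plus a warning;
   * otherwise the NumberFormatException propagates as before.
   */
  static Double readNumeric(String rawValue, boolean ignoreErrors) {
    try {
      return Double.valueOf(rawValue);
    } catch (NumberFormatException e) {
      if (ignoreErrors) {
        System.err.println("WARN: unparseable cell value '" + rawValue + "', returning null");
        return null;
      }
      throw e;
    }
  }
}
```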




> Allow Excel Reader to Ignore Formula Errors
> ---
>
> Key: DRILL-8417
> URL: https://issues.apache.org/jira/browse/DRILL-8417
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Excel
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> If Drill encounters an Excel formula which is invalid somehow, such as a 
> DIV/0, Drill is unable to proceed and throws a number format exception. 
> This PR adds a config parameter called ignoreErrors which allows Drill to 
> skip such records and return null for that cell.  Drill will also output a 
> log warning.  When set to false, the original behavior is retained.





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706814#comment-17706814
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton commented on PR #2777:
URL: https://github.com/apache/drill/pull/2777#issuecomment-1490042762

   > @jnturton Sorry for the late review. The only thing that I would add is a 
mention of `drill.exec.rpc.bind_addr` and `drill.exec.http.bind_addr` in 
[`drill-override-example.conf`](https://github.com/apache/drill/blob/master/distribution/src/main/resources/drill-override-example.conf)
 if this file is still maintained of course, in addition to our documentation.
   
   See #2782 .




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.
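
The loopback guard quoted above can be tried in isolation with a minimal sketch. `BindAddressCheck` is a hypothetical class written for illustration; only `InetAddress.getByName` and `isLoopbackAddress` are real JDK calls, and an `IllegalStateException` stands in for Drill's `DrillbitStartupException`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration; not part of Drill.
final class BindAddressCheck {

  /** Returns true when the host name or IP literal resolves to a loopback address. */
  static boolean isLoopback(String hostName) {
    try {
      return InetAddress.getByName(hostName).isLoopbackAddress();
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("Cannot resolve " + hostName, e);
    }
  }

  /** Mirrors the quoted guard: a loopback host name is refused in distributed mode. */
  static void validate(String hostName, boolean isDistributedMode) {
    if (isDistributedMode && isLoopback(hostName)) {
      throw new IllegalStateException(
          "Drillbit is disallowed to bind to loopback address in distributed mode.");
    }
  }
}
```

Note that this check only validates the host name; as the issue says, nothing here selects the address that the network services actually bind to.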





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706813#comment-17706813
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton commented on PR #2777:
URL: https://github.com/apache/drill/pull/2777#issuecomment-1490042016

   > LGTM +1. Do we need any doc updates for this?
   
   Documented.




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706528#comment-17706528
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

rymarm commented on PR #2777:
URL: https://github.com/apache/drill/pull/2777#issuecomment-1489063820

   @jnturton Sorry for the late review. The only thing that I would add is a 
mention of `drill.exec.rpc.bind_addr` and `drill.exec.http.bind_addr` in 
[`drill-override-example.conf`](https://github.com/apache/drill/blob/master/distribution/src/main/resources/drill-override-example.conf)
 if this file is still maintained of course, in addition to our documentation.
   




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706437#comment-17706437
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton merged PR #2777:
URL: https://github.com/apache/drill/pull/2777




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706436#comment-17706436
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton commented on PR #2777:
URL: https://github.com/apache/drill/pull/2777#issuecomment-1488817097

   > LGTM +1. Do we need any doc updates for this?
   
   Yes I need to document the two new boot options, thanks.




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702884#comment-17702884
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

jnturton commented on PR #2776:
URL: https://github.com/apache/drill/pull/2776#issuecomment-1476733563

   > @jnturton  I think we should definitely run this experiment.  I'm also 
curious as to how to run this with a specific PR from Calcite.  That way I can 
contribute to the review process over there and find things that break Drill 
quickly.
   
   They won't build and publish artefacts for unmerged PRs (would be a security 
problem) so we'll have to run our own Calcite builds for these. Probably the 
best approach is for the developer to pull the PR into a local branch, build 
Calcite, install the results in their local Maven repo, then build Drill.




> Upgrade to Calcite 1.35-SNAPSHOT
> 
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
>  Issue Type: Task
>  Components: SQL Parser
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> 1. The CI tests that run automatically upon commits to Drill master will 
> exercise Drill with present day Calcite.
> 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us 
> to deal with them in order to proceed.
> 3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
> report them in order to proceed.
> 4. If Drill master becomes too unstable when it is based on Calcite snapshots 
> then this change is trivially undoable.
>  





[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702873#comment-17702873
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

cgivre commented on PR #2776:
URL: https://github.com/apache/drill/pull/2776#issuecomment-1476706574

   @jnturton  I think we should definitely run this experiment.  I'm also 
curious as to how to run this with a specific PR from Calcite.  That way I can 
contribute to the review process over there and find things that break Drill 
quickly.




> Upgrade to Calcite 1.35-SNAPSHOT
> 
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
>  Issue Type: Task
>  Components: SQL Parser
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> 1. The CI tests that run automatically upon commits to Drill master will 
> exercise Drill with present day Calcite.
> 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us 
> to deal with them in order to proceed.
> 3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
> report them in order to proceed.
> 4. If Drill master becomes too unstable when it is based on Calcite snapshots 
> then this change is trivially undoable.
>  





[jira] [Commented] (DRILL-8414) Index Paginator Not Working When Provided URL

2023-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702872#comment-17702872
 ] 

ASF GitHub Bot commented on DRILL-8414:
---

cgivre merged PR #2779:
URL: https://github.com/apache/drill/pull/2779




> Index Paginator Not Working When Provided URL
> -
>
> Key: DRILL-8414
> URL: https://issues.apache.org/jira/browse/DRILL-8414
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> The index paginator offers two options:  One where the API returns an index 
> or offset and the other is when it returns a URL.  The second was not fully 
> implemented.  This PR also adds functionality in the case where the API 
> returns a path rather than a URL.  In that case, the path will replace the 
> pre-existing path segments.
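
The URL-versus-path distinction described above can be sketched with the JDK's `java.net.URI`. This is an assumption-laden illustration, not the HTTP storage plugin's code: `NextPageUrl` and `nextPage` are hypothetical names, and the example hosts and paths are made up.

```java
import java.net.URI;

// Hypothetical illustration of resolving the "next page" value returned by an API.
final class NextPageUrl {

  /**
   * If the API returned a full URL, use it as is. If it returned only a path
   * (starting with "/"), keep the current scheme and host and replace the
   * existing path segments with the returned path.
   */
  static URI nextPage(URI currentUrl, String nextValue) {
    URI next = URI.create(nextValue);
    if (next.isAbsolute()) {
      return next;
    }
    return currentUrl.resolve(next);
  }
}
```

For example, with a current URL of `https://api.example.com/v1/items?page=1`, a returned value of `/v1/items?page=2` resolves to `https://api.example.com/v1/items?page=2`.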





[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-03-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702475#comment-17702475
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

jnturton commented on PR #2776:
URL: https://github.com/apache/drill/pull/2776#issuecomment-1475747749

   @cgivre, @vvysotskyi, @rymarm, @luocooong are you up for running this 
experiment for a while? It's trivial to revert but I personally feel there's a 
good chance that we won't want to.




> Upgrade to Calcite 1.35-SNAPSHOT
> 
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
>  Issue Type: Task
>  Components: SQL Parser
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> 1. The CI tests that run automatically upon commits to Drill master will 
> exercise Drill with present day Calcite.
> 2. Breaking changes in Calcite would (mostly) break the Drill CI and force us 
> to deal with them in order to proceed.
> 3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
> report them in order to proceed.
> 4. If Drill master becomes too unstable when it is based on Calcite snapshots 
> then this change is trivially undoable.
>  





[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions

2023-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702300#comment-17702300
 ] 

ASF GitHub Bot commented on DRILL-8413:
---

cgivre merged PR #2778:
URL: https://github.com/apache/drill/pull/2778




> Add DNS Lookup Functions
> 
>
> Key: DRILL-8413
> URL: https://issues.apache.org/jira/browse/DRILL-8413
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.22
>
>
> This PR adds additional DNS lookup functions to Drill:
>  
>  





[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions

2023-03-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702281#comment-17702281
 ] 

ASF GitHub Bot commented on DRILL-8413:
---

jnturton commented on code in PR #2778:
URL: https://github.com/apache/drill/pull/2778#discussion_r1141347355


##
contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestDNSFunctions.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import org.apache.drill.categories.SqlFunctionTest;
+import org.apache.drill.categories.UnlikelyTest;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.apache.drill.test.ClusterTest;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category({UnlikelyTest.class, SqlFunctionTest.class})
+public class TestDNSFunctions extends ClusterTest {
+
+  @BeforeClass
+  public static void setup() throws Exception {
+ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher);
+startCluster(builder);
+  }
+
+  @Test
+  public void testGetHostAddress() throws Exception {
+String query = "select get_host_address('apache.org') as hostname from (values(1))";
+testBuilder().sqlQuery(query).ordered().baselineColumns("hostname").baselineValues("151.101.2.132").go();

Review Comment:
   I guess these tests are technically nondeterministic, since the address that 
apache.org resolves to could change, but that happens so seldom that it's 
nothing to worry about.



##
contrib/udfs/README.md:
##
@@ -436,3 +436,11 @@ The functions are:
 
 [1]: https://github.com/target/huntlib
 
+
+# DNS Functions

Review Comment:
   It would be nice to mention that the JRE caches DNS records for their TTL 
which should mean that these functions can scale to big datasets if the number 
of distinct domains that need to be looked up is not big.
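
   The scaling argument can be illustrated with a small sketch. It does not show how the JRE or Drill caches DNS answers: `CachedLookup` is a hypothetical class with an explicit in-memory map and no TTL handling, and the resolver is injected so the example runs without network access. It only demonstrates that the number of real lookups tracks the number of distinct domains rather than the number of rows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration; entries never expire, unlike a real DNS cache.
final class CachedLookup {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final AtomicInteger resolverCalls = new AtomicInteger();
  private final Function<String, String> resolver;

  CachedLookup(Function<String, String> resolver) {
    this.resolver = resolver;
  }

  /** Returns the cached answer for a domain, calling the resolver only on a miss. */
  String lookup(String domain) {
    return cache.computeIfAbsent(domain, d -> {
      resolverCalls.incrementAndGet();
      return resolver.apply(d);
    });
  }

  /** Number of times the underlying resolver was actually invoked. */
  int resolverCalls() {
    return resolverCalls.get();
  }
}
```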



##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/DNSUtils.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.commons.net.whois.WhoisClient;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xbill.DNS.Lookup;
+import org.xbill.DNS.Record;
+import org.xbill.DNS.SimpleResolver;
+import org.xbill.DNS.TextParseException;
+import org.xbill.DNS.Type;
+
+import java.io.IOException;
+import java.net.SocketException;
+import java.net.UnknownHostException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Utility class which contains various methods for performing DNS resolution 
and WHOIS lookups in Drill UDFs.
+ */
+public class DNSUtils {
+
+  private static final Logger logger = LoggerFactory.getLogger(DNSUtils.class);
+  /**
+   *  A list of known DNS resolvers.
+   */
+  private static final Map KNOWN_RESOLVERS = new HashMap<>();

[jira] [Commented] (DRILL-8414) Index Paginator Not Working When Provided URL

2023-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702194#comment-17702194
 ] 

ASF GitHub Bot commented on DRILL-8414:
---

cgivre opened a new pull request, #2779:
URL: https://github.com/apache/drill/pull/2779

   # [DRILL-8414](https://issues.apache.org/jira/browse/DRILL-8414): Index 
Paginator Not Working When Provided URL
   
   ## Description
   The index paginator offers two options: one where the API returns an index 
or offset, and another where it returns a URL.  The second was not fully 
implemented.  This PR also adds functionality in the case where the API returns 
a path rather than a URL.  In that case, the path will replace the pre-existing 
path segments.
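
A minimal sketch of the URL-versus-path behaviour described above, assuming standard URI resolution is an acceptable model for it. The class and method names are hypothetical and this is not the code from the PR. `java.net.URI.resolve` returns an absolute URL unchanged and, for a value that starts with `/`, keeps the scheme and host of the current URL while replacing its path.

```java
import java.net.URI;

// Hypothetical helper, not the plugin's implementation.
final class NextPageUrlSketch {
  // currentUrl: the URL of the page just fetched.
  // returned:   what the API handed back, either a full URL or only a path.
  static String nextPageUrl(String currentUrl, String returned) {
    URI base = URI.create(currentUrl);
    // An absolute `returned` is used as-is; a path replaces the base path
    // while the scheme and host of the base are kept.
    return base.resolve(returned).toString();
  }
}
```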
   
   ## Documentation
   No user facing changes.
   
   ## Testing
   Added three additional unit tests and verified URL generation manually. 




> Index Paginator Not Working When Provided URL
> -
>
> Key: DRILL-8414
> URL: https://issues.apache.org/jira/browse/DRILL-8414
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> The index paginator offers two options: one where the API returns an index 
> or offset, and another where it returns a URL.  The second was not fully 
> implemented.  This PR also adds functionality in the case where the API 
> returns a path rather than a URL.  In that case, the path will replace the 
> pre-existing path segments.





[jira] [Commented] (DRILL-8413) Add DNS Lookup Functions

2023-03-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701782#comment-17701782
 ] 

ASF GitHub Bot commented on DRILL-8413:
---

cgivre opened a new pull request, #2778:
URL: https://github.com/apache/drill/pull/2778

   # [DRILL-8413](https://issues.apache.org/jira/browse/DRILL-8413): Add DNS 
Lookup Functions
   
   
   ## Description
   See below
   
   ## Documentation
   
   These functions enable DNS research using Drill.
   
   * `getHostName()`:  Returns the host name associated with an IP 
address.
   * `getHostAddress()`:  Returns an IP address associated with a host 
name.
   * `dnsLookup(, [])`:  Performs a DNS lookup on a given host. 
 You can optionally provide a resolver.  Possible resolver values are: 
`cloudflare`,  `cloudflare_secondary`, `google`, `google_secondary`, 
`verisign`, `verisign_secondary`, `yandex`, `yandex_secondary`.
   * `whois(, [])`:  Performs a whois lookup on the given host 
name.  You can optionally provide a resolver URL. Note that not all providers 
allow bulk automated whois lookups, so please follow the terms of service for 
your provider.
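
A rough sketch of forward and reverse lookups of the kind listed above, using only `java.net.InetAddress` from the JDK. This is an assumption about the general approach, not the UDF code from this PR. The class and method names are hypothetical, and returning `null` on failure is a choice made for this sketch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Drill.
final class HostLookupSketch {
  // Host name (or IP literal) to textual IP address; null if unresolvable.
  static String hostAddress(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return null;
    }
  }

  // IP address to host name via reverse lookup; the JDK falls back to the
  // textual address when no name is found. Null if the input is invalid.
  static String hostName(String ip) {
    try {
      return InetAddress.getByName(ip).getHostName();
    } catch (UnknownHostException e) {
      return null;
    }
  }
}
```

An IP literal passed to `hostAddress` is parsed without contacting a name service.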
   
   ## Testing
   Added unit tests.




> Add DNS Lookup Functions
> 
>
> Key: DRILL-8413
> URL: https://issues.apache.org/jira/browse/DRILL-8413
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.22
>
>
> This PR adds additional DNS lookup functions to Drill:
>  
>  





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701684#comment-17701684
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton commented on PR #2777:
URL: https://github.com/apache/drill/pull/2777#issuecomment-1473742795

   I've added two unrelated minor changes implementing safe calls to close() 
methods. Currently when these calls fail due to some earlier error they drown 
interesting messages out in unhelpful NPE noise.




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8409) Support the configuration of bind addresses for network services

2023-03-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701208#comment-17701208
 ] 

ASF GitHub Bot commented on DRILL-8409:
---

jnturton opened a new pull request, #2777:
URL: https://github.com/apache/drill/pull/2777

   # [DRILL-8409](https://issues.apache.org/jira/browse/DRILL-8409): Support 
the configuration of bind addresses for network services
   
   ## Description
   
   Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
host name will be exchanged over RPC for later look up by a remote client or 
Drillbit. This host name is used to check whether Drill is being asked to bind 
to the loopback address in distributed mode
   ```
   if (isDistributedMode && 
InetAddress.getByName(hostName).isLoopbackAddress()) {
 throw new DrillbitStartupException("Drillbit is disallowed to bind to 
loopback address in distributed mode.");
   }
   ```
   but is never used to set the bind address for the Drillbit's RPC and 
web ports! This PR adds new boot options
   ```
   drill.exec.rpc.bind_addr
   drill.exec.http.bind_addr
   ```
   and uses them to set the bind addresses used for RPC services and the HTTP 
service respectively.
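
   A minimal sketch of the two ideas above, assuming an unset bind address should fall back to the wildcard address `0.0.0.0` (the previous effective default noted under Testing). The class and method names are hypothetical and unchecked exceptions stand in for `DrillbitStartupException`. This is not the code from the PR.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Drill.
final class BindAddressSketch {
  // Mirrors the startup check quoted above: refuse a loopback host name
  // when running in distributed mode.
  static void checkNotLoopback(boolean isDistributedMode, String hostName) {
    try {
      if (isDistributedMode && InetAddress.getByName(hostName).isLoopbackAddress()) {
        throw new IllegalStateException(
            "Drillbit is disallowed to bind to loopback address in distributed mode.");
      }
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("Cannot resolve host name " + hostName, e);
    }
  }

  // Resolves a configured bind address (for example the value of
  // drill.exec.rpc.bind_addr), using the wildcard address when none is set.
  static InetAddress effectiveBindAddress(String configured) {
    String addr = (configured == null || configured.isEmpty()) ? "0.0.0.0" : configured;
    try {
      return InetAddress.getByName(addr);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("Cannot resolve bind address " + addr, e);
    }
  }
}
```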
   
   ## Documentation
   Document all three of DRILL_HOST_NAME and the two new bind address options.
   
   ## Testing
   Provide no bind addresses and confirm that the effective previous default 
(0.0.0.0) is applied.
   Manually set bind addresses and test that Drill is not accessible on other 
local addresses.
   




> Support the configuration of bind addresses for network services
> 
>
> Key: DRILL-8409
> URL: https://issues.apache.org/jira/browse/DRILL-8409
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Drill provides the DRILL_HOST_NAME env var which determines what Drillbit 
> host name will be exchanged over RPC for later look up by a remote client or 
> Drillbit. This host name is used to check whether Drill is being asked to 
> bind to the loopback address in distributed mode
> {code:java}
>     if (isDistributedMode && 
> InetAddress.getByName(hostName).isLoopbackAddress()) {
>       throw new DrillbitStartupException("Drillbit is disallowed to bind to 
> loopback address in distributed mode.");
>     }{code}
> but is not subsequently used to set the bind address used for the Drillbit's RPC 
> and web ports! This issue proposes that the Drillbit network services bind 
> address is determined by DRILL_HOST_NAME.





[jira] [Commented] (DRILL-8412) Upgrade to Calcite 1.35-SNAPSHOT

2023-03-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701147#comment-17701147
 ] 

ASF GitHub Bot commented on DRILL-8412:
---

jnturton opened a new pull request, #2776:
URL: https://github.com/apache/drill/pull/2776

   # [DRILL-8412](https://issues.apache.org/jira/browse/DRILL-8412): Upgrade to 
Calcite 1.35-SNAPSHOT
   
   ## Description
   
   If we're willing to try basing Drill master on snapshot builds of the 
upcoming version of Calcite for a while then here's a PR to do that. Please see 
the related discussion in Drill and Calcite mailing lists this week for more 
information.
   
   Important notes.
   
   1. The CI tests that run automatically upon commits to Drill master will 
exercise Drill with present day Calcite.
   2. Breaking changes in Calcite would (mostly) break the Drill CI and force 
us to deal with them in order to proceed.
   3. Regressions in Calcite would (mostly) break the Drill CI and force us to 
report them in order to proceed.
   4. If Drill master becomes too unstable when it is based on Calcite 
snapshots then this PR is trivially undoable.
   
   ## Documentation
   N/A
   
   ## Testing
   Existing unit test suite.
   




> Upgrade to Calcite 1.35-SNAPSHOT
> 
>
> Key: DRILL-8412
> URL: https://issues.apache.org/jira/browse/DRILL-8412
> Project: Apache Drill
>  Issue Type: Task
>  Components: SQL Parser
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
>
> This issue proposes that we try basing Drill master on snapshot builds of the 
> upcoming version of Calcite so that the CI tests that run automatically upon 
> commits to master will exercise Drill with present day Calcite.
> Breaking changes in Calcite would (mostly) break the Drill CI and force us to 
> deal with them in order to proceed.
> Regressions in Calcite would (mostly) break the Drill CI and force us to report 
> them in order to proceed.





[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34

2023-03-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701139#comment-17701139
 ] 

ASF GitHub Bot commented on DRILL-8410:
---

jnturton merged PR #2775:
URL: https://github.com/apache/drill/pull/2775




> Upgrade to Calcite 1.34
> ---
>
> Key: DRILL-8410
> URL: https://issues.apache.org/jira/browse/DRILL-8410
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: library
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Calcite 1.34 includes
>  # a fix for the currently broken date_trunc function in Drill
>  # support for a new QUALIFY clause in window functions
>  # incompatible core parser grammar changes that break date_diff in Drill.
> Because of (3), Drill needs to make temporary use of a modified Parser.jj 
> until Calcite backs out the mentioned parser changes. See the linked Calcite 
> issues for more details.





[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34

2023-03-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701101#comment-17701101
 ] 

ASF GitHub Bot commented on DRILL-8410:
---

jnturton commented on PR #2775:
URL: https://github.com/apache/drill/pull/2775#issuecomment-1471784565

   It would probably be possible to create a patch-based version of this PR that 
makes use of 
[maven-patch-plugin](https://maven.apache.org/plugins/maven-patch-plugin/) and 
has a much lower line count. On the other hand it's not inconceivable that we 
decide to maintain our Parser.jj.




> Upgrade to Calcite 1.34
> ---
>
> Key: DRILL-8410
> URL: https://issues.apache.org/jira/browse/DRILL-8410
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: library
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Calcite 1.34 includes
>  # a fix for the currently broken date_trunc function in Drill
>  # support for a new QUALIFY clause in window functions
>  # incompatible core parser grammar changes that break date_diff in Drill.
> Because of (3), Drill needs to make temporary use of a modified Parser.jj 
> until Calcite backs out the mentioned parser changes. See the linked Calcite 
> issues for more details.





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700914#comment-17700914
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

LYCJeff commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1471106650

   > @LYCJeff Thanks for making these changes. I have a few questions:
   > 
   > 1. Are you certain that these filters are in fact being pushed down as 
intended?
   > 2. I'm really concerned about what would happen if a user aliased a data 
source as `header` or `tail`.
   > 
   > IE:
   > 
   > ```sql
   > SELECT ... 
   > FROM api.foo 
   > INNER JOIN dfs.`tail.csv` AS tail
   > ON tail.id = foo.id
   > WHERE tail.name = 'something'
   > ```
   > 
   > Do we know how this would be interpreted?
   
   Well, we actually need to recognize `header.xxx` as a whole parameter name, 
so we need to use backticks. Only then can it be pushed down normally, so these 
prefixes are not confused with data source aliases.
   
   If the `name` in your example above is an argument to the `foo` api, it 
should be written as follows.
   
   ```sql
   SELECT ...
   FROM api.foo
   INNER JOIN dfs.`tail.csv` AS tail
   ON tail.id=foo.id
   WHERE `tail.name` = 'something'
   ```
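
   A minimal sketch of how backtick-quoted column names such as `header.header1` or `body.lat` could be split into a location prefix and a parameter name. The class and method names are hypothetical and this is an illustration only, not the PR's implementation. An unquoted `tail.name` never reaches this step as a single column name because the planner reads it as table alias plus column.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the HTTP storage plugin.
final class ParamLocationSketch {
  // Input:  filter column name -> literal value, e.g. "header.header1" -> "value1".
  // Output: location prefix -> (parameter name -> value).
  static Map<String, Map<String, String>> route(Map<String, String> filters) {
    Map<String, Map<String, String>> byLocation = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : filters.entrySet()) {
      String column = e.getKey();
      int dot = column.indexOf('.');
      if (dot <= 0) {
        continue; // no prefix: an ordinary column, left to normal filtering
      }
      String location = column.substring(0, dot);
      String name = column.substring(dot + 1);
      byLocation.computeIfAbsent(location, k -> new LinkedHashMap<>())
          .put(name, e.getValue());
    }
    return byLocation;
  }
}
```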




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass it in through filter 
> statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700907#comment-17700907
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

cgivre commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1471002379

   @LYCJeff Thanks for making these changes.  I have a few questions:
   1.  Are you certain that these filters are in fact being pushed down as 
intended? 
   2.  I'm really concerned about what would happen if a user aliased a data 
source as `header` or `tail`.   
   
   IE:
   
   ```sql
   SELECT ... 
   FROM api.foo 
   INNER JOIN dfs.`tail.csv` AS tail
   ON tail.id = foo.id
   WHERE tail.name = 'something'
   ```
   Do we know how this would be interpreted?  




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass it in through filter 
> statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}





[jira] [Commented] (DRILL-8411) GoogleSheets Reader Will Not Read More than 1K Rows

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700797#comment-17700797
 ] 

ASF GitHub Bot commented on DRILL-8411:
---

cgivre merged PR #2774:
URL: https://github.com/apache/drill/pull/2774




> GoogleSheets Reader Will Not Read More than 1K Rows
> ---
>
> Key: DRILL-8411
> URL: https://issues.apache.org/jira/browse/DRILL-8411
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - GoogleSheets
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of 
> 1000 rows and stops.   This PR fixes that.  
> It also fixes a minor but annoying issue whereby the GoogleSheets reader 
> determines a column is a date/time, but is then unable to parse it because it 
> is in a non-standard format.





[jira] [Commented] (DRILL-8410) Upgrade to Calcite 1.34

2023-03-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700673#comment-17700673
 ] 

ASF GitHub Bot commented on DRILL-8410:
---

jnturton opened a new pull request, #2775:
URL: https://github.com/apache/drill/pull/2775

   # [DRILL-8410](https://issues.apache.org/jira/browse/DRILL-8410): Upgrade to 
Calcite 1.34
   
   ## Description
   
   Calcite 1.34 includes
   
   1. a fix for the currently broken date_trunc function in Drill
   2. support for a new QUALIFY clause in window functions
   3. incompatible core parser grammar changes that break date_diff in Drill.
   
   Because of (3), Drill needs to make temporary use of a modified Parser.jj 
until Calcite backs out the mentioned parser changes. See the linked Calcite 
issues for more details.
   
   Normally it would be undesirable to backport the new QUALIFY clause but, 
short of setting up cherry picking from Calcite, getting the fix for the 
regression in DATE_TRUNC forces the addition of support for QUALIFY. Calcite 
does not do separate bugfix releases.
   
   ## Documentation
   Document the new QUALIFY clause. 
   
   ## Testing
   - Existing unit tests of DATE_TRUNC.
   - Existing unit tests of DATE_DIFF.
   - New unit test of QUALIFY.




> Upgrade to Calcite 1.34
> ---
>
> Key: DRILL-8410
> URL: https://issues.apache.org/jira/browse/DRILL-8410
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: library
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.1
>
>
> Calcite 1.34 includes
>  # a fix for the currently broken date_trunc function in Drill
>  # support for a new QUALIFY clause in window functions
>  # incompatible core parser grammar changes that break date_diff in Drill.
> Because of (3), Drill needs to make temporary use of a modified Parser.jj 
> until Calcite backs out the mentioned parser changes. See the linked Calcite 
> issues for more details.





[jira] [Commented] (DRILL-8411) GoogleSheets Reader Will Not Read More than 1K Rows

2023-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700476#comment-17700476
 ] 

ASF GitHub Bot commented on DRILL-8411:
---

cgivre opened a new pull request, #2774:
URL: https://github.com/apache/drill/pull/2774

   # [DRILL-8411](https://issues.apache.org/jira/browse/DRILL-8411): 
GoogleSheets Reader Will Not Read More than 1K Rows
   
   ## Description
   The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of 
1000 rows and stops.   This PR fixes that.  
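
   A minimal sketch of reading past a fixed batch limit by requesting successive ranges until a short batch comes back. The `fetch` function is a hypothetical stand-in for whatever SDK call returns up to 1000 rows. This illustrates the general technique and is not the code from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper, not part of the GoogleSheets plugin.
final class BatchedReadSketch {
  static final int BATCH_SIZE = 1000;

  // fetch.apply(startRow, maxRows) returns at most maxRows rows starting at
  // startRow, and fewer (possibly none) once the data is exhausted.
  static <T> List<T> readAll(BiFunction<Integer, Integer, List<T>> fetch) {
    List<T> rows = new ArrayList<>();
    int start = 0;
    while (true) {
      List<T> batch = fetch.apply(start, BATCH_SIZE);
      rows.addAll(batch);
      if (batch.size() < BATCH_SIZE) {
        return rows; // a short batch means there is nothing left to read
      }
      start += batch.size();
    }
  }
}
```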
   
   It also fixes a minor but annoying issue whereby the GoogleSheets reader 
determines a column is a date/time, but is then unable to parse it because it 
is in a non-standard format.
   
   ## Documentation
   N/A
   
   ## Testing
   Ran existing unit tests and tested manually.




> GoogleSheets Reader Will Not Read More than 1K Rows
> ---
>
> Key: DRILL-8411
> URL: https://issues.apache.org/jira/browse/DRILL-8411
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - GoogleSheets
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> The GoogleSheets reader hits the batch limit from the GoogleSheets SDK of 
> 1000 rows and stops.   This PR fixes that.  
> It also fixes a minor but annoying issue whereby the GoogleSheets reader 
> determines a column is a date/time, but is then unable to parse it because it 
> is in a non-standard format.





[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join

2023-03-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699293#comment-17699293
 ] 

ASF GitHub Bot commented on DRILL-8408:
---

cgivre merged PR #2772:
URL: https://github.com/apache/drill/pull/2772




> Allow Implicit Casts on Join
> 
>
> Key: DRILL-8408
> URL: https://issues.apache.org/jira/browse/DRILL-8408
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> Currently, Drill does not allow implicit casts on joins.  With DRILL-8136, 
> this has been significantly improved, and it might make sense to do so. 





[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join

2023-03-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699116#comment-17699116
 ] 

ASF GitHub Bot commented on DRILL-8408:
---

cgivre commented on PR #2772:
URL: https://github.com/apache/drill/pull/2772#issuecomment-1464476318

   > I agree that it would be nice to be able to switch this on or off using a 
session option. And I wonder if we should begin with that option defaulted to 
false so that we can
   > 
   > 1. include this in 1.21.x and
   > 2. collect some experience from opt-ins (like ourselves) about whether 
such joins turn out to be badly behaved, before exposing out-of-the-box users 
to it.
   
   I added a new exec option defaulted to `false`.  




> Allow Implicit Casts on Join
> 
>
> Key: DRILL-8408
> URL: https://issues.apache.org/jira/browse/DRILL-8408
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> Currently, Drill does not allow implicit casts on joins.  With DRILL-8136, 
> this has been significantly improved, and it might make sense to do so. 





[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join

2023-03-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699024#comment-17699024
 ] 

ASF GitHub Bot commented on DRILL-8408:
---

jnturton commented on PR #2772:
URL: https://github.com/apache/drill/pull/2772#issuecomment-1464068067

   I agree that it would be nice to be able to switch this on or off using a 
session option. And I wonder if we should begin with that option defaulted to 
false so that we can 
   
   1. include this in 1.21.x and 
   2. collect some experience from opt-ins (like ourselves) about whether such 
joins turn out to be badly behaved, before exposing out-of-the-box users to it.




> Allow Implicit Casts on Join
> 
>
> Key: DRILL-8408
> URL: https://issues.apache.org/jira/browse/DRILL-8408
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> Currently, Drill does not allow implicit casts on joins.  With DRILL-8136, 
> this has been significantly improved, and it might make sense to do so. 





[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join

2023-03-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698368#comment-17698368
 ] 

ASF GitHub Bot commented on DRILL-8408:
---

cgivre commented on PR #2772:
URL: https://github.com/apache/drill/pull/2772#issuecomment-1462040949

   @jnturton @vvysotskyi I don't know if these checks were there for a reason 
or not, but with the improved implicit casting from DRILL-8136, this PR seems 
to work.   
   
   If there's a performance reason we shouldn't do this, I was thinking that we 
could add an exec option to enable/disable this functionality.




> Allow Implicit Casts on Join
> 
>
> Key: DRILL-8408
> URL: https://issues.apache.org/jira/browse/DRILL-8408
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> Currently, Drill does not allow implicit casts on joins.  With DRILL-8136, 
> this has been significantly improved, and it might make sense to do so. 





[jira] [Commented] (DRILL-8408) Allow Implicit Casts on Join

2023-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17698170#comment-17698170
 ] 

ASF GitHub Bot commented on DRILL-8408:
---

cgivre opened a new pull request, #2772:
URL: https://github.com/apache/drill/pull/2772

   # [DRILL-8408](https://issues.apache.org/jira/browse/DRILL-8408): Allow 
Implicit Casts on Join
   
   ## Description
   With the revision of Drill's implicit casting rules as a part of DRILL-8136, 
Drill now supports much improved implicit casting logic.  However, that does 
not carry over to joins.  This PR allows the implicit casting to carry through 
to joins as well. 
   
   ## Documentation
   N/A
   
   ## Testing
   Ran existing unit tests




> Allow Implicit Casts on Join
> 
>
> Key: DRILL-8408
> URL: https://issues.apache.org/jira/browse/DRILL-8408
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> Currently, Drill does not allow implicit casts on joins.  With DRILL-8136, 
> this has been significantly improved, and it might make sense to do so. 





[jira] [Commented] (DRILL-8407) Add Support for SFTP File Systems

2023-03-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696939#comment-17696939
 ] 

ASF GitHub Bot commented on DRILL-8407:
---

cgivre merged PR #2770:
URL: https://github.com/apache/drill/pull/2770




> Add Support for SFTP File Systems
> -
>
> Key: DRILL-8407
> URL: https://issues.apache.org/jira/browse/DRILL-8407
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - File
>Affects Versions: 1.20.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: Future
>
>
> Add support for SFTP File Systems. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8407) Add Support for SFTP File Systems

2023-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696593#comment-17696593
 ] 

ASF GitHub Bot commented on DRILL-8407:
---

cgivre opened a new pull request, #2770:
URL: https://github.com/apache/drill/pull/2770

   # [DRILL-8407](https://issues.apache.org/jira/browse/DRILL-): Add 
Support for SFTP File Systems
   
   ## Description
   This PR enables Drill to query files stored in SFTP file systems.
   
   ## Documentation
   An SFTP file system behaves exactly as any other file system. 
   
   ## Configuration
   To query data from an SFTP file system, follow the instructions for any 
other file system.  For the URL, provide the host as shown below:
   
   ```json
   {
 "type": "file",
 "connection": "sftp://",
 "workspaces": {
   "test": {
 "location": "",
 "writable": true,
 "defaultInputFormat": null,
 "allowAccessOutsideWorkspace": false
   },
 ...
   ```
   ### Authentication
   
   The SFTP plugin requires a username and password to authenticate.  The best 
way to do this is to provide the information via a `credentialsProvider` as 
shown below.   SFTP file systems can be used with `USER_TRANSLATION` enabled, 
but not `USER_IMPERSONATION`.  
   
   ```json
"credentialsProvider": {
   "credentialsProviderType": "PlainCredentialsProvider",
   "credentials": {
 "username": "",
 "password": ""
   },
   "userCredentials": {}
 },
   ```
   
   If you need to pass additional configuration variables to the SFTP server, 
you can do so in the `config` parameter in the file system.  You will need to 
prefix any parameters with `fs.sftp`.  
   
   
   ## Testing
   Manually Tested




> Add Support for SFTP File Systems
> -
>
> Key: DRILL-8407
> URL: https://issues.apache.org/jira/browse/DRILL-8407
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - File
>Affects Versions: 1.20.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: Future
>
>
> Add support for SFTP File Systems. 





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-03-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695578#comment-17695578
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

LYCJeff commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1451555417

   > Two ideas
   > 
   > 1. Since we won't backport this PR and it will only go out in the next 
major release, some breakage inside a plugin is probably something that can be 
swallowed.
   > 2. If it is still desired to preserve the ability to use the existing 
syntax in Drill 1.22 and beyond then a storage config option like 
`"useLegacyRequestParmSyntax": true` could be added for users who want it.
   
   @jnturton @cgivre That's a good idea: it avoids mixing the old and new 
syntax, although existing users would need to make a small addition to their 
configuration. If that is acceptable to you, I will take some time to add a 
configuration item in the near future.




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. a digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass them in through a 
> filter statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-03-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695531#comment-17695531
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

jnturton commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1451459573

   Two ideas
   
   1. Since we won't backport this PR and it will only go out in the next major 
release, some breakage inside a plugin is probably something that can be 
swallowed.
   2. If it is still desired to preserve the ability to use the existing syntax 
in Drill 1.22 and beyond then a storage config option like 
`"useLegacyRequestParmSyntax": true` could be added for users who want it.




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. a digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass them in through a 
> filter statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695365#comment-17695365
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

cgivre merged PR #2767:
URL: https://github.com/apache/drill/pull/2767




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695303#comment-17695303
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

cgivre commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1450778034

   @pjfanning I'll keep an eye on it, but it looks good.  I'll restart if it 
times out. 




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695302#comment-17695302
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

pjfanning commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1450776742

   @cgivre one of the CI subtasks is taking a bit longer to complete but it 
looks like using the new liquibase jar has fixed this general issue




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695235#comment-17695235
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

pjfanning commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-145069

   https://github.com/liquibase/liquibase/issues/3617#issuecomment-1450560162




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694879#comment-17694879
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

LYCJeff commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1449446242

   > @LYCJeff I really like the functionality here, but I am concerned that 
this is a breaking change and will affect existing Drill users. Also, it adds 
effectively new syntax to the SQL queries.
   
   @cgivre At this point, I can make unprefixed parameters go where they went 
before by default. This minimizes the impact on existing users, except in 
cases like the following: if an argument that the user passed into the request 
body was called `header.xxx`, it now needs to be rewritten as 
`body.header.xxx`, otherwise it will be passed into the request header.
   
   In addition, a problem that had been fixed would reappear: an argument that 
is passed to the URL path would also be appended to the end of the URL, whereas 
my change clearly distinguishes the two.
   
   Let me know if you think this is more friendly to existing users, and I'll 
move in this direction.




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. a digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass them in through a 
> filter statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}
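
The prefix scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from PR #2747: the class `RequestParamRouterSketch` and its `route` method are hypothetical names, and sending unprefixed parameters to the request body is an assumption made here to mirror the "default location" idea from the discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Drill's HTTP plugin. */
final class RequestParamRouterSketch {
  static final String HEADER_PREFIX = "header.";
  static final String BODY_PREFIX = "body.";

  final Map<String, Object> headers = new LinkedHashMap<>();
  final Map<String, Object> body = new LinkedHashMap<>();

  /** Splits WHERE-clause filters into request headers and body fields by name prefix. */
  static RequestParamRouterSketch route(Map<String, Object> filters) {
    RequestParamRouterSketch routed = new RequestParamRouterSketch();
    for (Map.Entry<String, Object> filter : filters.entrySet()) {
      String name = filter.getKey();
      if (name.startsWith(HEADER_PREFIX)) {
        routed.headers.put(name.substring(HEADER_PREFIX.length()), filter.getValue());
      } else if (name.startsWith(BODY_PREFIX)) {
        // Only the first prefix is stripped, so `body.header.xxx` becomes the body field `header.xxx`.
        routed.body.put(name.substring(BODY_PREFIX.length()), filter.getValue());
      } else {
        // Assumption: unprefixed names fall back to the body, standing in for the configured default.
        routed.body.put(name, filter.getValue());
      }
    }
    return routed;
  }

  public static void main(String[] args) {
    Map<String, Object> filters = new LinkedHashMap<>();
    filters.put("body.lat", 36.7201600);
    filters.put("body.date", "2019-10-02");
    filters.put("header.header1", "value1");
    RequestParamRouterSketch routed = route(filters);
    System.out.println("body=" + routed.body + " headers=" + routed.headers);
  }
}
```

Stripping only the leading prefix is what makes the `body.header.xxx` escape hatch mentioned in the comment work: a body field whose real name starts with `header.` can still be addressed unambiguously.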





[jira] [Commented] (DRILL-8406) Enable implicit casting of VARCHAR and BIT args in aggregate functions

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694707#comment-17694707
 ] 

ASF GitHub Bot commented on DRILL-8406:
---

cgivre merged PR #2768:
URL: https://github.com/apache/drill/pull/2768




> Enable implicit casting of VARCHAR and BIT args in aggregate functions
> --
>
> Key: DRILL-8406
> URL: https://issues.apache.org/jira/browse/DRILL-8406
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.21.1
>
>
> Default function implementations that throw unsupported operation 
> exceptions in the class AggregateErrorFunctions prevent the implicit casting 
> of VARCHAR and BIT arguments to neighbouring types. E.g. 
> {code:java}
> apache drill> select sum('1');
> Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate 
> functions supported for VarChar type{code}
> This issue proposes to remove AggregateErrorFunctions so that implicit 
> casting works, the example above changing as follows.
> {code:java}
> apache drill> select sum('1');
> EXPR$0  1
> 1 row selected (2.346 seconds)
> {code}
>  





[jira] [Commented] (DRILL-8393) Allow parameters to be passed to headers through SQL in WHERE clause

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694615#comment-17694615
 ] 

ASF GitHub Bot commented on DRILL-8393:
---

cgivre commented on PR #2747:
URL: https://github.com/apache/drill/pull/2747#issuecomment-1448388913

   @LYCJeff 
   I really like the functionality here, but I am concerned that this is a 
breaking change and will affect existing Drill users.  Also, it adds 
effectively new syntax to the SQL queries. 
   
   




> Allow parameters to be passed to headers through SQL in WHERE clause
> 
>
> Key: DRILL-8393
> URL: https://issues.apache.org/jira/browse/DRILL-8393
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - HTTP
>Affects Versions: 1.20.0
>Reporter: Yuchen Liang
>Priority: Major
>
> Some APIs require parameters (e.g. a digital signature) in the headers to be 
> generated at access time. So I'm wondering if we can pass them in through a 
> filter statement.
> Perhaps we could design it like the params field in connections parameter. 
> For example:
>  
> Config:
> { "url": "https://api.sunrise-sunset.org/json", "requireTail": false, 
> "params": ["body.lat", "body.lng", "body.date", "header.header1"], 
> "parameterLocation": "json_body" }
>  
> SQL Query:
> SELECT * FROM api.sunrise
> WHERE `body.lat` = 36.7201600
> AND `body.lng` = -4.4203400
> AND `body.date` = '2019-10-02'
> AND `header.header1` = 'value1';
>  
> Post body:
> { "lat": 36.7201600, "lng": -4.4203400, "date": "2019-10-02"}
>  
> Headers:
> { "header1": "value1", ……}





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694613#comment-17694613
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

cgivre commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1448370911

   @pjfanning I'm going to convert this to draft status until we can update 
liquibase.




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694549#comment-17694549
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

pjfanning commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1448205614

   Need to wait for liquibase to upgrade their lib. I don't know if snakeyaml 
is used elsewhere in Drill. If it is, it may be possible to upgrade snakeyaml 
in some places and keep the old version where liquibase is used.




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8405) Upgrade to snakeyaml 2.0 due to CVE

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694531#comment-17694531
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

cgivre commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1448156621

   @pjfanning Is there any workaround for this?




> Upgrade to snakeyaml 2.0 due to CVE
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8406) Enable implicit casting of VARCHAR and BIT args in aggregate functions

2023-02-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694439#comment-17694439
 ] 

ASF GitHub Bot commented on DRILL-8406:
---

jnturton opened a new pull request, #2768:
URL: https://github.com/apache/drill/pull/2768

   # [DRILL-8406](https://issues.apache.org/jira/browse/DRILL-8406): Enable 
implicit casting of VARCHAR and BIT args in aggregate functions
   
   ## Description
   
   Default function implementations that throw unsupported operation 
exceptions in the class AggregateErrorFunctions prevent the implicit casting of 
VARCHAR and BIT arguments to neighbouring types. E.g.
   ```
   apache drill> select sum('1');
   Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate 
functions supported for VarChar type
   ```
   This PR removes AggregateErrorFunctions so that implicit casting works, the 
example above changing as follows.
   ```
   apache drill> select sum('1');
   EXPR$0  1
   1 row selected (2.346 seconds)
   ```
   
   ## Documentation
   N/A
   
   ## Testing
   New unit test.
   




> Enable implicit casting of VARCHAR and BIT args in aggregate functions
> --
>
> Key: DRILL-8406
> URL: https://issues.apache.org/jira/browse/DRILL-8406
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.21.1
>
>
> Default function implementations that throw unsupported operation 
> exceptions in the class AggregateErrorFunctions prevent the implicit casting 
> of VARCHAR and BIT arguments to neighbouring types. E.g. 
> {code:java}
> apache drill> select sum('1');
> Error: UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate 
> functions supported for VarChar type{code}
> This issue proposes to remove AggregateErrorFunctions so that implicit 
> casting works, the example above changing as follows.
> {code:java}
> apache drill> select sum('1');
> EXPR$0  1
> 1 row selected (2.346 seconds)
> {code}
>  





[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693724#comment-17693724
 ] 

ASF GitHub Bot commented on DRILL-8158:
---

cgivre merged PR #2766:
URL: https://github.com/apache/drill/pull/2766




> Remove non-reproducible build outputs
> -
>
> Key: DRILL-8158
> URL: https://issues.apache.org/jira/browse/DRILL-8158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Herve Boutemy
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.2
>
>
> For context see [1] and [2]. The git-commit-id plugin includes information 
> like build host, email and time which is not compatible with a reproducible 
> build. Drill's built in sys.version table will return the build email and 
> time if they are present in the build's git.properties file so these columns 
> must be deprecated. Other useful Git-related information is retained.
> In accompanying commits, some Kerberos unit test fixes are applied, and the 
> tests reenabled, and some updates to Release.md are included.
> [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html]
> [2] 
> [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root]





[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693700#comment-17693700
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

pjfanning commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1445455789

   It looks like Liquibase uses a snakeyaml 1.0 API call that is not supported 
in snakeyaml 2.0.
   
   ```
   2023-02-26T15:12:21.4680779Z Caused by: java.lang.NoSuchMethodError: org.yaml.snakeyaml.constructor.SafeConstructor: method <init>()V not found
   2023-02-26T15:12:21.4681347Z at liquibase.parser.core.yaml.YamlChangeLogParser.parse(YamlChangeLogParser.java:23)
   2023-02-26T15:12:21.4681830Z at liquibase.Liquibase.getDatabaseChangeLog(Liquibase.java:369)
   ```




> upgrade to snakeyaml 2.0 due to cve
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693675#comment-17693675
 ] 

ASF GitHub Bot commented on DRILL-8158:
---

hboutemy commented on PR #2766:
URL: https://github.com/apache/drill/pull/2766#issuecomment-1445406594

   I'd love for that to be feasible, but I don't think CI is able to check 
reproducibility.
   
   Another aspect is that we currently have no regressions, just fixes that 
are done step by step: once we have fixed one issue that creates a lot of 
noise, the next release reveals issues that are less noisy and so were not 
very visible before.
   
   IMHO, we just need to accept that for such a big project, getting a fully 
reproducible build requires multiple iterations: that's not unexpected.
   I'm confident that once this PR is merged, the remaining issues will impact 
much less content.




> Remove non-reproducible build outputs
> -
>
> Key: DRILL-8158
> URL: https://issues.apache.org/jira/browse/DRILL-8158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Herve Boutemy
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.2
>
>
> For context see [1] and [2]. The git-commit-id plugin includes information 
> like build host, email and time which is not compatible with a reproducible 
> build. Drill's built in sys.version table will return the build email and 
> time if they are present in the build's git.properties file so these columns 
> must be deprecated. Other useful Git-related information is retained.
> In accompanying commits, some Kerberos unit test fixes are applied, and the 
> tests reenabled, and some updates to Release.md are included.
> [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html]
> [2] 
> [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root]





[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693667#comment-17693667
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

cgivre commented on PR #2767:
URL: https://github.com/apache/drill/pull/2767#issuecomment-1445393983

   Ugh.. it looks like the new library broke something.   Disregard approval. 
:-(




> upgrade to snakeyaml 2.0 due to cve
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693650#comment-17693650
 ] 

ASF GitHub Bot commented on DRILL-8158:
---

cgivre commented on PR #2766:
URL: https://github.com/apache/drill/pull/2766#issuecomment-1445371514

   @hboutemy Should we add this as a CI check?




> Remove non-reproducible build outputs
> -
>
> Key: DRILL-8158
> URL: https://issues.apache.org/jira/browse/DRILL-8158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Herve Boutemy
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.2
>
>
> For context see [1] and [2]. The git-commit-id plugin includes information 
> like build host, email and time which is not compatible with a reproducible 
> build. Drill's built in sys.version table will return the build email and 
> time if they are present in the build's git.properties file so these columns 
> must be deprecated. Other useful Git-related information is retained.
> In accompanying commits, some Kerberos unit test fixes are applied, and the 
> tests reenabled, and some updates to Release.md are included.
> [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html]
> [2] 
> [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root]





[jira] [Commented] (DRILL-8405) upgrade to snakeyaml 2.0 due to cve

2023-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693647#comment-17693647
 ] 

ASF GitHub Bot commented on DRILL-8405:
---

pjfanning opened a new pull request, #2767:
URL: https://github.com/apache/drill/pull/2767

   ## Description
   
   upgrade to snakeyaml 2.0 due to CVE
   
   ## Testing
   CI build




> upgrade to snakeyaml 2.0 due to cve
> ---
>
> Key: DRILL-8405
> URL: https://issues.apache.org/jira/browse/DRILL-8405
> Project: Apache Drill
>  Issue Type: Task
>Reporter: PJ Fanning
>Priority: Major
>
> https://bitbucket.org/snakeyaml/snakeyaml/issues/561/cve-2022-1471-vulnerability-in





[jira] [Commented] (DRILL-8158) Remove non-reproducible build outputs

2023-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693064#comment-17693064
 ] 

ASF GitHub Bot commented on DRILL-8158:
---

hboutemy opened a new pull request, #2766:
URL: https://github.com/apache/drill/pull/2766

   See #2590 for the initial improvements.
   A check of release 1.21.0 shows that there are still a few issues: 
https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/apache/drill/README.md




> Remove non-reproducible build outputs
> -
>
> Key: DRILL-8158
> URL: https://issues.apache.org/jira/browse/DRILL-8158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Herve Boutemy
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.2
>
>
> For context see [1] and [2]. The git-commit-id plugin includes information 
> like build host, email and time which is not compatible with a reproducible 
> build. Drill's built in sys.version table will return the build email and 
> time if they are present in the build's git.properties file so these columns 
> must be deprecated. Other useful Git-related information is retained.
> In accompanying commits, some Kerberos unit test fixes are applied, and the 
> tests reenabled, and some updates to Release.md are included.
> [1] [https://maven.apache.org/guides/mini/guide-reproducible-builds.html]
> [2] 
> [https://github.com/jvm-repo-rebuild/reproducible-central#org.apache.drill:drill-root]





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692904#comment-17692904
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre merged PR #2762:
URL: https://github.com/apache/drill/pull/2762




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 
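
As a rough illustration of the two behaviours described above, here is a plain `java.util.regex` sketch. It is not the Drill UDF code: the class and method names are hypothetical, the argument order (input, pattern, group index) is an assumption, and returning `null` for a missing group is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not the Drill implementation. */
final class RegexpExtractSketch {

  /** Returns the text of every capturing group in the first match, or an empty list if nothing matches. */
  static List<String> extractGroups(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (matcher.find()) {
      // Group 0 is the whole match, so capturing groups start at index 1.
      for (int i = 1; i <= matcher.groupCount(); i++) {
        groups.add(matcher.group(i));
      }
    }
    return groups;
  }

  /** Returns the text of one capturing group in the first match, or null (an assumption) if unavailable. */
  static String extractGroup(String input, String regex, int index) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    if (matcher.find() && index >= 0 && index <= matcher.groupCount()) {
      return matcher.group(index);
    }
    return null;
  }

  public static void main(String[] args) {
    String regex = "(\\d{4})-(\\d{2})-(\\d{2})";
    System.out.println(extractGroups("2023-02-23", regex));
    System.out.println(extractGroup("2023-02-23", regex, 2));
  }
}
```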





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692805#comment-17692805
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

vvysotskyi commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1116028724


##
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java:
##
@@ -293,6 +293,109 @@ public void eval() {
     }
   }
 
+  /*
+   * This function returns the capturing groups from a regex.
+   */
+  @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE,
+      outputWidthCalculatorType = OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT)
+  public static class RegexpExtract implements DrillSimpleFunc {
+
+    @Param VarCharHolder input;
+    @Param(constant=true) VarCharHolder pattern;
+    @Inject
+    DrillBuf buffer;
+    @Workspace
+    java.util.regex.Matcher matcher;
+    @Workspace
+    org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+    @Output
+    ComplexWriter out;
+
+    @Override
+    public void setup() {
+      matcher = java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start, pattern.end, pattern.buffer)).matcher("");
+      charSequenceWrapper = new org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
+      matcher.reset(charSequenceWrapper);
+    }
+
+    @Override
+    public void eval() {
+      charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
+
+      // Reusing same charSequenceWrapper, no need to pass it in.
+      matcher.reset();
+      boolean result = matcher.find();
+
+      // Start the list here.  If there are no matches, we return an empty list.
+      org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter = out.rootAsList();
+      listWriter.startList();
+
+      if (result) {
+        org.apache.drill.exec.vector.complex.writer.VarCharWriter varCharWriter = listWriter.varChar();
+
+        for(int i = 1; i <= matcher.groupCount(); i++) {
+          final byte[] strBytes = matcher.group(i).getBytes(com.google.common.base.Charsets.UTF_8);

Review Comment:
   `matcher.group(i)` creates and returns string
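
One way to avoid the intermediate `String` that `matcher.group(i)` allocates is to read the group's offsets with `matcher.start(i)` and `matcher.end(i)` and copy the bytes straight from the source. The sketch below illustrates that idea only; it is not the refactoring that was eventually merged. `GroupOffsetsSketch` and `AsciiView` are hypothetical names, and the code assumes ASCII input so that character offsets equal byte offsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; assumes ASCII input so char offsets and byte offsets coincide. */
final class GroupOffsetsSketch {

  /** Read-only CharSequence view over a byte array, standing in for a buffer wrapper. */
  static final class AsciiView implements CharSequence {
    private final byte[] bytes;
    private final int start;
    private final int end;

    AsciiView(byte[] bytes, int start, int end) {
      this.bytes = bytes;
      this.start = start;
      this.end = end;
    }

    @Override public int length() { return end - start; }
    @Override public char charAt(int index) { return (char) (bytes[start + index] & 0xFF); }
    @Override public CharSequence subSequence(int from, int to) {
      return new AsciiView(bytes, start + from, start + to);
    }
    @Override public String toString() {
      return new String(bytes, start, end - start, StandardCharsets.US_ASCII);
    }
  }

  /** Copies each capturing group of the first match directly from the source bytes. */
  static byte[][] groupBytes(byte[] asciiInput, Pattern pattern) {
    Matcher matcher = pattern.matcher(new AsciiView(asciiInput, 0, asciiInput.length));
    if (!matcher.find()) {
      return new byte[0][];
    }
    byte[][] groups = new byte[matcher.groupCount()][];
    for (int i = 1; i <= matcher.groupCount(); i++) {
      int from = matcher.start(i);
      // start(i) is -1 when the group did not take part in the match.
      groups[i - 1] = from < 0 ? new byte[0] : Arrays.copyOfRange(asciiInput, from, matcher.end(i));
    }
    return groups;
  }

  public static void main(String[] args) {
    byte[] input = "key=value".getBytes(StandardCharsets.US_ASCII);
    for (byte[] group : groupBytes(input, Pattern.compile("(\\w+)=(\\w+)"))) {
      System.out.println(new String(group, StandardCharsets.US_ASCII));
    }
  }
}
```

For multi-byte UTF-8 input the character offsets no longer line up with byte offsets, so a real implementation would need to map between the two.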





> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692668#comment-17692668
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on PR #2762:
URL: https://github.com/apache/drill/pull/2762#issuecomment-1441742721

   @vvysotskyi Thanks for the review.  I refactored the functions so that they 
are not creating extra String objects.




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT

2023-02-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692586#comment-17692586
 ] 

ASF GitHub Bot commented on DRILL-8403:
---

jnturton merged PR #2765:
URL: https://github.com/apache/drill/pull/2765




> Generate aggregate function calls are missing required filters when used with 
> PIVOT
> ---
>
> Key: DRILL-8403
> URL: https://issues.apache.org/jira/browse/DRILL-8403
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.21.1
>
>
> The following query should generate aggregates grouped by education_level and 
> containing filters on marital_status but the requisite filters are lost 
> during function rewriting.
> apache drill> SELECT
> 2..semicolon> *
> 3..semicolon> FROM
> 4..semicolon> (SELECT
> 5..)> education_level,
> 6..)> salary,
> 7..)> marital_status,
> 8..)> extract(year from age(birth_date)) age
> 9..)> FROM
> 10.)> cp.`employee.json`)
> 11.semicolon> PIVOT (
> 12.)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN 
> ('M' married, 'S' single)
> 13.)> );
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> | education_level     | married_avg_salary | married_avg_age    | single_avg_salary  | single_avg_age     |
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> | Graduate Degree     | 4392.823529411765  | 100.32352941176471 | 4392.823529411765  | 100.32352941176471 |
> | Bachelors Degree    | 4492.404181184669  | 102.22996515679442 | 4492.404181184669  | 102.22996515679442 |
> | Partial College     | 4047.11807         | 100.100694         | 4047.11807         | 100.100694         |
> | High School Degree  | 3516.1565836298932 | 103.12811387900356 | 3516.1565836298932 | 103.12811387900356 |
> | Partial High School | 3511.0852713178297 | 102.30232558139535 | 3511.0852713178297 | 102.30232558139535 |
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> 5 rows selected (0.285 seconds)
>  
> 00-00 Screen : rowType = RecordType(ANY education_level, ANY 
> married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE 
> single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, 
> 35748.2296 cpu, 474630.0 io, 0.0 network, 8148.8001 memory}, 
> id = 812
> 00-01 Project(education_level=[$0], married_min_salary=[$1], 
> married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType 
> = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 811
> 00-02 Project(education_level=[$0], 
> married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], 
> single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : 
> rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 808
> 00-03 HashAgg(group=[\\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], 
> agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY 
> education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 
> 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 807
> 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], 
> age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS 
> TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY 
> marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 
> 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 
> 0.0 memory}, id = 806
> 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, 
> columns=[`education_level`, `marital_status`, `salary`, `birth_date`], 
> files=[classpath:/employee.json], usedMetastore=false, limit=-1, 
> formatConfig=JSONFormatConfig [extensions=[json) : rowType = 
> RecordType(ANY education_level, ANY marital_status, ANY salary, ANY 
> birth_date): rowcount = 463.0, cumulative cos

[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692464#comment-17692464
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1115197954


##
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java:
##
@@ -293,6 +293,115 @@ public void eval() {
 }
   }
 
+  /*
+   * This function returns the capturing groups from a regex.
+   */
+  @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE,
+  outputWidthCalculatorType = 
OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT)
+  public static class RegexpExtract implements DrillSimpleFunc {
+
+@Param VarCharHolder input;
+@Param(constant=true) VarCharHolder pattern;
+@Inject
+DrillBuf buffer;
+@Workspace
+java.util.regex.Matcher matcher;
+@Workspace
+org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+@Output
+ComplexWriter out;
+
+@Override
+public void setup() {
+  matcher = 
java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,
  pattern.end,  pattern.buffer)).matcher("");
+  charSequenceWrapper = new 
org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
+  matcher.reset(charSequenceWrapper);
+}
+
+@Override
+public void eval() {
+  charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
+
+  // Reusing same charSequenceWrapper, no need to pass it in.
+  matcher.reset();
+  boolean result = matcher.find();
+
+  // Start the list here.  If there are no matches, we return an empty 
list.
+  org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter 
listWriter = out.rootAsList();
+  listWriter.startList();
+
+  if (result) {
+org.apache.drill.exec.vector.complex.writer.VarCharWriter 
varCharWriter = listWriter.varChar();
+String extractedResult;
+for(int i = 1; i <= matcher.groupCount(); i++) {
+  extractedResult = matcher.group(i);

Review Comment:
   Fixed. 





> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692456#comment-17692456
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1115193957


##
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/CharSequenceWrapper.java:
##
@@ -90,7 +90,10 @@ public char charAt(int index) {
*/
   @Override
   public CharSequence subSequence(int start, int end) {
-throw new UnsupportedOperationException();
+// throw new UnsupportedOperationException();

Review Comment:
   Fixed.





> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692455#comment-17692455
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1115193471


##
NOTICE:
##
@@ -1,5 +1,5 @@
 Apache Drill
-Copyright 2013-2022 The Apache Software Foundation
+Copyright 2013-2023 The Apache Software Foundation

Review Comment:
   Fixed.





> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8403) Generate aggregate function calls are missing required filters when used with PIVOT

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692373#comment-17692373
 ] 

ASF GitHub Bot commented on DRILL-8403:
---

vvysotskyi opened a new pull request, #2765:
URL: https://github.com/apache/drill/pull/2765

   # [DRILL-8403](https://issues.apache.org/jira/browse/DRILL-8403): Generate 
aggregate function calls are missing required filters when used with PIVOT
   
   ## Description
   Pass filters through to the aggregate calls when applying the aggregate reduce rule.
   
   ## Documentation
   NA
   
   ## Testing
   Added UT.
   




> Generate aggregate function calls are missing required filters when used with 
> PIVOT
> ---
>
> Key: DRILL-8403
> URL: https://issues.apache.org/jira/browse/DRILL-8403
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.0
>Reporter: James Turton
>Assignee: Vova Vysotskyi
>Priority: Major
> Fix For: 1.21.1
>
>
> The following query should generate aggregates grouped by education_level and 
> containing filters on marital_status but the requisite filters are lost 
> during function rewriting.
> apache drill> SELECT
> 2..semicolon> *
> 3..semicolon> FROM
> 4..semicolon> (SELECT
> 5..)> education_level,
> 6..)> salary,
> 7..)> marital_status,
> 8..)> extract(year from age(birth_date)) age
> 9..)> FROM
> 10.)> cp.`employee.json`)
> 11.semicolon> PIVOT (
> 12.)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN 
> ('M' married, 'S' single)
> 13.)> );
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> | education_level     | married_avg_salary | married_avg_age    | single_avg_salary  | single_avg_age     |
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> | Graduate Degree     | 4392.823529411765  | 100.32352941176471 | 4392.823529411765  | 100.32352941176471 |
> | Bachelors Degree    | 4492.404181184669  | 102.22996515679442 | 4492.404181184669  | 102.22996515679442 |
> | Partial College     | 4047.11807         | 100.100694         | 4047.11807         | 100.100694         |
> | High School Degree  | 3516.1565836298932 | 103.12811387900356 | 3516.1565836298932 | 103.12811387900356 |
> | Partial High School | 3511.0852713178297 | 102.30232558139535 | 3511.0852713178297 | 102.30232558139535 |
> +---------------------+--------------------+--------------------+--------------------+--------------------+
> 5 rows selected (0.285 seconds)
>  
> 00-00 Screen : rowType = RecordType(ANY education_level, ANY 
> married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE 
> single_avg_age): rowcount = 46.3, cumulative cost = \{1486.23 rows, 
> 35748.2296 cpu, 474630.0 io, 0.0 network, 8148.8001 memory}, 
> id = 812
> 00-01 Project(education_level=[$0], married_min_salary=[$1], 
> married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType 
> = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 811
> 00-02 Project(education_level=[$0], 
> married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], 
> single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], 
> single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : 
> rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE 
> married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 
> 46.3, cumulative cost = \{1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 808
> 00-03 HashAgg(group=[\\{0}], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], 
> agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY 
> education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 
> 46.3, cumulative cost = \{1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 
> 8148.8001 memory}, id = 807
> 00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], 
> age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS 
> TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY 
> marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 
> 463.0, cumulative cost = \{926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 
> 0.0 memory}, id = 806
> 00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan 
> [selectionRoot=classpath:/employee.json, numFiles=1, 
> co

[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692334#comment-17692334
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

vvysotskyi commented on code in PR #2762:
URL: https://github.com/apache/drill/pull/2762#discussion_r1114857738


##
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java:
##
@@ -293,6 +293,115 @@ public void eval() {
 }
   }
 
+  /*
+   * This function returns the capturing groups from a regex.
+   */
+  @FunctionTemplate(name = "regexp_extract", scope = FunctionScope.SIMPLE,
+  outputWidthCalculatorType = 
OutputWidthCalculatorType.CUSTOM_FIXED_WIDTH_DEFAULT)
+  public static class RegexpExtract implements DrillSimpleFunc {
+
+@Param VarCharHolder input;
+@Param(constant=true) VarCharHolder pattern;
+@Inject
+DrillBuf buffer;
+@Workspace
+java.util.regex.Matcher matcher;
+@Workspace
+org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper charSequenceWrapper;
+@Output
+ComplexWriter out;
+
+@Override
+public void setup() {
+  matcher = 
java.util.regex.Pattern.compile(org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(pattern.start,
  pattern.end,  pattern.buffer)).matcher("");
+  charSequenceWrapper = new 
org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper();
+  matcher.reset(charSequenceWrapper);
+}
+
+@Override
+public void eval() {
+  charSequenceWrapper.setBuffer(input.start, input.end, input.buffer);
+
+  // Reusing same charSequenceWrapper, no need to pass it in.
+  matcher.reset();
+  boolean result = matcher.find();
+
+  // Start the list here.  If there are no matches, we return an empty 
list.
+  org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter 
listWriter = out.rootAsList();
+  listWriter.startList();
+
+  if (result) {
+org.apache.drill.exec.vector.complex.writer.VarCharWriter 
varCharWriter = listWriter.varChar();
+String extractedResult;
+for(int i = 1; i <= matcher.groupCount(); i++) {
+  extractedResult = matcher.group(i);

Review Comment:
   It is better to avoid creating extra objects in UDFs to reduce the load on 
the garbage collector. `Matcher` has `Matcher.start(int group)` and 
`Matcher.end(int group)`, so please use them to obtain the bytes that correspond 
to the matching subsequence.
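
   As an illustration of this suggestion (a minimal sketch, not the code in the PR), the class below copies each capturing group straight from a source byte array into a destination buffer using `Matcher.start(int)` and `Matcher.end(int)`, without calling `Matcher.group(int)`. The class name `GroupOffsets`, the method `copyGroups`, and the plain `byte[]` buffers are made up for the example. It assumes ASCII input, so that char indexes and byte offsets coincide.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Drill.
class GroupOffsets {

  /**
   * Copies the bytes of every capturing group of the first match from src
   * into dest, back to back, and returns the number of bytes written.
   * Assumes src holds ASCII text, so char indexes equal byte offsets.
   */
  static int copyGroups(Matcher matcher, byte[] src, byte[] dest) {
    int written = 0;
    if (matcher.find()) {
      for (int i = 1; i <= matcher.groupCount(); i++) {
        int start = matcher.start(i);
        if (start < 0) {
          continue; // this group did not participate in the match
        }
        int length = matcher.end(i) - start;
        // No intermediate String is created for the group text.
        System.arraycopy(src, start, dest, written, length);
        written += length;
      }
    }
    return written;
  }

  public static void main(String[] args) {
    String text = "2023-02-22";
    byte[] src = text.getBytes(StandardCharsets.US_ASCII);
    byte[] dest = new byte[src.length];
    Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(text);
    int n = copyGroups(m, src, dest);
    // Prints the three groups concatenated: 20230222
    System.out.println(new String(dest, 0, n, StandardCharsets.US_ASCII));
  }
}
```

   For multi-byte UTF-8 input the char index returned by the matcher would have to be mapped to a byte offset first; the sketch does not cover that case.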



##
NOTICE:
##
@@ -1,5 +1,5 @@
 Apache Drill
-Copyright 2013-2022 The Apache Software Foundation
+Copyright 2013-2023 The Apache Software Foundation

Review Comment:
   Looks like this PR should be rebased on the latest master. Probably these 
changes are present because of the force push to master.



##
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/CharSequenceWrapper.java:
##
@@ -90,7 +90,10 @@ public char charAt(int index) {
*/
   @Override
   public CharSequence subSequence(int start, int end) {
-throw new UnsupportedOperationException();
+// throw new UnsupportedOperationException();

Review Comment:
   Please remove commented code.





> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692318#comment-17692318
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on PR #2762:
URL: https://github.com/apache/drill/pull/2762#issuecomment-1440613580

   @vvysotskyi  Should we proceed with this?  Is that a LGTM +1?




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 





[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692141#comment-17692141
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton merged PR #2763:
URL: https://github.com/apache/drill/pull/2763




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.





[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692075#comment-17692075
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on code in PR #2763:
URL: https://github.com/apache/drill/pull/2763#discussion_r1114070404


##
docs/dev/ClusterFixture.md:
##
@@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. 
Rather than writing o
 
 Again, you can pass a Java value which the test code will convert to a string, 
then will build the `ALTER SESSION` command.
 
+# Try-with-resource Style of Creating Single-use Client Fixtures.
+
+The benefit of Cluster Fixture framework is to define specific config for 
specific clusterFixture and clientFixture as needed flexibly.
+
+In some cases, clusterFixture has been initialized, and we need to create 
several different config clients for different test cases,

Review Comment:
   ```suggestion
   In some cases, a clusterFixture has been initialized and we need to create 
several different config clients for different test cases.
   ```





> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.





[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692073#comment-17692073
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on code in PR #2763:
URL: https://github.com/apache/drill/pull/2763#discussion_r1114071613


##
docs/dev/ClusterFixture.md:
##
@@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. 
Rather than writing o
 
 Again, you can pass a Java value which the test code will convert to a string, 
then will build the `ALTER SESSION` command.
 
+# Try-with-resource Style of Creating Single-use Client Fixtures.
+
+The benefit of Cluster Fixture framework is to define specific config for 
specific clusterFixture and clientFixture as needed flexibly.

Review Comment:
   ```suggestion
   A benefit of the Cluster Fixture framework is the ability to define specific 
configs for specific clusterFixtures and clientFixtures as needed flexibly.
   ```



##
docs/dev/ClusterFixture.md:
##
@@ -156,6 +177,28 @@ It is often very handy, during development, to accumulate 
a collection of test f
 * The (local) file system location
 * The default format
 
+# Exception Matcher
+
+The `QueryBuilder` provides a clean and concise way to handle Exception match 
which includes type match and pattern match:

Review Comment:
   ```suggestion
   The `QueryBuilder` provides a clean and concise way to handle UserException 
matching which includes error type matching and error message pattern matching:
   ```



##
docs/dev/ClusterFixture.md:
##
@@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. 
Rather than writing o
 
 Again, you can pass a Java value which the test code will convert to a string, 
then will build the `ALTER SESSION` command.
 
+# Try-with-resource Style of Creating Single-use Client Fixtures.
+
+The benefit of Cluster Fixture framework is to define specific config for 
specific clusterFixture and clientFixture as needed flexibly.
+
+In some cases, clusterFixture has been initialized, and we need to create 
several different config clients for different test cases,
+
+We could use try-with-resource style to creating single-use clientFixture.

Review Comment:
   ```suggestion
   Using Java's try-with-resources syntax to create a single-use clientFixture 
is a convenient way to ensure that the clientFixture will automatically be 
closed once we've finished with it.
   ```
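
   The try-with-resources behaviour this suggestion relies on can be sketched in plain Java. `DemoClient` below is a made-up stand-in for a client fixture, not Drill's `ClientFixture` API; the only assumption carried over is that the fixture implements `AutoCloseable`.

```java
// Hypothetical stand-in classes for illustration only; not Drill's test API.
class TryWithResourcesSketch {

  /** A minimal closeable "client" that refuses work once it is closed. */
  static class DemoClient implements AutoCloseable {
    private boolean closed;

    String run(String sql) {
      if (closed) {
        throw new IllegalStateException("client is closed");
      }
      return "ran: " + sql;
    }

    boolean isClosed() {
      return closed;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    DemoClient shared = new DemoClient();
    // The resource is closed automatically when the block exits,
    // whether normally or because of an exception.
    try (DemoClient client = shared) {
      System.out.println(client.run("SELECT 1"));
    }
    System.out.println("closed after block: " + shared.isClosed());
  }
}
```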



##
docs/dev/ClusterFixture.md:
##
@@ -125,6 +125,27 @@ In some cases, you may want to change an option in a test. 
Rather than writing o
 
 Again, you can pass a Java value which the test code will convert to a string, 
then will build the `ALTER SESSION` command.
 
+# Try-with-resource Style of Creating Single-use Client Fixtures.
+
+The benefit of Cluster Fixture framework is to define specific config for 
specific clusterFixture and clientFixture as needed flexibly.
+
+In some cases, clusterFixture has been initialized, and we need to create 
several different config clients for different test cases,

Review Comment:
   ```suggestion
   In some cases, clusterFixture has been initialized and we need to create 
several different config clients for different test cases.
   ```
   ```suggestion
   In some cases, a clusterFixture has been initialized and we need to create 
several different config clients for different test cases.
   ```



##
docs/dev/ClusterFixture.md:
##
@@ -156,6 +177,28 @@ It is often very handy, during development, to accumulate 
a collection of test f
 * The (local) file system location
 * The default format
 
+# Exception Matcher
+
+The `QueryBuilder` provides a clean and concise way to handle Exception match 
which includes type match and pattern match:
+
+```
+@Test
+public void unsupportedLiteralValidation() throws Exception {
+  String query = "ALTER session SET `%s` = %s";
+
+  client.queryBuilder()
+.sql(query, ENABLE_VERBOSE_ERRORS_KEY, "DATE '1995-01-01'")
+.userExceptionMatcher()
+.expectedType(ErrorType.VALIDATION)
+.include("Drill doesn't support assigning literals of type")
+.match();
+}
+```
+* Use `.userExceptionMatcher` to call UserExceptionMatcher
+* Use `.expectedType` to define expected Error type
+* Use `.include` to define expected Error pattern

Review Comment:
   ```suggestion
   * Use `.include` to define an expected error message regex pattern
   * Use `.exclude` to define an unexpected error message regex pattern
   ```





> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Repo

[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692063#comment-17692063
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

kingswanwho opened a new pull request, #2763:
URL: https://github.com/apache/drill/pull/2763

   # [MINOR UPDATE]: /docs update for DRILL-8117
   
   ## Description
   
   Update /docs based on the discussion in 
https://github.com/apache/drill/pull/2756
   
   ## Documentation
   
   This is a doc update
   
   ## Testing
   N/A
   




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.





[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691944#comment-17691944
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

vvysotskyi commented on PR #2762:
URL: https://github.com/apache/drill/pull/2762#issuecomment-1439487042

   Ok, in this case, we can add this UDF.




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691724#comment-17691724
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre commented on PR #2762:
URL: https://github.com/apache/drill/pull/2762#issuecomment-1438871668

   > Wouldn't this change introduce ReDoS vulnerability?
   
   Potentially, but we already allow `REGEXP_REPLACE` and `REGEXP_MATCHES`, so I 
don't know that this actually makes anything worse.  I did try adding a 
validator with this `saferegex`[1] but that library is not suitable for 
inclusion in Drill. (It prints all kinds of stuff to STDOUT.) 
   
   [1]: https://github.com/jkutner/saferegex
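
   For context, one commonly used mitigation that needs no extra library is to bound the work a `java.util.regex.Matcher` may do by wrapping its input in a `CharSequence` that counts character reads. The sketch below is not something this PR does, and `BudgetedCharSequence` is a made-up name; it only illustrates the idea under the assumption that the regex engine reads input through `charAt`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper for illustration only; not part of Drill or the PR.
class BudgetedCharSequence implements CharSequence {
  private final CharSequence inner;
  private long remaining; // number of charAt calls still allowed

  BudgetedCharSequence(CharSequence inner, long budget) {
    this.inner = inner;
    this.remaining = budget;
  }

  @Override
  public char charAt(int index) {
    // Every character read spends one unit of budget.
    if (remaining-- <= 0) {
      throw new IllegalStateException("regex step budget exceeded");
    }
    return inner.charAt(index);
  }

  @Override
  public int length() {
    return inner.length();
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    return new BudgetedCharSequence(inner.subSequence(start, end), remaining);
  }

  @Override
  public String toString() {
    return inner.toString();
  }

  public static void main(String[] args) {
    // A pattern with nested quantifiers, the classic backtracking-prone shape.
    Pattern risky = Pattern.compile("(a+)+$");
    String input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
    Matcher m = risky.matcher(new BudgetedCharSequence(input, 1_000_000L));
    try {
      System.out.println("found: " + m.find());
    } catch (IllegalStateException e) {
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
```

   Whether the pathological case actually exhausts the budget depends on the JDK's regex engine, so the sketch makes no claim about which branch runs.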




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.1
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691471#comment-17691471
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

vvysotskyi commented on PR #2762:
URL: https://github.com/apache/drill/pull/2762#issuecomment-1438117260

   Wouldn't this change introduce ReDoS vulnerability?




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.0
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8402) Add REGEXP_EXTRACT Function

2023-02-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691361#comment-17691361
 ] 

ASF GitHub Bot commented on DRILL-8402:
---

cgivre opened a new pull request, #2762:
URL: https://github.com/apache/drill/pull/2762

   # [DRILL-8402](https://issues.apache.org/jira/browse/DRILL-8402): Add 
REGEXP_EXTRACT Function
   
   ## Description
   Adds `regexp_extract` functions to Drill.
   
   ## Documentation
   This PR adds support for `regexp_extract(, )` which returns 
an array of text corresponding with the capturing groups in the regex.  It also 
includes `regexp_extract(, , )` which returns the text of 
a specific capturing group.
   
   ```sql
   SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})');
   +---------------------+
   |       EXPR$0        |
   +---------------------+
   | ["123","456","789"] |
   +---------------------+
   
   SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})', 0);
   +-------------+
   |   EXPR$0    |
   +-------------+
   | 123-456-789 |
   +-------------+
   
   SELECT regexp_extract('123-456-789', '([0-9]{3})-([0-9]{3})-([0-9]{3})', 3);
   +--------+
   | EXPR$0 |
   +--------+
   | 789    |
   +--------+
   ```
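
The capturing-group behaviour in the queries above can be reproduced with plain `java.util.regex`. This is only a sketch of the semantics, not the UDF code from the PR: the class and method names (`RegexpExtractSketch`, `extractAll`, `extractGroup`) are hypothetical, and the use of `find()` and the `null` result for a missing match or out-of-range index are assumptions rather than confirmed Drill behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractSketch {

  // Returns the text of every capturing group (1..n) of the first match,
  // or an empty list when the regex does not match.
  static List<String> extractAll(String input, String regex) {
    Matcher m = Pattern.compile(regex).matcher(input);
    List<String> groups = new ArrayList<>();
    if (m.find()) {
      for (int i = 1; i <= m.groupCount(); i++) {
        groups.add(m.group(i));
      }
    }
    return groups;
  }

  // Returns the text of one capturing group; index 0 is the whole match.
  // Returning null for "no match" or a bad index is an assumption of this sketch.
  static String extractGroup(String input, String regex, int index) {
    Matcher m = Pattern.compile(regex).matcher(input);
    if (index >= 0 && m.find() && index <= m.groupCount()) {
      return m.group(index);
    }
    return null;
  }

  public static void main(String[] args) {
    String regex = "([0-9]{3})-([0-9]{3})-([0-9]{3})";
    System.out.println(extractAll("123-456-789", regex));      // [123, 456, 789]
    System.out.println(extractGroup("123-456-789", regex, 0)); // 123-456-789
    System.out.println(extractGroup("123-456-789", regex, 3)); // 789
  }
}
```

Group 0 is the entire match in `java.util.regex`, which is consistent with the second query above returning the full input string.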
   
   
   ## Testing
   Added unit tests.
   




> Add REGEXP_EXTRACT Function
> ---
>
> Key: DRILL-8402
> URL: https://issues.apache.org/jira/browse/DRILL-8402
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.21.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.21.0
>
>
> This PR adds two UDFs to Drill:
> regexp_extract(, ) which returns an array of strings which 
> were captured by capturing groups in the regex.
> regexp_extract(, , ) returns the text captured by a 
> specific capturing group. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688812#comment-17688812
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

cgivre merged PR #2756:
URL: https://github.com/apache/drill/pull/2756




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688803#comment-17688803
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430707391

   Message to whoever squashes and merges here, in case it's not me: when 
cleaning up the squashed commit detail message please retain the co-author 
footer so that the repo will reflect @kingswanwho's contribution.
   ```
   -
   
   Co-authored-by: kingswanwho 
   ```




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688802#comment-17688802
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430704625

   Okay, I think the penny's finally dropped. I was also thinking about the 
markdown in /docs but couldn't fathom what we'd add. But the new 
UserExceptionMatcher usage can be described and also the try-with-resources 
style of creating single-use client fixtures.
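
As a rough illustration of the try-with-resources style mentioned above, the sketch below uses a hypothetical `FakeClient` stand-in, not Drill's actual `ClientFixture`, and a plain `IllegalArgumentException` in place of `UserException`. It only shows the general Java pattern: the single-use resource is closed automatically, and it is closed before the `catch` block runs.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleUseClientSketch {

  // Hypothetical stand-in for a test client; NOT Drill's ClientFixture.
  static class FakeClient implements AutoCloseable {
    private final List<String> log;

    FakeClient(List<String> log) {
      this.log = log;
      log.add("open");
    }

    // Pretends to run a query; an empty query fails.
    String run(String sql) {
      if (sql.isEmpty()) {
        throw new IllegalArgumentException("empty query");
      }
      log.add("run");
      return "ok";
    }

    @Override
    public void close() {
      log.add("close");
    }
  }

  // Creates a client for a single query and records the order of events.
  static List<String> demo(String sql) {
    List<String> log = new ArrayList<>();
    try (FakeClient client = new FakeClient(log)) {
      client.run(sql);
    } catch (IllegalArgumentException e) {
      // The client has already been closed by the time we get here.
      log.add("caught: " + e.getMessage());
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(demo("SELECT 1")); // [open, run, close]
    System.out.println(demo(""));         // [open, close, caught: empty query]
  }
}
```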




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688779#comment-17688779
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

kingswanwho commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430637603

   > > > One other question. Should we document this in the developer 
documentation?
   > > 
   > > 
   > > I think we do have developer documentation describing cluster fixture 
tests, or do you mean something else?
   > 
   > I was referring to the markdown files in the `/docs` folder. With this PR 
do those need to be updated? (It doesn't have to be a part of this PR.)
   
   Hi Charles, I have checked the developer documentation in /docs. This PR moves 
tests from BaseTestQuery to ClusterTest and doesn't change the test logic of 
ClusterTest. James helped find a clean way to create a new ClientFixture and to 
handle UserException. I can help update that information in /docs in a new PR.




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8401) Skip nested MAP column without children when creating parquet tables

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688685#comment-17688685
 ] 

ASF GitHub Bot commented on DRILL-8401:
---

cgivre merged PR #2757:
URL: https://github.com/apache/drill/pull/2757




> Skip nested MAP column without children when creating parquet tables
> 
>
> Key: DRILL-8401
> URL: https://issues.apache.org/jira/browse/DRILL-8401
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.20.3
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> This extends the work of DRILL-8272 in order to handle nested empty MAPs 
> which currently also break the Parquet writer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688630#comment-17688630
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

cgivre commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430116439

   > > One other question. Should we document this in the developer 
documentation?
   > 
   > I think we do have developer documentation describing cluster fixture 
tests, or do you mean something else?
   
   I was referring to the markdown files in the `/docs` folder.   With this PR 
do those need to be updated?  (It doesn't have to be a part of this PR.)




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688628#comment-17688628
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430114758

   > One other question. Should we document this in the developer documentation?
   
   I think we do have developer documentation describing cluster tests, or do 
you mean something else?




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Upgrade unit tests to the cluster fixture framework

2023-02-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688612#comment-17688612
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

kingswanwho commented on PR #2756:
URL: https://github.com/apache/drill/pull/2756#issuecomment-1430077393

   > @jnturton @kingswanwho Should we close the other PR?
   
   Yes, I closed the other PR.




> Upgrade unit tests to the cluster fixture framework
> ---
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Assignee: James Turton
>Priority: Major
> Fix For: 1.21.0
>
>
> Upgrade various unit tests to the cluster fixture framework and replace other 
> instances of deprecated code usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

