[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602355#comment-17602355
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1241993355

   @jnturton Did you see my question about boolean values?




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]
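
For illustration, a minimal Java sketch of the proposed approach: casting distance computed as the cheapest path through a weighted directed graph of types, here via Dijkstra's algorithm. The type names and edge costs below are invented for the example and do not reflect Drill's actual ResolverTypePrecedence tables.

```java
import java.util.*;

// Illustrative only: a tiny weighted digraph over a few type names, with
// Dijkstra's algorithm giving the "casting distance" between two types.
public class CastDistanceSketch {

  // adjacency list: source type -> (target type -> cost of that single cast)
  private static final Map<String, Map<String, Float>> EDGES = new HashMap<>();

  private static void edge(String from, String to, float cost) {
    EDGES.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
  }

  static {
    // Hypothetical weights: cheap widening casts get low costs, lossier ones higher costs.
    edge("INT", "BIGINT", 1f);
    edge("BIGINT", "FLOAT8", 2f);
    edge("VARCHAR", "FLOAT8", 10f); // would allow e.g. sqrt('3.1415926')
    edge("VARCHAR", "INT", 12f);
  }

  /** Total cost of the cheapest cast path from {@code from} to {@code to}, or +Inf if none. */
  static float castingDistance(String from, String to) {
    Map<String, Float> best = new HashMap<>();
    PriorityQueue<Object[]> queue =
        new PriorityQueue<>(Comparator.comparingDouble((Object[] e) -> (Float) e[1]));
    best.put(from, 0f);
    queue.add(new Object[] {from, 0f});

    while (!queue.isEmpty()) {
      Object[] head = queue.poll();
      String type = (String) head[0];
      float cost = (Float) head[1];
      if (cost > best.getOrDefault(type, Float.POSITIVE_INFINITY)) {
        continue; // stale queue entry
      }
      if (type.equals(to)) {
        return cost;
      }
      for (Map.Entry<String, Float> e : EDGES.getOrDefault(type, Collections.<String, Float>emptyMap()).entrySet()) {
        float alt = cost + e.getValue();
        if (alt < best.getOrDefault(e.getKey(), Float.POSITIVE_INFINITY)) {
          best.put(e.getKey(), alt);
          queue.add(new Object[] {e.getKey(), alt});
        }
      }
    }
    return Float.POSITIVE_INFINITY;
  }

  public static void main(String[] args) {
    System.out.println(castingDistance("VARCHAR", "FLOAT8")); // 10.0 (direct edge)
    System.out.println(castingDistance("INT", "FLOAT8"));     // 3.0  (INT -> BIGINT -> FLOAT8)
  }
}
```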



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602354#comment-17602354
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1241992856

   > I wonder whether ResolverTypePrecedence should cache computed casting 
costs, or whether that would be premature optimisation.
   
   Maybe make that a separate PR.   In theory the whole graph could be 
pre-computed right?
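   A minimal sketch of the caching idea under discussion, assuming the cost computation itself (for example, a graph search like the one sketched above) is supplied by the caller; this is illustrative and not ResolverTypePrecedence code.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.BiFunction;

   // Hypothetical memoisation of computed casting costs keyed by (from, to).
   // Pre-computing the whole graph up front would amount to filling this cache eagerly.
   public class CastCostCache {
     private final Map<String, Float> cache = new ConcurrentHashMap<>();
     private final BiFunction<String, String, Float> compute;

     public CastCostCache(BiFunction<String, String, Float> compute) {
       this.compute = compute;
     }

     public float castingDistance(String from, String to) {
       return cache.computeIfAbsent(from + "->" + to, key -> compute.apply(from, to));
     }
   }
   ```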




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602352#comment-17602352
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

jnturton commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1241991578

   I wonder whether ResolverTypePrecedence should cache computed casting costs, 
or whether that would be premature optimisation.




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8293) Add a docker-compose file to run Drill in cluster mode

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602118#comment-17602118
 ] 

ASF GitHub Bot commented on DRILL-8293:
---

jnturton merged PR #2640:
URL: https://github.com/apache/drill/pull/2640




> Add a docker-compose file to run Drill in cluster mode
> --
>
> Key: DRILL-8293
> URL: https://issues.apache.org/jira/browse/DRILL-8293
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.20.2
>Reporter: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Add a docker-compose file based on the official Docker images but overriding 
> the ENTRYPOINT to launch Drill in cluster mode and including a ZooKeeper 
> container. This can be used to experiment with cluster mode on a single 
> machine, or to run a real cluster on platforms that work with docker-compose 
> like Docker Swarm or ECS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602019#comment-17602019
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

cgivre merged PR #2641:
URL: https://github.com/apache/drill/pull/2641




> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601847#comment-17601847
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1240788876

   > 
   
   This is really a MAJOR usability improvement.  Will it also be able to cast 
`"true"` and `"false"` as boolean values?  Likewise for:
   * True
   * TRUE
   * TrUe
   
   etc




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601846#comment-17601846
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

jnturton commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1240784918

   > > 
   > 
   > That brings a tear to me eye!
   
   A piece I haven't added is a cast function implementation going from BIT to 
INT using the normal 0 = false, 1 = true correspondence to enable little 
conveniences like taking the sum or average of a boolean. I enjoyed using 
tricks like that in Impala IIRC. But the new casting logic here does provide 
for this, all that's missing is the cast function itself:
   
   ```
   Error: Missing function implementation: [castINT(BIT-OPTIONAL)]
   ```
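
   For illustration, a rough sketch of what the missing cast function might look like using Drill's simple UDF annotations; the exact function name, null handling and registration details are assumptions, not the project's final implementation.

   ```java
   import org.apache.drill.exec.expr.DrillSimpleFunc;
   import org.apache.drill.exec.expr.annotations.FunctionTemplate;
   import org.apache.drill.exec.expr.annotations.Output;
   import org.apache.drill.exec.expr.annotations.Param;
   import org.apache.drill.exec.expr.holders.BitHolder;
   import org.apache.drill.exec.expr.holders.IntHolder;

   // Sketch of a BIT -> INT cast function; name, scope and null handling are assumptions.
   @FunctionTemplate(name = "castINT",
       scope = FunctionTemplate.FunctionScope.SIMPLE,
       nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
   public class CastBitToInt implements DrillSimpleFunc {

     @Param BitHolder in;
     @Output IntHolder out;

     public void setup() {
     }

     public void eval() {
       // BIT values are stored as 0 or 1, so they map directly onto INT.
       out.value = in.value;
     }
   }
   ```

   With a function like this registered, expressions such as taking the sum or average of a boolean column could resolve through an implicit BIT-to-INT cast under the new rules.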




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601843#comment-17601843
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1240779567

   > 
   
   That brings a tear to me eye!




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601842#comment-17601842
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

jnturton commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1240777691

   > Queries like that will work in MySQL and other RDBMS.  In Drill I think 
they won't fail, but the results are not what people expect.   For cases like 
this, would '2020-01-01' be automatically cast to a date?  Would the same thing 
happen in situations like...
   
   ```
   apache drill> select date_diff('2022-09-08', '1970-01-01');
   EXPR$0  19243 days 0:00:00
   
   1 row selected (0.157 seconds)
   apache drill> select sqrt('5');
   EXPR$0  2.23606797749979
   
   1 row selected (0.119 seconds)
   apache drill> select substring(current_date, 1, 4);
   EXPR$0  2022
   
   1 row selected (0.146 seconds)
   apache drill> select now() > '2022-09-08';
   EXPR$0  true
   ```




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601796#comment-17601796
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

cgivre commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r965897296


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   That works for me :-)





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8301) Standardise on UTF-8 encoding for char to byte (and vice versa) conversions

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601758#comment-17601758
 ] 

ASF GitHub Bot commented on DRILL-8301:
---

pjfanning commented on PR #2637:
URL: https://github.com/apache/drill/pull/2637#issuecomment-1240555174

   Regarding Jackson: the JSON spec (https://www.ietf.org/rfc/rfc4627.txt) mandates 
Unicode, with UTF-8 as the default. XML mandates UTF-8 as the default. In my 
experience it is quite rare to see other Unicode charsets used. UTF-8 encoding 
should use fewer bytes for Latin-alphabet text and numeric data.
   
   Java strings can now use utf-16 internally. I'm not sure if there is a 
performance impact using utf-16 instead of utf-8 
(https://www.dariawan.com/tutorials/java/java-9-compact-string-and-string-new-methods/).
   
   My main concern is correctness and testability as opposed to performance. 
Choosing one encoding for externally facing data and another internally would 
introduce a lot of extra complexity and possibly confusion as to which to 
choose in certain scenarios - and possibly lower performance as you would often 
need to convert between the 2 encodings.




> Standardise on UTF-8 encoding for char to byte (and vice versa) conversions
> ---
>
> Key: DRILL-8301
> URL: https://issues.apache.org/jira/browse/DRILL-8301
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> Lots of Drill code uses UTF-8 explicitly. Lots more Drill code does not set 
> an explicit encoding which means it relies on the JVM default (which differs 
> by JVM install).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8301) Standardise on UTF-8 encoding for char to byte (and vice versa) conversions

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601754#comment-17601754
 ] 

ASF GitHub Bot commented on DRILL-8301:
---

jnturton commented on PR #2637:
URL: https://github.com/apache/drill/pull/2637#issuecomment-1240540040

   I guess there are two different classes of character data.
   
   1. Internal use character data where we can use whatever encoding we like 
and perhaps would choose based on performance (would that suggest UTF-16?).
   2. Interchange character data that we share with the outside world, e.g. a 
JSON file that Drill wants to query. It feels like it would be nice if we can 
accept different encodings here. I wonder what Jackson and friends do w.r.t. 
character encodings.




> Standardise on UTF-8 encoding for char to byte (and vice versa) conversions
> ---
>
> Key: DRILL-8301
> URL: https://issues.apache.org/jira/browse/DRILL-8301
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> Lots of Drill code uses UTF-8 explicitly. Lots more Drill code does not set 
> an explicit encoding which means it relies on the JVM default (which differs 
> by JVM install).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8302) tidy up some char conversions

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601749#comment-17601749
 ] 

ASF GitHub Bot commented on DRILL-8302:
---

pjfanning opened a new pull request, #2645:
URL: https://github.com/apache/drill/pull/2645

   ## Description
   
   Code tidy-up.
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> tidy up some char conversions
> -
>
> Key: DRILL-8302
> URL: https://issues.apache.org/jira/browse/DRILL-8302
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> As part of DRILL-8301, I spotted code that could be tidied up. The aim of 
> this issue is to reduce the size of DRILL-8301 without introducing changes to 
> the char encodings.
>  * uses of a pattern like `new String("")` - IntelliJ and other tools 
> highlight this as unnecessary
>  * uses of `new String(bytes, StandardCharsets.UTF_8.name())` - better to use 
> `new String(bytes, StandardCharsets.UTF_8)`
>  * use Base64 encodeToString instead of case where we encode to bytes and 
> then do our own encoding of those bytes to a String
>  * Replace existing code with `Charset.forName("UTF-8")` to use 
> `StandardCharsets.UTF_8`
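
To make the items above concrete, here is a small illustrative before/after in Java (generic examples, not drawn from specific Drill files):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CharsetTidyUps {
  public static void main(String[] args) throws Exception {
    byte[] bytes = "drill".getBytes(StandardCharsets.UTF_8);

    // Before: unnecessary String copy and charset lookups by name.
    String s1 = new String("");                                   // flagged by IDEs as unnecessary
    String s2 = new String(bytes, StandardCharsets.UTF_8.name()); // forces a checked UnsupportedEncodingException
    String s3 = new String(bytes, Charset.forName("UTF-8"));      // runtime lookup of a well-known constant

    // After: direct Charset constants, no checked exception, no extra copy.
    String t1 = "";
    String t2 = new String(bytes, StandardCharsets.UTF_8);
    String t3 = new String(bytes, StandardCharsets.UTF_8);

    // Base64: encode straight to a String instead of encoding to bytes and converting by hand.
    String encoded = Base64.getEncoder().encodeToString(bytes);
    System.out.println(encoded);
  }
}
```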



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Clean up deprecated Apache code in Drill

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601733#comment-17601733
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on code in PR #2499:
URL: https://github.com/apache/drill/pull/2499#discussion_r965700029


##
exec/java-exec/src/test/java/org/apache/drill/exec/impersonation/TestInboundImpersonation.java:
##
@@ -156,22 +159,25 @@ public void unauthorizedTarget() throws Exception {
 
   @Test
   public void invalidPolicy() throws Exception {
-thrownException.expect(new 
UserExceptionMatcher(UserBitShared.DrillPBError.ErrorType.VALIDATION,
-"Invalid impersonation policies."));
+String query = "ALTER SYSTEM SET `%s`='%s'";

Review Comment:
   Did you try the following here?
   ```
   client.alterSystem(...);
   try {
 // run test
   } finally {
 client.resetSystem(...);
   }
   ```





> Clean up deprecated Apache code in Drill
> 
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Priority: Major
> Fix For: 2.0.0
>
>
> Clean up and upgrade deprecated Apache code, such as the PathChildrenCache 
> class used in ZookeeperClient and the StringEscapeUtils class used in 
> PlanStringBuilder.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8117) Clean up deprecated Apache code in Drill

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601732#comment-17601732
 ] 

ASF GitHub Bot commented on DRILL-8117:
---

jnturton commented on PR #2499:
URL: https://github.com/apache/drill/pull/2499#issuecomment-1240441921

   Hi @kingswanwho everything here looks good to me, let's just see if we can 
replace the `ALTER SYSTEM`s with `client.alterSystem`s.




> Clean up deprecated Apache code in Drill
> 
>
> Key: DRILL-8117
> URL: https://issues.apache.org/jira/browse/DRILL-8117
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.1
>Reporter: Jingchuan Hu
>Priority: Major
> Fix For: 2.0.0
>
>
> Clean up and upgrade deprecated Apache code, such as the PathChildrenCache 
> class used in ZookeeperClient and the StringEscapeUtils class used in 
> PlanStringBuilder.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8300) Upgrade to snakeyaml 1.31 due to cve

2022-09-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601709#comment-17601709
 ] 

ASF GitHub Bot commented on DRILL-8300:
---

jnturton merged PR #2643:
URL: https://github.com/apache/drill/pull/2643




> Upgrade to snakeyaml 1.31 due to cve
> 
>
> Key: DRILL-8300
> URL: https://issues.apache.org/jira/browse/DRILL-8300
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-3mc7-4q67-w48m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601636#comment-17601636
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r965517773


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   I fed the http_get function a string containing 50 million little JSON 
objects from the sequence {"foo": 1} {"foo": 2} {"foo": 3}... and it got through it 
(took about 45s). I just don't know if that answers the right question.





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601630#comment-17601630
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton merged PR #2632:
URL: https://github.com/apache/drill/pull/2632




> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.
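
As an illustration of the guard this option describes, here is a simplified Java sketch of a bounded recursive listing; the option name comes from the issue text, but the walking code and the exception used below are assumptions rather than Drill's implementation (which raises a UserException):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Simplified illustration of a bounded recursive file listing.
public class BoundedListing {
  static List<Path> listRecursively(Path root, int maxFiles) throws IOException {
    List<Path> files = new ArrayList<>();
    try (Stream<Path> stream = Files.walk(root)) {
      for (Path p : (Iterable<Path>) stream::iterator) {
        if (Files.isRegularFile(p)) {
          files.add(p);
          if (files.size() > maxFiles) {
            // Drill would raise a UserException here, referencing
            // drill.exec.storage.file.recursive_listing_max_size.
            throw new IllegalStateException(
                "File listing exceeded the configured limit of " + maxFiles + " files");
          }
        }
      }
    }
    return files;
  }
}
```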



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601341#comment-17601341
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r964935897


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   @cgivre yes, you're right. I tried a couple of things. First I provided a 
JSON response that would normally produce 64k+1 rows if queried to http_get but 
it looked to me like it was being handled in a single batch since, I guess, the 
row count of a query based on VALUES(1) is still 1. I then wrote a query to 
`SELECT http_get(some simple JSON)` from a mock table containing 64k+1 rows. 
This overwhelms the okhttp3 mock server and fails with a timeout. I'm not sure 
if there is some other test to try here?





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601340#comment-17601340
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r964935897


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   @cgivre yes, you're right. I tried a couple of things. First I provided a 
JSON response that would normally produce 64k+1 rows if queried to http_get but 
it looked to me like it was being handled in a single batch since, I guess, the 
row count of the query is still 1. I then wrote a query to `SELECT 
http_get(some simple JSON)` from a mock table containing 64k+1 rows. This 
overwhelms the okhttp3 mock server and fails with a timeout. I'm not sure if 
there is some other test to try here?





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601288#comment-17601288
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

cgivre commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r964788631


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   From my recollection, this function does handle the multiple batches.  It 
was the `convert_fromJSON` that @vdiravka was working on.





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8300) Upgrade to snakeyaml 1.31 due to cve

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601286#comment-17601286
 ] 

ASF GitHub Bot commented on DRILL-8300:
---

pjfanning opened a new pull request, #2643:
URL: https://github.com/apache/drill/pull/2643

   ## Description
   
   Snakeyaml has a CVE




> Upgrade to snakeyaml 1.31 due to cve
> 
>
> Key: DRILL-8300
> URL: https://issues.apache.org/jira/browse/DRILL-8300
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-3mc7-4q67-w48m



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601275#comment-17601275
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r964747549


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   @cgivre
   
   1. The JsonLoader closes the input streams it's been working off of when it 
is closed so I don't think so.
   2. Multiple batch datasets do not work with these UDFs yet from what I 
recall? I think @vdiravka continues to work on that, perhaps he can comment on 
the closing of the JsonLoader here.





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601259#comment-17601259
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

cgivre commented on code in PR #2641:
URL: https://github.com/apache/drill/pull/2641#discussion_r964729134


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/udfs/HttpHelperFunctions.java:
##
@@ -189,6 +191,8 @@ public void eval() {
 rowWriter.start();
 if (jsonLoader.parser().next()) {
   rowWriter.save();
+} else {

Review Comment:
   Should we explicitly close the `results` `InputStream` here as well?  Would you 
mind testing this on a query that produces multiple batches?  





> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601245#comment-17601245
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton commented on PR #2641:
URL: https://github.com/apache/drill/pull/2641#issuecomment-1239237242

   > @jnturton there are also similar issues in
   > 
   > * org.apache.drill.exec.store.http.util.SimpleHttp
   > 
   > * org.apache.drill.exec.store.http.oauth.OAuthUtils
   
   @pjfanning thanks I picked up a couple of extra instances.




> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601236#comment-17601236
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

pjfanning commented on PR #2641:
URL: https://github.com/apache/drill/pull/2641#issuecomment-1239212973

   @jnturton there are also similar issues in
   * org.apache.drill.exec.store.http.util.SimpleHttp
   * org.apache.drill.exec.store.http.oauth.OAuthUtils




> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8295) Probable resource leak in the HTTP storage plugin

2022-09-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601204#comment-17601204
 ] 

ASF GitHub Bot commented on DRILL-8295:
---

jnturton opened a new pull request, #2641:
URL: https://github.com/apache/drill/pull/2641

   # [DRILL-8295](https://issues.apache.org/jira/browse/DRILL-8295): Probable 
resource leak in the HTTP storage plugin
   
   ## Description
   
   Adds close() calls in a number of places where HTTP requests are made in the 
HTTP storage plugin.
   
   ## Documentation
   N/A
   
   ## Testing
   Existing unit tests.
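
   For context, the usual way to avoid this class of leak with OkHttp is to close the Response via try-with-resources; a minimal, hypothetical example (not code taken from SimpleHttp, and with a placeholder URL):

   ```java
   import java.io.IOException;
   import okhttp3.OkHttpClient;
   import okhttp3.Request;
   import okhttp3.Response;

   // Response is Closeable, so try-with-resources releases the underlying
   // connection and stream even if the body is not fully consumed.
   public class ClosedResponseExample {
     public static void main(String[] args) throws IOException {
       OkHttpClient client = new OkHttpClient();
       Request request = new Request.Builder()
           .url("https://example.com/api") // placeholder URL
           .build();

       try (Response response = client.newCall(request).execute()) {
         if (response.isSuccessful() && response.body() != null) {
           System.out.println(response.body().string());
         }
       } // response.close() runs here automatically
     }
   }
   ```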
   




> Probable resource leak in the HTTP storage plugin
> -
>
> Key: DRILL-8295
> URL: https://issues.apache.org/jira/browse/DRILL-8295
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Major
> Fix For: 1.20.3
>
>
> It looks to me like SimpleHttp does not always close objects created using 
> OkHttp, e.g. line 378.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600953#comment-17600953
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

jnturton commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1238561477

   > @jnturton Do you think this should be included in the backport to stable?
   
   My own thought is probably not since it changes the function matching 
process and there isn't any clear bug that it fixes. 




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600951#comment-17600951
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1238556786

   @jnturton Do you think this should be included in the backport to stable?




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8293) Add a docker-compose file to run Drill in cluster mode

2022-09-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600726#comment-17600726
 ] 

ASF GitHub Bot commented on DRILL-8293:
---

jnturton opened a new pull request, #2640:
URL: https://github.com/apache/drill/pull/2640

   # [DRILL-8293](https://issues.apache.org/jira/browse/DRILL-8293): Add a 
docker-compose file to run Drill in cluster mode
   
   ## Description
   
   This directory contains source code artifacts to launch Drill in cluster mode
   along with a ZooKeeper container. The Drill image is based on a minor customisation of
   the official Drill image that switches it from an embedded to a cluster mode
   launch. Logging is redirected to stdout.
   
   In the docker-cluster-mode directory:
   
   1. docker build -t apache/drill-cluster-mode .
   2. docker-compose up
   
   Then access the web UI at http://localhost:8047 or connect a JDBC client to
   jdbc:drill:drillbit=localhost or jdbc:drill:zk=localhost but note that you
   will need to make the drillbit container hostnames resolvable from the host 
to 
   use a ZooKeeper JDBC URL.
   
   To launch a cluster of 3 Drillbits
   
   3. docker-compose up --scale drillbit=3
   
   but first note that to use docker-compose's "scale" feature to run multiple
   Drillbit containers on a single host you will need to remove the host port
   mappings from the compose file to prevent collisions (see the comments
   on the relevant lines in that file). Once the Drillbits are launched run
   `docker-compose ps` to list the ephemeral ports that have been allocated on
   the host.
   
   
   ## Documentation
   Add the above discussion to the Drill in Docker doc page.
   
   ## Testing
   Launch Drill using the provided commands and run queries.
   




> Add a docker-compose file to run Drill in cluster mode
> --
>
> Key: DRILL-8293
> URL: https://issues.apache.org/jira/browse/DRILL-8293
> Project: Apache Drill
>  Issue Type: Improvement
>  Components:  Server
>Affects Versions: 1.20.2
>Reporter: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Add a docker-compose file based on the official Docker images but overriding 
> the ENTRYPOINT to launch Drill in cluster mode and including a ZooKeeper 
> container. This can be used to experiment with cluster mode on a single 
> machine, or to run a real cluster on platforms that work with docker-compose 
> like Docker Swarm or ECS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8291) Allow case sensitive Filters in HTTP Plugin

2022-09-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600342#comment-17600342
 ] 

ASF GitHub Bot commented on DRILL-8291:
---

cgivre merged PR #2639:
URL: https://github.com/apache/drill/pull/2639




> Allow case sensitive Filters in HTTP Plugin
> ---
>
> Key: DRILL-8291
> URL: https://issues.apache.org/jira/browse/DRILL-8291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.20.3
>
>
> Some APIs will reject filter pushdowns if they are not in the correct case.  
> This PR adds a config option `caseSensitiveFilters` to the API config and 
> when set to true, preserves the case of the filters pushed down. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8275) Prevent the JDBC Client from creating spurious paths in Zookeeper

2022-09-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600211#comment-17600211
 ] 

ASF GitHub Bot commented on DRILL-8275:
---

jnturton merged PR #2617:
URL: https://github.com/apache/drill/pull/2617




> Prevent the JDBC Client from creating spurious paths in Zookeeper
> -
>
> Key: DRILL-8275
> URL: https://issues.apache.org/jira/browse/DRILL-8275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
> Fix For: 2.0.0
>
>
> If the ZK style is used in the connection string and the zkRoot does not match 
> the actual path of the cluster, the client always creates a spurious path 
> (as a permanent node) in ZooKeeper.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8275) Prevent the JDBC Client to create error path in Zookeeper

2022-09-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600210#comment-17600210
 ] 

ASF GitHub Bot commented on DRILL-8275:
---

jnturton commented on PR #2617:
URL: https://github.com/apache/drill/pull/2617#issuecomment-1236561756

   > @jnturton Do you think we should backport this PR to stable? @luocooong Do 
you have any opinion on that? I'm not sure if this qualifies as a bug fix or an 
improvement that should wait for Drill 2.0.
   
   This fix looks good for stable to me +1.




> Prevent the JDBC Client to create error path in Zookeeper
> -
>
> Key: DRILL-8275
> URL: https://issues.apache.org/jira/browse/DRILL-8275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
> Fix For: 2.0.0
>
>
> If the ZK style is used in the connection string and the zkRoot does not match 
> the actual path of the cluster, the client always creates an erroneous path 
> (as a permanent node) in ZooKeeper.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8275) Prevent the JDBC Client to create error path in Zookeeper

2022-09-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600067#comment-17600067
 ] 

ASF GitHub Bot commented on DRILL-8275:
---

cgivre commented on PR #2617:
URL: https://github.com/apache/drill/pull/2617#issuecomment-1236355392

   @jnturton Do you think we should backport this PR to stable?  
   @luocooong Do you have any opinion on that?  
   I'm not sure if this qualifies as a bug fix or an improvement that should 
wait for Drill 2.0.




> Prevent the JDBC Client to create error path in Zookeeper
> -
>
> Key: DRILL-8275
> URL: https://issues.apache.org/jira/browse/DRILL-8275
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
> Fix For: 2.0.0
>
>
> If the ZK style is used in the connection string and the zkRoot does not match 
> the actual path of the cluster, the client always creates an erroneous path 
> (as a permanent node) in ZooKeeper.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600062#comment-17600062
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

cgivre commented on PR #2638:
URL: https://github.com/apache/drill/pull/2638#issuecomment-1236342717

   @jnturton Thanks for this.  IMHO this will be a MAJOR improvement in 
usability.  I have a question about date conversions.  Let's say we have a 
query like this:
   
   ```sql
   SELECT...
   FROM ...
   WHERE dateField > '2020-01-01'
   ```
   
   Queries like that will work in MySQL and other RDBMS.  In Drill I think they 
won't fail, but the results are not what people expect.   For cases like this, 
would `'2020-01-01'` be automatically cast to a date?  Would the same thing 
happen in situations like:
   
   ```
   DATE_DIFF('2020-01-01', '2021-01-01')
   ```
   
   




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8291) Allow case sensitive Filters in HTTP Plugin

2022-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600010#comment-17600010
 ] 

ASF GitHub Bot commented on DRILL-8291:
---

cgivre opened a new pull request, #2639:
URL: https://github.com/apache/drill/pull/2639

   # [DRILL-8291](https://issues.apache.org/jira/browse/DRILL-8291): PR Title
   
   ## Description
   Some APIs will reject filter pushdowns if they are not in the correct case.  
This PR adds a config option `caseSensitiveFilters` to the API config and when 
set to `true`, preserves the case of the filters pushed down. 
   
   ## Documentation
   See above.
   
   ## Testing
   Manually tested




> Allow case sensitive Filters in HTTP Plugin
> ---
>
> Key: DRILL-8291
> URL: https://issues.apache.org/jira/browse/DRILL-8291
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 1.20.3
>
>
> Some APIs will reject filter pushdowns if they are not in the correct case.  
> This PR adds a config option `caseSensitiveFilters` to the API config and 
> when set to true, preserves the case of the filters pushed down. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8136) Overhaul implict type casting logic

2022-09-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599848#comment-17599848
 ] 

ASF GitHub Bot commented on DRILL-8136:
---

jnturton opened a new pull request, #2638:
URL: https://github.com/apache/drill/pull/2638

   # [DRILL-8136](https://issues.apache.org/jira/browse/DRILL-8136): Overhaul 
implict type casting logic
   
   ## Description
   
   The existing implicit casting system is built on simplistic total ordering 
of data types[1] that yields oddities such as TINYINT being regarded as the 
closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
turn, hurts the range of data types with which SQL functions can be used. E.g. 
`select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
confusingly, `select '123' + 456` does work in Drill. In addition the 
limitations of the existing type precedence list mean that it has been 
supplemented with ad hoc secondary casting rules that go in the opposite 
direction.
   
   This PR introduces a new, more flexible definition of casting distance based 
on a weighted directed graph built over the Drill data types.
   
   ## Documentation
   Update [the description of implicit casting 
precedence](https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types).
   
   ## Testing
   Existing implicit cast unit tests plus new additions.
   




> Overhaul implict type casting logic
> ---
>
> Key: DRILL-8136
> URL: https://issues.apache.org/jira/browse/DRILL-8136
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Esther Buchwalter
>Assignee: James Turton
>Priority: Minor
>
> The existing implicit casting system is built on simplistic total ordering of 
> data types[1] that yields oddities such as TINYINT being regarded as the 
> closest numeric type to VARCHAR or DATE the closest type to FLOAT8. This, in 
> turn, hurts the range of data types with which SQL functions can be used. 
> E.g. `select sqrt('3.1415926')` works in many RDBMSes but not in Drill while, 
> confusingly, `select '123' + 456` does work in Drill. In addition the 
> limitations of the existing type precedence list mean that it has been 
> supplemented with ad hoc secondary casting rules that go in the opposite
> direction.
> This Issue proposes a new, more flexible definition of casting distance based 
> on a weighted directed graph built over the Drill data types.
> [1] 
> [https://drill.apache.org/docs/supported-data-types/#implicit-casting-precedence-of-data-types]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599125#comment-17599125
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre merged PR #2634:
URL: https://github.com/apache/drill/pull/2634




> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599123#comment-17599123
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

pjfanning commented on PR #2634:
URL: https://github.com/apache/drill/pull/2634#issuecomment-1234687747

   @cgivre lgtm




> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17599121#comment-17599121
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre commented on PR #2634:
URL: https://github.com/apache/drill/pull/2634#issuecomment-1234683441

   @pjfanning are we ready to merge this?




> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8290) Short cut recursive file listings for LIMIT 0 queries

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598995#comment-17598995
 ] 

ASF GitHub Bot commented on DRILL-8290:
---

jnturton commented on PR #2636:
URL: https://github.com/apache/drill/pull/2636#issuecomment-1234361083

   @vvysotskyi I did spot one [other recursive file 
listing](https://github.com/jnturton/drill/blob/65fb7ddc144ecae5330c9325af63010748f74cdf/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java#L376)
 that could possibly use the short cut in this PR if I propagate a `limit0` 
flag down to it.
   
   It appears to be invoked only if there are Parquet files present at the top 
level of the queried path, which I don't think should be too common for big 
datasets since data files are generally only present at the leaves of the 
directory tree. So I thought I'd ask whether you think it's worth trying to 
implement the single file short cut here too, or whether we should just leave it alone?




> Short cut recursive file listings for LIMIT 0 queries
> -
>
> Key: DRILL-8290
> URL: https://issues.apache.org/jira/browse/DRILL-8290
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.20.2
>Reporter: James Turton
>Priority: Minor
> Fix For: 2.0.0
>
>
> The existing LIMIT 0 query optimisations do not prevent a query run against 
> the top of a deep DFS directory tree from recursively listing FileStatuses 
> for everything within it using a pool of worker threads. This Issue proposes 
> a new optimisation whereby such queries will recurse into the directory tree 
> on a single thread that returns as soon as any single FileStatus has been 
> obtained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598896#comment-17598896
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r960531148


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);
+  }

Review Comment:
   It's gone now.





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-09-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598882#comment-17598882
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on PR #2632:
URL: https://github.com/apache/drill/pull/2632#issuecomment-1234106371

   I just went through an exercise to replace the `boolean recursive` parameter 
with a new `RecursionOpts recurOpts` parameter that allows the max listing size 
to be specified, but that change rippled out and in some cases the callers _also_ 
don't have access to the Drill config. I now think the only reasonable way for this 
limit to reach the file listing utility classes is for it to become an env var.
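
   However the limit value ends up reaching the listing code, the check itself that DRILL-8283 proposes is simple: count the statuses gathered so far and fail fast once the configured maximum is exceeded. A hedged sketch follows; it is not the actual FileSystemUtil code and the class, interface and exception names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: abort a recursive listing once it exceeds a configured
// maximum size instead of exhausting heap on a huge workspace.
class RecursiveListingSketch {
  static class ListingLimitExceededException extends RuntimeException {
    ListingLimitExceededException(String message) { super(message); }
  }

  // Toy stand-in for a file system; the real code works with Hadoop FileStatus.
  interface Workspace {
    List<String> children(String path);   // immediate children of a directory
    boolean isDirectory(String path);
  }

  static List<String> listRecursively(Workspace ws, String root, int maxSize) {
    List<String> result = new ArrayList<>();
    collect(ws, root, maxSize, result);
    return result;
  }

  private static void collect(Workspace ws, String path, int maxSize, List<String> result) {
    for (String child : ws.children(path)) {
      result.add(child);
      if (result.size() > maxSize) {
        // Mirrors the proposal: terminate the initiating operation with an error
        // rather than letting the listing grow without bound.
        throw new ListingLimitExceededException(
            "Listing exceeded the configured maximum of " + maxSize + " files");
      }
      if (ws.isDirectory(child)) {
        collect(ws, child, maxSize, result);
      }
    }
  }
}
```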




> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598720#comment-17598720
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r960184271


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the 
all the punctuation and returns
+   * that pattern.  Spaces are replaced with an underscore.
+   * 
+   * Usage: SELECT punctuation_pattern( string ) FROM...
+   */
+  @FunctionTemplate(names = {"punctuation_pattern", "punctuationPattern"},
+scope = FunctionTemplate.FunctionScope.SIMPLE,
+nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class PunctuationPatternFunction implements DrillSimpleFunc {
+
+@Param
+VarCharHolder rawInput;
+
+@Output
+VarCharHolder out;
+
+@Inject
+DrillBuf buffer;
+
+@Override
+public void setup() {
+}
+
+@Override
+public void eval() {
+
+  String input = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(rawInput.start,
 rawInput.end, rawInput.buffer);
+
+  String punctuationPattern = input.replaceAll("[a-zA-Z0-9]", "");
+  punctuationPattern = punctuationPattern.replaceAll(" ", "_");
+
+  out.buffer = buffer;
+  out.start = 0;
+  out.end = punctuationPattern.getBytes().length;

Review Comment:
   Fixed





> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598721#comment-17598721
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre commented on PR #2634:
URL: https://github.com/apache/drill/pull/2634#issuecomment-1233689516

   Thanks @pjfanning for the review.  I addressed your review comments.




> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598719#comment-17598719
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r960183139


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the 
all the punctuation and returns

Review Comment:
   Fixed





> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8290) Short circuit recursive file listings for LIMIT 0 queries

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598408#comment-17598408
 ] 

ASF GitHub Bot commented on DRILL-8290:
---

jnturton opened a new pull request, #2636:
URL: https://github.com/apache/drill/pull/2636

   # [DRILL-8290](https://issues.apache.org/jira/browse/DRILL-8290): Short 
circuit recursive file listings for LIMIT 0 queries
   
   ## Description
   
   The existing LIMIT 0 query optimisations do not prevent a query run against 
the top of a deep DFS directory tree from recursively listing FileStatuses for 
everything within it using a pool of worker threads. This PR adds a new 
optimisation whereby such queries will recurse into the directory tree on a 
single thread that returns as soon as any single FileStatus has been obtained.
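
   A minimal sketch of the short-circuit idea is below. It is illustrative only, not the Drill implementation; the file system interface is a simplified stand-in for the Hadoop API the real code uses. For LIMIT 0 only the schema is needed, so a depth-first walk on the calling thread that stops at the first file found is enough.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch: instead of listing an entire directory tree with a pool
// of worker threads, recurse on a single thread and return as soon as any one
// file has been found. The SimpleFs interface is invented for the example.
class Limit0ListingSketch {
  interface SimpleFs {
    List<String> children(String dir);   // immediate children of a directory
    boolean isFile(String path);
  }

  // Depth-first search that short-circuits on the first file encountered.
  static Optional<String> firstFile(SimpleFs fs, String dir) {
    for (String child : fs.children(dir)) {
      if (fs.isFile(child)) {
        return Optional.of(child);
      }
      Optional<String> found = firstFile(fs, child);
      if (found.isPresent()) {
        return found;
      }
    }
    return Optional.empty();
  }
}
```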
   
   ## Documentation
   Mention in the docs on LIMIT 0 optimisations.
   
   ## Testing
   TODO
   




> Short circuit recursive file listings for LIMIT 0 queries
> -
>
> Key: DRILL-8290
> URL: https://issues.apache.org/jira/browse/DRILL-8290
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.20.2
>Reporter: James Turton
>Priority: Minor
> Fix For: 2.0.0
>
>
> The existing LIMIT 0 query optimisations do not prevent a query run against 
> the top of a deep DFS directory tree from recursively listing FileStatuses 
> for everything within it using a pool of worker threads. This Issue proposes 
> a new optimisation whereby such queries will recurse into the directory tree 
> on a single thread that returns as soon as any single FileStatus has been 
> obtained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8259) Support advanced HBase persistence storage options

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598227#comment-17598227
 ] 

ASF GitHub Bot commented on DRILL-8259:
---

Z0ltrix commented on code in PR #2596:
URL: https://github.com/apache/drill/pull/2596#discussion_r959257458


##
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/config/HBasePersistentStoreProvider.java:
##
@@ -20,116 +20,249 @@
 import java.io.IOException;
 import java.util.Map;
 
+import org.apache.drill.common.AutoCloseables;
 import org.apache.drill.common.exceptions.DrillRuntimeException;
 import org.apache.drill.exec.exception.StoreException;
 import org.apache.drill.exec.store.hbase.DrillHBaseConstants;
 import org.apache.drill.exec.store.sys.PersistentStore;
 import org.apache.drill.exec.store.sys.PersistentStoreConfig;
 import org.apache.drill.exec.store.sys.PersistentStoreRegistry;
 import 
org.apache.drill.exec.store.sys.store.provider.BasePersistentStoreProvider;
+import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
-import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HConstants;
-import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.TableName;
 import org.apache.hadoop.hbase.client.Admin;
+import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
 import org.apache.hadoop.hbase.client.Connection;
 import org.apache.hadoop.hbase.client.ConnectionFactory;
+import org.apache.hadoop.hbase.client.Durability;
 import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.client.TableDescriptor;
+import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
+import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
+import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
 import org.apache.hadoop.hbase.util.Bytes;
 
-import 
org.apache.drill.shaded.guava.com.google.common.annotations.VisibleForTesting;
-
 public class HBasePersistentStoreProvider extends BasePersistentStoreProvider {
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HBasePersistentStoreProvider.class);
 
-  static final byte[] FAMILY = Bytes.toBytes("s");
+  public static final byte[] DEFAULT_FAMILY_NAME = Bytes.toBytes("s");
 
-  static final byte[] QUALIFIER = Bytes.toBytes("d");
+  public static final byte[] QUALIFIER_NAME = Bytes.toBytes("d");
+
+  private static final String HBASE_CLIENT_ID = 
"drill-hbase-persistent-store-client";
 
   private final TableName hbaseTableName;
 
+  private final byte[] family;
+
+  private Table hbaseTable;
+
   private Configuration hbaseConf;
 
-  private Connection connection;
+  private final Map tableConfig;
 
-  private Table hbaseTable;
+  private final Map columnConfig;
 
+  private Connection connection;
+
+  @SuppressWarnings("unchecked")
   public HBasePersistentStoreProvider(PersistentStoreRegistry registry) {
-@SuppressWarnings("unchecked")
-final Map config = (Map) 
registry.getConfig().getAnyRef(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_CONFIG);
-this.hbaseConf = HBaseConfiguration.create();
-this.hbaseConf.set(HConstants.HBASE_CLIENT_INSTANCE_ID, 
"drill-hbase-persistent-store-client");
-if (config != null) {
-  for (Map.Entry entry : config.entrySet()) {
-this.hbaseConf.set(entry.getKey(), String.valueOf(entry.getValue()));
+final Map hbaseConfig = (Map) 
registry.getConfig().getAnyRef(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_CONFIG);
+if 
(registry.getConfig().hasPath(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_TABLE_CONFIG))
 {
+  tableConfig = (Map) 
registry.getConfig().getAnyRef(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_TABLE_CONFIG);
+} else {
+  tableConfig = Maps.newHashMap();
+}
+if 
(registry.getConfig().hasPath(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_COLUMN_CONFIG))
 {
+  columnConfig = (Map) 
registry.getConfig().getAnyRef(DrillHBaseConstants.SYS_STORE_PROVIDER_HBASE_COLUMN_CONFIG);
+} else {
+  columnConfig = Maps.newHashMap();
+}
+hbaseConf = HBaseConfiguration.create();

Review Comment:
   > As you know, HBase is a nightmare for operational services due to the 
complexity of the settings. The actual value in the above example is not a 
recommended value, no unique value is appropriate for every case, but is simply 
the type of value that this parameter has to fill, is "true/false", not "0/1".
   
   Hi @luocooong, I'm still worried about the defaults, especially when Drill 
creates the table on its own... 
   
   Am I correct that you don't set any defaults except 
SYS_STORE_PROVIDER_HBASE_TABLE, SYS_STORE_PROVIDER_HBASE_NAMESPACE and 
SYS_STORE_PROVIDER_HBASE_FAMILY?
   

[jira] [Commented] (DRILL-8259) Support advanced HBase persistence storage options

2022-08-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598224#comment-17598224
 ] 

ASF GitHub Bot commented on DRILL-8259:
---

Z0ltrix commented on PR #2596:
URL: https://github.com/apache/drill/pull/2596#issuecomment-1232571619

   > @Z0ltrix Would you mind doing a formal review on this PR? @luocooong asked 
me but I don't really have enough experience with HBase to comment 
intelligently on this. If you're already happy with this, all you have to do is 
leave a `+1`.
   
   Sorry for the late response, I would love to do the review :)




> Support advanced HBase persistence storage options
> --
>
> Key: DRILL-8259
> URL: https://issues.apache.org/jira/browse/DRILL-8259
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HBase
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
> Fix For: 2.0.0
>
>
> Its contents are as follows
> {code:java}
> sys.store.provider: {
>   class: "org.apache.drill.exec.store.hbase.config.HBasePStoreProvider",
>   hbase: {
>     table : "drill_store",
>     config: {
>       "hbase.zookeeper.quorum": "zk_host3,zk_host2,zk_host1",
>       "hbase.zookeeper.property.clientPort": "2181",
>       "zookeeper.znode.parent": "/hbase-test"
>     },
>     table_config : {
>       "durability": "ASYNC_WAL",
>       "compaction_enabled": false,
>       "split_enabled": false,
>       "max_filesize": 10737418240,
>       "memstore_flushsize": 536870912
>     },
>     column_config : {
>       "versions": 1,
>       "ttl": 2626560,
>       "compression": "SNAPPY",
>       "blockcache": true,
>       "blocksize": 131072,
>       "data_block_encoding": "FAST_DIFF",
>       "in_memory": true,
>       "dfs_replication": 3
>     }
>   }
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597394#comment-17597394
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792833


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the 
all the punctuation and returns

Review Comment:
   `the all the` should probably be `all the`





> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597393#comment-17597393
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792365


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the 
all the punctuation and returns
+   * that pattern.  Spaces are replaced with an underscore.
+   * 
+   * Usage: SELECT punctuation_pattern( string ) FROM...
+   */
+  @FunctionTemplate(names = {"punctuation_pattern", "punctuationPattern"},
+scope = FunctionTemplate.FunctionScope.SIMPLE,
+nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class PunctuationPatternFunction implements DrillSimpleFunc {
+
+@Param
+VarCharHolder rawInput;
+
+@Output
+VarCharHolder out;
+
+@Inject
+DrillBuf buffer;
+
+@Override
+public void setup() {
+}
+
+@Override
+public void eval() {
+
+  String input = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(rawInput.start,
 rawInput.end, rawInput.buffer);
+
+  String punctuationPattern = input.replaceAll("[a-zA-Z0-9]", "");
+  punctuationPattern = punctuationPattern.replaceAll(" ", "_");
+
+  out.buffer = buffer;
+  out.start = 0;
+  out.end = punctuationPattern.getBytes().length;

Review Comment:
   getBytes is safer if you specify a charset, otherwise you get the JVM 
default which differs from machine to machine (unless Drill startup shell 
scripts specify `-Dfile.encoding=...`)
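
   A self-contained illustration of the reviewer's point (assumed example values, not the UDF's actual code): with an explicit charset the byte length is deterministic across machines.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the suggested change: pass the charset explicitly rather than
// relying on the JVM's platform default encoding.
class CharsetExample {
  public static void main(String[] args) {
    String punctuationPattern = "___.:!";
    byte[] bytes = punctuationPattern.getBytes(StandardCharsets.UTF_8);
    System.out.println(bytes.length);  // 6, regardless of -Dfile.encoding
  }
}
```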





> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597327#comment-17597327
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre merged PR #2633:
URL: https://github.com/apache/drill/pull/2633




> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597256#comment-17597256
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957489232


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -129,6 +135,44 @@ private boolean parseInnerLevel(TokenIterator tokenizer, 
int level) throws Messa
 return parseToElement(tokenizer, level + 1);
   }
 
+  /**
+   * This function is called when a storage plugin needs to retrieve values 
which have been read.  This logic
+   * enables use of the data path in these situations.  Normally, when the 
datapath is defined, the JSON reader
+   * will "free-wheel" over unprojected columns or columns outside of the 
datapath.  However, in this case, often
+   * the values which are being read, are outside the dataPath.  This logic 
offers a way to capture these values
+   * without creating a ValueVector for them.
+   *
+   * @param tokenizer A {@link TokenIterator} of the parsed JSON data.
+   * @param fieldName A {@link String} of the pagination field name.

Review Comment:
   ```suggestion
  * @param fieldName A {@link String} of the listener column name.
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/loader/TupleParser.java:
##
@@ -127,10 +127,19 @@ public TupleParser(JsonLoaderImpl loader, TupleWriter 
tupleWriter, TupleMetadata
 
   @Override
   public ElementParser onField(String key, TokenIterator tokenizer) {
-if (!tupleWriter.isProjected(key)) {
+if (projectField(key)) {
+  return fieldParserFor(key, tokenizer);
+} else {
   return fieldFactory().ignoredFieldParser();
+}
+  }
+
+  private boolean projectField(String key) {
+// This method makes sure that fields necessary for pagination are read.

Review Comment:
   ```suggestion
   // This method makes sure that fields necessary for column listeners are 
read.
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,30 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the column listener map.
+   * This data is only stored if the listener column map is defined, and has 
keys.
+   * @param key The key of the pagination field

Review Comment:
   ```suggestion
  * @param key The key of the listener field
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,30 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the column listener map.
+   * This data is only stored if the listener column map is defined, and has 
keys.
+   * @param key The key of the pagination field
+   * @param value The value of to be retained
+   */
+  protected void addValueToListenerMap(String key, String value) {
+Map listenerColumnMap = loader.listenerColumnMap();
+
+if (listenerColumnMap == null || listenerColumnMap.isEmpty()) {
+  return;
+} else if (listenerColumnMap.containsKey(key) && 
StringUtils.isNotEmpty(value)) {
+  listenerColumnMap.put(key, value);
+}
+  }
+
+  protected void addValueToListenerMap(String key, Object value) {
+Map paginationMap = loader.listenerColumnMap();

Review Comment:
   ```suggestion
   Map listenerMap = loader.listenerColumnMap();
   ```





> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597242#comment-17597242
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957448436


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,33 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the pagination map.  This is 
necessary for the HTTP plugin
+   * for index or keyset pagination where the API transmits values in the 
results that are used to
+   * generate the next page.
+   *
+   * This data is only stored if the pagination map is defined, and has keys.

Review Comment:
   Done!



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -66,11 +68,13 @@
 public class SimpleMessageParser implements MessageParser {
 
   private final String[] path;
+  private final Map paginationFields;
 
-  public SimpleMessageParser(String dataPath) {
+  public SimpleMessageParser(String dataPath, Map 
paginationFields) {

Review Comment:
   Done!





> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597234#comment-17597234
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957433471


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,33 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the pagination map.  This is 
necessary for the HTTP plugin
+   * for index or keyset pagination where the API transmits values in the 
results that are used to
+   * generate the next page.
+   *
+   * This data is only stored if the pagination map is defined, and has keys.

Review Comment:
   Can this be rewritten in terms of generic column listeners rather than 
pagination and the HTTP plugin?



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -66,11 +68,13 @@
 public class SimpleMessageParser implements MessageParser {
 
   private final String[] path;
+  private final Map paginationFields;
 
-  public SimpleMessageParser(String dataPath) {
+  public SimpleMessageParser(String dataPath, Map 
paginationFields) {

Review Comment:
   Can we rename "pagination" here too?





> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597229#comment-17597229
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

jnturton merged PR #2630:
URL: https://github.com/apache/drill/pull/2630




> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597230#comment-17597230
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

jnturton merged PR #2635:
URL: https://github.com/apache/drill/pull/2635




> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597219#comment-17597219
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230390598

   @jnturton Thanks for the quick review!  I addressed your comments. I 
actually reinserted the commented-out block as it was intended to make sure 
that the user properly populates the pagination fields. Not sure why I 
commented it out in the first place.




> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597213#comment-17597213
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230366234

   > I'm not sure that the concept of pagination from the HTTP plugin should 
spill into the JSON reader. Can you abstract it, e.g. by renaming paginationMap 
to, say, listenerColumnMap?
   
   Fixed. 




> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597206#comment-17597206
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957380889


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPaginatorConfig.java:
##
@@ -137,21 +162,28 @@ public String toString() {
   .field("pageSize", pageSize)
   .field("maxRecords", maxRecords)
   .field("method", method)
+  .field("indexParam", indexParam)
+  .field("hasMoreParam", hasMoreParam)
+  .field("nextPageParam", nextPageParam)
   .toString();
   }
 
   public enum PaginatorMethod {
 OFFSET,
-PAGE
+PAGE,
+INDEX
   }
 
-  private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorBuilder 
builder) {
+  /*private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorConfigBuilder 
builder) {

Review Comment:
   Oops... Fixed.





> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597178#comment-17597178
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957255881


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPaginatorConfig.java:
##
@@ -137,21 +162,28 @@ public String toString() {
   .field("pageSize", pageSize)
   .field("maxRecords", maxRecords)
   .field("method", method)
+  .field("indexParam", indexParam)
+  .field("hasMoreParam", hasMoreParam)
+  .field("nextPageParam", nextPageParam)
   .toString();
   }
 
   public enum PaginatorMethod {
 OFFSET,
-PAGE
+PAGE,
+INDEX
   }
 
-  private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorBuilder 
builder) {
+  /*private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorConfigBuilder 
builder) {

Review Comment:
   Is this commented out code meant to be included?





> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597174#comment-17597174
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

jnturton commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230204078

   I'm not sure that the concept of pagination from the HTTP plugin should 
spill into the JSON reader. Can you abstract it, e.g. by renaming paginationMap 
to, say, listenerColumnMap?  




> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597084#comment-17597084
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

jnturton opened a new pull request, #2635:
URL: https://github.com/apache/drill/pull/2635

   # [DRILL-8282](https://issues.apache.org/jira/browse/DRILL-8282): Update 
hadoop.dll and winutils.exe to 3.2.4.
   
   ## Description
   
   Completes #2630 by updating hadoop.dll and winutils.exe to 3.2.4.
   
   ## Documentation
   N/A
   
   ## Testing
   Launch Drill on Windows.
   




> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596988#comment-17596988
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

jnturton commented on PR #2630:
URL: https://github.com/apache/drill/pull/2630#issuecomment-1229820379

   We also need to update hadoop.dll and winutils.exe.




> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) Upgrade to hadoop-common 3.2.4 due to CVE

2022-08-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596963#comment-17596963
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

cgivre commented on PR #2630:
URL: https://github.com/apache/drill/pull/2630#issuecomment-1229759970

   @jnturton Are we good to merge this?




> Upgrade to hadoop-common 3.2.4 due to CVE 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8289) Add Threat Hunting Functions

2022-08-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596962#comment-17596962
 ] 

ASF GitHub Bot commented on DRILL-8289:
---

cgivre opened a new pull request, #2634:
URL: https://github.com/apache/drill/pull/2634

   # [DRILL-8289](https://issues.apache.org/jira/browse/DRILL-8289): Add Threat 
Hunting Functions
   
   ## Description
   See below.
   
   ## Documentation
   These functions are useful for doing threat hunting with Apache Drill.  
These were inspired by huntlib.[1]
   
   The functions are: 
   * `punctuation_pattern()`:  Extracts the pattern of punctuation in 
text.
   * `entropy()`: This function calculates the Shannon Entropy of a 
given string of text.
   * `entropyPerByte()`: This function calculates the Shannon Entropy 
of a given string of text, normed for the string length.
   
   [1]: https://github.com/target/huntlib
   
   ## Testing
   Added unit tests.
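
   For readers unfamiliar with the entropy functions listed above, here is a hedged sketch of a Shannon entropy calculation over the characters of a string. It is illustrative only, not the Drill UDF or the huntlib code; the per-byte variant described above would additionally normalise by the string length.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: Shannon entropy of a string, in bits per character.
class EntropySketch {
  static double entropy(String text) {
    if (text == null || text.isEmpty()) {
      return 0.0;
    }
    Map<Character, Integer> counts = new HashMap<>();
    for (char c : text.toCharArray()) {
      counts.merge(c, 1, Integer::sum);
    }
    double entropy = 0.0;
    for (int count : counts.values()) {
      double p = (double) count / text.length();
      entropy -= p * (Math.log(p) / Math.log(2));  // log base 2
    }
    return entropy;
  }

  public static void main(String[] args) {
    System.out.println(entropy("aaaa"));       // 0.0: a single repeated symbol
    System.out.println(entropy("abab"));       // 1.0: two equally likely symbols
    System.out.println(entropy("r4nd0m-ish")); // higher for more varied text
  }
}
```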




> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-4232) Support for EXCEPT set operator

2022-08-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17586135#comment-17586135
 ] 

ASF GitHub Bot commented on DRILL-4232:
---

Leon-WTF commented on PR #2599:
URL: https://github.com/apache/drill/pull/2599#issuecomment-1229365001

   > @Leon-WTF Is this ready for review?
   
   @cgivre Not yet. I'm handling the EXCEPT case, which needs to remove duplicate 
records from the probe side, so I'm trying to add an Agg phase after the setop 
phase. The Agg phase needs a flag indicating that it should group by all columns, 
since the columns cannot be known at planning time. Any suggestions on this?
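
For context on why the probe-side deduplication is needed: EXCEPT (as opposed to
EXCEPT ALL) has set semantics, so each qualifying left-side row appears at most once
in the result. The toy sketch below illustrates only those semantics, not the planner
change being discussed.

```
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExceptSemanticsDemo {
  // EXCEPT (DISTINCT) semantics: distinct rows of the left input that do not
  // appear in the right input. Left-side duplicates must be collapsed, which
  // is what the proposed post-setop aggregation would do.
  static Set<String> except(List<String> left, List<String> right) {
    Set<String> result = new LinkedHashSet<>(left); // dedup the probe side
    result.removeAll(right);
    return result;
  }

  public static void main(String[] args) {
    List<String> left = Arrays.asList("a", "a", "b", "c");
    List<String> right = Arrays.asList("b");
    System.out.println(except(left, right)); // [a, c] -- only one "a" survives
  }
}
```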
   
   
   




> Support for EXCEPT set operator
> ---
>
> Key: DRILL-4232
> URL: https://issues.apache.org/jira/browse/DRILL-4232
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Victoria Markman
>Assignee: Tengfei Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-4232) Support for EXCEPT set operator

2022-08-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17586130#comment-17586130
 ] 

ASF GitHub Bot commented on DRILL-4232:
---

cgivre commented on PR #2599:
URL: https://github.com/apache/drill/pull/2599#issuecomment-1229353429

   @Leon-WTF Is this ready for review?  




> Support for EXCEPT set operator
> ---
>
> Key: DRILL-4232
> URL: https://issues.apache.org/jira/browse/DRILL-4232
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Reporter: Victoria Markman
>Assignee: Tengfei Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584917#comment-17584917
 ] 

ASF GitHub Bot commented on DRILL-8287:
---

cgivre opened a new pull request, #2633:
URL: https://github.com/apache/drill/pull/2633

   # [DRILL-8287](https://issues.apache.org/jira/browse/DRILL-8287): Add 
Support for Keyset Based Pagination
   
   ## Description
   Some APIs such as HubSpot use values in the result set to indicate whether 
there are additional pages.  This PR adds support for this kind of pagination.  
Note that the current implementation only works for JSON-based APIs.
   
   This PR also addresses 
[DRILL-8286](https://issues.apache.org/jira/browse/DRILL-8286), which is a 
minor bugfix for the GoogleSheets config. 
   
   ## Documentation
   Updated Pagination.md.
   
   ## Testing
   Added unit tests and manually tested against Hubspot API.
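
As background, keyset (cursor) pagination generally follows the loop sketched below:
each response carries a token identifying the next page, and paging stops when the
token disappears. The endpoint, the `after` parameter and the `extractNextCursor()`
helper are hypothetical and are not part of the Drill HTTP plugin's API.

```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KeysetPaginationSketch {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    String cursor = null;
    do {
      // The cursor from the previous page is passed back on the next request.
      String url = "https://api.example.com/items?limit=100"
          + (cursor == null ? "" : "&after=" + cursor);
      HttpResponse<String> resp = client.send(
          HttpRequest.newBuilder(URI.create(url)).GET().build(),
          HttpResponse.BodyHandlers.ofString());
      // ... process resp.body() here ...
      cursor = extractNextCursor(resp.body()); // null on the last page
    } while (cursor != null);
  }

  // Hypothetical helper: pull the "after" cursor value out of a JSON body.
  static String extractNextCursor(String json) {
    String key = "\"after\":\"";
    int i = json.indexOf(key);
    if (i < 0) {
      return null;
    }
    int start = i + key.length();
    int end = json.indexOf('"', start);
    return end < 0 ? null : json.substring(start, end);
  }
}
```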




> Add Support for Keyset Based Pagination
> ---
>
> Key: DRILL-8287
> URL: https://issues.apache.org/jira/browse/DRILL-8287
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HTTP
>Affects Versions: 1.20.2
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Some APIs such as HubSpot use values in the result set to indicate whether 
> there are additional pages.  This PR adds support for this kind of 
> pagination.  Note that current implementation only works for JSON based APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584825#comment-17584825
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954914617


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);
+  }

Review Comment:
   That it might be a heavyweight duplication of work was bothering me enough 
that I went and timed it. It takes about 100 ms when I start embedded Drill 
locally. That's just enough to make me wonder if it's worth trying to redesign 
this stuff so that it loads from an existing instance of DrillConfig instead of 
constructing its own.
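
One possible shape for such a redesign, assuming the bootstrap code can hand over the
value it already read from its DrillConfig, is sketched below. The class and method
names are illustrative and are not the actual FileSystemUtil code.

```
// Sketch: let code that already holds a DrillConfig push the limit in once,
// instead of FileSystemUtil building a fresh DrillConfig in a static block.
public final class ListingLimitHolder {
  private static volatile Integer maxSize; // null until initialized

  private ListingLimitHolder() {}

  // Called once during bootstrap by code that already has the config loaded.
  public static void init(int limitFromConfig) {
    maxSize = limitFromConfig;
  }

  public static int maxSize() {
    Integer v = maxSize;
    return v != null ? v : 0; // 0 meaning "no limit", per the PR discussion
  }
}
```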





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584753#comment-17584753
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954781871


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -302,12 +332,32 @@ protected List compute() {
   List tasks = new ArrayList<>();
 
   try {
-for (FileStatus status : fs.listStatus(path, filter)) {
+FileStatus[] dirFs = fs.listStatus(path, filter);
+if (recursiveListingMaxSize > 0 && fileCounter.addAndGet(dirFs.length) 
> recursiveListingMaxSize) {
+  throw UserException

Review Comment:
   @vvysotskyi I've added an attempt to do that now.





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584726#comment-17584726
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

vvysotskyi commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954723117


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -302,12 +332,32 @@ protected List compute() {
   List tasks = new ArrayList<>();
 
   try {
-for (FileStatus status : fs.listStatus(path, filter)) {
+FileStatus[] dirFs = fs.listStatus(path, filter);
+if (recursiveListingMaxSize > 0 && fileCounter.addAndGet(dirFs.length) 
> recursiveListingMaxSize) {
+  throw UserException

Review Comment:
   This code is executed within a fork-join pool. Can we somehow stop executing 
all tasks once the count is reached?





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584725#comment-17584725
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

vvysotskyi commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954723117


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -302,12 +332,32 @@ protected List compute() {
   List tasks = new ArrayList<>();
 
   try {
-for (FileStatus status : fs.listStatus(path, filter)) {
+FileStatus[] dirFs = fs.listStatus(path, filter);
+if (recursiveListingMaxSize > 0 && fileCounter.addAndGet(dirFs.length) 
> recursiveListingMaxSize) {
+  throw UserException

Review Comment:
   This code is executed within a fork-join pool, and if the error suppression 
flag is enabled, it will call task.fork(). Can we somehow stop executing all 
tasks once the count is reached?
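
One way to achieve an early stop, assuming all listing tasks share a counter, is to
set a shared flag once the limit is exceeded so that sibling and child tasks return
immediately instead of forking further work. The sketch below only illustrates that
pattern and is not the FileSystemUtil implementation.

```
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyStopListing extends RecursiveAction {
  private static final int LIMIT = 10_000;
  private static final AtomicInteger counter = new AtomicInteger();
  private static final AtomicBoolean tripped = new AtomicBoolean();

  private final int filesInThisDir;

  EarlyStopListing(int filesInThisDir) {
    this.filesInThisDir = filesInThisDir;
  }

  @Override
  protected void compute() {
    if (tripped.get()) {
      return; // another task already hit the limit; do no further work
    }
    if (counter.addAndGet(filesInThisDir) > LIMIT) {
      tripped.set(true); // siblings and children will now return immediately
      return;            // the caller can then report the limit violation
    }
    // ... list subdirectories and fork child tasks here ...
  }

  public static void main(String[] args) {
    ForkJoinPool.commonPool().invoke(new EarlyStopListing(500));
  }
}
```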





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584701#comment-17584701
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954644543


##
exec/java-exec/src/main/resources/drill-module.conf:
##
@@ -115,7 +115,8 @@ drill.exec: {
   text: {
 buffer.size: 262144,
 batch.size: 4000
-  }
+  },
+  recursive_listing_max_size: 1

Review Comment:
   A limit of 0 (or less) now means no limit, and the default for this PR is now 0.





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584463#comment-17584463
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

vvysotskyi commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954278444


##
exec/java-exec/src/main/resources/drill-module.conf:
##
@@ -115,7 +115,8 @@ drill.exec: {
   text: {
 buffer.size: 262144,
 batch.size: 4000
-  }
+  },
+  recursive_listing_max_size: 1

Review Comment:
   Yes, the default value should be adjusted. In the big data world, thousands 
of files are quite a small amount. For non-Parquet files a FileStatus is small, 
so it shouldn't put much pressure on memory. For Parquet files, it would be good 
to provide functionality to disable reading metadata during planning and use it 
only during execution, to avoid issues with a huge number of files.





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584449#comment-17584449
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

pjfanning commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954255629


##
exec/java-exec/src/main/resources/drill-module.conf:
##
@@ -115,7 +115,8 @@ drill.exec: {
   text: {
 buffer.size: 262144,
 batch.size: 4000
-  }
+  },
+  recursive_listing_max_size: 1

Review Comment:
   My 2 cents is that limits ideally should be set by default to a sensible 
level. For Drill 2.0.0, enforcing that some sort of limit is set would be 
something that I'd support. For Drill 1.x, it would not be a good idea to 
enforce limits by default but supporting them optionally would be useful (to 
avoid introducing changes that might force users to tune configs in a minor 
release).





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584429#comment-17584429
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

vvysotskyi commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r954203172


##
exec/java-exec/src/main/resources/drill-module.conf:
##
@@ -115,7 +115,8 @@ drill.exec: {
   text: {
 buffer.size: 262144,
 batch.size: 4000
-  }
+  },
+  recursive_listing_max_size: 1

Review Comment:
   Could you please make this limit optional?





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584296#comment-17584296
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

cgivre commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r953900054


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);
+  }

Review Comment:
   This looks wonky but correct.





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584048#comment-17584048
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r953428323


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);
+  }

Review Comment:
   This route to the config option felt pretty weird; I don't know if there's a 
better way?





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584052#comment-17584052
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r953428323


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);
+  }

Review Comment:
   This route to the config option felt pretty weird, I don't know if there's a 
better way?





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584047#comment-17584047
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton commented on code in PR #2632:
URL: https://github.com/apache/drill/pull/2632#discussion_r953427756


##
exec/java-exec/src/main/java/org/apache/drill/exec/util/FileSystemUtil.java:
##
@@ -42,6 +47,12 @@ public class FileSystemUtil {
 
   private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(FileSystemUtil.class);
 
+  private static int recursiveListingMaxSize;
+
+  static {
+recursiveListingMaxSize = 
DrillConfig.create().getInt(ExecConstants.RECURSIVE_FILE_LISTING_MAX_SIZE);

Review Comment:
   This route to the config option felt pretty weird; I don't know if there's a 
better way?





> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8283) Add a configurable recursive file listing size limit

2022-08-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584043#comment-17584043
 ] 

ASF GitHub Bot commented on DRILL-8283:
---

jnturton opened a new pull request, #2632:
URL: https://github.com/apache/drill/pull/2632

   # [DRILL-8283](https://issues.apache.org/jira/browse/DRILL-8283): Add a 
configurable recursive file listing size limit
   
   ## Description
   
   Currently a malicious or merely unwitting user can crash their Drill foreman 
by sending
   ```
   select * from dfs.huge_workspace limit 10
   ```
   causing the query planner to recurse over every file in huge_workspace and 
culminating in
   ```
   2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. 
Information message: Unable to handle out of memory condition in 
Foreman.java.lang.OutOfMemoryError: null 
   ```
   if there are enough files in huge_workspace. A SHOW FILES command can 
produce the same effect. This issue proposes a new BOOT option named 
drill.exec.storage.file.recursive_listing_max_size with a default value of, say 
10 000. If a file listing task exceeds this limit then the initiating operation 
is terminated with a UserException preventing runaway resource usage.
   
   ## Documentation
   New entry on https://drill.apache.org/docs/start-up-options/
   
   ## Testing
   FileSystemUtilTest#testRecursiveListingMaxSize
   




> Add a configurable recursive file listing size limit
> 
>
> Key: DRILL-8283
> URL: https://issues.apache.org/jira/browse/DRILL-8283
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> Currently a malicious or merely unwitting user can crash their Drill foreman 
> by sending
> {code:java}
> select * from dfs.huge_workspace limit 10
> {code}
> causing the query planner to recurse over every file in huge_workspace and 
> culminating in
> {code:java}
> 2022-08-09 15:13:22,251 [1d0da29f-e50c-fd51-43d9-8a5086d52c4e:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.java.lang.OutOfMemoryError: null {code}
> if there are enough files in huge_workspace. A SHOW FILES command can produce 
> the same effect. This issue proposes a new BOOT option named 
> drill.exec.storage.file.recursive_listing_max_size with a default value of, 
> say 10 000. If a file listing task exceeds this limit then the initiating 
> operation is terminated with a UserException preventing runaway resource 
> usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-7856) Add lgtm badge to Drill and fix alerts

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583648#comment-17583648
 ] 

ASF GitHub Bot commented on DRILL-7856:
---

cgivre closed pull request #2187: DRILL-7856 Add lgtm badge to Drill and fix 
alerts
URL: https://github.com/apache/drill/pull/2187




> Add lgtm badge to Drill and fix alerts
> --
>
> Key: DRILL-7856
> URL: https://issues.apache.org/jira/browse/DRILL-7856
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Vitalii Diravka
>Priority: Trivial
>  Labels: badge, github
>
> Consider adding new badges to Drill github, for instance _lgtm_ badges (code 
> quality and alerts number):
> [https://lgtm.com/projects/g/apache/drill/context:java]
> As an example please check:
> [https://github.com/kaitoy/pcap4j]
> As a separate ticket can be considered decreasing the number of alerts of 
> Drill project:
> https://lgtm.com/projects/g/apache/drill/alerts/?mode=list



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-7856) Add lgtm badge to Drill and fix alerts

2022-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583647#comment-17583647
 ] 

ASF GitHub Bot commented on DRILL-7856:
---

cgivre commented on PR #2187:
URL: https://github.com/apache/drill/pull/2187#issuecomment-1224126726

   LGTM is shutting down in December 2022.  
https://github.blog/2022-08-15-the-next-step-for-lgtm-com-github-code-scanning/




> Add lgtm badge to Drill and fix alerts
> --
>
> Key: DRILL-7856
> URL: https://issues.apache.org/jira/browse/DRILL-7856
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.18.0
>Reporter: Vitalii Diravka
>Priority: Trivial
>  Labels: badge, github
>
> Consider adding new badges to Drill github, for instance _lgtm_ badges (code 
> quality and alerts number):
> [https://lgtm.com/projects/g/apache/drill/context:java]
> As an example please check:
> [https://github.com/kaitoy/pcap4j]
> As a separate ticket can be considered decreasing the number of alerts of 
> Drill project:
> https://lgtm.com/projects/g/apache/drill/alerts/?mode=list



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8282) upgrade to hadoop-common 3.2.4 due to cve

2022-08-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583216#comment-17583216
 ] 

ASF GitHub Bot commented on DRILL-8282:
---

pjfanning opened a new pull request, #2630:
URL: https://github.com/apache/drill/pull/2630

   ## Description
   
   There is a CVE fix in hadoop 3.2.4
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> upgrade to hadoop-common 3.2.4 due to cve 
> --
>
> Key: DRILL-8282
> URL: https://issues.apache.org/jira/browse/DRILL-8282
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>
> https://github.com/advisories/GHSA-8wm5-8h9c-47pc
> * this change requires some reload4j dependency changes too - see broken 
> build - https://github.com/apache/drill/pull/2628



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8281) Info schema LIKE with ESCAPE push down bug

2022-08-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582725#comment-17582725
 ] 

ASF GitHub Bot commented on DRILL-8281:
---

jnturton merged PR #2627:
URL: https://github.com/apache/drill/pull/2627




> Info schema LIKE with ESCAPE push down bug
> --
>
> Key: DRILL-8281
> URL: https://issues.apache.org/jira/browse/DRILL-8281
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> DRILL-8057 brought in a regression whereby info schema LIKE patterns 
> containing an escape character are not correctly processed. For example if a 
> storage plugin called dfs_foo (note the presence of the special '_') is 
> present then the following query wrongly returns no records.
> {code:java}
> apache drill> show databases where schema_name like 'dfs^_foo.%' escape '^';
> No rows selected (2.305 seconds){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8281) Info schema LIKE with ESCAPE push down bug

2022-08-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581968#comment-17581968
 ] 

ASF GitHub Bot commented on DRILL-8281:
---

jnturton opened a new pull request, #2627:
URL: https://github.com/apache/drill/pull/2627

   # [DRILL-8281](https://issues.apache.org/jira/browse/DRILL-8281): Info 
schema LIKE with ESCAPE push down bug
   
   ## Description
   [DRILL-8057](https://issues.apache.org/jira/browse/DRILL-8057) brought in a 
regression whereby info schema LIKE patterns containing an escape character are 
not correctly processed. For example if a storage plugin called dfs_foo (note 
the presence of the special '_') is present then the following query wrongly 
returns no records.
   ```
   apache drill> show databases where schema_name like 'dfs^_foo.%' escape '^';
   No rows selected (2.305 seconds)
   ```
   This PR makes schema path prefix comparison use 
RegexpUtil.SqlPatternInfo#getSimplePatternString when comparing prefixes so 
that escape characters are correctly processed.
   
   ## Documentation
   N/A
   
   ## Testing
   TestInfoSchema#likePatternWithEscapeChar
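
To illustrate the idea without reproducing Drill's actual RegexpUtil logic, the
sketch below extracts the literal prefix of a LIKE pattern while honouring the
ESCAPE character, so that `'dfs^_foo.%' escape '^'` yields the literal prefix
`dfs_foo.` rather than stopping at the escaped underscore.

```
public class LikePrefixDemo {
  // Returns the literal prefix of a LIKE pattern, i.e. everything before the
  // first unescaped wildcard, with escaped characters taken literally.
  static String literalPrefix(String pattern, char escape) {
    StringBuilder prefix = new StringBuilder();
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (c == escape && i + 1 < pattern.length()) {
        prefix.append(pattern.charAt(++i)); // escaped char is a literal
      } else if (c == '%' || c == '_') {
        break; // unescaped wildcard ends the literal prefix
      } else {
        prefix.append(c);
      }
    }
    return prefix.toString();
  }

  public static void main(String[] args) {
    System.out.println(literalPrefix("dfs^_foo.%", '^')); // dfs_foo.
  }
}
```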
   




> Info schema LIKE with ESCAPE push down bug
> --
>
> Key: DRILL-8281
> URL: https://issues.apache.org/jira/browse/DRILL-8281
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
>
> DRILL-8057 brought in a regression whereby info schema LIKE patterns 
> containing an escape character are not correctly processed. For example if a 
> storage plugin called dfs_foo (note the presence of the special '_') is 
> present then the following query wrongly returns no records.
> {code:java}
> apache drill> show databases where schema_name like 'dfs^_foo.%' escape '^';
> No rows selected (2.305 seconds){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8280) Cannot ANALYZE files containing non-ASCII column names

2022-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580927#comment-17580927
 ] 

ASF GitHub Bot commented on DRILL-8280:
---

cgivre merged PR #2625:
URL: https://github.com/apache/drill/pull/2625




> Cannot ANALYZE files containing non-ASCII column names 
> ---
>
> Key: DRILL-8280
> URL: https://issues.apache.org/jira/browse/DRILL-8280
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
> Attachments: 0_0_0.parquet
>
>
> The attached Parquet file contains a single column named "Käse". If it is 
> saved under /tmp/utf8_col and then the Drill command
> {code:java}
> analyze table dfs.tmp.utf8_col columns none refresh metadata;{code}
> is run then the following error is raised during the execution of the 
> merge_schema function.
> {code:java}
> com.fasterxml.jackson.databind.JsonMappingException: Unrecognized character 
> escape 'x' (code 120)
>  at [Source: 
> (String)"{"type":"tuple_schema","columns":[{"name":"K\xC3\xA4se","type":"VARCHAR","mode":"REQUIRED"}]}";
>  line: 1, column: 47]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8280) Cannot ANALYZE files containing non-ASCII column names

2022-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580817#comment-17580817
 ] 

ASF GitHub Bot commented on DRILL-8280:
---

jnturton opened a new pull request, #2625:
URL: https://github.com/apache/drill/pull/2625

   # [DRILL-8280](https://issues.apache.org/jira/browse/DRILL-8280): Cannot 
ANALYZE files containing non-ASCII column names
   
   ## Description
   
   The merge_schema function in SchemaFunctions is modified to use UTF-8 string 
parsing so that a column with a name like "Käse" will no longer crash ANALYZE 
TABLE REFRESH METADATA.
   
   ## Documentation
   N/A
   
   ## Testing
   TestMetastoreCommands#testNonAsciiColumnName
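
As a rough illustration of the underlying problem (not the actual SchemaFunctions
code), the sketch below shows how decoding the same bytes with the wrong charset
mangles a non-ASCII column name, while decoding them as UTF-8 preserves it.

```
import java.nio.charset.StandardCharsets;

public class Utf8ColumnNameDemo {
  public static void main(String[] args) {
    String columnName = "Käse";
    byte[] bytes = columnName.getBytes(StandardCharsets.UTF_8);

    // Decoding with the wrong charset mangles the name, roughly the kind of
    // mismatch that produced the "K\xC3\xA4se" escape seen in the bug report.
    String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
    String right = new String(bytes, StandardCharsets.UTF_8);

    System.out.println(wrong); // KÃ¤se
    System.out.println(right); // Käse
  }
}
```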
   




> Cannot ANALYZE files containing non-ASCII column names 
> ---
>
> Key: DRILL-8280
> URL: https://issues.apache.org/jira/browse/DRILL-8280
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.20.2
>Reporter: James Turton
>Assignee: James Turton
>Priority: Minor
> Fix For: 1.20.3
>
> Attachments: 0_0_0.parquet
>
>
> The attached Parquet file contains a single column named "Käse". If it is 
> saved under /tmp/utf8_col and then the Drill command
> {code:java}
> analyze table dfs.tmp.utf8_col columns none refresh metadata;{code}
> is run then the following error is raised during the execution of the 
> merge_schema function.
> {code:java}
> com.fasterxml.jackson.databind.JsonMappingException: Unrecognized character 
> escape 'x' (code 120)
>  at [Source: 
> (String)"{"type":"tuple_schema","columns":[{"name":"K\xC3\xA4se","type":"VARCHAR","mode":"REQUIRED"}]}";
>  line: 1, column: 47]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580353#comment-17580353
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi merged PR #2624:
URL: https://github.com/apache/drill/pull/2624




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-7916) Support new plugin installation on the running system

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580338#comment-17580338
 ] 

ASF GitHub Bot commented on DRILL-7916:
---

luocooong closed pull request #2215: DRILL-7916: Support new plugin 
installation on the running system
URL: https://github.com/apache/drill/pull/2215




> Support new plugin installation on the running system
> -
>
> Key: DRILL-7916
> URL: https://issues.apache.org/jira/browse/DRILL-7916
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Drill does not support new plugin installation on a running system:
>  # Boot Drill.
>  # Load plugins into the persistent storage: `pluginStore`.
>  ## Upgrade the plugin if the override file (storage-plugins-override.conf) 
> exists. (Done)
>  ## Check for and add new plugins that come with a new release. (To-do)
>  ## If 1 and 2 do not apply, initialize all the plugins by loading the 
> bootstrap configuration. (Done)
>  # End the boot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580228#comment-17580228
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216459627

   @luocooong, sorry, I didn't mean to come across as overbearing; I just wanted 
to explain why I didn't start a discussion on the mailing list.
   
   If you have ideas on how to avoid this classpath issue and use 
phoenix-queryserver-client, feel free to suggest them or create a pull request 
with changes.




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580225#comment-17580225
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi opened a new pull request, #2624:
URL: https://github.com/apache/drill/pull/2624

   # [DRILL-8279](https://issues.apache.org/jira/browse/DRILL-8279): Rename 
skip tests property to match maven-surefire property name
   
   ## Description
   Renamed the property for skipping tests, since they were also running with 
the -DskipTests flag.
   
   ## Documentation
   NA
   
   ## Testing
   Now the checkstyle job also skips Phoenix tests.
   




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580208#comment-17580208
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

luocooong commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216419739

   Sorry, I cannot accept the overbearing statement above.
   I don't want to argue any more about this pull request, because it has 
already been merged, so the final say is yours.




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580202#comment-17580202
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216406330

   @luocooong, I know of several people who were already affected by this 
classpath issue, which is why it was fixed quickly.
   
   The Jira ticket was created `14 Aug 14:36 EEST`, the pull request was opened 
`14 Aug 17:53 EEST` and merged `16 Aug 10:37 EEST`, so there should have been 
enough time to participate in the discussion or request changes.
   
   I didn't see a reason to start a discussion on the mailing list for it, since 
the plugin wasn't deleted; it is still functioning, and the developer experience 
has even improved by removing extra steps to run unit tests, enabling them in 
CI, and removing dependencies on custom repositories.




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580189#comment-17580189
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

luocooong commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216372092

   Today was a real disappointment.
   
   As Java developers, we can resolve any classpath conflict one way or another, 
but we chose one of the worst options.
   
   Why are we in such a hurry?
   
   And I didn't see this pull request opened for discussion before it was 
submitted...
   
   What does this mean for contributors?




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580184#comment-17580184
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216365721

   @luocooong, Drill plugins are still pluggable, so you can provide your own 
implementation if you need to.
   
   The official Phoenix connectors [1] for big data tools such as Spark, Hive 
and Pig also use the thick client, so this decision should be 
production-suitable. By the way, I didn't find any official connectors in that 
repository that use the thin client.
   
   Ideally, if the Phoenix thin client shades some libraries, it should also 
relocate them to avoid such issues. I don't see any other correct way to resolve 
this classpath conflict. Creating a dedicated module and repacking Phoenix there 
when building Drill is overhead and doesn't guarantee that nothing would be 
broken. Hosting a custom repository that provides relocated classes is also not 
a good solution, since it would make supporting new Phoenix versions more 
complex.
   
   [1] https://github.com/apache/phoenix-connectors




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580178#comment-17580178
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

jnturton commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216343681

   @luocooong let's see if we can offer storage plugins for both the thick and 
thin drivers then? The classpath conflict bug in the thin driver is very 
serious, even if you have not yet been affected. It was important that some 
action was taken immediately.
   
   Nothing has been released yet, so we still have time to come up with a path 
forward that works for everyone. Btw, I thought I'd requested your review on 
this PR as it was opened, but looking at the history I see I must have failed to 
use the GH mobile app correctly, so I do apologise if I only managed to bring 
this to your attention relatively late.




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580142#comment-17580142
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

luocooong commented on PR #2622:
URL: https://github.com/apache/drill/pull/2622#issuecomment-1216295980

   **_Holy cow, that's unbelievable!_**
   Why did this get approved without production experience?
   Why do we need to force the use of the fat client?
   There are ways to resolve package conflicts, but this pull request 
effectively forces me to stop using Drill to query Phoenix!




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8279) Use thick Phoenix driver

2022-08-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580109#comment-17580109
 ] 

ASF GitHub Bot commented on DRILL-8279:
---

vvysotskyi merged PR #2622:
URL: https://github.com/apache/drill/pull/2622




> Use thick Phoenix driver
> 
>
> Key: DRILL-8279
> URL: https://issues.apache.org/jira/browse/DRILL-8279
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
>
> phoenix-queryserver-client shades Avatica classes, so it causes issues when 
> starting Drill and shaded class from phoenix jars is loaded before, so Drill 
> wouldn't be able to start correctly.
> To avoid that, phoenix thick client can be used, it also will improve query 
> performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

