[jira] [Created] (DRILL-8341) Add Scanned Plugin List to Sys Profiles Table

2022-10-21 Thread Charles Givre (Jira)
Charles Givre created DRILL-8341:


 Summary: Add Scanned Plugin List to Sys Profiles Table
 Key: DRILL-8341
 URL: https://issues.apache.org/jira/browse/DRILL-8341
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Monitoring
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


In DRILL-8322, [~dzamo] added the list of scanned plugins to the query 
profiles.  This information is extremely useful in query analysis.  This minor 
PR adds this same information to the sys.profiles table. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (DRILL-8340) Add Additional Date Manipulation Functions (Part 1)

2022-10-20 Thread Charles Givre (Jira)
Charles Givre created DRILL-8340:


 Summary: Add Additional Date Manipulation Functions (Part 1)
 Key: DRILL-8340
 URL: https://issues.apache.org/jira/browse/DRILL-8340
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This PR adds several utility functions to facilitate working with dates and 
times.  These are modeled after the date/time functionality in MySQL.

Specifically this adds:
 * YEARWEEK():  Returns the year and week number as a single int, e.g. 202002.
 * TIME_STAMP():  Converts almost anything that looks like a date 
string into a timestamp.
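
As a rough illustration of the YEARWEEK() output format, the following Python 
sketch builds the same kind of integer from a date.  It assumes ISO 8601 week 
numbering; the week-numbering mode that Drill's function actually uses is not 
stated in this ticket, and `year_week` is a hypothetical helper name.

```python
from datetime import date


def year_week(d: date) -> int:
    """Return year and week as one int, e.g. 202002 (ISO week numbering assumed)."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week


print(year_week(date(2020, 1, 8)))  # 202002
```

Multiplying the year by 100 leaves two digits for the week, which is why week 2 
of 2020 comes out as 202002 rather than 20202.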





[jira] [Created] (DRILL-8335) Add Ability to Query GoogleSheets Tabs by Index

2022-10-14 Thread Charles Givre (Jira)
Charles Givre created DRILL-8335:


 Summary: Add Ability to Query GoogleSheets Tabs by Index
 Key: DRILL-8335
 URL: https://issues.apache.org/jira/browse/DRILL-8335
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - GoogleSheets
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


The GoogleSheets plugin does not provide a way for a user to query data if they 
do not know the available tab names.  This PR adds the ability to query tabs by 
their index.





[jira] [Created] (DRILL-8333) Fix Resource Leaks in HTTP Plugin

2022-10-13 Thread Charles Givre (Jira)
Charles Givre created DRILL-8333:


 Summary: Fix Resource Leaks in HTTP Plugin
 Key: DRILL-8333
 URL: https://issues.apache.org/jira/browse/DRILL-8333
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HTTP
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.3


The HTTP plugin has several methods which collect a `ResponseBody` object but 
do not close these objects.  This is causing a resource leak and will cause 
Drill to fail in the event that queries fire off many API calls. 
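
The general fix pattern is to guarantee that every response body is closed, 
even when processing fails.  The sketch below illustrates that idea in Python 
rather than in Drill's Java code; `FakeResponseBody` and `read_payload` are 
hypothetical stand-ins and are not part of the HTTP plugin.

```python
from contextlib import closing


class FakeResponseBody:
    """Hypothetical stand-in for an HTTP response body that holds a connection."""

    def __init__(self, payload: str):
        self.payload = payload
        self.closed = False

    def string(self) -> str:
        return self.payload

    def close(self) -> None:
        self.closed = True


def read_payload(body: FakeResponseBody) -> str:
    # closing() calls body.close() on exit, including when an exception is raised.
    with closing(body):
        return body.string()


body = FakeResponseBody('{"ok": true}')
print(read_payload(body), body.closed)
```

The Java equivalent of this shape would be a try-with-resources block around 
the response object.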





[jira] [Created] (DRILL-8330) Convert ESRI Shape File Reader to EVF2

2022-10-04 Thread Charles Givre (Jira)
Charles Givre created DRILL-8330:


 Summary: Convert ESRI Shape File Reader to EVF2 
 Key: DRILL-8330
 URL: https://issues.apache.org/jira/browse/DRILL-8330
 Project: Apache Drill
  Issue Type: Task
  Components: Format - ESRI
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Converts the ESRI Shape File reader to EVF V2.





[jira] [Created] (DRILL-8329) Close HTTP Caching Resources

2022-10-03 Thread Charles Givre (Jira)
Charles Givre created DRILL-8329:


 Summary: Close HTTP Caching Resources 
 Key: DRILL-8329
 URL: https://issues.apache.org/jira/browse/DRILL-8329
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HTTP
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.3


The HTTP plugin has the ability to cache API responses.  However, the storage 
plugin was not closing the connection to the file cache.  This minor PR fixes 
that. 





[jira] [Created] (DRILL-8328) HTTP UDF Not Resolving Storage Aliases

2022-10-02 Thread Charles Givre (Jira)
Charles Givre created DRILL-8328:


 Summary: HTTP UDF Not Resolving Storage Aliases
 Key: DRILL-8328
 URL: https://issues.apache.org/jira/browse/DRILL-8328
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HTTP
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.3


The http_request function currently does not resolve plugin aliases correctly.  
This PR fixes that issue. 





[jira] [Created] (DRILL-8327) GoogleSheets not Reporting Schemata to Info_Schema

2022-10-01 Thread Charles Givre (Jira)
Charles Givre created DRILL-8327:


 Summary: GoogleSheets not Reporting Schemata to Info_Schema
 Key: DRILL-8327
 URL: https://issues.apache.org/jira/browse/DRILL-8327
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - GoogleSheets
Affects Versions: 2.0.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


The GoogleSheets (GS) plugin was not reporting the available documents to the 
info schema.  This PR makes some modifications so that users can determine 
which documents are available via the information schema. 

The GS plugin does not report the tabs as tables to the information schema 
because that can cause Drill to exceed Google's rate quota.





[jira] [Created] (DRILL-8325) Convert PDF Format Plugin to EVF V2

2022-09-29 Thread Charles Givre (Jira)
Charles Givre created DRILL-8325:


 Summary: Convert PDF Format Plugin to EVF V2
 Key: DRILL-8325
 URL: https://issues.apache.org/jira/browse/DRILL-8325
 Project: Apache Drill
  Issue Type: Task
  Components: Format - PDF
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Converts the PDF Format Reader to EVF V2. 





[jira] [Created] (DRILL-8320) Prevent Infinite Pagination for Index Paginator

2022-09-27 Thread Charles Givre (Jira)
Charles Givre created DRILL-8320:


 Summary: Prevent Infinite Pagination for Index Paginator
 Key: DRILL-8320
 URL: https://issues.apache.org/jira/browse/DRILL-8320
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HTTP
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


In some cases that use keyset/index pagination, if the API does not have a 
boolean column that indicates when to stop, Drill will send requests until the 
API stops returning data.  This PR fixes this by making the boolean parameter 
optional.  

If that parameter is not present, pagination ends when the index result is 
blank or the same as in the previous request.
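
The stopping rule described above can be sketched as a small predicate.  This 
is a minimal illustration, not the plugin's code: `should_stop` and its 
parameter names are hypothetical, and treating a present flag as authoritative 
is an assumption.

```python
from typing import Optional


def should_stop(current_index: Optional[str],
                previous_index: Optional[str],
                has_more: Optional[bool] = None) -> bool:
    """Decide whether to stop paginating (hypothetical helper)."""
    if has_more is not None:
        # Assumption: if the API supplied the optional boolean flag, trust it.
        return not has_more
    # No flag: stop on a blank index or one that repeats the previous request.
    if current_index is None or str(current_index).strip() == "":
        return True
    return current_index == previous_index
```

For example, `should_stop("10", "10")` and `should_stop("", "10")` both end 
pagination, while `should_stop("20", "10")` requests another page.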

Note, if the pagination parameters are buried in nested objects, this cannot be 
configured with a dataPath.  If the user uses a dataPath, pagination will stop 
at the first page.





[jira] [Resolved] (DRILL-8317) Convert LogRegex Format Plugin to EVF V2

2022-09-24 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre resolved DRILL-8317.
--
Resolution: Done

> Convert LogRegex Format Plugin to EVF V2
> 
>
> Key: DRILL-8317
> URL: https://issues.apache.org/jira/browse/DRILL-8317
> Project: Apache Drill
>  Issue Type: Task
>  Components: Format - Log Reader
>Affects Versions: 1.20.2
>    Reporter: Charles Givre
>    Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> Converts the existing logRegex reader to EVF V2.





[jira] [Created] (DRILL-8317) Convert LogRegex Format Plugin to EVF V2

2022-09-22 Thread Charles Givre (Jira)
Charles Givre created DRILL-8317:


 Summary: Convert LogRegex Format Plugin to EVF V2
 Key: DRILL-8317
 URL: https://issues.apache.org/jira/browse/DRILL-8317
 Project: Apache Drill
  Issue Type: Task
  Components: Format - Log Reader
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Converts the existing logRegex reader to EVF V2.





[jira] [Created] (DRILL-8316) Convert Druid Storage Plugin to EVF & V2 JSON Reader

2022-09-20 Thread Charles Givre (Jira)
Charles Givre created DRILL-8316:


 Summary: Convert Druid Storage Plugin to EVF & V2 JSON Reader
 Key: DRILL-8316
 URL: https://issues.apache.org/jira/browse/DRILL-8316
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Druid
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0








[jira] [Created] (DRILL-8315) Convert SAS Format Plugin to EVF V2

2022-09-20 Thread Charles Givre (Jira)
Charles Givre created DRILL-8315:


 Summary: Convert SAS Format Plugin to EVF V2
 Key: DRILL-8315
 URL: https://issues.apache.org/jira/browse/DRILL-8315
 Project: Apache Drill
  Issue Type: Improvement
  Components: Format - SAS
Affects Versions: 1.20.2, 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Convert the SAS Format Plugin to EVF V2.  No user facing changes. 





[jira] [Created] (DRILL-8312) Convert Format Plugins to EVF V2

2022-09-19 Thread Charles Givre (Jira)
Charles Givre created DRILL-8312:


 Summary: Convert Format Plugins to EVF V2
 Key: DRILL-8312
 URL: https://issues.apache.org/jira/browse/DRILL-8312
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.20.2
Reporter: Charles Givre
 Fix For: 2.0.0


This is a blanket ticket to convert all format plugins to EVF V2.





[jira] [Resolved] (DRILL-8159) Upgrade HTTPD reader to use EVF V2

2022-09-19 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre resolved DRILL-8159.
--
Resolution: Done

> Upgrade HTTPD reader to use EVF V2
> --
>
> Key: DRILL-8159
> URL: https://issues.apache.org/jira/browse/DRILL-8159
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> Continuation of work originally in the DRILL-8085 PR.





[jira] [Created] (DRILL-8311) Convert SPSS Format Plugin to EVF V2

2022-09-19 Thread Charles Givre (Jira)
Charles Givre created DRILL-8311:


 Summary: Convert SPSS Format Plugin to EVF V2
 Key: DRILL-8311
 URL: https://issues.apache.org/jira/browse/DRILL-8311
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - SPSS
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This PR converts the SPSS format plugin to use EVF V2. 





[jira] [Created] (DRILL-8310) Convert Syslog Format to EVF V2

2022-09-19 Thread Charles Givre (Jira)
Charles Givre created DRILL-8310:


 Summary: Convert Syslog Format to EVF V2
 Key: DRILL-8310
 URL: https://issues.apache.org/jira/browse/DRILL-8310
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Syslog
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This PR proposes converting the syslog format plugin to EVF V2.  No user-facing changes.





[jira] [Resolved] (DRILL-8289) Add Threat Hunting Functions

2022-09-12 Thread Charles Givre (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre resolved DRILL-8289.
--
Resolution: Done

> Add Threat Hunting Functions
> 
>
> Key: DRILL-8289
> URL: https://issues.apache.org/jira/browse/DRILL-8289
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Affects Versions: 2.0.0
>    Reporter: Charles Givre
>    Assignee: Charles Givre
>Priority: Major
> Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern()`: Extracts the pattern of punctuation in 
> text.
> * `entropy()`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte()`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib





[jira] [Created] (DRILL-8305) Add Implicit Fields to Google Sheets Reader

2022-09-11 Thread Charles Givre (Jira)
Charles Givre created DRILL-8305:


 Summary: Add Implicit Fields to Google Sheets Reader
 Key: DRILL-8305
 URL: https://issues.apache.org/jira/browse/DRILL-8305
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - GoogleSheets
Affects Versions: 2.0.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


GoogleSheets needs additional metadata fields to access the available data.  
This PR adds a framework for implicit metadata fields.  

This PR also adds the _sheets field which lists the available tabs within a 
Google Sheets document.





Re: UTF-8 in Drill

2022-09-08 Thread Charles Givre
James, 
Thanks for sending.  It does seem like it makes the most sense to standardize 
around UTF-8, especially if there is a way for storage plugins to support other 
character sets.
Best, 
-- C

> On Sep 8, 2022, at 1:25 PM, James Turton  wrote:
> 
> Hi folks!
> 
> May I bring DRILL-8301 to our attention? Presently Drill is not always 
> explicit about the en/decoding of its characters. The mentioned Jira and its 
> associated PR explicitly program in an assumption of UTF-8 in places where 
> Drill currently selects whatever the JVM has been configured to default to 
> (typically UTF-8).
> 
> I'm in favour of this standardisation and the simplicity it brings, given the 
> extent to which "the world chose UTF-8". It would still be possible, after 
> standardising on UTF-8, for storage plugins to support different character 
> encodings if they wanted to.
> 
> If you have any concerns or comments please visit the Jira or the PR over the 
> next week and share them there.
> 
> Regards
> James



[jira] [Created] (DRILL-8291) Allow case sensitive Filters in HTTP Plugin

2022-09-03 Thread Charles Givre (Jira)
Charles Givre created DRILL-8291:


 Summary: Allow case sensitive Filters in HTTP Plugin
 Key: DRILL-8291
 URL: https://issues.apache.org/jira/browse/DRILL-8291
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - HTTP
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.3


Some APIs will reject filter pushdowns if they are not in the correct case.  
This PR adds a config option `caseSensitiveFilters` to the API config and when 
set to true, preserves the case of the filters pushed down. 
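
A minimal sketch of the option's effect follows.  Only the option name 
`caseSensitiveFilters` comes from this ticket; the helper `pushdown_params` is 
hypothetical, and the assumption that filter names are lower-cased when the 
option is off is an illustration, not confirmed plugin behaviour.

```python
from typing import Dict


def pushdown_params(filters: Dict[str, str],
                    case_sensitive_filters: bool = False) -> Dict[str, str]:
    """Build URL query parameters from pushed-down filters (hypothetical helper)."""
    if case_sensitive_filters:
        # Preserve the filter names exactly as written in the query.
        return dict(filters)
    # Assumption: without the option, filter names are normalized to lower case.
    return {name.lower(): value for name, value in filters.items()}


print(pushdown_params({"UserId": "42"}))
print(pushdown_params({"UserId": "42"}, case_sensitive_filters=True))
```

An API that only accepts `UserId` would reject the first form and accept the 
second.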





[jira] [Created] (DRILL-8289) Add Threat Hunting Functions

2022-08-28 Thread Charles Givre (Jira)
Charles Givre created DRILL-8289:


 Summary: Add Threat Hunting Functions
 Key: DRILL-8289
 URL: https://issues.apache.org/jira/browse/DRILL-8289
 Project: Apache Drill
  Issue Type: New Feature
  Components: Functions - Drill
Affects Versions: 2.0.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


# Threat Hunting Functions
These functions are useful for doing threat hunting with Apache Drill. These 
were inspired by huntlib.[1]

The functions are: 
* `punctuation_pattern()`: Extracts the pattern of punctuation in text.
* `entropy()`: This function calculates the Shannon Entropy of a given 
string of text.
* `entropyPerByte()`: This function calculates the Shannon Entropy of a 
given string of text, normed for the string length.

[1]: https://github.com/target/huntlib
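
To make the two entropy functions concrete, here is a minimal Python sketch of 
Shannon entropy over the characters of a string.  It is not the Drill 
implementation: the function names are Python-style stand-ins, and dividing by 
the UTF-8 byte length in `entropy_per_byte` is an assumption about what 
"normed for the string length" means.

```python
import math
from collections import Counter


def entropy(text: str) -> float:
    """Shannon entropy of the characters in text, in bits."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_per_byte(text: str) -> float:
    """Entropy divided by the UTF-8 byte length (assumed normalization)."""
    if not text:
        return 0.0
    return entropy(text) / len(text.encode("utf-8"))


print(entropy("aabb"), entropy_per_byte("aabb"))  # 1.0 0.25
```

Strings with many distinct, evenly spread characters score high, which is the 
property that makes entropy useful for spotting machine-generated names.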





[jira] [Created] (DRILL-8288) Null Columns not being Written to GoogleSheets

2022-08-28 Thread Charles Givre (Jira)
Charles Givre created DRILL-8288:


 Summary: Null Columns not being Written to GoogleSheets
 Key: DRILL-8288
 URL: https://issues.apache.org/jira/browse/DRILL-8288
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - GoogleSheets
Affects Versions: 2.0.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


When writing to GoogleSheets, null columns are not written, which results in 
incorrect data. 





[jira] [Created] (DRILL-8287) Add Support for Keyset Based Pagination

2022-08-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8287:


 Summary: Add Support for Keyset Based Pagination
 Key: DRILL-8287
 URL: https://issues.apache.org/jira/browse/DRILL-8287
 Project: Apache Drill
  Issue Type: New Feature
  Components: Storage - HTTP
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Some APIs such as HubSpot use values in the result set to indicate whether 
there are additional pages.  This PR adds support for this kind of pagination.  
Note that the current implementation only works for JSON-based APIs.
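
The following Python sketch shows the general shape of this kind of 
pagination: each response carries a flag and a key that drive the next 
request.  The field names `results`, `hasMore` and `nextKey` and the helper 
`fetch_all` are hypothetical and do not reflect HubSpot's or Drill's actual 
names.

```python
from typing import Callable, Dict, Iterator, Optional


def fetch_all(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[dict]:
    """Yield records page by page until the response says there are no more."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        if not page.get("hasMore"):
            break
        # The key for the next request comes from the current result set.
        cursor = page["nextKey"]


# Two fake pages standing in for a JSON API.
PAGES = {
    None: {"results": [{"id": 1}, {"id": 2}], "hasMore": True, "nextKey": "2"},
    "2": {"results": [{"id": 3}], "hasMore": False},
}

print([r["id"] for r in fetch_all(PAGES.__getitem__)])  # [1, 2, 3]
```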





[jira] [Created] (DRILL-8286) GoogleSheets StoragePlugin displaying ClientID and ClientSecret in Config

2022-08-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8286:


 Summary: GoogleSheets StoragePlugin displaying ClientID and 
ClientSecret in Config
 Key: DRILL-8286
 URL: https://issues.apache.org/jira/browse/DRILL-8286
 Project: Apache Drill
  Issue Type: Bug
Reporter: Charles Givre
Assignee: Charles Givre


The GoogleSheets storage plugin is rendering the `clientID` and `clientSecret` 
in the config body instead of in the credential provider.

This minor PR fixes that.





Re: lgtm.com shutting down at end of 2022

2022-08-23 Thread Charles Givre
Concur +1. Let's get rid of it. 


> On Aug 23, 2022, at 9:59 AM, James Turton  wrote:
> 
> +1, I can't see any reason to hold off.
> 
> On 2022/08/23 15:29, PJ Fanning wrote:
>> Hi everyone,
>> lgtm.com checks are part of the CI build. lgtm.com is shutting down.
>> 
>> See news item on:
>> https://lgtm.com/projects/g/apache/drill/rev/pr-15878f95181fee59db0dd753c6939e4612066d71
>> 
>> They recommend using Github codeql and we already use that.
>> 
>> Would it be ok to remove lgtm part of the CI build? It would speed up
>> the builds quite a bit.
>> 
>> Regards,
>> PJ
> 



Re: Are any Drill Devs attending ApacheCon in NOLA? Hack Drill + Daffodil ?

2022-08-16 Thread Charles Givre
Hi Mike, 
Thanks for reaching out.  I'm the PMC chair and have been following Apache 
Daffodil for some time.  IMHO it would be a great integration.  With that 
said... I'm on vacation this week but would love to speak to you about it.   If 
you'd be up for a zoom call sometime, please email me at 
char...@datadistillr.com  and we can find a 
time to chat.
Best,
-- C



> On Aug 16, 2022, at 10:05 PM, Mike Beckerle  wrote:
> 
> I am wondering if some of the Apache Drill devs are going to be at
> ApacheCon in October.
> 
> I am hoping to do some hacking of Drill + Apache Daffodil to see how
> hard/easy an integration would be.
> 
> The notion that given a DFDL schema you can immediately query the data, is
> super attractive and our simple Daffodil Command line tool would be greatly
> enhanced with a Drill-based query capability.
> 
> Will anyone also be at ApacheCon to help me hack this?
> 
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com



[jira] [Created] (DRILL-8276) Add Support for User Translation for Splunk

2022-08-07 Thread Charles Givre (Jira)
Charles Givre created DRILL-8276:


 Summary: Add Support for User Translation for Splunk
 Key: DRILL-8276
 URL: https://issues.apache.org/jira/browse/DRILL-8276
 Project: Apache Drill
  Issue Type: Task
  Components: Storage - Other
Affects Versions: 1.20.2
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This PR adds support for user translation to Splunk.





Re: [VOTE] Release Apache Drill 1.20.2 - RC1

2022-08-01 Thread Charles Givre
Downloaded RC1.  Unzipped tarball, ran test queries.  LGTM +1.

Vote +1 (binding)

> On Aug 1, 2022, at 6:33 AM, James Turton  wrote:
> 
> I'd like to propose the second release candidate (RC1) of Apache Drill, 
> version 1.20.2. The release candidate covers a total of 23 resolved Jiras 
> since 1.20.1 [1]. Thanks to everyone who contributed to this release and to 
> Jingchuan Hu for his help in preparing the release.
> 
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted at 
> [3]. This release candidate is based on commit 
> 3b924b778990c41bf2c15a917097c038a10faf5d located at [4].
> 
> Please download and try out the release.
> 
> [ ] +1
> [ ] +0
> [ ] -1
> 
> ✅ Launch Hadoop 3 build under Java 8 using drill-embedded on Linux, check 
> sys.version, run a CTAS, check the web UI.
> ✅ Launch Hadoop 2 build under Java 8 using drill-embedded on Linux, check 
> sys.version, run a CTAS.
> ✅ Launch Hadoop 3 build under Java 8 using drill-embedded on Windows 10, run 
> a CTAS.
> ✅ Check which Hadoop and Netty jars are present in the Hadoop 3 build.
> ✅ Check which Hadoop and Netty jars are present in the Hadoop 2 build.
> 
> I vote +1 (binding).
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12351742
>  
> [2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc1/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1101/
> [4] https://github.com/jnturton/drill/commits/drill-1.20.2



[jira] [Created] (DRILL-8271) Make Storage and Format Config Case Insensitive

2022-07-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8271:


 Summary: Make Storage and Format Config Case Insensitive
 Key: DRILL-8271
 URL: https://issues.apache.org/jira/browse/DRILL-8271
 Project: Apache Drill
  Issue Type: Task
Reporter: Charles Givre








[jira] [Created] (DRILL-8270) Delete obsolete zookeeper patch (tech debt)

2022-07-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8270:


 Summary: Delete obsolete zookeeper patch (tech debt)
 Key: DRILL-8270
 URL: https://issues.apache.org/jira/browse/DRILL-8270
 Project: Apache Drill
  Issue Type: Task
Reporter: Charles Givre








Re: [VOTE] Release Apache Drill 1.20.2 - RC0

2022-07-21 Thread Charles Givre
Downloaded built release.  Ran various queries. 
+1 from me. (Binding)

> On Jul 21, 2022, at 10:47 AM, James Turton  wrote:
> 
> This is just a resend that attempts to fix the mangled formatting in the 
> first attempt.
> 
> I'd like to propose the first release candidate (RC0) of Apache Drill, 
> version 1.20.2. The release candidate covers a total of 20 resolved Jiras 
> since 1.20.1 [1]. Thanks to everyone who contributed to this release and to 
> Jingchuan Hu for his help in preparing the release.
> 
> The tarball artifacts are hosted at [2] and the maven artifacts are hosted at 
> [3]. This release candidate is based on commit 
> 1ff69babc1a61b1136f01a20f8f28dfe0f1d9ce8 located at [4].
> 
> Please download and try out the release.
> 
> [ ] +1
> [ ] +0
> [ ] -1
> 
> Here's my vote: +1
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820=12351742
> [2] https://dist.apache.org/repos/dist/dev/drill/1.20.2-rc0/
> [3] https://repository.apache.org/content/repositories/orgapachedrill-1099/
> [4] https://github.com/jnturton/drill/commits/drill-1.20.2



Re: [DISCUSS] Drill 1.20.2 bugfix release

2022-07-08 Thread Charles Givre
Hey James, 
Thanks for doing this.  There are a few CI and CVE related PRs that we might 
want to think about including such as the one below.  Also I seem to remember 
that Vova made a fix to the Calcite fork that fixed a bug relating to 
Elasticsearch.  I know he's working on some other things, but do you think it 
might be worth including that in 1.20.2?

Best,
-- C

https://github.com/apache/drill/pull/2581 




> On Jul 8, 2022, at 3:38 AM, James Turton  wrote:
> 
> Hi Drillers
> 
> It's been about seven weeks since the last bug fix release and it is time to 
> do the next one. I volunteer to be the release manager with the kind 
> assistance of Jingchuan Hu who has already been busy backporting fixes for us 
> [1] . If there are any issues on which work is in progress, that you feel we 
> *must* include in the release, please post in reply to this thread. Otherwise 
> please indicate that you are in favour of freezing the stable branch at its 
> current height [2].
> 
> [1] https://github.com/apache/drill/pull/2584
> [2] https://github.com/apache/drill/commits/1.20
> 
> Thank you
> James Turton



[jira] [Created] (DRILL-8244) HTTP_Request Not Passing Down Config Variable

2022-06-08 Thread Charles Givre (Jira)
Charles Givre created DRILL-8244:


 Summary: HTTP_Request Not Passing Down Config Variable
 Key: DRILL-8244
 URL: https://issues.apache.org/jira/browse/DRILL-8244
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


The http_request UDF was not passing the provided schema and other config 
parameters down to the jsonLoader.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (DRILL-8243) Move JSON Config Options Out of HTTP Plugin

2022-06-03 Thread Charles Givre (Jira)
Charles Givre created DRILL-8243:


 Summary: Move JSON Config Options Out of HTTP Plugin
 Key: DRILL-8243
 URL: https://issues.apache.org/jira/browse/DRILL-8243
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON
Affects Versions: 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


As part of DRILL-8241, this PR moves the json configuration options out of the 
HTTP plugin and creates a file which can be used for other plugins that consume 
JSON data. 

The idea is that all such plugins, like Druid, ES, and Mongo, can set the same 
JSON options for each plugin instance without having to duplicate config code.





Re: [GitHub] [drill] jnturton commented on pull request #2569: DRILL-8199: Convert Excel EVF1 to EVF2

2022-06-01 Thread Charles Givre
James, 
Is this only happening with the provided schema?   If so... that's interesting, 
and probably a bug somewhere.
--C

> On Jun 1, 2022, at 12:03 PM, GitBox  wrote:
> 
> 
> jnturton commented on PR #2569:
> URL: https://github.com/apache/drill/pull/2569#issuecomment-1143798587
> 
>   @cgivre or @luocooong, something that the upgrade to EVF 2 broke in this 
> plugin is the unit test TestExcelFormat::testStarWithProvidedSchema. Now the 
> metadata columns which should be excluded from star query results are present 
> and making this test fail. For this reason I've left this PR in draft status. 
> Is there something obvious that I've done wrong?
> 
> 
> -- 
> This is an automated message from the Apache Git Service.
> To respond to the message, please log on to GitHub and use the
> URL above to go to the specific comment.
> 
> To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org
> 
> For queries about this service, please contact Infrastructure at:
> us...@infra.apache.org
> 



[jira] [Created] (DRILL-8241) Remove Deprecated JSON Reader

2022-05-29 Thread Charles Givre (Jira)
Charles Givre created DRILL-8241:


 Summary: Remove Deprecated JSON Reader
 Key: DRILL-8241
 URL: https://issues.apache.org/jira/browse/DRILL-8241
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON
Affects Versions: 1.20.1
Reporter: Charles Givre
 Fix For: 2.0.0


This is a master ticket to remove the deprecated v1 JSON reader from Drill.  
This JSON reader is used in several places and removing it will ensure 
consistent behavior across all data sources. 

The V2, EVF based JSON reader has several advantages, including the possibility 
of schema provisioning, limit pushdowns and others.

Here are the tasks which need to be completed to fully remove the v1 JSON 
reader.
 * Convert the convert_fromJSON functions to V2
 * Convert the Druid Storage Plugin to V2
 * Convert MongoDB Storage Plugin to V2.  (Note the MongoDB plugin uses an 
EVF-based BSON reader as well as the V1 JSON reader)
 * Remove all V1-based unit tests
 * Migrate the JsonOptions from the HTTP Storage Plugin to a global location to 
allow other plugins and users of JSON to set JSON configuration at a more 
granular level.
 * Remove extraneous configuration options.

 

 





[jira] [Created] (DRILL-8239) Convert JSON UDF to EVF

2022-05-28 Thread Charles Givre (Jira)
Charles Givre created DRILL-8239:


 Summary: Convert JSON UDF to EVF
 Key: DRILL-8239
 URL: https://issues.apache.org/jira/browse/DRILL-8239
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Data Types
Affects Versions: 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


In an effort to fully deprecate the old JsonReader, this PR converts the 
convert_fromJSON UDF to EVF.





[jira] [Created] (DRILL-8235) Add Storage Plugin for Google Sheets

2022-05-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8235:


 Summary: Add Storage Plugin for Google Sheets
 Key: DRILL-8235
 URL: https://issues.apache.org/jira/browse/DRILL-8235
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


Google Sheets is a very commonly used data source among business users.  Presto 
and other query engines do include integrations with Google Sheets and so it 
would be useful for Drill to add this functionality. 

The proposed plugin supports both reading and writing to Google Sheets. 





[jira] [Created] (DRILL-8229) Add Parameter to Skip Malformed Records to HTTP UDF

2022-05-19 Thread Charles Givre (Jira)
Charles Givre created DRILL-8229:


 Summary:  Add Parameter to Skip Malformed Records to HTTP UDF
 Key: DRILL-8229
 URL: https://issues.apache.org/jira/browse/DRILL-8229
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.20.1
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.2


The http_get and http_request UDFs were not using the JSON parameter to skip 
malformed records.  This PR fixes that.





[jira] [Created] (DRILL-8220) Add User Translation Support for OAuth Enabled Plugins

2022-05-11 Thread Charles Givre (Jira)
Charles Givre created DRILL-8220:


 Summary: Add User Translation Support for OAuth Enabled Plugins
 Key: DRILL-8220
 URL: https://issues.apache.org/jira/browse/DRILL-8220
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This PR adds support for individual users to provide their own credentials for 
plugins that use OAuth 2.0 as a means of authorization and authentication.   
Currently, only the HTTP storage plugin supports OAuth. This PR therefore moves 
some of the core features out of the HTTP plugin so that other plugins can 
use them. 





[jira] [Created] (DRILL-8217) Credential Resources Page Throws an Error With Empty Lists.

2022-05-08 Thread Charles Givre (Jira)
Charles Givre created DRILL-8217:


 Summary: Credential Resources Page Throws an Error With Empty 
Lists.
 Key: DRILL-8217
 URL: https://issues.apache.org/jira/browse/DRILL-8217
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


If a user does not have any plugins enabled with USER_TRANSLATION on, the 
Credentials page will throw an exception.  





[jira] [Created] (DRILL-8215) Remove SecurityContext from PluginConfigWrapper

2022-05-08 Thread Charles Givre (Jira)
Charles Givre created DRILL-8215:


 Summary: Remove SecurityContext from PluginConfigWrapper
 Key: DRILL-8215
 URL: https://issues.apache.org/jira/browse/DRILL-8215
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.20.0
Reporter: Charles Givre


DRILL-8155 introduced a bug in the PluginConfigWrapper by including the 
SecurityContext in it.   This seemed to cause SerDe issues.





[jira] [Created] (DRILL-8207) Fix Username Typo in JDBC SerDe

2022-05-05 Thread Charles Givre (Jira)
Charles Givre created DRILL-8207:


 Summary: Fix Username Typo in JDBC SerDe
 Key: DRILL-8207
 URL: https://issues.apache.org/jira/browse/DRILL-8207
 Project: Apache Drill
  Issue Type: Bug
Reporter: Charles Givre
Assignee: Charles Givre


Fixes SerDe error with default JDBC plugin. 





[jira] [Created] (DRILL-8205) Inline Schema Not Being Passed to HTTP Reader.

2022-05-02 Thread Charles Givre (Jira)
Charles Givre created DRILL-8205:


 Summary: Inline Schema Not Being Passed to HTTP Reader.
 Key: DRILL-8205
 URL: https://issues.apache.org/jira/browse/DRILL-8205
 Project: Apache Drill
  Issue Type: Bug
Reporter: Charles Givre








[jira] [Created] (DRILL-8204) Allow Provided Schema for HTTP Plugin in JSON Mode

2022-05-01 Thread Charles Givre (Jira)
Charles Givre created DRILL-8204:


 Summary: Allow Provided Schema for HTTP Plugin in JSON Mode
 Key: DRILL-8204
 URL: https://issues.apache.org/jira/browse/DRILL-8204
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


One of the challenges of querying APIs is inconsistent data. Drill allows you 
to provide a schema for individual endpoints. You can do this in one of two 
ways: by listing the fields in the `providedSchema` option, or by providing a 
serialized TupleMetadata of the desired schema in the `jsonSchema` option. The 
latter is advanced functionality and should only be used by advanced Drill users.

The schema provisioning currently supports complex types of Arrays and Maps at 
any nesting level.

### Example Schema Provisioning:
```json
"jsonOptions": {
  "providedSchema": [
    {
      "fieldName": "int_field",
      "fieldType": "bigint"
    }, {
      "fieldName": "jsonField",
      "fieldType": "varchar",
      "properties": {
        "drill.json-mode": "json"
      }
    }, {
      // Array field
      "fieldName": "stringField",
      "fieldType": "varchar",
      "isArray": true
    }, {
      // Map field
      "fieldName": "mapField",
      "fieldType": "map",
      "fields": [
        {
          "fieldName": "nestedField",
          "fieldType": "int"
        }, {
          "fieldName": "nestedField2",
          "fieldType": "varchar"
        }
      ]
    }
  ]
}
```

### Example Provisioning the Schema with a JSON String
```json
"jsonOptions": {
  "jsonSchema": "{\"type\":\"tuple_schema\",\"columns\":[{\"name\":\"outer_map\",\"type\":\"STRUCT<`int_field` BIGINT, `int_array` ARRAY>\",\"mode\":\"REQUIRED\"}]}"
}
```

You can print out a JSON string of a schema with the Java code below. 

```java
TupleMetadata schema = new SchemaBuilder()
.addNullable("a", MinorType.BIGINT)
.addNullable("m", MinorType.VARCHAR)
.build();
ColumnMetadata m = schema.metadata("m");
m.setProperty(JsonLoader.JSON_MODE, JsonLoader.JSON_LITERAL_MODE);

System.out.println(schema.jsonString());
```

This will generate something like the JSON string below:

```json
{
  "type": "tuple_schema",
  "columns": [
    {"name": "a", "type": "BIGINT", "mode": "OPTIONAL"},
    {"name": "m", "type": "VARCHAR", "mode": "OPTIONAL", "properties": {"drill.json-mode": "json"}}
  ]
}
```

## Dealing With Inconsistent Schemas
One of the major challenges of interacting with JSON data is when the schema is 
inconsistent. Drill has a `UNION` data type which is marked as experimental. At 
the time of writing, the HTTP plugin does not support the `UNION` type; however, 
supplying a schema can resolve many of these issues.

### JSON Mode
Drill offers the option of reading all JSON values as a string. While this can 
complicate downstream analytics, it can also be a more memory-efficient way of 
reading data with an inconsistent schema. At the time of writing, JSON mode is 
only available with a provided schema. However, future work will allow this 
mode to be enabled for any JSON data.

#### Enabling JSON Mode
You can enable JSON mode by adding the `drill.json-mode` property with a 
value of `json` to a field, as shown below:

```json
{
  "fieldName": "jsonField",
  "fieldType": "varchar",
  "properties": {
    "drill.json-mode": "json"
  }
}
```





[jira] [Created] (DRILL-8202) Add Options to Skip Malformed JSON Records to HTTP Plugin

2022-04-27 Thread Charles Givre (Jira)
Charles Givre created DRILL-8202:


 Summary: Add Options to Skip Malformed JSON Records to HTTP Plugin
 Key: DRILL-8202
 URL: https://issues.apache.org/jira/browse/DRILL-8202
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


The JSON reader can skip malformed records and 
documents, but this is a global setting.  This PR adds this configuration to 
the HTTP plugin so that it can be set individually for each endpoint. 





[jira] [Created] (DRILL-8193) Incorrect Annotation used for HttpJsonOptions

2022-04-12 Thread Charles Givre (Jira)
Charles Givre created DRILL-8193:


 Summary: Incorrect Annotation used for HttpJsonOptions
 Key: DRILL-8193
 URL: https://issues.apache.org/jira/browse/DRILL-8193
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (DRILL-8191) HTTP Request Function Not Detecting JSON Config

2022-04-07 Thread Charles Givre (Jira)
Charles Givre created DRILL-8191:


 Summary: HTTP Request Function Not Detecting JSON Config
 Key: DRILL-8191
 URL: https://issues.apache.org/jira/browse/DRILL-8191
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


This is a minor fix.  The http_request function was not detecting the input 
format option and was throwing an exception. 





Re: [DISCUSS] Add schema support for the XML format

2022-04-06 Thread Charles Givre
I'll add to this thread as the developer of the XML plugin for Drill.  I think 
it would be a very good idea to add XSD schema support.  I've not had 
time to really dig into that, but it would seem like writing a converter from 
XSD to Drill's TupleMetadata would be relatively straightforward.  Then we'd 
have to make sure the schema provisioning works, which actually isn't that 
hard.  I started looking into the first part, but got sidetracked.  In any 
event, having the schema from an XSD would eliminate the ambiguity in XML files.

As a side benefit, this would allow for easy conversion from XML to JSON.  You 
could simply do a CTAS query on an XML file and output JSON.

Best,
-- C


> On Apr 6, 2022, at 5:41 PM, Ted Dunning  wrote:
> 
> And if there are zero instances what happens (curiosity here)?
> 
> 
> 
> On Wed, Apr 6, 2022 at 12:28 PM Lee, David 
> wrote:
> 
>> Which is why using an XSD is more or less foolproof..
>> 
>> If the pet element is tagged with maxOccurs="unbounded" it implies it
>> should be saved as an array even if there is just one occurrence of <pet>
>> in your data.
>> 
>> -Original Message-
>> From: Ted Dunning 
>> Sent: Wednesday, April 6, 2022 11:48 AM
>> To: dev 
>> Cc: u...@drill.apache.org
>> Subject: Re: [DISCUSS] Add schema support for the XML format
>> 
>> 
>> That example:
>> 
>> <pet>dog</pet>
>> <pet>cat</pet>
>> 
>> 
>> can also convert to ["pet":"dog", "pet":"dog"]
>> 
>> XML is rife with problems like this.
>> 
>> As you say.
>> 
>> But worse than can be imagined unless you have been hit by these problems.
>> 
>> On Wed, Apr 6, 2022 at 11:39 AM Lee, David wrote:
>> 
>>> TO_JSON won't work in cases where..
>>> 
>>> One file contains: <pet>dog</pet> which converts to {"pet":"dog"}
>>> 
>>> But another file contains:
>>> <pet>dog</pet>
>>> <pet>cat</pet>
>>> which converts to: {"pet": ["dog", "cat"]}
>>> 
>>> pet as a column in Drill can't be both a varchar and an array of
>>> varchar
>>> 
>>> There are a ton of gotcha(s) when dealing with XML..
>>> numeric vs string
>>> scalar vs array
>>> 
>>> -Original Message-
>>> From: Lee, David
>>> Sent: Wednesday, April 6, 2022 10:54 AM
>>> To: u...@drill.apache.org; dev@drill.apache.org
>>> Subject: RE: [DISCUSS] Add schema support for the XML format
>>> 
>>> I wrote something to convert XML to JSON using an XSD schema file to
>>> solve fields, types, nested structures, etc.. It's the only real way
>>> to ensure column level data integrity.
>>> 
>>> https://github.com/davlee1972/xml_to_json
>>> 
>>> Converts XML to valid JSON or JSONL Requires only two files to get
>>> started. Your XML file and the XSD schema file for that XML file.
>>> 
>>> -Original Message-
>>> From: luoc 
>>> Sent: Wednesday, April 6, 2022 5:01 AM
>>> To: u...@drill.apache.org; dev@drill.apache.org
>>> Subject: [DISCUSS] Add schema support for the XML format
>>> 
>>> 
>>> Hello dear driller,
>>> 
>>> Before starting the topic, I would like to do a simple survey :
>>> 
>>> 1. Did you know that Drill already supports XML format?
>>> 
>>> 2. If yes, what is the maximum size for the XML files you normally read?
>>> 1MB, 10MB or 100MB
>>> 
>>> 3. Do you expect that reading XML will be as easy as JSON (Schema
>>> Discovery)?
>>> 
>>> Thank you for responding to those questions.
>>> 
>>> XML is different from JSON, and if we rely solely on Drill to deduce the
>>> structure of the data (the schema), the code will get very complex and
>>> delicate.
>>> 
>>> For example, inferring array structure and numeric range. So,
>>> "provided schema" or "TO_JSON" may be good medicine :
>>> 
>>> Provided Schema
>>> 
>>> We can add DTD or XML Schema (XSD) support for XML. It can
>>> build all value vectors (Writer) before reading data, resolving the
>>> fields, types, and complex nested structures.
>>> 
>>> However, a definition file is actually a rule validator that allows
>>> elements to appear 0 or more times. As a result, it is not possible to
>>> know if all elements exist until the data is read.
>>> 
>>> Therefore, avoid creating a large number of value vectors that do not
>>> actually exist before reading the data.
>>> 
>>> We can build the top schema at the initial stage and add new value
>>> vectors as needed during the reading phase.
>>> 
>>> TO_JSON
>>> 
>>> Read and convert XML directly to JSON, using the JSON Reader for data
>>> resolution.
>>> 
>>> It makes it easier for us to query XML data as if it were JSON, but
>>> requires reading the whole XML file into memory.
>>> 
>>> I think the two can be done, so I look forward to your spirited
>> discussion.
>>> 
>>> Thanks.
>>> 
>>> - luoc
>>> 
>>> 

[jira] [Created] (DRILL-8180) Add Icons to Storage Plugin List

2022-03-28 Thread Charles Givre (Jira)
Charles Givre created DRILL-8180:


 Summary: Add Icons to Storage Plugin List
 Key: DRILL-8180
 URL: https://issues.apache.org/jira/browse/DRILL-8180
 Project: Apache Drill
  Issue Type: Task
  Components: Storage - Other, Web Server
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0








[jira] [Created] (DRILL-8178) Bump S3 SDK to Latest Version

2022-03-26 Thread Charles Givre (Jira)
Charles Givre created DRILL-8178:


 Summary: Bump S3 SDK to Latest Version
 Key: DRILL-8178
 URL: https://issues.apache.org/jira/browse/DRILL-8178
 Project: Apache Drill
  Issue Type: Task
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0








Re: [VOTE] Adopt the Drill Test Framework from MapR

2022-03-17 Thread Charles Givre
+1 from me.  


> On Mar 17, 2022, at 05:03, James Turton  wrote:
> 
> Hi dev community!
> 
> Many of you need no introduction to the test framework developed by MapR
> 
> https://github.com/mapr/drill-test-framework
> 
> . For those who don't know, the test framework contains around 10k tests 
> often exercising scenarios not covered by Drill's unit tests. Just weeks ago 
> it revealed a regression in a Drill 1.20 RC and saved us from shipping with 
> that bug. The linked repository has been dormant for going on two years but I 
> am aware of bits of work that have been done on the test framework since, and 
> today Anton is actively dusting off and updating it. Since the codebase is 
> under the Apache 2.0 license, we are free to bring a copy into the Drill 
> project. I've created a new (currently empty) possible home for the test 
> framework at
> 
> https://github.com/apache/drill-test-framework
> 
> Before I proceed to push a clone there, please vote if you support or oppose 
> our adoption of the test framework.
> 
> P.S. I have also sent a message to a contact at HPE just in case they might 
> be aware of some concern applicable to our copying this repo but, given the 
> license applied, I cannot see that there will be be one.  Should anything get 
> raised (and we'd decided to proceed) I would, of course, pause so that we can 
> discuss.
> 
> Regards
> James


[jira] [Created] (DRILL-8169) Add UDFs to HTTP Plugin to Facilitate Joins

2022-03-16 Thread Charles Givre (Jira)
Charles Givre created DRILL-8169:


 Summary: Add UDFs to HTTP Plugin to Facilitate Joins
 Key: DRILL-8169
 URL: https://issues.apache.org/jira/browse/DRILL-8169
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 2.0.0


There are some situations where a user might want to join data with an API 
result and the pushdowns prevent that from happening. The main situation where 
this happens is when an API has parameters which are part of the URL AND these 
parameters are dynamically populated via a join. 

In this case, there are two functions, `http_get_url` and `http_get`, which you 
can use to facilitate these joins. 

* `http_get('<storage_plugin_name>', <params>)`: This function accepts a 
storage plugin as input and an optional list of parameters to include in a URL.
* `http_get_url(<url>, <params>)`: This function works in the same way except 
that it does not pull any configuration information from existing storage 
plugins.

### Example Queries
Let's say that you have a storage plugin called `github` with an endpoint 
called `repos` which points to the URL https://github.com/orgs/{org}/repos. 
It is easy enough to write a query like this:

```sql
SELECT * 
FROM github.repos
WHERE org='apache'
```
However, if you had a file with organizations and wanted to join this with the 
API, the query would fail. Using the functions listed above you could get this 
data as follows:

```sql
SELECT http_get('github.repos', `org`)
FROM dfs.`some_data.csvh`
```
or
```sql
SELECT http_get_url('https://github.com/orgs/{org}/repos', `org`)
FROM dfs.`some_data.csvh`
```
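
To make the parameter handling concrete, here is a minimal Java sketch of how a list of values could be substituted, in order, into the `{...}` placeholders of a URL. The class `UrlTemplate` and its `fill` method are hypothetical names used for illustration only; this is not Drill's implementation, and it does no URL encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Drill.
class UrlTemplate {
  // Matches one placeholder such as {org}.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{[^}]+\\}");

  // Replaces placeholders left to right with the supplied arguments.
  static String fill(String template, String... args) {
    Matcher matcher = PLACEHOLDER.matcher(template);
    StringBuffer out = new StringBuffer();
    int next = 0;
    while (matcher.find()) {
      if (next >= args.length) {
        throw new IllegalArgumentException("Not enough arguments for " + template);
      }
      // quoteReplacement keeps '$' and '\' in the argument literal.
      matcher.appendReplacement(out, Matcher.quoteReplacement(args[next++]));
    }
    matcher.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // prints https://github.com/orgs/apache/repos
    System.out.println(fill("https://github.com/orgs/{org}/repos", "apache"));
  }
}
```

In a join, the equivalent of `fill` would run once per input row, which is why the row count of the driving table matters so much for these functions.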

** WARNING: This functionality will execute an HTTP Request FOR EVERY ROW IN 
YOUR DATA. Use with caution. **

 





[jira] [Created] (DRILL-8167) Add JSON Config Options to Format Config

2022-03-13 Thread Charles Givre (Jira)
Charles Givre created DRILL-8167:


 Summary: Add JSON Config Options to Format Config
 Key: DRILL-8167
 URL: https://issues.apache.org/jira/browse/DRILL-8167
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - JSON
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


Almost all Drill format plugins allow the user to configure various options for 
that plugin as part of the format config.  The one glaring exception is the 
JSON reader, which has several configuration options that can only be set 
globally.  This PR adds these to the format config so that users can set these 
options when they configure a storage plugin.  

This PR does not eliminate the global settings for JSON.  It simply adds 
another place where a user can update the settings.  If the settings in the 
config file are not defined (`null`), Drill will use the global settings.
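
The fallback rule described above can be sketched in a few lines of Java. The class and method names here (`JsonOptionResolver`, `resolve`) are hypothetical and only illustrate the "format config wins unless it is null" behaviour; the actual option names and classes in Drill may differ.

```java
// Hypothetical sketch; not Drill's actual classes.
class JsonOptionResolver {
  // A nullable Boolean models an option that may be absent from the format config.
  static boolean resolve(Boolean formatConfigValue, boolean globalValue) {
    return formatConfigValue != null ? formatConfigValue : globalValue;
  }

  public static void main(String[] args) {
    // Option not set in the format config: the global setting applies.
    System.out.println(resolve(null, true));          // prints true
    // Option set in the format config: it overrides the global setting.
    System.out.println(resolve(Boolean.FALSE, true)); // prints false
  }
}
```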





[jira] [Created] (DRILL-8166) Add List of Supported File Format Extensions

2022-03-11 Thread Charles Givre (Jira)
Charles Givre created DRILL-8166:


 Summary: Add List of Supported File Format Extensions
 Key: DRILL-8166
 URL: https://issues.apache.org/jira/browse/DRILL-8166
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


Drill does not currently give users a way of knowing what file extensions are 
supported.  This PR adds two REST endpoints which return a list of supported 
file extensions.





[jira] [Created] (DRILL-8161) Add Global Credentials to HTTP Storage Plugin

2022-03-07 Thread Charles Givre (Jira)
Charles Givre created DRILL-8161:


 Summary: Add Global Credentials to HTTP Storage Plugin
 Key: DRILL-8161
 URL: https://issues.apache.org/jira/browse/DRILL-8161
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


Currently, Drill forces you to set usernames and passwords individually for 
every API endpoint in an HTTP storage plugin.  This PR allows you to set global 
credentials which will be used for all endpoints in a given HTTP storage plugin 
instance.





[jira] [Created] (DRILL-8155) Add Impersonation Support for Non-Hadoop Based Storage Plugins

2022-03-04 Thread Charles Givre (Jira)
Charles Givre created DRILL-8155:


 Summary: Add Impersonation Support for Non-Hadoop Based Storage 
Plugins
 Key: DRILL-8155
 URL: https://issues.apache.org/jira/browse/DRILL-8155
 Project: Apache Drill
  Issue Type: Improvement
  Components: Security
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


Drill's current implementation of user impersonation does not allow non-Hadoop 
based plugins to impersonate the user.  This creates security issues as it 
requires an organization to create service accounts for users to access storage 
such as a relational database or Splunk, ES and the like from Drill. 

This PR proposes to add the framework to support individual credentials for 
non-Hadoop based plugins. 





Re: [DISCUSS] Pull Request Cleanup

2022-03-04 Thread Charles Givre
Hi Christian, 
Thanks for your input.  First of all, Drill is clearly a complex system so PRs 
do tend to take a long time to get merged.  One option might be to use a bot 
like stale [1] which automatically closes PRs after a period of inactivity. 

Personally, I'd set the "timeout" period to 90 days.
Best,
-- C


[1]: https://github.com/apps/stale


> On Mar 3, 2022, at 3:51 PM, Z0ltrix  wrote:
> 
> Hi Charles,
> 
> what process would you suggest?
> 
> I would think some devs are using a PR to keep the work open for memory 
> and/or others can discuss it but of course, if its stale for months maybe it 
> will never make any more progress.
> Perhaps someone could trigger a comment and ask for further development, but 
> who would be responsible for that trigger?
> 
> Regards
> Christian
> 
> 
> 
> 
> ---- Original Message ----
> On 3 March 2022, 17:54, Charles Givre wrote:
> 
> Hello all,
> I wanted to discuss the possibility of doing a cleanup of open and stale pull 
> requests. There seem to be about 10 PRs that are actively being worked, then 
> we have a bunch of PRs of various stages of staleness.
> 
> What do you all think about having some sort of process for closing out old 
> PRs that are not actively being worked?
> Best,
> -- C
> 
> 



[DISCUSS] Pull Request Cleanup

2022-03-03 Thread Charles Givre
Hello all, 
I wanted to discuss the possibility of doing a cleanup of open and stale pull 
requests.  There seem to be about 10 PRs that are actively being worked, then 
we have a bunch of PRs of various stages of staleness. 

What do you all think about having some sort of process for closing out old PRs 
that are not actively being worked?
Best,
-- C

New Committer: Tengfei Wang

2022-03-03 Thread Charles Givre
The Project Management Committee (PMC) for Apache Drill
has invited Tengfei Wang to become a committer and we are pleased 
to announce that he has accepted.

Being a committer enables easier contribution to the
project since there is no need to go via the patch
submission process. This should enable better productivity.
A PMC member helps manage and guide the direction of the project.
Please join me in congratulating Tengfei!



[jira] [Created] (DRILL-8153) Convert OAuth REST APIs to JSON

2022-03-02 Thread Charles Givre (Jira)
Charles Givre created DRILL-8153:


 Summary: Convert OAuth REST APIs to JSON
 Key: DRILL-8153
 URL: https://issues.apache.org/jira/browse/DRILL-8153
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


This PR converts the OAuth REST endpoints to accept JSON for the sake of 
consistency. 





Re: [VOTE] Release Apache Drill 1.20.0 - RC5

2022-02-24 Thread Charles Givre
Tested on various queries in embedded mode.

+1 from me. (binding)

> On Feb 23, 2022, at 12:34 AM, James Turton  wrote:
> 
> Hi all
> 
> I'd like to propose the sixth release candidate (RC5) of Apache Drill, 
> version 1.20.0 which differs from the previous RC in the following.
> 
> DRILL-8144: Cannot launch Drill 1.20 RC 4 on Windows (#2470)
> DRILL-8143: Error querying json with $date field (#2469)
> DRILL-8142: SAS Reader Returns NPE #2468
> 
> The release candidate covers a total of 122 resolved JIRAs [1]. Thanks to 
> everyone who contributed to this release.
> 
> The tarball artifacts are hosted at [2][3] and the maven artifacts are hosted 
> at [4].
> 
> This release candidate is based on commits 
> d19878973ef6723250d231258f470340863ddc23 and 
> 20ff3778fd1a046272426178aeca671ed822d970 located at [5][6].
> 
> Please download and try out the release.
> 
> [ ] +1
> [ ] +0
> [ ] -1
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc5/
> [3] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc5/ 
> (Hadoop 2 build)
> [4] https://repository.apache.org/content/repositories/orgapachedrill-1095/
> [5] https://github.com/jnturton/drill/commits/drill-1.20.0
> [6] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2 (Hadoop 2 
> build)
> 



Re: thinking of our Ukrainian friends

2022-02-24 Thread Charles Givre
I would also like to express my sympathy and support for Arina, Vova, Vitalii, 
Igor, Anton and the people of Ukraine.
-- C

> On Feb 24, 2022, at 5:07 AM, James Turton  wrote:
> 
> I too would like to express my sympathy and solidarity.
> 
> On 2022/02/24 11:43, Z0ltrix wrote:
>> oh my goodness i hope this will end soon.
>> Stay safe!
>> 
>> --- Original Message ---
>> 
>> luoc wrote on Thursday, 24 February 2022 at 10:24:
>> 
>>> Vitalii and Vova are my Ukrainian friends, hopefully they will stay safe as 
>>> well.
>>> 
>>>> On Feb 24, 2022, at 14:39, Ted Dunning ted.dunn...@gmail.com wrote:
>>>> 
>>>> For commercial historical reasons many of the people who have contributed
>>>> to Drill live in Ukraine.
>>>> 
>>>> My heart is with them tonight. I hope they stay safe.
> 



[jira] [Created] (DRILL-8148) Add REST Endpoints to Update OAuth Tokens

2022-02-22 Thread Charles Givre (Jira)
Charles Givre created DRILL-8148:


 Summary: Add REST Endpoints to Update OAuth Tokens
 Key: DRILL-8148
 URL: https://issues.apache.org/jira/browse/DRILL-8148
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.20.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


See attached PR





Re: [VOTE] Release Apache Drill 1.20.0 - RC4

2022-02-18 Thread Charles Givre
+1 for release.

Great work everyone!!
-- C


> On Feb 18, 2022, at 8:40 AM, Z0ltrix  wrote:
> 
> +1 for release.
> 
> - Installed Hadoop2 RC4 in our development environment on AWS EC2 (Ubuntu 
> 18.04)
>   - zookeeper 3.6.5
>   - hadoop 2.9.2
>   - hbase 1.5.0
>   - phoenix 4.15.0
>   - phoenix-queryserver 1.0.0
>   - everything secured by kerberos
>   - everything tls encrypted
>   - everything impersonated
> - Ran queries against Parquet files stored in HDFS (impersonated) + INT96 
> timestamps
> - Ran queries against HBase (impersonated)
> - Ran queries against Phoenix (impersonated)
> - Ran UNION ALL queries against HBase + HDFS (Parquet) to simulate a Lambda 
> dataset
> - Ran UNION ALL queries against Phoenix + HDFS (Parquet) to simulate a Lambda 
> dataset
> - Ran ANALYZE TABLE COMPUTE STATISTICS on HDFS Parquet tables (Iceberg 
> Metastore)
> - Ran ANALYZE TABLE REFRESH METADATA on HDFS Parquet tables (Iceberg 
> Metastore)
> - Ran queries against the Iceberg metastore to simulate Iceberg format 
> plugin reads
> - Tested some Superset and Tableau dashboards over an ODBC connection 
> (impersonated)
> - Tested some queries from NiFi over a JDBC connection
> Regards
> 
> 
> Christian
> 
> --- Original Message ---
> 
> James Turton wrote on Thursday, 17 February 2022 at 19:53:
> 
>> Hi all
>> 
>> I'd like to propose the fifth release candidate (RC4) of Apache Drill,
>> version 1.20.0 which differs from the previous RC in the following.
>> 
>> DRILL-8139: Parquet CodecFactory thread safety bug (#2463)
>> DRILL-8134: Cannot query Parquet INT96 columns as timestamps (#2460)
>> DRILL-8122: Change kafka metadata obtaining due to KAFKA-5697 (#2456)
>> DRILL-8137: Prevent reading union inputs after cancellation request (#2462)
>> 
>> The release candidate covers a total of 117 resolved JIRAs [1]. Thanks
>> to everyone who contributed to this release.
>> 
>> The tarball artifacts are hosted at [2][3] and the maven artifacts are
>> hosted at [4][5].
>> 
>> This release candidate is based on commits
>> 753bff39d8dd08eaa1273eadc20175d34a87e044 and
>> 9955d082bcdba401666799f49a6cd3c3f996af97 located at [6][7].
>> 
>> Please download and try out the release.
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
>> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-rc4/
>> [3] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0-hadoop2-rc4/
>> (Hadoop 2 build)
>> [4] https://repository.apache.org/content/repositories/orgapachedrill-1094/
>> [5] https://repository.apache.org/content/repositories/orgapachedrill-1095/
>> (Hadoop 2 build)
>> [6] https://github.com/jnturton/drill/commits/drill-1.20.0
>> [7] https://github.com/jnturton/drill/commits/drill-1.20.0-hadoop2
>> (Hadoop 2 build)
> 





[jira] [Created] (DRILL-8142) SAS Reader Returns NPE

2022-02-18 Thread Charles Givre (Jira)
Charles Givre created DRILL-8142:


 Summary: SAS Reader Returns NPE 
 Key: DRILL-8142
 URL: https://issues.apache.org/jira/browse/DRILL-8142
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Text  CSV
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


The SAS reader uses the first row of data to infer the data types.  If the 
first row had null values, the SAS reader threw an NPE.  This PR fixes 
that.  
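
As an illustration of the failure mode, the Java sketch below infers column types from a first row while guarding against null cells. It is a hypothetical example, not the SAS reader's code: the names `FirstRowTypeInference`, `inferType` and `inferRow` are invented here, and falling back to `VARCHAR` for a null cell is an assumption made for the sketch, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Drill SAS reader.
class FirstRowTypeInference {
  enum ColumnType { BIGINT, FLOAT8, VARCHAR }

  // Assumption for illustration: a null cell falls back to VARCHAR
  // instead of being dereferenced, which is what would raise an NPE.
  static ColumnType inferType(Object cell) {
    if (cell == null) {
      return ColumnType.VARCHAR;
    }
    if (cell instanceof Long || cell instanceof Integer) {
      return ColumnType.BIGINT;
    }
    if (cell instanceof Number) {
      return ColumnType.FLOAT8;
    }
    return ColumnType.VARCHAR;
  }

  // Infers one type per column from the first row of data.
  static List<ColumnType> inferRow(Object[] firstRow) {
    List<ColumnType> types = new ArrayList<>();
    for (Object cell : firstRow) {
      types.add(inferType(cell));
    }
    return types;
  }

  public static void main(String[] args) {
    Object[] firstRow = {42L, null, 3.14, "abc"};
    // prints [BIGINT, VARCHAR, FLOAT8, VARCHAR]
    System.out.println(inferRow(firstRow));
  }
}
```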





[jira] [Created] (DRILL-8140) Add JSON Post Body to HTTP Rest Storage Plugin

2022-02-17 Thread Charles Givre (Jira)
Charles Givre created DRILL-8140:


 Summary: Add JSON Post Body to HTTP Rest Storage Plugin
 Key: DRILL-8140
 URL: https://issues.apache.org/jira/browse/DRILL-8140
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


Some APIs require information to be sent as a JSON POST body.  This PR enables 
that. 





[DISCUSS] Next RC Question: Merging DRILL-8122

2022-02-15 Thread Charles Givre
Hello all, 
I want to thank everyone for their work on the next Drill release.  It seems we 
keep finding teething bugs (grrr) in our RCs and we just found another…. Since 
the commit freeze, there are two pending PRs which are relatively minor and are 
passing CI.  (Excluding the blocker: DRILL-8134)

The PRs I’m referring to are DRILL-8122 (Change Kafka Metadata), which fixes a 
bug relating to Kafka 2.0, and the PR without a Jira issue that adds the 
Jackson BOM.  
As we prepare for the next RC, I wanted to ask if anyone has any strong 
opinions as to whether we should merge these into the next RC or not.  They 
seem relatively minor, but as we all know, you never know what could go wrong, 
even with “minor” fixes.

We will certainly merge these one way or the other… the question is whether to 
include them in the next RC or not.

Thx,
— C



Drill Board Report

2022-02-07 Thread Charles Givre
# Description:
The mission of Drill is the creation and maintenance of software related to 
Schema-free SQL Query Engine for Apache Hadoop, NoSQL and Cloud Storage

## Issues:
No blocking issues.

## Membership Data:
Apache Drill was founded 2014-11-18 (7 years ago)
There are currently 60 committers and 27 PMC members in this project.
The Committer-to-PMC ratio is roughly 5:3.

Community changes, past quarter:
- James Turton was added to the PMC on 2022-01-23
- PJ Fanning was added as committer on 2022-01-19

## Project Activity:
The Drill team is preparing to release Drill 1.20.
We released RC0 for Drill 1.20 on 5 February. One minor bug was found, 
so we will likely be putting out RC1 shortly. 

Drill 1.20 is significant in that, in addition to new functionality and bug 
fixes, the new version is backwards compatible with Hadoop 2.  The lack of 
Hadoop 2 support in recent releases meant that many organizations could not 
upgrade past roughly Drill 1.17.  

Some highlights of Drill 1.20 are:
* Storage plugin for Apache Phoenix
* Format plugin for Apache Iceberg
* Upgrade Parquet reader to Parquet v2
* Support for automatic de-pagination for REST plugin
* Support for OAuth2.0 for REST queries
* Refactoring pushdowns for Mongo
* ...and much more

The Drill community has been holding monthly hangout meetings which James Turton
has organized. We've been discussing building a Drill 2.0 and what that would 
entail. There are a few key themes of things which we should revise which would
necessarily break some existing functionality.  

* 1.19.0 was released on 2021-06-10.
* 1.18.0 was released on 2020-09-04.
* 1.17.0 was released on 2019-12-26.

## Community Health:

The Drill community is growing and I would say strong. As mentioned above
there has been a good conversation for the last few months about Drill 2.0.

* dev@drill.apache.org had an 80% increase in traffic in the past quarter
(1147 emails compared to 635)
* iss...@drill.apache.org had a 79% increase in traffic in the past quarter
(1033 emails compared to 576)
* 83 issues opened in JIRA, past quarter (45% increase)
* 82 issues closed in JIRA, past quarter (105% increase)
* 135 commits in the past quarter (-18% change)
* 21 code contributors in the past quarter (61% increase)
* 76 PRs opened on GitHub, past quarter (10% increase)
* 82 PRs closed on GitHub, past quarter (22% increase)
* 13 issues opened on GitHub, past quarter (-40% change)
* 10 issues closed on GitHub, past quarter (-16% change)
* 342 members of Drill slack channel.



Re: [VOTE] Release Apache Drill 1.20.0 - RC0

2022-02-07 Thread Charles Givre
Hello all, 
I hate to be the downer, but we found a minor bug in the REST API.  I’ve 
already submitted a PR to fix it (https://github.com/apache/drill/pull/2453).  
So I’d have to give a -1 for RC0 in favor of merging this PR. 
Thanks,
— C



> On Feb 7, 2022, at 11:02 AM, Z0ltrix  wrote:
> 
> +1 for release.
> 
> - Installed rc0 in our testing environment on aws ec2 (ubuntu 18.04)
>   - zookeeper 3.6.3
>   - hadoop 3.2.1
>   - hbase 2.4.8
>   - phoenix 5.1.2
>   - phoenix-queryserver 6.0.0
>   - everything secured by kerberos
>   - everything tls encrypted
>   - everything impersonated
> - Run Queries against Parquet Files stored in HDFS (impersonated)
> - Run Queries against HBase (impersonated)
> - Run Queries against Phoenix (impersonated)
> - Tested HBasePStoreProvider
> 
> Regards
> 
> Christian
> 
> --- Original Message ---
> 
> James Turton  wrote on Saturday, 5 February 2022 at 10:11:
> 
>> Hi all
>> 
>> Note from the release manager.
>> 
>> The normal RC announcement follows below but please take note that while
>> you should test and try this Hadoop 3-based RC 0 of Drill 1.20.0, there
>> is likely to be another RC which ships both Hadoop 2 and Hadoop 3 builds
>> as soon as I have got some advice on the best way to incorporate this in
>> our release process. However, that RC will be based on exactly the same
>> commit as this one is (assuming no issues are found), so please do test
>> this one every bit as much as you would have.
>> 
>> - Thanks, James
>> 
>> I'd like to propose the first release candidate (RC0) of Apache Drill,
>> version 1.20.0.
>> 
>> The release candidate covers a total of 105 resolved JIRAs [1]. Thanks
>> to everyone who contributed to this release.
>> 
>> The tarball artifacts are hosted at [2] and the maven artifacts are
>> hosted at [3].
>> 
>> This release candidate is based on commit
>> 556b972560911c20691d5b5de6c656d22c59ce0b located at [4].
>> 
>> Please download and try out the release.
>> 
>> The vote ends at 2022-02-08 10:00 UTC ≅ 3×24 hours after the timestamp
>> on this email.
>> 
>> [ ] +1
>> [ ] +0
>> [ ] -1
>> 
>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350301=12313820
>> [2] https://dist.apache.org/repos/dist/dev/drill/drill-1.20.0/
>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1087/
>> [4] https://github.com/jnturton/drill/commits/drill-1.20.0



[jira] [Created] (DRILL-8126) Ignore OAuth Parameter in Storage Plugin

2022-02-07 Thread Charles Givre (Jira)
Charles Givre created DRILL-8126:


 Summary: Ignore OAuth Parameter in Storage Plugin
 Key: DRILL-8126
 URL: https://issues.apache.org/jira/browse/DRILL-8126
 Project: Apache Drill
  Issue Type: Bug
  Components: Web Server
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


During certain REST calls, the REST interface was throwing a 400 error due to 
the `oauth` parameter. This minor fix makes that parameter ignorable.





Re: [DISCUSS] Some ideas for Drill 1.21

2022-02-06 Thread Charles Givre
Hi Luoc, 
Thanks for your concern.  Apache projects are often backed unofficially by a 
company.  Drill was, for years, backed by MapR, as is evident from all the 
MapR-specific code that is still in the Drill codebase. However, since MapR's 
acquisition, I think it is safe to say that Drill really has become a 
community-driven project.  While some of the committers are colleagues of mine 
at DataDistillr, and Drill is a core part of DataDistillr, from our 
perspective, we've really just been focusing on making Drill better for 
everyone as well as building the community of Drill users, regardless of 
whether they use DataDistillr or not.  We haven't rejected any PRs because they 
go against our business model or tried to steer Drill against the community or 
anything like that. 

Just for your awareness, there are other OSS projects, including some Apache 
projects where one company controls everything.  Outside contributions are only 
accepted if they fit the company's roadmap, and there is no real 
community-building that happens.  From my perspective, that is not what I want 
from Drill.  My personal goal is to build an active community of users and 
developers around an awesome tool. 

I hope this answers your concerns.
Best,
-- C


> On Feb 6, 2022, at 9:42 AM, luoc  wrote:
> 
> 
> Before we discuss the next release, I would like to explain that Apache 
> project should not be directly linked to a commercial company, otherwise this 
> will affect the motivation of the community to contribute.
> 
> Thanks.
> 
>> On Feb 6, 2022, at 21:29, Charles Givre  wrote:
>> 
>> Hello all, 
>> Firstly, I wanted to thank everyone for all the work that has gone into 
>> Drill 1.20 as well as the ongoing discussion around Drill 2.0.   I wanted to 
>> start a discussion around a topic for Drill 1.21, and that is INFO_SCHEMA 
>> improvements.  As my company wades further and further into Drill, it has 
>> become apparent that the INFO_SCHEMA could use some attention.  James Turton 
>> submitted a PR which was merged into Drill 1.20, but in so doing he 
>> uncovered an entire Pandora's box of other issues which might be worth 
>> addressing.  In a nutshell, the issues with the INFO_SCHEMA are all 
>> performance related: it can be very slow and also can consume significant 
>> resources when executing even basic queries.  
>> 
>> My understanding of how the info schema (IS) works is that when a user 
>> executes a query, Drill will attempt to instantiate every enabled storage 
>> plugin to discover schemata and other information. As you might imagine, 
>> this can be costly. 
>> 
>> So, (and again, this is only meant as a conversation starter), I was 
>> thinking there are some general ideas as to how we might improve the IS:
>> 1.  Implement a limit pushdown:  As far as I can tell, there is no limit 
>> pushdown in the IS and this could be a relatively quick win for improving IS 
>> query performance.
>> 2.  Caching:  I understand that caching is tricky, but perhaps we could add 
>> some sort of schema caching for IS queries, or make better use of the Drill 
>> metastore to reduce the number of connections during IS queries.  Perhaps in 
>> combination with the metastore, we could implement some sort of "metastore 
>> first" plan, whereby Drill first hits the metastore for query results and if 
>> the limit is reached, we're done.  If not, query the storage plugins...
>> 3.  Parallelization:  It did not appear to me that Drill parallelizes IS 
>> queries.   We may be able to add some parallelization which would improve 
>> overall speed, but not necessarily reduce overall compute cost.
>> 4.  Convert to EVF2:  Not sure that there's a performance benefit here, but 
>> at least we could get rid of cruft.
>> 5.  Reduce SerDe:   I imagine there was a good reason for doing this, but the 
>> IS seems to obtain a POJO from the storage plugin then write these results 
>> to old-school Drill vectors.  I'm sure there was a reason it was done this 
>> way, (or maybe not) but I have to wonder if there is a more efficient way of 
>> obtaining the information from the storage plugin, ideally w/o all the 
>> object creation. 
>> 
>> These are just some thoughts, and I'm curious as to what the community 
>> thinks about this.  Thanks everyone!
>> -- C
> 



[DISCUSS] Some ideas for Drill 1.21

2022-02-06 Thread Charles Givre
Hello all, 
Firstly, I wanted to thank everyone for all the work that has gone into Drill 
1.20 as well as the ongoing discussion around Drill 2.0.   I wanted to start a 
discussion around a topic for Drill 1.21, and that is INFO_SCHEMA improvements.  
As my company wades further and further into Drill, it has become apparent that 
the INFO_SCHEMA could use some attention.  James Turton submitted a PR which 
was merged into Drill 1.20, but in so doing he uncovered an entire Pandora's 
box of other issues which might be worth addressing.  In a nutshell, the issues 
with the INFO_SCHEMA are all performance related: it can be very slow and also 
can consume significant resources when executing even basic queries.  

My understanding of how the info schema (IS) works is that when a user executes 
a query, Drill will attempt to instantiate every enabled storage plugin to 
discover schemata and other information. As you might imagine, this can be 
costly. 

So, (and again, this is only meant as a conversation starter), I was thinking 
there are some general ideas as to how we might improve the IS:
1.  Implement a limit pushdown:  As far as I can tell, there is no limit 
pushdown in the IS and this could be a relatively quick win for improving IS 
query performance.
2.  Caching:  I understand that caching is tricky, but perhaps we could add 
some sort of schema caching for IS queries, or make better use of the Drill 
metastore to reduce the number of connections during IS queries.  Perhaps in 
combination with the metastore, we could implement some sort of "metastore 
first" plan, whereby Drill first hits the metastore for query results and if 
the limit is reached, we're done.  If not, query the storage plugins...
3.  Parallelization:  It did not appear to me that Drill parallelizes IS 
queries.   We may be able to add some parallelization which would improve 
overall speed, but not necessarily reduce overall compute cost.
4.  Convert to EVF2:  Not sure that there's a performance benefit here, but at 
least we could get rid of cruft.
5.  Reduce SerDe:   I imagine there was a good reason for doing this, but the IS 
seems to obtain a POJO from the storage plugin then write these results to 
old-school Drill vectors.  I'm sure there was a reason it was done this way, 
(or maybe not) but I have to wonder if there is a more efficient way of 
obtaining the information from the storage plugin, ideally w/o all the object 
creation. 
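
To make ideas 1 and 2 a little more concrete, here is a rough Python sketch 
(not Drill code; the function and the plugin loaders are hypothetical names 
invented for illustration).  It assumes that cached metadata is cheap to read 
and that each storage plugin is costly to instantiate, so the plugins are only 
touched if the cached rows do not already satisfy the limit.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Optional


def scan_schemas(cached: Iterable[str],
                 plugin_loaders: Dict[str, Callable[[], Iterable[str]]],
                 limit: Optional[int] = None) -> List[str]:
    """Return schema names, reading cached metadata before live plugins."""

    def rows() -> Iterator[str]:
        seen = set()
        # Cheap step: the "metastore first" part of the plan.
        for name in cached:
            seen.add(name)
            yield name
        # Costly step: each loader stands in for instantiating one plugin.
        for loader in plugin_loaders.values():
            for name in loader():
                if name not in seen:
                    seen.add(name)
                    yield name

    # Because rows() is lazy, stopping at `limit` means later plugins are
    # never instantiated once enough rows have been produced.
    return list(islice(rows(), limit))
```

With a limit that the cache can satisfy, no loader is ever called; without a 
limit, every plugin is still visited, so this only helps LIMIT-style queries.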

These are just some thoughts, and I'm curious as to what the community thinks 
about this.  Thanks everyone!
-- C

Re: [VOTE] Freeze for Drill 1.20

2022-02-03 Thread Charles Givre
Great work James!  Thanks everyone for all the work that goes into the release! 
 We’re almost there!
— C


> On Feb 3, 2022, at 8:30 AM, James Turton  
> wrote:
> 
> Great, thanks everyone.  I've started preparing a release candidate.
> 
> Please do not merge anything into the master branch from now until we're 
> clear.
> 
> James
> 
> 
> On 2022/02/02 20:18, Z0ltrix wrote:
>> +1 from me
>> 
>> Regards
>> Christian
>> 
>> 
>> 
>>  Original Message 
>> On 2 Feb 2022, 17:13, Vitalii Diravka wrote:
>> 
>> 
>>+1
>> 
>>Kind regards
>>Vitalii
>> 
>>On Wed, Feb 2, 2022 at 6:04 PM Vova Vysotskyi
>> wrote:
>> 
>>> +1
>>>
>>> Kind regards,
>>> Volodymyr Vysotskyi
>>>
>>> On 2022/02/02 15:59:55 James Turton wrote:
>>> > PR #2449 was merged and there are now zero Dependabot alerts
>>against
>>> > master.
>>> >
>>> > +1 for freezing from me.
>>> >
>>> > On 2022/02/02 16:36, Charles Givre wrote:
>>> > > Assuming we pass dependabot, big +1 from me!! Great work
>>everyone!
>>> > > --C
>>> > >
>>> > >> On Feb 2, 2022, at 9:35 AM, James Turton 
>>wrote:
>>> > >>
>>> > >> Please vote again on the assumption that the very minor
>>Postgresql
>>> 42.3.1 -> 42.3.2 PR will be merged, clearing the last Dependabot
>>alert. It
>>> passed local testing so it looks like a safe bet.
>>> > >>
>>> > >>
>>> > >> On 2022/01/30 01:51, Charles Givre wrote:
>>> > >>> Hey James,
>>> > >>> Alas... I'm afraid I'd have to give a -1 on this. There
>>are some
>>> dependabot alerts at the moment, which we really should resolve
>>(or at
>>> least look at) before cutting a release. One of which is
>>linked to a
>>> severe CVE. Also, I just submitted a VERY minor bug fix which
>>I'd love to
>>> squeak into this release, but that's not urgent.
>>> > >>> Best,
>>> > >>> --C
>>> > >>>
>>> > >>>
>>> > >>>> On Jan 29, 2022, at 7:36 AM, James Turton
>> wrote:
>>> > >>>>
>>> > >>>> Hello Dev community
>>> > >>>>
>>> > >>>> Not a moment too soon, we've finally dispatched the last
>>issues
>>> holding back 1.20! Here's a big thank you from the release
>>manager to
>>> everyone who helped to push us forward to this point. I'm sure
>>I'm not the
>>> only one receiving the "When it's coming??" questions. As an
>>interesting
>>> bit of trivia, there have been about 9 months separating recent
>>releases
>>> and it has now been about 8 months since 1.19. Who knew we were so
>>> consistent ;-) ?
>>> > >>>>
>>> > >>>> Please vote for or against a feature freeze on the master
>>branch.
>>> I assume only critical bug or vulnerability fixes get freeze
>>immunity?
>>> > >>>>
>>> > >>>> Thank you
>>> > >>>> James
>>> > >>>>
>>> > >>
>>> > >
>>> >
>>>
>> 



Re: [VOTE] Freeze for Drill 1.20

2022-02-02 Thread Charles Givre
Assuming we pass dependabot, big +1 from me!!   Great work everyone!
--C 

> On Feb 2, 2022, at 9:35 AM, James Turton  wrote:
> 
> Please vote again on the assumption that the very minor Postgresql 42.3.1 -> 
> 42.3.2 PR will be merged, clearing the last Dependabot alert.  It passed 
> local testing so it looks like a safe bet.
> 
> 
> On 2022/01/30 01:51, Charles Givre wrote:
>> Hey James,
>> Alas... I'm afraid I'd have to give a -1 on this.  There are some dependabot 
>> alerts at the moment, which we really should resolve (or at least look at) 
>> before cutting a release.  One of which is linked to a severe CVE. Also, 
>> I just submitted a VERY minor bug fix which I'd love to squeak into this 
>> release, but that's not urgent.
>> Best,
>> --C
>> 
>> 
>>> On Jan 29, 2022, at 7:36 AM, James Turton  wrote:
>>> 
>>> Hello Dev community
>>> 
>>> Not a moment too soon, we've finally dispatched the last issues holding 
>>> back 1.20!  Here's a big thank you from the release manager to everyone who 
>>> helped to push us forward to this point.  I'm sure I'm not the only one 
>>> receiving the "When it's coming??" questions. As an interesting bit of 
>>> trivia, there have been about 9 months separating recent releases and it 
>>> has now been about 8 months since 1.19.  Who knew we were so consistent ;-) 
>>> ?
>>> 
>>> Please vote for or against a feature freeze on the master branch.  I assume 
>>> only critical bug or vulnerability fixes get freeze immunity?
>>> 
>>> Thank you
>>> James
>>> 
> 



[jira] [Created] (DRILL-8121) Add Partial Support for Per-User Credentials

2022-01-31 Thread Charles Givre (Jira)
Charles Givre created DRILL-8121:


 Summary: Add Partial Support for Per-User Credentials
 Key: DRILL-8121
 URL: https://issues.apache.org/jira/browse/DRILL-8121
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: Future


See pull request





Re: [VOTE] Freeze for Drill 1.20

2022-01-29 Thread Charles Givre
Hey James, 
Alas... I'm afraid I'd have to give a -1 on this.  There are some dependabot 
alerts at the moment, which we really should resolve (or at least look at) 
before cutting a release.  One of which is linked to a severe CVE. Also, I 
just submitted a VERY minor bug fix which I'd love to squeak into this release, 
but that's not urgent.  
Best,
--C 


> On Jan 29, 2022, at 7:36 AM, James Turton  wrote:
> 
> Hello Dev community
> 
> Not a moment too soon, we've finally dispatched the last issues holding back 
> 1.20!  Here's a big thank you from the release manager to everyone who helped 
> to push us forward to this point.  I'm sure I'm not the only one receiving 
> the "When it's coming??" questions. As an interesting bit of trivia, there 
> have been about 9 months separating recent releases and it has now been about 
> 8 months since 1.19.  Who knew we were so consistent ;-) ?
> 
> Please vote for or against a feature freeze on the master branch.  I assume 
> only critical bug or vulnerability fixes get freeze immunity?
> 
> Thank you
> James
> 



[jira] [Created] (DRILL-8118) Add Option to Allow Disk Use on Mongo Queries

2022-01-29 Thread Charles Givre (Jira)
Charles Givre created DRILL-8118:


 Summary: Add Option to Allow Disk Use on Mongo Queries
 Key: DRILL-8118
 URL: https://issues.apache.org/jira/browse/DRILL-8118
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - MongoDB
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


MongoDB has a strange feature (?) whereby queries which use more than 100MB of 
memory will by default fail.  Mongo allows the user to specify whether they 
want the query to spill to disk which allows larger queries but at a 
performance cost.

This minor PR adds the ability for a user to specify whether they want this 
option included in Mongo queries.  This only affects aggregate queries in 
Mongo. 
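
For readers unfamiliar with the option, the sketch below (Python, for 
illustration only; the helper name is hypothetical and this is not the 
plugin's code) builds the document for MongoDB's `aggregate` database command 
and adds the `allowDiskUse` field only when the user has opted in.

```python
from typing import Any, Dict, List


def build_aggregate_command(collection: str,
                            pipeline: List[Dict[str, Any]],
                            allow_disk_use: bool = False) -> Dict[str, Any]:
    """Build a MongoDB `aggregate` command document.

    `allowDiskUse` is included only when requested, so the server default
    (fail when a stage exceeds its memory limit) applies otherwise.
    """
    command: Dict[str, Any] = {
        "aggregate": collection,
        "pipeline": pipeline,
        "cursor": {},  # empty document requests the default batch size
    }
    if allow_disk_use:
        command["allowDiskUse"] = True
    return command


# Example pipeline: a filter followed by a sort that may exceed the limit.
example = build_aggregate_command(
    "Elements",
    [{"$match": {"PlanID": "1623263140"}}, {"$sort": {"ElementTypeName": 1}}],
    allow_disk_use=True,
)
```

The collection and field names in the example are borrowed from the query 
reported on the mailing list; how the storage plugin exposes this setting is 
described in the pull request, not here.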

 





Re: 1.20.0-SNAPSHOT: Sort exceeded memory limit of 104857600 bytes

2022-01-28 Thread Charles Givre
Good question. I don't know enough about Mongo config to answer that, but let 
me look into that. 
Best,
-- C

> On Jan 28, 2022, at 10:20 AM, Daniel Clark  wrote:
> 
> Hi Charles,
> 
> I was under the impression that the allowDiskUse parameter is passed by the
> client making the call to the mongodb server. Is it possible to add this
> parameter to the mongo storage plugin, similar to how you added the
> "batchSize" parameter for the 1.20 release?
> 
> On Fri, Jan 28, 2022 at 9:54 AM Charles Givre  wrote:
> 
>> Daniel,
>> Thanks for flagging this.  One thing I noticed in your logs is this:
>> 
>> Sort exceeded memory limit of 104857600 bytes, but did not opt in to
>> external sorting. Aborting operation. Pass allowDiskUse:true to opt in.
>> 
>> What's happening here is that in the newer version of Drill, Drill is
>> sending the sort operation to Mongo which (in theory) should be faster.  In
>> contrast, Drill 1.19 would receive the unsorted data from Mongo then sort
>> it.  I wonder whether, if you set your Mongo up so that the `allowDiskUse`
>> parameter is true, you might get better results when Mongo sorts the data.
>> 
>> -- C
>> 
>> 
>> 
>>> On Jan 28, 2022, at 9:43 AM, Daniel Clark  wrote:
>>> 
>>> Hi Charles,
>>> 
>>> Yes "supportsSortPushdown" is set to true. I left it at the default. I'll
>>> try setting it to false, and try again. Thanks for the feedback.
>>> 
>>> On Fri, Jan 28, 2022 at 9:38 AM Charles Givre  wrote:
>>> 
>>>> Hey Daniel,
>>>> Did you have the sort pushdown enabled?  This is one change that we
>> added
>>>> to the mongo pushdown since 1.19 and might be affecting your query.
>>>> Best,
>>>> -- C
>>>> 
>>>> 
>>>>> On Jan 28, 2022, at 9:32 AM, Daniel Clark  wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> While evaluating 1.20.0-SNAPSHOT release performance, I ran a mongo
>>>> query that runs in 15 minutes in the 1.19 release (below).
>>>>> 
>>>>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
>>>>> `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
>>>>> `Elements`.`ElementTypeName` AS `ElementTypeName`,
>>>>> `Elements`.`PlanID` AS `PlanID`
>>>>> FROM `mongo.grounds`.`Elements` `Elements`
>>>>> INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
>>>> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
>>>>> WHERE (`Elements`.`PlanID` = '1623263140')
>>>>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
>>>>> `Elements`.`ElementSubTypeName`,
>>>>> `Elements`.`ElementTypeName`,
>>>>> `Elements`.`PlanID`
>>>>> 
>>>>> The query runs for 34 minutes before returning this error; "Sort
>>>> exceeded memory limit of 104857600 bytes, but did not opt in to external
>>>> sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on
>> server
>>>> localhost:27017." Any ideas? I realize that it's a mongodb error, but
>> the
>>>> mongo database doesn't raise this error with the 1.19 release. I was
>>>> expecting improved performance with the mongo storage plugin in the
>>>> upcoming 1.20 release. Nothing in my environment has changed. I've
>> attached
>>>> the full stacktrace.
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 



Re: 1.20.0-SNAPSHOT: Sort exceeded memory limit of 104857600 bytes

2022-01-28 Thread Charles Givre
Daniel, 
Thanks for flagging this.  One thing I noticed in your logs is this:

Sort exceeded memory limit of 104857600 bytes, but did not opt in to external 
sorting. Aborting operation. Pass allowDiskUse:true to opt in.

What's happening here is that in the newer version of Drill, Drill is sending 
the sort operation to Mongo which (in theory) should be faster.  In contrast, 
Drill 1.19 would receive the unsorted data from Mongo then sort it.  I wonder 
whether, if you set your Mongo up so that the `allowDiskUse` parameter is true, 
you might get better results when Mongo sorts the data.

-- C



> On Jan 28, 2022, at 9:43 AM, Daniel Clark  wrote:
> 
> Hi Charles,
> 
> Yes "supportsSortPushdown" is set to true. I left it at the default. I'll
> try setting it to false, and try again. Thanks for the feedback.
> 
> On Fri, Jan 28, 2022 at 9:38 AM Charles Givre  wrote:
> 
>> Hey Daniel,
>> Did you have the sort pushdown enabled?  This is one change that we added
>> to the mongo pushdown since 1.19 and might be affecting your query.
>> Best,
>> -- C
>> 
>> 
>>> On Jan 28, 2022, at 9:32 AM, Daniel Clark  wrote:
>>> 
>>> Hello,
>>> 
>>> While evaluating 1.20.0-SNAPSHOT release performance, I ran a mongo
>> query that runs in 15 minutes in the 1.19 release (below).
>>> 
>>> SELECT `Elements_Efforts`.`EffortTypeName` AS `EffortTypeName`,
>>>  `Elements`.`ElementSubTypeName` AS `ElementSubTypeName`,
>>>  `Elements`.`ElementTypeName` AS `ElementTypeName`,
>>>  `Elements`.`PlanID` AS `PlanID`
>>> FROM `mongo.grounds`.`Elements` `Elements`
>>>  INNER JOIN `mongo.grounds`.`Elements_Efforts` `Elements_Efforts` ON
>> (`Elements`.`_id` = `Elements_Efforts`.`_id`)
>>> WHERE (`Elements`.`PlanID` = '1623263140')
>>> GROUP BY `Elements_Efforts`.`EffortTypeName`,
>>>  `Elements`.`ElementSubTypeName`,
>>>  `Elements`.`ElementTypeName`,
>>>  `Elements`.`PlanID`
>>> 
>>> The query runs for 34 minutes before returning this error; "Sort
>> exceeded memory limit of 104857600 bytes, but did not opt in to external
>> sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server
>> localhost:27017." Any ideas? I realize that it's a mongodb error, but the
>> mongo database doesn't raise this error with the 1.19 release. I was
>> expecting improved performance with the mongo storage plugin in the
>> upcoming 1.20 release. Nothing in my environment has changed. I've attached
>> the full stacktrace.
>>> 
>>> 
>> 
>> 



[jira] [Created] (DRILL-8112) Excel Reader Ignores HeaderRow Config Param

2022-01-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8112:


 Summary: Excel Reader Ignores HeaderRow Config Param
 Key: DRILL-8112
 URL: https://issues.apache.org/jira/browse/DRILL-8112
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


Excel reader was ignoring the `headerRow` parameter.  This minor bug fix 
corrects that.





[ANNOUNCE] James Turton as PMC Member

2022-01-24 Thread Charles Givre
The Project Management Committee (PMC) for Apache Drill is pleased to announce 
that we have invited James Turton to join us as a PMC member of the Drill 
project and he has accepted.  Please join me in congratulating James and 
welcoming him to the PMC!


Best,
Charles Givre
PMC Chair, Apache Drill 



 


[ANNOUNCE] New Committer: PJ Fanning

2022-01-24 Thread Charles Givre
The Project Management Committee (PMC) for Apache Drill is pleased to announce 
that we have invited PJ Fanning to join us as a committer to the Drill project. 
 PJ is a committer and PMC member for the Apache POI project and author of the 
Excel Streaming library which Drill uses for the Excel reader.  He has 
contributed numerous fixes and assistance to Drill relating to Drill's 
Excel reader.  Please join me in congratulating PJ and welcoming him as a 
committer!

Best,
Charles Givre
PMC Chair, Apache Drill 



Re: [DISCUSS] Lombok - friend or foe?

2022-01-22 Thread Charles Givre
I guess the question is do we de-lombok what has already been done?  I really 
like the builders for plugin configs, but I'm generally in agreement that if it 
is causing problems building, we should ditch it.
Best,
-- C



> On Jan 22, 2022, at 5:02 PM, Ted Dunning  wrote:
> 
> The Lombok story is better in IntelliJ, possibly because the Lombok devs
> use IntelliJ like the majority of devs. Once I knew to install the plugin,
> things were at least comprehensible.
> 
> But the problem is that it isn't obvious. As a newcomer, you don't know
> what you don't know and because Lombok's major effect is code that isn't
> there, it isn't obvious where to look.
> 
> The point about it not helping that much due to Drill's design (good point,
> paul) is apposite, but I think the naive reader issue is even bigger.
> 
> Net, as a person who isn't developing anything for Drill just lately, I
> don't think it's a good idea at all.
> 
> 
> 
> On Sat, Jan 22, 2022 at 6:37 AM luoc  wrote:
> 
>> 
>> Hi all,
>> 
>> I have a story here. In Oct 2021, I upgraded Eclipse to the latest release
>> (2021-09) and then found out that the Lombok dependency had been added to
>> the Drill repository, so I installed Lombok (as a new plugin) from Eclipse
>> Marketplace as I used to. Finally, I restarted the IDE and prepared to open
>> the Drill project, but it crashed because of issue #2956 <
>> https://github.com/projectlombok/lombok/issues/2956>. Lombok was not
>> available until I found a temporary workaround.
>> 
>> I use both Eclipse and IDEA, but I use Eclipse more often. I have no
>> objection to the use of Lombok, but suggest the following three points :
>> 
>> 1. Could we use Lombok only in `drill-contrib` module?
>> 
>> 2. Could we agree not to use Lombok in common module?
>> 
>> 3. It is best to update the dev documentation to describe this outcome if
>> we continue to use Lombok.
>> 
>> In fact, I have the same idea as Paul, more about balancing choices.
>> 
>> Thanks.
>> 
>>> On Jan 22, 2022, at 5:34 PM, Paul Rogers wrote:
>>> 
>>> Hi All,
>>> 
>>> I look at any tool as a cost/benefit tradeoff. If Drill were a typical
>>> business app, with lots of "data objects", then the hassle of Lombok might
>>> be a net win. However, the nature of Drill is that we have very few data
>>> objects. We have lots of Protobuf objects, or Jackson-serialized objects,
>>> but not too many data objects of the kind used with object-relational
>>> mappers.
>>> 
>>> On the other hand, I had to spend an hour or so trying to figure out why
>>> things would not build in Eclipse. Then, more time to figure out how to
>>> install the half-finished Lombok plugin for Eclipse and various other
>>> fiddling.
>>> 
>>> So, I'd guess, on balance, Lombok has cost, and will continue to cost, more
>>> time than it saved avoiding a few getter/setter methods. And, I agree with
>>> Ted, Eclipse (and, I assume IntelliJ), is pretty quick at generating those
>>> methods.
>>> 
>>> Since Lombok has a cost, and is not a huge win, KISS suggests we avoid
>>> adding extra dependencies unnecessarily.
>>> 
>>> That's my 2 cents...
>>> 
>>> - Paul
>>> 
>>> 
>>> 
>>> On Fri, Jan 21, 2022 at 8:51 AM Ted Dunning wrote:
>>> 
>>>> A couple of years ago, I had a dev introduce Lombok into some code without
>>>> me knowing. That let me be a classic naive user.
>>>> 
>>>> The result was total confusion on my part. Sooo much code was being
>>>> automagically generated that I couldn't figure out the code and spent a lot
>>>> of time chasing my tail and very little time looking at the crux of the
>>>> code.
>>>> 
>>>> My own personal preference is either
>>>> 
>>>> - use a language like Julia if you want magic. It's fantastic and all to
>>>> have amazing stuff and coders expect to see it.
>>>> 
>>>> - use an IDE to generate the boiler plate and put it into its own little
>>>> annex in the code with the interesting bits near the top of classes. That
>>>> lets debuggers and IDEs that don't understand Lombok to function without
>>>> impairing readability much. Concurrent with that, use discipline to not do
>>>> strange things like changing the expected meaning of the boilerplate.
>>>> 
>>>> That's my preference, but I wouldn't want to push that preference very
>>>> hard. My own prioritization is on readability of the code by outsiders.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jan 21, 2022 at 2:25 AM James Turton  wrote:
>>>> 
>>>>> Hi again Devs
>>>>> 
>>>>> This one is simple to describe.  Lombok entered the Drill code base this
>>>>> year, but not everyone feels that Lombok is appropriate for every code
>>>>> base.  To my, fairly limited, understanding the advantage of Lombok is
>>>>> that boilerplate code is reduced while the disadvantage is the
>>>>> deployment of code generation magic that can have untoward effects on
>>>>> build-time tools and IDEs.
>>>>> 
>>>>> So here is a chance to opine on Lombok if you'd like to.  My own opinion
>>>>> is very near neutral and goes 

[jira] [Created] (DRILL-8108) Excel Reader Fails with Duplicate Columns

2022-01-15 Thread Charles Givre (Jira)
Charles Givre created DRILL-8108:


 Summary: Excel Reader Fails with Duplicate Columns
 Key: DRILL-8108
 URL: https://issues.apache.org/jira/browse/DRILL-8108
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Other
Reporter: Charles Givre
Assignee: Charles Givre


In its current implementation, if Drill encounters an Excel file which contains 
duplicate column names, it will fail to read the data.   This PR fixes this 
issue by appending `_n` after the duplicate column.  
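
A minimal sketch of the renaming idea described above, written in Python rather than Drill's Java. The function name `dedupe_columns` and the choice to start the suffix at 1 are assumptions for illustration only; the issue text says just that `_n` is appended to a duplicate column name.

```python
def dedupe_columns(names):
    """Return a copy of names where repeated entries get a numeric suffix."""
    seen = {}      # original name -> number of times it has appeared so far
    taken = set()  # every name already emitted, to avoid creating new clashes
    result = []
    for name in names:
        count = seen.get(name, 0)
        candidate = name if count == 0 else f"{name}_{count}"
        # Skip any suffix that collides with a column that already exists.
        while candidate in taken:
            count += 1
            candidate = f"{name}_{count}"
        seen[name] = count + 1
        taken.add(candidate)
        result.append(candidate)
    return result
```

With this sketch, a header row of `id, name, id, id` would be read as `id, name, id_1, id_2`.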



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[DISCUSS] Per User Access Controls

2022-01-13 Thread Charles Givre
Hello all, 
One of the issues we've been dancing around is having per-user access controls 
in Drill.  As Drill was originally built around the Hadoop ecosystem, the 
Hadoop based connections make use of user-impersonation for per user access 
controls.  However, a rather glaring deficiency is the lack of per-user access 
controls for connections like JDBC, Mongo, Splunk etc.

Recently, when I was working on the OAuth pull request, it occurred to me that we 
might be able to slightly extend the credential provider interface to allow for 
per-user credentials.  Here's what I was thinking... 

A bit of background:  The credential provider interface is really an 
abstraction for a HashMap.  Here's my proposal: the credential provider 
interface would store two hashmaps, one for per-user creds and one for global 
creds.  When a user who is authenticated to Drill creates a storage plugin 
connection, the credential provider would associate the creds with their Drill 
username.  The storage plugins that use the credential provider would thus get 
per-user credentials.  

If users did not want per-user credentials, they could simply use direct 
credentials OR specify that in the credential provider classes.  What do 
you think?  
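
To make the two-hashmap idea concrete, here is a rough sketch in Python rather than Drill's Java. The class name `PerUserCredentialProvider`, its method names, and the fallback to global credentials when a user has none are all assumptions made for illustration; none of them are taken from Drill's actual credential provider interface.

```python
class PerUserCredentialProvider:
    """Hypothetical provider holding global creds plus per-user overrides."""

    def __init__(self, global_creds=None):
        # Map 1: credentials shared by everyone.
        self._global = dict(global_creds or {})
        # Map 2: Drill username -> that user's own credentials.
        self._per_user = {}

    def set_user_credentials(self, username, creds):
        """Associate a set of credentials with one Drill username."""
        self._per_user[username] = dict(creds)

    def get_credentials(self, username=None):
        """Return the user's creds if present, else the global ones (assumed fallback)."""
        if username is not None and username in self._per_user:
            return dict(self._per_user[username])
        return dict(self._global)
```

A storage plugin would then call something like `get_credentials(query_user)` when opening a connection, so two Drill users could reach the same JDBC or Mongo source with different accounts.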

Best,
-- C



Re: [DISCUSS] Restarting the Arrow Conversation

2022-01-03 Thread Charles Givre
@Paul, 
Do you mind if I copy the contents of your response to DRILL-8088 to this 
thread?   There's a lot of good info there, and I'd hate to see it get lost.
-- C

> On Jan 3, 2022, at 7:41 PM, Paul Rogers  wrote:
> 
> Hi All,
> 
> Thanks Charles for dredging up that old discussion, your memory is better
> than mine! And, thanks Ted for that summary of MapR history. As one of the
> "replacement crew" brought in after the original folks left, your
> description is consistent with my memory of events. Moreover, as we looked
> at what was needed to run Drill in production, an Arrow port was far down
> on the list: it would not have solved actual customer problems.
> 
> Before we get too excited about Arrow, I think we should have a discussion
> about what we want in an internal storage format. I added a long (sorry)
> set of comments in that PR that Charles mentioned that tries to debunk the
> myths that have grown up around using a columnar format as the internal
> representation for a query engine. (Columnar is great for storage.) The
> note presents the many issues we've encountered over the years that have
> caused us to layer ever more code on top of vectors to solve various
> problems. It also highlights a distributed-systems problem which vectors
> make far worse.
> 
> Arrow is meant to be portable, as Ted discussed, but it is still columnar,
> and this is the source of endless problems in an execution engine. So, we
> want to ask, what is the optimal format for what Drill actually does? I'm
> now of the opinion that Drill might actually better benefit  from a
> row-based format, similar to what Impala uses. The notes even paint a path
> forward.
> 
> Ted's description of the goal for Dremio suggests that Arrow might be the
> right answer for that market. Drill, however, tends to be used to query
> myriad data sources at scale and as a "query integrator" across systems.
> This use case has different needs, which may be better served with a
> row-based format.
> 
> The upshot is that "value vectors vs. Arrow" is the wrong place to start
> the discussion. The right place is "what does our many years of experience
> with Drill suggest is the most efficient format for how Drill is actually
> used?"
> 
> Note that Drill could have an Arrow-based API independent of the internal
> format. The quote from Charles explains how we could do that.
> 
> Thanks,
> 
> - Paul
> 
> On Mon, Jan 3, 2022 at 12:54 PM Ted Dunning  wrote:
> 
>> Christian,
>> 
>> Your thoughts are very helpful. I find Arrow very nice (I use it in Agstack
>> with Julia and Python).
>> 
>> I don't think anybody is saying that Drill wouldn't be well set with a
>> switch to Arrow or even just interfaces to Arrow. But it is a lot of work
>> to make it all happen.
>> 
>> 
>> 
>> On Mon, Jan 3, 2022 at 11:37 AM Z0ltrix  wrote:
>> 
>>> Hi Charles, Ted, and the others here,
>>> 
>>> It is very interesting to hear about the evolution of Drill, Dremio and
>>> Arrow in that context, and thank you Charles for restarting that discussion.
>>>
>>> I think, and James mentioned this in the PR as well, that Drill could
>>> benefit from the continuous progress the Arrow project has made since its
>>> separation from Drill. And the Arrow community seems to be large, so I
>>> assume this goes on and on with improvements, new features, etc., but I
>>> do not have enough experience with Drill internals to have an idea of how
>>> much refactoring this would lead to.
>>>
>>> In addition to that, I'm not aware of the current roadmap of Arrow and
>>> whether it would fit into Drill's roadmap. Maybe Arrow would go in a
>>> different direction than Drill, and what should we do if Drill is bound
>>> to Arrow then?
>>>
>>> On the other hand, Arrow could help Drill reach wider adoption with
>>> clients like pyarrow, arrow-flight, various other programming languages,
>>> etc., and (I'm not sure about that) maybe it's a performance benefit if
>>> Drill uses Arrow to read data from HDFS (for example), uses Arrow to work
>>> with it during execution, and gives the vectors directly to my Python
>>> (for example) program via arrow-flight so that I can play around with
>>> Pandas, etc.
>>>
>>> Just some thoughts I have had since I have used Dremio with pyarrow and
>>> Drill with ODBC connections.
>>> 
>>> Regards
>>> Christian
>>>  Original Message 
>>> On Jan 3, 2022, at 20:08, Charles Givre wrote:
>>> 

Re: [DISCUSS] Restarting the Arrow Conversation

2022-01-03 Thread Charles Givre
Thanks Ted for the perspective!  I had always wished to be a "fly on the wall" 
in those conversations.  :-)
-- C

> On Jan 3, 2022, at 11:00 AM, Charles Givre  wrote:
> 
> Hello all, 
> A recently closed PR [1] included a discussion between z0ltrix, James Turton 
> and a few others about integrating Drill with Apache Arrow, and about why it 
> was never done.  I'd like to share my perspective 
> as someone who has been around Drill for some time but also as someone who 
> never worked for MapR or Dremio.  This just represents my understanding of 
> events as an outsider, and I could be wrong about some or all of this.   
> Please forgive (or correct) any inaccuracies. 
> 
> When I first learned of Arrow and the idea of integrating Arrow with Drill, 
> the thing that interested me the most was the ability to move data between 
> platforms without having to serialize/deserialize the data.  From my 
> understanding, MapR did some research and didn't find a significant 
> performance advantage and hence didn't really pursue the integration.  The 
> other side of it was that it would require a significant amount of work to 
> refactor major parts of Drill. 
> 
> I don't know the internal politics, but this was one of the major points of 
> divergence between Dremio and Drill.
> 
> With that said, there was a renewed discussion on the list [2] where Paul 
> Rogers proposed what he described as a "Crude but Effective" approach to an 
> Arrow integration.  
> 
> This is in the email link but here was a part of Paul's email:
> 
>> Charles, just brainstorming a bit, I think the easiest way to start is to 
>> create a simple, stand-alone server that speaks Arrow to the client, and 
>> uses the native Drill client to speak to Drill. The native Drill client 
>> exposes Drill value vectors. One trick would be to convert Drill vectors to 
>> the Arrow format. I think that data vectors are the same format. Possibly 
>> offset vectors. I think Arrow went its own way with null-value (Drill's 
>> is-set) vectors. So, some conversion might be a no-op, others might need to 
>> rewrite a vector. Good thing, this is purely at the vector level, so would 
>> be easy to write. The next issue is the one that Parth has long pointed out: 
>> Drill and Arrow each have their own memory allocators. How could we share a 
>> data vector between the two? The simplest initial solution is just to copy 
>> the data from Drill to Arrow. Slow, but transparent to the client.
>> 
>> A crude first-approximation of the development steps: 
>> 1. Create the client shell server. 
>> 2. Implement the Arrow client protocol. Need some way to accept a query and 
>> return batches of results. 
>> 3. Forward the query to Drill using the native Drill client. 
>> 4. As a first pass, copy vectors from Drill to Arrow and return them to the 
>> client. 
>> 5. Then, solve that memory allocator problem to pass data without copying.
> 
> One point that Paul made was that these pieces are fairly discrete and could 
> be implemented without refactoring major components of Drill.  Of course, 
> this could be something for Drill 2.0.  At a minimum, could we take the 
> conversation off of the PR and put it in the email list? ;-)
> 
> Let's discuss... All ideas are welcome!
> 
> Best,
> -- C
> 
> 
> [1]: https://github.com/apache/drill/pull/2412
> [2]: https://lists.apache.org/thread/hcmygrv8q8jyw8p57fm9qy3vw2kqfr5l
> 
> 
> 



[DISCUSS] Restarting the Arrow Conversation

2022-01-03 Thread Charles Givre
Hello all, 
A recently closed PR [1] included a discussion between z0ltrix, James Turton 
and a few others about integrating Drill with Apache Arrow, and about why it 
was never done.  I'd like to share my perspective as 
someone who has been around Drill for some time but also as someone who never 
worked for MapR or Dremio.  This just represents my understanding of events as 
an outsider, and I could be wrong about some or all of this.   Please forgive 
(or correct) any inaccuracies. 

When I first learned of Arrow and the idea of integrating Arrow with Drill, the 
thing that interested me the most was the ability to move data between 
platforms without having to serialize/deserialize the data.  From my 
understanding, MapR did some research and didn't find a significant performance 
advantage and hence didn't really pursue the integration.  The other side of it 
was that it would require a significant amount of work to refactor major parts 
of Drill. 

I don't know the internal politics, but this was one of the major points of 
divergence between Dremio and Drill.

With that said, there was a renewed discussion on the list [2] where Paul 
Rogers proposed what he described as a "Crude but Effective" approach to an 
Arrow integration.  

This is in the email link but here was a part of Paul's email:

> Charles, just brainstorming a bit, I think the easiest way to start is to 
> create a simple, stand-alone server that speaks Arrow to the client, and uses 
> the native Drill client to speak to Drill. The native Drill client exposes 
> Drill value vectors. One trick would be to convert Drill vectors to the Arrow 
> format. I think that data vectors are the same format. Possibly offset 
> vectors. I think Arrow went its own way with null-value (Drill's is-set) 
> vectors. So, some conversion might be a no-op, others might need to rewrite a 
> vector. Good thing, this is purely at the vector level, so would be easy to 
> write. The next issue is the one that Parth has long pointed out: Drill and 
> Arrow each have their own memory allocators. How could we share a data vector 
> between the two? The simplest initial solution is just to copy the data from 
> Drill to Arrow. Slow, but transparent to the client.
> 
> A crude first-approximation of the development steps: 
> 1. Create the client shell server. 
> 2. Implement the Arrow client protocol. Need some way to accept a query and 
> return batches of results. 
> 3. Forward the query to Drill using the native Drill client. 
> 4. As a first pass, copy vectors from Drill to Arrow and return them to the 
> client. 
> 5. Then, solve that memory allocator problem to pass data without copying.
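
Step 4, together with Paul's remark that null-value (is-set) vectors may need rewriting, can be sketched roughly as follows. This is Python rather than Java, and it rests on two assumptions: that Drill's is-set vector holds one byte per value (non-zero meaning "present"), and that the target is an Arrow-style validity bitmap with one bit per value, least-significant bit first. The function name `is_set_to_validity_bitmap` is made up for illustration.

```python
def is_set_to_validity_bitmap(is_set):
    """Pack one-byte-per-value flags into a bitmap, least-significant bit first."""
    # One output byte covers eight input values; round up for a partial byte.
    bitmap = bytearray((len(is_set) + 7) // 8)
    for i, flag in enumerate(is_set):
        if flag:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)
```

Data vectors that already share a layout could be copied byte for byte; only vectors like this one would need a rewrite during the copy.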

One point that Paul made was that these pieces are fairly discrete and could be 
implemented without refactoring major components of Drill.  Of course, this 
could be something for Drill 2.0.  At a minimum, could we take the conversation 
off of the PR and put it in the email list? ;-)

Let's discuss... All ideas are welcome!

Best,
-- C


[1]: https://github.com/apache/drill/pull/2412 

[2]: https://lists.apache.org/thread/hcmygrv8q8jyw8p57fm9qy3vw2kqfr5l 






Re: [LAZY VOTE] Delete branches gh-pages and gh-pages-master from apache/drill

2022-01-03 Thread Charles Givre
Hi James, 
I would prefer that we keep all cruft intact in the Drill repo.  In fact, I 
went ahead and created a cruft generator which can add additional cruft to 
areas in which we feel the cruft is insufficient. j/k
Enthusiastic +1 from me. (For removal... not additional cruft) 
-- C

> On Jan 3, 2022, at 6:32 AM, James Turton  wrote:
> 
> Thank you, I found and updated a handful of instances.
> 
> On 2022/01/03 11:05, luoc wrote:
>> James, could you please confirm that there is no link to `gh-pages` directly 
>> in the current document?
>> 
>>> On Jan 3, 2022, at 16:28, James Turton  wrote:
>>> 
>>> It's been about four months since we moved the Drill website source over 
>>> to apache/drill-site.  Things have been working fine and we took the full 
>>> commit history across when we migrated so I propose to delete this cruft 
>>> from apache/drill.
>>> 
>>> Please reply if you object.
>>> 
>>> Thanks
>>> James
> 



[jira] [Created] (DRILL-8092) Add Auto Pagination to HTTP Storage Plugin

2021-12-23 Thread Charles Givre (Jira)
Charles Givre created DRILL-8092:


 Summary: Add Auto Pagination to HTTP Storage Plugin
 Key: DRILL-8092
 URL: https://issues.apache.org/jira/browse/DRILL-8092
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


See github





[jira] [Created] (DRILL-8078) Add WEEK to Date Extract

2021-12-17 Thread Charles Givre (Jira)
Charles Givre created DRILL-8078:


 Summary: Add WEEK to Date Extract
 Key: DRILL-8078
 URL: https://issues.apache.org/jira/browse/DRILL-8078
 Project: Apache Drill
  Issue Type: Improvement
  Components: Functions - Drill
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


This minor modification adds `WEEK` as an option for the EXTRACT function for 
dates. 





Re: [LAZY VOTE] Drill 1.20 freeze delay

2021-12-08 Thread Charles Givre
I think it's worth extending a week. I'd like to see DRILL-8073, 8069 and 8067 
added to the list as they seem fairly important. 
-- C


> On Dec 8, 2021, at 10:40 AM, James Turton  wrote:
> 
> Dear dev community
> 
> Please reply if you *object* to us pushing out the freeze date by one week to 
> 2021-12-16.  The motivation to delay is to try to include more of the open 
> PRs that we are tracking below, a number of which are essentially 
> dev-complete but not yet over the line.
> 
> Closed
> 
> DRILL-1282 Parquet v2 read+write 
> DRILL-7863 Phoenix storage
> DRILL-8027 Iceberg format 
> DRILL-8009 JDBC isValid() 
> 
> Open
> 
> DRILL-7978 Fixed width format 
> DRILL-7983 Get running/completed profiles from REST API 
> DRILL-8015 MongoDB Metastore 
> DRILL-8028 PDF format 
> DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
> not yet clear)
> 
> Thank you
> James
> 
> 



[jira] [Created] (DRILL-8072) Fix NPE in HTTP Post Requests

2021-12-07 Thread Charles Givre (Jira)
Charles Givre created DRILL-8072:


 Summary: Fix NPE in HTTP Post Requests
 Key: DRILL-8072
 URL: https://issues.apache.org/jira/browse/DRILL-8072
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


There was a minor bug in the HTTP Storage Plugin with POST requests.  If the 
`postBody` configuration parameter is null, the plugin throws an NPE. 

This PR adds a null check which prevents the NPE.
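
The shape of such a guard, sketched in Python rather than the plugin's Java. The function name `build_post_fields` is hypothetical, and the assumption that the post body is a newline-separated list of `key=value` pairs is mine, not something stated in this issue.

```python
def build_post_fields(post_body):
    """Parse 'key=value' lines into a dict; a missing body yields no fields."""
    # The guard: without it, calling a method on a missing body would fail.
    if post_body is None:
        return {}
    fields = {}
    for line in post_body.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields
```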





Re: Not able to launch apache drill

2021-12-02 Thread Charles Givre
Hi there, 
The first thing that immediately catches my eye is that you're running a very 
old version of Drill.  Can you upgrade to the latest version?
Best,
--C 
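
As an aside, the exit code in the quoted trace below is printed as a signed 32-bit number, while Windows status codes are normally looked up in unsigned hexadecimal form. The small Python helper below does that conversion; the name `to_ntstatus_hex` is made up for illustration.

```python
def to_ntstatus_hex(exit_code):
    """Render a signed 32-bit process exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"
```

For -1073741515 this gives 0xC0000135, which Microsoft documents as STATUS_DLL_NOT_FOUND. Whether a missing DLL is the actual root cause here is not established in this thread.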

> On Dec 2, 2021, at 4:08 AM, Saiprasad Rapolu  
> wrote:
> 
> Hi Team
> 
> 
> 
> While executing the command below as per the instructions
> 
>  sqlline.bat -u "jdbc:drill:zk=local"
> 
> I am getting the errors below and am not able to launch the Drill user
> interface in a browser window.  Could you please help me with this?
> 
> 
> 
> -
> 
> Microsoft Windows [Version 10.0.22000.318]
> 
> (c) Microsoft Corporation. All rights reserved.
> 
> 
> 
> C:\Apache-Drill\apache-drill-1.14.0\bin>sqlline.bat -u "jdbc:drill:zk=local"
> 
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> 
> Calculating HADOOP_CLASSPATH ...
> 
> HBASE_HOME not detected...
> 
> Calculating Drill classpath...
> 
> Error: Failure in starting embedded Drillbit: java.lang.RuntimeException:
> ExitCodeException exitCode=-1073741515: (state=,code=0)
> 
> java.sql.SQLException: Failure in starting embedded Drillbit:
> java.lang.RuntimeException: ExitCodeException exitCode=-1073741515:
> 
>at
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:143)
> 
>at
> org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:72)
> 
>at
> org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:68)
> 
>at
> org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
> 
>at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
> 
>at sqlline.DatabaseConnection.connect(DatabaseConnection.java:167)
> 
>at
> sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:213)
> 
>at sqlline.Commands.connect(Commands.java:1083)
> 
>at sqlline.Commands.connect(Commands.java:1015)
> 
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 
>at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 
>at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 
>at java.lang.reflect.Method.invoke(Method.java:498)
> 
>at
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
> 
>at sqlline.SqlLine.dispatch(SqlLine.java:742)
> 
>at sqlline.SqlLine.initArgs(SqlLine.java:528)
> 
>at sqlline.SqlLine.begin(SqlLine.java:596)
> 
>at sqlline.SqlLine.start(SqlLine.java:375)
> 
>at sqlline.SqlLine.main(SqlLine.java:268)
> 
> Caused by: java.lang.RuntimeException: ExitCodeException
> exitCode=-1073741515:
> 
>at
> org.apache.drill.exec.store.sys.store.LocalPersistentStore.put(LocalPersistentStore.java:186)
> 
>at
> org.apache.drill.exec.store.StoragePluginsHandlerService.lambda$null$1(StoragePluginsHandlerService.java:93)
> 
>at java.lang.Iterable.forEach(Iterable.java:75)
> 
>at
> org.apache.drill.exec.store.StoragePluginsHandlerService.lambda$loadPlugins$2(StoragePluginsHandlerService.java:93)
> 
>at java.util.Optional.ifPresent(Optional.java:159)
> 
>at
> org.apache.drill.exec.store.StoragePluginsHandlerService.loadPlugins(StoragePluginsHandlerService.java:93)
> 
>at
> org.apache.drill.exec.store.StoragePluginRegistryImpl.init(StoragePluginRegistryImpl.java:123)
> 
>at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:196)
> 
>at
> org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:134)
> 
>... 18 more
> 
> Caused by: ExitCodeException exitCode=-1073741515:
> 
>at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> 
>at org.apache.hadoop.util.Shell.run(Shell.java:456)
> 
>at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> 
>at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
> 
>at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:728)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:305)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:294)
> 
>at
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:326)
> 
>at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
> 
>at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
> 
>at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
> 
>at 

Re: Drill 1.20 release plan

2021-12-01 Thread Charles Givre
My vote here would be to do a bit of diagnosis first.  Let's see if we can 
figure out what is going wrong and a sense of how much effort is required to 
fix.  Once we know that, we can decide on how to proceed. 
-- C


> On Dec 1, 2021, at 8:13 AM, James Turton  wrote:
> 
> We've picked up a bug: DRILL-8063.  It was present in earlier releases 
> (verified in 1.18) and may require a fix in Calcite rather than our own 
> codebase.  It's a severe one, resulting in an OOM crash for particular 
> queries.  Please indicate if you would like to have the following marked as a 
> release blocker.  Lazy consensus will apply, i.e. no reply = "I do not 
> believe this bug should block 1.20".
> 
> https://issues.apache.org/jira/browse/DRILL-8063
> 
> 
> On 2021/12/01 12:00, James Turton wrote:
>> Hi all
>> 
>> Given we've had no objections, please strive to merge your PRs for 1.20 by 
>> 10 December which is the current targeted freeze date.
>> 
>> Closed:
>> DRILL-1282 Parquet v2 read+write
>> DRILL-8027 Iceberg format
>> DRILL-8009 JDBC isValid()
>> 
>> Open:
>> DRILL-7863 Phoenix storage
>> DRILL-7978 Fixed width format
>> DRILL-7983 Get running/completed profiles from REST API
>> DRILL-8015 MongoDB Metastore
>> DRILL-8028 PDF format
>> * DRILL-8057 INFORMATION_SCHEMA filter push down is inefficient (feasibility 
>> not yet clear)
>> 
>> 
>> On 2021/11/25 09:53, James Turton wrote:
>>> Dear dev community
>>> 
>>> Please see an update on the Jiras earmarked for 1.20 below. We have of 
>>> course also closed other Jiras in the intervening period. If you are aware 
>>> of any reason that one of the listed Jiras will not be ready please say so, 
>>> so I can remove it. Otherwise I'll post comments to the authors asking them 
>>> to aim for merging by the release cut-off date.  How does a cut-off date of 
>>> 10 December sound?
>>> 
>>> (* indicates a Jira not previously discussed in this thread.)
>>> 
>>> Closed:
>>> DRILL-1282 Parquet v2 read+write
>>> DRILL-8027 Iceberg format
>>> 
>>> Open:
>>> DRILL-7863 Phoenix storage
>>> DRILL-7978 Fixed width format
>>> DRILL-7983 Get running/completed profiles from REST API (corrected from 
>>> 7938 which I believe was a typo)
>>> DRILL-8009 JDBC isValid()
>>> * DRILL-8015 MongoDB Metastore
>>> DRILL-8028 PDF format
>>> 
>>> 
>>> The stretch goal DRILL-7871 (StoragePluginStore instance per user) has not 
>>> yet reached design consensus so I propose that it should not be included in 
>>> 1.20.
>>> 
>>> 
>>> On 2021/11/03 15:24, Charles Givre wrote:
>>>> Hi Luoc,
>>>> IMHO there are a few PRs in flight that I’d like to see included in the 
>>>> next release.  I sent them in slack, but so that they are preserved for 
>>>> the mailing list.  I'd like to see DRILL-1282, DRILL-7938, DRILL-8027, 
>>>> DRILL-8028 and possibly DRILL-8009 and DRILL-7978 get merged for the next 
>>>> release. DRILL-7871 would be a stretch goal.
>>>> Best,
>>>> — C
>>>> 
>>>>> On Nov 3, 2021, at 9:21 AM, luoc  wrote:
>>>>> 
>>>>> 
>>>>> Thanks for your support, James. Since there are no negative votes, I will 
>>>>> recommend you as the release manager.
>>>>> 
>>>>> We'll keep a light on for you.
>>>>> 
>>>>>> On Nov 3, 2021, at 01:04, James Turton wrote:
>>>>>> 
>>>>>> Hi luoc
>>>>>> 
>>>>>> I am willing to help the release in any capacity needed. I know there 
>>>>>> are others who have release experience while I do not but I'm sure it 
>>>>>> can be learned.  I'll have PR #2351 done this week, it would be nice 
>>>>>> (but not critical) to include it.
>>>>>> 
>>>>>> Thanks
>>>>>> James
>>>>>> 
>>>>>>> On 2021/11/01 16:27, luoc wrote:
>>>>>>> 
>>>>>>> Hello, Drill dev and users :
>>>>>>> 
>>>>>>> Since the latest release, 1.19, the Drill master branch has collected many 
>>>>>>> changes, bug fixes and enhancements. The Drill team plans to release 1.20 
>>>>>>> at the end of November 2021.
>>>>>>> 
>>>>>>> We have some things to work out :
>>>>>>> 
>>>>>>> 1. Are you willing to be the 1.20 release manager?
>>>>>>> 
>>>>>>> 2. Is there an unmerged pull request that you want to complete?
>>>>>>> 
>>>>>>> 3. Do you have a feature under development that you want to include in 1.20?
>>>>>>> 
>>>>>>> 4. Would you like to help with testing and feedback (building from the 
>>>>>>> master branch)?
>>>>>>> 
>>>>>>> I hope everyone will participate in the discussion and reply to these 
>>>>>>> questions as soon as possible, thank you.
>>>>>>> 
>>>>>>> -- luoc
>>>>>>> 
>>>>>>> 
> 



Re: Supporting parquet unsigned integers

2021-11-30 Thread Charles Givre
Hey Jason, 
Thanks for sharing this.  Would you consider creating a pull request with these 
changes?
Thanks!
-- C

> On Nov 30, 2021, at 12:35 PM, Jason Gauci  wrote:
> 
> Hi Drill dev!
> 
> I was using drill to analyze data in some parquet files, and ran into
> trouble with columns containing unsigned integers.  Doing a "select
> distinct" or a "group by" or even a cast was failing with an unsupported
> operation error.
> 
> Adding casts for unsigned integers allowed me to cast the columns and then
> group by them.  I don't know if this is the ideal solution, but here is a
> patch that worked for me.  After this change, I was able to work with those
> columns.
> 
> diff --git a/exec/java-exec/src/main/codegen/data/Casts.tdd
> b/exec/java-exec/src/main/codegen/data/Casts.tdd
> index f9ccbf817..29ff4dabb 100644
> --- a/exec/java-exec/src/main/codegen/data/Casts.tdd
> +++ b/exec/java-exec/src/main/codegen/data/Casts.tdd
> @@ -19,11 +19,26 @@
> {
>   types: [
> {from: "Int", to: "BigInt", major: "Fixed"},
> +{from: "UInt1", to: "BigInt", explicit: "long", major: "Fixed"},
> +{from: "UInt2", to: "BigInt", explicit: "long", major: "Fixed"},
> +{from: "UInt4", to: "BigInt", explicit: "long", major: "Fixed"},
> +{from: "UInt8", to: "BigInt", explicit: "long", major: "Fixed"},
> +
> {from: "Float4", to: "Float8", major: "Fixed" },
> {from: "Int", to: "Float4", major: "Fixed" },
> {from: "BigInt", to: "Float4", major: "Fixed" },
> +{from: "UInt1", to: "Float4", major: "Fixed"},
> +{from: "UInt2", to: "Float4", major: "Fixed"},
> +{from: "UInt4", to: "Float4", major: "Fixed"},
> +{from: "UInt8", to: "Float4", major: "Fixed"},
> +
> {from: "Int", to: "Float8", major: "Fixed" },
> {from: "BigInt", to: "Float8", major: "Fixed" },
> +{from: "UInt1", to: "Float8", major: "Fixed"},
> +{from: "UInt2", to: "Float8", major: "Fixed"},
> +{from: "UInt4", to: "Float8", major: "Fixed"},
> +{from: "UInt8", to: "Float8", major: "Fixed"},
> +
> {to: "Int", from: "BigInt", explicit: "int", major: "Fixed"},
> {to: "Float4", from: "Float8" , explicit: "float", major: "Fixed"},
> {to: "Int", from: "Float4" , explicit: "int", native: "float", major:
> "Fixed"},
> (END)
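
One point worth checking when reviewing a patch like this is which unsigned widths always fit in a signed 64-bit BigInt. The Python sketch below works through the arithmetic. It assumes the digit in each Drill type name is the width in bytes, and the helper names are made up for illustration. The email does not say how the generated casts treat UInt8 values above 2^63 - 1, so the reinterpretation shown is only what a plain two's-complement cast would do.

```python
INT64_MAX = 2**63 - 1

# Assumed widths in bits for the unsigned types named in the patch.
UNSIGNED_BITS = {"UInt1": 8, "UInt2": 16, "UInt4": 32, "UInt8": 64}

def fits_in_bigint(type_name):
    """True if every value of the unsigned type fits in a signed 64-bit int."""
    return 2 ** UNSIGNED_BITS[type_name] - 1 <= INT64_MAX

def reinterpret_as_int64(value):
    """Two's-complement view of an unsigned 64-bit value as a signed one."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - 2**64 if value > INT64_MAX else value
```

Under these assumptions UInt1, UInt2 and UInt4 are always safe, while the largest UInt8 value would come out as -1 after a plain reinterpretation.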



[jira] [Created] (DRILL-8056) Add OAuth2 Support for HTTP Rest Plugin

2021-11-25 Thread Charles Givre (Jira)
Charles Givre created DRILL-8056:


 Summary: Add OAuth2 Support for HTTP Rest Plugin
 Key: DRILL-8056
 URL: https://issues.apache.org/jira/browse/DRILL-8056
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Affects Versions: 1.19.0
Reporter: Charles Givre
Assignee: Charles Givre
 Fix For: 1.20.0


Many enterprise platforms use OAuth2 for authentication and authorization.  
This pull request allows Drill to authenticate with external APIs that use 
OAuth2 and query these data sources.




