[GitHub] [drill] paul-rogers commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
paul-rogers commented on a change in pull request #2056: DRILL-7701: EVF V2 
Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410816122
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/v3/file/FileScanUtils.java
 ##
 @@ -38,7 +38,7 @@ public static String partitionColName(int partition) {
   }
 
   public static List<String> expandMetadata(int dirCount) {
 -    List<String> selected = Arrays.asList(
 +    List<String> selected = Lists.newArrayList(
 
 Review comment:
   Agreed. However, `Arrays.asList()` produces a fixed-size list (its `add()` throws `UnsupportedOperationException`), whereas here we need a mutable list so we can add a few more items. Is there a JDK method that will do what we need?
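   For reference, a minimal JDK-only sketch of two ways to get a growable list (the class and values below are illustrative, not the PR's code):
   
   ```
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;
   
   public class MutableListSketch {
     public static void main(String[] args) {
       // Copy the fixed-size view into a growable ArrayList.
       List<String> viaCopy = new ArrayList<>(Arrays.asList("dir0", "dir1"));
       viaCopy.add("dir2"); // works: ArrayList supports add()
   
       // Or collect a stream: Collectors.toList() returns a mutable list
       // (an ArrayList in current JDKs, though the spec does not promise one).
       List<String> viaStream = Stream.of("dir0", "dir1")
           .collect(Collectors.toList());
       viaStream.add("dir2");
   
       System.out.println(viaCopy + " " + viaStream);
     }
   }
   ```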


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] paul-rogers opened a new pull request #2062: DRILL-7711: Add data path, parameter filter pushdown to HTTP plugin

2020-04-18 Thread GitBox
paul-rogers opened a new pull request #2062: DRILL-7711: Add data path, 
parameter filter pushdown to HTTP plugin
URL: https://github.com/apache/drill/pull/2062
 
 
   # [DRILL-7711](https://issues.apache.org/jira/browse/DRILL-7711): Add data 
path, parameter filter pushdown to HTTP plugin
   
   ## Description
   
   Adds an option to specify the path to the data so that the plugin skips the REST message "overhead" and reads only the actual data.
   
   Allows specifying HTTP URL parameters as filter push-downs from SQL.
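   For illustration (the table and parameter names here are hypothetical): a query such as `SELECT * FROM api.stocks WHERE symbol = 'AAPL'` could push the `symbol` filter down so that the plugin requests `...?symbol=AAPL` instead of filtering rows inside Drill.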
   
   Includes improvements to the JSON structure parser to handle a few more 
conditions. Also includes a generic filter push-down framework for the case 
where filters are translated to expressions as here. (The framework is not 
applicable to cases where expressions are interpreted, as in partition pruning.)
   
   ## Documentation
   
   Please see the 
[README.md](https://github.com/paul-rogers/drill/blob/a4226a06361bc136427797f542c401e05d608c5f/contrib/storage-http/README.md)
 file for full details.
   
   ## Testing
   
   Added unit tests for new functionality. Reran all existing tests.
   




Re: Druid Storage Plugin for Drill

2020-04-18 Thread Charles Givre
I don't mean to be a pain in the you know where about this...  I very much 
would like to see this as a part of Drill. 
Best,
-- C



> On Apr 18, 2020, at 4:54 PM, Ankush Kapur  wrote:
> 
> Apologies for being MIA, got hung-up at work really bad.
> 
> Yes, I did start working on the changes requested in the review. Will act on 
> the other things you mentioned, as well.
> 
> Thank you for following-up though, helps if someone is watching over :)
> 
> - Ankush
> 
> On Fri, Apr 17, 2020 at 12:56 AM Charles Givre wrote:
> Hi Ankush, 
> How's it going?  I hope you are doing ok with all that is happening around 
> us.  I hate to be a pest but I was wondering if you've had the chance to work 
> on the Druid storage plugin review?  I'm happy to help with some of the 
> changes, but I didn't want to step on your toes or re-do things you're 
> currently working on. 
> 
> Would it be possible for you to rebase on the current master as well?  There 
> have been some improvements unrelated to this plugin which will be helpful as 
> we move towards committing this. 
> Stay safe!
> 
> -- C
> 
> 
> 
> 
> 
>> On Mar 22, 2020, at 8:34 AM, Ankush Kapur wrote:
>> 
>> Yes, thank you both for your time reviewing...
>> 
>> I am working on changes based on it.
>> 
>> - Ankush
>> 
>> On Fri, Mar 20, 2020 at 11:25 AM Charles Givre wrote:
>> Hi Ankush. 
>> I hope all is well.  Were you able to see Paul and my review comments for 
>> the Druid plugin?
>> Thanks,
>> -- C
>> 
>>> On Mar 13, 2020, at 8:17 AM, Charles Givre wrote:
>>> 
>>> Hi Ankush, 
>>> I hope you're doing ok. I live in Maryland and all our schools are now 
>>> closed for the next two weeks, so it will be interesting... Anyway, I 
>>> tagged you in a few comments, but it's weird that my comments didn't show 
>>> up on the main page.  Take a look here and you should see them: 
>>> https://github.com/apache/drill/pull/1888/files 
>>> 
>>> 
>>> General comments:
>>> 1.  Please verify that all logger creations are in the correct format.  
>>> Also, I'd strongly suggest avoiding info messages and use either debug, 
>>> warn or error as the case may be.
>>> 2.  I had a general question about security and passing credentials to 
>>> Druid.  I don't really know how Druid handles authentication, but it didn't 
>>> seem like there was any way to pass creds to Druid.  In any event, this 
>>> will need to be documented in the README file.   
>>> 3.  Please go through and remove any commented out code. 
>>> 
>>> Writing a storage plugin is not easy, and I'm really impressed that your 
>>> first contribution to Drill is something of this scale.  This is really 
>>> nice work and I'm really hoping we can get it committed for the next 
>>> release. 
>>> Thanks,
>>> -- C
>>> 
>>> 
 On Mar 13, 2020, at 5:56 AM, Ankush Kapur wrote:
 
 Hi Charles,
 
 Things are good here, and hope the same for you and your family.
 
 Pardon me, but I do not see your review comments on the four files 
 mentioned.
 
 - Ankush
 
 On Thu, Mar 12, 2020 at 8:36 PM Charles Givre wrote:
 Hi Ankush, 
 I hope all is well.  I started the review on your plugin.  I've reviewed 
 about four files so far.  When you have a chance, please take a look at 
 the review comments and if you have any questions, please let me know. 
 Best,
 -- C
 
 
> On Feb 28, 2020, at 8:33 AM, Charles Givre wrote:
> 
> Hi Ankush,
> I hope all is well. I've been speaking with Paul about your plugin and 
> I'm going to do the code review.  I usually don't do big code reviews 
> like this, so I was hoping someone else would pick it up, but that hasn't 
> happened and I really want to see this get committed to the next version 
> of Drill. 
> 
> Since this is a large PR, I'm going to do this in small reviews rather 
> than one massive review so please expect waves of review.  I think it's 
> best that way it doesn't get overwhelming.  My game plan will be to focus 
> on a few files at a time rather than trying to look at the entire plugin 
> and give a few comments here and there.  I'll send future correspondence 
> via the dev alias in the interest of transparency as well.  I'm going to 
> ask Paul to look at the pushdown code as he is a lot more familiar with 
> that than I am, but we're not there yet. 
> 
> Thanks for all your work and patience on this and I am excited about 
> getting it into Drill.  I know it will be a benefit for the community. 
> Best,
> -- C
> 
> 
> 
>> On Jan 21, 2020, at 8:43 AM, Charles Givre wrote:
>> 
>> Hey Ankush, 
>> 

[GitHub] [drill] paul-rogers edited a comment on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-18 Thread GitBox
paul-rogers edited a comment on issue #2054: DRILL-6168: Revise format plugin 
table functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615934230
 
 
   @arina-ielchiieva, yes, we can modify the queries to force the incorrect 
behavior. We could add yet another option to turn off inheritance and keep 
the old behavior. However, I'd argue that the old results were actually wrong 
and the new results are correct. So, if we modify anything, the best approach 
seems to be to modify the expected results to reflect the correct behavior, and 
let the old versions fail because they were, in fact, wrong.
   
   Actually, it is possible to modify the query to produce the correct results 
on old versions: we just specify values for the fields which are inherited in 
Drill 1.18. Doing so will mask the old, incorrect behavior, but won't test the 
new behavior.
   
   Yet another option is to modify the framework. In addition to the `.q` and 
`.e` files, add a `.p` (parameters) file and the supporting code. For example:
   
   ```
   validTo: "1.17"
   ```
   
   For the old queries, and:
   
   ```
   validFrom: "1.18"
   ```
   
   For the new queries. Better would be to use commit IDs, but those are not 
ordered.
   
   We used to have this problem all the time back when Drill was taking shape. 
We never did solve it: we just updated the tests and moved on.
   
   Thoughts? 




[jira] [Created] (DRILL-7711) Add data path, parameter filter pushdown to HTTP plugin

2020-04-18 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7711:
--

 Summary: Add data path, parameter filter pushdown to HTTP plugin
 Key: DRILL-7711
 URL: https://issues.apache.org/jira/browse/DRILL-7711
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.18.0


Add to the new HTTP plugin two new features:

 * The ability to express a path to the data to avoid having to work with 
complex message objects in SQL.
 * The ability to specify HTTP parameters using filter push-downs from SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Druid Storage Plugin for Drill

2020-04-18 Thread Ankush Kapur
Apologies for being MIA, got hung-up at work really bad.

Yes, I did start working on the changes requested in the review. Will act
on the other things you mentioned, as well.

Thank you for following-up though, helps if someone is watching over :)

- Ankush

On Fri, Apr 17, 2020 at 12:56 AM Charles Givre  wrote:

> Hi Ankush,
> How's it going?  I hope you are doing ok with all that is happening around
> us.  I hate to be a pest but I was wondering if you've had the chance to
> work on the Druid storage plugin review?  I'm happy to help with some of
> the changes, but I didn't want to step on your toes or re-do things you're
> currently working on.
>
> Would it be possible for you to rebase on the current master as well?
> There have been some improvements unrelated to this plugin which will be
> helpful as we move towards committing this.
> Stay safe!
>
> -- C
>
>
>
>
>
> On Mar 22, 2020, at 8:34 AM, Ankush Kapur  wrote:
>
> Yes, thank you both for your time reviewing...
>
> I am working on changes based on it.
>
> - Ankush
>
> On Fri, Mar 20, 2020 at 11:25 AM Charles Givre  wrote:
>
>> Hi Ankush.
>> I hope all is well.  Were you able to see Paul and my review comments for
>> the Druid plugin?
>> Thanks,
>> -- C
>>
>> On Mar 13, 2020, at 8:17 AM, Charles Givre  wrote:
>>
>> Hi Ankush,
>> I hope you're doing ok. I live in Maryland and all our schools are now
>> closed for the next two weeks, so it will be interesting... Anyway, I
>> tagged you in a few comments, but it's weird that my comments didn't show
>> up on the main page.  Take a look here and you should see them:
>> https://github.com/apache/drill/pull/1888/files
>>
>> General comments:
>> 1.  Please verify that all logger creations are in the correct format.
>> Also, I'd strongly suggest avoiding info messages and use either debug,
>> warn or error as the case may be.
>> 2.  I had a general question about security and passing credentials to
>> Druid.  I don't really know how Druid handles authentication, but it didn't
>> seem like there was any way to pass creds to Druid.  In any event, this
>> will need to be documented in the README file.
>> 3.  Please go through and remove any commented out code.
>>
>> Writing a storage plugin is not easy, and I'm really impressed that your
>> first contribution to Drill is something of this scale.  This is really
>> nice work and I'm really hoping we can get it committed for the next
>> release.
>> Thanks,
>> -- C
>>
>>
>> On Mar 13, 2020, at 5:56 AM, Ankush Kapur  wrote:
>>
>> Hi Charles,
>>
>> Things are good here, and hope the same for you and your family.
>>
>> Pardon me, but I do not see your review comments on the four files
>> mentioned.
>>
>> - Ankush
>>
>> On Thu, Mar 12, 2020 at 8:36 PM Charles Givre  wrote:
>>
>>> Hi Ankush,
>>> I hope all is well.  I started the review on your plugin.  I've reviewed
>>> about four files so far.  When you have a chance, please take a look at the
>>> review comments and if you have any questions, please let me know.
>>> Best,
>>> -- C
>>>
>>>
>>> On Feb 28, 2020, at 8:33 AM, Charles Givre  wrote:
>>>
>>> Hi Ankush,
>>> I hope all is well. I've been speaking with Paul about your plugin and
>>> I'm going to do the code review.  I usually don't do big code reviews like
>>> this, so I was hoping someone else would pick it up, but that hasn't
>>> happened and I really want to see this get committed to the next version of
>>> Drill.
>>>
>>> Since this is a large PR, I'm going to do this in small reviews rather
>>> than one massive review so please expect waves of review.  I think it's
>>> best that way it doesn't get overwhelming.  My game plan will be to focus
>>> on a few files at a time rather than trying to look at the entire plugin
>>> and give a few comments here and there.  I'll send future correspondence
>>> via the dev alias in the interest of transparency as well.  I'm going to
>>> ask Paul to look at the pushdown code as he is a lot more familiar with
>>> that than I am, but we're not there yet.
>>>
>>> Thanks for all your work and patience on this and I am excited about
>>> getting it into Drill.  I know it will be a benefit for the community.
>>> Best,
>>> -- C
>>>
>>>
>>>
>>> On Jan 21, 2020, at 8:43 AM, Charles Givre  wrote:
>>>
>>> Hey Ankush,
>>> See responses inline.
>>>
>>>
>>> On Jan 21, 2020, at 8:12 AM, Ankush Kapur wrote:
>>>
>>> Hi Charles,
>>>
>>> Thanks for taking time to make the changes.
>>>
>>>
>>> No problem.  Also FYI, Drill uses 2 space indents not 4.  I just pushed
>>> an update with all the spacing on 2 space.  Is there anything that I can
>>> assist with?
>>>
>>>
>>> Could you please point me to the "commented-out" code you are referring
>>> to. That's actually an issue I was trying to track down last weekend while
>>> writing tests. I noticed that the "where" clause was not being pushed down.
>>> This was keeping me from writing the tests I have been working on.
>>>
>>>
>>> The commented out code was in the DruidStoragePlugin, 

[GitHub] [drill] paul-rogers closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-04-18 Thread GitBox
paul-rogers closed pull request #2045: DRILL-7683: Add "message parsing" to new 
JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   




[GitHub] [drill] paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

2020-04-18 Thread GitBox
paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" 
to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   # [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add 
"message parsing" to new JSON loader
   
   ## Description
   
   Worked on a project that uses the new JSON loader to parse a REST response 
that includes a set of "wrapper" fields around the JSON payload. Example:
   
   ```
   { "status": "ok", "results": [ data here ] }
   ```
   
   To solve this cleanly, added the ability to specify a "message parser" to 
consume JSON tokens up to the start of the data. This parser can be written as 
needed for each different data source.
   
   When working on the REST data source, it became clear we need a no-code way 
to handle the same issue. So, extended the message parser to handle the simple 
case, a path to the data. In the above, the path would be just `results`. The 
path can contain any number of slash-separated elements: `response/body/rows` 
for example.
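   For illustration (a hypothetical payload, not one from the PR), the path `response/body/rows` would select the array in:
   
   ```
   { "response": { "body": { "rows": [ data here ] } } }
   ```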
   
   Since this change adds two more parameters to the JSON structure parser, 
added builders to gather the needed parameters rather than making the 
constructor even larger.
   
   Note that, aside from the private plugin mentioned above, no other code uses 
the JSON loader yet.
   
   ## Developer Documentation
   
   This PR is part of the "new" V2 EVF-based JSON parser. An example of usage 
appears in PR #1892 (REST storage plugin.) To use the simple path-based form of 
message parsing, add the following option to the JSON parser builder:
   
   ```
   .dataPath("path/to/data")
   ```
   
   The tail element should be the one that holds an array of JSON records.
   
   To add custom message parsing (to check return status, say), use a different 
option of the builder:
   
   ```
 .messageParser(parser)
   ```
   
   Then implement the `MessageParser` class to do the parsing. The present 
version works at the level of JSON tokens: you must use the provided 
"tokenizer" to read each token and do the right thing.
   
   Since working at the token level is tedious, the goal is to provide a 
ready-made parser that takes a path to the data, such as `response/data`, and 
skips all fields except those in the path.
   
   The goal here is to get the mechanism added to the JSON parser so we can 
then try it in the REST plugin and work out exactly what we need in that 
higher-level parser.
   
   ## User Documentation
   
   N/A
   
   ## Testing
   
   Added unit tests. Reran all existing tests.




[GitHub] [drill] vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component

2020-04-18 Thread GitBox
vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement 
Drill RDBMS Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#discussion_r410730382
 
 

 ##
 File path: 
metastore/rdbms-metastore/src/main/resources/db/changelog/changes/initial_ddls.yaml
 ##
 @@ -0,0 +1,397 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Creates 5 RDBMS Metastore tables component tables: TABLES, SEGMENTS, FILES, ROW_GROUPS, PARTITIONS.
+databaseChangeLog:
+# to preserve upper case naming in some DBs (ex: PostgreSQL) and quote reserved column names (ex: COLUMN)
+  - objectQuotingStrategy: QUOTE_ALL_OBJECTS
+  - changeSet:
+      id: 1
+      author: arina
+      changes:
+        - createTable:
+            tableName: TABLES
+            columns:
+              - column:
+                  name: STORAGE_PLUGIN
+                  type: VARCHAR(100)
+                  constraints:
+                    primaryKey: true
+                    nullable: false
+              - column:
+                  name: WORKSPACE
+                  type: VARCHAR(100)
+                  constraints:
+                    primaryKey: true
+                    nullable: false
+              - column:
+                  name: TABLE_NAME
+                  type: VARCHAR(200)
+                  constraints:
+                    primaryKey: true
+                    nullable: false
+              - column:
+                  name: OWNER
+                  type: VARCHAR(100)
+              - column:
+                  name: TABLE_TYPE
+                  type: VARCHAR(100)
+              - column:
+                  name: METADATA_KEY
+                  type: VARCHAR(100)
+                  constraints:
+                    nullable: false
+              - column:
+                  name: METADATA_TYPE
+                  type: VARCHAR(100)
+                  constraints:
+                    nullable: false
+              - column:
+                  name: LOCATION
+                  type: VARCHAR(500)
 
 Review comment:
   Realy a good choice for this value. Though some linux-based systems limit 
its size to 256 bytes, other systems like Windows removed the existing 260 
characters limit. So I think 500 value is fine.




[GitHub] [drill] vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component

2020-04-18 Thread GitBox
vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement 
Drill RDBMS Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#discussion_r410700439
 
 

 ##
 File path: metastore/rdbms-metastore/README.md
 ##
 @@ -0,0 +1,140 @@
+# RDBMS Metastore
+
+The RDBMS Metastore implementation allows storing Drill Metastore metadata in a configured RDBMS.
+
+## Configuration
+
+Currently, the RDBMS Metastore is not the default Drill Metastore implementation.
+To enable the RDBMS Metastore, create `drill-metastore-override.conf` and indicate the RDBMS Metastore class:
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore"
+}
+```
+
+### Connection properties
+
+Data source connection properties indicate how to connect to the Drill Metastore database.
+
+`drill.metastore.rdbms.data_source.driver` - driver class name. Required.
+Note: the driver must be included in the Drill classpath prior to start-up for all databases except SQLite.
+
+`drill.metastore.rdbms.data_source.url` - connection URL. Required.
+
+`drill.metastore.rdbms.data_source.username` - the database user on whose behalf the connection is made. Optional if the database does not require a user to connect.
+
+`drill.metastore.rdbms.data_source.password` - the database user's password. Optional if the database does not require a password to connect.
+
+`drill.metastore.rdbms.data_source.properties` - specifies properties which will be used during data source creation. See the list of available [Hikari properties](https://github.com/brettwooldridge/HikariCP) for more details.
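+
+A sketch of what such a properties block might look like (the nesting mirrors the `data_source` examples below; `maximumPoolSize` and `connectionTimeout` are standard HikariCP settings, chosen here only for illustration):
+
+```
+drill.metastore.rdbms.data_source.properties: {
+  maximumPoolSize: 5,
+  connectionTimeout: 30000
+}
+```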
+
+### Default configuration
+
+Out of the box, the Drill RDBMS Metastore is configured to use an embedded, file-system-based SQLite database.
+It will be created locally in the user's home directory under the `${drill.exec.zk.root}/metastore` location.
+
+The default setup can be used only in Drill embedded mode.
+If the SQLite setup is used in distributed mode, each drillbit will have its own SQLite instance,
+which will lead to bogus results during query execution.
+In distributed mode, the database instance must be accessible to all drillbits.
+
+### Custom configuration
+
+`drill-metastore-override.conf` is used to customize connection details for the Drill Metastore database.
+See `drill-metastore-override-example.conf` for more details.
+
+#### Example of PostgreSQL configuration
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore",
+  rdbms: {
+    data_source: {
+      driver: "org.postgresql.Driver",
+      url: "jdbc:postgresql://localhost:1234/mydb?currentSchema=drill_metastore",
+      username: "user",
+      password: "password"
+    }
+  }
+}
+```
+
+Note: PostgreSQL JDBC driver must be present in Drill classpath.
+
+#### Example of MySQL configuration
+
+```
+drill.metastore: {
+  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore",
+  rdbms: {
+    data_source: {
+      driver: "com.mysql.cj.jdbc.Driver",
+      url: "jdbc:mysql://localhost:1234/drill_metastore",
+      username: "user",
+      password: "password"
+    }
+  }
+}
+```
+
+Note: MySQL JDBC driver must be present in Drill classpath.
+
+#### Driver version
+
+For MySQL connector version 6+, use the `com.mysql.cj.jdbc.Driver` driver class;
+for older versions, use `com.mysql.jdbc.Driver`.
+
+## Tables structure
+
+The Drill Metastore consists of components. Currently, only the `tables` component is implemented.
+This component provides metadata about Drill tables, including their segments, files, row groups and partitions.
+In Drill, a `tables` component unit is represented by the `TableMetadataUnit` class, which is applicable to any metadata type.
+Fields which are not applicable to a particular metadata type remain unset.
+
+In the RDBMS Drill Metastore, each `tables` component metadata type has its own table.
+There are five tables: `TABLES`, `SEGMENTS`, `FILES`, `ROW_GROUPS`, `PARTITIONS`.
+Their structure and primary keys are defined based on the fields specific to each metadata type.
+See `src/main/resources/db/changelog/changes/initial_ddls.yaml` for more details.
+
+### Tables creation
+
+The RDBMS Metastore uses [Liquibase](https://www.liquibase.org/documentation/core-concepts/index.html)
+to create all needed tables during RDBMS Metastore initialization; users should not create any tables manually.
+
+### Database schema
+
+Liquibase is using yaml configuration file to apply changes into database 
schema: `src/main/resources/db/changelog/chnagelog.yaml`.
 
 Review comment:
   ```suggestion
   Liquibase is using yaml configuration file to apply changes into database 
schema: `src/main/resources/db/changelog/changelog.yaml`.
   ```



[GitHub] [drill] vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component

2020-04-18 Thread GitBox
vvysotskyi commented on a change in pull request #2060: DRILL-7706: Implement 
Drill RDBMS Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#discussion_r410729508
 
 

 ##
 File path: 
metastore/rdbms-metastore/src/main/resources/db/changelog/changes/initial_ddls.yaml
 ##
 @@ -0,0 +1,397 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Creates 5 RDBMS Metastore tables component tables: TABLES, SEGMENTS, FILES, ROW_GROUPS, PARTITIONS.
+databaseChangeLog:
+# to preserve upper case naming in some DBs (ex: PostgreSQL) and quote reserved column names (ex: COLUMN)
+  - objectQuotingStrategy: QUOTE_ALL_OBJECTS
+  - changeSet:
+      id: 1
+      author: arina
+      changes:
+        - createTable:
+            tableName: TABLES
+            columns:
+              - column:
+                  name: STORAGE_PLUGIN
+                  type: VARCHAR(100)
+                  constraints:
+                    primaryKey: true
+                    nullable: false
+              - column:
+                  name: WORKSPACE
+                  type: VARCHAR(100)
+                  constraints:
+                    primaryKey: true
+                    nullable: false
+              - column:
+                  name: TABLE_NAME
+                  type: VARCHAR(200)
 
 Review comment:
   Small assumption: a table path may be specified as a table name, for example 
dfs.`/tmp/a/b/c/nation.parquet`, so I would recommend making its size 
closer to the `LOCATION` size.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711887
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/ScanLifecycle.java
 ##
 @@ -0,0 +1,188 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.scan.RowBatchReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
+import 
org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaConfigBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+
+/**
+ * Basic scan framework for a set of "managed" readers, which uses the
+ * scan schema tracker to evolve the scan output schema.
+ * Readers are created and managed via a reader
+ * factory class unique to each type of scan. The reader factory also provides
+ * the scan-specific schema negotiator to be passed to the reader.
+ *
+ * Lifecycle
+ *
+ * The options provided in the {@link ScanLifecycleBuilder} are
+ * sufficient to drive the entire scan operator functionality.
+ * Schema resolution and projection is done generically and is the same for all
+ * data sources. Only the
+ * reader (created via the factory class) differs from one type of file to
+ * another.
+ * 
+ * The framework achieves the work described below by composing a
+ * set of detailed classes, each of which performs some specific task. This
+ * structure leaves the reader to simply infer schema and read data.
+ *
+ * Reader Integration
+ *
+ * The details of how a file is structured, how a schema is inferred, how
+ * data is decoded: all that is encapsulated in the reader. The only real
+ * interaction between the reader and the framework is:
+ * 
+ * The reader factory creates a reader and the corresponding schema
+ * negotiator.
+ * The reader "negotiates" a schema with the framework. The framework
+ * knows the projection list from the query plan, knows something about
+ * data types (whether a column should be scalar, a map or an array), and
+ * knows about the schema already defined by prior readers. The reader knows
+ * what schema it can produce (if "early schema.") The schema negotiator
+ * class handles this task.
+ * The reader reads data from the file and populates value vectors a
+ * batch at a time. The framework creates the result set loader to use for
+ * this work. The schema negotiator returns that loader to the reader, which
+ * uses it during read.
+ * A reader may be "late schema", true "schema on read." In this case, the
+ * reader simply tells the result set loader to create a new column reader
+ * on the fly. The framework will work out if that new column is to be
+ * projected and will return either a real column writer (projected column)
+ * or a dummy column writer (unprojected column.)
+ * The reader then reads batches of data until all data is read. The
+ * result set loader signals when a batch is full; the reader should not
+ * worry about this detail itself.
+ * The reader then releases its resources.
+ * 
+ * 
+ * See {@link ScanSchemaTracker} for details about how the scan schema
+ * evolves over the scan lifecycle.
+ *
+ * Livecycle
 
 Review comment:
   ```suggestion
* Life cycle
   ```




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410087452
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/TestOutputBatchBuilder.java
 ##
 @@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import static org.apache.drill.test.rowSet.RowSetUtilities.mapValue;
+
+import java.util.Collections;
+
+import org.apache.drill.categories.EvfTests;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.BatchSource;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(EvfTests.class)
+public class TestOutputBatchBuilder extends SubOperatorTest {
+
+  public TupleMetadata firstSchema() {
+return new SchemaBuilder()
+  .add("d", MinorType.VARCHAR)
+  .add("a", MinorType.INT)
+  .buildSchema();
+  }
+
+  private VectorContainer makeFirst(TupleMetadata firstSchema) {
+return fixture.rowSetBuilder(firstSchema)
+  .addRow("barney", 10)
+  .addRow("wilma", 20)
+  .build()
+  .container();
+  }
+
+  public TupleMetadata secondSchema() {
+return new SchemaBuilder()
+  .add("b", MinorType.INT)
+  .add("c", MinorType.VARCHAR)
+  .buildSchema();
+  }
+
+  private VectorContainer makeSecond(TupleMetadata secondSchema) {
+return fixture.rowSetBuilder(secondSchema)
+  .addRow(1, "foo.csv")
+  .addRow(2, "foo.csv")
+  .build()
+  .container();
+  }
+
+  @Test
+  public void testSingleInput() {
+
+TupleMetadata schema = firstSchema();
+VectorContainer input = makeFirst(schema);
+
+OutputBatchBuilder builder = new OutputBatchBuilder(schema,
+Collections.singletonList(new BatchSource(schema, input)),
+fixture.allocator());
+builder.load(input.getRecordCount());
+VectorContainer output = builder.outputContainer();
+
+RowSetUtilities.verify(fixture.wrap(input), fixture.wrap(output));
+  }
+
+  @Test
+  public void testReorder() {
+
+TupleMetadata schema = firstSchema();
+VectorContainer input = makeFirst(schema);
+
+TupleMetadata outputSchema = new SchemaBuilder()
+.add("a", MinorType.INT)
+.add("d", MinorType.VARCHAR)
+.buildSchema();
+OutputBatchBuilder builder = new OutputBatchBuilder(outputSchema,
+Collections.singletonList(new BatchSource(schema, input)),
+fixture.allocator());
+builder.load(input.getRecordCount());
+VectorContainer output = builder.outputContainer();
+
+RowSet expected = fixture.rowSetBuilder(outputSchema)
+.addRow(10, "barney")
+.addRow(20, "wilma")
+.build();
+RowSetUtilities.verify(expected, fixture.wrap(output));
+  }
+
+  @Test
+  public void testTwoInputs() {
+
+TupleMetadata schema1 = firstSchema();
+VectorContainer input1 = makeFirst(schema1);
+TupleMetadata schema2 = secondSchema();
+VectorContainer input2 = makeSecond(schema2);
+
+TupleMetadata outputSchema = new SchemaBuilder()
+.add("a", MinorType.INT)
+.add("b", MinorType.INT)
+.add("c", MinorType.VARCHAR)
+.add("d", MinorType.VARCHAR)
+.buildSchema();
+OutputBatchBuilder builder = new OutputBatchBuilder(outputSchema,
+Lists.newArrayList(
+new BatchSource(schema1, input1),
+new BatchSource(schema2, input2)),
+fixture.allocator());
+builder.load(input1.getRecordCount());
+VectorContainer output = builder.outputContainer();
+
+SingleRowSet 

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711733
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/ReaderLifecycle.java
 ##
 @@ -0,0 +1,383 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.RowBatchReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.SchemaNegotiator;
+import 
org.apache.drill.exec.physical.impl.scan.v3.ManagedReader.EarlyEofException;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.BatchSource;
+import org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl;
+import org.apache.drill.exec.physical.resultSet.impl.ResultSetOptionBuilder;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Manages the schema and batch construction for a managed reader.
+ * Allows the reader itself to be as simple as possible. This class
+ * implements the basic {@link RowBatchReader} protocol based on
+ * three methods, and converts it to the two-method protocol of
+ * the managed reader. The {@code open()} call of the
+ * {@code RowBatchReader} is combined with the constructor of the
+ * {@link ManagedReader}, enforcing the rule that the managed reader
+ * is created just-in-time when it is to be used, which avoids
+ * accidentally holding resources for the life of the scan.
+ * 
+ * Coordinates the components that wrap a reader to create the final
+ * output batch:
+ * 
+ * The actual reader, which loads (possibly a subset of) the
+ * columns requested from the input source.
+ * Implicit columns manager instance which populates implicit
+ * file columns, partition columns, and Drill's internal metadata
+ * columns.
+ * The missing columns handler which "makes up" values for projected
+ * columns not read by the reader.
+ * Batch assembler, which combines the three sources of vectors
+ * to create the output batch with the schema specified by the
+ * schema tracker.
+ * 
+ * 
+ * This class coordinates the reader-visible aspects of the scan:
+ * 
+ * The {@link SchemaNegotiator} (or subclass) which provides
+ * schema-related input to the reader and which creates the reader's
+ * {@link ResultSetLoader}, among other tasks. The schema negotiator
+ * is specific to each kind of scan and is thus created via the
+ * {@link ScanLifecycleBuilder}.
+ * The reader, which is designed to be as simple as possible,
+ * with all generic overhead tasks handled by this "shim" between
+ * the scan operator and the actual reader implementation.
+ * 
+ * 
+ * The reader is schema-driven. See {@link ScanSchemaTracker} for
+ * an overview.
+ * 
+ * The reader is given a reader input schema, via the
+ * schema negotiator, which specifies the desired output schema.
+ * The schema can be fully dynamic (a wildcard), fully defined (a
+ * prior reader already chose column types), or a hybrid.
+ * The reader can load a subset of columns. Those that are
+ * left out become "missing columns" to be filled in by this
+ * class.
+ * The reader output schema along with implicit and missing
+ * columns, together define the scan's output schema.
+ * 
+ * 
+ * The framework handles the projection task so the
+ * reader does not have to worry about it. Reading an unwanted column
+ * is low cost: the result set loader will have provided a "dummy" column
+ * writer that simply discards the value. This is just as fast as having the
+ * reader use if-statements 

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410088714
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/v3/file/FileScanUtils.java
 ##
 @@ -38,7 +38,7 @@ public static String partitionColName(int partition) {
   }
 
   public static List<String> expandMetadata(int dirCount) {
 -    List<String> selected = Arrays.asList(
 +    List<String> selected = Lists.newArrayList(
 
 Review comment:
   Why replace it? It's better to use Java native methods instead of Guava.




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410086985
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/scan/v3/lifecycle/TestScanLifecycleSchema.java
 ##
 @@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.lifecycle;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+import org.apache.drill.categories.EvfTests;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.RowBatchReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
+import org.apache.drill.exec.physical.impl.scan.v3.SchemaNegotiator;
+import 
org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker.ProjectionType;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+@Category(EvfTests.class)
+public class TestScanLifecycleSchema extends BaseTestScanLifecycle {
+
+  /**
+   * Simplest defined schema case: the defined schema agrees
+   * with the schema the reader will produce. The defined schema
+   * implies the project list, which is not visible here.
+   */
+  @Test
+  public void testDefinedSchemaSimple() {
+ScanLifecycleBuilder builder = new ScanLifecycleBuilder();
+builder.definedSchema(SCHEMA);
+builder.readerFactory(new SingleReaderFactory() {
+  @Override
+  public ManagedReader next(SchemaNegotiator negotiator) {
+return new MockEarlySchemaReader(negotiator, 1);
+  }
+});
+ScanLifecycle scan = buildScan(builder);
+
+// Projection defaults to SELECT *, but defined schema
+// overrides that setting.
+assertSame(ProjectionType.SOME, scan.schemaTracker().projectionType());
+
+verifyStandardReader(scan, 0);
+scan.close();
+  }
+
+  /**
+   * The defined schema is a subset of the reader's schema; the
+   * defined schema acts as a project list.
+   */
+  @Test
+  public void testDefinedSchemaSubset() {
+ScanLifecycleBuilder builder = new ScanLifecycleBuilder();
+builder.definedSchema(SCHEMA);
+builder.readerFactory(new SingleReaderFactory() {
+  @Override
+  public ManagedReader next(SchemaNegotiator negotiator) {
+return new MockThreeColReader(negotiator);
+  }
+});
+ScanLifecycle scan = buildScan(builder);
+
+RowBatchReader reader = scan.nextReader();
+assertTrue(reader.open());
+assertEquals(SCHEMA, scan.outputSchema());
+assertTrue(reader.next());
+RowSet expected = fixture.rowSetBuilder(SCHEMA)
+  .addRow(101, "wilma")
+  .addRow(102, "betty")
+  .build();
+RowSetUtilities.verify(expected, fixture.wrap(reader.output()));
+assertFalse(reader.next());
+reader.close();
+
+scan.close();
+  }
+
+  /**
+   * The defined schema is a superset of the reader's schema; the
+   * defined schema defines the missing column type.
+   */
+  @Test
+  public void testDefinedSchemaSuperset() {
+ScanLifecycleBuilder builder = new ScanLifecycleBuilder();
+builder.definedSchema(SCHEMA);
+builder.readerFactory(new SingleReaderFactory() {
+  @Override
+  public ManagedReader next(SchemaNegotiator negotiator) {
+return new MockSingleColReader(negotiator);
+  }
+});
+ScanLifecycle scan = buildScan(builder);
+
+RowBatchReader reader = scan.nextReader();
+assertTrue(reader.open());
+assertEquals(SCHEMA, scan.outputSchema());
+assertTrue(reader.next());
+RowSet expected = fixture.rowSetBuilder(SCHEMA)
+  .addRow(101, null)
+  .addRow(102, null)
+  .build();
+

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711451
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/file/FileScanLifecycle.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.file;
+
+import java.io.IOException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ReaderLifecycle;
+import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ScanLifecycle;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.SchemaNegotiatorImpl;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.StaticBatchBuilder;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The file scan framework adds into the scan framework support for
+ * reading from DFS splits (a file and a block) and for the file-related
+ * implicit and partition columns. The file scan builder gathers
+ * file-related options for the scan as a whole, including the list
+ * of splits. The associated {@link FileSchemaNegotiator} passes
+ * file information to each reader.
+ * 
+ * Only a single storage plugin uses the file scan framework:
+ * the {@link FileSystemPlugin} via the {@link EasyFormatPlugin}. To
+ * make client code as simple as possible, the Drill file system and list
+ * of files is passed through this framework to the
+ * {@link FileReaderFactory}, then to the {@link FileSchemaNegotiator},
+ * which presents them to the reader. This approach avoids the need
+ * for each format to handle this common boilerplate code.
+ * 
+ * The {@link FileSanOptions} holds the list of splits to scan. The
+ * {@link {@link FileReaderFactory} iterates over those splits, and
 
 Review comment:
   ```suggestion
* {@link FileReaderFactory} iterates over those splits, and
   ```




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r409840377
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractContainerVector.java
 ##
 @@ -56,7 +58,8 @@ protected AbstractContainerVector(MaterializedField field, 
BufferAllocator alloc
   @Override
   public void allocateNew() throws OutOfMemoryException {
 if (!allocateNewSafe()) {
-  throw new OutOfMemoryException();
+  throw new OutOfMemoryException(""
 
 Review comment:
   ```suggestion
 throw new OutOfMemoryException(
   ```




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r410711431
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/scan/v3/file/FileScanLifecycle.java
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.scan.v3.file;
+
+import java.io.IOException;
+
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ReaderLifecycle;
+import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ScanLifecycle;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.SchemaNegotiatorImpl;
+import 
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.StaticBatchBuilder;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The file scan framework adds into the scan framework support for
+ * reading from DFS splits (a file and a block) and for the file-related
+ * implicit and partition columns. The file scan builder gathers
+ * file-related options for the scan as a whole, including the list
+ * of splits. The associated {@link FileSchemaNegotiator} passes
+ * file information to each reader.
+ * 
+ * Only a single storage plugin uses the file scan framework:
+ * the {@link FileSystemPlugin} via the {@link EasyFormatPlugin}. To
+ * make client code as simple as possible, the Drill file system and list
+ * of files is passed through this framework to the
+ * {@link FileReaderFactory}, then to the {@link FileSchemaNegotiator},
+ * which presents them to the reader. This approach avoids the need
+ * for each format to handle this common boilerplate code.
+ * 
+ * The {@link FileSanOptions} holds the list of splits to scan. The
 
 Review comment:
   ```suggestion
* The {@link FileScanOptions} holds the list of splits to scan. The
   ```
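   As background for this Javadoc, a rough sketch of the reader-side pattern it describes is shown below. The negotiator method names (`fileSystem()`, `split()`, `build()`) follow the earlier V1 file framework and are assumptions here; the V2 API in this PR may differ:
   ```java
   // Illustrative only: a reader that gets its file and file system from the
   // framework through the negotiator instead of per-format boilerplate.
   public class ExampleFileReader implements ManagedReader<FileSchemaNegotiator> {
     private ResultSetLoader loader;

     @Override
     public boolean open(FileSchemaNegotiator negotiator) {
       DrillFileSystem fs = negotiator.fileSystem(); // supplied by the framework
       FileSplit split = negotiator.split();         // the file/block to read
       // fs and split would be used here to open the input stream.
       loader = negotiator.build();                  // start writing rows
       return true;
     }

     @Override
     public boolean next() { return false; }         // batch reading elided

     @Override
     public void close() { }
   }
   ```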




[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF V2 Scan Framework

2020-04-18 Thread GitBox
arina-ielchiieva commented on a change in pull request #2056: DRILL-7701: EVF 
V2 Scan Framework
URL: https://github.com/apache/drill/pull/2056#discussion_r409840961
 
 

 ##
 File path: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractContainerVector.java
 ##
 @@ -56,7 +58,8 @@ protected AbstractContainerVector(MaterializedField field, BufferAllocator alloc
   @Override
   public void allocateNew() throws OutOfMemoryException {
 if (!allocateNewSafe()) {
-  throw new OutOfMemoryException();
+  throw new OutOfMemoryException(""
+  + "Failed to allocate memory for " + getClass().getSimpleName());
 
 Review comment:
   ```suggestion
 "Failed to allocate memory for " + getClass().getSimpleName());
   ```
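   Taken together with the earlier comment on this hunk, the resulting method would presumably read as follows (a sketch assembled from the two suggestion blocks, not the committed code):
   ```java
   @Override
   public void allocateNew() throws OutOfMemoryException {
     if (!allocateNewSafe()) {
       // Report which vector class failed rather than throwing a bare exception.
       throw new OutOfMemoryException(
           "Failed to allocate memory for " + getClass().getSimpleName());
     }
   }
   ```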




[GitHub] [drill] arina-ielchiieva commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore for Tables component

2020-04-18 Thread GitBox
arina-ielchiieva commented on issue #2060: DRILL-7706: Implement Drill RDBMS 
Metastore for Tables component
URL: https://github.com/apache/drill/pull/2060#issuecomment-615879117
 
 
   @paul-rogers good questions. Note that none of them are addressed in this PR, since this PR only adds support for the Drill Metastore `tables` component. Below I give extended answers to your questions, with guidelines on what could be done to support the use cases you asked about.
   
   `First question`:
   Short answer: yes, but some parts need to be implemented first.
   Extended answer:
   Let's assume we want to store the schema for HTTP plugin tables in the Drill Metastore and use it when querying data from this plugin.
   The `ANALYZE TABLE` command collects data about a table, including its schema, statistics, etc. It also allows the user to provide a schema inline. For example:
`ANALYZE TABLE table(dfs.tmp.region(schema=>'inline=(id int, country varchar)')) REFRESH METADATA`.
   
   You can also call it with only a schema and no statistics, but you will need to disable statistics collection using the session option `planner.statistics.use`; in the future, the `ANALYZE TABLE` command could be updated to do this without setting the option.
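   As a minimal sketch, assuming Drill's `ClusterFixture`/`ClientFixture` test fixtures and the usual `dirTestWatcher` test field, the schema-only flow could look like this:
   ```java
   // Sketch: collect only the schema by disabling statistics collection first.
   try (ClusterFixture cluster = ClusterFixture.standardCluster(dirTestWatcher);
        ClientFixture client = cluster.clientFixture()) {
     client.alterSession("planner.statistics.use", false);
     client.queryBuilder()
         .sql("ANALYZE TABLE table(dfs.tmp.region(schema=>"
             + "'inline=(id int, country varchar)')) REFRESH METADATA")
         .run();
   }
   ```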
   
   Currently the `ANALYZE TABLE` command works only with file-based tables, so first we would need to extend it to support analysis of tables from other storage plugins, perhaps by adding interfaces that each storage plugin would implement (sketched below).
   The `ANALYZE TABLE` command would then gather data for such tables and transfer it to the Drill Metastore. The Drill Metastore would store it and be able to provide it when asked (this part is already implemented).
   Today only file-based format plugins work with the Drill Metastore, so the last step would be to integrate Drill Metastore usage into the HTTP plugin or any other plugin.
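   To make the interface idea concrete, here is a purely hypothetical sketch; none of these names exist in Drill today, and `TableMetadataUnit` is borrowed from the existing `tables` component only for illustration:
   ```java
   // Hypothetical plugin-side hook for ANALYZE TABLE support; illustrative only.
   public interface AnalyzableStoragePlugin {

     /** Whether this plugin can supply metadata for ANALYZE TABLE. */
     boolean supportsAnalyze();

     /**
      * Collects metadata for the given table (schema and, optionally,
      * statistics) in the units the Drill Metastore expects.
      */
     List<TableMetadataUnit> analyzeTable(String tableName, boolean collectStatistics);
   }
   ```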
   
   `Second question`:
   Short answer: yes, but you will have to implement new components.
   Extended answer:
   The Drill Metastore consists of metastore-api (which contains the Metastore interfaces and general classes) and metastore implementations; currently we have Iceberg, and this PR also adds RDBMS.
   The Drill Metastore interface consists of components. Currently there is only the `tables` component, which stores metadata for Drill tables, including their segments, files, row groups, and partitions, if any. The `Views` component is present but not implemented: 
https://github.com/apache/drill/blob/master/metastore/metastore-api/src/main/java/org/apache/drill/metastore/Metastore.java
   
   So what if you want to add a new component, for example, `pstore`? Just add a new component to the `DrillMetastore` interface. As you wrote, it would store information `for plugins, UDFs, security credentials and more`, so I think it's better to create a separate component for each information type:
   ```
   Plugins plugins();
   Udfs udfs();
   Credentials credentials();
   ```
   For each component you would also need to come up with a `unit` that is used to pass information to the Metastore and back. For example, the `tables` component has the `TableMetadataUnit`.
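   For illustration only, a hypothetical `plugins` component and its unit might look like the sketch below; `Modify` and `Read` are assumed to be the generic operations from the metastore-api `operate` package, and everything else is invented:
   ```java
   // Hypothetical sketch: a "plugins" component mirroring the Tables component.
   public interface Plugins {
     Modify<PluginMetadataUnit> modify(); // write/delete, as in Tables
     Read<PluginMetadataUnit> read();     // query, as in Tables
   }

   // Invented unit, shaped after TableMetadataUnit.
   public class PluginMetadataUnit {
     private String name;        // storage plugin name
     private String type;        // e.g. "jdbc" or "http"
     private String configJson;  // serialized plugin configuration
     // plus a builder and getters, following TableMetadataUnit
   }
   ```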
   Then you would need to implement these interfaces in the Drill Iceberg and RDBMS Metastore implementations. In Iceberg each component would have its own Iceberg table; in RDBMS, one or several database tables. Most of the code is already written; you would just need to add the code specific to each new component.
   The last step is to integrate Metastore calls into the Drill code where you need to use them. The `DrillMetastore` is accessible through `DrillbitContext`.
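   A minimal usage sketch (the `getMetastoreRegistry()` accessor name is an assumption based on the 1.17 Metastore work):
   ```java
   // Obtain the Metastore from the Drillbit context and use a component.
   Metastore metastore = drillbitContext.getMetastoreRegistry().get();
   Tables tables = metastore.tables();
   // A new component would be reached the same way, e.g. metastore.plugins().
   ```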




[jira] [Created] (DRILL-7710) Fix TestMetastoreCommands#testDefaultSegment test

2020-04-18 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7710:
---

 Summary: Fix TestMetastoreCommands#testDefaultSegment test
 Key: DRILL-7710
 URL: https://issues.apache.org/jira/browse/DRILL-7710
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Vova Vysotskyi
 Fix For: 1.18.0


Test {{TestMetastoreCommands#testDefaultSegment}} sometimes fails:

{noformat}
[ERROR]   TestMetastoreCommands.testDefaultSegment:1870 
expected: but was:
{noformat}





[GitHub] [drill] arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-18 Thread GitBox
arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin 
table functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615869665
 
 
   @paul-rogers is there a way to modify the queries so they will pass both before and after your PR? For example, by adding `fieldDelimiter` to the table function, or an `order by` clause?
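   Something like this sketch (the file and column names are invented for illustration):
   ```java
   // Before: relies on the plugin's default delimiter and implicit row order.
   String before = "SELECT * FROM table(dfs.`example.csv`(type => 'text'))";

   // After: pins the delimiter in the table function and fixes the row order.
   String after = "SELECT * FROM table(dfs.`example.csv`(type => 'text', "
       + "fieldDelimiter => ',', extractHeader => true)) ORDER BY id";
   ```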




Re: [NOTICE] Maven 3.6.3

2020-04-18 Thread Arina Ielchiieva
It looks like the decision was made to stay on Maven 3.6.3.
For those who want to follow the discussion, please see the Jira and PR:
https://issues.apache.org/jira/browse/DRILL-7708 

https://github.com/apache/drill/pull/2061 


Kind regards,
Arina

> On Apr 17, 2020, at 8:59 PM, Paul Rogers  wrote:
> 
> Hi Arina,
> 
> Thanks for keeping us up to date!
> 
> As it turns out, I use Ubuntu (Linux Mint) for development. Maven is 
> installed as a package using apt-get. Packages can lag behind a bit. The 
> latest maven available via apt-get is 3.6.0.
> 
> It is a nuisance to install a new version outside the package manager. I 
> changed the Maven version in the root pom.xml to 3.6.0 and the build seemed 
> to work. Any reason we need the absolute latest version rather than just 
> 3.6.0 or later?
> 
> The workaround for now is to manually edit the pom.xml file on each checkout, 
> then revert the change before commit. Can we maybe adjust the "official" 
> version instead?
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>On Friday, April 17, 2020, 5:09:49 AM PDT, Arina Ielchiieva 
>  wrote:  
> 
> Hi all,
> 
> Starting from Drill 1.18.0 (and current master from commit 20ad3c9 [1]), 
> Drill build will require Maven 3.6.3, otherwise build will fail.
> Please make sure you have Maven 3.6.3 installed on your environments. 
> 
> [1] 
> https://github.com/apache/drill/commit/20ad3c9837e9ada149c246fc7a4ac1fe02de6fe8
> 
> Kind regards,
> Arina  



Re: [DISCUSS]: Masking Creds in Query Plans

2020-04-18 Thread Arina Ielchiieva
Agreed that we should not display sensitive data like passwords; I would say 
the best option is to mask it in the output.
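
A rough sketch of one masking approach, using Jackson's tree model; the class and field names are illustrative, not what Drill actually ships:

```java
import com.fasterxml.jackson.databind.node.ObjectNode;

// Illustrative only: blank out sensitive fields before a plugin config is
// rendered into an EXPLAIN plan.
public class CredentialMasker {
  private static final String MASK = "*****";

  public static ObjectNode mask(ObjectNode config) {
    for (String field : new String[] {"username", "password"}) {
      if (config.has(field)) {
        config.put(field, MASK);
      }
    }
    return config;
  }
}
```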

Kind regards,
Arina

> On Apr 17, 2020, at 5:34 PM, Charles Givre  wrote:
> 
> Hello all, 
> I was thinking about this: if a user executes an EXPLAIN PLAN FOR query, they
> get a lot of information about the storage plugin, including, in some cases,
> credentials.
> The example below shows a query plan for the JDBC storage plugin. As you can
> see, the user credentials are right there.
> 
> I'm wondering: would it be advisable or possible to mask the credentials in
> query plans so that users can't access this information? If masking isn't an
> option, is there some other way to prevent users from seeing it? In a
> multi-tenant environment, this seems like a rather large security hole.
> Thanks,
> -- C
> 
> 
> {
>  "head" : {
>"version" : 1,
>"generator" : {
>  "type" : "ExplainHandler",
>  "info" : ""
>},
>"type" : "APACHE_DRILL_PHYSICAL",
>"options" : [ ],
>"queue" : 0,
>"hasResourcePlan" : false,
>"resultMode" : "EXEC"
>  },
>  "graph" : [ {
>"pop" : "jdbc-scan",
>"@id" : 5,
>"sql" : "SELECT *\nFROM `stats`.`batting`",
>"columns" : [ "`playerID`", "`yearID`", "`stint`", "`teamID`", "`lgID`", 
> "`G`", "`AB`", "`R`", "`H`", "`2B`", "`3B`", "`HR`", "`RBI`", "`SB`", "`CS`", 
> "`BB`", "`SO`", "`IBB`", "`HBP`", "`SH`", "`SF`", "`GIDP`" ],
>"config" : {
>  "type" : "jdbc",
>  "driver" : "com.mysql.cj.jdbc.Driver",
>  "url" : "jdbc:mysql://localhost:3306/?serverTimezone=EST5EDT",
>  "username" : "",
>  "password" : "",
>  "caseInsensitiveTableNames" : false,
>  "sourceParameters" : { },
>  "enabled" : true
>},
>"userName" : "",
>"cost" : {
>  "memoryCost" : 1.6777216E7,
>  "outputRowCount" : 100.0
>}
>  }, {
>"pop" : "limit",
>"@id" : 4,
>"child" : 5,
>"first" : 0,
>"last" : 10,
>"initialAllocation" : 100,
>"maxAllocation" : 100,
>"cost" : {
>  "memoryCost" : 1.6777216E7,
>  "outputRowCount" : 10.0
>}
>  }, {
>"pop" : "limit",
>"@id" : 3,
> 
>