[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872033#comment-16872033
 ] 

Paul Rogers commented on DRILL-7308:


Recall that Drill can return not only multiple batches, but multiple "result 
sets": runs of batches with different schemas.


A more sophisticated REST solution would handle this case. I can't find any 
ProtoBuf field that says that the schema changed. Instead, we'd have to reuse 
code from elsewhere which compares the current schema to the previous one. 
Ideally, in that case, we'd create a new JSON element for the second schema. 
Something like:

{code:json}
{ "resultSets": [
    { "rows": ...,
      "schema": ...
    },
    { "rows": ...,
      "schema": ...
    } ]
}
{code}
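A minimal sketch of the schema-change check described above, assuming each batch's schema can be reduced to a list of column descriptors (for example "name:TYPE"). {{ResultSetGrouper}}, {{Batch}} and {{ResultSet}} are hypothetical names, not Drill classes; real code would derive the schema from each received {{QueryWritableBatch}}. Uses Java 16+ records.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: group a stream of batches into "result sets",
// opening a new result set whenever a batch's schema differs from the
// previous batch's schema. These types are illustrative, not Drill classes.
final class ResultSetGrouper {

  // A batch: its column schema (e.g. "a:VARCHAR") plus its rows.
  record Batch(List<String> schema, List<List<String>> rows) {}

  // One JSON "result set": a single schema and all rows that share it.
  record ResultSet(List<String> schema, List<List<String>> rows) {}

  static List<ResultSet> group(List<Batch> batches) {
    List<ResultSet> resultSets = new ArrayList<>();
    List<String> previousSchema = null;
    for (Batch batch : batches) {
      if (!Objects.equals(batch.schema(), previousSchema)) {
        // First batch, or the schema changed: start a new result set.
        resultSets.add(new ResultSet(batch.schema(), new ArrayList<>()));
        previousSchema = batch.schema();
      }
      // Append this batch's rows to the current (last) result set.
      resultSets.get(resultSets.size() - 1).rows().addAll(batch.rows());
    }
    return resultSets;
  }
}
{code}

Each {{ResultSet}} would map to one element of the {{resultSets}} array in the JSON sketch; two batches with the same schema collapse into one element.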

It is easy to create such a case. Simply create two CSV files, one with two 
columns, the other with three. Use just a simple {{SELECT * FROM yourTable}} 
query. You will get two data batches, each with a distinct schema.

The current implementation will give just the first schema and all rows, with 
varying schemas. (Actually, the current implementation will list the two 
columns, then the three columns, duplicating the first two, but we want to fix 
that...)

This is yet another reason to use a provisioned schema: with such a schema we 
can guarantee that the entire query will return a single, consistent schema 
regardless of the variation across files.

> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     { "domain": "thedataist.com" }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> {code:json}
>  "rows": [
>  { 
>       "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
>   ],
>    "metadata": [
>      "VARCHAR(0, 0)",
>      "VARCHAR(0, 0)"
>    ],
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872031#comment-16872031
 ] 

Paul Rogers commented on DRILL-7308:


The problem with duplicated schema is also due to a flaw in the DRILL-6847 code 
in {{WebUserConnection}}:

{code:java}
  @Override
  public void sendData(RpcOutcomeListener listener, QueryWritableBatch result) {
    ...
    for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
      //DRILL-6847:  This section adds query metadata to the REST results
      ...
    }
    ...
  }
{code}

The {{sendData()}} method is called for *each* batch of data sent by the 
server. Probably the manual test case was against a short file that fit into a 
single batch. However, if the file is large, or if the query is distributed 
with multiple files, then multiple batches will be sent. Also, with the recent 
"V3" text reader, the code sends an empty schema batch followed by one or more 
non-empty data batches. (This "feature" is being disabled in DRILL-7306.)

So, each time a batch is received, the code adds another copy of the schema to 
the {{metadata}} list maintained in {{WebUserConnection}}.

A quick and dirty solution is to count the batches and set the schema only on 
the first one.
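The count-the-batches workaround could look roughly like this. It is a hypothetical stand-in for the metadata bookkeeping in {{WebUserConnection}}, not its actual code; column names and types are passed as plain strings for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector showing only the metadata handling; not the real
// WebUserConnection code.
final class FirstBatchMetadataCollector {
  private final List<String> columns = new ArrayList<>();
  private final List<String> metadata = new ArrayList<>();
  private int batchCount;

  // Called once per batch (as sendData() is); the arguments describe that
  // batch's schema.
  void onBatch(List<String> columnNames, List<String> columnTypes) {
    if (batchCount == 0) {
      // Record the schema from the first batch only, so later batches do
      // not append duplicate entries to the metadata list.
      columns.addAll(columnNames);
      metadata.addAll(columnTypes);
    }
    batchCount++;
    // ... row handling would go here ...
  }

  List<String> columns() { return columns; }
  List<String> metadata() { return metadata; }
  int batchCount() { return batchCount; }
}
{code}

Because only the first batch is consulted, a mid-query schema change (as in the two-CSV example) would still be described by the first batch's schema alone.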



[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871990#comment-16871990
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/25/19 5:27 AM:
-

The width issue appears to have been introduced with this commit: "DRILL-6847: 
Add Query Metadata to RESTful Interface" (which, ahem, [~cgivre], was your 
PR...). In {{WebUserConnection}}:

{code:java}
  //For DECIMAL type
  if (col.getType().hasPrecision()) {
    dataType.append("(");
    dataType.append(col.getType().getPrecision());

    if (col.getType().hasScale()) {
      dataType.append(", ");
      dataType.append(col.getType().getScale());
    }

    dataType.append(")");
  } else if (col.getType().hasWidth()) {
    //Case for VARCHAR columns with specified width
    dataType.append("(");
    dataType.append(col.getType().getWidth());
    dataType.append(")");
  }
{code}

I did not debug the code, but it appears that {{hasPrecision()}} and 
{{hasScale()}} simply report whether the field is set; they do *not* tell us 
whether the field is zero.

Also, about a year or so ago, Drill moved {{VARCHAR}} width to the precision 
field, so the supposed {{VARCHAR}} code block is a no-op.

The correct code would be something like:

{code:java}
  //For DECIMAL and VARCHAR types
  if (col.getType().hasPrecision() && col.getType().getPrecision() > 0) {
    dataType.append("(");
    dataType.append(col.getType().getPrecision());

    if (col.getType().hasScale() && col.getType().getScale() > 0) {
      dataType.append(", ");
      dataType.append(col.getType().getScale());
    }

    dataType.append(")");
  }
{code}
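To check the proposed condition in isolation, here is a self-contained sketch. {{ColType}} is a hypothetical stand-in for the protobuf {{MajorType}}: its has*/get* pairs mimic optional fields, where has* only says the field is set (the behavior suspected above). Requires Java 16+ for records.

{code:java}
// Hypothetical sketch of the type-name formatting rule; ColType is not a
// Drill class, it only imitates optional protobuf fields.
final class TypeNameFormatter {

  record ColType(String name, Integer precision, Integer scale) {
    // "has" reports only that the field is present, not that it is non-zero.
    boolean hasPrecision() { return precision != null; }
    int getPrecision() { return precision == null ? 0 : precision; }
    boolean hasScale() { return scale != null; }
    int getScale() { return scale == null ? 0 : scale; }
  }

  static String format(ColType type) {
    StringBuilder dataType = new StringBuilder(type.name());
    // For DECIMAL and VARCHAR types: append "(p)" or "(p, s)" only when the
    // values are meaningful (greater than zero).
    if (type.hasPrecision() && type.getPrecision() > 0) {
      dataType.append("(").append(type.getPrecision());
      if (type.hasScale() && type.getScale() > 0) {
        dataType.append(", ").append(type.getScale());
      }
      dataType.append(")");
    }
    return dataType.toString();
  }
}
{code}

With precision 0 and scale 0 (the case in the bug report) the type name is emitted bare instead of as {{VARCHAR(0, 0)}}.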





[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872009#comment-16872009
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for 
new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505286184
 
 
   @arina-ielchiieva, regarding `enableSchemaBatch`, recall that Java boolean 
fields are, by definition in the language spec, initialized to false. So, since 
we never set it to true, except in tests, it always defaults to false.
   
   Since this was confusing, I added Javadoc to explain the problem and the 
default setting of the option. Does this new material answer your question?
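   A tiny illustration of the default-value rule (JLS section 4.12.5). The class is hypothetical; only the field name mirrors the option discussed here. Note that the rule covers fields only: local variables get no default and must be assigned before use.

{code:java}
// Hypothetical class: an unassigned boolean field reads as false.
final class ScanOptionsSketch {
  // Never assigned in "production" code; only a test would flip it.
  boolean enableSchemaBatch;

  // Illustrative setter, as a test might call it.
  ScanOptionsSketch enableSchemaBatch(boolean value) {
    this.enableSchemaBatch = value;
    return this;
  }
}
{code}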
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
>  The EVF framework is set up to return a "fast schema" empty batch with only 
> schema as its first batch because, when the code was written, it seemed 
> that's how we wanted operators to work. However, DRILL-7305 notes that many 
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option; it can be re-enabled 
> if ever it is needed.





[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871992#comment-16871992
 ] 

Paul Rogers commented on DRILL-7308:


This issue points out an unfortunate reality (IMHO): the lack of unit tests for 
the REST API. We have nothing, other than vigilant users, to track down issues 
such as this one.

I believe that unit tests can be easily created: use a {{ClusterTest}} and set 
the config(?) option to enable the web server. Use a web client of some sort to 
fire a request. Either compare the results against a golden file, or just test 
for the bits of interest (such as, for DRILL-6847, test against each type and 
mode, with and without width/precision, and verify just that part of the 
result).
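For the "verify just that part of the result" approach, a hypothetical assertion helper might look like the following. It assumes the response's {{columns}} and {{metadata}} arrays have already been parsed into lists; starting the Drillbit via {{ClusterTest}} and sending the request are omitted, and none of these names are existing Drill APIs.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks only the columns/metadata part of a REST
// query response, already parsed into lists of strings.
final class RestMetadataChecks {

  static List<String> problems(List<String> columns, List<String> metadata) {
    List<String> problems = new ArrayList<>();
    // Exactly one metadata entry per column: catches the duplicated schema.
    if (metadata.size() != columns.size()) {
      problems.add("expected " + columns.size()
          + " metadata entries, got " + metadata.size());
    }
    for (String type : metadata) {
      // A zero precision/width should never be rendered, e.g. "VARCHAR(0, 0)".
      if (type.contains("(0")) {
        problems.add("zero precision/width in " + type);
      }
    }
    return problems;
  }
}
{code}

Applied to the output in this ticket (one column, two {{VARCHAR(0, 0)}} entries), the helper reports three problems; a correct response yields an empty list.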

The unit test would also allow very easy debugging. It seems the best we can do 
at present is build all of Drill, start it, and connect a remote debugger. This 
is so cumbersome that folks will avoid stepping through code to see if it works.



[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871992#comment-16871992
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/25/19 4:41 AM:
-

This issue points out an unfortunate reality (IMHO): the lack of unit tests for 
the REST API. We have nothing, other than vigilant users, to track down issues 
such as this one.

I believe that unit tests can be easily created: use a {{ClusterTest}} and set 
the config(?) option to enable the web server. Use a web client of some sort 
to fire a request. Either compare the results against a golden file, or just 
test for the bits of interest (such as, for DRILL-6847, test against each type 
and mode, with and without width/precision, and verify just that part of 
the result).

The unit test would also allow very easy debugging. It seems the best we can do 
at present is build all of Drill, start it, and connect a remote debugger. This 
is so cumbersome that folks will avoid stepping through code to see if it works.





[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871990#comment-16871990
 ] 

Paul Rogers commented on DRILL-7308:


The width issue appears to have been introduced with this commit: "DRILL-6847: 
Add Query Metadata to RESTful Interface" (which, ahem, [~cgivre], was your 
PR...):

{code:java}
  //For DECIMAL type
  if (col.getType().hasPrecision()) {
    dataType.append("(");
    dataType.append(col.getType().getPrecision());

    if (col.getType().hasScale()) {
      dataType.append(", ");
      dataType.append(col.getType().getScale());
    }

    dataType.append(")");
  } else if (col.getType().hasWidth()) {
    //Case for VARCHAR columns with specified width
    dataType.append("(");
    dataType.append(col.getType().getWidth());
    dataType.append(")");
  }
{code}

I did not debug the code, but it appears that {{hasPrecision()}} and 
{{hasScale()}} simply report whether the field is set; they do *not* tell us 
whether the field is zero.

Also, about a year or so ago, Drill moved {{VARCHAR}} width to the precision 
field, so the supposed {{VARCHAR}} code block is a no-op.

The correct code would be something like:

{code:java}
  //For DECIMAL and VARCHAR types
  if (col.getType().hasPrecision() && col.getType().getPrecision() > 0) {
    dataType.append("(");
    dataType.append(col.getType().getPrecision());

    if (col.getType().hasScale() && col.getType().getScale() > 0) {
      dataType.append(", ");
      dataType.append(col.getType().getScale());
    }

    dataType.append(")");
  }
{code}



[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Charles Givre (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871954#comment-16871954
 ] 

Charles Givre commented on DRILL-7308:
--

[~Paul.Rogers], that is correct.  Something is amiss in the REST API. It was 
breaking Superset.   Also, I was attempting to run unit tests on some UDFs I've 
been working on and was encountering strange errors related to the CHAR and 
VARCHAR datatypes.  I suspect that these problems may be related.



[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871945#comment-16871945
 ] 

Paul Rogers edited comment on DRILL-7308 at 6/25/19 3:08 AM:
-

According to the screen shot, this is the REST API, method POST with query as 
payload, URL is {{http::/query.json}}.
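For reproducing this outside Superset, a request of the described shape can be built with {{java.net.http}} (Java 11+). Assumptions, not taken from the screenshot: the web server is at {{localhost:8047}} (Drill's usual default web port) and the JSON body uses the {{queryType}}/{{query}} fields from Drill's REST API documentation. The class name is hypothetical.

{code:java}
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that only builds the POST request; sending it needs
// an HttpClient and a running Drillbit.
final class DrillRestRequest {

  static HttpRequest queryRequest(String baseUrl, String sql) {
    // Naive escaping, enough for simple queries; a real client should use
    // a JSON library to build the body.
    String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
    String body = "{\"queryType\": \"SQL\", \"query\": \"" + escaped + "\"}";
    return HttpRequest.newBuilder(URI.create(baseUrl + "/query.json"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
{code}

Sending it is then {{HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())}}, and the response's {{metadata}} array can be compared with {{columns}}.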


was (Author: paul.rogers):
I presume this is the REST API? Please specify the URL used to do the query.



[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871945#comment-16871945
 ] 

Paul Rogers commented on DRILL-7308:


I presume this is the REST API? Please specify the URL used to do the query.



[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7308:
---
Description: 
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:

{code:sql}
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
{code}

{code:json}
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    { "domain": "thedataist.com" }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
{code}


There are two issues here:

1.  VARCHAR now has precision
2.  There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:

{code:json}
 "rows": [
 { 
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
  ],
   "metadata": [
     "VARCHAR(0, 0)",
     "VARCHAR(0, 0)"
   ],
{code}

  was:
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:

{code:sql}
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
{code}

{code:json}
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [}
   {       "domain": "thedataist.com"     }  ],
  "metadata": [}}
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
{code}


There are two issues here:

1.  VARCHAR now has precision
2.  There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:

{code:json}
 "rows": [
 { 
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
  ],
   "metadata": [
     "VARCHAR(0, 0)",
     "VARCHAR(0, 0)"
   ],
{code}




[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7308:
---
Description: 
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:

{code:sql}
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
{code}

{code:json}
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [}
   {       "domain": "thedataist.com"     }  ],
  "metadata": [}}
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
{code}


There are two issues here:

1.  VARCHAR now has precision
2.  There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:

{code:json}
 "rows": [
 { 
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
  ],
   "metadata": [
     "VARCHAR(0, 0)",
     "VARCHAR(0, 0)"
   ],
{code}

  was:
{{I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:}}
{{  }}
{{ SELECT * FROM dfs.test.`domains.csvh` LIMIT 1}}
{{  }}
{{ {}}
{{   "queryId": "22eee85f-c02c-5878-9735-091d18788061",}}
{{   "columns": [}}
{{     "domain"}}
{{   ],}}
{{   "rows": [}}
{{    }}{{{       "domain": "thedataist.com"     }}}{{  ],}}
{{   "metadata": [}}
{{     "VARCHAR(0, 0)",}}
{{     "VARCHAR(0, 0)"}}
{{   ],}}
{{   "queryState": "COMPLETED",}}
{{   "attemptedAutoLimit": 0}}
{{ }}}
{{  }}
{{  }}
{{ There are two issues here:}}
{{ 1.  VARCHAR now has precision }}
{{ 2.  There are twice as many columns as there should be.}}
{{  }}
{{ Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:}}
{{  }}
{{ "rows": [}}
{{    }}

{       "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }

  ],
   "metadata": [
     "VARCHAR(0, 0)",
     "VARCHAR(0, 0)"
   ],
  


> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     {
>       "domain": "thedataist.com"
>     }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> {code:json}
> "rows": [
>   {
>     "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>   }
> ],
> "metadata": [
>   "VARCHAR(0, 0)",
>   "VARCHAR(0, 0)"
> ],
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6951) Merge row set based mock data source

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871937#comment-16871937
 ] 

ASF GitHub Bot commented on DRILL-6951:
---

paul-rogers commented on issue #1809: DRILL-6951: Row set based mock data source
URL: https://github.com/apache/drill/pull/1809#issuecomment-505257599
 
 
   Squashed commits and rebased on latest master.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Merge row set based mock data source
> 
>
> Key: DRILL-6951
> URL: https://issues.apache.org/jira/browse/DRILL-6951
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> The mock reader framework is an obscure bit of test code that generates 
> fake data for testing operators such as sorts and filters.
> Because the mock reader is simple, it is a good demonstration case for the 
> new scanner framework based on the result set loader. This task uses a PR to 
> merge into master the existing work that migrates the mock data source to 
> the new framework.





[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Charles Givre (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7308:
-
Attachment: Screen Shot 2019-06-24 at 3.16.40 PM.png

> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If 
> you query a CSV file, you get the following metadata:
>
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     {
>       "domain": "thedataist.com"
>     }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
>
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
>
> Additionally, if you query a regular CSV, without the columns extracted, 
> you get the following:
>
> "rows": [
>   {
>     "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>   }
> ],
> "metadata": [
>   "VARCHAR(0, 0)",
>   "VARCHAR(0, 0)"
> ],
>   





[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Charles Givre (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre updated DRILL-7308:
-
Description: 
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:

SELECT * FROM dfs.test.`domains.csvh` LIMIT 1

{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}

There are two issues here:
1.  VARCHAR now has precision
2.  There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:

"rows": [
  {
    "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
  }
],
"metadata": [
  "VARCHAR(0, 0)",
  "VARCHAR(0, 0)"
],
  

  was:
I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:
 
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
 
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
 
 
There are two issues here:
1.  VARCHAR now has precision 
2.  There are twice as many columns as there should be.
 
Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:
 
"rows": [
    {
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
 


> Incorrect Metadata from text file queries
> -
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.17.0
>Reporter: Charles Givre
>Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>





[jira] [Created] (DRILL-7308) Incorrect Metadata from text file queries

2019-06-24 Thread Charles Givre (JIRA)
Charles Givre created DRILL-7308:


 Summary: Incorrect Metadata from text file queries
 Key: DRILL-7308
 URL: https://issues.apache.org/jira/browse/DRILL-7308
 Project: Apache Drill
  Issue Type: Bug
  Components: Metadata
Affects Versions: 1.17.0
Reporter: Charles Givre
 Attachments: domains.csvh

I'm noticing some strange behavior with the newest version of Drill.  If you 
query a CSV file, you get the following metadata:
 
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
 
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    {
      "domain": "thedataist.com"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
 
 
There are two issues here:
1.  VARCHAR now has precision 
2.  There are twice as many columns as there should be.
 
Additionally, if you query a regular CSV, without the columns extracted, you 
get the following:
 
"rows": [
    {
      "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
    }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
 





[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7271:
---
Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: SEGMENT.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
getNonInterestingColumnsMetadata
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map columnsStatistics;
  Map statistics;
  List partitionValues;
  List locations;
  long lastModifiedTime;
}
{noformat}
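Item 1 merges two parallel maps, one holding statistic values and one holding their kinds, into a single map whose entries carry both. The generic parameters of the real Map were lost from this description. The types below (SimpleStatisticsHolder, Kind, and the rowCount/nullsCount keys) are therefore hypothetical stand-ins for illustration and do not reproduce Drill's StatisticsHolder API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of item 1: one map of holders instead of a value map
 * plus a parallel kind map. All names here are illustrative only.
 */
public class StatisticsHolderSketch {

  /** Whether a statistic is exact or estimated (an assumed classification). */
  public enum Kind { EXACT, ESTIMATED }

  /** Pairs a statistic value with the kind that describes it. */
  public static final class SimpleStatisticsHolder<T> {
    private final T value;
    private final Kind kind;

    public SimpleStatisticsHolder(T value, Kind kind) {
      this.value = value;
      this.kind = kind;
    }

    public T getValue() { return value; }

    public Kind getKind() { return kind; }
  }

  /** Merges a value-by-name map and a kind-by-name map into one map of holders. */
  public static Map<String, SimpleStatisticsHolder<?>> merge(
      Map<String, Object> values, Map<String, Kind> kinds) {
    Map<String, SimpleStatisticsHolder<?>> merged = new HashMap<>();
    for (Map.Entry<String, Object> entry : values.entrySet()) {
      // A statistic with no recorded kind is treated as estimated (an assumption).
      Kind kind = kinds.getOrDefault(entry.getKey(), Kind.ESTIMATED);
      merged.put(entry.getKey(), new SimpleStatisticsHolder<>(entry.getValue(), kind));
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> values = new HashMap<>();
    values.put("rowCount", 100L);
    values.put("nullsCount", 3L);
    Map<String, Kind> kinds = new HashMap<>();
    kinds.put("rowCount", Kind.EXACT);

    Map<String, SimpleStatisticsHolder<?>> merged = merge(values, kinds);
    SimpleStatisticsHolder<?> rowCount = merged.get("rowCount");
    System.out.println(rowCount.getValue() + " " + rowCount.getKind());
  }
}
```

Running main prints "100 EXACT". With a single map, a value can no longer exist without its kind, which is the inconsistency two parallel maps allow.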

h1. Segment metadata
One of the changes in the fix for this Jira is the introduction of 
segment-level metadata.

The metadata hierarchy is currently the following:
- Table
- Segment
- Partition
- File
- Row group

A segment represents a part of the table grouped by some specific quality. 
For example, for file system tables a segment may correspond to a directory 
with its data; for Hive tables, a segment corresponds to a Hive partition.

In contrast, partition metadata corresponds to "Drill partitions": groups of 
data that have the same values for specific columns within a file or row 
group.

Filtering is therefore applied first at the table level, then to segments, 
then to partitions, then to files, and finally to row groups.
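The top-down filtering order described here can be sketched as a recursive walk over the hierarchy. When a node fails the filter, it is skipped along with everything beneath it. This is a simplified, hypothetical model: Level, Node and prune are illustrative names, not Drill classes. It also places partitions above files, following the hierarchy listed earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of top-down metadata pruning; not Drill's actual API. */
public class MetadataPruningSketch {

  /** Levels in the order in which filtering is applied. */
  public enum Level { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  /** A node in the metadata hierarchy with its children at lower levels. */
  public static final class Node {
    public final Level level;
    public final String name;
    public final List<Node> children = new ArrayList<>();

    public Node(Level level, String name) {
      this.level = level;
      this.name = name;
    }

    /** Adds a child and returns this node, so trees can be built fluently. */
    public Node add(Node child) {
      children.add(child);
      return this;
    }
  }

  /**
   * Walks the hierarchy top-down. A node that fails the filter is pruned with
   * everything beneath it, so lower levels are only examined for the parts of
   * the table that survived the levels above. Returns surviving row groups.
   */
  public static List<String> prune(Node node, Predicate<Node> keep) {
    List<String> survivors = new ArrayList<>();
    if (!keep.test(node)) {
      return survivors;
    }
    if (node.level == Level.ROW_GROUP) {
      survivors.add(node.name);
      return survivors;
    }
    for (Node child : node.children) {
      survivors.addAll(prune(child, keep));
    }
    return survivors;
  }

  public static void main(String[] args) {
    // Illustrative table with two directory segments.
    Node table = new Node(Level.TABLE, "t")
        .add(new Node(Level.SEGMENT, "dir=2018")
            .add(new Node(Level.PARTITION, "p0")
                .add(new Node(Level.FILE, "2018/a.parquet")
                    .add(new Node(Level.ROW_GROUP, "2018/a.parquet#0")))))
        .add(new Node(Level.SEGMENT, "dir=2019")
            .add(new Node(Level.PARTITION, "p0")
                .add(new Node(Level.FILE, "2019/b.parquet")
                    .add(new Node(Level.ROW_GROUP, "2019/b.parquet#0"))
                    .add(new Node(Level.ROW_GROUP, "2019/b.parquet#1")))));

    // Keep only the dir=2019 segment; everything under dir=2018 is skipped.
    List<String> remaining =
        prune(table, n -> n.level != Level.SEGMENT || n.name.equals("dir=2019"));
    System.out.println(remaining);
  }
}
```

Running main prints [2019/b.parquet#0, 2019/b.parquet#1]. Because the dir=2018 segment is rejected, its partition, file and row group are never examined.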

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: SEGMENT.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  MetadataType type (enum);
  String key;
  String identifier;
}

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871478#comment-16871478
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on issue #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#issuecomment-505076823
 
 
   @vvysotskyi thanks for making the changes. +1
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}





[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7271:

Labels: ready-to-commit  (was: )

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>





[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871468#comment-16871468
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794477
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   Thanks, done.
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871469#comment-16871469
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776704
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends BaseMetadataBuilder {
+private SchemaPath column;
+private List partitionValues;
+private Set locations;
+private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   Agree, renamed.
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871455#comment-16871455
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753640
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map<SchemaPath, ColumnStatistics> columnStatistics, List<StatisticsHolder> tableStatistics) {
+    Map<String, StatisticsHolder> mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides the statistics value when the new statistics is exact or the existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+          || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map<SchemaPath, ColumnStatistics> newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder<BaseTableMetadataBuilder> {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Done, thanks.
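The quoted `cloneWithStats` lets a new table-level statistic replace the existing one only when the new value is exact or the existing value was estimated. The sketch below isolates that rule with a hypothetical `Stat` class in place of `StatisticsHolder`; it is not Drill code. One difference is deliberate: in the quoted filter, an estimated incoming statistic with no existing entry of the same name makes `this.statistics.get(...)` return `null`, so the chained call would throw a `NullPointerException`, whereas the sketch simply adds such an entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified statistic: a name, a value, and whether the value is exact.
final class Stat {
  final String name;
  final long value;
  final boolean exact;

  Stat(String name, long value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}

final class StatsMerge {
  private StatsMerge() {
  }

  // Copies existing, then lets an incoming statistic replace the entry of the same
  // name when the incoming value is exact or the existing value is only estimated.
  // Each incoming value is compared against the original map, not the merged copy.
  // A name with no existing entry is added (an assumption of this sketch).
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat stat : incoming) {
      Stat current = existing.get(stat.name);
      if (stat.exact || current == null || !current.exact) {
        merged.put(stat.name, stat);
      }
    }
    return merged;
  }
}
```

The input map is copied rather than mutated, which matches the quoted method building a new `BaseTableMetadata` instead of modifying `this`.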
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871456#comment-16871456
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296757799
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map<SchemaPath, ColumnStatistics> columnStatistics, List<StatisticsHolder> tableStatistics) {
+    Map<String, StatisticsHolder> mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides the statistics value when the new statistics is exact or the existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+          || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map<SchemaPath, ColumnStatistics> newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder<BaseTableMetadataBuilder> {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
+      this.location = location;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+      this.lastModifiedTime = lastModifiedTime;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871459#comment-16871459
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296772407
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   This constant will be used to create segment column names, so that they do not depend on the values of session options that set partition column names.
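As the comment explains, the fixed `DEFAULT_COLUMN_PREFIX` keeps segment column names independent of session options that control partition column names (by default Drill exposes directory levels as `dir0`, `dir1`, and so on). The sketch below assumes names are formed by appending a nesting level to the prefix; that naming scheme and the `SegmentColumns` helper are hypothetical, not taken from the PR.

```java
// Hypothetical helper: segment column names are built from a fixed prefix plus a
// nesting level, so they do not change when a session option renames partition columns.
final class SegmentColumns {
  static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";

  private SegmentColumns() {
  }

  static String segmentColumnName(int level) {
    if (level < 0) {
      throw new IllegalArgumentException("level must be non-negative: " + level);
    }
    return DEFAULT_COLUMN_PREFIX + level;
  }

  static boolean isSegmentColumn(String columnName) {
    return columnName.startsWith(DEFAULT_COLUMN_PREFIX);
  }
}
```

A prefix such as `_$SEGMENT_` is unlikely to collide with user column names, which is presumably why a fixed, unusual string is preferred over a configurable label.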
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871454#comment-16871454
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296752797
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map<SchemaPath, ColumnStatistics> getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder<V> statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map<SchemaPath, ColumnStatistics> columnsStatistics;
+    protected Collection<StatisticsHolder> statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Thanks, removed.
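The `BaseMetadata` constructor indexes the builder's statistics by kind name with `Collectors.toMap`, and its merge function decides which holder survives when two share a name. A minimal sketch of the same collision rule, using a hypothetical `StatEntry` class rather than Drill's `StatisticsHolder`:

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical simplified statistic: a name, a value, and whether the value is exact.
final class StatEntry {
  final String name;
  final long value;
  final boolean exact;

  StatEntry(String name, long value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}

final class StatsIndex {
  private StatsIndex() {
  }

  // Indexes statistics by name. On a name collision the merge function keeps the
  // earlier entry if it is exact and otherwise takes the later one, mirroring
  // (a, b) -> a.getStatisticsKind().isExact() ? a : b from the constructor above.
  static Map<String, StatEntry> byName(Collection<StatEntry> stats) {
    return stats.stream()
        .collect(Collectors.toMap(
            stat -> stat.name,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }
}
```

The rule is asymmetric: if both entries are exact, the first is kept; if both are estimated, the last one is kept. Without a merge function, `Collectors.toMap` would throw `IllegalStateException` on the duplicate key.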
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871446#comment-16871446
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737969
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Agree, removed it.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871447#comment-16871447
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746956
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+    return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Thanks, done for this class and for `ColumnStatistics`
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871465#comment-16871465
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296786531
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
     return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Yes, it can. Fixed.
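The javadoc in this hunk says that when file metadata was pruned, the underlying metadata is pruned as well. A minimal sketch of that cascade, assuming the underlying metadata is a list of row groups keyed by file path; `RowGroupRef` and `PruneCascade` are hypothetical names, not part of `AbstractParquetGroupScan`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical row-group reference: the file it belongs to and its index in that file.
final class RowGroupRef {
  final String filePath;
  final int index;

  RowGroupRef(String filePath, int index) {
    this.filePath = filePath;
    this.index = index;
  }
}

final class PruneCascade {
  private PruneCascade() {
  }

  // After file-level pruning, keeps only the row groups whose file survived,
  // preserving their original order.
  static List<RowGroupRef> pruneRowGroups(Collection<String> survivingFiles, List<RowGroupRef> rowGroups) {
    Set<String> keep = new HashSet<>(survivingFiles);
    List<RowGroupRef> result = new ArrayList<>();
    for (RowGroupRef rowGroup : rowGroups) {
      if (keep.contains(rowGroup.filePath)) {
        result.add(rowGroup);
      }
    }
    return result;
  }
}
```

Copying the surviving files into a `HashSet` keeps each membership check constant-time, so the cascade is linear in the number of row groups.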
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871457#comment-16871457
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296761257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesCont
     if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the data or prunes all the data, sets corresponding value to
 
 Review comment:
   Agreed, thanks for pointing this out; I replaced it with nested lists.
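The Javadoc under discussion describes table-level filtering: the filter may match all of the data, prune all of it, or neither. Below is a minimal sketch of how min/max statistics can sort a predicate such as `col > value` into those three outcomes. The `FilterMatch` enum and the `MinMaxFilter.evaluateGreaterThan` helper are hypothetical illustrations, not the code in the pull request.

```java
// Hypothetical three-valued result of checking a filter against column statistics.
enum FilterMatch { ALL, NONE, SOME }

final class MinMaxFilter {
  // Classifies the predicate `col > value` using only min, max and nulls count.
  // Assumes the column has at least one non-null value, so min and max are defined.
  static FilterMatch evaluateGreaterThan(long min, long max, long nullsCount, long value) {
    if (max <= value) {
      return FilterMatch.NONE; // no row can match: non-null values are too small, NULLs never match
    }
    if (min > value && nullsCount == 0) {
      return FilterMatch.ALL;  // every row matches
    }
    return FilterMatch.SOME;   // statistics are inconclusive; rows must be read and filtered
  }
}
```

In a sketch like this, `NONE` would let a planner skip the table entirely, and `ALL` would let it drop the filter. The quoted diff does not show how Drill acts on each outcome.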
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871448#comment-16871448
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296746701
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
 
 Review comment:
   Thanks, done.
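The quoted `StatisticsHolder` pairs a statistics value with its kind. Item 1 of the issue asks for a single map of such holders to replace separate value and kind maps. Below is a stdlib-only sketch of that idea. It assumes each kind carries the Java type of its value, so a lookup can be checked at runtime. `StatKind`, `StatHolder` and `StatisticsMap` are hypothetical names, not Drill's classes, and the Jackson serialization in the diff is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical statistics kind that records the Java type of its value.
final class StatKind<T> {
  final String name;
  final Class<T> valueType;

  StatKind(String name, Class<T> valueType) {
    this.name = name;
    this.valueType = valueType;
  }
}

// Pairs one statistics value with the kind that describes it.
final class StatHolder<T> {
  final T value;
  final StatKind<T> kind;

  StatHolder(T value, StatKind<T> kind) {
    this.value = value;
    this.kind = kind;
  }
}

// One map from kind name to holder, instead of separate value and kind maps.
final class StatisticsMap {
  private final Map<String, StatHolder<?>> holders = new HashMap<>();

  <T> void put(StatHolder<T> holder) {
    holders.put(holder.kind.name, holder);
  }

  // Returns the value stored for `kind`, or null if absent; the cast is checked at runtime.
  <T> T valueOf(StatKind<T> kind) {
    StatHolder<?> holder = holders.get(kind.name);
    return holder == null ? null : kind.valueType.cast(holder.value);
  }
}
```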
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871444#comment-16871444
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296737602
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
         .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
         .addDeserializer(SchemaPath.class, new SchemaPath.De());
     mapper.registerModule(deModule);
+    mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth"));
 
 Review comment:
   It would be nice, but I think it would break backward compatibility, since it was already defined here: 
https://github.com/apache/drill/blob/05a1a3a888a7408bde683acc36f406fbd2459254/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/Histogram.java#L31
 
   With that change, previously created stats files wouldn't be deserialized correctly.
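The compatibility concern is that the subtype name ("numeric-equi-depth") is written into each stats file and used to choose the class when the file is read back. Below is a stdlib-only sketch of that name-based lookup. It is analogous in spirit to registering a Jackson `NamedType`, but it is not Jackson. `HistogramSketch`, `EquiDepthSketch`, `HistogramRegistry` and the alternative name "equi-depth" are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical histogram abstraction; only the type name matters here.
interface HistogramSketch {
  String typeName();
}

final class EquiDepthSketch implements HistogramSketch {
  @Override
  public String typeName() {
    return "numeric-equi-depth";
  }
}

// Resolves the type name written into a serialized stats file to a concrete class.
final class HistogramRegistry {
  private final Map<String, Supplier<HistogramSketch>> factories = new HashMap<>();

  void register(String typeName, Supplier<HistogramSketch> factory) {
    factories.put(typeName, factory);
  }

  HistogramSketch resolve(String typeName) {
    Supplier<HistogramSketch> factory = factories.get(typeName);
    if (factory == null) {
      // A file written under a name that is no longer registered cannot be read back.
      throw new IllegalArgumentException("Unknown histogram type: " + typeName);
    }
    return factory.get();
  }
}
```

Suppose the class were registered only under a new name and the old name were dropped. Every file written earlier would then fail in `resolve`, which is the breakage the comment describes.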
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871463#comment-16871463
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765872
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}
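Items 4, 6 and 13 of the description introduce a `MetadataType` with a new `SEGMENT` value. They also introduce a `MetadataInfo` that tags every metadata entry with a type, a key and an identifier. Below is a minimal stand-in that mirrors only the listed fields. It rests on three assumptions that the issue does not state. First, table-wide entries use `GENERAL_INFO_KEY`. Second, entries without an explicit segment share `DEFAULT_SEGMENT_KEY`. Third, the enum also holds the other levels shown. `MetadataIndex.groupByKey` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Assumed metadata levels; SEGMENT is the value the issue adds. Not Drill's actual enum.
enum MetadataType { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

// Simplified stand-in mirroring the MetadataInfo fields listed in the issue.
final class MetadataInfo {
  static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  final MetadataType type;
  final String key;
  final String identifier; // may be null for table-level entries in this sketch

  MetadataInfo(MetadataType type, String key, String identifier) {
    this.type = Objects.requireNonNull(type);
    this.key = Objects.requireNonNull(key);
    this.identifier = identifier;
  }
}

final class MetadataIndex {
  // Groups entries by key so all metadata belonging to one segment can be fetched together.
  static Map<String, List<MetadataInfo>> groupByKey(List<MetadataInfo> infos) {
    Map<String, List<MetadataInfo>> grouped = new TreeMap<>();
    for (MetadataInfo info : infos) {
      grouped.computeIfAbsent(info.key, k -> new ArrayList<>()).add(info);
    }
    return grouped;
  }
}
```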



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871460#comment-16871460
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296765020
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
      */
     public abstract AbstractGroupScanWithMetadata build();
 
-    public GroupScanWithMetadataFilterer withTable(TableMetadata tableMetadata) {
+    public B withTable(TableMetadata tableMetadata) {
       this.tableMetadata = tableMetadata;
-      return this;
+      return self();
 
 Review comment:
   The `self()` method was introduced to return the specific implementation type instead of the base type, so no casts are needed when the `this` instance is returned.
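The comment describes the self-type idiom. A generic builder base class takes its concrete subtype as a type parameter, so setters defined in the base class return the subtype and chained calls need no casts. Below is a minimal sketch with hypothetical `BaseFilterer` and `ParquetFilterer` classes. It is not the actual `GroupScanWithMetadataFilterer` hierarchy.

```java
// B is the concrete builder type; setters in the base class return B via self().
abstract class BaseFilterer<B extends BaseFilterer<B>> {
  protected String table;

  B withTable(String table) {
    this.table = table;
    return self();
  }

  // Each subclass returns `this`, typed as its own class.
  protected abstract B self();

  abstract String build();
}

final class ParquetFilterer extends BaseFilterer<ParquetFilterer> {
  private int rowGroups;

  // A setter that exists only on the subclass.
  ParquetFilterer withRowGroups(int rowGroups) {
    this.rowGroups = rowGroups;
    return this;
  }

  @Override
  protected ParquetFilterer self() {
    return this;
  }

  @Override
  String build() {
    return table + ":" + rowGroups;
  }
}
```

Because `withTable` returns `ParquetFilterer` rather than the base type, `new ParquetFilterer().withTable("t").withRowGroups(3)` compiles without a cast. A plain `return this` in the base class would have required one.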
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871464#comment-16871464
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296776542
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+    super(builder);
+    this.column = builder.column;
+    this.partitionValues = builder.partitionValues;
+    this.locations = builder.locations;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+    return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+    return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
 
 Review comment:
   Thanks, done.
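The Javadoc being fixed says `lastModifiedTime` reports when any of the partition's files were modified, as a Unix timestamp. One reading is sketched below. It assumes the partition value is the most recent file modification time in epoch milliseconds, which is the unit Hadoop's `FileStatus.getModificationTime()` reports. `FileEntry` and `PartitionTimes.lastModifiedTime` are hypothetical names.

```java
import java.util.Collection;

// Hypothetical file entry: path plus modification time in Unix epoch milliseconds.
final class FileEntry {
  final String path;
  final long lastModifiedMillis;

  FileEntry(String path, long lastModifiedMillis) {
    this.path = path;
    this.lastModifiedMillis = lastModifiedMillis;
  }
}

final class PartitionTimes {
  // Latest modification time among the partition's files, or -1 when there are none.
  static long lastModifiedTime(Collection<FileEntry> files) {
    long latest = -1L;
    for (FileEntry file : files) {
      latest = Math.max(latest, file.lastModifiedMillis);
    }
    return latest;
  }
}
```

If the stored value is in milliseconds, `java.time.Instant.ofEpochMilli(value)` converts it into a readable instant.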
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871442#comment-16871442
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296731422
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics 
visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = 
InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) 
minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) 
minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) {
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return valueComparator;
-  
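The hunk above drops the ad hoc `MinMaxStatistics` class in favour of one factory, `StatisticsProvider.getColumnStatistics(min, max, nullsCount, type)`, so every `case` returns directly and the nulls count is supplied up front instead of being set afterwards. The sketch below illustrates that shape with simplified, hypothetical stand-ins (`SketchType`, `SketchColumnStatistics`, `SketchStatisticsFactory` and the kind names are not Drill classes); it assumes, as the holder casts suggest, that `TIMESTAMP` values arrive as `long`s.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical subset of destination types; VARCHAR stands for "unsupported".
enum SketchType { INT, BIGINT, FLOAT4, FLOAT8, TIMESTAMP, VARCHAR }

// Hypothetical stand-in for ColumnStatistics: values keyed by kind name,
// plus a comparator used to order column values.
final class SketchColumnStatistics<T> {
  private final Map<String, Object> values = new HashMap<>();
  private final Comparator<T> comparator;

  SketchColumnStatistics(T min, T max, long nullsCount, Comparator<T> comparator) {
    values.put("minValue", min);
    values.put("maxValue", max);
    values.put("nullsCount", nullsCount); // fully initialised at construction time
    this.comparator = comparator;
  }

  Object get(String kindName) {
    return values.get(kindName);
  }

  Comparator<T> getValueComparator() {
    return comparator;
  }
}

final class SketchStatisticsFactory {
  private SketchStatisticsFactory() {}

  // One factory for all supported types; returns null for unsupported types,
  // mirroring the `default: return null;` branch of the switch.
  static <T extends Comparable<T>> SketchColumnStatistics<T> getColumnStatistics(
      T min, T max, long nullsCount, SketchType type) {
    Class<?> expected;
    switch (type) {
      case INT:
        expected = Integer.class;
        break;
      case BIGINT:
      case TIMESTAMP: // assumed: timestamps are held as long values
        expected = Long.class;
        break;
      case FLOAT4:
        expected = Float.class;
        break;
      case FLOAT8:
        expected = Double.class;
        break;
      default:
        return null;
    }
    if (!expected.isInstance(min) || !expected.isInstance(max)) {
      throw new IllegalArgumentException("min/max do not match type " + type);
    }
    return new SketchColumnStatistics<>(min, max, nullsCount, Comparator.<T>naturalOrder());
  }
}
```

Passing the nulls count into the factory keeps the returned object complete from the start, which is presumably why the trailing `setNullsCount` call disappears in the diff.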

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871467#comment-16871467
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296787766
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+  Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
   Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
   Multimap rowGroupsMap = LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-  logger.warn("applyFilter {} wasn't able to do pruning for  all metadata levels filter condition, since metadata count for " +
-"{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-"But underlying metadata was pruned without filter expression according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+  if (logger.isWarnEnabled()) {
 
 Review comment:
   Agree, this is very unlikely)
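The new code indexes segment and file metadata by path with `Collectors.toMap(..., Function.identity())` and groups row groups per path in Guava's `LinkedListMultimap`. A JDK-only sketch of both operations, using a hypothetical `RowGroupRef` type; `Collectors.groupingBy` into a `LinkedHashMap` of lists stands in for the multimap here, preserving first-seen key order and per-key insertion order, which is what a per-path lookup needs.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical metadata entry: only its path and position matter here.
final class RowGroupRef {
  private final String path;
  private final int index;

  RowGroupRef(String path, int index) {
    this.path = path;
    this.index = index;
  }

  String getPath() {
    return path;
  }

  int getIndex() {
    return index;
  }
}

final class PathIndex {
  private PathIndex() {}

  // One entry per key. Collectors.toMap without a merge function throws
  // IllegalStateException on a duplicate key, so keys must be unique.
  static <K, T> Map<K, T> indexBy(Collection<T> items, Function<T, K> key) {
    return items.stream().collect(Collectors.toMap(key, Function.identity()));
  }

  // Many entries per key, keeping first-seen key order and per-key order.
  static <K, T> Map<K, List<T>> groupBy(Collection<T> items, Function<T, K> key) {
    return items.stream()
        .collect(Collectors.groupingBy(key, LinkedHashMap::new, Collectors.toList()));
  }
}
```

`indexBy` suits files and segments, where a path occurs once; row groups share their file's path, hence the grouping variant.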
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871470#comment-16871470
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296779015
 
 

 ##
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
 
 Review comment:
   Thanks, done
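`TableInfo` carries an `UNKNOWN_TABLE_INFO` sentinel whose fields are all `"UNKNOWN"`. One plausible use of such a sentinel is the null-object pattern: a lookup that returns the sentinel instead of `null`. The sketch below is hypothetical (`TableInfoSketch` and `TableInfoRegistry` are illustrative names, not Drill's API) and only mirrors the fields shown in the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable holder modelled on the TableInfo fields in the diff.
final class TableInfoSketch {
  static final String UNKNOWN = "UNKNOWN";
  static final TableInfoSketch UNKNOWN_TABLE_INFO =
      new TableInfoSketch(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);

  private final String storagePlugin;
  private final String workspace;
  private final String name;
  private final String type;
  private final String owner;

  TableInfoSketch(String storagePlugin, String workspace, String name, String type, String owner) {
    this.storagePlugin = storagePlugin;
    this.workspace = workspace;
    this.name = name;
    this.type = type;
    this.owner = owner;
  }

  String getStoragePlugin() { return storagePlugin; }
  String getWorkspace() { return workspace; }
  String getName() { return name; }
  String getType() { return type; }
  String getOwner() { return owner; }
}

// Hypothetical registry: callers receive the sentinel, never null, for unknown tables.
final class TableInfoRegistry {
  private final Map<String, TableInfoSketch> tables = new HashMap<>();

  void register(TableInfoSketch info) {
    tables.put(info.getName(), info);
  }

  TableInfoSketch getOrUnknown(String name) {
    return tables.getOrDefault(name, TableInfoSketch.UNKNOWN_TABLE_INFO);
  }
}
```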
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871451#comment-16871451
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753052
 
 

 ##
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 
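This archive strips the generic parameters of `BaseMetadataBuilder`, but the setters returning `T` via `self()` indicate a self-typed builder: each concrete builder passes its own type up, so inherited setters keep returning the subclass builder during chaining. A minimal sketch of that pattern, using hypothetical class names (`MetadataBase`, `FileMetadataSketch`) rather than Drill's actual classes:

```java
// Hypothetical base class with a self-typed builder, modelled on BaseMetadataBuilder.
abstract class MetadataBase {
  private final String tableName;
  private final String schema;

  protected <T extends MetadataBase.Builder<T>> MetadataBase(Builder<T> builder) {
    this.tableName = builder.tableName;
    this.schema = builder.schema;
  }

  String getTableName() { return tableName; }
  String getSchema() { return schema; }

  // T is the concrete builder type, so inherited setters return the subclass builder.
  abstract static class Builder<T extends Builder<T>> {
    private String tableName;
    private String schema;

    T withTableName(String tableName) {
      this.tableName = tableName;
      return self();
    }

    T withSchema(String schema) {
      this.schema = schema;
      return self();
    }

    protected abstract T self();
  }
}

final class FileMetadataSketch extends MetadataBase {
  private final String path;

  private FileMetadataSketch(FileBuilder builder) {
    super(builder);
    this.path = builder.path;
  }

  String getPath() { return path; }

  static final class FileBuilder extends MetadataBase.Builder<FileBuilder> {
    private String path;

    FileBuilder withPath(String path) {
      this.path = path;
      return this;
    }

    @Override
    protected FileBuilder self() {
      return this;
    }

    FileMetadataSketch build() {
      return new FileMetadataSketch(this);
    }
  }
}
```

Without the `T` parameter, `withTableName` would return the base builder type and a following `withPath` call would not compile; the `self()` indirection is what avoids that.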

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871445#comment-16871445
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296750708
 
 

 ##
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
 
 Review comment:
   Agree, `metadataStatistics` fits better, renamed it.
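The field renamed here (`statistics` to `metadataStatistics`) is filled in the constructor shown in the previous message by collapsing the builder's `StatisticsHolder` collection into a map keyed by kind name, with the merge function `(a, b) -> a.getStatisticsKind().isExact() ? a : b`. For duplicate kinds it keeps the earlier holder when that holder is exact and otherwise takes the later one, so an exact value is never overwritten, while an estimate is replaced by whatever follows it. A self-contained sketch with hypothetical `KindSketch`/`HolderSketch` stand-ins:

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for StatisticsKind: a name plus an exactness flag.
final class KindSketch {
  final String name;
  final boolean exact;

  KindSketch(String name, boolean exact) {
    this.name = name;
    this.exact = exact;
  }
}

// Hypothetical stand-in for StatisticsHolder: a value tagged with its kind.
final class HolderSketch {
  final Object value;
  final KindSketch kind;

  HolderSketch(Object value, KindSketch kind) {
    this.value = value;
    this.kind = kind;
  }
}

final class MetadataStatisticsSketch {
  private MetadataStatisticsSketch() {}

  // Keyed by kind name; on a collision keep the earlier holder if it is exact,
  // otherwise take the later one (same rule as the lambda in the diff).
  static Map<String, HolderSketch> byKindName(Collection<HolderSketch> holders) {
    return holders.stream()
        .collect(Collectors.toMap(
            holder -> holder.kind.name,
            Function.identity(),
            (a, b) -> a.kind.exact ? a : b));
  }
}
```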
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871462#comment-16871462
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296762428
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   Yes, this is expected. It may later be replaced with a regular list; if no filtering happens, no new object is allocated.
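Two patterns meet in this hunk: `getSegmentsMetadata()` loads segments lazily from the metadata provider on first use, and the filterer starts from the shared immutable `Collections.emptyMap()`, which, as the reply notes, avoids allocating a new object when no filtering happens. A hypothetical sketch of both (`LazySegmentsSketch`, `SegmentFiltererSketch` and the row-count values are illustrative only):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Lazy getter: the provider is called at most once, on first access (not thread-safe).
final class LazySegmentsSketch {
  private final Supplier<Map<String, Long>> provider;
  private Map<String, Long> segments;
  private int loads;

  LazySegmentsSketch(Supplier<Map<String, Long>> provider) {
    this.provider = provider;
  }

  Map<String, Long> getSegments() {
    if (segments == null) {
      loads++;
      segments = provider.get();
    }
    return segments;
  }

  int loadCount() {
    return loads;
  }
}

// Filterer that keeps the shared empty map until a filter actually runs.
final class SegmentFiltererSketch {
  private Map<String, Long> segments = Collections.emptyMap();

  void filter(Map<String, Long> source, Predicate<Long> keep) {
    Map<String, Long> result = new LinkedHashMap<>();
    source.forEach((path, rowCount) -> {
      if (keep.test(rowCount)) {
        result.put(path, rowCount);
      }
    });
    segments = result; // a new map is allocated only here
  }

  Map<String, Long> getSegments() {
    return segments;
  }
}
```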
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871461#comment-16871461
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296770734
 
 

 ##
 File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Thanks, done.
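Per the ticket, `MetadataInfo` identifies a specific piece of metadata by a `MetadataType` (the renamed `MetadataLevel`, which gains a `SEGMENT` value), a key and an identifier, alongside `GENERAL_INFO_KEY` and `DEFAULT_SEGMENT_KEY` constants. The sketch below shows such an identifier used as a map key; the class names are hypothetical, and of the enum values only `NONE` (used in the `AbstractParquetGroupScan` diff) and `SEGMENT` (from the ticket) are confirmed, the rest are assumptions.

```java
import java.util.Objects;

// Assumed levels: only NONE (seen in the diff) and SEGMENT (from the ticket) are confirmed.
enum MetadataTypeSketch { NONE, TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

// Hypothetical identifier for one metadata entry: type + key + identifier.
final class MetadataInfoSketch {
  static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  private final MetadataTypeSketch type;
  private final String key;
  private final String identifier;

  MetadataInfoSketch(MetadataTypeSketch type, String key, String identifier) {
    this.type = Objects.requireNonNull(type, "type");
    this.key = key;
    this.identifier = identifier;
  }

  MetadataTypeSketch getType() { return type; }
  String getKey() { return key; }
  String getIdentifier() { return identifier; }

  // Equal type, key and identifier denote the same metadata entry.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MetadataInfoSketch)) {
      return false;
    }
    MetadataInfoSketch that = (MetadataInfoSketch) o;
    return type == that.type
        && Objects.equals(key, that.key)
        && Objects.equals(identifier, that.identifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, key, identifier);
  }
}
```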
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871452#comment-16871452
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296767257
 
 

 ##
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) {
+protected void filterSegmentMetadata(OptionManager optionManager,
+ FilterPredicate filterPredicate,
+ Set schemaPathsInExpr) {
   if (!matchAllMetadata) {
-if (!source.getPartitionsMetadata().isEmpty()) {
-  if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+if (!source.getSegmentsMetadata().isEmpty()) {
+  if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+  
PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
 matchAllMetadata = true;
-partitions = filterAndGetMetadata(schemaPathsInExpr, 
source.getPartitionsMetadata(), filterPredicate, optionManager);
+segments = filterAndGetMetadata(schemaPathsInExpr,
+source.getSegmentsMetadata().values(),
+filterPredicate,
+optionManager).stream()
+.collect(Collectors.toMap(SegmentMetadata::getPath, 
Function.identity()));
 
 Review comment:
   Thanks, formatted the code and added `BinaryOperator`.
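
For context on the `toMap` change above: `Collectors.toMap` with only a key mapper and a value mapper throws `IllegalStateException` when two elements produce the same key, which is presumably why a `BinaryOperator` merge function was added. The merge rule actually used in the PR is not shown in this thread, so the keep-first rule and the `Segment` class below are hypothetical stand-ins for `SegmentMetadata`; this is a minimal sketch, not the Drill code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SegmentIndexSketch {

  // Hypothetical stand-in for SegmentMetadata; only the path is used as a key.
  static final class Segment {
    final String path;
    final long rowCount;

    Segment(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }

    String getPath() {
      return path;
    }
  }

  // Assumed merge rule: keep the first segment seen for a duplicate path.
  static final BinaryOperator<Segment> KEEP_FIRST = (first, second) -> first;

  static Map<String, Segment> indexByPath(List<Segment> segments) {
    // Without the third argument, a duplicate path would make toMap throw
    // IllegalStateException when the stream is collected.
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity(), KEEP_FIRST));
  }

  public static void main(String[] args) {
    List<Segment> segments = Arrays.asList(
        new Segment("/t/a", 10), new Segment("/t/b", 20), new Segment("/t/a", 99));
    Map<String, Segment> index = indexByPath(segments);
    System.out.println(index.size() + " " + index.get("/t/a").rowCount); // 2 10
  }
}
```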
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871472#comment-16871472
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296781802
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Thanks, added.
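
`ColumnStatistics` is documented above as a collection of statistics values for a specific column, and elsewhere in this thread it is built from a list of `StatisticsHolder` objects (a value paired with its kind) rather than from a `Map` of kind to value. The following plain-Java sketch shows that shape; `StatisticsKind`, the field names and the lookup-by-kind-name method are illustrative assumptions, not Drill's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnStatisticsSketch {

  // Hypothetical statistics kind, identified by name.
  static final class StatisticsKind {
    final String name;

    StatisticsKind(String name) {
      this.name = name;
    }
  }

  // Pairs one value with its kind, so the two no longer live in parallel maps.
  static final class StatisticsHolder<T> {
    final T value;
    final StatisticsKind kind;

    StatisticsHolder(T value, StatisticsKind kind) {
      this.value = value;
      this.kind = kind;
    }
  }

  // Statistics for a single column, indexed by kind name.
  static final class ColumnStatistics {
    private final Map<String, StatisticsHolder<?>> statistics = new HashMap<>();

    ColumnStatistics(List<StatisticsHolder<?>> holders) {
      for (StatisticsHolder<?> holder : holders) {
        statistics.put(holder.kind.name, holder);
      }
    }

    // Returns the value for the given kind, or null if it was not collected.
    Object get(String kindName) {
      StatisticsHolder<?> holder = statistics.get(kindName);
      return holder == null ? null : holder.value;
    }
  }

  static final StatisticsKind MIN_VALUE = new StatisticsKind("minValue");
  static final StatisticsKind MAX_VALUE = new StatisticsKind("maxValue");

  public static void main(String[] args) {
    ColumnStatistics stats = new ColumnStatistics(Arrays.<StatisticsHolder<?>>asList(
        new StatisticsHolder<Integer>(3, MIN_VALUE),
        new StatisticsHolder<Integer>(17, MAX_VALUE)));
    System.out.println(stats.get("minValue") + " " + stats.get("maxValue")); // 3 17
  }
}
```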
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871443#comment-16871443
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296728354
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) source);
 }
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   This method came from `GroupScanWithMetadataFilterer` and is used to return 
the correct type of `this` instance to avoid casts in parent classes.
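
The `self()` idiom described here is the usual self-typed (recursive generic) pattern: a base class parameterized by its own subclass can declare fluent methods that return the subclass type, so callers and parent classes need no casts. The classes below are hypothetical and only illustrate the idea; the exact generic signature of `GroupScanWithMetadataFilterer` is not shown in this thread.

```java
public class SelfTypeSketch {

  // Base builder parameterized by its own concrete subtype, so fluent setters
  // declared here can return that subtype without casts.
  abstract static class BaseBuilder<B extends BaseBuilder<B>> {
    protected String name;

    public B withName(String name) {
      this.name = name;
      return self();
    }

    // Each concrete subclass returns `this`, typed as its own class.
    protected abstract B self();
  }

  // Hypothetical concrete builder.
  static final class TableBuilder extends BaseBuilder<TableBuilder> {
    private String location;

    public TableBuilder withLocation(String location) {
      this.location = location;
      return self();
    }

    @Override
    protected TableBuilder self() {
      return this;
    }

    public String build() {
      return name + "@" + location;
    }
  }

  public static void main(String[] args) {
    // withName(...) already returns TableBuilder, so withLocation(...) is callable directly.
    String table = new TableBuilder().withName("orders").withLocation("/data/orders").build();
    System.out.println(table); // orders@/data/orders
  }
}
```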
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871471#comment-16871471
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794257
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-    ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())));
-partitions.add(new PartitionMetadata(partitionColumn, getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, -1));
+new ColumnStatistics<>(statistics,
+    getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   It is required because `HashMultimap.asMap()` returns a map whose values are 
typed as `Collection`, even though `HashMultimap` stores them as sets. To avoid 
problems if the `HashMultimap` implementation changes, I have replaced the cast 
with `new HashSet<>(value)`.
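
The point of this reply is to depend only on the declared value type (`Collection`) rather than on the implementation detail that the values happen to be sets. A minimal stdlib-only sketch, using an `ArrayList`-valued map as a hypothetical stand-in for `HashMultimap.asMap()`: copying into a `HashSet` works for any `Collection`, whereas a `(Set)` cast would throw `ClassCastException` at runtime for a list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CopyInsteadOfCastSketch {

  // Hypothetical stand-in for Multimap.asMap(): the declared value type is only
  // Collection, and here the actual values are lists, not sets.
  static Map<String, Collection<String>> groupLocations() {
    Map<String, Collection<String>> grouped = new LinkedHashMap<>();
    List<String> files = new ArrayList<>();
    files.add("/t/p=1/f0.parquet");
    files.add("/t/p=1/f1.parquet");
    files.add("/t/p=1/f0.parquet"); // duplicate entry
    grouped.put("1", files);
    return grouped;
  }

  // Copying relies only on the Collection contract; duplicates collapse.
  static Set<String> locationsFor(Collection<String> value) {
    return new HashSet<>(value);
  }

  public static void main(String[] args) {
    groupLocations().forEach((key, value) ->
        System.out.println(key + " -> " + locationsFor(value).size())); // 1 -> 2
  }
}
```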
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871458#comment-16871458
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296756147
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+super(builder);
+this.location = builder.location;
+this.partitionKeys = builder.partitionKeys;
+this.interestingColumns = builder.interestingColumns;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+// overrides statistics value for the case when new statistics is exact or existing one was estimated
+tableStatistics.stream()
+.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+  || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+this.columnsStatistics.forEach(
+(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+return BaseTableMetadata.builder()
+.withTableInfo(tableInfo)
+.withMetadataInfo(metadataInfo)
+.withLocation(location)
+.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics)
+.withStatistics(mergedTableStatistics.values())
+.withLastModifiedTime(lastModifiedTime)
+.withPartitionKeys(partitionKeys)
+.withInterestingColumns(interestingColumns)
+.build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+private Path location;
+private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+private Map partitionKeys;
+private List interestingColumns;
+
+public BaseTableMetadataBuilder withLocation(Path location) {
+  this.location = location;
+  return self();
+}
+
+public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+  this.lastModifiedTime = lastModifiedTime;
+  return self();
+}
+
+public BaseTableMetadataBuilder 
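
The override rule in `cloneWithStats` above (take the new statistic when it is exact, or when the existing one was only estimated) can be sketched in isolation. `Stat`, `merge` and the statistic names are hypothetical. Unlike the quoted diff, which dereferences the result of `this.statistics.get(...)` whenever the new statistic is estimated, this sketch treats a missing existing entry as replaceable; that guard is an assumption of the sketch, not Drill's behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatisticsMergeSketch {

  // Hypothetical statistic: a name, a value, and whether the value is exact.
  static final class Stat {
    final String name;
    final long value;
    final boolean exact;

    Stat(String name, long value, boolean exact) {
      this.name = name;
      this.value = value;
      this.exact = exact;
    }
  }

  // Starts from the existing statistics and lets an incoming value win when it
  // is exact, or when the existing value was estimated (or absent).
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat stat : incoming) {
      Stat current = existing.get(stat.name);
      if (stat.exact || current == null || !current.exact) {
        merged.put(stat.name, stat);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Stat> existing = new HashMap<>();
    existing.put("rowCount", new Stat("rowCount", 1000, true));
    existing.put("ndv", new Stat("ndv", 50, false));
    Map<String, Stat> merged = merge(existing, Arrays.asList(
        new Stat("rowCount", 900, false), // estimate does not replace an exact value
        new Stat("ndv", 48, false)));     // existing value was estimated, so it is replaced
    System.out.println(merged.get("rowCount").value + " " + merged.get("ndv").value); // 1000 48
  }
}
```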

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871466#comment-16871466
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296775175
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Thanks, added.
   2. It is used during filtering to indicate that filtering was finished and 
there was no metadata whose size exceeds 
`PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD`.
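
One possible reading of the `NONE` explanation: metadata levels are examined from coarsest to finest, filtering continues while each level's entry count stays within the planning threshold, and `NONE` signals that no level exceeded it. The loop below is a hypothetical sketch of that reading only; it is not the control flow of `AbstractGroupScanWithMetadata`, which this thread does not show, and it omits the `ALL` constant.

```java
import java.util.EnumMap;
import java.util.Map;

public class FilterLevelSketch {

  // Subset of the MetadataType constants, coarsest first, with NONE as a sentinel.
  enum Level { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP, NONE }

  // Returns the first level whose entry count exceeds the threshold, or NONE
  // when every level with known metadata stayed within it.
  static Level stopLevel(Map<Level, Integer> counts, int threshold) {
    for (Level level : Level.values()) {
      if (level == Level.NONE) {
        break;
      }
      Integer count = counts.get(level);
      if (count != null && count > threshold) {
        return level;
      }
    }
    return Level.NONE;
  }

  public static void main(String[] args) {
    Map<Level, Integer> counts = new EnumMap<>(Level.class);
    counts.put(Level.TABLE, 1);
    counts.put(Level.SEGMENT, 4);
    counts.put(Level.FILE, 12);
    System.out.println(stopLevel(counts, 10));  // FILE
    System.out.println(stopLevel(counts, 100)); // NONE
  }
}
```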
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871449#comment-16871449
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296743285
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   It is also used in `ColumnStatistics`, so I changed it to package-private (default) visibility.
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871453#comment-16871453
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753431
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map<SchemaPath, ColumnStatistics> getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map<SchemaPath, ColumnStatistics> columnsStatistics;
+    protected Collection<StatisticsHolder> statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
+      this.tableInfo = tableInfo;
+      return self();
+    }
+
+    public T withMetadataInfo(MetadataInfo metadataInfo) {
+      this.metadataInfo = metadataInfo;
+      return self();
+    }
+
+    public T withSchema(TupleMetadata schema) {
+      this.schema = schema;
+      return self();
+    }
+
+    public T withColumnsStatistics(Map 
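
The constructor above relies on two idioms that are easier to see in isolation. The first is the self-bounded builder (`BaseMetadataBuilder<T extends BaseMetadataBuilder<T>>` returning `self()`), which lets inherited setters return the concrete builder type. The second is the `Collectors.toMap` merge function `(a, b) -> a.getStatisticsKind().isExact() ? a : b`, which keeps an exact statistic over an estimated one of the same kind. The following is a minimal, self-contained sketch of both. `Stat`, `SketchBuilder` and `SketchTableBuilder` are hypothetical stand-ins, not Drill's `StatisticsHolder` or builder classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a statistic: kind name, exactness flag, value.
final class Stat {
  final String kind;
  final boolean exact;
  final long value;

  Stat(String kind, boolean exact, long value) {
    this.kind = kind;
    this.exact = exact;
    this.value = value;
  }
}

// Self-bounded builder: T is the concrete builder type, so setters defined here
// return the subclass and chained calls can still reach subclass-only methods.
abstract class SketchBuilder<T extends SketchBuilder<T>> {
  protected final List<Stat> stats = new ArrayList<>();

  protected abstract T self();

  public T withStat(Stat stat) {
    stats.add(stat);
    return self();
  }
}

final class SketchTableBuilder extends SketchBuilder<SketchTableBuilder> {
  private String location;

  @Override
  protected SketchTableBuilder self() {
    return this;
  }

  public SketchTableBuilder withLocation(String location) {
    this.location = location;
    return self();
  }

  public String location() {
    return location;
  }

  // Index statistics by kind; on a key collision keep the first one if it is
  // exact, otherwise take the second (same merge rule as the constructor above).
  public Map<String, Stat> buildStatistics() {
    return stats.stream()
        .collect(Collectors.toMap(
            s -> s.kind,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }
}
```

With this merge function, a collision between two exact statistics keeps the first one, while a collision between two estimates keeps the later one.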

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871450#comment-16871450
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296753490
 
 


[jira] [Updated] (DRILL-6951) Merge row set based mock data source

2019-06-24 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6951:

Labels: ready-to-commit  (was: )

> Merge row set based mock data source
> 
>
> Key: DRILL-6951
> URL: https://issues.apache.org/jira/browse/DRILL-6951
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> The mock reader framework is an obscure bit of test code that generates fake 
> data for testing operators such as sort and filter.
> Because the mock reader is simple, it is a good demonstration case for the 
> new scanner framework based on the result set loader. This task merges the 
> existing work on migrating the mock data source into master via a PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6951) Merge row set based mock data source

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871190#comment-16871190
 ] 

ASF GitHub Bot commented on DRILL-6951:
---

arina-ielchiieva commented on issue #1809: DRILL-6951: Row set based mock data 
source
URL: https://github.com/apache/drill/pull/1809#issuecomment-505004114
 
 
   @paul-rogers looks good, please squash the commits.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871188#comment-16871188
 ] 

ASF GitHub Bot commented on DRILL-7306:
---

arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable 
schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#discussion_r296713307
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java
 ##
 @@ -398,6 +363,40 @@ public void addContext(UserException.Builder builder) {
 }
   }
 
+  /**
+   * Initialize the scan framework builder with standard options.
+   * Call this from the plugin-specific
+   * {@link #frameworkBuilder(OptionManager, EasySubScan)} method.
+   * The plugin can then customize/revise options as needed.
 
 Review comment:
   Please add the two parameters to the Javadoc as well.
 



> Disable "fast schema" batch for new scan framework
> --
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> The EVF framework is set up to return a "fast schema" batch (an empty batch 
> carrying only the schema) as its first batch because, when the code was 
> written, that seemed to be how we wanted operators to work. However, 
> DRILL-7305 notes that many operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast 
> schema" batch, this ticket asks to disable the feature in the new scan 
> framework. The feature is disabled with a config option, so it can be 
> re-enabled if it is ever needed.
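
A toy model (not EVF code) makes the behavior described above concrete. It shows what a downstream operator receives with the schema-only batch enabled or disabled. `Batch`, `SketchScan` and the `enableSchemaBatch` flag are hypothetical. The flag only stands in for the config option mentioned in the ticket, whose real name is not given here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a batch: a schema plus a row count.
final class Batch {
  final List<String> schema;
  final int rowCount;

  Batch(List<String> schema, int rowCount) {
    this.schema = schema;
    this.rowCount = rowCount;
  }
}

final class SketchScan {
  // Hypothetical flag standing in for the config option; off means "no fast schema".
  private final boolean enableSchemaBatch;

  SketchScan(boolean enableSchemaBatch) {
    this.enableSchemaBatch = enableSchemaBatch;
  }

  // Returns the sequence of batches a downstream operator would see.
  List<Batch> emit(List<String> schema, List<Integer> dataBatchSizes) {
    List<Batch> out = new ArrayList<>();
    if (enableSchemaBatch) {
      // "Fast schema": an empty batch that carries only the schema.
      out.add(new Batch(schema, 0));
    }
    for (int rows : dataBatchSizes) {
      out.add(new Batch(schema, rows));
    }
    return out;
  }
}
```

With the flag off, operators are not handed an empty, schema-only batch up front, which avoids the empty-batch handling problems noted in DRILL-7305.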





[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871133#comment-16871133
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296687506
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
 
       logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+      Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+          .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
       Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-          .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+          .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
       Multimap rowGroupsMap = LinkedListMultimap.create();
-      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-      builder.withRowGroups(rowGroupsMap)
+      filteredMetadata.withRowGroups(rowGroupsMap)
           .withTable(getTableMetadata())
+          .withSegments(segmentsMap)
           .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
           .withNonInterestingColumns(getNonInterestingColumnsMetadata())
           .withFiles(filesMap)
           .withMatching(false);
     }
 
-    if (builder.getOverflowLevel() != MetadataLevel.NONE) {
-      logger.warn("applyFilter {} wasn't able to do pruning for  all metadata levels filter condition, since metadata count for " +
-              "{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
-              "But underlying metadata was pruned without filter expression according to the metadata with above level.",
-          ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+    if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
+      if (logger.isWarnEnabled()) {
 
 Review comment:
   No objection to this change, but what are the odds of the warn level being disabled? :)
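
Some context on the question: the `isWarnEnabled()` guard matters mainly for argument evaluation. With an SLF4J-style `logger.warn("... {}", arg)` call, message formatting is deferred, but argument expressions such as `ExpressionStringBuilder.toString(filterExpr)` are still evaluated before the call. The sketch below shows the same trade-off with `java.util.logging` so that it is self-contained. `WarnGuardSketch` and `expensiveDescription()` are hypothetical. The `Supplier`-based variant is an alternative that the PR does not use.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class WarnGuardSketch {
  // Counts how many times the (pretend) expensive message argument was built.
  static final AtomicInteger renders = new AtomicInteger();

  // Hypothetical stand-in for an expensive call such as rendering a filter expression.
  static String expensiveDescription() {
    renders.incrementAndGet();
    return "filter expression";
  }

  // Explicit guard: the argument is only computed when WARNING is loggable.
  static void warnGuarded(Logger logger) {
    if (logger.isLoggable(Level.WARNING)) {
      logger.warning("could not prune: " + expensiveDescription());
    }
  }

  // Lazy alternative: Logger.warning(Supplier<String>) only invokes the supplier
  // if a WARNING record would actually be logged.
  static void warnLazy(Logger logger) {
    logger.warning(() -> "could not prune: " + expensiveDescription());
  }
}
```

As the reviewer hints, the saving only materializes when WARN is disabled, which is rare in practice.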
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871147#comment-16871147
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693580
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map<SchemaPath, ColumnStatistics> columnStatistics, List<StatisticsHolder> tableStatistics) {
+    Map<String, StatisticsHolder> mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides the statistics value when the new statistic is exact or the existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+            || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map<SchemaPath, ColumnStatistics> newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder<BaseTableMetadataBuilder> {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
+      this.location = location;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+      this.lastModifiedTime = lastModifiedTime;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder 
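
The override rule in `cloneWithStats` can be sketched on its own: an existing table statistic is replaced when the new one is exact or the existing one was only estimated. One thing worth checking in the diff is the case where the table has no statistic of a given kind yet. There, `this.statistics.get(...)` returns `null`, so an estimated incoming statistic of that kind would appear to cause a `NullPointerException`. The sketch below uses hypothetical `TableStat` and `StatsMerge` classes. It treats a missing statistic like an estimate and simply adds the new one.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simplified table statistic: kind name, exactness flag, value.
final class TableStat {
  final String kind;
  final boolean exact;
  final long value;

  TableStat(String kind, boolean exact, long value) {
    this.kind = kind;
    this.exact = exact;
    this.value = value;
  }
}

final class StatsMerge {
  // Copies the existing statistics, then lets an incoming statistic override when it
  // is exact, when no statistic of that kind exists yet, or when the existing one
  // was only estimated. The input map is left unchanged.
  static Map<String, TableStat> merge(Map<String, TableStat> existing, Collection<TableStat> incoming) {
    Map<String, TableStat> merged = new HashMap<>(existing);
    for (TableStat stat : incoming) {
      TableStat current = existing.get(stat.kind);
      if (stat.exact || current == null || !current.exact) {
        merged.put(stat.kind, stat);
      }
    }
    return merged;
  }
}
```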

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871134#comment-16871134
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296686176
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() {
 return readEntries;
   }
 
+  /**
+   * {@inheritDoc}
+   * 
+   * - if file metadata was pruned, prunes underlying metadata
 
 Review comment:
   Not sure we need a dash here; can this be covered with a nested list?
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871137#comment-16871137
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296684303
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) {
+  protected void filterSegmentMetadata(OptionManager optionManager,
+                                       FilterPredicate filterPredicate,
+                                       Set schemaPathsInExpr) {
     if (!matchAllMetadata) {
-      if (!source.getPartitionsMetadata().isEmpty()) {
-        if (source.getPartitionsMetadata().size() <= optionManager.getOption(
-            PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
+      if (!source.getSegmentsMetadata().isEmpty()) {
+        if (source.getSegmentsMetadata().size() <= optionManager.getOption(
+            PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) {
           matchAllMetadata = true;
-          partitions = filterAndGetMetadata(schemaPathsInExpr, source.getPartitionsMetadata(), filterPredicate, optionManager);
+          segments = filterAndGetMetadata(schemaPathsInExpr,
+              source.getSegmentsMetadata().values(),
+              filterPredicate,
+              optionManager).stream()
+              .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
 
 Review comment:
   ```suggestion
   .collect(Collectors.toMap(
   SegmentMetadata::getPath,
   Function.identity()));
   ```
   Also, what about handling duplicates? It would be safer to add `(o, n) -> n`, unless, of course, you intend to fail on duplicates.
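
The duplicate-key concern can be illustrated directly. The two-argument `Collectors.toMap` throws `IllegalStateException` when two elements map to the same key. The three-argument form with the suggested `(o, n) -> n` keeps the later value instead. The following is a minimal sketch with a hypothetical `Segment` class, keyed by a string path, standing in for `SegmentMetadata::getPath`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical segment keyed by a path string.
final class Segment {
  final String path;
  final String label;

  Segment(String path, String label) {
    this.path = path;
    this.label = label;
  }

  String getPath() {
    return path;
  }
}

final class ToMapSketch {
  // Two-argument toMap: throws IllegalStateException if two segments share a path.
  static Map<String, Segment> strict(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity()));
  }

  // Three-argument toMap with the suggested merge function: the later segment wins.
  static Map<String, Segment> lastWins(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(
            Segment::getPath,
            Function.identity(),
            (o, n) -> n));
  }
}
```

Which variant fits depends on the data. If duplicate segment paths indicate a bug, fail fast. If they are legitimately possible, pick one deterministically.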
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871142#comment-16871142
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690973
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
 
 Review comment:
   Please write a better Javadoc, e.g. "Class that specifies metadata type ...", and provide an example.
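One possible shape for the requested Javadoc, as a sketch only: the class name `MetadataInfoSketch`, the nested `Type` enum and the example values are hypothetical, and reading `key` as the enclosing segment and `identifier` as the concrete object is an assumption that the PR does not state. The two constants mirror the ones listed in the issue description.

```java
import java.util.Objects;

/**
 * Identifies a single metadata entry by its type, key and identifier.
 *
 * <p>Hypothetical example (assumed semantics): file-level metadata for
 * {@code /tbl/2019/a.csv} might be identified as
 * {@code new MetadataInfoSketch(Type.FILE, "2019", "/tbl/2019/a.csv")},
 * while table-level metadata might use {@link #GENERAL_INFO_KEY} as its key.
 */
final class MetadataInfoSketch {
  static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  // Hypothetical subset of metadata levels, for illustration only.
  enum Type { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  private final Type type;
  private final String key;
  private final String identifier;

  MetadataInfoSketch(Type type, String key, String identifier) {
    this.type = type;
    this.key = key;
    this.identifier = identifier;
  }

  Type type() { return type; }

  String key() { return key; }

  String identifier() { return identifier; }

  // Value semantics, so an instance can be used as a map key.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MetadataInfoSketch)) {
      return false;
    }
    MetadataInfoSketch that = (MetadataInfoSketch) o;
    return type == that.type
        && Objects.equals(key, that.key)
        && Objects.equals(identifier, that.identifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, key, identifier);
  }
}
```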
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871152#comment-16871152
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691929
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
 
 Review comment:
   Add timestamp unit of measurement.
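A sketch of how the unit could be made explicit, assuming the value is a file modification time such as Hadoop's `FileStatus.getModificationTime()`, which is documented as milliseconds since January 1, 1970 UTC. `LastModifiedSketch` is a hypothetical class; exposing an `Instant` alongside the raw `long` is one option, not what the PR does.

```java
import java.time.Instant;

final class LastModifiedSketch {
  private final long lastModifiedTime;

  LastModifiedSketch(long lastModifiedTime) {
    this.lastModifiedTime = lastModifiedTime;
  }

  /**
   * Returns the time when any file of this partition was last modified,
   * in milliseconds since the Unix epoch (1970-01-01T00:00:00Z).
   *
   * @return last modification time in epoch milliseconds
   */
  long getLastModifiedTime() {
    return lastModifiedTime;
  }

  /** The same value as an {@link Instant}, which carries its unit in the type. */
  Instant getLastModifiedInstant() {
    return Instant.ofEpochMilli(lastModifiedTime);
  }
}
```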
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871150#comment-16871150
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694014
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > 
BaseMetadata(BaseMetadataBuilder builder) {
+this.tableInfo = builder.tableInfo;
+this.metadataInfo = builder.metadataInfo;
+this.schema = builder.schema;
+this.columnsStatistics = builder.columnsStatistics;
+this.statistics = builder.statistics.stream()
+.collect(Collectors.toMap(
+statistic -> statistic.getStatisticsKind().getName(),
+Function.identity(),
+(a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatistic(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public  V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+protected TableInfo tableInfo;
+protected MetadataInfo metadataInfo;
+protected TupleMetadata schema;
+protected Map columnsStatistics;
+protected Collection statistics;
+
+public T withTableInfo(TableInfo tableInfo) {
+  this.tableInfo = tableInfo;
+  return self();
+}
+
+public T withMetadataInfo(MetadataInfo metadataInfo) {
+  this.metadataInfo = metadataInfo;
+  return self();
+}
+
+public T withSchema(TupleMetadata schema) {
+  this.schema = schema;
+  return self();
+}
+
+public T withColumnsStatistics(Map 

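The archive stripped the generic parameters from `BaseMetadataBuilder` above, so its exact declaration is not visible; the `return self();` calls suggest the usual self-typed builder pattern, where the base builder is parameterized by its concrete subclass so that inherited `with...` methods return the subclass type. A minimal sketch with hypothetical names (`BaseBuilder`, `FileMeta`), not the PR's classes:

```java
// Base builder parameterized by the concrete builder type T, so that
// setters declared here return the subclass builder and chaining keeps working.
abstract class BaseBuilder<T extends BaseBuilder<T>> {
  protected String tableName;
  protected String schema;

  T withTableName(String tableName) {
    this.tableName = tableName;
    return self();
  }

  T withSchema(String schema) {
    this.schema = schema;
    return self();
  }

  // Each concrete builder returns "this", typed as itself.
  protected abstract T self();
}

final class FileMeta {
  final String tableName;
  final String schema;
  final String path;

  private FileMeta(Builder builder) {
    this.tableName = builder.tableName;
    this.schema = builder.schema;
    this.path = builder.path;
  }

  static final class Builder extends BaseBuilder<Builder> {
    private String path;

    Builder withPath(String path) {
      this.path = path;
      return this;
    }

    @Override
    protected Builder self() {
      return this;
    }

    FileMeta build() {
      return new FileMeta(this);
    }
  }
}
```

Without the type parameter, `withSchema` would return the base builder type and a chained `withPath(...)` call after it would not compile.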
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871156#comment-16871156
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696051
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java
 ##
 @@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonAutoDetect;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonPropertyOrder;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.TableMetadataUtils;
+
+import java.io.IOException;
+import java.util.Collection;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER;
+
+/**
+ * Represents collection of statistics values for specific column.
 
 Review comment:
   Can you please add an example?
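As an illustration of what such an example might describe: per-column statistics can be thought of as a map from a statistic name to its value. The sketch below computes three such values for one nullable integer column; the names `minValue`, `maxValue` and `nullsCount` and the class `ColumnStatsSketch` are hypothetical labels, not the PR's `StatisticsKind` constants.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class ColumnStatsSketch {
  // Computes min, max and null count for one column, keyed by statistic name.
  // Min and max ignore nulls and are null themselves if the column has no values.
  static Map<String, Object> collect(List<Integer> values) {
    Map<String, Object> stats = new LinkedHashMap<>();
    stats.put("minValue",
        values.stream().filter(Objects::nonNull).min(Integer::compare).orElse(null));
    stats.put("maxValue",
        values.stream().filter(Objects::nonNull).max(Integer::compare).orElse(null));
    stats.put("nullsCount", values.stream().filter(Objects::isNull).count());
    return stats;
  }
}
```

For the values `42, null, 18, 65, null` this yields a minimum of 18, a maximum of 65 and a null count of 2.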
 




[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871132#comment-16871132
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296685467
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java
 ##
 @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
 .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
 .addDeserializer(SchemaPath.class, new SchemaPath.De());
 mapper.registerModule(deModule);
+mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth"));
 
 Review comment:
   Would it make sense to add the word `histogram` as well, i.e. `numeric-equi-depth-histogram`?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871131#comment-16871131
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682838
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set

> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 13. Introduce segment-level metadata class:
> {noformat}
> class SegmentMetadata {
>   TableInfo tableInfo;
>   MetadataInfo metadataInfo;
>   SchemaPath column;
>   TupleMetadata schema;
>   String location;
>   Map columnsStatistics;
>   Map statistics;
>   List partitionValues;
>   List locations;
>   long lastModifiedTime;
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871141#comment-16871141
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691350
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE
 
 Review comment:
   1. Add Javadoc.
   2. Where can `NONE` be used?
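One way to address both points is to give every constant a Javadoc comment. In the sketch below, the text for `ALL` follows the wording the reviewer suggests in a later comment on the same file. The description of `NONE` ("no metadata level applies") is only an assumption about its intent, because the PR does not say where it is used. The enum is declared package-private here only so that the snippet compiles on its own.

```java
/**
 * Enum with possible types of metadata.
 */
enum MetadataType {

  /**
   * Metadata that can be applicable to any type.
   */
  ALL,

  /**
   * Table level metadata type.
   */
  TABLE,

  /**
   * Segment level metadata type. It corresponds to the metadata within a specific
   * directory for FS tables, or may correspond to a partition for Hive tables.
   */
  SEGMENT,

  /**
   * Drill partition level metadata type. It corresponds to parts of table data which
   * have the same values within a specific column, i.e. partitions discovered by Drill.
   */
  PARTITION,

  /**
   * File level metadata type.
   */
  FILE,

  /**
   * Row group level metadata type. Used for Parquet tables.
   */
  ROW_GROUP,

  /**
   * ASSUMPTION (not confirmed in the PR): no metadata level applies.
   */
  NONE
}
```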
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871136#comment-16871136
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682028
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
 return partitions;
   }
 
+  protected Map getSegmentsMetadata() {
+if (segments == null) {
+  segments = metadataProvider.getSegmentsMetadataMap();
+}
+return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
 if (nonInterestingColumnsMetadata == null) {
-  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta();
+  nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata();
 }
 return nonInterestingColumnsMetadata;
   }
 
   /**
* This class is responsible for filtering different metadata levels.
*/
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer<B extends GroupScanWithMetadataFilterer<B>> {
 protected final AbstractGroupScanWithMetadata source;
 
 protected boolean matchAllMetadata = false;
 
 protected TableMetadata tableMetadata;
 protected List partitions = Collections.emptyList();
+protected Map segments = Collections.emptyMap();
 
 Review comment:
   Using `Collections.emptyMap()` or `Collections.emptyList()` creates unmodifiable objects; is this expected?
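The concern is easy to demonstrate: `Collections.emptyMap()` and `Collections.emptyList()` return immutable instances, so any later in-place `put` or `add` on such a field fails at run time. Reassigning the field, as builder-style `withX(...)` setters typically do, is unaffected. The `Filterer` class below is a hypothetical stand-in, not Drill code, that contrasts the two defaults.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filterer that keeps per-segment metadata.
class Filterer {
  // Immutable default: fine as long as the field is only ever reassigned.
  Map<String, String> immutableSegments = Collections.emptyMap();
  // Mutable default: needed if callers add entries in place.
  Map<String, String> mutableSegments = new HashMap<>();

  /** Returns true if adding an entry to the immutable default throws. */
  boolean immutableDefaultRejectsPut() {
    try {
      immutableSegments.put("dir1", "segment metadata");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  /** Reassignment works regardless of the default's mutability. */
  Filterer withSegments(Map<String, String> segments) {
    this.immutableSegments = segments;
    return this;
  }
}
```

If the filterer only ever replaces these collections wholesale, the immutable defaults are safe and avoid needless allocation.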
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871125#comment-16871125
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615995
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Should be an instance method without parameters: `public String toJsonString()`.
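The suggestion moves serialization from a static helper that receives the holder as an argument to an instance method on the holder itself. The real class delegates to Jackson's `ObjectWriter`; to stay dependency-free, the sketch below uses a hypothetical `SimpleStatisticsHolder` with a hand-written JSON form purely to show the API shape. It is not the Drill implementation.

```java
// Hypothetical simplified holder: a statistics kind name plus a numeric value.
class SimpleStatisticsHolder {
  private final String statisticsKind;
  private final long statisticsValue;

  SimpleStatisticsHolder(String statisticsKind, long statisticsValue) {
    this.statisticsKind = statisticsKind;
    this.statisticsValue = statisticsValue;
  }

  // Before (static, takes the holder as a parameter):
  //   public static String serialize(SimpleStatisticsHolder holder)
  // After (instance method, no parameters), as suggested in the review:
  public String toJsonString() {
    // Hand-written JSON for illustration only; the kind name is assumed
    // not to contain characters that need escaping.
    return "{\"statisticsKind\":\"" + statisticsKind
        + "\",\"statisticsValue\":" + statisticsValue + "}";
  }
}
```

Call sites then read `holder.toJsonString()` instead of `StatisticsHolder.serialize(holder)`.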
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871140#comment-16871140
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296692394
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+super(builder);
+this.column = builder.column;
+this.partitionValues = builder.partitionValues;
+this.locations = builder.locations;
+this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
+   *
+   * @return last modified time of files
+   */
+  public long getLastModifiedTime() {
+return lastModifiedTime;
+  }
+
+  public List getPartitionValues() {
+return partitionValues;
+  }
+
+  public static PartitionMetadataBuilder builder() {
+return new PartitionMetadataBuilder();
+  }
+
+  public static class PartitionMetadataBuilder extends BaseMetadataBuilder {
+private SchemaPath column;
+private List partitionValues;
+private Set locations;
+private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME;
+
+public PartitionMetadataBuilder withLocations(Set locations) {
 
 Review comment:
   I think you can omit the `with` prefix, for example: `locations`.
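With the prefix dropped, a builder chain reads as `builder().locations(...).lastModifiedTime(...).build()`. The sketch below is a hypothetical, trimmed-down builder: it uses plain `String` locations instead of Hadoop `Path`, has no base builder, and assumes `-1` as the "not defined" modification time. It only illustrates the naming convention.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified partition metadata used only to show builder naming.
class SimplePartitionMetadata {
  private final Set<String> locations;
  private final long lastModifiedTime;

  private SimplePartitionMetadata(Builder builder) {
    // Defensive copy so later changes to the caller's set do not leak in.
    this.locations = Collections.unmodifiableSet(new HashSet<>(builder.locations));
    this.lastModifiedTime = builder.lastModifiedTime;
  }

  Set<String> getLocations() { return locations; }

  long getLastModifiedTime() { return lastModifiedTime; }

  static Builder builder() { return new Builder(); }

  static class Builder {
    private Set<String> locations = Collections.emptySet();
    private long lastModifiedTime = -1; // assumed "not defined" marker

    // Setter named after the property, without a "with" prefix.
    Builder locations(Set<String> locations) {
      this.locations = locations;
      return this;
    }

    Builder lastModifiedTime(long lastModifiedTime) {
      this.lastModifiedTime = lastModifiedTime;
      return this;
    }

    SimplePartitionMetadata build() {
      return new SimplePartitionMetadata(this);
    }
  }
}
```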
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871135#comment-16871135
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691160
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
 
 Review comment:
   Javadoc: "Metadata that can be applicable to any type"
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871126#comment-16871126
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296615585
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+  @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+  StatisticsKind statisticsKind) {
+this.statisticsValue = statisticsValue;
+this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
 
 Review comment:
   Rename: `deserialize` -> `of`, `serialized` -> `jsonString`
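Under the proposed naming, parsing becomes a static factory, `of(jsonString)`. The hypothetical `KindValue` class below parses a deliberately tiny JSON form with a regular expression as a dependency-free stand-in for Jackson's `ObjectReader`, and it throws `IllegalArgumentException` rather than the `IOException` of the code under review. A real implementation would keep delegating to Jackson; only the method and parameter names are the point here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder with a minimal JSON form, e.g.
// {"statisticsKind":"rowCount","statisticsValue":42}
class KindValue {
  private static final Pattern JSON = Pattern.compile(
      "\\{\"statisticsKind\":\"([^\"]*)\",\"statisticsValue\":(-?\\d+)\\}");

  final String statisticsKind;
  final long statisticsValue;

  private KindValue(String statisticsKind, long statisticsValue) {
    this.statisticsKind = statisticsKind;
    this.statisticsValue = statisticsValue;
  }

  // Static factory named "of" with a "jsonString" parameter, as proposed
  // in the review, instead of deserialize(String serialized).
  static KindValue of(String jsonString) {
    Matcher m = JSON.matcher(jsonString.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("Unsupported JSON: " + jsonString);
    }
    return new KindValue(m.group(1), Long.parseLong(m.group(2)));
  }
}
```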
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871130#comment-16871130
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682165
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -572,34 +626,39 @@ public GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
  */
 public abstract AbstractGroupScanWithMetadata build();
 
-public GroupScanWithMetadataFilterer withTable(TableMetadata tableMetadata) {
+public B withTable(TableMetadata tableMetadata) {
   this.tableMetadata = tableMetadata;
-  return this;
+  return self();
 
 Review comment:
   Why is a `self()` method better than returning `this`?
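A likely answer, assuming the base filterer is declared with a self-referential type parameter such as `B extends GroupScanWithMetadataFilterer<B>`: a `withX` method must then return `B`, and a plain `return this;` does not compile there because `this` has the base type. It would need an unchecked `(B) this` cast. An abstract `self()` that each concrete subclass implements as `return this;` supplies a correctly typed reference with no cast, so chained calls keep the subclass type. The classes below are hypothetical and only illustrate the idiom.

```java
// Hypothetical base filterer with a self-referential type parameter.
abstract class BaseFilterer<B extends BaseFilterer<B>> {
  protected String table;

  // Each concrete subclass returns itself, typed as B.
  protected abstract B self();

  public B withTable(String table) {
    this.table = table;
    // "return this;" would not compile here: "this" is BaseFilterer<B>, not B.
    return self();
  }
}

// Hypothetical concrete filterer adding its own fluent method.
class ParquetFilterer extends BaseFilterer<ParquetFilterer> {
  protected String rowGroup;

  @Override
  protected ParquetFilterer self() {
    return this;
  }

  public ParquetFilterer withRowGroup(String rowGroup) {
    this.rowGroup = rowGroup;
    return this;
  }
}
```

Without the type parameter, `withTable(...)` would return the base type, and a following `withRowGroup(...)` call in the same chain would not compile.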
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871153#comment-16871153
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694427
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
 
 Review comment:
   What is the difference between statistics and column statistics? Maybe 
statistics should be named more clearly, for example, generalStatistics or 
metadataStatistics?
   I think for the Metastore we used the `metadataStatistics` naming...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871145#comment-16871145
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296696592
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param <T> Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder<T> {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
+  private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class);
+
+  private final T statisticsValue;
+  private final BaseStatisticsKind statisticsKind;
+
+  @JsonCreator
+  public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue,
+                          @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = statisticsKind;
+  }
+
+  public StatisticsHolder(T statisticsValue,
+                          StatisticsKind statisticsKind) {
+    this.statisticsValue = statisticsValue;
+    this.statisticsKind = (BaseStatisticsKind) statisticsKind;
+  }
+
+  @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS,
+                include = JsonTypeInfo.As.WRAPPER_OBJECT)
+  public T getStatisticsValue() {
+    return statisticsValue;
+  }
+
+  public StatisticsKind getStatisticsKind() {
+    return statisticsKind;
+  }
+
+  public static StatisticsHolder deserialize(String serialized) throws IOException {
+    return OBJECT_READER.readValue(serialized);
+  }
+
+  public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException {
 
 Review comment:
   Please apply the same for other classes.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871148#comment-16871148
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296691639
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java
 ##
 @@ -0,0 +1,119 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.hadoop.fs.Path;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * Represents a metadata for the table part, which corresponds to the specific partition key.
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+    super(builder);
+    this.column = builder.column;
+    this.partitionValues = builder.partitionValues;
+    this.locations = builder.locations;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+    return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+    return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp
 
 Review comment:
   ```suggestion
  * Allows to check the time, when any files were modified. It is in Unix Timestamp
   ```
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871144#comment-16871144
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296689298
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java
 ##
 @@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {
 
 Review comment:
   Again, no objections, but in my opinion this is overhead.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871143#comment-16871143
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693933
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map<SchemaPath, ColumnStatistics> getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder<V> statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map<SchemaPath, ColumnStatistics> columnsStatistics;
+    protected Collection<StatisticsHolder> statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
+      this.tableInfo = tableInfo;
+      return self();
+    }
+
+    public T withMetadataInfo(MetadataInfo metadataInfo) {
+      this.metadataInfo = metadataInfo;
+      return self();
+    }
+
+    public T withSchema(TupleMetadata schema) {
+      this.schema = schema;
+      return self();
+    }
+
+    public T withColumnsStatistics(Map 

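The `BaseMetadata` constructor in the hunk above collapses the builder's statistics into a map keyed by kind name and resolves duplicate keys with `(a, b) -> a.getStatisticsKind().isExact() ? a : b`. Below is a minimal, stdlib-only sketch of that `Collectors.toMap` merge in isolation, plus a typed lookup in the spirit of `getStatistic`. `Kind`, `Holder`, `index`, and the `rowCount` values are hypothetical stand-ins, not Drill's `StatisticsKind`/`StatisticsHolder` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ExactStatisticsMergeSketch {

  // Hypothetical stand-in for a statistics kind: a name plus an "exact" flag.
  static final class Kind {
    final String name;
    final boolean exact;

    Kind(String name, boolean exact) {
      this.name = name;
      this.exact = exact;
    }
  }

  // Hypothetical stand-in for a statistics holder: a value paired with its kind.
  static final class Holder {
    final Object value;
    final Kind kind;

    Holder(Object value, Kind kind) {
      this.value = value;
      this.kind = kind;
    }
  }

  // Index holders by kind name. On a key collision, keep the earlier holder
  // if its kind is exact, otherwise take the later one.
  static Map<String, Holder> index(List<Holder> holders) {
    return holders.stream()
        .collect(Collectors.toMap(
            h -> h.kind.name,
            Function.identity(),
            (a, b) -> a.kind.exact ? a : b));
  }

  // Typed lookup: returns null when no holder exists for the kind name.
  @SuppressWarnings("unchecked")
  static <V> V getStatistic(Map<String, Holder> stats, String kindName) {
    Holder holder = stats.get(kindName);
    return holder != null ? (V) holder.value : null;
  }

  public static void main(String[] args) {
    List<Holder> holders = Arrays.asList(
        new Holder(100L, new Kind("rowCount", false)),  // estimate
        new Holder(42L, new Kind("rowCount", true)));   // exact count
    Map<String, Holder> stats = index(holders);
    Long rows = getStatistic(stats, "rowCount");
    System.out.println(rows);  // 42: the exact value replaces the estimate
  }
}
```

With this merge function an exact holder wins whenever exactly one of the two colliding holders is exact, regardless of order; if both are exact the earlier one is kept, and if neither is, the later one is.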
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871128#comment-16871128
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296680787
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##
 @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesCont
 if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
 
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and produces filtering at:
+   * 
+   * table level:
+   * - if filter matches all the the data or prunes all the data, sets corresponding value to
 
 Review comment:
   I believe HTML formatting supports nested lists, which would be better than 
a custom paragraph with a dash.
 



> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871146#comment-16871146
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694074
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##

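The `BaseMetadata` constructor quoted above collapses the builder's statistics collection into a map keyed by statistics-kind name, and resolves duplicate names with the `toMap` merge function `(a, b) -> a.getStatisticsKind().isExact() ? a : b`. A minimal sketch of that rule, using hypothetical stand-in classes `Kind` and `Holder` rather than Drill's real `StatisticsKind` and `StatisticsHolder`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a statistics kind: a name plus an "exact" flag.
final class Kind {
  final String name;
  final boolean exact;

  Kind(String name, boolean exact) {
    this.name = name;
    this.exact = exact;
  }
}

// Hypothetical stand-in for a holder pairing a value with its kind.
final class Holder {
  final Object value;
  final Kind kind;

  Holder(Object value, Kind kind) {
    this.value = value;
    this.kind = kind;
  }
}

final class StatisticsIndex {
  // Indexes holders by kind name. On a name collision the merge function
  // keeps the earlier holder if it is exact, otherwise the later one.
  static Map<String, Holder> index(List<Holder> holders) {
    return holders.stream()
        .collect(Collectors.toMap(
            h -> h.kind.name,
            Function.identity(),
            (a, b) -> a.kind.exact ? a : b));
  }
}
```

The merge is order-sensitive: with two exact holders of the same name the first one is kept, and with two estimated holders the last one is kept.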
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871127#comment-16871127
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296621362
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java
 ##
 @@ -228,5 +228,10 @@ public HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
 protected AbstractParquetGroupScan getNewScan() {
   return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) source);
 }
+
+@Override
+protected HiveDrillNativeParquetScanFilterer self() {
 
 Review comment:
   Can you please explain where this method came from?
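For context on the question: in the `BaseMetadataBuilder` quoted above, every `with...` setter returns `self()`, typed as the builder's own type parameter `T`. That is the common self-bounded generics idiom for builder hierarchies, sketched below with hypothetical `BaseBuilder` and `TableBuilder` classes. Whether the `HiveDrillNativeParquetScanFilterer.self()` override exists for the same reason is not visible in this excerpt.

```java
// Hypothetical classes illustrating the self-typed builder idiom.
abstract class BaseBuilder<T extends BaseBuilder<T>> {
  protected String tableName;

  // Each concrete builder returns itself, typed as its own class.
  protected abstract T self();

  // The inherited setter still returns the concrete builder type T,
  // so subclass-specific setters can be chained after it.
  public T withTableName(String tableName) {
    this.tableName = tableName;
    return self();
  }
}

final class TableBuilder extends BaseBuilder<TableBuilder> {
  private String location;

  @Override
  protected TableBuilder self() {
    return this;
  }

  public TableBuilder withLocation(String location) {
    this.location = location;
    return self();
  }

  public String build() {
    return tableName + "@" + location;
  }
}
```

The alternative is an unchecked `(T) this` cast in the base class; `self()` lets each subclass supply a correctly typed `this` instead.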
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor Metadata interfaces and classes to contain all needed information 
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Arina Ielchiieva
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: 
> Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move  
> org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
>  to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by 
> MetadataInfo class
> path - path to file 
> fields to modify
> 
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into 
> sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
> getNonInterestingColumnsMetadata
> 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871149#comment-16871149
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695307
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map<SchemaPath, ColumnStatistics> columnStatistics, List<StatisticsHolder> tableStatistics) {
+    Map<String, StatisticsHolder> mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides the statistics value when the new statistic is exact or the existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+            || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map<SchemaPath, ColumnStatistics> newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder<BaseTableMetadataBuilder> {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
+      this.location = location;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+      this.lastModifiedTime = lastModifiedTime;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder

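The filter in `cloneWithStats` above replaces an existing table-level statistic when the incoming one is exact, or when the existing one is only estimated. Note that `this.statistics.get(...)` returns `null` for a statistic that is not present yet, so for an estimated incoming statistic of a new kind the `.getStatisticsKind()` call would throw a `NullPointerException` (an exact one short-circuits the `||`). A minimal sketch of the same rule with a hypothetical `Stat` class, which deliberately treats a missing statistic as replaceable instead:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified statistic: name, value, and whether it is exact.
final class Stat {
  final String name;
  final long value;
  final boolean exact;

  Stat(String name, long value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}

final class StatMerge {
  // Returns a new map; `existing` is not modified.
  // An incoming statistic wins if it is exact, if the existing one is
  // estimated, or (unlike the quoted diff) if no existing one is present.
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat s : incoming) {
      Stat current = existing.get(s.name);
      if (s.exact || current == null || !current.exact) {
        merged.put(s.name, s);
      }
    }
    return merged;
  }
}
```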
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871155#comment-16871155
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695012
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java
 ##
 @@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map<SchemaPath, ColumnStatistics> columnsStatistics;
+  protected final Map<String, StatisticsHolder> statistics;
+
+  protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map<SchemaPath, ColumnStatistics> getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map<SchemaPath, ColumnStatistics> columnsStatistics;
+    protected Collection<StatisticsHolder> statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
 
 Review comment:
   Do you think `with` can be removed?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871154#comment-16871154
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695918
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * General table information.
+ */
+public class TableInfo {
+  public static final String UNKNOWN = "UNKNOWN";
+  public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN);
+
+  private final String storagePlugin;
+  private final String workspace;
+  private final String name;
+  private final String type;
+  private final String owner;
+
+  public TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
 
 Review comment:
   Make constructor private and add builder.
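A minimal sketch of this suggestion, using the five fields from the quoted `TableInfo`. The builder method names (without a `with` prefix) and the choice to default every field to `UNKNOWN`, mirroring the `UNKNOWN_TABLE_INFO` constant, are assumptions for illustration, not the PR's final API.

```java
// Sketch only: builder method names and UNKNOWN defaults are assumptions.
final class TableInfo {
  static final String UNKNOWN = "UNKNOWN";

  private final String storagePlugin;
  private final String workspace;
  private final String name;
  private final String type;
  private final String owner;

  // Private: instances can only be created through the builder.
  private TableInfo(Builder builder) {
    this.storagePlugin = builder.storagePlugin;
    this.workspace = builder.workspace;
    this.name = builder.name;
    this.type = builder.type;
    this.owner = builder.owner;
  }

  static Builder builder() {
    return new Builder();
  }

  String getStoragePlugin() { return storagePlugin; }
  String getWorkspace() { return workspace; }
  String getName() { return name; }
  String getType() { return type; }
  String getOwner() { return owner; }

  static final class Builder {
    // Fields that are never set fall back to UNKNOWN.
    private String storagePlugin = UNKNOWN;
    private String workspace = UNKNOWN;
    private String name = UNKNOWN;
    private String type = UNKNOWN;
    private String owner = UNKNOWN;

    Builder storagePlugin(String storagePlugin) { this.storagePlugin = storagePlugin; return this; }
    Builder workspace(String workspace) { this.workspace = workspace; return this; }
    Builder name(String name) { this.name = name; return this; }
    Builder type(String type) { this.type = type; return this; }
    Builder owner(String owner) { this.owner = owner; return this; }

    TableInfo build() {
      return new TableInfo(this);
    }
  }
}
```

With a builder, callers name each argument explicitly, which avoids silently swapping two of the five positional `String` parameters.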
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871139#comment-16871139
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296690696
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java
 ##
 @@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Class which identifies specific metadata.
+ */
+public class MetadataInfo {
+
+  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
+  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
+  public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";
 
 Review comment:
   Where will this constant be used?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871129#comment-16871129
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296622444
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java
 ##
 @@ -218,89 +201,85 @@ public ColumnStatistics visitFunctionHolderExpression(FunctionHolderExpression h
   ValueHolder minFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName());
   ValueHolder maxFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName());
 
-  MinMaxStatistics statistics;
   switch (destType) {
 case INT:
-  statistics = new MinMaxStatistics<>(((IntHolder) minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((IntHolder) minFuncHolder).value,
+  ((IntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case BIGINT:
-  statistics = new MinMaxStatistics<>(((BigIntHolder) minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((BigIntHolder) minFuncHolder).value,
+  ((BigIntHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT4:
-  statistics = new MinMaxStatistics<>(((Float4Holder) minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float4Holder) minFuncHolder).value,
+  ((Float4Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case FLOAT8:
-  statistics = new MinMaxStatistics<>(((Float8Holder) minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((Float8Holder) minFuncHolder).value,
+  ((Float8Holder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 case TIMESTAMP:
-  statistics = new MinMaxStatistics<>(((TimeStampHolder) minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo);
-  break;
+  return StatisticsProvider.getColumnStatistics(
+  ((TimeStampHolder) minFuncHolder).value,
+  ((TimeStampHolder) maxFuncHolder).value,
+  ColumnStatisticsKind.NULLS_COUNT.getFrom(input),
+  destType);
 default:
   return null;
   }
-  statistics.setNullsCount((long) input.getStatistic(ColumnStatisticsKind.NULLS_COUNT));
-  return statistics;
 } catch (Exception e) {
-  throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName() );
+  throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName());
 }
   }
 
-  public static class MinMaxStatistics implements ColumnStatistics {
-private final V minVal;
-private final V maxVal;
-private final Comparator valueComparator;
-private long nullsCount;
-
-public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) {
-  this.minVal = minVal;
-  this.maxVal = maxVal;
-  this.valueComparator = valueComparator;
-}
-
-@Override
-public Object getStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-  return minVal;
-case ExactStatisticsConstants.MAX_VALUE:
-  return maxVal;
-case ExactStatisticsConstants.NULLS_COUNT:
-  return nullsCount;
-default:
-  return null;
-  }
-}
-
-@Override
-public boolean containsStatistic(StatisticsKind statisticsKind) {
-  switch (statisticsKind.getName()) {
-case ExactStatisticsConstants.MIN_VALUE:
-case ExactStatisticsConstants.MAX_VALUE:
-case ExactStatisticsConstants.NULLS_COUNT:
-  return true;
-default:
-  return false;
-  }
-}
-
-@Override
-public boolean containsExactStatistics(StatisticsKind statisticsKind) {
-  return true;
-}
-
-@Override
-public Comparator getValueComparator() {
-  return 

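The hunk above removes the hand-written `MinMaxStatistics` class, whose `getStatistic` and `containsStatistic` switched over kind names and whose nulls count was set after construction via `setNullsCount`. The new code instead passes `ColumnStatisticsKind.NULLS_COUNT.getFrom(input)` straight into `StatisticsProvider.getColumnStatistics(...)`, whose body is not shown in this excerpt. A minimal sketch of the same idea, an immutable holder with map-based lookups; the `MinMaxStats` class and its string names are hypothetical, not Drill's `ExactStatisticsConstants` values:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable min/max statistics; the name constants are assumed.
final class MinMaxStats<V> {
  static final String MIN_VALUE = "minValue";
  static final String MAX_VALUE = "maxValue";
  static final String NULLS_COUNT = "nullsCount";

  private final Map<String, Object> values;
  private final Comparator<V> comparator;

  // All values, including the nulls count, are supplied up front,
  // so there is no setter and no partially initialized state.
  MinMaxStats(V min, V max, long nullsCount, Comparator<V> comparator) {
    Map<String, Object> map = new HashMap<>();
    map.put(MIN_VALUE, min);
    map.put(MAX_VALUE, max);
    map.put(NULLS_COUNT, nullsCount);
    this.values = Collections.unmodifiableMap(map);
    this.comparator = comparator;
  }

  // A map lookup replaces the switch over kind names; null if absent.
  Object getStatistic(String name) {
    return values.get(name);
  }

  boolean containsStatistic(String name) {
    return values.containsKey(name);
  }

  Comparator<V> getValueComparator() {
    return comparator;
  }
}
```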
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871124#comment-16871124
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296614730
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param  Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();
 
 Review comment:
   private?
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871151#comment-16871151
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695117
 
 

 ##
 File path: 
metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java
 ##
 @@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+    Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides statistics value for the case when new statistics is exact or existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+            || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
 
 Review comment:
   Same here.
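The merge rule in `cloneWithStats` quoted above (a new table statistic overrides the existing one when the new value is exact or the existing one was only estimated) can be sketched with plain java.util types. `StatsMergeSketch`, `Stat` and `merge` are hypothetical simplifications of `StatisticsHolder`/`StatisticsKind`, not Drill's API. Unlike the quoted filter, the sketch also accepts a new estimated value when no value of that kind exists yet; in the quoted code, `this.statistics.get(...)` would return null for such a kind and the following `.getStatisticsKind()` call would throw a NullPointerException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the cloneWithStats merge rule; Stat stands in
// for StatisticsHolder plus its StatisticsKind and is not Drill's API.
class StatsMergeSketch {

  static final class Stat {
    final String kind;    // statistics kind name, used as the map key
    final boolean exact;  // true if the value is exact, false if it was estimated
    final long value;

    Stat(String kind, boolean exact, long value) {
      this.kind = kind;
      this.exact = exact;
      this.value = value;
    }
  }

  // Starts from a copy of the existing statistics and lets an incoming value win when it
  // is exact, when the existing value was only estimated, or when no value exists yet.
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat stat : incoming) {
      Stat current = existing.get(stat.kind);
      if (stat.exact || current == null || !current.exact) {
        merged.put(stat.kind, stat);
      }
    }
    return merged;
  }
}
```

The input map is copied rather than modified, mirroring the `new HashMap<>(this.statistics)` copy in the reviewed method.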
 

[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871138#comment-16871138
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296688086
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java
 ##
 @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
   partitionsForValue.asMap().forEach((partitionKey, value) -> {
 Map columnsStatistics = new HashMap<>();
 
-Map statistics = new HashMap<>();
+List statistics = new ArrayList<>();
 partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MIN_VALUE));
+statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MAX_VALUE));
 
-statistics.put(ColumnStatisticsKind.NULLS_COUNT, Statistic.NO_COLUMN_STATS);
-statistics.put(TableStatisticsKind.ROW_COUNT, Statistic.NO_COLUMN_STATS);
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, ColumnStatisticsKind.NULLS_COUNT));
+statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, TableStatisticsKind.ROW_COUNT));
 columnsStatistics.put(partitionColumn,
-new ColumnStatisticsImpl<>(statistics,
-ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())));
-partitions.add(new PartitionMetadata(partitionColumn, getTableMetadata().getSchema(),
-columnsStatistics, statistics, (Set) value, tableName, -1));
+new ColumnStatistics<>(statistics,
+getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+MetadataInfo metadataInfo = new MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+TableMetadata tableMetadata = getTableMetadata();
+PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+.withTableInfo(tableMetadata.getTableInfo())
+.withMetadataInfo(metadataInfo)
+.withColumn(partitionColumn)
+.withSchema(tableMetadata.getSchema())
+.withColumnsStatistics(columnsStatistics)
+.withStatistics(statistics)
+.withPartitionValues(Collections.emptyList())
+.withLocations((Set) value)
 
 Review comment:
   Why cast is needed here?
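On the cast question: the hunk does not show how `partitionsForValue` is declared. Assuming it is a Guava `Multimap`, `asMap()` types every value as `Collection<V>`, even for a `SetMultimap` whose values really are sets, so passing `value` where a `Set` is presumably expected needs a cast; Guava's `Multimaps.asMap(SetMultimap)` returns a `Map<K, Set<V>>` view that would avoid it. The sketch below shows a java.util-only alternative that groups file locations by partition value straight into `Map<Object, Set<String>>`. `PartitionGroupingSketch`, `FileEntry`, `locationsByValue` and its `NULL_VALUE` are hypothetical names for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: group file locations by partition value with java.util only,
// so each group is statically a Set<String> and no cast is needed downstream.
class PartitionGroupingSketch {

  // Stand-in for a null partition value; Collectors.groupingBy rejects null keys.
  static final Object NULL_VALUE = new Object();

  static final class FileEntry {
    final String location;
    final Object partitionValue; // may be null

    FileEntry(String location, Object partitionValue) {
      this.location = location;
      this.partitionValue = partitionValue;
    }
  }

  static Map<Object, Set<String>> locationsByValue(List<FileEntry> files) {
    return files.stream().collect(Collectors.groupingBy(
        (FileEntry f) -> f.partitionValue == null ? NULL_VALUE : f.partitionValue,
        Collectors.mapping((FileEntry f) -> f.location, Collectors.toSet())));
  }
}
```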
 


[jira] [Commented] (DRILL-7289) fs.s3a.path.style.access does not seem to work

2019-06-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871069#comment-16871069
 ] 

Steve Loughran commented on DRILL-7289:
---

It has worked in the S3A connector since Hadoop 2.8 and HADOOP-12963; it is broadly tested, 
as it's the default for enterprise S3 stores which don't work with DNS.

If you have problems, then either the config isn't complete or you are using an 
out-of-date version.

Recommendations:

* download Hadoop 3.2 and install it
* try the config in a core-site.xml there, then run a hadoop fs -ls command
* or, even better, use the cloudstore diagnostics: 
https://github.com/steveloughran/cloudstore



> fs.s3a.path.style.access does not seem to work
> --
>
> Key: DRILL-7289
> URL: https://issues.apache.org/jira/browse/DRILL-7289
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.16.0
> Environment: Running on Kubernetes
>Reporter: Gururajesh Elango
>Priority: Major
>
> fs.s3a.path.style.access does not seem to work. Please see 
> [https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html#access-bucket-intro]
>  for details on path-style access.
> How to reproduce:
> 1. Create a bucket in MinIO (S3 simulator).
> 2. Define a storage plugin like the one below:
> "storage": {
>   s3: {
>     type: "file",
>     connection: "s3a://new-bucket",
>     "config": {
>       "fs.s3a.access.key": "minio-user",
>       "fs.s3a.secret.key": "minio-password",
>       "fs.s3a.endpoint": "http://:9000",
>       "fs.s3a.connection.ssl.enabled": "false",
>       "fs.s3a.path.style.access": "true",
>       "fs.s3a.connection.timeout": "5000",
>       "fs.s3a.connection.maximum": "100"
>     },
>     "workspaces": {
>       "tmp": {
>         "location": "/tmp/drill",
>         "writable": "true",
>         "defaultInputFormat": "",
>         "allowAccessOutsideWorkspace": "false"
>       },
>       "root": {
>         "location": "/",
>         "writable": "false",
>         "defaultInputFormat": "",
>         "allowAccessOutsideWorkspace": "false"
>       }
>     },
>     "formats": {
>       "parquet": {
>         "type": "parquet"
>       },
>       "json": {
>         "type": "json",
>         "extensions": [
>           "json"
>         ]
>       },
>       "avro": {
>         "type": "avro"
>       }
>     },
>     "enabled": "true"
>   }
> }
> 3. The logs show an error stating that
> new-bucket.:9000 is not reachable, whereas Drill is expected to try to 
> reach
> http://:9000/impact-enable-bucket
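The two S3 addressing styles involved here can be illustrated by building both URLs for the same bucket and key. The report expects the path-style form (plain endpoint host, bucket as the first path segment) but sees the virtual-hosted form (bucket name prefixed to the endpoint host). The class name, the host `minio.local` and the key used below are placeholders, not values from the report.

```java
import java.net.URI;

// Illustration of the two S3 addressing styles; host, port, bucket and key are placeholders.
class S3AddressingSketch {

  // Virtual-hosted style: the bucket is prepended to the endpoint host name,
  // so "<bucket>.<host>" has to resolve via DNS.
  static URI virtualHosted(String scheme, String host, int port, String bucket, String key) {
    return URI.create(scheme + "://" + bucket + "." + host + ":" + port + "/" + key);
  }

  // Path style: the host stays the plain endpoint and the bucket becomes the first
  // path segment; this is the form fs.s3a.path.style.access=true asks S3A to use.
  static URI pathStyle(String scheme, String host, int port, String bucket, String key) {
    return URI.create(scheme + "://" + host + ":" + port + "/" + bucket + "/" + key);
  }
}
```

With a local S3-compatible server that has no per-bucket DNS entries, only the path-style host resolves, which is why path-style access is the usual setting for such stores.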



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7302) Bump Apache Avro from 1.8.2 to 1.9.0

2019-06-24 Thread Fokko Driesprong (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871013#comment-16871013
 ] 

Fokko Driesprong commented on DRILL-7302:
-

I'm not allowed to assign tickets. I think it is because you have to add me to the 
project first.

> Bump Apache Avro from 1.8.2 to 1.9.0
> 
>
> Key: DRILL-7302
> URL: https://issues.apache.org/jira/browse/DRILL-7302
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Fokko Driesprong
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>






[jira] [Created] (DRILL-7307) casthigh for decimal type can lead to the issues with VarDecimalHolder

2019-06-24 Thread Dmytriy Grinchenko (JIRA)
Dmytriy Grinchenko created DRILL-7307:
-

 Summary: casthigh for decimal type can lead to the issues with 
VarDecimalHolder
 Key: DRILL-7307
 URL: https://issues.apache.org/jira/browse/DRILL-7307
 Project: Apache Drill
  Issue Type: Bug
Reporter: Dmytriy Grinchenko
Assignee: Dmytriy Grinchenko
 Fix For: 1.17.0


The decimal cast may lead to issues with the VarDecimal transformation and with 
uml functions that use casthigh under the hood.

Example: 
{code}
apache drill> select casthigh(cast(1025.0 as decimal(28,8)));
Error: SYSTEM ERROR: CompileException: Line 25, Column 60: "isSet" is neither a 
method, a field, nor a member class of 
"org.apache.drill.exec.expr.holders.VarDecimalHolder"

Fragment 0:0

Please, refer to logs for more information.
{code}





[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870955#comment-16870955
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296612824
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+  Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
   Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
   Multimap rowGroupsMap = LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata levels filter condition, since metadata count for " +
 "{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   Thanks, done.
 


[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870949#comment-16870949
 ] 

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata 
interfaces and classes to contain all needed information for the File based 
Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296327891
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
 ##
 @@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
 
   logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+  Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+  .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
   Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-  .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+  .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
   Multimap rowGroupsMap = LinkedListMultimap.create();
-  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+  getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-  builder.withRowGroups(rowGroupsMap)
+  filteredMetadata.withRowGroups(rowGroupsMap)
   .withTable(getTableMetadata())
+  .withSegments(segmentsMap)
   .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
   .withNonInterestingColumns(getNonInterestingColumnsMetadata())
   .withFiles(filesMap)
   .withMatching(false);
 }
 
-if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
   logger.warn("applyFilter {} wasn't able to do pruning for  all metadata levels filter condition, since metadata count for " +
 "{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
 "But underlying metadata was pruned without filter expression according to the metadata with above level.",
-  ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+  ExpressionStringBuilder.toString(filterExpr), filteredMetadata.getOverflowLevel());
 }
 
 logger.debug("applyFilter {} reduce row groups # from {} to {}",
-ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), builder.getRowGroups().size());
+ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());
 
 Review comment:
   add ```isDebugEnabled()``` check before call 
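The reason for the guard: SLF4J's `{}` placeholders defer string formatting, but Java still evaluates `ExpressionStringBuilder.toString(filterExpr)` before `logger.debug(...)` is even called. The sketch below uses `java.util.logging`'s `Logger.isLoggable(Level.FINE)` as a stand-in for SLF4J's `isDebugEnabled()` so that it runs with the JDK alone; `DebugGuardSketch` and `render` are hypothetical names, with `render` standing in for the expensive expression rendering.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of guarding an expensive log argument; java.util.logging's isLoggable(Level.FINE)
// stands in for SLF4J's logger.isDebugEnabled().
class DebugGuardSketch {
  private static final Logger logger = Logger.getLogger(DebugGuardSketch.class.getName());

  // Counts how often the expensive rendering actually runs.
  static int renderCalls = 0;

  // Hypothetical stand-in for ExpressionStringBuilder.toString(filterExpr).
  static String render(Object expr) {
    renderCalls++;
    return String.valueOf(expr);
  }

  static void logReduction(Object expr, int before, int after) {
    // Without this check, render(expr) would run on every call, because Java evaluates
    // method arguments before the logger decides whether to drop the message.
    if (logger.isLoggable(Level.FINE)) {
      logger.log(Level.FINE, "applyFilter {0} reduce row groups # from {1} to {2}",
          new Object[] {render(expr), before, after});
    }
  }
}
```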
 


[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF

2019-06-24 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870914#comment-16870914
 ] 

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex 
("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-504905705
 
 
   @paul-rogers I am still unclear whether you have tried the following query for 
log plugin data: `select * from table(t(schema=>'inline=(col1 varchar)'))` 
where `t` is a table with log plugin data. Did you try it? I suppose it should 
work.
 


> Convert the regex ("log") plugin to use EVF
> ---
>
> Key: DRILL-7293
> URL: https://issues.apache.org/jira/browse/DRILL-7293
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> The "log" plugin (which uses a regex to define the row format) is the subject 
> of Chapter 12 of the Learning Apache Drill book (though the version in the 
> book is simpler than the one in the master branch).
> The recently-completed "Enhanced Vector Framework" (EVF, AKA the "row set 
> framework") gives Drill control over the size of batches created by readers, 
> and allows readers to use the recently-added provided schema mechanism.
> We wish to use the log reader as an example for how to convert a Drill format 
> plugin to use the EVF so that other developers can convert their own plugins.
> This PR provides the first set of log plugin changes to enable us to publish 
> a tutorial on the EVF.





[jira] [Updated] (DRILL-6958) CTAS csv with option

2019-06-24 Thread benj (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj updated DRILL-6958:

Description: 
Currently, it may be difficult to produce well-formed CSV with CTAS (see 
comment below).

It appears necessary to have some additional configurable options for writing CSV 
files with CTAS:
 * possibility to change/define the separator,
 * possibility to write the header or not,
 * possibility to force writing only one file instead of many parts,
 * possibility to force quoting,
 * possibility to use/change the escape char,
 * ...
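To make the requested options concrete, here is a minimal sketch of a CSV line writer with a configurable separator, an optional header, forced quoting, and configurable quote and escape characters. It is hypothetical (`CsvOptionsSketch` and its parameters are illustrative names) and only shows what each option controls; it is not an existing Drill CTAS option or API, and it does not cover writing a single output file.

```java
import java.util.List;

// Hypothetical sketch of the CSV writing options requested above; the class and its
// parameters are illustrative only and do not correspond to existing Drill options.
class CsvOptionsSketch {
  final char separator;      // field separator, e.g. ',' or ';'
  final boolean writeHeader; // whether to emit the header line
  final boolean forceQuotes; // quote every field, not only those that need it
  final char quote;          // quote character
  final char escape;         // escapes quote/escape chars inside quoted fields

  CsvOptionsSketch(char separator, boolean writeHeader, boolean forceQuotes, char quote, char escape) {
    this.separator = separator;
    this.writeHeader = writeHeader;
    this.forceQuotes = forceQuotes;
    this.quote = quote;
    this.escape = escape;
  }

  // Quotes a field when forced or when it contains a character that would break parsing.
  String field(String value) {
    boolean needsQuotes = forceQuotes
        || value.indexOf(separator) >= 0
        || value.indexOf(quote) >= 0
        || value.indexOf(escape) >= 0
        || value.indexOf('\n') >= 0
        || value.indexOf('\r') >= 0;
    if (!needsQuotes) {
      return value;
    }
    StringBuilder sb = new StringBuilder().append(quote);
    for (char c : value.toCharArray()) {
      if (c == quote || c == escape) {
        sb.append(escape); // when escape == quote this doubles the quote character
      }
      sb.append(c);
    }
    return sb.append(quote).toString();
  }

  String write(List<String> header, List<List<String>> rows) {
    StringBuilder out = new StringBuilder();
    if (writeHeader) {
      appendLine(out, header);
    }
    for (List<String> row : rows) {
      appendLine(out, row);
    }
    return out.toString();
  }

  private void appendLine(StringBuilder out, List<String> values) {
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        out.append(separator);
      }
      out.append(field(values.get(i)));
    }
    out.append('\n');
  }
}
```

With `escape == quote`, embedded quotes are doubled as in RFC 4180; with a distinct escape character such as a backslash, embedded quote and escape characters are prefixed with it instead.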

  was:
Add some options to write CSV files with CTAS:
 * possibility to change/define the separator,
 * possibility to write the header or not,
 * possibility to force writing only one file instead of many parts,
 * possibility to force quoting


> CTAS csv with option
> 
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text  CSV
>Affects Versions: 1.15.0, 1.16.0
>Reporter: benj
>Priority: Major
>
> Currently, it may be difficult to produce well-formed CSV with CTAS (see 
> comment below).
> It appears necessary to have some additional configurable options for writing 
> CSV files with CTAS:
>  * possibility to change/define the separator,
>  * possibility to write the header or not,
>  * possibility to force writing only one file instead of many parts,
>  * possibility to force quoting,
>  * possibility to use/change the escape char,
>  * ...





[jira] [Updated] (DRILL-6958) CTAS csv with option

2019-06-24 Thread benj (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj updated DRILL-6958:

Issue Type: Bug  (was: Improvement)

> CTAS csv with option
> 
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text  CSV
>Affects Versions: 1.15.0, 1.16.0
>Reporter: benj
>Priority: Major
>
> Add some options to write CSV files with CTAS:
>  * possibility to change/define the separator,
>  * possibility to write the header or not,
>  * possibility to force writing only one file instead of many parts,
>  * possibility to force quoting





[jira] [Updated] (DRILL-6958) CTAS csv with option

2019-06-24 Thread benj (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benj updated DRILL-6958:

Affects Version/s: 1.16.0

> CTAS csv with option
> 
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text  CSV
>Affects Versions: 1.15.0, 1.16.0
>Reporter: benj
>Priority: Major
>
> Add some options to write CSV files with CTAS:
>  * possibility to change/define the separator,
>  * possibility to write the header or not,
>  * possibility to force writing only one file instead of many parts,
>  * possibility to force quoting


