[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872033#comment-16872033 ]

Paul Rogers commented on DRILL-7308:

Recall that Drill can return not only multiple batches, but multiple "result sets": runs of batches with different schemas. A more sophisticated REST solution would handle this case. I can't find any ProtoBuf field that says that the schema changed. Instead, we'd have to reuse code from elsewhere which compares the current schema to the previous one.

Ideally, in that case, we'd create a new JSON element for the second schema. Something like:

{code:json}
{ "resultSets": [
    { "rows": ..., "schema": ... },
    { "rows": ..., "schema": ... }
  ]
}
{code}

It is easy to create such a case. Simply create two CSV files, one with two columns, the other with three. Use just a simple {{SELECT * FROM yourTable}} query. You will get two data batches, each with a distinct schema. The current implementation will give just the first schema and all rows, with varying schemas. (Actually, the current implementation will list the two columns, then the three columns, duplicating the first two, but we want to fix that...)

This is yet another reason to use a provisioned schema: with such a schema we can guarantee that the entire query will return a single, consistent schema regardless of the variation across files.

> Incorrect Metadata from text file queries
>
> Key: DRILL-7308
> URL: https://issues.apache.org/jira/browse/DRILL-7308
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Priority: Major
> Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
> I'm noticing some strange behavior with the newest version of Drill. If you
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [
>     { "domain": "thedataist.com" }
>   ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1. VARCHAR now has precision
> 2. There are twice as many columns as there should be.
>
> Additionally, if you query a regular CSV, without the columns extracted, you
> get the following:
> {code:json}
> "rows": [
>   { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
> ],
> "metadata": [
>   "VARCHAR(0, 0)",
>   "VARCHAR(0, 0)"
> ],
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
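The schema-comparison idea in the comment above (start a new result set whenever a batch's schema differs from the previous one) might be sketched as follows. This is only an illustration under stated assumptions: {{Batch}}, {{SchemaRun}} and {{ResultSetGrouper}} are hypothetical names, a schema is reduced to a list of column names, and none of this is Drill's actual REST code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one incoming batch: column names plus row values.
final class Batch {
    final List<String> schema;
    final List<List<String>> rows;

    Batch(List<String> schema, List<List<String>> rows) {
        this.schema = schema;
        this.rows = rows;
    }
}

// Hypothetical "result set": a run of consecutive batches sharing one schema.
final class SchemaRun {
    final List<String> schema;
    final List<List<String>> rows = new ArrayList<>();

    SchemaRun(List<String> schema) {
        this.schema = schema;
    }
}

final class ResultSetGrouper {
    // Compares each batch's schema to the previous one and opens a new
    // run whenever the schema changes; otherwise rows are appended.
    static List<SchemaRun> group(List<Batch> batches) {
        List<SchemaRun> runs = new ArrayList<>();
        SchemaRun current = null;
        for (Batch batch : batches) {
            if (current == null || !current.schema.equals(batch.schema)) {
                current = new SchemaRun(batch.schema);
                runs.add(current);
            }
            current.rows.addAll(batch.rows);
        }
        return runs;
    }
}
```

Each {{SchemaRun}} would then map to one element of the proposed {{resultSets}} array, carrying its own schema and rows.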
[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872031#comment-16872031 ]

Paul Rogers commented on DRILL-7308:

The problem with duplicated schema is also due to a flaw in the DRILL-6847 code in {{WebUserConnection}}:

{code:java}
@Override
public void sendData(RpcOutcomeListener listener, QueryWritableBatch result) {
  ...
  for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
    //DRILL-6847: This section adds query metadata to the REST results
{code}

The {{sendData()}} method is called for *each* batch of data sent by the server. Probably the manual test case was against a short file that fit into a single batch. However, if the file is large, or if the query is distributed with multiple files, then multiple batches will be sent. Also, with the recent "V3" text reader, the code sends an empty schema batch followed by one or more non-empty data batches. (This "feature" is being disabled in DRILL-7306.)

So, each time a batch is received, the code adds another copy of the schema to the {{metadata}} list maintained in {{WebUserConnection}}.

A quick and dirty solution is to count the batches, and set the schema only on the first.
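The "count the batches" workaround from the comment above could look roughly like this. It is a sketch, not the actual {{WebUserConnection}} code: {{MetadataCollector}} and {{onBatch()}} are hypothetical names, and column types are passed as plain strings. It also assumes the schema does not change between batches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector that mimics a per-batch callback such as sendData().
final class MetadataCollector {
    private final List<String> metadata = new ArrayList<>();
    private int batchCount = 0;

    // Called once for every batch; records the column types only for
    // the first batch so that later batches do not append duplicates.
    void onBatch(List<String> columnTypes) {
        if (batchCount == 0) {
            metadata.addAll(columnTypes);
        }
        batchCount++;
    }

    List<String> metadata() {
        return metadata;
    }
}
```

With a schema-only batch followed by a data batch, {{metadata()}} holds one entry per column rather than two.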
[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871990#comment-16871990 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 5:27 AM:

The width issue appears to have been introduced with this commit: "DRILL-6847: Add Query Metadata to RESTful Interface" (which, ahem, [~cgivre], was your PR...). In {{WebUserConnection}}:

{code:java}
//For DECIMAL type
if (col.getType().hasPrecision()) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());
  if (col.getType().hasScale()) {
    dataType.append(", ");
    dataType.append(col.getType().getScale());
  }
  dataType.append(")");
} else if (col.getType().hasWidth()) {
  //Case for VARCHAR columns with specified width
  dataType.append("(");
  dataType.append(col.getType().getWidth());
  dataType.append(")");
}
{code}

I did not debug the code, but it appears that {{hasPrecision()}} and {{hasScale()}} simply report whether the field is set; they do *not* tell us whether the field is zero. Also, about a year or so ago, Drill moved {{VARCHAR}} width to the precision field, so the supposed {{VARCHAR}} code block is a no-op.

The correct code would be something like:

{code:java}
//For DECIMAL and VARCHAR types
if (col.getType().hasPrecision() && col.getType().getPrecision() > 0) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());
  if (col.getType().hasScale() && col.getType().getScale() > 0) {
{code}
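A self-contained sketch of the corrected formatting rule from the comment above, for experimenting outside Drill. {{TypeLabel}} is a hypothetical helper, and precision and scale are modeled as nullable {{Integer}} values ({{null}} meaning "not set") instead of Drill's real type object, so treat it as an illustration of the check, not as the fix that was committed.

```java
// Hypothetical helper: builds a label such as "DECIMAL(10, 2)" or "VARCHAR".
final class TypeLabel {
    // A null precision or scale stands for "field not set"; zero stands for
    // "set but meaningless", which should not be printed either.
    static String format(String typeName, Integer precision, Integer scale) {
        StringBuilder dataType = new StringBuilder(typeName);
        if (precision != null && precision > 0) {
            dataType.append("(").append(precision);
            if (scale != null && scale > 0) {
                dataType.append(", ").append(scale);
            }
            dataType.append(")");
        }
        return dataType.toString();
    }
}
```

Under this rule a {{VARCHAR}} whose precision and scale are both set to zero is rendered as plain {{VARCHAR}} instead of {{VARCHAR(0, 0)}}.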
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872009#comment-16872009 ]

ASF GitHub Bot commented on DRILL-7306:

paul-rogers commented on issue #1813: DRILL-7306: Disable schema-only batch for new scan framework
URL: https://github.com/apache/drill/pull/1813#issuecomment-505286184

@arina-ielchiieva, regarding `enableSchemaBatch`, recall that Java boolean fields are, by definition in the language spec, initialized to false. So, since we never set it to true, except in tests, it always defaults to false. Since this was confusing, I added Javadoc to explain the problem and the default setting of the option. Does this new material answer your question?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Disable "fast schema" batch for new scan framework
>
> Key: DRILL-7306
> URL: https://issues.apache.org/jira/browse/DRILL-7306
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
> The EVF framework is set up to return a "fast schema" empty batch with only
> schema as its first batch because, when the code was written, it seemed
> that's how we wanted operators to work. However, DRILL-7305 notes that many
> operators cannot handle empty batches.
> Since the empty-batch bugs show that Drill does not, in fact, provide a "fast
> schema" batch, this ticket asks to disable the feature in the new scan
> framework. The feature is disabled with a config option; it can be re-enabled
> if ever it is needed.
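The point about default values can be checked with a tiny example. {{ScanOptions}} is a hypothetical holder class; only the field name is borrowed from the comment above. Note that the default applies to fields, not to local variables, which must be assigned before use.

```java
// Hypothetical options holder; the real Drill class is not shown in this thread.
final class ScanOptions {
    // Never assigned explicitly, so the field takes the default value for
    // boolean fields, which is false.
    boolean enableSchemaBatch;
}
```

A freshly constructed {{ScanOptions}} therefore reports {{enableSchemaBatch == false}} unless a test or caller sets it.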
[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871992#comment-16871992 ]

Paul Rogers commented on DRILL-7308:

This issue points out an unfortunate reality (IMHO): lack of unit tests for the REST API. We have nothing, other than vigilant users, to track down issues such as this one.

I believe that unit tests can be easily created: use a {{ClusterTest}} and set the config(?) option to enable the web server. Use a web client of some sort to fire a request. Either compare the results against a golden file, or just test for the bits of interest (such as, for DRILL-6847, test against each type and mode, with and without width/precision, and verify just that part of the result).

The unit test would also allow very easy debugging. It seems the best we can do at present is build all of Drill, start it, and connect a remote debugger. This is so cumbersome that folks will avoid stepping through code to see if it works.
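As a rough illustration of the "web client of some sort" step in the comment above, the request could be built with the JDK's own HTTP client (Java 11+). This is a sketch under assumptions: {{RestQueryRequest}} is a hypothetical helper, the base URL is supplied by the caller, and the JSON body shape ({{queryType}} plus {{query}}) follows the publicly documented Drill REST API; it is not taken from this thread. The {{ClusterTest}} setup is omitted because its web-server option is not specified here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) a REST query request.
final class RestQueryRequest {
    static HttpRequest build(String baseUrl, String sql) {
        // Minimal JSON string escaping: backslashes first, then double quotes.
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"queryType\": \"SQL\", \"query\": \"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/query.json"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

A test would send this request with {{java.net.http.HttpClient}} against the embedded web server, then assert only on the {{metadata}} array of the response.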
[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871992#comment-16871992 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 4:41 AM:

This issue points out an unfortunate reality (IMHO): lack of unit tests for the REST API. We have nothing, other than vigilant users, to track down issues such as this one.

I believe that unit tests can be easily created: use a {{ClusterTest}} and set the config(?) option to enable the web server. Use a web client of some sort to fire a request. Either compare the results against a golden file, or just test for the bits of interest (such as, for DRILL-6847, test against each type and mode, with and without width/precision, and verify just that part of the result).

The unit test would also allow very easy debugging. It seems the best we can do at present is build all of Drill, start it, and connect a remote debugger. This is so cumbersome that folks will avoid stepping through code to see if it works.
[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871990#comment-16871990 ]

Paul Rogers commented on DRILL-7308:

The width issue appears to have been introduced with this commit: "DRILL-6847: Add Query Metadata to RESTful Interface" (which, ahem, [~cgivre], was your PR...):

{code:java}
//For DECIMAL type
if (col.getType().hasPrecision()) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());
  if (col.getType().hasScale()) {
    dataType.append(", ");
    dataType.append(col.getType().getScale());
  }
  dataType.append(")");
} else if (col.getType().hasWidth()) {
  //Case for VARCHAR columns with specified width
  dataType.append("(");
  dataType.append(col.getType().getWidth());
  dataType.append(")");
}
{code}

I did not debug the code, but it appears that {{hasPrecision()}} and {{hasScale()}} simply report whether the field is set; they do *not* tell us whether the field is zero. Also, about a year or so ago, Drill moved {{VARCHAR}} width to the precision field, so the supposed {{VARCHAR}} code block is a no-op.

The correct code would be something like:

{code:java}
//For DECIMAL and VARCHAR types
if (col.getType().hasPrecision() && col.getType().getPrecision() > 0) {
  dataType.append("(");
  dataType.append(col.getType().getPrecision());
  if (col.getType().hasScale() && col.getType().getScale() > 0) {
{code}
[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871954#comment-16871954 ]

Charles Givre commented on DRILL-7308:

[~Paul.Rogers], That is correct. Something is amiss in the REST API. It was breaking Superset. Also, I was attempting to run unit tests on some UDFs I've been working on and was encountering strange errors that related to CHAR and VARCHAR datatypes. I suspect that these problems may be related.
[jira] [Comment Edited] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871945#comment-16871945 ]

Paul Rogers edited comment on DRILL-7308 at 6/25/19 3:08 AM:

According to the screen shot, this is the REST API, method POST with query as payload, URL is {{http::/query.json}}.

was (Author: paul.rogers):

I presume this is the REST API? Please specify the URL used to do the query.
[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871945#comment-16871945 ]

Paul Rogers commented on DRILL-7308:

I presume this is the REST API? Please specify the URL used to do the query.
[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7308:

Description:

I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata:
{code:sql}
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
{code}
{code:json}
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    { "domain": "thedataist.com" }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
{code}
There are two issues here:
1. VARCHAR now has precision
2. There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you get the following:
{code:json}
"rows": [
  { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
],
"metadata": [
  "VARCHAR(0, 0)",
  "VARCHAR(0, 0)"
],
{code}
[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7308:

Description:

I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata:
{code:sql}
SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
{code}
{code:json}
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    { "domain": "thedataist.com" }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
{code}
There are two issues here:
1. VARCHAR now has precision
2. There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you get the following:
{code:json}
"rows": [
  { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
],
"metadata": [
  "VARCHAR(0, 0)",
  "VARCHAR(0, 0)"
],
{code}
[jira] [Commented] (DRILL-6951) Merge row set based mock data source
[ https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871937#comment-16871937 ]

ASF GitHub Bot commented on DRILL-6951:

paul-rogers commented on issue #1809: DRILL-6951: Row set based mock data source
URL: https://github.com/apache/drill/pull/1809#issuecomment-505257599

Squashed commits and rebased on latest master.

> Merge row set based mock data source
>
> Key: DRILL-6951
> URL: https://issues.apache.org/jira/browse/DRILL-6951
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.15.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.17.0
>
> The mock reader framework is an obscure bit of code used in tests that
> generates fake data for use in things like testing sort, filters and so on.
> Because the mock reader is simple, it is a good demonstration case for the
> new scanner framework based on the result set loader. This task merges the
> existing work in migrating the mock data source into master via a PR.
[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-7308: - Attachment: Screen Shot 2019-06-24 at 3.16.40 PM.png > Incorrect Metadata from text file queries > - > > Key: DRILL-7308 > URL: https://issues.apache.org/jira/browse/DRILL-7308 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.17.0 >Reporter: Charles Givre >Priority: Major > Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh > > > {{I'm noticing some strange behavior with the newest version of Drill. If > you query a CSV file, you get the following metadata:}} > {{ }} > {{ SELECT * FROM dfs.test.`domains.csvh` LIMIT 1}} > {{ }} > {{ {}} > {{ "queryId": "22eee85f-c02c-5878-9735-091d18788061",}} > {{ "columns": [}} > {{ "domain"}} > {{ ],}} > {{ "rows": [}} > {{ }}{{{ "domain": "thedataist.com" }}}{{ ],}} > {{ "metadata": [}} > {{ "VARCHAR(0, 0)",}} > {{ "VARCHAR(0, 0)"}} > {{ ],}} > {{ "queryState": "COMPLETED",}} > {{ "attemptedAutoLimit": 0}} > {{ }}} > {{ }} > {{ }} > {{ There are two issues here:}} > {{ 1. VARCHAR now has precision }} > {{ 2. There are twice as many columns as there should be.}} > {{ }} > {{ Additionally, if you query a regular CSV, without the columns extracted, > you get the following:}} > {{ }} > {{ "rows": [}} > {{ }} > { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" } > ], > "metadata": [ > "VARCHAR(0, 0)", > "VARCHAR(0, 0)" > ], > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7308) Incorrect Metadata from text file queries
[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Givre updated DRILL-7308: - Description: {{I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata:}} {{ }} {{ SELECT * FROM dfs.test.`domains.csvh` LIMIT 1}} {{ }} {{ {}} {{ "queryId": "22eee85f-c02c-5878-9735-091d18788061",}} {{ "columns": [}} {{ "domain"}} {{ ],}} {{ "rows": [}} {{ }}{{{ "domain": "thedataist.com" }}}{{ ],}} {{ "metadata": [}} {{ "VARCHAR(0, 0)",}} {{ "VARCHAR(0, 0)"}} {{ ],}} {{ "queryState": "COMPLETED",}} {{ "attemptedAutoLimit": 0}} {{ }}} {{ }} {{ }} {{ There are two issues here:}} {{ 1. VARCHAR now has precision }} {{ 2. There are twice as many columns as there should be.}} {{ }} {{ Additionally, if you query a regular CSV, without the columns extracted, you get the following:}} {{ }} {{ "rows": [}} {{ }} { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" } ], "metadata": [ "VARCHAR(0, 0)", "VARCHAR(0, 0)" ], was: I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata: SELECT * FROM dfs.test.`domains.csvh` LIMIT 1 { "queryId": "22eee85f-c02c-5878-9735-091d18788061", "columns": [ "domain" ], "rows": [ { "domain": "thedataist.com" } ], "metadata": [ "VARCHAR(0, 0)", "VARCHAR(0, 0)" ], "queryState": "COMPLETED", "attemptedAutoLimit": 0 } There are two issues here: 1. VARCHAR now has precision 2. There are twice as many columns as there should be. 
Additionally, if you query a regular CSV, without the columns extracted, you get the following: "rows": [ { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" } ], "metadata": [ "VARCHAR(0, 0)", "VARCHAR(0, 0)" ], > Incorrect Metadata from text file queries > - > > Key: DRILL-7308 > URL: https://issues.apache.org/jira/browse/DRILL-7308 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.17.0 >Reporter: Charles Givre >Priority: Major > Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh > > > {{I'm noticing some strange behavior with the newest version of Drill. If > you query a CSV file, you get the following metadata:}} > {{ }} > {{ SELECT * FROM dfs.test.`domains.csvh` LIMIT 1}} > {{ }} > {{ {}} > {{ "queryId": "22eee85f-c02c-5878-9735-091d18788061",}} > {{ "columns": [}} > {{ "domain"}} > {{ ],}} > {{ "rows": [}} > {{ }}{{{ "domain": "thedataist.com" }}}{{ ],}} > {{ "metadata": [}} > {{ "VARCHAR(0, 0)",}} > {{ "VARCHAR(0, 0)"}} > {{ ],}} > {{ "queryState": "COMPLETED",}} > {{ "attemptedAutoLimit": 0}} > {{ }}} > {{ }} > {{ }} > {{ There are two issues here:}} > {{ 1. VARCHAR now has precision }} > {{ 2. There are twice as many columns as there should be.}} > {{ }} > {{ Additionally, if you query a regular CSV, without the columns extracted, > you get the following:}} > {{ }} > {{ "rows": [}} > {{ }} > { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" } > ], > "metadata": [ > "VARCHAR(0, 0)", > "VARCHAR(0, 0)" > ], > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7308) Incorrect Metadata from text file queries
Charles Givre created DRILL-7308:

Summary: Incorrect Metadata from text file queries
Key: DRILL-7308
URL: https://issues.apache.org/jira/browse/DRILL-7308
Project: Apache Drill
Issue Type: Bug
Components: Metadata
Affects Versions: 1.17.0
Reporter: Charles Givre
Attachments: domains.csvh

I'm noticing some strange behavior with the newest version of Drill. If you query a CSV file, you get the following metadata:

SELECT * FROM dfs.test.`domains.csvh` LIMIT 1

{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": [
    "domain"
  ],
  "rows": [
    { "domain": "thedataist.com" }
  ],
  "metadata": [
    "VARCHAR(0, 0)",
    "VARCHAR(0, 0)"
  ],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}

There are two issues here:
1. VARCHAR now has precision
2. There are twice as many columns as there should be.

Additionally, if you query a regular CSV, without the columns extracted, you get the following:

"rows": [
  { "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" }
],
"metadata": [
  "VARCHAR(0, 0)",
  "VARCHAR(0, 0)"
],

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
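The second symptom in the report (more "metadata" entries than "columns" entries) can be checked mechanically. The following is only a minimal sketch, assuming the REST response has already been parsed into two lists; the class and method names are hypothetical and are not part of Drill.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Drill: checks the shape of a parsed REST query response. */
final class ResponseShapeCheck {

  /** Returns true when there is exactly one metadata (type) entry per column name. */
  static boolean hasOneTypePerColumn(List<String> columns, List<String> metadata) {
    return columns.size() == metadata.size();
  }

  public static void main(String[] args) {
    // Values copied from the response quoted in the report.
    List<String> columns = Arrays.asList("domain");
    List<String> metadata = Arrays.asList("VARCHAR(0, 0)", "VARCHAR(0, 0)");

    // Prints "false": one column, but two type entries.
    System.out.println(hasOneTypePerColumn(columns, metadata));
  }
}
```

A check like this could sit in a regression test for the REST output, so that a mismatch between the two arrays fails fast instead of being noticed by eye.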
[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Volodymyr Vysotskyi updated DRILL-7271: --- Description: 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map. 2. Rename hasStatistics to hasDescriptiveStatistics 3. Remove drill-file-metastore-plugin 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT. 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. 6. Add new info classes: {noformat} class TableInfo { String storagePlugin; String workspace; String name; String type; String owner; } class MetadataInfo { public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; MetadataType type (enum); String key; String identifier; } {noformat} 7. Modify existing metadata classes: org.apache.drill.metastore.FileTableMetadata {noformat} missing fields -- storagePlugin, workspace, tableType -> will be covered by TableInfo class metadataType, metadataKey -> will be covered by MetadataInfo class interestingColumns fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Set partitionKeys; -> Map {noformat} org.apache.drill.metastore.PartitionMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class partitionValues (List) location (String) (for directory level metadata) - directory location fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Set location; -> locations {noformat} org.apache.drill.metastore.FileMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo 
class path - path to file fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Path location; - should contain directory to which file belongs {noformat} org.apache.drill.metastore.RowGroupMetadata {noformat} missing fields -- storagePlugin, workspace -> will be covered by TableInfo class metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class path - path to file fields to modify private final Map tableStatistics; private final Map statisticsKinds; private final Path location; - should contain directory to which file belongs {noformat} 8. Remove org.apache.drill.exec package from metastore module. 9. Rename ColumnStatisticsImpl class. 10. Separate existing classes in org.apache.drill.metastore package into sub-packages. 11. Rename FileTableMetadata -> BaseTableMetadata 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata 13. Introduce segment-level metadata class: {noformat} class SegmentMetadata { TableInfo tableInfo; MetadataInfo metadataInfo; SchemaPath column; TupleMetadata schema; String location; Map columnsStatistics; Map statistics; List partitionValues; List locations; long lastModifiedTime; } {noformat}

h1. Segment metadata

One of the changes in the fix for this Jira introduces segment-level metadata. The metadata hierarchy is now:
- Table
- Segment
- Partition
- File
- Row group

A segment represents a part of the table whose data is grouped by some shared property. For file system tables, for example, a segment may correspond to a directory and its data. For Hive tables, a segment corresponds to a Hive partition. Partition metadata, by contrast, corresponds to "Drill partitions": groups of data that have the same values for specific columns within a file or row group. Filtering is therefore applied at the table level first, then to segments, then to partitions, then to files, and finally to row groups.

was: 1.
Merge info from metadataStatistics + statisticsKinds into one holder: Map. 2. Rename hasStatistics to hasDescriptiveStatistics 3. Remove drill-file-metastore-plugin 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT. 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. 6. Add new info classes: {noformat} class TableInfo { String storagePlugin; String workspace; String name; String type; String owner; } class MetadataInfo { public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; MetadataType type (enum); String key; String identifier; }
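The filtering order given in the segment metadata description (table, then segment, partition, file, row group) can be illustrated with a small sketch. Everything here is hypothetical: the enum, the class, and the early-exit behaviour are assumptions made for illustration and do not come from the Drill code base.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Drill code: walking the metadata levels from coarsest to finest. */
final class PruningOrderSketch {

  /** Levels in the order named in the description. */
  enum Level { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  /**
   * Returns the levels that get examined, in order. Assumption of this sketch:
   * once a level rules the data out, finer levels are not examined.
   * Passing null means no level rejects, so every level is visited.
   */
  static List<Level> levelsVisited(Level firstRejectingLevel) {
    List<Level> visited = new ArrayList<>();
    for (Level level : Level.values()) {
      visited.add(level);
      if (level == firstRejectingLevel) {
        break; // nothing below this level needs to be examined
      }
    }
    return visited;
  }

  public static void main(String[] args) {
    // Prints "[TABLE, SEGMENT]": the segment is rejected, so partitions,
    // files and row groups under it are never looked at.
    System.out.println(levelsVisited(Level.SEGMENT));
  }
}
```

The point of ordering the levels this way is that a rejection at a coarse level saves the work of reading metadata for everything beneath it.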
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871478#comment-16871478 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on issue #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#issuecomment-505076823 @vvysotskyi thanks for making the changes. +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. 
Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo metadataInfo; > SchemaPath column; > TupleMetadata schema; > String location; > Map columnsStatistics; > Map statistics; > List partitionValues; > List locations; > long lastModifiedTime; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
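Item 6 of the quoted description gives the two info classes in pseudocode. As a rough illustration only, they might be written as immutable Java holders like the ones below. The field names and the two key constants are taken from the description; the enclosing class, the constructors, the enum name, and every enum constant other than SEGMENT are assumptions of this sketch.

```java
/** Hypothetical sketch, not Drill code: the two info classes as immutable holders. */
final class InfoClassesSketch {

  /** Only SEGMENT is named in the description; the other constants are assumed. */
  enum MetadataTypeSketch { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

  /** Identifies the table a metadata unit belongs to. */
  static final class TableInfo {
    final String storagePlugin;
    final String workspace;
    final String name;
    final String type;
    final String owner;

    TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
      this.storagePlugin = storagePlugin;
      this.workspace = workspace;
      this.name = name;
      this.type = type;
      this.owner = owner;
    }
  }

  /** Identifies the metadata unit itself: its level, key and identifier. */
  static final class MetadataInfo {
    static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

    final MetadataTypeSketch type;
    final String key;
    final String identifier;

    MetadataInfo(MetadataTypeSketch type, String key, String identifier) {
      this.type = type;
      this.key = key;
      this.identifier = identifier;
    }
  }

  public static void main(String[] args) {
    // Illustrative values only.
    TableInfo table = new TableInfo("dfs", "test", "domains.csvh", "TABLE", "drill");
    MetadataInfo info = new MetadataInfo(
        MetadataTypeSketch.SEGMENT, MetadataInfo.DEFAULT_SEGMENT_KEY, "segment-0");
    System.out.println(table.storagePlugin + "." + table.workspace + "." + table.name
        + " -> " + info.type + "/" + info.key);
  }
}
```

Pulling these fields into two small holders is what lets the per-level metadata classes in item 7 drop their own storagePlugin, workspace, metadataType and metadataKey fields.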
[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7271: Labels: ready-to-commit (was: ) > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. 
Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo metadataInfo; > SchemaPath column; > TupleMetadata schema; > String location; > Map columnsStatistics; > Map statistics; > List partitionValues; > List locations; > long lastModifiedTime; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871468#comment-16871468 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296794477 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Enum with possible types of metadata. + */ +public enum MetadataType { + + ALL, Review comment: Thanks, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by >
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871469#comment-16871469 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296776704 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.hadoop.fs.Path; + +import java.util.List; +import java.util.Objects; +import java.util.Set; + +/** + * Represents a metadata for the table part, which corresponds to the specific partition key. 
+ */ +public class PartitionMetadata extends BaseMetadata { + private final SchemaPath column; + private final List partitionValues; + private final Set locations; + private final long lastModifiedTime; + + private PartitionMetadata(PartitionMetadataBuilder builder) { +super(builder); +this.column = builder.column; +this.partitionValues = builder.partitionValues; +this.locations = builder.locations; +this.lastModifiedTime = builder.lastModifiedTime; + } + + /** + * It allows to obtain the column path for this partition + * + * @return column path + */ + public SchemaPath getColumn() { +return column; + } + + /** + * File locations for this partition + * + * @return file locations + */ + public Set getLocations() { +return locations; + } + + /** + * It allows to check the time, when any files were modified. It is in Unix Timestamp + * + * @return last modified time of files + */ + public long getLastModifiedTime() { +return lastModifiedTime; + } + + public List getPartitionValues() { +return partitionValues; + } + + public static PartitionMetadataBuilder builder() { +return new PartitionMetadataBuilder(); + } + + public static class PartitionMetadataBuilder extends BaseMetadataBuilder { +private SchemaPath column; +private List partitionValues; +private Set locations; +private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME; + +public PartitionMetadataBuilder withLocations(Set locations) { Review comment: Agree, renamed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871455#comment-16871455 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296753640 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.hadoop.fs.Path; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * Base implementation of {@link TableMetadata} interface. 
+ */ +public class BaseTableMetadata extends BaseMetadata implements TableMetadata { + + public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1; + + private final Path location; + private final long lastModifiedTime; + private final Map partitionKeys; + private final List interestingColumns; + + private BaseTableMetadata(BaseTableMetadataBuilder builder) { +super(builder); +this.location = builder.location; +this.partitionKeys = builder.partitionKeys; +this.interestingColumns = builder.interestingColumns; +this.lastModifiedTime = builder.lastModifiedTime; + } + + public boolean isPartitionColumn(String fieldName) { +return partitionKeys.containsKey(fieldName); + } + + boolean isPartitioned() { +return !partitionKeys.isEmpty(); + } + + @Override + public Path getLocation() { +return location; + } + + @Override + public long getLastModifiedTime() { +return lastModifiedTime; + } + + @Override + public List getInterestingColumns() { +return interestingColumns; + } + + @Override + @SuppressWarnings("unchecked") + public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) { +Map mergedTableStatistics = new HashMap<>(this.statistics); + +// overrides statistics value for the case when new statistics is exact or existing one was estimated +tableStatistics.stream() +.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact() + || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact()) +.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder)); + +Map newColumnsStatistics = new HashMap<>(this.columnsStatistics); +this.columnsStatistics.forEach( +(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName)))); + +return BaseTableMetadata.builder() +.withTableInfo(tableInfo) +.withMetadataInfo(metadataInfo) +.withLocation(location) +.withSchema(schema) 
+.withColumnsStatistics(newColumnsStatistics) +.withStatistics(mergedTableStatistics.values()) +.withLastModifiedTime(lastModifiedTime) +.withPartitionKeys(partitionKeys) +.withInterestingColumns(interestingColumns) +.build(); + } + + public static BaseTableMetadataBuilder builder() { +return new BaseTableMetadataBuilder(); + } + + public static class BaseTableMetadataBuilder extends BaseMetadataBuilder { +private Path location; +private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME; +private Map partitionKeys; +private List interestingColumns; + +public BaseTableMetadataBuilder withLocation(Path location) { Review comment: Done, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For
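The override rule in the `cloneWithStats` hunk quoted above (an incoming statistic wins when it is exact, or when the stored one was only estimated) can be sketched in isolation. This is a minimal illustration, not the PR's code: `Stat`, `StatsMergeSketch` and `mergeStats` are hypothetical stand-ins for the Drill classes. It also assumes that a statistic with no stored counterpart should simply be added; the quoted filter appears to dereference the result of `this.statistics.get(...)` without a null check in that case.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a statistics holder: kind name, value, exactness flag. */
final class Stat {
  final String kind;
  final Object value;
  final boolean exact;

  Stat(String kind, Object value, boolean exact) {
    this.kind = kind;
    this.value = value;
    this.exact = exact;
  }
}

final class StatsMergeSketch {

  /**
   * Returns a copy of {@code existing} in which an incoming statistic replaces
   * the stored one when the incoming value is exact, or when the stored value
   * is missing or only estimated.
   */
  static Map<String, Stat> mergeStats(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat candidate : incoming) {
      Stat current = existing.get(candidate.kind);
      // Assumption: a missing entry counts as replaceable.
      if (candidate.exact || current == null || !current.exact) {
        merged.put(candidate.kind, candidate);
      }
    }
    return merged;
  }
}
```

With this rule an estimated row count never overwrites an exact one, while a fresher estimate still replaces an older estimate.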
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871456#comment-16871456 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296757799 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.hadoop.fs.Path; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * Base implementation of {@link TableMetadata} interface. 
+ */ +public class BaseTableMetadata extends BaseMetadata implements TableMetadata { + + public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1; + + private final Path location; + private final long lastModifiedTime; + private final Map partitionKeys; + private final List interestingColumns; + + private BaseTableMetadata(BaseTableMetadataBuilder builder) { +super(builder); +this.location = builder.location; +this.partitionKeys = builder.partitionKeys; +this.interestingColumns = builder.interestingColumns; +this.lastModifiedTime = builder.lastModifiedTime; + } + + public boolean isPartitionColumn(String fieldName) { +return partitionKeys.containsKey(fieldName); + } + + boolean isPartitioned() { +return !partitionKeys.isEmpty(); + } + + @Override + public Path getLocation() { +return location; + } + + @Override + public long getLastModifiedTime() { +return lastModifiedTime; + } + + @Override + public List getInterestingColumns() { +return interestingColumns; + } + + @Override + @SuppressWarnings("unchecked") + public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) { +Map mergedTableStatistics = new HashMap<>(this.statistics); + +// overrides statistics value for the case when new statistics is exact or existing one was estimated +tableStatistics.stream() +.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact() + || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact()) +.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder)); + +Map newColumnsStatistics = new HashMap<>(this.columnsStatistics); +this.columnsStatistics.forEach( +(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName)))); + +return BaseTableMetadata.builder() +.withTableInfo(tableInfo) +.withMetadataInfo(metadataInfo) +.withLocation(location) +.withSchema(schema)
+.withColumnsStatistics(newColumnsStatistics) +.withStatistics(mergedTableStatistics.values()) +.withLastModifiedTime(lastModifiedTime) +.withPartitionKeys(partitionKeys) +.withInterestingColumns(interestingColumns) +.build(); + } + + public static BaseTableMetadataBuilder builder() { +return new BaseTableMetadataBuilder(); + } + + public static class BaseTableMetadataBuilder extends BaseMetadataBuilder { +private Path location; +private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME; +private Map partitionKeys; +private List interestingColumns; + +public BaseTableMetadataBuilder withLocation(Path location) { + this.location = location; + return self(); +} + +public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) { + this.lastModifiedTime = lastModifiedTime; + return self(); +} + +public BaseTableMetadataBuilder
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871459#comment-16871459 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296772407 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Class which identifies specific metadata. + */ +public class MetadataInfo { + + public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; + public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; + public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_"; Review comment: This constant will be used for creating a segment column name to avoid depending on the values of session options for partition column names. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; >
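The review comment on `DEFAULT_COLUMN_PREFIX` above says only that the constant "will be used for creating a segment column name" so that the name does not depend on session options. A hedged sketch of what that could look like follows. The prefix value is taken from the quoted diff; the helper `segmentColumnName` and the depth-suffix naming scheme are assumptions made for illustration and are not confirmed by the thread.

```java
/** Hypothetical helper; only the prefix value comes from the quoted diff. */
final class SegmentColumnNames {

  static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_";

  /**
   * Builds a segment column name for the directory level at {@code depth}.
   * Assumed scheme: fixed prefix followed by the zero-based depth.
   */
  static String segmentColumnName(int depth) {
    if (depth < 0) {
      throw new IllegalArgumentException("depth must be non-negative: " + depth);
    }
    return DEFAULT_COLUMN_PREFIX + depth;
  }
}
```

Because the prefix is a compile-time constant, names produced this way would stay stable even if a user changes the partition column label for a session.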
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871454#comment-16871454 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296752797 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.metastore.SchemaPathUtils; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.StatisticsKind; + +import java.util.Collection; +import java.util.Map; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group. + */ +public abstract class BaseMetadata implements Metadata { + protected final TableInfo tableInfo; + protected final MetadataInfo metadataInfo; + protected final TupleMetadata schema; + protected final Map columnsStatistics; + protected final Map statistics; + + protected <T extends BaseMetadataBuilder<T>> BaseMetadata(BaseMetadataBuilder<T> builder) { +this.tableInfo = builder.tableInfo; +this.metadataInfo = builder.metadataInfo; +this.schema = builder.schema; +this.columnsStatistics = builder.columnsStatistics; +this.statistics = builder.statistics.stream() +.collect(Collectors.toMap( +statistic -> statistic.getStatisticsKind().getName(), +Function.identity(), +(a, b) -> a.getStatisticsKind().isExact() ? a : b)); + } + + @Override + public Map getColumnsStatistics() { +return columnsStatistics; + } + + @Override + public ColumnStatistics getColumnStatistics(SchemaPath columnName) { +return columnsStatistics.get(columnName); + } + + @Override + public TupleMetadata getSchema() { +return schema; + } + + @Override + @SuppressWarnings("unchecked") + public <V> V getStatistic(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null ? 
statisticsHolder.getStatisticsValue() : null; + } + + @Override + public boolean containsExactStatistics(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact(); + } + + @Override + @SuppressWarnings("unchecked") + public <V> V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) { +return (V) columnsStatistics.get(columnName).get(statisticsKind); + } + + @Override + public ColumnMetadata getColumn(SchemaPath name) { +return SchemaPathUtils.getColumnMetadata(name, schema); + } + + @Override + public TableInfo getTableInfo() { +return tableInfo; + } + + @Override + public MetadataInfo getMetadataInfo() { +return metadataInfo; + } + + public static abstract class BaseMetadataBuilder<T extends BaseMetadataBuilder<T>> { +protected TableInfo tableInfo; +protected MetadataInfo metadataInfo; +protected TupleMetadata schema; +protected Map columnsStatistics; +protected Collection statistics; + +public T withTableInfo(TableInfo tableInfo) { Review comment: Thanks, removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org >
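The `BaseMetadata` constructor quoted above indexes the builder's statistics by kind name with `Collectors.toMap` and resolves duplicates with the merge function `(a, b) -> a.getStatisticsKind().isExact() ? a : b`. The sketch below reproduces only that collector pattern on a simplified type. `KindedStat` and `indexByKind` are hypothetical names, and the exactness flag is stored directly on the element instead of on a separate statistics-kind object.

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical simplified statistic: kind name, numeric value, exactness flag. */
final class KindedStat {
  final String kind;
  final long value;
  final boolean exact;

  KindedStat(String kind, long value, boolean exact) {
    this.kind = kind;
    this.value = value;
    this.exact = exact;
  }
}

final class StatsIndexSketch {

  /**
   * Indexes statistics by kind name. When two entries share a name, the
   * earlier one is kept if it is exact; otherwise the later one is kept.
   */
  static Map<String, KindedStat> indexByKind(Collection<KindedStat> stats) {
    return stats.stream()
        .collect(Collectors.toMap(
            stat -> stat.kind,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }
}
```

Without the third argument, `Collectors.toMap` throws `IllegalStateException` on a duplicate key, so the merge function is what lets an exact and an estimated value of the same kind coexist in the input. Note that when both duplicates are estimated, this merge function keeps the later one.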
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871446#comment-16871446 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296737969 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.metastore.metadata.BaseMetadata; +import org.apache.drill.metastore.metadata.TableMetadata; +import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.ColumnStatisticsKind; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.TableStatisticsKind; +import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes; + +import java.util.ArrayList; +import java.util.Collection; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.Set; + +public class TableMetadataUtils { + + private TableMetadataUtils() { Review comment: Agree, removed it.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871447#comment-16871447 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296746956 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param <T> Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder<T> { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); + private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class); + + private final T statisticsValue; + private final BaseStatisticsKind statisticsKind; + + @JsonCreator + public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue, + @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = statisticsKind; + } + + public StatisticsHolder(T statisticsValue, + StatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = (BaseStatisticsKind) statisticsKind; + } + + @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, +include = JsonTypeInfo.As.WRAPPER_OBJECT) + public T getStatisticsValue() { +return statisticsValue; + } + + public StatisticsKind getStatisticsKind() { +return statisticsKind; + } + + public static StatisticsHolder deserialize(String serialized) throws IOException { +return OBJECT_READER.readValue(serialized); + } + + public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException
{ Review comment: Thanks, done for this class and for `ColumnStatistics`.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871465#comment-16871465 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296786531 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java ## @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() { return readEntries; } + /** + * {@inheritDoc} + * + * - if file metadata was pruned, prunes underlying metadata Review comment: Yes, it can. Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. 
Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - 
path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo metadataInfo; > SchemaPath column; > TupleMetadata
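Item 6 of the issue description quoted above outlines `MetadataInfo` as a small identifying value: a metadata type, a key and an identifier, plus two well-known key constants. A minimal sketch of such a value class follows, under stated assumptions. The class name `MetadataInfoSketch`, the constructor shape and the `equals`/`hashCode` behavior are illustrative and not taken from the PR. The enum constants are also assumed: they are inferred from the levels named in the thread (table, partition, file, row group) plus the new `SEGMENT` value from item 4.

```java
import java.util.Objects;

/** Assumed set of levels; the thread names these but does not list the enum itself. */
enum MetadataType { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP }

/** Hypothetical immutable value identifying one piece of metadata. */
final class MetadataInfoSketch {

  static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  final MetadataType type;
  final String key;
  final String identifier;

  MetadataInfoSketch(MetadataType type, String key, String identifier) {
    this.type = Objects.requireNonNull(type, "type");
    this.key = key;
    this.identifier = identifier;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof MetadataInfoSketch)) {
      return false;
    }
    MetadataInfoSketch that = (MetadataInfoSketch) o;
    return type == that.type
        && Objects.equals(key, that.key)
        && Objects.equals(identifier, that.identifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(type, key, identifier);
  }
}
```

Value-based equality is a design choice of this sketch: it would let two lookups for the same segment compare equal and serve as map keys, which seems to fit a class whose stated purpose is to identify specific metadata.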
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871457#comment-16871457 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296761257 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesCont if ( ! skipRuntimePruning ) { setFilter(filterExpr); } } + /** + * Applies specified filter {@code filterExpr} to current group scan and produces filtering at: + * + * table level: + * - if filter matches all the data or prunes all the data, sets corresponding value to Review comment: Agree, thanks for pointing this out; replaced it with nested lists.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871448#comment-16871448 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296746701 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); + private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class); + + private final T statisticsValue; + private final BaseStatisticsKind statisticsKind; + + @JsonCreator + public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue, + @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = statisticsKind; + } + + public StatisticsHolder(T statisticsValue, + StatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = (BaseStatisticsKind) statisticsKind; + } + + @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, +include = JsonTypeInfo.As.WRAPPER_OBJECT) + public T getStatisticsValue() { +return statisticsValue; + } + + public StatisticsKind getStatisticsKind() { +return statisticsKind; + } + + public static StatisticsHolder deserialize(String serialized) throws IOException { Review comment: Thanks, done. This is an automated message from the Apache Git Service. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871444#comment-16871444 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296737602 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java ## @@ -452,53 +456,54 @@ public static ObjectMapper getMapper() { .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De()) .addDeserializer(SchemaPath.class, new SchemaPath.De()); mapper.registerModule(deModule); +mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth")); Review comment: It would be nice, but I think it could break backward compatibility, since the name was already defined earlier here: https://github.com/apache/drill/blob/05a1a3a888a7408bde683acc36f406fbd2459254/exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/Histogram.java#L31 So none of the previously created stats files would be deserialized correctly.
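The backward-compatibility concern in the review comment on `registerSubtypes` can be illustrated with a small sketch. It does not use Jackson; `SubtypeRegistry` is a hypothetical stand-in for a name-to-class lookup table, and `"numeric-equi-depth-v2"` is an invented name. The reviewer's original suggestion is not quoted in the message, so the sketch assumes it involved changing the registered type name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a polymorphic deserializer's name-to-class table.
// This is not Jackson code; it only mirrors the idea of a named subtype.
final class SubtypeRegistry {
    private final Map<String, Class<?>> byName = new HashMap<>();

    void register(Class<?> type, String name) {
        byName.put(name, type);
    }

    // Looks up the class for a type name that was stored in a file.
    Optional<Class<?>> resolve(String storedName) {
        return Optional.ofNullable(byName.get(storedName));
    }
}

// Empty placeholder; only the class name echoes the diff.
final class NumericEquiDepthHistogram {}

public class SubtypeNameDemo {
    public static void main(String[] args) {
        // Name assumed to be written into stats files created earlier.
        String nameInOldFile = "numeric-equi-depth";

        SubtypeRegistry unchanged = new SubtypeRegistry();
        unchanged.register(NumericEquiDepthHistogram.class, "numeric-equi-depth");

        SubtypeRegistry renamed = new SubtypeRegistry();
        // Invented new name, for illustration only.
        renamed.register(NumericEquiDepthHistogram.class, "numeric-equi-depth-v2");

        // The old file resolves only while the registered name stays the same.
        System.out.println(unchanged.resolve(nameInOldFile).isPresent());
        System.out.println(renamed.resolve(nameInOldFile).isPresent());
    }
}
```

Under that assumption, keeping the registered name identical to the one already written to disk is what keeps older files readable.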
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871463#comment-16871463 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296765872 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. 
Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo metadataInfo; > SchemaPath column; > TupleMetadata schema; > String location; > Map columnsStatistics; > Map statistics; > List partitionValues; > List locations; > long lastModifiedTime; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
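As a rough illustration of item 6 in the description above, `MetadataInfo` could be written as a small immutable value class. This is a sketch only: the two constants and the three field names come from the description, while the enum values other than `SEGMENT` and `NONE`, the constructor, and the `equals`/`hashCode` methods are assumptions rather than the code in the pull request.

```java
import java.util.Objects;

// Enum values other than SEGMENT and NONE are assumed for illustration.
enum MetadataType { TABLE, SEGMENT, PARTITION, FILE, ROW_GROUP, NONE }

// Immutable holder for the three fields listed in the issue description.
final class MetadataInfo {
    public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

    private final MetadataType type;
    private final String key;
    private final String identifier;

    MetadataInfo(MetadataType type, String key, String identifier) {
        this.type = type;
        this.key = key;
        this.identifier = identifier;
    }

    MetadataType getType() { return type; }
    String getKey() { return key; }
    String getIdentifier() { return identifier; }

    // Value equality, so two infos with the same fields compare equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MetadataInfo)) return false;
        MetadataInfo that = (MetadataInfo) o;
        return type == that.type
            && Objects.equals(key, that.key)
            && Objects.equals(identifier, that.identifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, key, identifier);
    }
}

public class MetadataInfoDemo {
    public static void main(String[] args) {
        // The identifier value here is made up for the example.
        MetadataInfo a = new MetadataInfo(MetadataType.SEGMENT, MetadataInfo.DEFAULT_SEGMENT_KEY, "2019/Q2");
        MetadataInfo b = new MetadataInfo(MetadataType.SEGMENT, MetadataInfo.DEFAULT_SEGMENT_KEY, "2019/Q2");
        System.out.println(a.equals(b));
    }
}
```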
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871460#comment-16871460 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296765020 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -572,34 +626,39 @@ public GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) { */ public abstract AbstractGroupScanWithMetadata build(); -public GroupScanWithMetadataFilterer withTable(TableMetadata tableMetadata) { +public B withTable(TableMetadata tableMetadata) { this.tableMetadata = tableMetadata; - return this; + return self(); Review comment: The `self()` method was introduced so that these methods return the specific implementation type instead of the base type. That way no casts are needed in the places where the `this` instance should be returned.
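The `self()` approach described in the review comment above is commonly known as a self-typed (or "curiously recurring") generic builder. A minimal sketch follows; apart from `self()`, `withTable()`, `withMatching()` and `build()`, which echo method names seen in the diffs, every name here is hypothetical and the types are simplified to strings.

```java
// Base builder parameterized by its own concrete subtype B.
abstract class Filterer<B extends Filterer<B>> {
    protected String tableMetadata;

    // Each concrete subclass returns itself, typed as the subclass.
    protected abstract B self();

    public B withTable(String tableMetadata) {
        this.tableMetadata = tableMetadata;
        // `return this;` would not compile here without a cast to B.
        return self();
    }

    public abstract String build();
}

// Hypothetical concrete builder with one option of its own.
final class ParquetFilterer extends Filterer<ParquetFilterer> {
    private boolean matching;

    @Override
    protected ParquetFilterer self() {
        return this;
    }

    // Reachable in a call chain because withTable() returns ParquetFilterer.
    public ParquetFilterer withMatching(boolean matching) {
        this.matching = matching;
        return this;
    }

    @Override
    public String build() {
        return tableMetadata + ":" + matching;
    }
}

public class SelfTypeDemo {
    public static void main(String[] args) {
        String result = new ParquetFilterer()
            .withTable("orders")   // base-class method, returns ParquetFilterer
            .withMatching(true)    // subclass method, no cast needed
            .build();
        System.out.println(result); // prints "orders:true"
    }
}
```

The trade-off is a slightly noisier class signature in exchange for cast-free chaining across the builder hierarchy.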
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871464#comment-16871464 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296776542 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.hadoop.fs.Path; + +import java.util.List; +import java.util.Objects; +import java.util.Set; + +/** + * Represents a metadata for the table part, which corresponds to the specific partition key. 
+ */ +public class PartitionMetadata extends BaseMetadata { + private final SchemaPath column; + private final List partitionValues; + private final Set locations; + private final long lastModifiedTime; + + private PartitionMetadata(PartitionMetadataBuilder builder) { +super(builder); +this.column = builder.column; +this.partitionValues = builder.partitionValues; +this.locations = builder.locations; +this.lastModifiedTime = builder.lastModifiedTime; + } + + /** + * It allows to obtain the column path for this partition + * + * @return column path + */ + public SchemaPath getColumn() { +return column; + } + + /** + * File locations for this partition + * + * @return file locations + */ + public Set getLocations() { +return locations; + } + + /** + * It allows to check the time, when any files were modified. It is in Unix Timestamp Review comment: Thanks, done.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871442#comment-16871442 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296731422 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java ## @@ -218,89 +201,85 @@ public ColumnStatistics visitFunctionHolderExpression(FunctionHolderExpression h ValueHolder minFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName()); ValueHolder maxFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName()); - MinMaxStatistics statistics; switch (destType) { case INT: - statistics = new MinMaxStatistics<>(((IntHolder) minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((IntHolder) minFuncHolder).value, + ((IntHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case BIGINT: - statistics = new MinMaxStatistics<>(((BigIntHolder) minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((BigIntHolder) minFuncHolder).value, + ((BigIntHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case FLOAT4: - statistics = new MinMaxStatistics<>(((Float4Holder) minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((Float4Holder) minFuncHolder).value, + ((Float4Holder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case FLOAT8: - statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((Float8Holder) minFuncHolder).value, + ((Float8Holder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case TIMESTAMP: - statistics = new MinMaxStatistics<>(((TimeStampHolder) minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((TimeStampHolder) minFuncHolder).value, + ((TimeStampHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); default: return null; } - statistics.setNullsCount((long) input.getStatistic(ColumnStatisticsKind.NULLS_COUNT)); - return statistics; } catch (Exception e) { - throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName() ); + throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName()); } } - public static class MinMaxStatistics implements ColumnStatistics { -private final V minVal; -private final V maxVal; -private final Comparator valueComparator; -private long nullsCount; - -public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) { - this.minVal = minVal; - this.maxVal = maxVal; - this.valueComparator = valueComparator; -} - -@Override -public Object getStatistic(StatisticsKind statisticsKind) { - switch (statisticsKind.getName()) { -case ExactStatisticsConstants.MIN_VALUE: - return minVal; -case ExactStatisticsConstants.MAX_VALUE: - return maxVal; -case ExactStatisticsConstants.NULLS_COUNT: - return nullsCount; -default: - return null; - } -} - -@Override -public boolean containsStatistic(StatisticsKind statisticsKind) { - switch (statisticsKind.getName()) { -case ExactStatisticsConstants.MIN_VALUE: -case ExactStatisticsConstants.MAX_VALUE: -case ExactStatisticsConstants.NULLS_COUNT: - return true; -default: - return false; - } -} - -@Override -public 
boolean containsExactStatistics(StatisticsKind statisticsKind) { - return true; -} - -@Override -public Comparator getValueComparator() { - return valueComparator; -
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871467#comment-16871467 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296787766 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java ## @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U logger.debug("All row groups have been filtered out. Add back one to get schema from scanner"); + Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream() + .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity())); + Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream() - .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity())); + .collect(Collectors.toMap(FileMetadata::getPath, Function.identity())); Multimap rowGroupsMap = LinkedListMultimap.create(); - getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry)); + getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry)); - builder.withRowGroups(rowGroupsMap) + filteredMetadata.withRowGroups(rowGroupsMap) .withTable(getTableMetadata()) + .withSegments(segmentsMap) .withPartitions(getNextOrEmpty(getPartitionsMetadata())) .withNonInterestingColumns(getNonInterestingColumnsMetadata()) .withFiles(filesMap) .withMatching(false); } -if (builder.getOverflowLevel() != MetadataLevel.NONE) { - logger.warn("applyFilter {} wasn't able to do pruning for all metadata levels filter condition, since metadata count for " + -"{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" + -"But underlying metadata was 
pruned without filter expression according to the metadata with above level.", - ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel()); +if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) { + if (logger.isWarnEnabled()) { Review comment: Agreed, this is very unlikely.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871470#comment-16871470 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296779015 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * General table information. + */ +public class TableInfo { + public static final String UNKNOWN = "UNKNOWN"; + public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN); + + private final String storagePlugin; + private final String workspace; + private final String name; + private final String type; + private final String owner; + + public TableInfo(String storagePlugin, String workspace, String name, String type, String owner) { Review comment: Thanks, done This is an automated message from the Apache Git Service. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871451#comment-16871451 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296753052 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.metastore.SchemaPathUtils; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.StatisticsKind; + +import java.util.Collection; +import java.util.Map; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group. + */ +public abstract class BaseMetadata implements Metadata { + protected final TableInfo tableInfo; + protected final MetadataInfo metadataInfo; + protected final TupleMetadata schema; + protected final Map columnsStatistics; + protected final Map statistics; + + protected > BaseMetadata(BaseMetadataBuilder builder) { +this.tableInfo = builder.tableInfo; +this.metadataInfo = builder.metadataInfo; +this.schema = builder.schema; +this.columnsStatistics = builder.columnsStatistics; +this.statistics = builder.statistics.stream() +.collect(Collectors.toMap( +statistic -> statistic.getStatisticsKind().getName(), +Function.identity(), +(a, b) -> a.getStatisticsKind().isExact() ? a : b)); + } + + @Override + public Map getColumnsStatistics() { +return columnsStatistics; + } + + @Override + public ColumnStatistics getColumnStatistics(SchemaPath columnName) { +return columnsStatistics.get(columnName); + } + + @Override + public TupleMetadata getSchema() { +return schema; + } + + @Override + @SuppressWarnings("unchecked") + public V getStatistic(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null ? 
statisticsHolder.getStatisticsValue() : null; + } + + @Override + public boolean containsExactStatistics(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact(); + } + + @Override + @SuppressWarnings("unchecked") + public V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) { +return (V) columnsStatistics.get(columnName).get(statisticsKind); + } + + @Override + public ColumnMetadata getColumn(SchemaPath name) { +return SchemaPathUtils.getColumnMetadata(name, schema); + } + + @Override + public TableInfo getTableInfo() { +return tableInfo; + } + + @Override + public MetadataInfo getMetadataInfo() { +return metadataInfo; + } + + public static abstract class BaseMetadataBuilder> { +protected TableInfo tableInfo; +protected MetadataInfo metadataInfo; +protected TupleMetadata schema; +protected Map columnsStatistics; +protected Collection statistics; + +public T withTableInfo(TableInfo tableInfo) { + this.tableInfo = tableInfo; + return self(); +} + +public T withMetadataInfo(MetadataInfo metadataInfo) { + this.metadataInfo = metadataInfo; + return self(); +} + +public T withSchema(TupleMetadata schema) { + this.schema = schema; + return self(); +} + +public T withColumnsStatistics(Map
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871445#comment-16871445 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296750708 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.metastore.SchemaPathUtils; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.StatisticsKind; + +import java.util.Collection; +import java.util.Map; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group. + */ +public abstract class BaseMetadata implements Metadata { + protected final TableInfo tableInfo; + protected final MetadataInfo metadataInfo; + protected final TupleMetadata schema; + protected final Map columnsStatistics; + protected final Map statistics; Review comment: Agree, `metadataStatistics` fits better, renamed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. 
Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871462#comment-16871462 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296762428 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() { return partitions; } + protected Map getSegmentsMetadata() { +if (segments == null) { + segments = metadataProvider.getSegmentsMetadataMap(); +} +return segments; + } + @JsonIgnore public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() { if (nonInterestingColumnsMetadata == null) { - nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta(); + nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata(); } return nonInterestingColumnsMetadata; } /** * This class is responsible for filtering different metadata levels. */ - protected abstract static class GroupScanWithMetadataFilterer { + protected abstract static class GroupScanWithMetadataFilterer> { protected final AbstractGroupScanWithMetadata source; protected boolean matchAllMetadata = false; protected TableMetadata tableMetadata; protected List partitions = Collections.emptyList(); +protected Map segments = Collections.emptyMap(); Review comment: Yes, this is expected. It may later be replaced with a regular list, and if no filtering happens, no new object is allocated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
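Editor's note on the comment above: the point about the empty default can be illustrated with a minimal Java sketch. The `Filterer` class, its `filter` rule, and the sample paths below are hypothetical stand-ins and not the Drill classes from the PR. The sketch only assumes the documented behavior of `Collections.emptyMap()`, which returns an immutable empty map and need not create a new object per call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class EmptyDefaultSketch {

  // Hypothetical stand-in for a filterer; not Drill's GroupScanWithMetadataFilterer.
  static class Filterer {
    // Immutable empty default; a real map is only allocated if filtering runs.
    private Map<String, Integer> segments = Collections.emptyMap();

    // Keeps only entries whose row count exceeds minRows (illustrative rule).
    void filter(Map<String, Integer> source, int minRows) {
      Map<String, Integer> kept = new HashMap<>();
      for (Map.Entry<String, Integer> entry : source.entrySet()) {
        if (entry.getValue() > minRows) {
          kept.put(entry.getKey(), entry.getValue());
        }
      }
      segments = kept;
    }

    Map<String, Integer> getSegments() {
      return segments;
    }
  }

  // Sample input with made-up segment paths and row counts.
  static Map<String, Integer> sample() {
    Map<String, Integer> source = new HashMap<>();
    source.put("/data/2019", 10);
    source.put("/data/2018", 0);
    return source;
  }

  public static void main(String[] args) {
    Filterer filterer = new Filterer();
    System.out.println(filterer.getSegments().isEmpty());
    filterer.filter(sample(), 0);
    System.out.println(filterer.getSegments().keySet());
  }
}
```

Callers can iterate the default without a null check, and the field is only reassigned once a filter actually produces a result.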
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871461#comment-16871461 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296770734 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Class which identifies specific metadata. Review comment: Thanks, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871452#comment-16871452 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296767257 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) { +protected void filterSegmentMetadata(OptionManager optionManager, + FilterPredicate filterPredicate, + Set schemaPathsInExpr) { if (!matchAllMetadata) { -if (!source.getPartitionsMetadata().isEmpty()) { - if (source.getPartitionsMetadata().size() <= optionManager.getOption( - PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) { +if (!source.getSegmentsMetadata().isEmpty()) { + if (source.getSegmentsMetadata().size() <= optionManager.getOption( + PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) { matchAllMetadata = true; -partitions = filterAndGetMetadata(schemaPathsInExpr, source.getPartitionsMetadata(), filterPredicate, optionManager); +segments = filterAndGetMetadata(schemaPathsInExpr, +source.getSegmentsMetadata().values(), +filterPredicate, +optionManager).stream() +.collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity())); Review comment: Thanks, formatted the code and added `BinaryOperator`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
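Editor's note on the `BinaryOperator` mentioned above: the two-argument `Collectors.toMap` throws `IllegalStateException` when two elements map to the same key, while the three-argument overload takes a merge function that resolves the collision. The sketch below is illustrative only. The `Segment` class is a hypothetical stand-in for `SegmentMetadata`, and the "keep the first entry" rule is an assumption, since the merge function actually chosen in the PR is not shown in this message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeSketch {

  // Hypothetical stand-in for SegmentMetadata: a path plus a row count.
  static final class Segment {
    private final String path;
    private final int rowCount;

    Segment(String path, int rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }

    String getPath() {
      return path;
    }

    int getRowCount() {
      return rowCount;
    }
  }

  // Indexes segments by path; on a duplicate path the first entry wins (assumed rule).
  static Map<String, Segment> byPath(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(
            Segment::getPath,
            Function.identity(),
            (first, second) -> first));
  }

  public static void main(String[] args) {
    List<Segment> segments = Arrays.asList(
        new Segment("/data/a", 1),
        new Segment("/data/b", 2),
        new Segment("/data/a", 3)); // duplicate key
    Map<String, Segment> indexed = byPath(segments);
    System.out.println(indexed.size());
    System.out.println(indexed.get("/data/a").getRowCount());
  }
}
```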
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871472#comment-16871472 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296781802 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonAutoDetect; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonPropertyOrder; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.metastore.TableMetadataUtils; + +import java.io.IOException; +import java.util.Collection; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER; + +/** + * Represents collection of statistics values for specific column. Review comment: Thanks, added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. 
Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify >
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871443#comment-16871443 ]

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296728354

## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java ##

@@ -228,5 +228,10 @@ public HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
     protected AbstractParquetGroupScan getNewScan() {
       return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) source);
     }
+
+    @Override
+    protected HiveDrillNativeParquetScanFilterer self() {

Review comment:
This method came from `GroupScanWithMetadataFilterer` and is used to return the correct type of `this` instance to avoid casts in parent classes.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. 
Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - 
path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12.
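The `self()` method discussed in the review comment on `HiveDrillNativeParquetScanFilterer` appears to follow the self-typed generic pattern: the parent class is parameterized by its own subclass, so fluent methods declared in the parent can return the subclass type without a cast. A minimal sketch, assuming hypothetical class names (`FiltererSketch`, `ParquetFiltererSketch`, `SelfTypeDemo`) that are not part of Drill:

```java
// Hypothetical classes for illustration only; not Drill's actual hierarchy.
abstract class FiltererSketch<B extends FiltererSketch<B>> {
  protected String filter;

  // Each concrete subclass returns `this`, typed as itself.
  protected abstract B self();

  // Declared once in the parent, yet returns the subclass type without a cast.
  public B withFilter(String filter) {
    this.filter = filter;
    return self();
  }
}

final class ParquetFiltererSketch extends FiltererSketch<ParquetFiltererSketch> {
  private int rowGroupLimit;

  @Override
  protected ParquetFiltererSketch self() {
    return this;
  }

  public ParquetFiltererSketch withRowGroupLimit(int rowGroupLimit) {
    this.rowGroupLimit = rowGroupLimit;
    return this;
  }

  public String describe() {
    return filter + ":" + rowGroupLimit;
  }
}

final class SelfTypeDemo {
  public static void main(String[] args) {
    // withFilter() is inherited, but the chain keeps the subclass type,
    // so the subclass-only withRowGroupLimit() can follow it directly.
    ParquetFiltererSketch f = new ParquetFiltererSketch()
        .withFilter("a > 1")
        .withRowGroupLimit(5);
    System.out.println(f.describe());
  }
}
```

Without `self()`, the parent would have to return its own type or perform an unchecked cast of `this`, which is the cast the review comment says the method avoids.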
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871471#comment-16871471 ]

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296794257

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java ##

@@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() {
       partitionsForValue.asMap().forEach((partitionKey, value) -> {
         Map columnsStatistics = new HashMap<>();
-        Map statistics = new HashMap<>();
+        List statistics = new ArrayList<>();
         partitionKey = partitionKey == NULL_VALUE ? null : partitionKey;
-        statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey);
-        statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey);
+        statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MIN_VALUE));
+        statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MAX_VALUE));

-        statistics.put(ColumnStatisticsKind.NULLS_COUNT, Statistic.NO_COLUMN_STATS);
-        statistics.put(TableStatisticsKind.ROW_COUNT, Statistic.NO_COLUMN_STATS);
+        statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, ColumnStatisticsKind.NULLS_COUNT));
+        statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, TableStatisticsKind.ROW_COUNT));
         columnsStatistics.put(partitionColumn,
-            new ColumnStatisticsImpl<>(statistics,
-                ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())));
-        partitions.add(new PartitionMetadata(partitionColumn, getTableMetadata().getSchema(),
-            columnsStatistics, statistics, (Set) value, tableName, -1));
+            new ColumnStatistics<>(statistics,
+                getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType()));
+        MetadataInfo metadataInfo = new MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null);
+        TableMetadata tableMetadata = getTableMetadata();
+        PartitionMetadata partitionMetadata = PartitionMetadata.builder()
+            .withTableInfo(tableMetadata.getTableInfo())
+            .withMetadataInfo(metadataInfo)
+            .withColumn(partitionColumn)
+            .withSchema(tableMetadata.getSchema())
+            .withColumnsStatistics(columnsStatistics)
+            .withStatistics(statistics)
+            .withPartitionValues(Collections.emptyList())
+            .withLocations((Set) value)

Review comment:
The cast is required because `HashMultimap.asMap()` returns a map with `Collection` values, even though `HashMultimap` stores its values in sets. To avoid problems if the `HashMultimap` implementation changes, I have replaced the cast with `new HashSet<>(value)`.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. 
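The review comment on `.withLocations((Set) value)` describes replacing a downcast with a defensive copy. A minimal sketch of that idea using only the JDK: a plain `Map<String, Collection<String>>` stands in for the view returned by Guava's `HashMultimap.asMap()`, and the class and method names (`LocationsCopySketch`, `toSets`) are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Drill or Guava.
final class LocationsCopySketch {

  // Copies every Collection value into a new HashSet instead of casting it.
  // This works whatever concrete Collection type the map happens to hold.
  static Map<String, Set<String>> toSets(Map<String, Collection<String>> byPartition) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    byPartition.forEach((key, value) -> result.put(key, new HashSet<>(value)));
    return result;
  }

  public static void main(String[] args) {
    // The value is a List here, so a cast to Set would fail at runtime with
    // ClassCastException, while the copy succeeds.
    List<String> locations = new ArrayList<>();
    locations.add("/data/a");
    locations.add("/data/b");
    Map<String, Collection<String>> input = new LinkedHashMap<>();
    input.put("partition-1", locations);
    System.out.println(toSets(input));
  }
}
```

The copy costs one extra allocation per partition but removes the dependency on which collection type backs the multimap view.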
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871458#comment-16871458 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296756147 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.hadoop.fs.Path; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * Base implementation of {@link TableMetadata} interface. 
+ */ +public class BaseTableMetadata extends BaseMetadata implements TableMetadata { + + public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1; + + private final Path location; + private final long lastModifiedTime; + private final Map partitionKeys; + private final List interestingColumns; + + private BaseTableMetadata(BaseTableMetadataBuilder builder) { +super(builder); +this.location = builder.location; +this.partitionKeys = builder.partitionKeys; +this.interestingColumns = builder.interestingColumns; +this.lastModifiedTime = builder.lastModifiedTime; + } + + public boolean isPartitionColumn(String fieldName) { +return partitionKeys.containsKey(fieldName); + } + + boolean isPartitioned() { +return !partitionKeys.isEmpty(); + } + + @Override + public Path getLocation() { +return location; + } + + @Override + public long getLastModifiedTime() { +return lastModifiedTime; + } + + @Override + public List getInterestingColumns() { +return interestingColumns; + } + + @Override + @SuppressWarnings("unchecked") + public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) { +Map mergedTableStatistics = new HashMap<>(this.statistics); + +// overrides statistics value for the case when new statistics is exact or existing one was estimated +tableStatistics.stream() +.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact() + || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact()) +.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder)); + +Map newColumnsStatistics = new HashMap<>(this.columnsStatistics); +this.columnsStatistics.forEach( +(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName; + +return BaseTableMetadata.builder() +.withTableInfo(tableInfo) +.withMetadataInfo(metadataInfo) +.withLocation(location) +.withSchema(schema) 
+.withColumnsStatistics(newColumnsStatistics) +.withStatistics(mergedTableStatistics.values()) +.withLastModifiedTime(lastModifiedTime) +.withPartitionKeys(partitionKeys) +.withInterestingColumns(interestingColumns) +.build(); + } + + public static BaseTableMetadataBuilder builder() { +return new BaseTableMetadataBuilder(); + } + + public static class BaseTableMetadataBuilder extends BaseMetadataBuilder { +private Path location; +private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME; +private Map partitionKeys; +private List interestingColumns; + +public BaseTableMetadataBuilder withLocation(Path location) { + this.location = location; + return self(); +} + +public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) { + this.lastModifiedTime = lastModifiedTime; + return self(); +} + +public BaseTableMetadataBuilder
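The `cloneWithStats` hunk in the BaseTableMetadata.java diff above merges table statistics under one rule: a new value overrides the stored one when the new value is exact or the stored one was only estimated. A minimal sketch of that rule, assuming hypothetical stand-in types (`StatSketch`, `StatsMergeSketch`) in place of Drill's `StatisticsHolder` and `StatisticsKind`. One further assumption is labeled in the code: a statistic with no stored counterpart is simply added, whereas the filter as quoted would dereference a missing map entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a statistics holder: a name, a value, an exactness flag.
final class StatSketch {
  final String name;
  final Object value;
  final boolean exact;

  StatSketch(String name, Object value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}

final class StatsMergeSketch {

  // Returns a copy of `existing` in which an incoming statistic replaces the
  // stored one when the incoming value is exact or the stored one is estimated.
  static Map<String, StatSketch> merge(Map<String, StatSketch> existing, List<StatSketch> incoming) {
    Map<String, StatSketch> merged = new HashMap<>(existing);
    for (StatSketch candidate : incoming) {
      StatSketch current = existing.get(candidate.name);
      // Assumption of this sketch: a missing entry counts as replaceable.
      boolean replace = candidate.exact || current == null || !current.exact;
      if (replace) {
        merged.put(candidate.name, candidate);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, StatSketch> existing = new HashMap<>();
    existing.put("rowCount", new StatSketch("rowCount", 100, true));
    List<StatSketch> incoming = new ArrayList<>();
    incoming.add(new StatSketch("rowCount", 90, false));
    // The exact stored value survives the estimated update.
    System.out.println(merge(existing, incoming).get("rowCount").value);
  }
}
```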
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871466#comment-16871466 ]

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296775175

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java ##

@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+/**
+ * Enum with possible types of metadata.
+ */
+public enum MetadataType {
+
+  ALL,
+
+  /**
+   * Table level metadata type.
+   */
+  TABLE,
+
+  /**
+   * Segment level metadata type. It corresponds to the metadata
+   * within specific directory for FS tables, or may correspond to partition for hive tables.
+   */
+  SEGMENT,
+
+  /**
+   * Drill partition level metadata type. It corresponds to parts of table data which has the same
+   * values within specific column, i.e. partitions discovered by Drill.
+   */
+  PARTITION,
+
+  /**
+   * File level metadata type.
+   */
+  FILE,
+
+  /**
+   * Row group level metadata type. Used for parquet tables.
+   */
+  ROW_GROUP,
+
+  NONE

Review comment:
1. Thanks, added.
2. It is used during filtering to indicate that filtering was finished and there was no metadata whose size exceeds `PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD`.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871449#comment-16871449 ]

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296743285

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ##

@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.statistics;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeInfo;
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.ObjectReader;
+import com.fasterxml.jackson.databind.ObjectWriter;
+
+import java.io.IOException;
+
+/**
+ * Class-holder for statistics kind and its value.
+ *
+ * @param Type of statistics value
+ */
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+public class StatisticsHolder {
+
+  public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer();

Review comment:
It is also used in `ColumnStatistics`. Set package default visibility.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information
> for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel
> to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871453#comment-16871453 ] ASF GitHub Bot commented on DRILL-7271: --- vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296753431 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.metastore.SchemaPathUtils; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.StatisticsKind; + +import java.util.Collection; +import java.util.Map; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group. + */ +public abstract class BaseMetadata implements Metadata { + protected final TableInfo tableInfo; + protected final MetadataInfo metadataInfo; + protected final TupleMetadata schema; + protected final Map columnsStatistics; + protected final Map statistics; + + protected > BaseMetadata(BaseMetadataBuilder builder) { +this.tableInfo = builder.tableInfo; +this.metadataInfo = builder.metadataInfo; +this.schema = builder.schema; +this.columnsStatistics = builder.columnsStatistics; +this.statistics = builder.statistics.stream() +.collect(Collectors.toMap( +statistic -> statistic.getStatisticsKind().getName(), +Function.identity(), +(a, b) -> a.getStatisticsKind().isExact() ? a : b)); + } + + @Override + public Map getColumnsStatistics() { +return columnsStatistics; + } + + @Override + public ColumnStatistics getColumnStatistics(SchemaPath columnName) { +return columnsStatistics.get(columnName); + } + + @Override + public TupleMetadata getSchema() { +return schema; + } + + @Override + @SuppressWarnings("unchecked") + public V getStatistic(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null ? 
statisticsHolder.getStatisticsValue() : null; + } + + @Override + public boolean containsExactStatistics(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact(); + } + + @Override + @SuppressWarnings("unchecked") + public V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) { +return (V) columnsStatistics.get(columnName).get(statisticsKind); + } + + @Override + public ColumnMetadata getColumn(SchemaPath name) { +return SchemaPathUtils.getColumnMetadata(name, schema); + } + + @Override + public TableInfo getTableInfo() { +return tableInfo; + } + + @Override + public MetadataInfo getMetadataInfo() { +return metadataInfo; + } + + public static abstract class BaseMetadataBuilder> { +protected TableInfo tableInfo; +protected MetadataInfo metadataInfo; +protected TupleMetadata schema; +protected Map columnsStatistics; +protected Collection statistics; + +public T withTableInfo(TableInfo tableInfo) { + this.tableInfo = tableInfo; + return self(); +} + +public T withMetadataInfo(MetadataInfo metadataInfo) { + this.metadataInfo = metadataInfo; + return self(); +} + +public T withSchema(TupleMetadata schema) { + this.schema = schema; + return self(); +} + +public T withColumnsStatistics(Map
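The `BaseMetadata` constructor in the diff above indexes a collection of statistics by kind name with `Collectors.toMap`, resolving duplicate names with the merge function `(a, b) -> a.getStatisticsKind().isExact() ? a : b`. A minimal sketch of that collector, assuming a hypothetical `HolderSketch` type in place of Drill's `StatisticsHolder`:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a statistics holder.
final class HolderSketch {
  final String name;
  final Object value;
  final boolean exact;

  HolderSketch(String name, Object value, boolean exact) {
    this.name = name;
    this.value = value;
    this.exact = exact;
  }
}

final class StatisticsIndexSketch {

  // Indexes holders by name. When two holders share a name, the one seen
  // first is kept if it is exact; otherwise the later one replaces it.
  static Map<String, HolderSketch> index(Collection<HolderSketch> holders) {
    return holders.stream()
        .collect(Collectors.toMap(
            holder -> holder.name,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }

  public static void main(String[] args) {
    List<HolderSketch> holders = new ArrayList<>();
    holders.add(new HolderSketch("rowCount", 90, false));
    holders.add(new HolderSketch("rowCount", 100, true));
    // The estimated first entry is replaced by the exact second one.
    System.out.println(index(holders).get("rowCount").value);
  }
}
```

Without the third argument, `Collectors.toMap` throws `IllegalStateException` on a duplicate key, so the merge function is what allows the same statistic kind to appear twice in the input.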
[jira] [Updated] (DRILL-6951) Merge row set based mock data source
[ https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6951: Labels: ready-to-commit (was: ) > Merge row set based mock data source > > > Key: DRILL-6951 > URL: https://issues.apache.org/jira/browse/DRILL-6951 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > > The mock reader framework is an obscure bit of code used in tests that > generates fake data for use in things like testing sort, filters and so on. > Because the mock reader is simple, it is a good demonstration case for the > new scanner framework based on the result set loader. This task merges the > existing work in migrating the mock data source into master via a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6951) Merge row set based mock data source
[ https://issues.apache.org/jira/browse/DRILL-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871190#comment-16871190 ] ASF GitHub Bot commented on DRILL-6951: --- arina-ielchiieva commented on issue #1809: DRILL-6951: Row set based mock data source URL: https://github.com/apache/drill/pull/1809#issuecomment-505004114 @paul-rogers looks good, please squash the commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (DRILL-7306) Disable "fast schema" batch for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871188#comment-16871188 ] ASF GitHub Bot commented on DRILL-7306: --- arina-ielchiieva commented on pull request #1813: DRILL-7306: Disable schema-only batch for new scan framework URL: https://github.com/apache/drill/pull/1813#discussion_r296713307 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyFormatPlugin.java ## @@ -398,6 +363,40 @@ public void addContext(UserException.Builder builder) { } } + /** + * Initialize the scan framework builder with standard options. + * Call this from the plugin-specific + * {@link #frameworkBuilder(OptionManager, EasySubScan)} method. + * The plugin can then customize/revise options as needed. Review comment: Please add two params to the Javadoc as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Disable "fast schema" batch for new scan framework > -- > > Key: DRILL-7306 > URL: https://issues.apache.org/jira/browse/DRILL-7306 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.17.0 > > > The EVF framework is set up to return a "fast schema" empty batch with only > schema as its first batch because, when the code was written, it seemed > that's how we wanted operators to work. However, DRILL-7305 notes that many > operators cannot handle empty batches. > Since the empty-batch bugs show that Drill does not, in fact, provide a "fast > schema" batch, this ticket asks to disable the feature in the new scan > framework. The feature is disabled with a config option; it can be re-enabled > if ever it is needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871133#comment-16871133 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296687506 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java ## @@ -281,31 +265,54 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U logger.debug("All row groups have been filtered out. Add back one to get schema from scanner"); + Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream() + .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity())); + Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream() - .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity())); + .collect(Collectors.toMap(FileMetadata::getPath, Function.identity())); Multimap rowGroupsMap = LinkedListMultimap.create(); - getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry)); + getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry)); - builder.withRowGroups(rowGroupsMap) + filteredMetadata.withRowGroups(rowGroupsMap) .withTable(getTableMetadata()) + .withSegments(segmentsMap) .withPartitions(getNextOrEmpty(getPartitionsMetadata())) .withNonInterestingColumns(getNonInterestingColumnsMetadata()) .withFiles(filesMap) .withMatching(false); } -if (builder.getOverflowLevel() != MetadataLevel.NONE) { - logger.warn("applyFilter {} wasn't able to do pruning for all metadata levels filter condition, since metadata count for " + -"{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" + -"But underlying 
metadata was pruned without filter expression according to the metadata with above level.", - ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel()); +if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) { + if (logger.isWarnEnabled()) { Review comment: No objections for this change but what are the odds of warn level being disabled? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey,
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871147#comment-16871147 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296693580 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.hadoop.fs.Path; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * Base implementation of {@link TableMetadata} interface. 
+ */ +public class BaseTableMetadata extends BaseMetadata implements TableMetadata { + + public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1; + + private final Path location; + private final long lastModifiedTime; + private final Map partitionKeys; + private final List interestingColumns; + + private BaseTableMetadata(BaseTableMetadataBuilder builder) { +super(builder); +this.location = builder.location; +this.partitionKeys = builder.partitionKeys; +this.interestingColumns = builder.interestingColumns; +this.lastModifiedTime = builder.lastModifiedTime; + } + + public boolean isPartitionColumn(String fieldName) { +return partitionKeys.containsKey(fieldName); + } + + boolean isPartitioned() { +return !partitionKeys.isEmpty(); + } + + @Override + public Path getLocation() { +return location; + } + + @Override + public long getLastModifiedTime() { +return lastModifiedTime; + } + + @Override + public List getInterestingColumns() { +return interestingColumns; + } + + @Override + @SuppressWarnings("unchecked") + public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) { +Map mergedTableStatistics = new HashMap<>(this.statistics); + +// overrides statistics value for the case when new statistics is exact or existing one was estimated +tableStatistics.stream() +.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact() + || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact()) +.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder)); + +Map newColumnsStatistics = new HashMap<>(this.columnsStatistics); +this.columnsStatistics.forEach( +(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName)))); + +return BaseTableMetadata.builder() +.withTableInfo(tableInfo) +.withMetadataInfo(metadataInfo) +.withLocation(location) +.withSchema(schema) 
+.withColumnsStatistics(newColumnsStatistics) +.withStatistics(mergedTableStatistics.values()) +.withLastModifiedTime(lastModifiedTime) +.withPartitionKeys(partitionKeys) +.withInterestingColumns(interestingColumns) +.build(); + } + + public static BaseTableMetadataBuilder builder() { +return new BaseTableMetadataBuilder(); + } + + public static class BaseTableMetadataBuilder extends BaseMetadataBuilder { +private Path location; +private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME; +private Map partitionKeys; +private List interestingColumns; + +public BaseTableMetadataBuilder withLocation(Path location) { + this.location = location; + return self(); +} + +public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) { + this.lastModifiedTime = lastModifiedTime; + return self(); +} + +public BaseTableMetadataBuilder
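The override rule in `cloneWithStats` in the diff above (a new statistic wins when it is exact, or when the existing one was only estimated) can be sketched in isolation. This is a minimal sketch under stated assumptions, not Drill's implementation: `Stat` is a hypothetical stand-in for `StatisticsHolder`, the statistic names are made up, and the `current == null` check is an addition for statistics that have no existing entry, a case in which the quoted filter appears to dereference a missing map entry.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StatsMergeSketch {

  // Hypothetical stand-in for StatisticsHolder: a named value that is either exact or estimated.
  static final class Stat {
    final String name;
    final long value;
    final boolean exact;

    Stat(String name, long value, boolean exact) {
      this.name = name;
      this.value = value;
      this.exact = exact;
    }
  }

  // Returns a copy of `existing` in which an incoming statistic replaces the current one
  // when the incoming one is exact, or when the current one is missing or only estimated.
  static Map<String, Stat> merge(Map<String, Stat> existing, List<Stat> incoming) {
    Map<String, Stat> merged = new HashMap<>(existing);
    for (Stat candidate : incoming) {
      Stat current = existing.get(candidate.name);
      // The null check is an assumption added for this sketch; it is not in the quoted diff.
      if (candidate.exact || current == null || !current.exact) {
        merged.put(candidate.name, candidate);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Stat> existing = new HashMap<>();
    existing.put("rowCount", new Stat("rowCount", 100, true));
    existing.put("nullsCount", new Stat("nullsCount", 5, false));

    Map<String, Stat> merged = merge(existing, Arrays.asList(
        new Stat("rowCount", 90, false),     // estimated: does not replace the exact value
        new Stat("nullsCount", 7, false)));  // estimated: replaces the estimated value

    System.out.println(merged.get("rowCount").value + " " + merged.get("nullsCount").value);
  }
}
```

As in the quoted code, each incoming statistic is compared against the original map rather than the partially merged one, so the order of the incoming list only matters when it contains two entries with the same name.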
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871134#comment-16871134 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296686176 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java ## @@ -210,6 +213,17 @@ public int getMaxParallelizationWidth() { return readEntries; } + /** + * {@inheritDoc} + * + * - if file metadata was pruned, prunes underlying metadata Review comment: Not sure we need a dash here; can this be covered with a nested list? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. 
Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - 
path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871137#comment-16871137 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296684303 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set schemaPathsInExpr) { +protected void filterSegmentMetadata(OptionManager optionManager, + FilterPredicate filterPredicate, + Set schemaPathsInExpr) { if (!matchAllMetadata) { -if (!source.getPartitionsMetadata().isEmpty()) { - if (source.getPartitionsMetadata().size() <= optionManager.getOption( - PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) { +if (!source.getSegmentsMetadata().isEmpty()) { + if (source.getSegmentsMetadata().size() <= optionManager.getOption( + PlannerSettings.PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD)) { matchAllMetadata = true; -partitions = filterAndGetMetadata(schemaPathsInExpr, source.getPartitionsMetadata(), filterPredicate, optionManager); +segments = filterAndGetMetadata(schemaPathsInExpr, +source.getSegmentsMetadata().values(), +filterPredicate, +optionManager).stream() +.collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity())); Review comment: ```suggestion .collect(Collectors.toMap( SegmentMetadata::getPath, Function.identity())); ``` Plus, what about handling duplicates? It would be safer to add `(o, n) -> n`, but only if you did not intend to fail on duplicates. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
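The duplicate-handling question raised in the review comment above comes down to which `Collectors.toMap` overload is used. The following is a minimal sketch, assuming a hypothetical `Segment` class in place of `SegmentMetadata` (only `getPath()` is modeled, and `version` exists purely to tell duplicates apart): the two-argument overload throws `IllegalStateException` on a repeated key, while the three-argument overload with `(o, n) -> n` keeps the later element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ToMapDuplicates {

  // Hypothetical stand-in for SegmentMetadata; only the path is relevant here.
  static final class Segment {
    final String path;
    final int version;

    Segment(String path, int version) {
      this.path = path;
      this.version = version;
    }

    String getPath() {
      return path;
    }
  }

  // Two-argument toMap: a duplicate path makes the collector throw IllegalStateException.
  static Map<String, Segment> indexStrict(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity()));
  }

  // Three-argument toMap with (o, n) -> n: on a duplicate path the later element wins.
  static Map<String, Segment> indexLastWins(List<Segment> segments) {
    return segments.stream()
        .collect(Collectors.toMap(Segment::getPath, Function.identity(), (o, n) -> n));
  }

  public static void main(String[] args) {
    List<Segment> segments = Arrays.asList(
        new Segment("/t/a", 1), new Segment("/t/b", 1), new Segment("/t/a", 2));

    System.out.println(indexLastWins(segments).get("/t/a").version);

    try {
      indexStrict(segments);
    } catch (IllegalStateException e) {
      System.out.println("duplicate key rejected");
    }
  }
}
```

Which variant is appropriate depends on whether a repeated path signals a bug that should surface (strict) or an expected overlap that can be resolved silently (last wins); the thread does not say which the author intended.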
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871142#comment-16871142 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296690973 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Class which identifies specific metadata. Review comment: Please write better java doc: "Class that specifies metadata type ..." and provide an example. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871152#comment-16871152 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296691929 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.hadoop.fs.Path; + +import java.util.List; +import java.util.Objects; +import java.util.Set; + +/** + * Represents a metadata for the table part, which corresponds to the specific partition key. 
+ */ +public class PartitionMetadata extends BaseMetadata { + private final SchemaPath column; + private final List partitionValues; + private final Set locations; + private final long lastModifiedTime; + + private PartitionMetadata(PartitionMetadataBuilder builder) { +super(builder); +this.column = builder.column; +this.partitionValues = builder.partitionValues; +this.locations = builder.locations; +this.lastModifiedTime = builder.lastModifiedTime; + } + + /** + * It allows to obtain the column path for this partition + * + * @return column path + */ + public SchemaPath getColumn() { +return column; + } + + /** + * File locations for this partition + * + * @return file locations + */ + public Set getLocations() { +return locations; + } + + /** + * It allows to check the time, when any files were modified. It is in Unix Timestamp Review comment: Add timestamp unit of measurement. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871150#comment-16871150 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296694014 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.record.metadata.ColumnMetadata; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.metastore.SchemaPathUtils; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.drill.metastore.statistics.StatisticsKind; + +import java.util.Collection; +import java.util.Map; +import java.util.Objects; +import java.util.function.Function; +import java.util.stream.Collectors; + +/** + * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group. + */ +public abstract class BaseMetadata implements Metadata { + protected final TableInfo tableInfo; + protected final MetadataInfo metadataInfo; + protected final TupleMetadata schema; + protected final Map columnsStatistics; + protected final Map statistics; + + protected > BaseMetadata(BaseMetadataBuilder builder) { +this.tableInfo = builder.tableInfo; +this.metadataInfo = builder.metadataInfo; +this.schema = builder.schema; +this.columnsStatistics = builder.columnsStatistics; +this.statistics = builder.statistics.stream() +.collect(Collectors.toMap( +statistic -> statistic.getStatisticsKind().getName(), +Function.identity(), +(a, b) -> a.getStatisticsKind().isExact() ? a : b)); + } + + @Override + public Map getColumnsStatistics() { +return columnsStatistics; + } + + @Override + public ColumnStatistics getColumnStatistics(SchemaPath columnName) { +return columnsStatistics.get(columnName); + } + + @Override + public TupleMetadata getSchema() { +return schema; + } + + @Override + @SuppressWarnings("unchecked") + public V getStatistic(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null ? 
statisticsHolder.getStatisticsValue() : null; + } + + @Override + public boolean containsExactStatistics(StatisticsKind statisticsKind) { +StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName()); +return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact(); + } + + @Override + @SuppressWarnings("unchecked") + public V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) { +return (V) columnsStatistics.get(columnName).get(statisticsKind); + } + + @Override + public ColumnMetadata getColumn(SchemaPath name) { +return SchemaPathUtils.getColumnMetadata(name, schema); + } + + @Override + public TableInfo getTableInfo() { +return tableInfo; + } + + @Override + public MetadataInfo getMetadataInfo() { +return metadataInfo; + } + + public static abstract class BaseMetadataBuilder> { +protected TableInfo tableInfo; +protected MetadataInfo metadataInfo; +protected TupleMetadata schema; +protected Map columnsStatistics; +protected Collection statistics; + +public T withTableInfo(TableInfo tableInfo) { + this.tableInfo = tableInfo; + return self(); +} + +public T withMetadataInfo(MetadataInfo metadataInfo) { + this.metadataInfo = metadataInfo; + return self(); +} + +public T withSchema(TupleMetadata schema) { + this.schema = schema; + return self(); +} + +public T withColumnsStatistics(Map
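The `BaseMetadata` constructor quoted above collects statistics into a map keyed by kind name and, when two entries share a key, keeps the first one if its kind is exact and otherwise the second. A minimal sketch of that merge rule follows; the `Stat` class and its fields are hypothetical stand-ins for `StatisticsHolder` and `StatisticsKind`, not Drill's types.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in types; only the merge rule mirrors the quoted code.
final class ExactStatisticsMergeDemo {

  static final class Stat {
    final String kind;
    final boolean exact;
    final long value;

    Stat(String kind, boolean exact, long value) {
      this.kind = kind;
      this.exact = exact;
      this.value = value;
    }
  }

  /** Groups statistics by kind name; on a key clash, keeps 'a' if it is exact, else 'b'. */
  static Map<String, Stat> byKind(List<Stat> stats) {
    return stats.stream()
        .collect(Collectors.toMap(
            s -> s.kind,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }

  public static void main(String[] args) {
    List<Stat> stats = java.util.Arrays.asList(
        new Stat("rowCount", false, 100L),  // estimated
        new Stat("rowCount", true, 97L));   // exact
    System.out.println(byKind(stats).get("rowCount").value);
  }
}
```

Note that when neither entry is exact, this rule keeps the later one, so the result then depends on iteration order.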
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871156#comment-16871156 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296696051 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/ColumnStatistics.java ## @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonAutoDetect; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonPropertyOrder; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import org.apache.drill.common.types.TypeProtos; +import org.apache.drill.metastore.TableMetadataUtils; + +import java.io.IOException; +import java.util.Collection; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Map; +import java.util.function.Function; +import java.util.stream.Collectors; + +import static org.apache.drill.metastore.statistics.StatisticsHolder.OBJECT_WRITER; + +/** + * Represents collection of statistics values for specific column. Review comment: Can you please add example. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871132#comment-16871132 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296685467

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillStatsTable.java ##
@@ -452,53 +456,54 @@ public static ObjectMapper getMapper() {
         .addDeserializer(TypeProtos.MajorType.class, new MajorTypeSerDe.De())
         .addDeserializer(SchemaPath.class, new SchemaPath.De());
     mapper.registerModule(deModule);
+    mapper.registerSubtypes(new NamedType(NumericEquiDepthHistogram.class, "numeric-equi-depth"));

Review comment:
Do you think it makes sense to add `histogram` word as well: `numeric-equi-depth-histogram`?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871131#comment-16871131 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296682838 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ## @@ -666,27 +733,66 @@ protected void filterTableMetadata(FilterPredicate filterPredicate, Set Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > org.apache.drill.metastore.RowGroupMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} > 8. Remove org.apache.drill.exec package from metastore module. > 9. Rename ColumnStatisticsImpl class. > 10. 
Separate existing classes in org.apache.drill.metastore package into > sub-packages. > 11. Rename FileTableMetadata -> BaseTableMetadata > 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> > getNonInterestingColumnsMetadata > 13. Introduce segment-level metadata class: > {noformat} > class SegmentMetadata { > TableInfo tableInfo; > MetadataInfo metadataInfo; > SchemaPath column; > TupleMetadata schema; > String location; > Map columnsStatistics; > Map statistics; > List partitionValues; > List locations; > long lastModifiedTime; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871141#comment-16871141 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296691350 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Enum with possible types of metadata. + */ +public enum MetadataType { + + ALL, + + /** + * Table level metadata type. + */ + TABLE, + + /** + * Segment level metadata type. It corresponds to the metadata + * within specific directory for FS tables, or may correspond to partition for hive tables. + */ + SEGMENT, + + /** + * Drill partition level metadata type. It corresponds to parts of table data which has the same + * values within specific column, i.e. partitions discovered by Drill. + */ + PARTITION, + + /** + * File level metadata type. 
+ */ + FILE, + + /** + * Row group level metadata type. Used for parquet tables. + */ + ROW_GROUP, + + NONE Review comment: 1. Add java doc 2. Where none can be used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871136#comment-16871136 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682028

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ##
@@ -535,31 +581,39 @@ public TableMetadata getTableMetadata() {
     return partitions;
   }

+  protected Map getSegmentsMetadata() {
+    if (segments == null) {
+      segments = metadataProvider.getSegmentsMetadataMap();
+    }
+    return segments;
+  }
+
   @JsonIgnore
   public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
     if (nonInterestingColumnsMetadata == null) {
-      nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMeta();
+      nonInterestingColumnsMetadata = metadataProvider.getNonInterestingColumnsMetadata();
     }
     return nonInterestingColumnsMetadata;
   }

   /**
    * This class is responsible for filtering different metadata levels.
    */
-  protected abstract static class GroupScanWithMetadataFilterer {
+  protected abstract static class GroupScanWithMetadataFilterer> {
     protected final AbstractGroupScanWithMetadata source;

     protected boolean matchAllMetadata = false;

     protected TableMetadata tableMetadata;
     protected List partitions = Collections.emptyList();
+    protected Map segments = Collections.emptyMap();

Review comment:
Using Collections emptyMap or emptyList creates unmodifiable objects, is this expected?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
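To illustrate the reviewer's point about `Collections.emptyMap()` and `Collections.emptyList()`: both return immutable collections, so any later attempt to add to a field initialized this way throws `UnsupportedOperationException`. The sketch below is a standalone demonstration; the class `EmptyMapDemo` is hypothetical and unrelated to Drill's code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: shows that Collections.emptyMap() rejects writes.
final class EmptyMapDemo {

  /** Returns true if an entry could be added to the map, false if the map is unmodifiable. */
  static boolean putSucceeds(Map<String, String> map) {
    try {
      map.put("key", "value");
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(putSucceeds(Collections.emptyMap())); // immutable default
    System.out.println(putSucceeds(new HashMap<>()));        // mutable alternative
  }
}
```

An immutable default is safe as long as the field is only ever reassigned, never mutated in place; if in-place mutation is intended, a fresh `HashMap` or `ArrayList` would be the mutable alternative.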
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871125#comment-16871125 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296615995 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); + private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class); + + private final T statisticsValue; + private final BaseStatisticsKind statisticsKind; + + @JsonCreator + public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue, + @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = statisticsKind; + } + + public StatisticsHolder(T statisticsValue, + StatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = (BaseStatisticsKind) statisticsKind; + } + + @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, +include = JsonTypeInfo.As.WRAPPER_OBJECT) + public T getStatisticsValue() { +return statisticsValue; + } + + public StatisticsKind getStatisticsKind() { +return statisticsKind; + } + + public static StatisticsHolder deserialize(String serialized) throws IOException { +return OBJECT_READER.readValue(serialized); + } + + public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException 
{ Review comment: Should be class level method without parameters: `public String toJsonString()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { >
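The suggestion above is to replace the static `serialize(StatisticsHolder)` with a parameterless instance method `toJsonString()`. A rough sketch of the two shapes follows. It is an illustration only: `HolderSketch` is a hypothetical class, and the hand-built JSON string stands in for the Jackson `ObjectWriter` used in the quoted code.

```java
// Hypothetical stand-in for StatisticsHolder; the JSON is built by hand
// purely for illustration, where the real class delegates to Jackson.
final class HolderSketch {
  private final String statisticsKind;
  private final long statisticsValue;

  HolderSketch(String statisticsKind, long statisticsValue) {
    this.statisticsKind = statisticsKind;
    this.statisticsValue = statisticsValue;
  }

  /** Static shape, as in the quoted diff: the holder is passed as an argument. */
  static String serialize(HolderSketch holder) {
    return holder.toJsonString();
  }

  /** Instance shape, as the reviewer proposes: the holder serializes itself. */
  String toJsonString() {
    return "{\"statisticsKind\":\"" + statisticsKind + "\","
        + "\"statisticsValue\":" + statisticsValue + "}";
  }

  public static void main(String[] args) {
    HolderSketch holder = new HolderSketch("rowCount", 97L);
    System.out.println(holder.toJsonString());
  }
}
```

The instance form reads more naturally at call sites (`holder.toJsonString()`), while deserialization has no instance to call on and would remain a static factory.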
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871140#comment-16871140 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296692394 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.hadoop.fs.Path; + +import java.util.List; +import java.util.Objects; +import java.util.Set; + +/** + * Represents a metadata for the table part, which corresponds to the specific partition key. 
+ */ +public class PartitionMetadata extends BaseMetadata { + private final SchemaPath column; + private final List partitionValues; + private final Set locations; + private final long lastModifiedTime; + + private PartitionMetadata(PartitionMetadataBuilder builder) { +super(builder); +this.column = builder.column; +this.partitionValues = builder.partitionValues; +this.locations = builder.locations; +this.lastModifiedTime = builder.lastModifiedTime; + } + + /** + * It allows to obtain the column path for this partition + * + * @return column path + */ + public SchemaPath getColumn() { +return column; + } + + /** + * File locations for this partition + * + * @return file locations + */ + public Set getLocations() { +return locations; + } + + /** + * It allows to check the time, when any files were modified. It is in Unix Timestamp + * + * @return last modified time of files + */ + public long getLastModifiedTime() { +return lastModifiedTime; + } + + public List getPartitionValues() { +return partitionValues; + } + + public static PartitionMetadataBuilder builder() { +return new PartitionMetadataBuilder(); + } + + public static class PartitionMetadataBuilder extends BaseMetadataBuilder { +private SchemaPath column; +private List partitionValues; +private Set locations; +private long lastModifiedTime = BaseTableMetadata.NON_DEFINED_LAST_MODIFIED_TIME; + +public PartitionMetadataBuilder withLocations(Set locations) { Review comment: I think you can omit adding with, example: `locations`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
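The naming suggestion in the review comment above (builder methods named after the property, without the `with` prefix) can be sketched with a small stand-alone builder. This is an illustration only: `PartitionSketch`, its `String` locations and the `-1` "not defined" marker are hypothetical stand-ins, not the actual `PartitionMetadata` class from the pull request.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for PartitionMetadata; only the builder naming matters here.
final class PartitionSketch {
  private final Set<String> locations;
  private final long lastModifiedTime;

  private PartitionSketch(Builder builder) {
    // Defensive copy so later changes to the caller's set do not leak in.
    this.locations = Collections.unmodifiableSet(new LinkedHashSet<>(builder.locations));
    this.lastModifiedTime = builder.lastModifiedTime;
  }

  Set<String> getLocations() {
    return locations;
  }

  long getLastModifiedTime() {
    return lastModifiedTime;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private Set<String> locations = Collections.emptySet();
    private long lastModifiedTime = -1L; // assumed "not defined" marker

    // Named after the property itself, i.e. locations(...) rather than withLocations(...).
    Builder locations(Set<String> locations) {
      this.locations = locations;
      return this;
    }

    Builder lastModifiedTime(long lastModifiedTime) {
      this.lastModifiedTime = lastModifiedTime;
      return this;
    }

    PartitionSketch build() {
      return new PartitionSketch(this);
    }
  }

  public static void main(String[] args) {
    PartitionSketch partition = PartitionSketch.builder()
        .locations(Collections.singleton("/tmp/part=1"))
        .lastModifiedTime(1561334400000L)
        .build();
    System.out.println(partition.getLocations() + " @ " + partition.getLastModifiedTime());
  }
}
```

Both naming styles behave identically; the choice is purely a matter of API style, and the thread does not record which one was finally adopted.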
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871135#comment-16871135 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296691160 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataType.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Enum with possible types of metadata. + */ +public enum MetadataType { + + ALL, Review comment: java doc: "Metadata that can be applicable to any type" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
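As a rough illustration of the Javadoc request above, together with the `MetadataInfo` class outlined in the DRILL-7271 description, a minimal sketch might look as follows. Only the `ALL` and `SEGMENT` constants are named in this thread, so the enum is deliberately incomplete; the constructor, the getters and the Javadoc text for `SEGMENT` are assumptions made for the example.

```java
// Sketch of MetadataInfo as outlined in the issue description (constructor and getters assumed).
final class MetadataInfo {
  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";

  private final MetadataType type;
  private final String key;
  private final String identifier;

  MetadataInfo(MetadataType type, String key, String identifier) {
    this.type = type;
    this.key = key;
    this.identifier = identifier;
  }

  MetadataType getType() {
    return type;
  }

  String getKey() {
    return key;
  }

  String getIdentifier() {
    return identifier;
  }

  public static void main(String[] args) {
    MetadataInfo info = new MetadataInfo(MetadataType.SEGMENT, DEFAULT_SEGMENT_KEY, null);
    System.out.println(info.getType() + " / " + info.getKey());
  }
}

/**
 * Enum with possible types of metadata.
 * Only the constants mentioned in the thread are listed; the real enum has more.
 */
enum MetadataType {

  /** Metadata that can be applicable to any type. */
  ALL,

  /** Assumed wording: metadata that belongs to a segment of a table. */
  SEGMENT
}
```

Per-constant Javadoc of this kind shows up in IDE tooltips and generated documentation, which is presumably why the reviewer asked for it.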
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871126#comment-16871126 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296615585 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); + private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class); + + private final T statisticsValue; + private final BaseStatisticsKind statisticsKind; + + @JsonCreator + public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue, + @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = statisticsKind; + } + + public StatisticsHolder(T statisticsValue, + StatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = (BaseStatisticsKind) statisticsKind; + } + + @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, +include = JsonTypeInfo.As.WRAPPER_OBJECT) + public T getStatisticsValue() { +return statisticsValue; + } + + public StatisticsKind getStatisticsKind() { +return statisticsKind; + } + + public static StatisticsHolder deserialize(String serialized) throws IOException { Review comment: Rename: `deserialize` -> `of`, `serialized` -> `jsonString` This is an automated message from the Apache Git Service. 
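The API shape requested in the review comments on `StatisticsHolder` (a static factory `of(String jsonString)` in place of `deserialize`, and an instance method `toJsonString()` in place of a static `serialize`) can be sketched as below. To stay dependency-free, this example does NOT use Jackson: `SimpleStatisticsHolder` is a hypothetical, simplified stand-in with a hand-written, deliberately naive JSON encoding, whereas the class under review delegates to Jackson's `ObjectReader` and `ObjectWriter`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for StatisticsHolder: one long value plus a kind name.
final class SimpleStatisticsHolder {
  // Accepts exactly the format produced by toJsonString(); not a general JSON parser.
  private static final Pattern JSON = Pattern.compile(
      "\\{\"statisticsKind\":\"([^\"\\\\]*)\",\"statisticsValue\":(-?\\d+)\\}");

  private final String statisticsKind;
  private final long statisticsValue;

  SimpleStatisticsHolder(String statisticsKind, long statisticsValue) {
    // Keep the naive encoding safe: no characters that would need JSON escaping.
    if (statisticsKind.indexOf('"') >= 0 || statisticsKind.indexOf('\\') >= 0) {
      throw new IllegalArgumentException("kind must not contain quotes or backslashes");
    }
    this.statisticsKind = statisticsKind;
    this.statisticsValue = statisticsValue;
  }

  String getStatisticsKind() {
    return statisticsKind;
  }

  long getStatisticsValue() {
    return statisticsValue;
  }

  // Static factory, named `of` with a `jsonString` parameter as suggested in the review.
  static SimpleStatisticsHolder of(String jsonString) {
    Matcher matcher = JSON.matcher(jsonString.trim());
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Unsupported JSON: " + jsonString);
    }
    return new SimpleStatisticsHolder(matcher.group(1), Long.parseLong(matcher.group(2)));
  }

  // Instance-level method without parameters, as suggested in the review.
  String toJsonString() {
    return "{\"statisticsKind\":\"" + statisticsKind
        + "\",\"statisticsValue\":" + statisticsValue + "}";
  }

  public static void main(String[] args) {
    String json = new SimpleStatisticsHolder("rowCount", 42L).toJsonString();
    SimpleStatisticsHolder restored = SimpleStatisticsHolder.of(json);
    System.out.println(json + " -> " + restored.getStatisticsKind()
        + "=" + restored.getStatisticsValue());
  }
}
```

With this shape the call site reads `holder.toJsonString()` and `SimpleStatisticsHolder.of(json)`, so the object being serialized no longer has to be passed as an argument.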
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871130#comment-16871130 ]

ASF GitHub Bot commented on DRILL-7271:
---
arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296682165

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ##
@@ -572,34 +626,39 @@ public GroupScanWithMetadataFilterer(AbstractGroupScanWithMetadata source) {
      */
     public abstract AbstractGroupScanWithMetadata build();

-    public GroupScanWithMetadataFilterer withTable(TableMetadata tableMetadata) {
+    public B withTable(TableMetadata tableMetadata) {
       this.tableMetadata = tableMetadata;
-      return this;
+      return self();

Review comment:
Why is the `self()` method better than returning `this`?

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12.
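One common answer to the `self()` question raised in this message is the self-typed (recursive generic) builder: when a base builder is parameterized as `B extends Base<B>`, its chained methods can return the concrete subclass type, so subclass-specific methods stay reachable in a call chain. The sketch below uses hypothetical names (`ScanFilterer`, `ParquetFilterer`) and plain strings; it is not the Drill filterer hierarchy, and the thread itself does not record the author's reply.

```java
// Demo entry point; all class names in this sketch are hypothetical.
final class SelfTypedBuilderDemo {
  public static void main(String[] args) {
    // withTable(...) is declared in the base class but returns ParquetFilterer,
    // so the subclass-only method can follow it in the same chain.
    String plan = new ParquetFilterer()
        .withTable("orders")
        .withRowGroupPruning(true)
        .build();
    System.out.println(plan);
  }
}

abstract class ScanFilterer<B extends ScanFilterer<B>> {
  protected String table;

  // Each concrete subclass returns itself, typed as B.
  protected abstract B self();

  public B withTable(String table) {
    this.table = table;
    // Returning `this` here would require an unchecked cast to B.
    return self();
  }

  public abstract String build();
}

final class ParquetFilterer extends ScanFilterer<ParquetFilterer> {
  private boolean rowGroupPruning;

  @Override
  protected ParquetFilterer self() {
    return this;
  }

  public ParquetFilterer withRowGroupPruning(boolean enabled) {
    this.rowGroupPruning = enabled;
    return this;
  }

  @Override
  public String build() {
    return table + (rowGroupPruning ? " [row-group pruning]" : "");
  }
}
```

If `withTable` returned the base type instead, the chain above would not compile, because `withRowGroupPruning` is not declared on the base class. The cost is one extra abstract method per hierarchy.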
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871153#comment-16871153 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296694427 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;

Review comment:
What is the difference between statistics and column statistics? Maybe statistics should be named better, for example, generalStatistics or metadataStatistics? I think for Metastore we used `metadataStatistics` naming ...
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871145#comment-16871145 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296696592 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); + private static final ObjectReader OBJECT_READER = new ObjectMapper().readerFor(StatisticsHolder.class); + + private final T statisticsValue; + private final BaseStatisticsKind statisticsKind; + + @JsonCreator + public StatisticsHolder(@JsonProperty("statisticsValue") T statisticsValue, + @JsonProperty("statisticsKind") BaseStatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = statisticsKind; + } + + public StatisticsHolder(T statisticsValue, + StatisticsKind statisticsKind) { +this.statisticsValue = statisticsValue; +this.statisticsKind = (BaseStatisticsKind) statisticsKind; + } + + @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, +include = JsonTypeInfo.As.WRAPPER_OBJECT) + public T getStatisticsValue() { +return statisticsValue; + } + + public StatisticsKind getStatisticsKind() { +return statisticsKind; + } + + public static StatisticsHolder deserialize(String serialized) throws IOException { +return OBJECT_READER.readValue(serialized); + } + + public static String serialize(StatisticsHolder statisticsHolder) throws JsonProcessingException 
{

Review comment:
Please apply the same for other classes.
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871148#comment-16871148 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296691639 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/PartitionMetadata.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.hadoop.fs.Path; + +import java.util.List; +import java.util.Objects; +import java.util.Set; + +/** + * Represents a metadata for the table part, which corresponds to the specific partition key. 
+ */
+public class PartitionMetadata extends BaseMetadata {
+  private final SchemaPath column;
+  private final List partitionValues;
+  private final Set locations;
+  private final long lastModifiedTime;
+
+  private PartitionMetadata(PartitionMetadataBuilder builder) {
+    super(builder);
+    this.column = builder.column;
+    this.partitionValues = builder.partitionValues;
+    this.locations = builder.locations;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  /**
+   * It allows to obtain the column path for this partition
+   *
+   * @return column path
+   */
+  public SchemaPath getColumn() {
+    return column;
+  }
+
+  /**
+   * File locations for this partition
+   *
+   * @return file locations
+   */
+  public Set getLocations() {
+    return locations;
+  }
+
+  /**
+   * It allows to check the time, when any files were modified. It is in Unix Timestamp

Review comment:
```suggestion
   * Allows to check the time, when any files were modified. It is in Unix Timestamp
```
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871144#comment-16871144 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296689298

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/TableMetadataUtils.java ##
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.metastore.metadata.BaseMetadata;
+import org.apache.drill.metastore.metadata.TableMetadata;
+import org.apache.drill.metastore.statistics.CollectableColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.ColumnStatisticsKind;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.TableStatisticsKind;
+import org.apache.drill.shaded.guava.com.google.common.primitives.UnsignedBytes;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class TableMetadataUtils {
+
+  private TableMetadataUtils() {

Review comment:
Again, no objections but just per my opinion this is an overhead.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871143#comment-16871143 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296693933

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > BaseMetadata(BaseMetadataBuilder builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map columnsStatistics;
+    protected Collection statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {
+      this.tableInfo = tableInfo;
+      return self();
+    }
+
+    public T withMetadataInfo(MetadataInfo metadataInfo) {
+      this.metadataInfo = metadataInfo;
+      return self();
+    }
+
+    public T withSchema(TupleMetadata schema) {
+      this.schema = schema;
+      return self();
+    }
+
+    public T withColumnsStatistics(Map
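The constructor quoted in this message indexes statistics by kind name and, when two entries share a name, keeps the first one if it is exact and the second one otherwise. The sketch below isolates that merge rule so it can be run on its own. `DemoStatistic` and `ExactStatisticsMergeDemo` are hypothetical stand-ins introduced for illustration; they are not Drill classes and do not reproduce Drill's `StatisticsHolder` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a statistics holder: a kind name, an exactness flag and a value.
final class DemoStatistic {
  final String kindName;
  final boolean exact;
  final long value;

  DemoStatistic(String kindName, boolean exact, long value) {
    this.kindName = kindName;
    this.exact = exact;
    this.value = value;
  }
}

final class ExactStatisticsMergeDemo {

  // Indexes statistics by kind name. On a name collision the first entry wins
  // if it is exact; otherwise the second entry replaces it.
  static Map<String, DemoStatistic> indexByKind(List<DemoStatistic> statistics) {
    return statistics.stream()
        .collect(Collectors.toMap(
            statistic -> statistic.kindName,
            Function.identity(),
            (a, b) -> a.exact ? a : b));
  }

  public static void main(String[] args) {
    Map<String, DemoStatistic> merged = indexByKind(Arrays.asList(
        new DemoStatistic("rowCount", false, 100),
        new DemoStatistic("rowCount", true, 97),
        new DemoStatistic("nullsCount", true, 3)));
    // The estimated rowCount (100) is replaced by the exact one (97).
    System.out.println(merged.get("rowCount").value);
  }
}
```

Without the third argument, `Collectors.toMap` throws `IllegalStateException` on duplicate keys, so a merge function of this kind is what allows an estimated and an exact value of the same statistic to be passed in together.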
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871128#comment-16871128 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296680787

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java ##
@@ -221,6 +229,31 @@ public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesCont
     if ( ! skipRuntimePruning ) { setFilter(filterExpr); }
   }
+  /**
+   * Applies specified filter {@code filterExpr} to current group scan and produces filtering at:
+   *
+   * table level:
+   * - if filter matches all the the data or prunes all the data, sets corresponding value to

Review comment:
I believe html formatting has notion of nested lists rather than doing custom paragraph with dash.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
> 11. Rename FileTableMetadata ->
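The review comment in this message asks for nested HTML lists in the Javadoc instead of dash-prefixed paragraphs. A minimal sketch of that layout follows. The class, the method and every list entry are hypothetical placeholders, because the quoted Javadoc is cut off after its first item; only the `<ul>`/`<li>` nesting is the point.

```java
// Hypothetical class and method; only the Javadoc layout matters in this sketch.
final class NestedListJavadocDemo {

  /**
   * Reports the level at which filtering was resolved:
   * <ul>
   *   <li>table level:
   *     <ul>
   *       <li>placeholder for the first table-level outcome</li>
   *       <li>placeholder for the second table-level outcome</li>
   *     </ul>
   *   </li>
   *   <li>placeholder for a further level</li>
   * </ul>
   *
   * @param tableLevel illustrative flag: whether filtering was resolved at table level
   * @return a label for the filtering level
   */
  static String filteringLevel(boolean tableLevel) {
    return tableLevel ? "table" : "other";
  }
}
```

The inner `<ul>` sits inside the `<li>` it belongs to, so the generated HTML renders the sub-items indented under "table level" without relying on whitespace or dashes, which Javadoc collapses.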
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871146#comment-16871146 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296694074

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ##
@@ -0,0 +1,148 @@
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871127#comment-16871127 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296621362

## File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveDrillNativeParquetScan.java ##
@@ -228,5 +228,10 @@ public HiveDrillNativeParquetScanFilterer(HiveDrillNativeParquetScan source) {
     protected AbstractParquetGroupScan getNewScan() {
       return new HiveDrillNativeParquetScan((HiveDrillNativeParquetScan) source);
     }
+
+    @Override
+    protected HiveDrillNativeParquetScanFilterer self() {

Review comment:
Can you please explain where this method came from?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> org.apache.drill.metastore.RowGroupMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to file
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Path location; - should contain directory to which file belongs
> {noformat}
> 8. Remove org.apache.drill.exec package from metastore module.
> 9. Rename ColumnStatisticsImpl class.
> 10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
> 11. Rename FileTableMetadata -> BaseTableMetadata
> 12. TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata
>
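The reviewer asks where `self()` comes from. The thread does not show the answer, but a `self()` override of this shape usually belongs to a self-typed (recursive generic) base class, as in the `BaseMetadataBuilder` quoted elsewhere in this thread, whose `with...` methods end in `return self();`. The sketch below illustrates that idiom under that assumption. `DemoFilterer` and `DemoParquetFilterer` are hypothetical names and do not mirror Drill's filterer hierarchy.

```java
// Hypothetical names throughout: a sketch of the self-typed builder idiom, not Drill's classes.
abstract class DemoFilterer<B extends DemoFilterer<B>> {
  protected String filter;

  // Each concrete subclass returns itself, so chained calls keep the subclass type without casts.
  protected abstract B self();

  public B withFilter(String filter) {
    this.filter = filter;
    return self();
  }
}

final class DemoParquetFilterer extends DemoFilterer<DemoParquetFilterer> {
  private int rowGroupLimit;

  @Override
  protected DemoParquetFilterer self() {
    return this;
  }

  public DemoParquetFilterer withRowGroupLimit(int rowGroupLimit) {
    this.rowGroupLimit = rowGroupLimit;
    return self();
  }

  public String describe() {
    return filter + "/" + rowGroupLimit;
  }
}
```

With this arrangement `new DemoParquetFilterer().withFilter("a > 1").withRowGroupLimit(10)` compiles, because `withFilter` returns `DemoParquetFilterer` rather than the base type. Returning `this` directly from the base class would force a cast or break the chain.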
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871149#comment-16871149 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695307

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ##
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.hadoop.fs.Path;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Base implementation of {@link TableMetadata} interface.
+ */
+public class BaseTableMetadata extends BaseMetadata implements TableMetadata {
+
+  public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1;
+
+  private final Path location;
+  private final long lastModifiedTime;
+  private final Map partitionKeys;
+  private final List interestingColumns;
+
+  private BaseTableMetadata(BaseTableMetadataBuilder builder) {
+    super(builder);
+    this.location = builder.location;
+    this.partitionKeys = builder.partitionKeys;
+    this.interestingColumns = builder.interestingColumns;
+    this.lastModifiedTime = builder.lastModifiedTime;
+  }
+
+  public boolean isPartitionColumn(String fieldName) {
+    return partitionKeys.containsKey(fieldName);
+  }
+
+  boolean isPartitioned() {
+    return !partitionKeys.isEmpty();
+  }
+
+  @Override
+  public Path getLocation() {
+    return location;
+  }
+
+  @Override
+  public long getLastModifiedTime() {
+    return lastModifiedTime;
+  }
+
+  @Override
+  public List getInterestingColumns() {
+    return interestingColumns;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) {
+    Map mergedTableStatistics = new HashMap<>(this.statistics);
+
+    // overrides statistics value for the case when new statistics is exact or existing one was estimated
+    tableStatistics.stream()
+        .filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact()
+            || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact())
+        .forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder));
+
+    Map newColumnsStatistics = new HashMap<>(this.columnsStatistics);
+    this.columnsStatistics.forEach(
+        (columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName))));
+
+    return BaseTableMetadata.builder()
+        .withTableInfo(tableInfo)
+        .withMetadataInfo(metadataInfo)
+        .withLocation(location)
+        .withSchema(schema)
+        .withColumnsStatistics(newColumnsStatistics)
+        .withStatistics(mergedTableStatistics.values())
+        .withLastModifiedTime(lastModifiedTime)
+        .withPartitionKeys(partitionKeys)
+        .withInterestingColumns(interestingColumns)
+        .build();
+  }
+
+  public static BaseTableMetadataBuilder builder() {
+    return new BaseTableMetadataBuilder();
+  }
+
+  public static class BaseTableMetadataBuilder extends BaseMetadataBuilder {
+    private Path location;
+    private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME;
+    private Map partitionKeys;
+    private List interestingColumns;
+
+    public BaseTableMetadataBuilder withLocation(Path location) {
+      this.location = location;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder withLastModifiedTime(long lastModifiedTime) {
+      this.lastModifiedTime = lastModifiedTime;
+      return self();
+    }
+
+    public BaseTableMetadataBuilder
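The `cloneWithStats` method quoted in this message replaces an existing table statistic only when the incoming one is exact or the existing one was estimated. The sketch below restates that rule with plain collections. `DemoHolder` and `StatisticsOverrideDemo` are hypothetical names, not Drill classes. The `current == null` branch is an added assumption: as quoted, the filter appears to dereference the result of a map lookup, which would fail for an estimated statistic whose kind is not yet present.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a statistics holder.
final class DemoHolder {
  final String kindName;
  final boolean exact;
  final long value;

  DemoHolder(String kindName, boolean exact, long value) {
    this.kindName = kindName;
    this.exact = exact;
    this.value = value;
  }
}

final class StatisticsOverrideDemo {

  // Returns a copy of `existing` in which an incoming statistic replaces the current one
  // when the incoming value is exact or the current value is only an estimate.
  static Map<String, DemoHolder> merge(Map<String, DemoHolder> existing, List<DemoHolder> incoming) {
    Map<String, DemoHolder> merged = new HashMap<>(existing);
    for (DemoHolder candidate : incoming) {
      DemoHolder current = existing.get(candidate.kindName);
      // Assumption of this sketch: a kind with no existing entry is simply added.
      if (current == null || candidate.exact || !current.exact) {
        merged.put(candidate.kindName, candidate);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, DemoHolder> existing = new HashMap<>();
    existing.put("rowCount", new DemoHolder("rowCount", true, 100L));
    Map<String, DemoHolder> merged = merge(existing, Arrays.asList(
        new DemoHolder("rowCount", false, 90L),    // estimated: does not replace the exact value
        new DemoHolder("nullsCount", false, 5L))); // new kind: added
    System.out.println(merged.get("rowCount").value + " " + merged.get("nullsCount").value);
  }
}
```

The input map is copied rather than mutated, matching the quoted method, which builds a new metadata object instead of changing the current one.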
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871155#comment-16871155 ]

ASF GitHub Bot commented on DRILL-7271:
---

arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296695012

## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseMetadata.java ##
@@ -0,0 +1,148 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.metastore.metadata;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.metastore.SchemaPathUtils;
+import org.apache.drill.metastore.statistics.ColumnStatistics;
+import org.apache.drill.metastore.statistics.StatisticsHolder;
+import org.apache.drill.metastore.statistics.StatisticsKind;
+
+import java.util.Collection;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Function;
+import java.util.stream.Collectors;
+
+/**
+ * Common provider of tuple schema, column metadata, and statistics for table, partition, file or row group.
+ */
+public abstract class BaseMetadata implements Metadata {
+  protected final TableInfo tableInfo;
+  protected final MetadataInfo metadataInfo;
+  protected final TupleMetadata schema;
+  protected final Map columnsStatistics;
+  protected final Map statistics;
+
+  protected > BaseMetadata(BaseMetadataBuilder builder) {
+    this.tableInfo = builder.tableInfo;
+    this.metadataInfo = builder.metadataInfo;
+    this.schema = builder.schema;
+    this.columnsStatistics = builder.columnsStatistics;
+    this.statistics = builder.statistics.stream()
+        .collect(Collectors.toMap(
+            statistic -> statistic.getStatisticsKind().getName(),
+            Function.identity(),
+            (a, b) -> a.getStatisticsKind().isExact() ? a : b));
+  }
+
+  @Override
+  public Map getColumnsStatistics() {
+    return columnsStatistics;
+  }
+
+  @Override
+  public ColumnStatistics getColumnStatistics(SchemaPath columnName) {
+    return columnsStatistics.get(columnName);
+  }
+
+  @Override
+  public TupleMetadata getSchema() {
+    return schema;
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public V getStatistic(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null ? statisticsHolder.getStatisticsValue() : null;
+  }
+
+  @Override
+  public boolean containsExactStatistics(StatisticsKind statisticsKind) {
+    StatisticsHolder statisticsHolder = statistics.get(statisticsKind.getName());
+    return statisticsHolder != null && statisticsHolder.getStatisticsKind().isExact();
+  }
+
+  @Override
+  @SuppressWarnings("unchecked")
+  public V getStatisticsForColumn(SchemaPath columnName, StatisticsKind statisticsKind) {
+    return (V) columnsStatistics.get(columnName).get(statisticsKind);
+  }
+
+  @Override
+  public ColumnMetadata getColumn(SchemaPath name) {
+    return SchemaPathUtils.getColumnMetadata(name, schema);
+  }
+
+  @Override
+  public TableInfo getTableInfo() {
+    return tableInfo;
+  }
+
+  @Override
+  public MetadataInfo getMetadataInfo() {
+    return metadataInfo;
+  }
+
+  public static abstract class BaseMetadataBuilder> {
+    protected TableInfo tableInfo;
+    protected MetadataInfo metadataInfo;
+    protected TupleMetadata schema;
+    protected Map columnsStatistics;
+    protected Collection statistics;
+
+    public T withTableInfo(TableInfo tableInfo) {

Review comment:
Do you think `with` can be removed?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871154#comment-16871154 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296695918 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/TableInfo.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * General table information. + */ +public class TableInfo { + public static final String UNKNOWN = "UNKNOWN"; + public static final TableInfo UNKNOWN_TABLE_INFO = new TableInfo(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN); + + private final String storagePlugin; + private final String workspace; + private final String name; + private final String type; + private final String owner; + + public TableInfo(String storagePlugin, String workspace, String name, String type, String owner) { Review comment: Make constructor private and add builder. 
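The request above (private constructor plus a builder) could look roughly like the following sketch. It assumes only the five fields shown in the diff; the class name `TableInfoSketch`, the builder method names, and the choice to default every field to `UNKNOWN` are illustrative assumptions, not the code that was merged into Drill.

```java
import java.util.Objects;

// Illustrative sketch only: mirrors the five TableInfo fields from the diff.
// The builder API below is an assumption, not Drill's actual implementation.
class TableInfoSketch {
  static final String UNKNOWN = "UNKNOWN";

  private final String storagePlugin;
  private final String workspace;
  private final String name;
  private final String type;
  private final String owner;

  // Private constructor: instances can only be created through the builder.
  private TableInfoSketch(Builder builder) {
    this.storagePlugin = builder.storagePlugin;
    this.workspace = builder.workspace;
    this.name = builder.name;
    this.type = builder.type;
    this.owner = builder.owner;
  }

  String storagePlugin() { return storagePlugin; }
  String workspace() { return workspace; }
  String name() { return name; }
  String type() { return type; }
  String owner() { return owner; }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Every field starts as UNKNOWN, so callers set only what they know.
    private String storagePlugin = UNKNOWN;
    private String workspace = UNKNOWN;
    private String name = UNKNOWN;
    private String type = UNKNOWN;
    private String owner = UNKNOWN;

    Builder storagePlugin(String value) { this.storagePlugin = Objects.requireNonNull(value); return this; }
    Builder workspace(String value) { this.workspace = Objects.requireNonNull(value); return this; }
    Builder name(String value) { this.name = Objects.requireNonNull(value); return this; }
    Builder type(String value) { this.type = Objects.requireNonNull(value); return this; }
    Builder owner(String value) { this.owner = Objects.requireNonNull(value); return this; }

    TableInfoSketch build() {
      return new TableInfoSketch(this);
    }
  }

  public static void main(String[] args) {
    TableInfoSketch info = TableInfoSketch.builder()
        .storagePlugin("dfs")
        .workspace("tmp")
        .name("nation")
        .build();
    System.out.println(info.storagePlugin() + "." + info.workspace() + "." + info.name());
  }
}
```

A builder avoids call sites such as `new TableInfo(UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN, UNKNOWN)`, where five same-typed positional arguments are easy to transpose.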
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set partitionKeys; -> Map
> {noformat}
> org.apache.drill.metastore.PartitionMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> partitionValues (List)
> location (String) (for directory level metadata) - directory location
> fields to modify
>
> private final Map tableStatistics;
> private final Map statisticsKinds;
> private final Set location; -> locations
> {noformat}
> org.apache.drill.metastore.FileMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace -> will be covered by TableInfo class
> metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
> path - path to
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871139#comment-16871139 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296690696 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/MetadataInfo.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +/** + * Class which identifies specific metadata. + */ +public class MetadataInfo { + + public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; + public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; + public static final String DEFAULT_COLUMN_PREFIX = "_$SEGMENT_"; Review comment: Where this constant will be used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor Metadata interfaces and classes to contain all needed information > for the File based Metastore > --- > > Key: DRILL-7271 > URL: https://issues.apache.org/jira/browse/DRILL-7271 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Arina Ielchiieva >Assignee: Volodymyr Vysotskyi >Priority: Major > Fix For: 1.17.0 > > > 1. Merge info from metadataStatistics + statisticsKinds into one holder: > Map. > 2. Rename hasStatistics to hasDescriptiveStatistics > 3. Remove drill-file-metastore-plugin > 4. Move > org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel > to metadata module, rename to MetadataType and add new value: SEGMENT. > 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder. > 6. Add new info classes: > {noformat} > class TableInfo { > String storagePlugin; > String workspace; > String name; > String type; > String owner; > } > class MetadataInfo { > public static final String GENERAL_INFO_KEY = "GENERAL_INFO"; > public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT"; > MetadataType type (enum); > String key; > String identifier; > } > {noformat} > 7. 
Modify existing metadata classes: > org.apache.drill.metastore.FileTableMetadata > {noformat} > missing fields > -- > storagePlugin, workspace, tableType -> will be covered by TableInfo class > metadataType, metadataKey -> will be covered by MetadataInfo class > interestingColumns > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set partitionKeys; -> Map > {noformat} > org.apache.drill.metastore.PartitionMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > partitionValues (List) > location (String) (for directory level metadata) - directory location > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Set location; -> locations > {noformat} > org.apache.drill.metastore.FileMetadata > {noformat} > missing fields > -- > storagePlugin, workspace -> will be covered by TableInfo class > metadataType, metadataKey, metadataIdentifier -> will be covered by > MetadataInfo class > path - path to file > fields to modify > > private final Map tableStatistics; > private final Map statisticsKinds; > private final Path location; - should contain directory to which file belongs > {noformat} >
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871129#comment-16871129 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296622444 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/StatisticsProvider.java ## @@ -218,89 +201,85 @@ public ColumnStatistics visitFunctionHolderExpression(FunctionHolderExpression h ValueHolder minFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args1, holderExpr.getName()); ValueHolder maxFuncHolder = InterpreterEvaluator.evaluateFunction(interpreter, args2, holderExpr.getName()); - MinMaxStatistics statistics; switch (destType) { case INT: - statistics = new MinMaxStatistics<>(((IntHolder) minFuncHolder).value, ((IntHolder) maxFuncHolder).value, Integer::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((IntHolder) minFuncHolder).value, + ((IntHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case BIGINT: - statistics = new MinMaxStatistics<>(((BigIntHolder) minFuncHolder).value, ((BigIntHolder) maxFuncHolder).value, Long::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((BigIntHolder) minFuncHolder).value, + ((BigIntHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case FLOAT4: - statistics = new MinMaxStatistics<>(((Float4Holder) minFuncHolder).value, ((Float4Holder) maxFuncHolder).value, Float::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((Float4Holder) minFuncHolder).value, + ((Float4Holder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case FLOAT8: - statistics = new MinMaxStatistics<>(((Float8Holder) 
minFuncHolder).value, ((Float8Holder) maxFuncHolder).value, Double::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((Float8Holder) minFuncHolder).value, + ((Float8Holder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); case TIMESTAMP: - statistics = new MinMaxStatistics<>(((TimeStampHolder) minFuncHolder).value, ((TimeStampHolder) maxFuncHolder).value, Long::compareTo); - break; + return StatisticsProvider.getColumnStatistics( + ((TimeStampHolder) minFuncHolder).value, + ((TimeStampHolder) maxFuncHolder).value, + ColumnStatisticsKind.NULLS_COUNT.getFrom(input), + destType); default: return null; } - statistics.setNullsCount((long) input.getStatistic(ColumnStatisticsKind.NULLS_COUNT)); - return statistics; } catch (Exception e) { - throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName() ); + throw new DrillRuntimeException("Error in evaluating function of " + holderExpr.getName()); } } - public static class MinMaxStatistics implements ColumnStatistics { -private final V minVal; -private final V maxVal; -private final Comparator valueComparator; -private long nullsCount; - -public MinMaxStatistics(V minVal, V maxVal, Comparator valueComparator) { - this.minVal = minVal; - this.maxVal = maxVal; - this.valueComparator = valueComparator; -} - -@Override -public Object getStatistic(StatisticsKind statisticsKind) { - switch (statisticsKind.getName()) { -case ExactStatisticsConstants.MIN_VALUE: - return minVal; -case ExactStatisticsConstants.MAX_VALUE: - return maxVal; -case ExactStatisticsConstants.NULLS_COUNT: - return nullsCount; -default: - return null; - } -} - -@Override -public boolean containsStatistic(StatisticsKind statisticsKind) { - switch (statisticsKind.getName()) { -case ExactStatisticsConstants.MIN_VALUE: -case ExactStatisticsConstants.MAX_VALUE: -case ExactStatisticsConstants.NULLS_COUNT: - return true; -default: - return false; - } -} - -@Override -public 
boolean containsExactStatistics(StatisticsKind statisticsKind) { - return true; -} - -@Override -public Comparator getValueComparator() { - return
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871124#comment-16871124 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296614730 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/statistics/StatisticsHolder.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.metastore.statistics; + +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonInclude; +import com.fasterxml.jackson.annotation.JsonProperty; +import com.fasterxml.jackson.annotation.JsonTypeInfo; +import com.fasterxml.jackson.core.JsonProcessingException; +import com.fasterxml.jackson.core.util.DefaultPrettyPrinter; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.ObjectReader; +import com.fasterxml.jackson.databind.ObjectWriter; + +import java.io.IOException; + +/** + * Class-holder for statistics kind and its value. + * + * @param Type of statistics value + */ +@JsonInclude(JsonInclude.Include.NON_DEFAULT) +public class StatisticsHolder { + + public static final ObjectWriter OBJECT_WRITER = new ObjectMapper().setDefaultPrettyPrinter(new DefaultPrettyPrinter()).writer(); Review comment: private?
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871151#comment-16871151 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296695117 ## File path: metastore/metastore-api/src/main/java/org/apache/drill/metastore/metadata/BaseTableMetadata.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.metastore.metadata; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.metastore.statistics.ColumnStatistics; +import org.apache.drill.metastore.statistics.StatisticsHolder; +import org.apache.hadoop.fs.Path; + +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; + +/** + * Base implementation of {@link TableMetadata} interface. 
+ */ +public class BaseTableMetadata extends BaseMetadata implements TableMetadata { + + public static final long NON_DEFINED_LAST_MODIFIED_TIME = -1; + + private final Path location; + private final long lastModifiedTime; + private final Map partitionKeys; + private final List interestingColumns; + + private BaseTableMetadata(BaseTableMetadataBuilder builder) { +super(builder); +this.location = builder.location; +this.partitionKeys = builder.partitionKeys; +this.interestingColumns = builder.interestingColumns; +this.lastModifiedTime = builder.lastModifiedTime; + } + + public boolean isPartitionColumn(String fieldName) { +return partitionKeys.containsKey(fieldName); + } + + boolean isPartitioned() { +return !partitionKeys.isEmpty(); + } + + @Override + public Path getLocation() { +return location; + } + + @Override + public long getLastModifiedTime() { +return lastModifiedTime; + } + + @Override + public List getInterestingColumns() { +return interestingColumns; + } + + @Override + @SuppressWarnings("unchecked") + public BaseTableMetadata cloneWithStats(Map columnStatistics, List tableStatistics) { +Map mergedTableStatistics = new HashMap<>(this.statistics); + +// overrides statistics value for the case when new statistics is exact or existing one was estimated +tableStatistics.stream() +.filter(statisticsHolder -> statisticsHolder.getStatisticsKind().isExact() + || !this.statistics.get(statisticsHolder.getStatisticsKind().getName()).getStatisticsKind().isExact()) +.forEach(statisticsHolder -> mergedTableStatistics.put(statisticsHolder.getStatisticsKind().getName(), statisticsHolder)); + +Map newColumnsStatistics = new HashMap<>(this.columnsStatistics); +this.columnsStatistics.forEach( +(columnName, value) -> newColumnsStatistics.put(columnName, value.cloneWith(columnStatistics.get(columnName; + +return BaseTableMetadata.builder() +.withTableInfo(tableInfo) +.withMetadataInfo(metadataInfo) +.withLocation(location) +.withSchema(schema) 
+.withColumnsStatistics(newColumnsStatistics) +.withStatistics(mergedTableStatistics.values()) +.withLastModifiedTime(lastModifiedTime) +.withPartitionKeys(partitionKeys) +.withInterestingColumns(interestingColumns) +.build(); + } + + public static BaseTableMetadataBuilder builder() { +return new BaseTableMetadataBuilder(); + } + + public static class BaseTableMetadataBuilder extends BaseMetadataBuilder { +private Path location; +private long lastModifiedTime = NON_DEFINED_LAST_MODIFIED_TIME; +private Map partitionKeys; +private List interestingColumns; + +public BaseTableMetadataBuilder withLocation(Path location) { Review comment: Same here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871138#comment-16871138 ] ASF GitHub Bot commented on DRILL-7271: --- arina-ielchiieva commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore URL: https://github.com/apache/drill/pull/1810#discussion_r296688086 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/BaseParquetMetadataProvider.java ## @@ -313,18 +328,30 @@ public TableMetadata getTableMetadata() { partitionsForValue.asMap().forEach((partitionKey, value) -> { Map columnsStatistics = new HashMap<>(); -Map statistics = new HashMap<>(); +List statistics = new ArrayList<>(); partitionKey = partitionKey == NULL_VALUE ? null : partitionKey; -statistics.put(ColumnStatisticsKind.MIN_VALUE, partitionKey); -statistics.put(ColumnStatisticsKind.MAX_VALUE, partitionKey); +statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MIN_VALUE)); +statistics.add(new StatisticsHolder<>(partitionKey, ColumnStatisticsKind.MAX_VALUE)); -statistics.put(ColumnStatisticsKind.NULLS_COUNT, Statistic.NO_COLUMN_STATS); -statistics.put(TableStatisticsKind.ROW_COUNT, Statistic.NO_COLUMN_STATS); +statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, ColumnStatisticsKind.NULLS_COUNT)); +statistics.add(new StatisticsHolder<>(Statistic.NO_COLUMN_STATS, TableStatisticsKind.ROW_COUNT)); columnsStatistics.put(partitionColumn, -new ColumnStatisticsImpl<>(statistics, - ParquetTableMetadataUtils.getComparator(getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType(; -partitions.add(new PartitionMetadata(partitionColumn, getTableMetadata().getSchema(), -columnsStatistics, statistics, (Set) value, tableName, -1)); +new ColumnStatistics<>(statistics, + getParquetGroupScanStatistics().getTypeForColumn(partitionColumn).getMinorType())); +MetadataInfo 
metadataInfo = new MetadataInfo(MetadataType.PARTITION, MetadataInfo.GENERAL_INFO_KEY, null); +TableMetadata tableMetadata = getTableMetadata(); +PartitionMetadata partitionMetadata = PartitionMetadata.builder() +.withTableInfo(tableMetadata.getTableInfo()) +.withMetadataInfo(metadataInfo) +.withColumn(partitionColumn) +.withSchema(tableMetadata.getSchema()) +.withColumnsStatistics(columnsStatistics) +.withStatistics(statistics) +.withPartitionValues(Collections.emptyList()) +.withLocations((Set) value) Review comment: Why cast is needed here?
[jira] [Commented] (DRILL-7289) fs.s3a.path.style.access does not seem to work
[ https://issues.apache.org/jira/browse/DRILL-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871069#comment-16871069 ]

Steve Loughran commented on DRILL-7289:
---

It has worked on the S3A connector since hadoop-2.8 and HADOOP-12963; it is broadly tested, as it's the default for enterprise S3 stores which don't work with DNS. If you have problems, then either the config isn't complete or you are using an out-of-date version.

Recommend:
* download Hadoop 3.2 and install it
* try the config in a core-site there, then the hadoop fs -ls command
* or, even better, the cloudstore diagnostics: https://github.com/steveloughran/cloudstore

> fs.s3a.path.style.access does not seem to work
> --
>
> Key: DRILL-7289
> URL: https://issues.apache.org/jira/browse/DRILL-7289
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.16.0
> Environment: Running on Kubernetes
> Reporter: Gururajesh Elango
> Priority: Major
>
> fs.s3a.path.style.access does not seem to work. Please see
> [https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html#access-bucket-intro]
> to learn about path-style access.
> How to reproduce:
> 1. Create a bucket in minio (S3 simulator)
> 2. Define a storage like below
> "storage": {
>   s3: {
>     type: "file",
>     connection: "s3a://new-bucket",
>     "config": {
>       "fs.s3a.access.key": "minio-user",
>       "fs.s3a.secret.key": "minio-password",
>       "fs.s3a.endpoint": "http://:9000",
>       "fs.s3a.connection.ssl.enabled": "false",
>       "fs.s3a.path.style.access": "true",
>       "fs.s3a.connection.timeout": "5000",
>       "fs.s3a.connection.maximum": "100"
>     },
>     "workspaces": {
>       "tmp": {
>         "location": "/tmp/drill",
>         "writable": "true",
>         "defaultInputFormat": "",
>         "allowAccessOutsideWorkspace": "false"
>       },
>       "root": {
>         "location": "/",
>         "writable": "false",
>         "defaultInputFormat": "",
>         "allowAccessOutsideWorkspace": "false"
>       }
>     },
>     "formats": {
>       "parquet": {
>         "type": "parquet"
>       },
>       "json": {
>         "type": "json",
>         "extensions": [
>           "json"
>         ]
>       },
>       "avro": {
>         "type": "avro"
>       }
>     },
>     "enabled": "true"
>   }
> }
> 3. In the logs, expect an error which states that new-bucket.:9000 is not reachable, whereas what is expected is that Drill tries to reach http://:9000/impact-enable-bucket

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
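The behaviour the reporter describes comes down to where the bucket name is placed in the request URL: as a DNS prefix of the endpoint host (virtual-hosted style) or as the first path segment (path style, which is what `fs.s3a.path.style.access` asks for). The sketch below only illustrates the two URL shapes with `java.net.URI`. The host `minio.example.local` and the object key are placeholder assumptions, since the real host is elided in the report, and this is not how the S3A connector builds its requests internally.

```java
import java.net.URI;

// Sketch only: shows the two S3 addressing styles as plain URLs.
// Host name and object key are placeholders, not values from the report.
class S3AddressingSketch {

  // Path style: the bucket is the first path segment, e.g. http://host:9000/bucket/key
  static URI pathStyle(String scheme, String host, int port, String bucket, String key) {
    return URI.create(scheme + "://" + host + ":" + port + "/" + bucket + "/" + key);
  }

  // Virtual-hosted style: the bucket becomes a DNS label, e.g. http://bucket.host:9000/key
  static URI virtualHosted(String scheme, String host, int port, String bucket, String key) {
    return URI.create(scheme + "://" + bucket + "." + host + ":" + port + "/" + key);
  }

  public static void main(String[] args) {
    String host = "minio.example.local"; // placeholder endpoint host
    System.out.println(pathStyle("http", host, 9000, "new-bucket", "data.json"));
    System.out.println(virtualHosted("http", host, 9000, "new-bucket", "data.json"));
  }
}
```

With virtual-hosted addressing the client must be able to resolve `new-bucket.<host>` in DNS, which is why an unreachable `new-bucket.:9000` in the logs suggests that the path-style setting was not picked up.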
[jira] [Commented] (DRILL-7302) Bump Apache Avro from 1.8.2 to 1.9.0
[ https://issues.apache.org/jira/browse/DRILL-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871013#comment-16871013 ] Fokko Driesprong commented on DRILL-7302: - I'm not allowed to assign tickets. I think it is because you have to add me to the project first. > Bump Apache Avro from 1.8.2 to 1.9.0 > > > Key: DRILL-7302 > URL: https://issues.apache.org/jira/browse/DRILL-7302 > Project: Apache Drill > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (DRILL-7307) casthigh for decimal type can lead to the issues with VarDecimalHolder
Dmytriy Grinchenko created DRILL-7307:
-

Summary: casthigh for decimal type can lead to the issues with VarDecimalHolder
Key: DRILL-7307
URL: https://issues.apache.org/jira/browse/DRILL-7307
Project: Apache Drill
Issue Type: Bug
Reporter: Dmytriy Grinchenko
Assignee: Dmytriy Grinchenko
Fix For: 1.17.0

The decimal cast may lead to issues with the VarDecimal transformation and to issues in uml functions which use casthigh under the hood.

Example:
{code}
apache drill> select casthigh(cast(1025.0 as decimal(28,8)));
Error: SYSTEM ERROR: CompileException: Line 25, Column 60: "isSet" is neither a method, a field, nor a member class of "org.apache.drill.exec.expr.holders.VarDecimalHolder"

Fragment 0:0

Please, refer to logs for more information.
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870955#comment-16870955 ]

ASF GitHub Bot commented on DRILL-7271:
---

vvysotskyi commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296612824

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
##
@@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
       logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+      Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+          .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
       Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-          .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+          .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
       Multimap rowGroupsMap = LinkedListMultimap.create();
-      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-      builder.withRowGroups(rowGroupsMap)
+      filteredMetadata.withRowGroups(rowGroupsMap)
           .withTable(getTableMetadata())
+          .withSegments(segmentsMap)
           .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
           .withNonInterestingColumns(getNonInterestingColumnsMetadata())
           .withFiles(filesMap)
           .withMatching(false);
     }
 
-    if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+    if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
       logger.warn("applyFilter {} wasn't able to do pruning for all metadata levels filter condition, since metadata count for " +
           "{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
           "But underlying metadata was pruned without filter expression according to the metadata with above level.",
-          ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+          ExpressionStringBuilder.toString(filterExpr), filteredMetadata.getOverflowLevel());
     }
 
     logger.debug("applyFilter {} reduce row groups # from {} to {}",
-        ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), builder.getRowGroups().size());
+        ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());

Review comment:
Thanks, done.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
> private final
[jira] [Commented] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870949#comment-16870949 ]

ASF GitHub Bot commented on DRILL-7271:
---

ihuzenko commented on pull request #1810: DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
URL: https://github.com/apache/drill/pull/1810#discussion_r296327891

## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetGroupScan.java
##
@@ -281,31 +265,50 @@ public AbstractGroupScanWithMetadata applyFilter(LogicalExpression filterExpr, U
       logger.debug("All row groups have been filtered out. Add back one to get schema from scanner");
 
+      Map segmentsMap = getNextOrEmpty(getSegmentsMetadata().values()).stream()
+          .collect(Collectors.toMap(SegmentMetadata::getPath, Function.identity()));
+
       Map filesMap = getNextOrEmpty(getFilesMetadata().values()).stream()
-          .collect(Collectors.toMap(FileMetadata::getLocation, Function.identity()));
+          .collect(Collectors.toMap(FileMetadata::getPath, Function.identity()));
 
       Multimap rowGroupsMap = LinkedListMultimap.create();
-      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getLocation(), entry));
+      getNextOrEmpty(getRowGroupsMetadata().values()).forEach(entry -> rowGroupsMap.put(entry.getPath(), entry));
 
-      builder.withRowGroups(rowGroupsMap)
+      filteredMetadata.withRowGroups(rowGroupsMap)
           .withTable(getTableMetadata())
+          .withSegments(segmentsMap)
           .withPartitions(getNextOrEmpty(getPartitionsMetadata()))
           .withNonInterestingColumns(getNonInterestingColumnsMetadata())
           .withFiles(filesMap)
           .withMatching(false);
     }
 
-    if (builder.getOverflowLevel() != MetadataLevel.NONE) {
+    if (filteredMetadata.getOverflowLevel() != MetadataType.NONE) {
       logger.warn("applyFilter {} wasn't able to do pruning for all metadata levels filter condition, since metadata count for " +
           "{} level exceeds `planner.store.parquet.rowgroup.filter.pushdown.threshold` value.\n" +
           "But underlying metadata was pruned without filter expression according to the metadata with above level.",
-          ExpressionStringBuilder.toString(filterExpr), builder.getOverflowLevel());
+          ExpressionStringBuilder.toString(filterExpr), filteredMetadata.getOverflowLevel());
     }
 
     logger.debug("applyFilter {} reduce row groups # from {} to {}",
-        ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), builder.getRowGroups().size());
+        ExpressionStringBuilder.toString(filterExpr), getRowGroupsMetadata().size(), filteredMetadata.getRowGroups().size());

Review comment:
add ```isDebugEnabled()``` check before call

> Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
> ---
>
> Key: DRILL-7271
> URL: https://issues.apache.org/jira/browse/DRILL-7271
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Volodymyr Vysotskyi
> Priority: Major
> Fix For: 1.17.0
>
> 1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
> 2. Rename hasStatistics to hasDescriptiveStatistics
> 3. Remove drill-file-metastore-plugin
> 4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: SEGMENT.
> 5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
> 6. Add new info classes:
> {noformat}
> class TableInfo {
>   String storagePlugin;
>   String workspace;
>   String name;
>   String type;
>   String owner;
> }
> class MetadataInfo {
>   public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
>   public static final String DEFAULT_SEGMENT_KEY = "DEFAULT_SEGMENT";
>   MetadataType type (enum);
>   String key;
>   String identifier;
> }
> {noformat}
> 7. Modify existing metadata classes:
> org.apache.drill.metastore.FileTableMetadata
> {noformat}
> missing fields
> --
> storagePlugin, workspace, tableType -> will be covered by TableInfo class
> metadataType, metadataKey -> will be covered by MetadataInfo class
> interestingColumns
> fields to modify
>
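
The review comment in this message asks for an isDebugEnabled() check before the final logger.debug(...) call. A minimal sketch of why that guard can matter: with a parameterized logger the message formatting is deferred, but the arguments (here the result of ExpressionStringBuilder.toString(filterExpr)) are still built before the call. The SimpleLogger and describeFilter names below are hypothetical stand-ins so the example runs on its own; they are not Drill or SLF4J classes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;

public class GuardedDebugLogging {

  /** Hypothetical stand-in for an SLF4J-style logger; not Drill's actual logger. */
  static final class SimpleLogger {
    private final boolean debugEnabled;
    final StringBuilder out = new StringBuilder();

    SimpleLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    boolean isDebugEnabled() {
      return debugEnabled;
    }

    // Substitutes each "{}" placeholder in order, but only when debug is enabled.
    void debug(String format, Object... args) {
      if (!debugEnabled) {
        return;
      }
      String message = format;
      for (Object arg : args) {
        message = message.replaceFirst("\\{\\}", Matcher.quoteReplacement(String.valueOf(arg)));
      }
      out.append(message).append('\n');
    }
  }

  // Counts how often the (assumed expensive) argument is built.
  static final AtomicInteger expensiveCalls = new AtomicInteger();

  // Hypothetical stand-in for ExpressionStringBuilder.toString(filterExpr).
  static String describeFilter(String filterExpr) {
    expensiveCalls.incrementAndGet();
    return "[" + filterExpr + "]";
  }

  static void logUnguarded(SimpleLogger logger, String filterExpr, int before, int after) {
    // Arguments are evaluated before debug() is entered, even if debug is off.
    logger.debug("applyFilter {} reduce row groups # from {} to {}",
        describeFilter(filterExpr), before, after);
  }

  static void logGuarded(SimpleLogger logger, String filterExpr, int before, int after) {
    // The guard skips argument construction entirely when debug is off.
    if (logger.isDebugEnabled()) {
      logger.debug("applyFilter {} reduce row groups # from {} to {}",
          describeFilter(filterExpr), before, after);
    }
  }

  public static void main(String[] args) {
    SimpleLogger off = new SimpleLogger(false);
    logUnguarded(off, "a > 1", 10, 3);
    logGuarded(off, "a > 1", 10, 3);
    System.out.println("expensive calls with debug off: " + expensiveCalls.get());
  }
}
```

Whether the guard pays off in the real method depends on how costly the argument expressions are; the sketch only shows the evaluation-order point behind the request.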
[jira] [Commented] (DRILL-7293) Convert the regex ("log") plugin to use EVF
[ https://issues.apache.org/jira/browse/DRILL-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870914#comment-16870914 ]

ASF GitHub Bot commented on DRILL-7293:
---

arina-ielchiieva commented on issue #1807: DRILL-7293: Convert the regex ("log") plugin to use EVF
URL: https://github.com/apache/drill/pull/1807#issuecomment-504905705

@paul-rogers It is still unclear to me whether you have tried the following query on log plugin data: `select * from table(t(schema=>'inline=(col1 varchar)'))`, where `t` is a table with log plugin data. Did you try it? I suppose it should work.

> Convert the regex ("log") plugin to use EVF
> ---
>
> Key: DRILL-7293
> URL: https://issues.apache.org/jira/browse/DRILL-7293
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.17.0
>
> The "log" plugin (which uses a regex to define the row format) is the subject of Chapter 12 of the Learning Apache Drill book (though the version in the book is simpler than the one in the master branch.)
> The recently-completed "Enhanced Vector Framework" (EVF, AKA the "row set framework") gives Drill control over the size of batches created by readers, and allows readers to use the recently-added provided schema mechanism.
> We wish to use the log reader as an example of how to convert a Drill format plugin to use the EVF so that other developers can convert their own plugins.
> This PR provides the first set of log plugin changes to enable us to publish a tutorial on the EVF.
[jira] [Updated] (DRILL-6958) CTAS csv with option
[ https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-6958:
Description:
Currently, it may be difficult to produce well-formed CSV with CTAS (see comment below).
It appears necessary to have some additional, configurable options for writing CSV files with CTAS:
* possibility to change/define the separator,
* possibility to write the header or not,
* possibility to force writing a single file instead of many parts,
* possibility to force quoting,
* possibility to use/change the escape char,
* ...

was:
Add some options for writing CSV files with CTAS:
* possibility to change/define the separator,
* possibility to write the header or not,
* possibility to force writing a single file instead of many parts,
* possibility to force quoting

> CTAS csv with option
> ---
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text CSV
> Affects Versions: 1.15.0, 1.16.0
> Reporter: benj
> Priority: Major
>
> Currently, it may be difficult to produce well-formed CSV with CTAS (see comment below).
> It appears necessary to have some additional, configurable options for writing CSV files with CTAS:
> * possibility to change/define the separator,
> * possibility to write the header or not,
> * possibility to force writing a single file instead of many parts,
> * possibility to force quoting,
> * possibility to use/change the escape char,
> * ...
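
To make the requested options concrete, here is a small stand-alone sketch of what separator, header, forced quoting and escape char would each control in a CSV writer. It is an illustration only: the Options class and its field names are hypothetical and are not Drill configuration keys or an existing Drill API. The "single file instead of many parts" request is not shown, since it concerns how the write is parallelized rather than how a row is formatted.

```java
import java.util.Arrays;
import java.util.List;

public class CsvWriteOptionsSketch {

  /** Hypothetical option holder; these names are not Drill settings. */
  static final class Options {
    char separator = ',';
    boolean writeHeader = true;
    boolean forceQuote = false;
    char quote = '"';
    char escape = '"';
  }

  // Quotes a field when forced, or when it contains a character that would
  // otherwise be misread; quote and escape chars inside it are escaped.
  static String field(String value, Options o) {
    boolean needsQuote = o.forceQuote
        || value.indexOf(o.separator) >= 0
        || value.indexOf(o.quote) >= 0
        || value.indexOf(o.escape) >= 0
        || value.indexOf('\n') >= 0
        || value.indexOf('\r') >= 0;
    if (!needsQuote) {
      return value;
    }
    StringBuilder sb = new StringBuilder().append(o.quote);
    for (char c : value.toCharArray()) {
      if (c == o.quote || c == o.escape) {
        sb.append(o.escape);
      }
      sb.append(c);
    }
    return sb.append(o.quote).toString();
  }

  // Joins the fields of one row with the configured separator.
  static String row(List<String> values, Options o) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        sb.append(o.separator);
      }
      sb.append(field(values.get(i), o));
    }
    return sb.append('\n').toString();
  }

  // Writes the optional header row followed by all data rows.
  static String write(List<String> header, List<List<String>> rows, Options o) {
    StringBuilder sb = new StringBuilder();
    if (o.writeHeader) {
      sb.append(row(header, o));
    }
    for (List<String> r : rows) {
      sb.append(row(r, o));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Options o = new Options();
    o.separator = ';';
    o.forceQuote = true;
    List<String> header = Arrays.asList("id", "note");
    List<List<String>> rows = Arrays.asList(Arrays.asList("1", "a;b"));
    System.out.print(write(header, rows, o));
  }
}
```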
[jira] [Updated] (DRILL-6958) CTAS csv with option
[ https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-6958:
Issue Type: Bug (was: Improvement)

> CTAS csv with option
> ---
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text CSV
> Affects Versions: 1.15.0, 1.16.0
> Reporter: benj
> Priority: Major
>
> Add some options for writing CSV files with CTAS:
> * possibility to change/define the separator,
> * possibility to write the header or not,
> * possibility to force writing a single file instead of many parts,
> * possibility to force quoting
[jira] [Updated] (DRILL-6958) CTAS csv with option
[ https://issues.apache.org/jira/browse/DRILL-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

benj updated DRILL-6958:
Affects Version/s: 1.16.0

> CTAS csv with option
> ---
>
> Key: DRILL-6958
> URL: https://issues.apache.org/jira/browse/DRILL-6958
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text CSV
> Affects Versions: 1.15.0, 1.16.0
> Reporter: benj
> Priority: Major
>
> Add some options for writing CSV files with CTAS:
> * possibility to change/define the separator,
> * possibility to write the header or not,
> * possibility to force writing a single file instead of many parts,
> * possibility to force quoting