[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326690#comment-15326690 ]

Aman Sinha commented on DRILL-4530:
-----------------------------------

I have created a PR for this. All unit and functional tests are clean. I haven't yet run performance tests. The changes are in 3 broad areas: (a) file/dir selection, (b) partition pruning, (c) metadata cache. The changes in (a) and (b) are mostly independent of the changes in (c), which relies on a separate directories file. In the future we could swap out the changes in (c) whenever the metadata cache is enhanced to allow faster access to the directories field. Feedback is welcome.

> Improve metadata cache performance for queries with single partition
> --------------------------------------------------------------------
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization
> Affects Versions: 1.6.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two types of queries which are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2
> elapsed time is 9 sec even though both are accessing the same amount of data.
> The user expectation is that they should perform roughly the same. The main
> difference comes from reading the bigger metadata cache file at the root
> level 'A' for query2 and then applying the partitioning filter. query1 reads
> a much smaller metadata cache file at the subdirectory level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326679#comment-15326679 ]

ASF GitHub Bot commented on DRILL-4530:
---------------------------------------

GitHub user amansinha100 opened a pull request:

    https://github.com/apache/drill/pull/519

    DRILL-4530: Optimize partition pruning with metadata caching for the …

    …single partition case.

    - Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.
    - Keep a new status of EXPANDED_PARTIAL for FileSelection.
    - Create separate .directories metadata file to prune directories first before files.
    - Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill DRILL-4530-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #519

commit 9c9687e804fa05c8f4b7b065738c458cb88bf5c4
Author: Aman Sinha
Date: 2016-03-25T19:55:59Z

    DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

    - Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.
    - Keep a new status of EXPANDED_PARTIAL for FileSelection.
    - Create separate .directories metadata file to prune directories first before files.
    - Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.

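
The idea behind the directory-first pruning and the cacheFileRoot attribute described in the pull request can be illustrated with a small sketch. This is not Drill's PruneScanRule or its metadata classes: the class name, the method names, and the representation of the filter as a map from directory level to value are all assumptions made for illustration. It shows how equality filters on dir0, dir1, ... could be resolved to a single subdirectory before any file-level metadata is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; names and data model are assumptions, not Drill code. */
class SinglePartitionCacheRoot {

    /**
     * Returns the deepest directory that is fully determined by equality
     * filters on dir0, dir1, ... with no gaps. A filter on dir1 without one
     * on dir0 does not narrow the root in this sketch.
     */
    static String cacheFileRoot(String tableRoot, Map<Integer, String> dirFilters) {
        StringBuilder path = new StringBuilder(tableRoot);
        for (int level = 0; dirFilters.containsKey(level); level++) {
            path.append('/').append(dirFilters.get(level));
        }
        return path.toString();
    }

    /** Prunes the directory list first: keeps directories at or below the root. */
    static List<String> pruneDirectories(List<String> directories, String root) {
        List<String> kept = new ArrayList<>();
        for (String dir : directories) {
            if (dir.equals(root) || dir.startsWith(root + "/")) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // WHERE dir0 = 'B' AND dir1 = 'C' on table `A`
        Map<Integer, String> filters = new TreeMap<>();
        filters.put(0, "B");
        filters.put(1, "C");

        String root = cacheFileRoot("A", filters);
        List<String> dirs = Arrays.asList("A/B", "A/B/C", "A/B/D", "A/E", "A/E/C");

        System.out.println(root);                         // A/B/C
        System.out.println(pruneDirectories(dirs, root)); // [A/B/C]
    }
}
```

With the root narrowed to A/B/C, a planner following this approach would only need the smaller cache file under that subdirectory, which is the case query 1 already benefits from.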
[jira] [Comment Edited] (DRILL-3581) Google Guava version is so old it causes incompatibilities with other libs

[ https://issues.apache.org/jira/browse/DRILL-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326659#comment-15326659 ]

Aditya Kishore edited comment on DRILL-3581 at 6/12/16 10:25 PM:
-----------------------------------------------------------------

Looks like we will need to move the patcher code into main execution path since HBase 1.1 meta table locator code (used by HBase client) has started using the {{Stopwatch stopwatch = new Stopwatch().start();}} [code|https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234].

I'll update the pull request for DRILL-4199 with the changes.

was (Author: adityakishore):
Looks like we will need to move the patcher code into main execution path since HBase 1.1 meta table locator code (used by HBase client) has started using the `Stopwatch stopwatch = new Stopwatch().start();` code.

I'll update the pull request for DRILL-4199 with the changes.

[1] https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234

> Google Guava version is so old it causes incompatibilities with other libs
> --------------------------------------------------------------------------
>
> Key: DRILL-3581
> URL: https://issues.apache.org/jira/browse/DRILL-3581
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - JDBC
> Affects Versions: 1.1.0
> Environment: Linux, JDK 1.8
> Reporter: Joseph Barefoot
> Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> Drill is currently using Guava version 14.0.1, which was released March 2013.
> https://github.com/apache/drill/blob/master/pom.xml
> Many other java projects use newer versions, however this conflicts with the
> Drill JDBC driver since a couple of APIs it uses are incompatible with the
> newer guava versions. In particular:
> https://github.com/apache/drill/blob/master/common/src/main/java/org/apache/drill/common/util/PathScanner.java
> (The public StopWatch class constructor has been removed in favor of factory
> methods)
> Although this seems minor, it prevents easily using Drill from a java
> application, since again many other open source libs will be using the latest
> Guava version (18).

[jira] [Commented] (DRILL-3581) Google Guava version is so old it causes incompatibilities with other libs

[ https://issues.apache.org/jira/browse/DRILL-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326659#comment-15326659 ]

Aditya Kishore commented on DRILL-3581:
---------------------------------------

Looks like we will need to move the patcher code into main execution path since HBase 1.1 meta table locator code (used by HBase client) has started using the `Stopwatch stopwatch = new Stopwatch().start();` code.

I'll update the pull request for DRILL-4199 with the changes.

[1] https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234

> Google Guava version is so old it causes incompatibilities with other libs
> --------------------------------------------------------------------------
>
> Key: DRILL-3581
> URL: https://issues.apache.org/jira/browse/DRILL-3581
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - JDBC
> Affects Versions: 1.1.0
> Environment: Linux, JDK 1.8
> Reporter: Joseph Barefoot
> Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> Drill is currently using Guava version 14.0.1, which was released March 2013.
> https://github.com/apache/drill/blob/master/pom.xml
> Many other java projects use newer versions, however this conflicts with the
> Drill JDBC driver since a couple of APIs it uses are incompatible with the
> newer guava versions. In particular:
> https://github.com/apache/drill/blob/master/common/src/main/java/org/apache/drill/common/util/PathScanner.java
> (The public StopWatch class constructor has been removed in favor of factory
> methods)
> Although this seems minor, it prevents easily using Drill from a java
> application, since again many other open source libs will be using the latest
> Guava version (18).

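
The incompatibility in this thread comes down to two API shapes: older code calls a public constructor and then start(), while newer Guava releases expose static factories such as Stopwatch.createStarted(). The sketch below shows one generic way for calling code to tolerate both shapes through reflection. It is not the patcher approach mentioned in the comment, and it does not depend on Guava: OldStyleStopwatch and NewStyleStopwatch are hypothetical stand-ins that mimic the two shapes so the example runs on its own.

```java
import java.lang.reflect.Method;

/** Sketch only: tolerates both stopwatch API shapes via reflection. */
class StopwatchCompat {

    /** Returns a started stopwatch instance of the given class. */
    static Object createStarted(Class<?> stopwatchClass) {
        try {
            Method factory;
            try {
                // Newer API shape: static factory method.
                factory = stopwatchClass.getMethod("createStarted");
            } catch (NoSuchMethodException e) {
                // Older API shape: public no-arg constructor, then start().
                Object stopwatch = stopwatchClass.getConstructor().newInstance();
                return stopwatchClass.getMethod("start").invoke(stopwatch);
            }
            return factory.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Unsupported stopwatch API: " + stopwatchClass.getName(), e);
        }
    }

    /** Hypothetical stand-in for the older shape: public constructor. */
    public static class OldStyleStopwatch {
        boolean running;

        public OldStyleStopwatch() {}

        public OldStyleStopwatch start() {
            running = true;
            return this;
        }
    }

    /** Hypothetical stand-in for the newer shape: factory only. */
    public static class NewStyleStopwatch {
        boolean running;

        private NewStyleStopwatch() {}

        public static NewStyleStopwatch createStarted() {
            NewStyleStopwatch stopwatch = new NewStyleStopwatch();
            stopwatch.running = true;
            return stopwatch;
        }
    }

    public static void main(String[] args) {
        Object oldStyle = createStarted(OldStyleStopwatch.class);
        Object newStyle = createStarted(NewStyleStopwatch.class);
        System.out.println(oldStyle.getClass().getSimpleName());
        System.out.println(newStyle.getClass().getSimpleName());
    }
}
```

Reflection only helps code that one controls. It does not fix third-party callers such as the HBase client line quoted above, which is presumably why the comment talks about patching instead.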
[jira] [Updated] (DRILL-4717) Drill inserts period into HQL statement when using Hive JDBC Driver

[ https://issues.apache.org/jira/browse/DRILL-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Smith updated DRILL-4717:
-------------------------------
    Attachment: screenshot-1.png

> Drill inserts period into HQL statement when using Hive JDBC Driver
> -------------------------------------------------------------------
>
> Key: DRILL-4717
> URL: https://issues.apache.org/jira/browse/DRILL-4717
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - JDBC
> Affects Versions: 1.6.0
> Reporter: Bryan Smith
> Priority: Minor
> Attachments: screenshot-1.png
>
>
> When using a Storage PlugIn of type JDBC with the Hive JDBC driver, Drill
> inserts a period between the FROM keyword and the table name. Hive rejects
> the query statement as invalid.

[jira] [Created] (DRILL-4717) Drill inserts period into HQL statement when using Hive JDBC Driver

Bryan Smith created DRILL-4717:
----------------------------------

Summary: Drill inserts period into HQL statement when using Hive JDBC Driver
Key: DRILL-4717
URL: https://issues.apache.org/jira/browse/DRILL-4717
Project: Apache Drill
Issue Type: Bug
Components: Storage - JDBC
Affects Versions: 1.6.0
Reporter: Bryan Smith
Priority: Minor


When using a Storage PlugIn of type JDBC with the Hive JDBC driver, Drill inserts a period between the FROM keyword and the table name. Hive rejects the query statement as invalid.

[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui

[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-4716:
------------------------------------
    Description:
1. http://localhost:8047/status returns "Running!"
But http://localhost:8047/status.json gives error.
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}
2. Remove link to System Options on page http://localhost:8047/status as redundant.

  was:
1. http://localhost:8047/status returns "Running!"
But http://localhost:8047/status.json gives error.
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}
2. Link to System Options on page http://localhost:8047/status is corrupted.

> status.json doesn't work in drill ui
> ------------------------------------
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 1.6.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!"
> But http://localhost:8047/status.json gives error.
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove link to System Options on page http://localhost:8047/status as
> redundant.

[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui

[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-4716:
------------------------------------
    Description:
1. http://localhost:8047/status returns "Running!"
But http://localhost:8047/status.json gives error.
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}
2. Link to System Options on page http://localhost:8047/status is corrupted.

  was:
1. http://localhost:8047/status returns "Running!"
But http://localhost5:8047/status.json gives error.
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}
2. Link to System Options on page http://localhost:8047/status is corrupted.

> status.json doesn't work in drill ui
> ------------------------------------
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 1.6.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!"
> But http://localhost:8047/status.json gives error.
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Link to System Options on page http://localhost:8047/status is corrupted.

[jira] [Created] (DRILL-4716) status.json doesn't work in drill ui

Arina Ielchiieva created DRILL-4716:
---------------------------------------

Summary: status.json doesn't work in drill ui
Key: DRILL-4716
URL: https://issues.apache.org/jira/browse/DRILL-4716
Project: Apache Drill
Issue Type: Bug
Components: Client - HTTP
Affects Versions: 1.6.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Priority: Minor
Fix For: 1.7.0


1. http://localhost:8047/status returns "Running!"
But http://localhost5:8047/status.json gives error.
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}
2. Link to System Options on page http://localhost:8047/status is corrupted.

[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync

[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-2593:
------------------------------------
    Fix Version/s:     (was: Future)
                       1.7.0

> 500 error when crc for a query profile is out of sync
> -----------------------------------------------------
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 0.7.0
> Reporter: Jason Altekruse
> Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one
> of the profiles stored in /tmp/drill/profiles and try to navigate to the
> profiles page on the Web UI.

[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync

[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326489#comment-15326489 ]

Arina Ielchiieva commented on DRILL-2593:
-----------------------------------------

If a profile is corrupted, it will be skipped and a dismissible warning will be generated.
Screenshots: warning1.JPG, warning2.JPG.

> 500 error when crc for a query profile is out of sync
> -----------------------------------------------------
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 0.7.0
> Reporter: Jason Altekruse
> Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one
> of the profiles stored in /tmp/drill/profiles and try to navigate to the
> profiles page on the Web UI.

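
The skip-and-warn behaviour described in the comment can be sketched as follows. This is an illustration under stated assumptions, not the Drill web server code: the StoredProfile type, the in-memory map standing in for /tmp/drill/profiles, and the use of java.util.zip.CRC32 as the checksum are all hypothetical choices. The point is only that a checksum mismatch on one profile is turned into a warning instead of an error that fails the whole listing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical sketch of listing profiles while skipping corrupted ones. */
class ProfileListingSketch {

    /** Assumed model: profile bytes plus the checksum recorded at write time. */
    static final class StoredProfile {
        final byte[] data;
        final long recordedCrc;

        StoredProfile(byte[] data, long recordedCrc) {
            this.data = data;
            this.recordedCrc = recordedCrc;
        }
    }

    final List<String> readable = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Lists profiles; a checksum mismatch yields a warning, not a failure. */
    static ProfileListingSketch list(Map<String, StoredProfile> profiles) {
        ProfileListingSketch result = new ProfileListingSketch();
        for (Map.Entry<String, StoredProfile> entry : profiles.entrySet()) {
            StoredProfile profile = entry.getValue();
            if (crcOf(profile.data) == profile.recordedCrc) {
                result.readable.add(entry.getKey());
            } else {
                result.warnings.add(
                    "Skipped profile " + entry.getKey() + ": checksum mismatch");
            }
        }
        return result;
    }

    /** Two profiles; the second was edited after its checksum was recorded. */
    static Map<String, StoredProfile> sampleStore() {
        byte[] first = "{\"id\":\"query-1\"}".getBytes(StandardCharsets.UTF_8);
        byte[] original = "{\"id\":\"query-2\"}".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "{\"id\":\"query-2\",\"x\":1}".getBytes(StandardCharsets.UTF_8);

        Map<String, StoredProfile> store = new LinkedHashMap<>();
        store.put("query-1", new StoredProfile(first, crcOf(first)));
        store.put("query-2", new StoredProfile(edited, crcOf(original)));
        return store;
    }

    public static void main(String[] args) {
        ProfileListingSketch result = list(sampleStore());
        System.out.println("Readable: " + result.readable);
        System.out.println("Warnings: " + result.warnings);
    }
}
```

A page built on such a result could render the readable entries as usual and show the collected warnings in a dismissible banner, which matches the behaviour the comment describes.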
[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync

[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-2593:
------------------------------------
    Attachment: warning2.JPG
                warning1.JPG

> 500 error when crc for a query profile is out of sync
> -----------------------------------------------------
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 0.7.0
> Reporter: Jason Altekruse
> Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one
> of the profiles stored in /tmp/drill/profiles and try to navigate to the
> profiles page on the Web UI.

[jira] [Assigned] (DRILL-2593) 500 error when crc for a query profile is out of sync

[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva reassigned DRILL-2593:
---------------------------------------

    Assignee: Arina Ielchiieva

> 500 error when crc for a query profile is out of sync
> -----------------------------------------------------
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - HTTP
> Affects Versions: 0.7.0
> Reporter: Jason Altekruse
> Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one
> of the profiles stored in /tmp/drill/profiles and try to navigate to the
> profiles page on the Web UI.
