[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-06-12 Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326690#comment-15326690
 ] 

Aman Sinha commented on DRILL-4530:
---

I have created a PR for this.  All unit and functional tests are clean; I 
haven't run performance tests yet.  The changes fall into 3 broad areas: (a) 
file/dir selection, (b) partition pruning, and (c) the metadata cache.  The changes in 
(a) and (b) are mostly independent of those in (c), which rely on a 
separate directories file.  In the future we could swap out the changes in (c) 
whenever the metadata cache is enhanced to allow faster access to the 
directories field.
Feedback is welcome.

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> Consider two queries that are run with Parquet metadata caching:
> {noformat}
> query 1:
> SELECT col FROM `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, query 1 takes 1 sec of elapsed time whereas query 2 
> takes 9 sec, even though both access the same amount of data.
> Users expect them to perform roughly the same.  The main 
> difference is that query 2 reads the bigger metadata cache file at the root 
> level 'A' and then applies the partitioning filter, whereas query 1 reads 
> a much smaller metadata cache file at the subdirectory level.
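To make the equivalence concrete: when the filter pins dir0, dir1, ... to single literal values with no gaps, query 2 selects exactly the subtree that query 1 names directly. The sketch below is a minimal illustration, assuming the filter has already been reduced to a map from directory level to literal value; `SinglePartition` and `singlePartitionPath` are hypothetical names, not Drill's PruneScanRule, which works on the planner's filter expression instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class SinglePartition {
    /**
     * Hypothetical helper: given the table root and equality predicates on the
     * implicit dirN columns (level -> literal), return the subdirectory they
     * select, provided they pin levels 0..k-1 with no gaps.
     */
    static Optional<String> singlePartitionPath(String root, Map<Integer, String> dirEqualities) {
        if (dirEqualities.isEmpty()) {
            return Optional.empty(); // no directory filter: the scan stays at the root
        }
        StringBuilder path = new StringBuilder(root);
        int expected = 0;
        // TreeMap iterates levels in ascending order: dir0, dir1, ...
        for (Map.Entry<Integer, String> e : new TreeMap<>(dirEqualities).entrySet()) {
            if (e.getKey() != expected) {
                return Optional.empty(); // gap, e.g. dir1 filtered but dir0 not
            }
            path.append('/').append(e.getValue());
            expected++;
        }
        return Optional.of(path.toString());
    }

    public static void main(String[] args) {
        // WHERE dir0 = 'B' AND dir1 = 'C'
        Map<Integer, String> filter = new HashMap<>();
        filter.put(0, "B");
        filter.put(1, "C");
        System.out.println(singlePartitionPath("A", filter).orElse("A")); // prints A/B/C
    }
}
```

A prefix such as `dir0 = 'B'` alone would still narrow the scan to `A/B`, which is why the helper accepts any gap-free prefix rather than requiring every level.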



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-06-12 ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326679#comment-15326679
 ] 

ASF GitHub Bot commented on DRILL-4530:
---

GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/519

DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

 - Enhance PruneScanRule to detect single partitions based on the dirs 
referenced in the filter.
 - Add a new EXPANDED_PARTIAL status to FileSelection.
 - Create a separate .directories metadata file so that directories are pruned 
before files.
 - Introduce a cacheFileRoot attribute to keep track of the parent directory 
of the cache file after partition pruning.
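The directory-first idea in the last two bullets can be sketched as follows. This is a simplified illustration, assuming the .directories file boils down to a list of directory paths relative to the table root; `DirectoryPruning`, `prune`, and `cacheFileRoot` are hypothetical names and do not reflect the PR's actual code. Directories are filtered before any file-level metadata is read, and the deepest directory shared by all survivors is taken as the cache file root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class DirectoryPruning {
    /** Keep only the directories accepted by the partition filter. */
    static List<String> prune(List<String> directories, Predicate<String[]> filter) {
        List<String> kept = new ArrayList<>();
        for (String dir : directories) {
            // Split "B/C" into {"B", "C"} so the filter can test dir0, dir1, ...
            if (filter.test(dir.split("/"))) {
                kept.add(dir);
            }
        }
        return kept;
    }

    /** Deepest directory containing every surviving directory ("" = table root). */
    static String cacheFileRoot(List<String> kept) {
        if (kept.isEmpty()) {
            return "";
        }
        String[] common = kept.get(0).split("/");
        int len = common.length;
        for (String dir : kept) {
            String[] parts = dir.split("/");
            int i = 0;
            // Shrink the shared prefix to the components this directory agrees on.
            while (i < len && i < parts.length && parts[i].equals(common[i])) {
                i++;
            }
            len = i;
        }
        return String.join("/", Arrays.copyOf(common, len));
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("B/C", "B/D", "E/C");
        // dir0 = 'B' AND dir1 = 'C'
        Predicate<String[]> filter =
            p -> p.length >= 2 && p[0].equals("B") && p[1].equals("C");
        List<String> kept = prune(dirs, filter);
        System.out.println(kept + " -> cache file root: A/" + cacheFileRoot(kept));
    }
}
```

With a looser filter such as `dir0 = 'B'`, both `B/C` and `B/D` survive and the root becomes `B`, so only the metadata under that subtree needs to be read.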

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amansinha100/incubator-drill DRILL-4530-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #519


commit 9c9687e804fa05c8f4b7b065738c458cb88bf5c4
Author: Aman Sinha 
Date:   2016-03-25T19:55:59Z

DRILL-4530: Optimize partition pruning with metadata caching for the single 
partition case.

 - Enhance PruneScanRule to detect single partitions based on referenced 
dirs in the filter.
 - Keep a new status of EXPANDED_PARTIAL for FileSelection.
 - Create separate .directories metadata file to prune directories first 
before files.
 - Introduce cacheFileRoot attribute to keep track of the parent directory 
of the cache file after partition pruning.






[jira] [Comment Edited] (DRILL-3581) Google Guava version is so old it causes incompatibilities with other libs

2016-06-12 Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326659#comment-15326659
 ] 

Aditya Kishore edited comment on DRILL-3581 at 6/12/16 10:25 PM:
-

It looks like we will need to move the patcher code into the main execution path, since 
the HBase 1.1 meta table locator code (used by the HBase client) has started using 
{{Stopwatch stopwatch = new Stopwatch().start();}} 
([code|https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234]).

I'll update the pull request for DRILL-4199 with the changes.


was (Author: adityakishore):
Looks like we will need to move the patcher code into main execution path since 
HBase 1.1 meta table locator code (used by HBase client) has started using the 
`Stopwatch stopwatch = new Stopwatch().start();` code.

I'll update the pull request for DRILL-4199 with the changes.

[1] 
https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234

> Google Guava version is so old it causes incompatibilities with other libs
> --
>
> Key: DRILL-3581
> URL: https://issues.apache.org/jira/browse/DRILL-3581
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.1.0
> Environment: Linux, JDK 1.8
>Reporter: Joseph Barefoot
>Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> Drill is currently using Guava version 14.0.1, which was released in March 2013.
>  https://github.com/apache/drill/blob/master/pom.xml
> Many other Java projects use newer versions; however, this conflicts with the 
> Drill JDBC driver, since a couple of the APIs it uses are incompatible with 
> newer Guava versions.  In particular:
> https://github.com/apache/drill/blob/master/common/src/main/java/org/apache/drill/common/util/PathScanner.java
> (the public Stopwatch class constructor has been removed in favor of factory 
> methods).
> Although this seems minor, it makes it hard to use Drill from a Java 
> application, since many other open source libraries will be using the latest 
> Guava version (18).
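Because the breakage depends on which Guava jar ends up on the classpath, one way to diagnose it is to check at runtime whether `com.google.common.base.Stopwatch` still exposes a public no-argument constructor. The sketch below uses plain reflection; `StopwatchCompatCheck` and its methods are illustrative names, not part of Drill, and the check reports "class not on classpath" when Guava is absent. In Guava 15 and later, `Stopwatch.createStarted()` is the factory-method replacement for `new Stopwatch().start()`.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

class StopwatchCompatCheck {
    /** True if the named class declares a public constructor taking no arguments. */
    static boolean hasPublicNoArgConstructor(String className) throws ClassNotFoundException {
        // initialize=false: inspect the class without running its static initializers.
        Class<?> cls = Class.forName(className, false, StopwatchCompatCheck.class.getClassLoader());
        for (Constructor<?> c : cls.getDeclaredConstructors()) {
            if (c.getParameterCount() == 0 && Modifier.isPublic(c.getModifiers())) {
                return true;
            }
        }
        return false;
    }

    /** Human-readable verdict for the given class name. */
    static String describe(String className) {
        try {
            return hasPublicNoArgConstructor(className)
                ? "public no-arg constructor present (old-style API)"
                : "no public no-arg constructor (use factory methods)";
        } catch (ClassNotFoundException e) {
            return "class not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("Stopwatch: " + describe("com.google.common.base.Stopwatch"));
    }
}
```

Running a check like this early, before any code that calls `new Stopwatch()` is loaded, turns a late `NoSuchMethodError` into an explicit startup diagnostic.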





[jira] [Commented] (DRILL-3581) Google Guava version is so old it causes incompatibilities with other libs

2016-06-12 Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326659#comment-15326659
 ] 

Aditya Kishore commented on DRILL-3581:
---

It looks like we will need to move the patcher code into the main execution path, since 
the HBase 1.1 meta table locator code (used by the HBase client) has started using 
`Stopwatch stopwatch = new Stopwatch().start();` [1].

I'll update the pull request for DRILL-4199 with the changes.

[1] 
https://github.com/apache/hbase/blob/rel/1.1.0/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java#L234



[jira] [Updated] (DRILL-4717) Drill inserts period into HQL statement when using Hive JDBC Driver

2016-06-12 Bryan Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Smith updated DRILL-4717:
---
Attachment: screenshot-1.png

> Drill inserts period into HQL statement when using Hive JDBC Driver
> ---
>
> Key: DRILL-4717
> URL: https://issues.apache.org/jira/browse/DRILL-4717
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.6.0
>Reporter: Bryan Smith
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> When using a storage plugin of type JDBC with the Hive JDBC driver, Drill 
> inserts a period between the FROM keyword and the table name.  Hive rejects 
> the query statement as invalid.
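The report does not say where the period comes from. Purely as a hypothetical illustration, and not Drill's JDBC storage plugin code, one common way such SQL arises is joining a qualifier and a table name with "." when the qualifier is empty. The names `QualifiedName`, `naive`, `safe`, and the table `orders` are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class QualifiedName {
    /** Naive join: yields ".orders" when the schema part is empty. */
    static String naive(String schema, String table) {
        return schema + "." + table;
    }

    /** Join only the non-empty parts, so an empty qualifier adds no separator. */
    static String safe(String... parts) {
        List<String> kept = new ArrayList<>();
        for (String p : parts) {
            if (p != null && !p.isEmpty()) {
                kept.add(p);
            }
        }
        return String.join(".", kept);
    }

    public static void main(String[] args) {
        System.out.println("SELECT * FROM " + naive("", "orders")); // SELECT * FROM .orders
        System.out.println("SELECT * FROM " + safe("", "orders"));  // SELECT * FROM orders
    }
}
```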





[jira] [Created] (DRILL-4717) Drill inserts period into HQL statement when using Hive JDBC Driver

2016-06-12 Bryan Smith (JIRA)
Bryan Smith created DRILL-4717:
--

 Summary: Drill inserts period into HQL statement when using Hive 
JDBC Driver
 Key: DRILL-4717
 URL: https://issues.apache.org/jira/browse/DRILL-4717
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - JDBC
Affects Versions: 1.6.0
Reporter: Bryan Smith
Priority: Minor


When using a storage plugin of type JDBC with the Hive JDBC driver, Drill 
inserts a period between the FROM keyword and the table name.  Hive rejects the 
query statement as invalid.





[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui

2016-06-12 Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4716:

Description: 
1. http://localhost:8047/status returns "Running!",
but http://localhost:8047/status.json returns an error:
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}

2. Remove the link to System Options on the http://localhost:8047/status page, 
as it is redundant.

  was:
1. http://localhost:8047/status returns "Running!",
but http://localhost:8047/status.json returns an error:
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}

2. The link to System Options on the http://localhost:8047/status page is broken.


> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!",
> but http://localhost:8047/status.json returns an error:
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove the link to System Options on the http://localhost:8047/status page, 
> as it is redundant.
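For reference, the expected behavior (both /status and /status.json answering, with JSON for the latter, and a JSON 404 for anything else) can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is a hypothetical stand-in, not Drill's web UI code; the class name `StatusServer` and the JSON field name `status` are assumptions, and the 404 body simply mirrors the error shape quoted above.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class StatusServer {
    /** Start a server on {@code port} (0 = any free port) serving both status routes. */
    static HttpServer start(int port) throws IOException {
        // Binds all interfaces; a real deployment would restrict this.
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // One catch-all context that dispatches on the exact path, so "/status"
        // and "/status.json" are distinct routes and anything else is a 404.
        server.createContext("/", exchange -> {
            switch (exchange.getRequestURI().getPath()) {
                case "/status":
                    send(exchange, 200, "text/plain", "Running!");
                    break;
                case "/status.json":
                    send(exchange, 200, "application/json", "{\"status\" : \"Running!\"}");
                    break;
                default:
                    send(exchange, 404, "application/json",
                        "{\"errorMessage\" : \"HTTP 404 Not Found\"}");
            }
        });
        server.start();
        return server;
    }

    /** Write a complete response with the given status code, content type, and body. */
    static void send(HttpExchange ex, int code, String type, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", type);
        ex.sendResponseHeaders(code, bytes.length);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

For example, `StatusServer.start(8047)` followed by `curl http://localhost:8047/status.json` would return the JSON body instead of the 404.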





[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui

2016-06-12 Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-4716:

Description: 
1. http://localhost:8047/status returns "Running!",
but http://localhost:8047/status.json returns an error:
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}

2. The link to System Options on the http://localhost:8047/status page is broken.

  was:
1. http://localhost:8047/status returns "Running!",
but http://localhost5:8047/status.json returns an error:
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}

2. The link to System Options on the http://localhost:8047/status page is broken.


> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!",
> but http://localhost:8047/status.json returns an error:
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. The link to System Options on the http://localhost:8047/status page is broken.





[jira] [Created] (DRILL-4716) status.json doesn't work in drill ui

2016-06-12 Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-4716:
---

 Summary: status.json doesn't work in drill ui
 Key: DRILL-4716
 URL: https://issues.apache.org/jira/browse/DRILL-4716
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - HTTP
Affects Versions: 1.6.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Priority: Minor
 Fix For: 1.7.0


1. http://localhost:8047/status returns "Running!",
but http://localhost5:8047/status.json returns an error:
{code}
{
  "errorMessage" : "HTTP 404 Not Found"
}
{code}

2. The link to System Options on the http://localhost:8047/status page is broken.





[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-12 Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-2593:

Fix Version/s: (was: Future)
   1.7.0

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.





[jira] [Commented] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-12 Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326489#comment-15326489
 ] 

Arina Ielchiieva commented on DRILL-2593:
-

If a profile is corrupted, it will be skipped and a dismissible warning 
will be generated.
Screenshots: warning1.JPG, warning2.JPG.
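The skip-and-warn behavior can be sketched as below. This is a simplification, assuming each profile is stored with a single whole-file CRC32 value; the checksums actually kept alongside files under /tmp/drill/profiles may use a different scheme. `ProfileLoader`, `StoredProfile`, and `loadValid` are hypothetical names. A profile whose contents no longer match its recorded checksum is skipped and reported as a warning instead of failing the whole profiles page with a 500 error.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

class ProfileLoader {
    /** A stored profile: raw bytes plus the checksum recorded when it was written. */
    static final class StoredProfile {
        final byte[] data;
        final long storedCrc;

        StoredProfile(byte[] data, long storedCrc) {
            this.data = data;
            this.storedCrc = storedCrc;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /**
     * Return the names of profiles whose checksum matches; each mismatch is
     * skipped and reported in {@code warnings} instead of aborting the listing.
     */
    static List<String> loadValid(Map<String, StoredProfile> profiles, List<String> warnings) {
        List<String> valid = new ArrayList<>();
        for (Map.Entry<String, StoredProfile> e : profiles.entrySet()) {
            if (crc(e.getValue().data) == e.getValue().storedCrc) {
                valid.add(e.getKey());
            } else {
                warnings.add("Skipped corrupted profile: " + e.getKey());
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        byte[] ok = "{\"query\":\"select 1\"}".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "{\"query\":\"select 2\"}".getBytes(StandardCharsets.UTF_8);
        Map<String, StoredProfile> profiles = new LinkedHashMap<>();
        profiles.put("q1", new StoredProfile(ok, crc(ok)));
        // Simulate a hand-edited file: contents changed, stored checksum did not.
        profiles.put("q2", new StoredProfile(edited, crc(ok)));
        List<String> warnings = new ArrayList<>();
        System.out.println(loadValid(profiles, warnings) + " " + warnings);
    }
}
```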

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: Future
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.





[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-12 Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-2593:

Attachment: warning2.JPG
warning1.JPG



[jira] [Assigned] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-06-12 Arina Ielchiieva (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-2593:
---

Assignee: Arina Ielchiieva

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: Future
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.


