[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-22 Thread Lefty Leverenz (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880047#comment-15880047 ]

Lefty Leverenz commented on HIVE-15928:
---

Doc note:  This adds configuration parameter *hive.druid.select.distribute* and 
amends the description of *hive.druid.select.threshold*, which was created by 
HIVE-14217 (also in 2.2.0).  They need to be documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]
* [Druid Integration | 
https://cwiki.apache.org/confluence/display/Hive/Druid+Integration]

Added a TODOC2.2 label.

> Parallelization of Select queries in Druid handler
> --
>
> Key: HIVE-15928
> URL: https://issues.apache.org/jira/browse/HIVE-15928
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15928.01.patch, HIVE-15928.02.patch, 
> HIVE-15928.patch
>
>
> Even if we split a Select query along its time dimension, parallelization is
> limited because all of the resulting queries still hit the broker node. Instead,
> we can ask the broker which Druid nodes contain the data and query those nodes
> directly.
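The description above contrasts two ways of parallelizing a Select query. The following is a minimal Java sketch of the two planning strategies, written under loudly stated assumptions: `SegmentLocation`, `Split`, `splitByTime` and `splitByNode` are hypothetical names, not the Druid handler's actual classes; each segment is assumed to be served by exactly one node (real Druid segments may be replicated); and the time split assumes rows are spread uniformly over the interval, with `threshold` standing in for a per-query row limit loosely in the spirit of *hive.druid.select.threshold* (its exact semantics are an assumption here). It shows that time splitting yields several splits that all target the broker, whereas grouping segments by the node that serves them yields one split per data node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectSplitSketch {

  /** Hypothetical: one segment's time range [start, end) in millis and the node serving it. */
  static final class SegmentLocation {
    final long start;
    final long end;
    final String host;

    SegmentLocation(long start, long end, String host) {
      this.start = start;
      this.end = end;
      this.host = host;
    }
  }

  /** Hypothetical input split: the host to contact and the time ranges to read from it. */
  static final class Split {
    final String host;
    final List<long[]> intervals = new ArrayList<>();

    Split(String host) {
      this.host = host;
    }
  }

  /**
   * Time-dimension splitting: ceil(totalRows / threshold) equal-width slices of [start, end).
   * Every slice targets the broker, which is the bottleneck the issue describes.
   * Assumes end > start, threshold > 0, and rows spread uniformly over time.
   */
  static List<Split> splitByTime(String broker, long start, long end, long totalRows, long threshold) {
    long parts = Math.max(1, (totalRows + threshold - 1) / threshold);
    parts = Math.min(parts, end - start); // keep every slice at least 1 ms wide
    List<Split> splits = new ArrayList<>();
    for (long i = 0; i < parts; i++) {
      Split split = new Split(broker);
      split.intervals.add(new long[] {
          start + (end - start) * i / parts, start + (end - start) * (i + 1) / parts});
      splits.add(split);
    }
    return splits;
  }

  /** Node-based splitting: group segments by serving host, producing one split per node. */
  static List<Split> splitByNode(List<SegmentLocation> segments) {
    Map<String, Split> byHost = new LinkedHashMap<>(); // keeps hosts in first-seen order
    for (SegmentLocation seg : segments) {
      byHost.computeIfAbsent(seg.host, Split::new).intervals.add(new long[] {seg.start, seg.end});
    }
    return new ArrayList<>(byHost.values());
  }

  public static void main(String[] args) {
    // Placeholder host names and sizes; none of these values come from the issue.
    for (Split s : splitByTime("broker", 0, 100, 25_000, 10_000)) {
      System.out.println(s.host + " " + s.intervals.get(0)[0] + "-" + s.intervals.get(0)[1]);
    }
    List<SegmentLocation> segments = List.of(
        new SegmentLocation(0, 50, "historical-1"),
        new SegmentLocation(50, 100, "historical-2"),
        new SegmentLocation(100, 150, "historical-1"));
    for (Split s : splitByNode(segments)) {
      System.out.println(s.host + " " + s.intervals.size() + " interval(s)");
    }
  }
}
```

With the placeholder values in `main`, the time-based plan produces three splits (0-33, 33-66, 66-100) all addressed to the broker, while the node-based plan produces one split for each of the two data nodes. A real implementation would also have to obtain the segment locations from the broker and cope with replicas and node failures, which this sketch deliberately leaves out.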



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-21 Thread Ashutosh Chauhan (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876479#comment-15876479 ]

Ashutosh Chauhan commented on HIVE-15928:
-

+1



[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-21 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876210#comment-15876210 ]

Hive QA commented on HIVE-15928:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853758/HIVE-15928.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10251 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3670/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3670/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3670/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853758 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-20 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874892#comment-15874892 ]

Hive QA commented on HIVE-15928:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853584/HIVE-15928.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10249 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hadoop.hive.druid.TestHiveDruidQueryBasedInputFormat.testTimeZone (batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3660/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3660/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853584 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-20 Thread Jesus Camacho Rodriguez (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874836#comment-15874836 ]

Jesus Camacho Rodriguez commented on HIVE-15928:


[~ashutoshc], I have updated the patch. Could you take a look? Thanks.



[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-17 Thread Jesus Camacho Rodriguez (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871690#comment-15871690 ]

Jesus Camacho Rodriguez commented on HIVE-15928:


[~bslim], [~ashutoshc], could you take a look? I have been running tests on the
cluster and it seems to be working fine. [~bslim], it would be nice if you
could give it a try too.

I have not added tests because this feature seems quite difficult to test
without integration tests. But if you have any ideas, let me know.



[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler

2017-02-16 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870500#comment-15870500 ]

Hive QA commented on HIVE-15928:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853070/HIVE-15928.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10224 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3602/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3602/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3602/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12853070 - PreCommit-HIVE-Build
