[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880047#comment-15880047 ]

Lefty Leverenz commented on HIVE-15928:
---

Doc note: This adds configuration parameter *hive.druid.select.distribute* and amends the description of *hive.druid.select.threshold*, which was created by HIVE-14217 (also in 2.2.0). They need to be documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]
* [Druid Integration | https://cwiki.apache.org/confluence/display/Hive/Druid+Integration]

Added a TODOC2.2 label.

> Parallelization of Select queries in Druid handler
> --
>
> Key: HIVE-15928
> URL: https://issues.apache.org/jira/browse/HIVE-15928
> Project: Hive
> Issue Type: Sub-task
> Components: Druid integration
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-15928.01.patch, HIVE-15928.02.patch, HIVE-15928.patch
>
>
> Even if we split a Select query along its time dimension, parallelization is limited as all queries will hit the broker node. Instead, we can interrogate the broker to get the Druid nodes that contain the data, and query those nodes directly.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
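The idea in the issue description can be illustrated with a rough sketch. This is not the code from the attached patches: the `Split` type, both function names, and the host addresses are hypothetical, and the interval-to-node mapping is assumed to have been fetched from the broker already. It only contrasts the two ways of building splits: time-based splits that all target the broker, versus splits that target the nodes holding the data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Split:
    """One unit of parallel work (hypothetical type, for illustration only)."""
    interval: str      # ISO-8601 interval, e.g. "2017-01-01/2017-01-02"
    hosts: List[str]   # addresses that can answer the query for this interval


def broker_splits(intervals: List[str], broker: str) -> List[Split]:
    # Splitting along the time dimension only: there are several splits,
    # but every one of them is still sent to the same broker address.
    return [Split(interval=i, hosts=[broker]) for i in intervals]


def distributed_splits(locations: Dict[str, List[str]]) -> List[Split]:
    # `locations` is an assumed, already-fetched answer from the broker:
    # interval -> nodes that contain the data for that interval.
    # Each split is addressed to those nodes instead of the broker.
    return [Split(interval=i, hosts=list(h)) for i, h in sorted(locations.items())]
```

With the first function every split shares one host, so the broker remains the bottleneck; with the second, the splits spread over as many nodes as hold the data.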
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876479#comment-15876479 ]

Ashutosh Chauhan commented on HIVE-15928:
-

+1
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876210#comment-15876210 ]

Hive QA commented on HIVE-15928:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853758/HIVE-15928.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10251 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3670/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3670/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3670/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12853758 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874892#comment-15874892 ]

Hive QA commented on HIVE-15928:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853584/HIVE-15928.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10249 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
org.apache.hadoop.hive.druid.TestHiveDruidQueryBasedInputFormat.testTimeZone (batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3660/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3660/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12853584 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874836#comment-15874836 ]

Jesus Camacho Rodriguez commented on HIVE-15928:

[~ashutoshc], I have updated the patch, could you take a look?

Thanks
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871690#comment-15871690 ]

Jesus Camacho Rodriguez commented on HIVE-15928:

[~bslim], [~ashutoshc], could you take a look? I have been running tests in the cluster and it seems to be working fine. [~bslim], it would be nice if you could give it a try too?

I have not added tests because it seems quite difficult to test this feature without integration tests. But if you have any ideas, let me know.
[jira] [Commented] (HIVE-15928) Parallelization of Select queries in Druid handler
[ https://issues.apache.org/jira/browse/HIVE-15928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870500#comment-15870500 ]

Hive QA commented on HIVE-15928:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12853070/HIVE-15928.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10224 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join31] (batchId=81)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[multiMapJoin2] (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] (batchId=133)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3602/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3602/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3602/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12853070 - PreCommit-HIVE-Build