[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15691733#comment-15691733 ]

slim bouguerra commented on HIVE-14474:
---------------------------------------

I have added a new patch that creates/deletes Druid segments: https://issues.apache.org/jira/browse/HIVE-15277

> Create datasource in Druid from Hive
> ------------------------------------
>
>                 Key: HIVE-14474
>                 URL: https://issues.apache.org/jira/browse/HIVE-14474
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the results of the query to HDFS (or the location specified in the CTAS statement) and submit a HadoopIndexing task to the Druid overlord. The task will contain the path where the data was stored; it will read the data and create the segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <select query>;
> {code}
> This statement stores the results of the query in a Druid datasource named 'my_query_based_datasource'. One of the columns of the query needs to be the time dimension, which is mandatory in Druid. In particular, we use the same convention that is used for Druid: there needs to be a column named '__time' in the result of the executed query, which will act as the time dimension column in Druid. Currently, the time dimension column needs to be of 'timestamp' type.
> This initial implementation interacts with the Druid API as it is currently exposed to the user.
> In a follow-up issue, we should propose an implementation that integrates more tightly with Druid. In particular, we would like to store segments directly in Druid from Hive, thus avoiding the overhead of writing Hive results to HDFS and then launching an MR job that basically reads them again to create the segments.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
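For concreteness, the description above elides the actual query after AS: any Hive SELECT works as long as its result contains the mandatory '__time' timestamp column. A hypothetical complete statement (the source table 'wiki_events' and its columns are made up for illustration, not part of the issue):

{code:sql}
CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
AS
SELECT
  CAST(event_time AS timestamp) AS `__time`,  -- mandatory Druid time dimension
  page,                                       -- a dimension column
  added                                       -- a numeric column usable as a metric
FROM wiki_events;
{code}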
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554295#comment-15554295 ]

Jesus Camacho Rodriguez commented on HIVE-14474:
------------------------------------------------

[~ashutoshc], it is up to date; it is just that the initial commit was 24 days ago, and then I just amended it... :)
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553992#comment-15553992 ]

Ashutosh Chauhan commented on HIVE-14474:
-----------------------------------------

Can you create an RB request for it? Seems like the GH request isn't up to date.
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15548763#comment-15548763 ]

Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831727/HIVE-14474.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10656 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1403/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1403/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831727 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546876#comment-15546876 ]

Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831579/HIVE-14474.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10657 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_external]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1391/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1391/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1391/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12831579 - PreCommit-HIVE-Build

> We want to extend the DruidStorageHandler to support CTAS queries.
> We need to implement a DruidOutputFormat that can create Druid segments from the output of the Hive query and store them directly in Druid.
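Whatever the eventual DruidOutputFormat looks like, one step it must perform is partitioning the query output by Druid segment interval before building each segment. The following self-contained Java sketch illustrates only that bucketing step; it does not use Hive's or Druid's actual APIs, and the class and method names are hypothetical:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of the bucketing a DruidOutputFormat would have to do:
// group rows into one bucket per daily segment interval, keyed on the
// __time value (epoch milliseconds, assumed to be each row's first field).
public class SegmentBucketer {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Truncate a __time value to the start of its daily segment interval.
    static long segmentStart(long timeMs) {
        return (timeMs / DAY_MS) * DAY_MS;
    }

    // Group rows into one list per segment interval, ordered by interval start.
    static Map<Long, List<Object[]>> bucketBySegment(List<Object[]> rows) {
        Map<Long, List<Object[]>> segments = new TreeMap<>();
        for (Object[] row : rows) {
            long start = segmentStart((Long) row[0]);
            segments.computeIfAbsent(start, k -> new ArrayList<>()).add(row);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[]{1000L, "page-a"},           // day 0
                new Object[]{DAY_MS + 1000L, "page-b"},  // day 1
                new Object[]{DAY_MS + 2000L, "page-c"}); // day 1
        Map<Long, List<Object[]>> segments = bucketBySegment(rows);
        System.out.println(segments.size());             // number of segment intervals
        System.out.println(segments.get(DAY_MS).size()); // rows in the day-1 segment
    }
}
{code}

In the real implementation, segment granularity would come from the table properties rather than being hard-coded to a day, and each bucket would be fed to Druid's segment-building machinery instead of a plain list.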
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546101#comment-15546101 ]

ASF GitHub Bot commented on HIVE-14474:
---------------------------------------

GitHub user jcamachor opened a pull request:

https://github.com/apache/hive/pull/107

HIVE-14474: Create datasource in Druid from Hive

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jcamachor/hive HIVE-druid-index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/107.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #107

commit a6f2961c211c0fc51203e81e832d46f8bbdb7859
Author: Jesus Camacho Rodriguez
Date:   2016-09-13T14:56:37Z

    HIVE-14474: Create datasource in Druid from Hive
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490692#comment-15490692 ]

Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12828463/HIVE-14474.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10546 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1187/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1187/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1187/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12828463 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490510#comment-15490510 ]

Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12828434/HIVE-14474.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10546 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1186/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1186/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1186/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12828434 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive
[ https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487685#comment-15487685 ]

Hive QA commented on HIVE-14474:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12828278/HIVE-14474.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 10547 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_intervals]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_timeseries]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_address]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_buckets]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_datasource]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_external]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_location]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[druid_partitions]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1164/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1164/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1164/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12828278 - PreCommit-HIVE-MASTER-Build