[jira] [Created] (HIVE-24421) DruidOutputFormat and DruidStorageHandler use different filesystem causing issues in data loading
Nishant Bangarwa created HIVE-24421:
---
Summary: DruidOutputFormat and DruidStorageHandler use different filesystem causing issues in data loading
Key: HIVE-24421
URL: https://issues.apache.org/jira/browse/HIVE-24421
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa
[jira] [Created] (HIVE-24420) Druid test failures
Nishant Bangarwa created HIVE-24420:
---
Summary: Druid test failures
Key: HIVE-24420
URL: https://issues.apache.org/jira/browse/HIVE-24420
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Test Result (11 failures / ±0)
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druid_timestamptz2]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_dynamic_partition]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_expressions]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_extractTime]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_floorTime]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_mv]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_semijoin_reduction_all_types]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test1]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_alter]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_insert]
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver[druidmini_test_ts]
[jira] [Created] (HIVE-23770) Druid filter translation unable to handle inverted between
Nishant Bangarwa created HIVE-23770:
---
Summary: Druid filter translation unable to handle inverted between
Key: HIVE-23770
URL: https://issues.apache.org/jira/browse/HIVE-23770
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Druid filter translation happens in Calcite and does not use the HiveBetween inverted flag, so a negation is dropped from the planned query.
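The inverted flag corresponds to a NOT BETWEEN predicate. A minimal query of the shape that hits this, as a sketch against a hypothetical Druid-backed table druid_table with a numeric column added:
{code}
-- The planner should push a negated bound filter to Druid here; if the
-- inverted flag is ignored, Druid instead returns the rows INSIDE the range.
SELECT `user`, added
FROM druid_table
WHERE added NOT BETWEEN 10 AND 100;
{code}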
[jira] [Created] (HIVE-23184) Upgrade druid to 0.17.1
Nishant Bangarwa created HIVE-23184:
---
Summary: Upgrade druid to 0.17.1
Key: HIVE-23184
URL: https://issues.apache.org/jira/browse/HIVE-23184
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Upgrade to the latest Druid release, 0.17.1.
[jira] [Created] (HIVE-22933) Allow a kerberos-enabled Hive to connect to a non-kerberos druid cluster
Nishant Bangarwa created HIVE-22933:
---
Summary: Allow a kerberos-enabled Hive to connect to a non-kerberos druid cluster
Key: HIVE-22933
URL: https://issues.apache.org/jira/browse/HIVE-22933
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Currently, if Kerberos is enabled for Hive, it can only connect to external Druid clusters that are also Kerberos-enabled, since the client used to connect to Druid is always a KerberosHTTPClient. This task is to allow a Kerberos-enabled HiveServer2 to connect to a non-kerberized Druid cluster.
[jira] [Created] (HIVE-22395) Add ability to read Druid metastore password from jceks
Nishant Bangarwa created HIVE-22395:
---
Summary: Add ability to read Druid metastore password from jceks
Key: HIVE-22395
URL: https://issues.apache.org/jira/browse/HIVE-22395
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa
[jira] [Created] (HIVE-22394) Duplicate Jars in druid classpath causing issues
Nishant Bangarwa created HIVE-22394:
---
Summary: Duplicate Jars in druid classpath causing issues
Key: HIVE-22394
URL: https://issues.apache.org/jira/browse/HIVE-22394
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

The hive-druid-handler jar contains shaded versions of the Druid classes, while druid-hdfs-storage also ships the non-shaded classes.
{code}
[hive@hiveserver2-1 lib]$ ls |grep druid
calcite-druid-1.19.0.7.0.2.0-163.jar
druid-bloom-filter-0.15.1.7.0.2.0-163.jar
druid-hdfs-storage-0.15.1.7.0.2.0-163.jar
hive-druid-handler-3.1.2000.7.0.2.0-163.jar
hive-druid-handler.jar
{code}
Exception below -
{code}
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.HadoopFsWrapper
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:177)
    ... 22 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.HadoopFsWrapper
    at org.apache.hive.druid.org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:765)
    at org.apache.hive.druid.org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$1(AppenderatorImpl.java:630)
    at org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
    at org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
    ... 3 more
Caused by: java.lang.RuntimeException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.HadoopFsWrapper
    at org.apache.hive.druid.org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:96)
    at org.apache.hive.druid.org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:114)
    at org.apache.hive.druid.org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:104)
    at org.apache.hive.druid.org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:743)
    ... 6 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.HadoopFsWrapper
    at org.apache.hive.druid.org.apache.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:163)
    at org.apache.hive.druid.org.apache.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:145)
    at org.apache.hive.druid.org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$4(AppenderatorImpl.java:747)
    at org.apache.hive.druid.org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:86)
{code}
[jira] [Created] (HIVE-21628) Use druid-s3-extensions when using S3 as druid deep storage
Nishant Bangarwa created HIVE-21628:
---
Summary: Use druid-s3-extensions when using S3 as druid deep storage
Key: HIVE-21628
URL: https://issues.apache.org/jira/browse/HIVE-21628
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa

Currently DruidStorageHandler always uses druid-hdfs-extensions for S3 as well as HDFS. The HDFS extension pushes the segment to an intermediate directory and then renames it to the final path.
1) The rename causes an additional copy of the data, which the druid-s3 extension avoids.
2) The rename may fail when the pushed file is not yet visible, due to the eventually consistent model of S3. Refer to the exception below -
{code}
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:184)
    ... 22 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
    at org.apache.hive.druid.com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:665)
    at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$0(AppenderatorImpl.java:528)
    at org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
    at org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
    ... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
    at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2727)
    at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1560)
    at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:53)
    at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:168)
    at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:149)
    at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$3(AppenderatorImpl.java:647)
    at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63)
    at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
    at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:638)
    ... 6 more
{code}
This task is to add the ability to switch to the druid-s3-extension when the S3A file scheme is used for the Druid storage directory.
[jira] [Created] (HIVE-21612) Upgrade druid to 0.14.0-incubating
Nishant Bangarwa created HIVE-21612:
---
Summary: Upgrade druid to 0.14.0-incubating
Key: HIVE-21612
URL: https://issues.apache.org/jira/browse/HIVE-21612
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Druid 0.14.0-incubating is released. This task is to upgrade Hive to use the 0.14.0-incubating version of Druid.
[jira] [Created] (HIVE-20709) ASF License issue in HiveJDBCImplementor
Nishant Bangarwa created HIVE-20709:
---
Summary: ASF License issue in HiveJDBCImplementor
Key: HIVE-20709
URL: https://issues.apache.org/jira/browse/HIVE-20709
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Lines that start with ? in the ASF License report indicate files that do not have an Apache license header:
!? /data/hiveptest/working/yetus_PreCommit-HIVE-Build-14277/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/jdbc/HiveJdbcImplementor.java
[jira] [Created] (HIVE-20700) Add config to disable rollup for druid
Nishant Bangarwa created HIVE-20700:
---
Summary: Add config to disable rollup for druid
Key: HIVE-20700
URL: https://issues.apache.org/jira/browse/HIVE-20700
Project: Hive
Issue Type: New Feature
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Add a table property - 'druid.rollup' - to allow disabling rollup for druid tables.
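A sketch of how the property could be used, following the TBLPROPERTIES style of the other Druid settings; only the property name druid.rollup comes from this issue, the rest of the statement is illustrative:
{code}
CREATE TABLE druid_events (`__time` timestamp, page string, added int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "HOUR",
  -- keep every input row instead of pre-aggregating at ingest time
  "druid.rollup" = "false"
);
{code}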
[jira] [Created] (HIVE-20698) Better error instead of NPE when timestamp is null for any row when ingesting to druid
Nishant Bangarwa created HIVE-20698:
---
Summary: Better error instead of NPE when timestamp is null for any row when ingesting to druid
Key: HIVE-20698
URL: https://issues.apache.org/jira/browse/HIVE-20698
Project: Hive
Issue Type: Improvement
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Currently, when ingesting data to Druid, we get an opaque NPE when the timestamp is null for any row. We should provide an error with a better message that helps the user know what is actually wrong.
{code}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.druid.serde.DruidSerDe.serialize(DruidSerDe.java:364)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:957)
    at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:111)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:965)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:480)
{code}
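Until a better error is in place, one workaround is to keep null timestamps out of the insert on the Hive side; a sketch with hypothetical table and column names:
{code}
-- A NULL value in the timestamp column is what triggers the NPE in
-- DruidSerDe.serialize, so filter such rows before the Druid writer sees them.
INSERT INTO TABLE druid_events
SELECT ts AS `__time`, page, added
FROM src
WHERE ts IS NOT NULL;
{code}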
[jira] [Created] (HIVE-20687) Cancel Running Druid Query when a hive query is cancelled.
Nishant Bangarwa created HIVE-20687:
---
Summary: Cancel Running Druid Query when a hive query is cancelled.
Key: HIVE-20687
URL: https://issues.apache.org/jira/browse/HIVE-20687
Project: Hive
Issue Type: Improvement
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

https://issues.apache.org/jira/browse/HIVE-20686 ensures that the Hive query ID is passed to Druid. Druid also supports query cancellation by query ID. Queries can be cancelled explicitly using their queryId by sending a DELETE request to the following endpoint on the broker or router -
{code}
DELETE /druid/v2/{queryId}
{code}
[jira] [Created] (HIVE-20686) Sync QueryIDs across hive and druid
Nishant Bangarwa created HIVE-20686:
---
Summary: Sync QueryIDs across hive and druid
Key: HIVE-20686
URL: https://issues.apache.org/jira/browse/HIVE-20686
Project: Hive
Issue Type: Improvement
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

For the queries that Hive passes to Druid, pass the Hive query ID as additional query context. This will be useful for tracing query-level metrics across Druid and Hive.
[jira] [Created] (HIVE-20684) Analyze table compute stats fails for tables containing timestamp with local time zone column
Nishant Bangarwa created HIVE-20684:
---
Summary: Analyze table compute stats fails for tables containing timestamp with local time zone column
Key: HIVE-20684
URL: https://issues.apache.org/jira/browse/HIVE-20684
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Analyze table druid_table compute statistics for columns;
Reference exception -
{code}
org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: Only integer/long/timestamp/date/float/double/string/binary/boolean/decimal type argument is accepted but timestamp with local time zone is passed.
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats.getEvaluator(GenericUDAFComputeStats.java:105)
    at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:48)
    at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1043)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:4817)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:5482)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:6496)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10617)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11557)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11427)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12229)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12319)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11802)
{code}
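Until the UDAF accepts the type, a possible workaround is to list only the supported columns explicitly instead of analyzing all of them; a sketch with hypothetical column names:
{code}
-- Fails while the table has a TIMESTAMP WITH LOCAL TIME ZONE column:
ANALYZE TABLE druid_table COMPUTE STATISTICS FOR COLUMNS;
-- Workaround: name only columns whose types GenericUDAFComputeStats accepts.
ANALYZE TABLE druid_table COMPUTE STATISTICS FOR COLUMNS page, added, deleted;
{code}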
[jira] [Created] (HIVE-20683) Add the Ability to push Dynamic Between and Bloom filters to Druid
Nishant Bangarwa created HIVE-20683:
---
Summary: Add the Ability to push Dynamic Between and Bloom filters to Druid
Key: HIVE-20683
URL: https://issues.apache.org/jira/browse/HIVE-20683
Project: Hive
Issue Type: New Feature
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

For optimizing joins, Hive generates a BETWEEN filter with min-max values and a BLOOM filter for filtering one side of a semi-join. Druid 0.13.0 will have support for Bloom filters (added via https://github.com/apache/incubator-druid/pull/6222).
Implementation details -
# Hive generates and passes the filters as part of 'filterExpr' in the TableScan.
# DruidQueryBasedRecordReader gets this filter passed as part of the conf.
# During the execution phase, before sending the query to Druid in DruidQueryBasedRecordReader, we will deserialize this filter, translate it into a DruidDimFilter and add it to the existing DruidQuery. The Tez executor already ensures that when we start reading results from the record reader, all the dynamic values are initialized.
# Explaining a Druid query also prints the query sent to Druid as {{druid.json.query}}. We also need to make sure to update the Druid query with the filters. During explain we do not have the actual values for the dynamic values, so instead of values we will print the dynamic expression itself as part of the Druid query.
Note: this work needs Druid to be updated to version 0.13.0.
[jira] [Created] (HIVE-20626) Log more details when druid metastore transaction fails in callback
Nishant Bangarwa created HIVE-20626:
---
Summary: Log more details when druid metastore transaction fails in callback
Key: HIVE-20626
URL: https://issues.apache.org/jira/browse/HIVE-20626
Project: Hive
Issue Type: Task
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

The exception below does not give much detail on the actual cause of the error. We also need to log the callback exception when we get it.
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Transaction failed do to exception being thrown from within the callback. See cause for the original exception.)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:932) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:937) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4954) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:428) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2668) ~[hive-exec-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
{code}
[jira] [Created] (HIVE-20546) Upgrade to Druid 0.13.0
Nishant Bangarwa created HIVE-20546:
---
Summary: Upgrade to Druid 0.13.0
Key: HIVE-20546
URL: https://issues.apache.org/jira/browse/HIVE-20546
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

This task is to upgrade to Druid 0.13.0 when it is released. Note that it will hopefully be the first Apache release of Druid.
[jira] [Created] (HIVE-20539) Remove dependency on com.metamx.java-util
Nishant Bangarwa created HIVE-20539:
---
Summary: Remove dependency on com.metamx.java-util
Key: HIVE-20539
URL: https://issues.apache.org/jira/browse/HIVE-20539
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

java-util was moved from com.metamx to the Druid code repository. Currently we are packaging both com.metamx.java-util and io.druid.java-util. This task is to remove the dependency on com.metamx.java-util.
[jira] [Created] (HIVE-20469) Do not rollup PK/FK columns when indexing to druid.
Nishant Bangarwa created HIVE-20469:
---
Summary: Do not rollup PK/FK columns when indexing to druid.
Key: HIVE-20469
URL: https://issues.apache.org/jira/browse/HIVE-20469
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

When indexing data to Druid, if a numeric column has a PK/FK constraint, we need to make sure it is not indexed as a metric and rolled up. Thanks [~t3rmin4t0r] for recommending this.
[jira] [Created] (HIVE-20468) Add ability to skip creating druid bitmap indexes for specific string dimensions
Nishant Bangarwa created HIVE-20468:
---
Summary: Add ability to skip creating druid bitmap indexes for specific string dimensions
Key: HIVE-20468
URL: https://issues.apache.org/jira/browse/HIVE-20468
Project: Hive
Issue Type: New Feature
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Currently we create a bitmap index for all Druid dimensions. For some columns (e.g. free-form text, or high-cardinality columns that are rarely filtered upon), it may be beneficial to skip creating the Druid bitmap index and save disk space. In Druid, https://github.com/apache/incubator-druid/pull/5402 added support for creating string dimension columns without bitmap indexes. This task is to add a similar option when indexing data from Hive.
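A sketch of what the option might look like as a table property; the property name druid.bitmap.index.exclude.columns is purely hypothetical, since the issue does not fix a name:
{code}
CREATE TABLE druid_pages (`__time` timestamp, page string, comment_text string)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  -- hypothetical property: skip the bitmap index for a free-form text column
  "druid.bitmap.index.exclude.columns" = "comment_text"
);
{code}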
[jira] [Created] (HIVE-20449) DruidMiniTests - Move creation of druid table from allTypesOrc to test setup phase
Nishant Bangarwa created HIVE-20449:
---
Summary: DruidMiniTests - Move creation of druid table from allTypesOrc to test setup phase
Key: HIVE-20449
URL: https://issues.apache.org/jira/browse/HIVE-20449
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Multiple Druid tests end up creating a Druid table from the allTypesOrc table. Moving this table creation to a pre-test setup phase would avoid redundant work in tests and possibly help reduce test runtimes. Thanks, [~jcamachorodriguez], for suggesting this improvement.
[jira] [Created] (HIVE-20353) Follow redirects when hive connects to a passive druid overlord/coordinator
Nishant Bangarwa created HIVE-20353:
---
Summary: Follow redirects when hive connects to a passive druid overlord/coordinator
Key: HIVE-20353
URL: https://issues.apache.org/jira/browse/HIVE-20353
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

When we have multiple Druid coordinators/overlords and Hive tries to connect to a passive one, it will get a redirect. Currently the HTTP client in the Druid storage handler does not follow redirects. We need to check whether there is a redirect and follow it for the Druid overlord/coordinator.
[jira] [Created] (HIVE-20349) Implement Retry Logic in HiveDruidSplit for Scan Queries
Nishant Bangarwa created HIVE-20349:
---
Summary: Implement Retry Logic in HiveDruidSplit for Scan Queries
Key: HIVE-20349
URL: https://issues.apache.org/jira/browse/HIVE-20349
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

While distributing a Druid scan query, we check where the segments are loaded and then each HiveDruidSplit directly queries the historical node. There are a few cases where we need to retry and refetch the segments.
# The segment is loaded on multiple historical nodes and one of them went down. In this case, when we do not get a response from one node, we query the next replica.
# The segment was loaded onto a realtime task and was handed over; by the time we query, the realtime task has already finished. In this case there is no replica. The split needs to query the broker again for the location of the segment and then send the query to the correct historical node. This is also the root cause of the failure of the druidkafkamini_basic.q test, where the segment handover happens before the scan query is executed.
Note: This is not a problem when we query Druid brokers directly, as the broker handles the retry logic.
[jira] [Created] (HIVE-20341) Druid Needs Explicit CASTs from Timestamp to STRING when the output of timestamp function is used as String
Nishant Bangarwa created HIVE-20341:
---
Summary: Druid Needs Explicit CASTs from Timestamp to STRING when the output of timestamp function is used as String
Key: HIVE-20341
URL: https://issues.apache.org/jira/browse/HIVE-20341
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa

Druid timestamp expression functions return numeric values in the form of millis since epoch. Functions that use the output of the timestamp functions as String therefore return different values for tables stored in Hive and in Druid.
{code}
SELECT SUBSTRING(to_date(datetime0),4) FROM tableau_orc.calcs;
| 4-07-25 |
SELECT SUBSTRING(to_date(datetime0),4) FROM druid_tableau.calcs;
| 002240 |
SELECT CONCAT(to_date(datetime0),' 00:00:00') FROM tableau_orc.calcs;
| 2004-07-17 00:00:00 |
SELECT CONCAT(to_date(datetime0),' 00:00:00') FROM druid_tableau.calcs;
| 109045440 00:00:00 |
{code}
We need to add an explicit CAST to String before generating Druid expressions.
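Until the cast is generated automatically, adding it by hand gives the Druid expression a string input to work with; a sketch based on the queries above:
{code}
-- The explicit CAST forces the date to its string form before SUBSTRING/CONCAT
-- is applied, instead of letting Druid operate on millis since epoch.
SELECT SUBSTRING(CAST(to_date(datetime0) AS STRING), 4) FROM druid_tableau.calcs;
SELECT CONCAT(CAST(to_date(datetime0) AS STRING), ' 00:00:00') FROM druid_tableau.calcs;
{code}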
[jira] [Created] (HIVE-20297) Column Level Stats for Druid Tables
Nishant Bangarwa created HIVE-20297:
---
Summary: Column Level Stats for Druid Tables
Key: HIVE-20297
URL: https://issues.apache.org/jira/browse/HIVE-20297
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

This task is to have correct column-level stats for Druid in the Hive metastore.
- Stats like min/max/cardinality can be gathered using a Druid segment metadata query.
- During Druid query planning we need to ensure that the filters/aggregations pushed inside the DruidQuery are accounted for.
Having correct stats would also help the optimizer ensure proper join orderings when doing federated complex joins between Hive and Druid.
[jira] [Created] (HIVE-20279) HiveContextAwareRecordReader slows down Druid Scan queries.
Nishant Bangarwa created HIVE-20279:
---
Summary: HiveContextAwareRecordReader slows down Druid Scan queries.
Key: HIVE-20279
URL: https://issues.apache.org/jira/browse/HIVE-20279
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa
Attachments: scan2.svg

HiveContextAwareRecordReader adds a lot of overhead for Druid scan queries. See the attached flame graph. It looks like the operations checking for the existence of a footer/header buffer take most of the time. For Druid and other storage handlers that do not have a footer buffer, we should skip the existence check, at least for storage handlers.
[jira] [Created] (HIVE-20278) Druid Scan Query avoid copying from List -> Map -> List
Nishant Bangarwa created HIVE-20278:
---
Summary: Druid Scan Query avoid copying from List -> Map -> List
Key: HIVE-20278
URL: https://issues.apache.org/jira/browse/HIVE-20278
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

DruidScanQueryRecordReader gets a compacted List from Druid. It then converts that list into a Map inside a DruidWritable, where the key is the column name. At the second stage, DruidSerde takes this DruidWritable and creates a List out of the map again. We can avoid the map creation by reading the list sent by Druid directly in the DruidSerde.deserialize() method.
[jira] [Created] (HIVE-20035) write booleans as long when serializing to druid
Nishant Bangarwa created HIVE-20035:
---
Summary: write booleans as long when serializing to druid
Key: HIVE-20035
URL: https://issues.apache.org/jira/browse/HIVE-20035
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Druid expressions do not support booleans yet. In Druid expressions, booleans are represented and parsed as longs; however, when we store booleans from Hive they are serialized as 'true' and 'false' string values. We need to make serialization consistent with deserialization and write long values when sending data to Druid.
[jira] [Created] (HIVE-20014) Druid SECOND/HOUR/MINUTE does not return correct values when applied to String Columns
Nishant Bangarwa created HIVE-20014:
---
Summary: Druid SECOND/HOUR/MINUTE does not return correct values when applied to String Columns
Key: HIVE-20014
URL: https://issues.apache.org/jira/browse/HIVE-20014
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

The query SELECT MINUTE(`time1`) FROM calcs; returns null when the string column contains only a time of day and no date information. The Druid parser fails to parse the time string values and returns null.
{code}
1: jdbc:hive2://ctr-e138-1518143905142-379982> SELECT MINUTE(`time1`) FROM calcs;
INFO : Compiling command(queryId=hive_20180627145215_05147329-b8d8-491c-9bab-6fd5045542db): SELECT MINUTE(`time1`) FROM calcs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:vc, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20180627145215_05147329-b8d8-491c-9bab-6fd5045542db); Time taken: 0.134 seconds
INFO : Executing command(queryId=hive_20180627145215_05147329-b8d8-491c-9bab-6fd5045542db): SELECT MINUTE(`time1`) FROM calcs
INFO : Completed executing command(queryId=hive_20180627145215_05147329-b8d8-491c-9bab-6fd5045542db); Time taken: 0.002 seconds
INFO : OK
+-------+
|  vc   |
+-------+
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
| NULL  |
+-------+
17 rows selected (0.266 seconds)
1: jdbc:hive2://ctr-e138-1518143905142-379982> SELECT time1 from calcs;
INFO : Compiling command(queryId=hive_20180627145225_93b872de-a698-4859-9730-983eede6935d): SELECT time1 from calcs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:time1, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20180627145225_93b872de-a698-4859-9730-983eede6935d); Time taken: 0.116 seconds
INFO : Executing command(queryId=hive_20180627145225_93b872de-a698-4859-9730-983eede6935d): SELECT time1 from calcs
INFO : Completed executing command(queryId=hive_20180627145225_93b872de-a698-4859-9730-983eede6935d); Time taken: 0.003 seconds
INFO : OK
+-----------+
|   time1   |
+-----------+
| 22:20:14  |
| 22:50:16  |
| 19:36:22  |
| 19:48:23  |
| 00:05:57  |
| NULL      |
| 04:48:07  |
| NULL      |
| 19:57:33  |
| NULL      |
| 04:40:49  |
| 02:05:25  |
| NULL      |
| NULL      |
| 12:33:57  |
| 18:58:41  |
| 09:33:31  |
+-----------+
17 rows selected (0.202 seconds)
1: jdbc:hive2://ctr-e138-1518143905142-379982> EXPLAIN SELECT MINUTE(`time1`) FROM calcs;
INFO : Compiling command(queryId=hive_20180627145237_39e53a7e-35cb-4e17-8ccb-884c6f6358cd): EXPLAIN SELECT MINUTE(`time1`) FROM calcs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20180627145237_39e53a7e-35cb-4e17-8ccb-884c6f6358cd); Time taken: 0.107 seconds
INFO : Executing command(queryId=hive_20180627145237_39e53a7e-35cb-4e17-8ccb-884c6f6358cd): EXPLAIN SELECT MINUTE(`time1`) FROM calcs
INFO : Starting task [Stage-1:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20180627145237_39e53a7e-35cb-4e17-8ccb-884c6f6358cd); Time taken: 0.003 seconds
INFO : OK
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Select Operator [SEL_1]                        |
|       Output:["_col0"]                             |
|       TableScan [TS_0]                             |
|         Output:["vc"],properties:{"druid.fieldNames":"vc","druid.fieldTypes":"int","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"druid_tableau.calcs\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"timestamp_extract(timestamp_parse(\\\"time1\\\",null,'UTC'),'MINUTE','UTC')\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
|                                                    |
+----------------------------------------------------+
10 rows selected (0.136 seconds)
{code}
[jira] [Created] (HIVE-20013) Add an Implicit cast to date type for to_date function
Nishant Bangarwa created HIVE-20013:
---
Summary: Add an Implicit cast to date type for to_date function
Key: HIVE-20013
URL: https://issues.apache.org/jira/browse/HIVE-20013
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Issue - SELECT TO_DATE(date1), TO_DATE(datetime1) FROM druid_table_n1;
Running this query on Druid returns null values when date1 and datetime1 are of type String.
{code}
INFO : Executing command(queryId=hive_20180627144822_d4395567-e3cb-4b20-b53b-4e5eba2d7dac): EXPLAIN SELECT TO_DATE(datetime0) ,TO_DATE(date0) FROM calcs
INFO : Starting task [Stage-1:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20180627144822_d4395567-e3cb-4b20-b53b-4e5eba2d7dac); Time taken: 0.003 seconds
INFO : OK
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Select Operator [SEL_1]                        |
|       Output:["_col0","_col1"]                     |
|       TableScan [TS_0]                             |
|         Output:["vc","vc0"],properties:{"druid.fieldNames":"vc,vc0","druid.fieldTypes":"date,date","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"druid_tableau.calcs\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"timestamp_floor(\\\"datetime0\\\",'P1D','','UTC')\",\"outputType\":\"LONG\"},{\"type\":\"expression\",\"name\":\"vc0\",\"expression\":\"timestamp_floor(\\\"date0\\\",'P1D','','UTC')\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\",\"vc0\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
|                                                    |
+----------------------------------------------------+
10 rows selected (0.606 seconds)
{code}
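The manual equivalent of the proposed implicit cast, as a sketch using the query from the description:
{code}
-- Explicitly casting the result to DATE is the hand-written form of the
-- implicit cast this issue proposes for string-typed input columns.
SELECT CAST(TO_DATE(date1) AS DATE), CAST(TO_DATE(datetime1) AS DATE)
FROM druid_table_n1;
{code}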
[jira] [Created] (HIVE-19941) Row based Filters added via Hive Ranger policies are not pushed to druid
Nishant Bangarwa created HIVE-19941:
---
Summary: Row based Filters added via Hive Ranger policies are not pushed to druid
Key: HIVE-19941
URL: https://issues.apache.org/jira/browse/HIVE-19941
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

The issue is that when applying a table mask we add virtual columns; however, non-native tables do not have virtual columns, so we need to skip adding virtual columns when generating the masking query.
Stack trace -
{code}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:79 Invalid table alias or column reference 'BLOCK__OFFSET__INSIDE__FILE' : (possible column names are: __time, yearmonth, year, month, dayofmonth, dayofweek, weekofyear, hour, minute, second, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount, trip_time)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11830) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11778) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:3780) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4117) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4016) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4060) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1340) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1277) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
    at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113) ~[calcite-core-1.10.0.2.6.4.0-91.jar:1.10.0.2.6.4.0-91]
    at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997) ~[calcite-core-1.10.0.2.6.4.0-91.jar:1.10.0.2.6.4.0-91]
    at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149) ~[calcite-core-1.10.0.2.6.4.0-91.jar:1.10.0.2.6.4.0-91]
    at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106) ~[calcite-core-1.10.0.2.6.4.0-91.jar:1.10.0.2.6.4.0-91]
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1082) ~[hive-exec-2.1.0.2.6.4.0-91.jar:2.1.0.2.6.4.0-91]
{code}
[jira] [Created] (HIVE-19885) Druid Kafka Ingestion - Allow user to set kafka consumer properties via table properties
Nishant Bangarwa created HIVE-19885:
---
Summary: Druid Kafka Ingestion - Allow user to set kafka consumer properties via table properties
Key: HIVE-19885
URL: https://issues.apache.org/jira/browse/HIVE-19885
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Allow users to set kafka consumer properties via table properties.
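A sketch of how such pass-through settings might look, extending the TBLPROPERTIES pattern already used for "kafka.bootstrap.servers" and "kafka.topic" (see HIVE-18976); the "druid.kafka.consumer." prefix is an assumption, not a final naming decision:
{code}
ALTER TABLE druid_kafka_test SET TBLPROPERTIES (
  -- hypothetical prefix for properties forwarded to the Kafka consumer
  "druid.kafka.consumer.max.poll.records" = "500",
  "druid.kafka.consumer.security.protocol" = "SASL_PLAINTEXT"
);
{code}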
[jira] [Created] (HIVE-19762) Druid Queries containing Joins give wrong results.
Nishant Bangarwa created HIVE-19762:
---
Summary: Druid Queries containing Joins give wrong results.
Key: HIVE-19762
URL: https://issues.apache.org/jira/browse/HIVE-19762
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Druid queries that join a table against itself give wrong results, e.g.
{code}
SELECT username AS `username`, SUM(double1) AS `sum_double1`
FROM druid_table_with_nulls `tbl1`
JOIN (
  SELECT username AS `username`, SUM(double1) AS `sum_double2`
  FROM druid_table_with_nulls
  GROUP BY `username`
  ORDER BY `sum_double2` DESC
  LIMIT 10
) `tbl2` ON (`tbl1`.`username` = `tbl2`.`username`)
GROUP BY `tbl1`.`username`;
{code}
In this case one of the queries is a Druid scan query and the other is a groupBy query. During planning, the properties of these queries are set on the tableDesc and the serdeInfo; while setting up the map work, we overwrite the properties with the ones present in the serdeInfo. This causes the scan query results to be deserialized using the wrong column names and produces null values.
[jira] [Created] (HIVE-19604) Incorrect Handling of Boolean in DruidSerde
Nishant Bangarwa created HIVE-19604:
---
Summary: Incorrect Handling of Boolean in DruidSerde
Key: HIVE-19604
URL: https://issues.apache.org/jira/browse/HIVE-19604
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Results of boolean expressions from Druid are expressed as numeric 1 or 0. When reading the results in DruidSerde, both 1 and 0 are translated to String and then passed to Boolean.valueOf(stringForm), which leads to the boolean always being read as false.
[jira] [Created] (HIVE-19452) Avoid Deserializing and Serializing Druid query in DruidRecordReaders
Nishant Bangarwa created HIVE-19452:
---
Summary: Avoid Deserializing and Serializing Druid query in DruidRecordReaders
Key: HIVE-19452
URL: https://issues.apache.org/jira/browse/HIVE-19452
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

The Druid record reader deserializes and re-serializes the Druid query before sending it to Druid. This can be avoided, and we can then stop packaging some of the Druid dependencies, e.g. org.antlr, in the self-contained druid-handler jar.
[jira] [Created] (HIVE-19451) Druid Query Execution fails with ClassNotFoundException org.antlr.v4.runtime.CharStream
Nishant Bangarwa created HIVE-19451:
---
Summary: Druid Query Execution fails with ClassNotFoundException org.antlr.v4.runtime.CharStream
Key: HIVE-19451
URL: https://issues.apache.org/jira/browse/HIVE-19451
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Stack trace -
{code}
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1524814504173_1344_45_00, diagnostics=[Task failed, taskId=task_1524814504173_1344_45_00_29, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1524814504173_1344_45_00_29_0:java.lang.RuntimeException: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `org.apache.hive.druid.io.druid.segment.virtual.ExpressionVirtualColumn`, problem: org/antlr/v4/runtime/CharStream
 at [Source: (String)"{"queryType":"scan","dataSource":{"type":"table","name":"tpcds_real_bin_partitioned_orc_1000.tpcds_denormalized_druid_table_7mcd"},"intervals":{"type":"segments","segments":[{"itvl":"1998-11-30T00:00:00.000Z/1998-12-01T00:00:00.000Z","ver":"2018-05-03T11:35:22.230Z","part":0}]},"virtualColumns":[{"type":"expression","name":"vc","expression":"\"__time\"","outputType":"LONG"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"filter":{"type":"bound","dimension":"i_brand"[truncated 241 chars]; line: 1, column: 376] (through reference chain: org.apache.hive.druid.io.druid.query.scan.ScanQuery["virtualColumns"]->java.util.ArrayList[0])
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `org.apache.hive.druid.io.druid.segment.virtual.ExpressionVirtualColumn`, problem: org/antlr/v4/runtime/CharStream
 at [Source: (String)"{"queryType":"scan","dataSource":{"type":"table","name":"tpcds_real_bin_partitioned_orc_1000.tpcds_denormalized_druid_table_7mcd"},"intervals":{"type":"segments","segments":[{"itvl":"1998-11-30T00:00:00.000Z/1998-12-01T00:00:00.000Z","ver":"2018-05-03T11:35:22.230Z","part":0}]},"virtualColumns":[{"type":"expression","name":"vc","expression":"\"__time\"","outputType":"LONG"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"filter":{"type":"bound","dimension":"i_brand"[truncated 241 chars]; line: 1, column: 376] (through reference chain: org.apache.hive.druid.io.druid.query.scan.ScanQuery["virtualColumns"]->java.util.ArrayList[0])
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:438)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MRReaderMapred.java:83)
    at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:703)
    at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:662)
    at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
{code}
[jira] [Created] (HIVE-19173) Add Storage Handler runtime information as part of DESCRIBE EXTENDED
Nishant Bangarwa created HIVE-19173:
---
Summary: Add Storage Handler runtime information as part of DESCRIBE EXTENDED
Key: HIVE-19173
URL: https://issues.apache.org/jira/browse/HIVE-19173
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Follow up for https://issues.apache.org/jira/browse/HIVE-18976. The Kafka Indexing Service in Druid has runtime state associated with it. Druid publishes this runtime state as a KafkaSupervisorReport, which has the latest offsets as reported by Kafka, the consumer lag per partition, and the aggregate lag across all partitions. This information is quite useful for knowing whether a table backed by the kafka-indexing-service is up to date. This task is to add this information to the output of the DESCRIBE EXTENDED statement.
[jira] [Created] (HIVE-19172) NPE due to null EnvironmentContext in DDLTask
Nishant Bangarwa created HIVE-19172:
---
Summary: NPE due to null EnvironmentContext in DDLTask
Key: HIVE-19172
URL: https://issues.apache.org/jira/browse/HIVE-19172
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Stack trace -
{code}
2018-04-11T02:52:51,386 ERROR [5f2e24bf-ac93-4977-84fe-aa2c5f674ea4 main] exec.DDLTask: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3539)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:392)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1987)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1667)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1414)
{code}
[jira] [Created] (HIVE-19107) Wait for druid kafka indexing tasks to start before returning from create table statement
Nishant Bangarwa created HIVE-19107:
---
Summary: Wait for druid kafka indexing tasks to start before returning from create table statement
Key: HIVE-19107
URL: https://issues.apache.org/jira/browse/HIVE-19107
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Follow up for https://issues.apache.org/jira/browse/HIVE-18976. The above PR adds support for setting up the Druid kafka indexing service from Hive. However, the create table command only submits the Kafka supervisor to Druid and does not wait for the indexing tasks to start. This task is to add a wait by checking the supervisor status on the Druid side.
[jira] [Created] (HIVE-19049) Add support for Alter table add columns for Druid
Nishant Bangarwa created HIVE-19049:
---
Summary: Add support for Alter table add columns for Druid
Key: HIVE-19049
URL: https://issues.apache.org/jira/browse/HIVE-19049
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Add support for ALTER TABLE ADD COLUMNS for Druid. Currently it is not supported and throws an exception.
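The statement that should start working once this is implemented; a sketch against a hypothetical Druid-backed table:
{code}
-- Currently rejected with an exception for DruidStorageHandler tables.
ALTER TABLE druid_table ADD COLUMNS (new_dim STRING, new_metric BIGINT);
{code}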
[jira] [Created] (HIVE-19026) Configurable serde for druid kafka indexing
Nishant Bangarwa created HIVE-19026:
---
Summary: Configurable serde for druid kafka indexing
Key: HIVE-19026
URL: https://issues.apache.org/jira/browse/HIVE-19026
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

https://issues.apache.org/jira/browse/HIVE-18976 introduces support for setting up the Druid kafka-indexing service. Input serialization should be configurable. For now we can say we only support JSON, but there should be a mechanism to support other formats. Perhaps we can make use of Hive's SerDe library, like LazySimpleSerDe etc.
[jira] [Created] (HIVE-18976) Add ability to setup Druid Kafka Ingestion from Hive
Nishant Bangarwa created HIVE-18976:
---
Summary: Add ability to setup Druid Kafka Ingestion from Hive
Key: HIVE-18976
URL: https://issues.apache.org/jira/browse/HIVE-18976
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Add the ability to set up Druid Kafka ingestion using a Hive CREATE TABLE statement, e.g. the query below submits a Kafka supervisor spec to the Druid overlord, after which Druid starts ingesting events from Kafka.
{code:java}
CREATE TABLE druid_kafka_test (
  `__time` timestamp,
  page string,
  language string,
  `user` string,
  added int,
  deleted int,
  delta int
)
STORED BY 'org.apache.hadoop.hive.druid.DruidKafkaStreamingStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "HOUR",
  "druid.query.granularity" = "MINUTE",
  "kafka.bootstrap.servers" = "localhost:9092",
  "kafka.topic" = "test-topic",
  "druid.kafka.ingest.useEarliestOffset" = "true"
);
{code}
Design - This can be done via a DruidKafkaStreamingStorageHandler that extends the existing DruidStorageHandler and adds the additional functionality for streaming.
Testing - Add a DruidKafkaMiniCluster which will consist of a DruidMiniCluster plus a single-node Kafka broker. The broker can be populated with a test topic that has some predefined data.
[jira] [Created] (HIVE-18583) Enable DateRangeRules
Nishant Bangarwa created HIVE-18583:
---
Summary: Enable DateRangeRules
Key: HIVE-18583
URL: https://issues.apache.org/jira/browse/HIVE-18583
Project: Hive
Issue Type: Improvement
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Enable DateRangeRules to translate druid filters to date ranges.
[jira] [Created] (HIVE-18569) Hive Druid indexing not dealing with decimals correctly.
Nishant Bangarwa created HIVE-18569:
---
Summary: Hive Druid indexing not dealing with decimals correctly.
Key: HIVE-18569
URL: https://issues.apache.org/jira/browse/HIVE-18569
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Currently, a decimal column is indexed as a double in Druid. This should not happen; either the user has to add an explicit cast, or we can add a flag to enable the approximation.
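The explicit-cast route mentioned above, as a sketch with hypothetical table and column names:
{code}
-- Making the lossy conversion visible in the query, instead of the
-- storage handler silently indexing the DECIMAL column as DOUBLE.
INSERT INTO TABLE druid_table
SELECT `__time`, page, CAST(price AS DOUBLE) AS price
FROM src;
{code}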
[jira] [Created] (HIVE-18518) Upgrade druid version to 0.11.0
Nishant Bangarwa created HIVE-18518:
---
Summary: Upgrade druid version to 0.11.0
Key: HIVE-18518
URL: https://issues.apache.org/jira/browse/HIVE-18518
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

This task is to upgrade to Druid version 0.11.0.
[jira] [Created] (HIVE-18271) Druid Insert into fails with exception when committing files
Nishant Bangarwa created HIVE-18271:
---
Summary: Druid Insert into fails with exception when committing files
Key: HIVE-18271
URL: https://issues.apache.org/jira/browse/HIVE-18271
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Exception -
{code}
03.hwx.site:8020/apps/hive/warehouse/_tmp.all100k_druid_initial_empty to: hdfs://ctr-e136-1513029738776-2163-01-03.hwx.site:8020/apps/hive/warehouse/_tmp.all100k_druid_initial_empty.moved)'
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move: hdfs://ctr-e136-1513029738776-2163-01-03.hwx.site:8020/apps/hive/warehouse/_tmp.all100k_druid_initial_empty to: hdfs://ctr-e136-1513029738776-2163-01-03.hwx.site:8020/apps/hive/warehouse/_tmp.all100k_druid_initial_empty.moved
    at org.apache.hadoop.hive.ql.exec.Utilities.rename(Utilities.java:1129)
    at org.apache.hadoop.hive.ql.exec.Utilities.mvFileToFinalPath(Utilities.java:1460)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1135)
    at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:765)
    at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:770)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:588)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:286)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1987)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1667)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1211)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1204)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:242)
    at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:350)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
[jira] [Created] (HIVE-16752) Enable Unit test - TestDruidRecordWriter.testWrite
Nishant Bangarwa created HIVE-16752:
---
Summary: Enable Unit test - TestDruidRecordWriter.testWrite
Key: HIVE-16752
URL: https://issues.apache.org/jira/browse/HIVE-16752
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa

After the changes done in https://issues.apache.org/jira/browse/HIVE-16474 the test is failing due to loading of Guava classes from the hive-exec jar. This is because the hive-exec jar is a shaded jar which contains all the dependencies. For details see https://github.com/apache/hive/blob/master/ql/pom.xml#L820 - "The way shade was configured since 0.13, is to override the default jar for ql module with the shaded one but keep the same name." So when mvn resolves the jar while running the unit test, it sees the shaded jar, which also contains Guava.
To resolve this, there are three ways I could find -
1) Tweak the order of dependencies in Druid.
2) Somehow add a dependency in druid-handler on the non-shaded jar; but since it has already been overridden, I am not sure how to do it.
3) Use a different namespace for the Guava classes in the hive-exec jar.
[jira] [Created] (HIVE-16576) Fix encoding of intervals when fetching select query candidates from druid
Nishant Bangarwa created HIVE-16576:
---
Summary: Fix encoding of intervals when fetching select query candidates from druid
Key: HIVE-16576
URL: https://issues.apache.org/jira/browse/HIVE-16576
Project: Hive
Issue Type: Bug
Components: Druid integration
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

Debug logs on the Hive side -
{code}
2017-05-03T23:49:00,672 DEBUG [HttpClient-Netty-Worker-0] client.NettyHttpClient: [GET http://localhost:8082/druid/v2/datasources/cmv_basetable_druid/candidates?intervals=1900-01-01T00:00:00.000+05:53:20/3000-01-01T00:00:00.000+05:30] Got response: 500 Server Error
{code}
Druid exception stack trace -
{code}
2017-05-03T18:56:58,928 WARN [qtp1651318806-158] org.eclipse.jetty.servlet.ServletHandler - /druid/v2/datasources/cmv_basetable_druid/candidates
java.lang.IllegalArgumentException: Invalid format: "1900-01-01T00:00:00.000 05:53:20"
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899) ~[joda-time-2.8.2.jar:2.8.2]
    at org.joda.time.convert.StringConverter.setInto(StringConverter.java:212) ~[joda-time-2.8.2.jar:2.8.2]
    at org.joda.time.base.BaseInterval.<init>(BaseInterval.java:200) ~[joda-time-2.8.2.jar:2.8.2]
    at org.joda.time.Interval.<init>(Interval.java:193) ~[joda-time-2.8.2.jar:2.8.2]
    at org.joda.time.Interval.parse(Interval.java:69) ~[joda-time-2.8.2.jar:2.8.2]
    at io.druid.server.ClientInfoResource.getQueryTargets(ClientInfoResource.java:320) ~[classes/:?]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_92]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_92]
{code}
Note that the intervals sent as part of the HTTP request URL are not encoded properly: the '+' in the timezone offset is decoded as a space on the Druid side.
[jira] [Created] (HIVE-16518) Insert overwrite for druid does not replace all existing segments
Nishant Bangarwa created HIVE-16518:
---
Summary: Insert overwrite for druid does not replace all existing segments
Key: HIVE-16518
URL: https://issues.apache.org/jira/browse/HIVE-16518
Project: Hive
Issue Type: Bug
Reporter: Nishant Bangarwa
Assignee: Nishant Bangarwa

INSERT OVERWRITE for Druid does not replace segments for all intervals; it only replaces segments for the intervals that are newly ingested. An INSERT OVERWRITE TABLE statement on DruidStorageHandler should replace all existing segments for the table.
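The statement in question, as a sketch with hypothetical table names; per this issue, segments whose intervals are not covered by the new data survive the overwrite when they should be dropped:
{code}
-- Expected: every existing segment of druid_table is replaced.
-- Actual: only segments for intervals present in src are replaced.
INSERT OVERWRITE TABLE druid_table
SELECT `__time`, page, added FROM src;
{code}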