[jira] [Commented] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319723#comment-14319723 ] Dong Chen commented on HIVE-8128: - Will start from a POC based on the new vectorized Parquet API at https://github.com/zhenxiao/incubator-parquet-mr/pull/1 Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde), which was partially done in HIVE-5998. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9635) LLAP: I'm the decider
[ https://issues.apache.org/jira/browse/HIVE-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-9635. -- Resolution: Fixed Committed to branch. LLAP: I'm the decider - Key: HIVE-9635 URL: https://issues.apache.org/jira/browse/HIVE-9635 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-9635.1.patch, HIVE-9635.2.patch https://www.youtube.com/watch?v=r8VbzrZ9yHQ A physical optimizer to choose what to run inside/outside LLAP. It first tests whether user code has to be shipped, then whether the specific query fragment is suitable to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9684: Attachment: HIVE-9684.branch-1.0.patch Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.branch-1.0.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
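The guard described above can be sketched roughly as follows. Note that StreamProto, DiskRange, and planDiskRanges are hypothetical stand-ins for the generated OrcProto.Stream message and Hive's RecordReader logic, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the guarded disk-range planning described above.
// StreamProto stands in for the generated OrcProto.Stream message, where
// every field is now optional and must be presence-checked before use.
public class OrcDiskRangeSketch {
    static class StreamProto {
        private final String kind;   // null when the writer omitted the field
        private final long length;
        StreamProto(String kind, long length) { this.kind = kind; this.length = length; }
        boolean hasKind() { return kind != null; }
        long getLength() { return length; }
    }

    static class DiskRange {
        final long offset, end;
        DiskRange(long offset, long end) { this.offset = offset; this.end = end; }
    }

    // Walk the streams in file order; skip any stream whose kind is absent,
    // but always advance the offset past its bytes so later ranges stay correct.
    static List<DiskRange> planDiskRanges(List<StreamProto> streams) {
        List<DiskRange> ranges = new ArrayList<>();
        long offset = 0;
        for (StreamProto s : streams) {
            if (s.hasKind()) {                       // the guard added by the fix
                ranges.add(new DiskRange(offset, offset + s.getLength()));
            }
            offset += s.getLength();
        }
        return ranges;
    }
}
```

The point of the sketch is the second branch: without the hasKind() check, a kind-less stream would still be planned, shifting every subsequent range and producing the out-of-range reads the issue describes.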
[jira] [Commented] (HIVE-9680) GlobalLimitOptimizer is not checking filters correctly
[ https://issues.apache.org/jira/browse/HIVE-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319756#comment-14319756 ] Hive QA commented on HIVE-9680: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698595/HIVE-9680.1.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7542 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2789/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2789/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2789/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698595 - PreCommit-HIVE-TRUNK-Build GlobalLimitOptimizer is not checking filters correctly --- Key: HIVE-9680 URL: https://issues.apache.org/jira/browse/HIVE-9680 Project: Hive Issue Type: Bug Components: Query Planning Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9680.1.patch.txt Some predicates may not be included in opToPartPruner -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319779#comment-14319779 ] Lefty Leverenz commented on HIVE-2573: -- Doc note: This adds Function to the description of *hive.exec.drop.ignorenonexistent* in 1.2.0, so the wiki needs to be updated (with version information). By the way, HIVE-3781 added Index to the description in 1.1.0. * [Configuration Properties -- hive.exec.drop.ignorenonexistent | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.drop.ignorenonexistent] What other documentation does this need? Should there be a release note? Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.14.patch.txt, HIVE-2573.15.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt Currently the function registry is a shared resource and can be overridden by other users when using HiveServer. Providing a per-session function registry would prevent this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
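The idea behind the issue can be illustrated with a minimal sketch: a session-local registry that shadows the shared one on lookup. The names here (SessionFunctionRegistry, resolve, the sample UDF entries) are illustrative, not Hive's actual API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of a per-session function registry that shadows a shared
// registry, as the issue proposes. Names are illustrative, not Hive's.
public class SessionFunctionRegistry {
    // Process-wide registry shared by all sessions (seeded with one built-in).
    private static final Map<String, String> SHARED = new ConcurrentHashMap<>();
    static { SHARED.put("upper", "GenericUDFUpper"); }

    // Registry private to one session.
    private final Map<String, String> sessionFunctions = new ConcurrentHashMap<>();

    // A session-local registration never mutates the shared registry,
    // so one HiveServer user cannot override a function for another.
    public void register(String name, String udfClass) {
        sessionFunctions.put(name, udfClass);
    }

    // Lookup order: session registry first, shared registry as fallback.
    public String resolve(String name) {
        String udf = sessionFunctions.get(name);
        return udf != null ? udf : SHARED.get(name);
    }
}
```

With this layering, registering a function in one session shadows the shared definition only for that session; every other session keeps resolving the shared one.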
[jira] [Updated] (HIVE-9667) Disable ORC bloom filters for ORC v11 output-format
[ https://issues.apache.org/jira/browse/HIVE-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9667: Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~gopalv] for the patch! Disable ORC bloom filters for ORC v11 output-format --- Key: HIVE-9667 URL: https://issues.apache.org/jira/browse/HIVE-9667 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9667.1.patch ORC column bloom filters should only be written if the file format is 0.12+. The older format should not write out the metadata streams for bloom filters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
Prasanth Jayachandran created HIVE-9684: --- Summary: Incorrect disk range computation in ORC because of optional stream kind Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-2573: - Labels: TODOC1.2 (was: ) Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.14.patch.txt, HIVE-2573.15.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt Currently the function registry is a shared resource and can be overridden by other users when using HiveServer. Providing a per-session function registry would prevent this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319776#comment-14319776 ] Hive QA commented on HIVE-9561: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698669/HIVE-9561.3-spark.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7471 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby7_noskew_multi_single_reducer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_multi_single_reducer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parallel_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_covar_samp org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union4 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union4 org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/724/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-724/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12698669 - PreCommit-HIVE-SPARK-Build SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by queries only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
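Why {{sortByKey}} launches probe jobs: a total sort needs range-partition boundaries, and computing them requires an extra pass over (a sample of) the keys, whereas hash partitioning decides a partition from the key alone. The following is a toy illustration of that difference, not Spark's actual RangePartitioner:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy contrast between range partitioning (needs a boundary-finding pass,
// i.e. the "probe job") and hash partitioning (needs no extra pass).
public class RangeBoundarySketch {
    // One extra scan over (a sample of) the keys just to pick boundaries.
    static List<Integer> pickBoundaries(List<Integer> keys, int partitions) {
        List<Integer> sample = new ArrayList<>(keys);
        Collections.sort(sample);                      // the extra work
        List<Integer> bounds = new ArrayList<>();
        for (int i = 1; i < partitions; i++) {
            bounds.add(sample.get(i * sample.size() / partitions));
        }
        return bounds;
    }

    // Hash partitioning decides a partition from the key alone.
    static int hashPartition(int key, int partitions) {
        return Math.floorMod(Integer.hashCode(key), partitions);
    }
}
```

A total sort is only worth this extra pass when the query actually demands a global order, which is the rationale for restricting {{sortByKey}} to order by queries.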
[jira] [Updated] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9684: Attachment: HIVE-9684.branch-1.1.patch Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9638) Drop Index does not check whether the Index or Table exists
[ https://issues.apache.org/jira/browse/HIVE-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319806#comment-14319806 ] Chinna Rao Lalam commented on HIVE-9638: Hi, In Hive 0.7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true. Drop Index does not check whether the Index or Table exists -- Key: HIVE-9638 URL: https://issues.apache.org/jira/browse/HIVE-9638 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 0.11.0, 0.13.0, 0.14.0, 1.0.0 Reporter: Will Du The DROP INDEX index_name ON table_name; statement will always succeed, regardless of whether index_name or table_name exists -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9655) Dynamic partition table insertion error
[ https://issues.apache.org/jira/browse/HIVE-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319822#comment-14319822 ] Hive QA commented on HIVE-9655: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698598/HIVE-9655.2.patch {color:green}SUCCESS:{color} +1 7543 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2790/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2790/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2790/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. 
ATTACHMENT ID: 12698598 - PreCommit-HIVE-TRUNK-Build Dynamic partition table insertion error --- Key: HIVE-9655 URL: https://issues.apache.org/jira/browse/HIVE-9655 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.1 Reporter: Chao Assignee: Chao Attachments: HIVE-9655.1.patch, HIVE-9655.2.patch We have these two tables: {code} create table t1 (c1 bigint, c2 string); CREATE TABLE t2 (c1 int, c2 string) PARTITIONED BY (p1 string); load data local inpath 'data' into table t1; load data local inpath 'data' into table t1; load data local inpath 'data' into table t1; load data local inpath 'data' into table t1; load data local inpath 'data' into table t1; {code} But when we try to insert into table t2 from t1: {code} SET hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table t2 partition(p1) select *,c1 as p1 from t1 distribute by p1; {code} The query failed with the following exception: {noformat} 2015-02-11 12:50:52,756 ERROR [LocalJobRunner Map Task Executor #0]: mr.ExecMapper (ExecMapper.java:map(178)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {c1:1,c2:one} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 10 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9425) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319919#comment-14319919 ] Hive QA commented on HIVE-9425: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698673/HIVE-9425.1-spark.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7471 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_join_with_different_encryption_keys {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/725/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/725/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-725/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698673 - PreCommit-HIVE-SPARK-Build Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9425.1-spark.patch {noformat} 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 
15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) {noformat} It seems the additional Jar files are not uploaded to the DistributedCache, so the Driver cannot access them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
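The failure mode in the log above can be illustrated with a small sketch: in yarn-cluster mode the driver runs on a remote node, so a bare local filename like opennlp-tools-1.5.3.jar resolves against the driver's working directory and is not found unless the jar was actually shipped (e.g. via the distributed cache or --jars). The helper below is a toy stand-in for that visibility check, not Hive or Spark code:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the yarn-cluster failure: partition the requested
// jars into those visible from this process and those that would raise
// the FileNotFoundException seen in the log unless uploaded first.
public class JarPathSketch {
    static List<String> missingJars(List<String> jarPaths) {
        List<String> missing = new ArrayList<>();
        for (String p : jarPaths) {
            if (!new File(p).exists()) {
                missing.add(p);   // would trigger "Error adding jar (FileNotFoundException ...)"
            }
        }
        return missing;
    }
}
```

The fix direction the issue points at is to make every such path visible to the remote driver before registering it, rather than passing bare local filenames through.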
[jira] [Commented] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319809#comment-14319809 ] Prasanth Jayachandran commented on HIVE-9684: - [~gopalv]/[~owen.omalley] Can someone review this patch? Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9684: Affects Version/s: (was: 1.2.0) Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9684: Status: Patch Available (was: Open) Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields, but the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. This leads to incorrect disk range calculations, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9645: --- Status: Open (was: Patch Available) Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-9645.1.patch, HIVE-9645.patch The Hive logical optimizer does not follow the null scan codepath when encountering NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
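The folding rule the issue asks for can be sketched under SQL three-valued logic: NULL compared with anything yields NULL, and a NULL predicate in a WHERE clause keeps no rows, which is what would enable the null-scan path. foldEquals and keepsRow are illustrative names, not Hive's actual ConstantPropagate code:

```java
// Sketch of constant-folding NULL equality under SQL three-valued logic.
// A Boolean here has three states: TRUE, FALSE, or null (unknown).
public class NullFoldSketch {
    static Boolean foldEquals(Object left, Object right) {
        if (left == null || right == null) {
            return null;            // NULL = 1  ->  NULL, never TRUE
        }
        return left.equals(right);
    }

    // A WHERE predicate keeps a row only when it evaluates to TRUE,
    // so a predicate folded to NULL filters out every row.
    static boolean keepsRow(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }
}
```

Strictly, NULL = 1 folds to NULL rather than false; the point is that in a filter context NULL behaves like false, so the whole scan can be replaced by an empty result.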
[jira] [Updated] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9645: --- Status: Patch Available (was: Open) Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-9645.1.patch, HIVE-9645.patch The Hive logical optimizer does not follow the null scan codepath when encountering NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9645: --- Attachment: HIVE-9645.1.patch Fixed test cases. Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-9645.1.patch, HIVE-9645.patch The Hive logical optimizer does not follow the null scan codepath when encountering NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7759) document hive cli authorization behavior when SQL std auth is enabled
[ https://issues.apache.org/jira/browse/HIVE-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-7759: - Labels: (was: TODOC14) document hive cli authorization behavior when SQL std auth is enabled - Key: HIVE-7759 URL: https://issues.apache.org/jira/browse/HIVE-7759 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Thejas M Nair Assignee: Thejas M Nair There should be a section in the SQL standard auth doc that highlights how the Hive CLI behaves with SQL standard authorization turned on. Changes in HIVE-7533 and HIVE-7209 should be documented as part of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-8128) Improve Parquet Vectorization
[ https://issues.apache.org/jira/browse/HIVE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-8128 started by Dong Chen. --- Improve Parquet Vectorization - Key: HIVE-8128 URL: https://issues.apache.org/jira/browse/HIVE-8128 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen What we'll want to do is finish the vectorization work (e.g. VectorizedOrcSerde), which was partially done in HIVE-5998. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9425: - Status: Patch Available (was: Open) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9425.1-spark.patch {noformat} 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) {noformat} It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
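The "Error adding jar ... (No such file or directory)" lines above come down to where a bare jar name gets resolved. A stdlib-only sketch of the failure mode, assuming the driver resolves relative paths against its own working directory (the `addJar` helper below is hypothetical, not Spark's actual implementation):

```java
import java.io.File;

public class JarCheck {
    // Hypothetical stand-in for the driver's local-file check: a bare jar
    // name is resolved against the current working directory. In
    // yarn-cluster mode that directory is the AM container's sandbox, not
    // the directory where the user ran `add jar`, so the lookup fails
    // unless the file was shipped (e.g. via DistributedCache) first.
    static String addJar(String path) {
        File f = new File(path);
        if (!f.isFile()) {
            return "Error adding jar (java.io.FileNotFoundException: " + path
                    + " (No such file or directory)), was the --addJars option used?";
        }
        return "Added " + f.getAbsolutePath();
    }

    public static void main(String[] args) {
        // The jar exists on the client machine but not in the AM's cwd.
        System.out.println(addJar("bigbenchqueriesmr.jar"));
    }
}
```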
[jira] [Updated] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9425: - Description: {noformat} 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) {noformat} It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. was: 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 
15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. External Function Jar files are not available for Driver when running with
[jira] [Updated] (HIVE-9425) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9425: - Summary: Add jar/file doesn't work with yarn-cluster mode [Spark Branch] (was: External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]) Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9425.1-spark.patch {noformat} 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) {noformat} It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9425: - Attachment: HIVE-9425.1-spark.patch Upload an initial patch on behalf of Chengxiang. [~zhos] and [~xhao1], please help to verify if this can solve your problems. Thanks! External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch] --- Key: HIVE-9425 URL: https://issues.apache.org/jira/browse/HIVE-9425 Project: Hive Issue Type: Sub-task Components: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9425.1-spark.patch 15/01/20 00:27:31 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: hive-exec-0.15.0-SNAPSHOT.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-maxent-3.0.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: bigbenchqueriesmr.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: opennlp-tools-1.5.3.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 ERROR spark.SparkContext: Error adding jar (java.io.FileNotFoundException: jcl-over-slf4j-1.7.5.jar (No such file or directory)), was the --addJars option used? 15/01/20 00:27:31 INFO client.RemoteDriver: Received job request fef081b0-5408-4804-9531-d131fdd628e6 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/01/20 00:27:31 INFO Configuration.deprecation: mapred.min.split.size is deprecated. 
Instead, use mapreduce.input.fileinputformat.split.minsize 15/01/20 00:27:31 INFO client.RemoteDriver: Failed to run job fef081b0-5408-4804-9531-d131fdd628e6 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF Serialization trace: genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc) conf (org.apache.hadoop.hive.ql.exec.UDTFOperator) childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138) at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115) It seems the additional Jar files are not uploaded to DistributedCache, so that the Driver cannot access it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9666) Improve some qtests
[ https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319979#comment-14319979 ] Hive QA commented on HIVE-9666: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698602/HIVE-9666.2.patch {color:green}SUCCESS:{color} +1 7542 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2791/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2791/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2791/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12698602 - PreCommit-HIVE-TRUNK-Build Improve some qtests --- Key: HIVE-9666 URL: https://issues.apache.org/jira/browse/HIVE-9666 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch {code} groupby7_noskew_multi_single_reducer.q groupby_multi_single_reducer3.q parallel_join0.q union3.q union4.q {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9683) Hive metastore thrift client connections hang indefinitely
[ https://issues.apache.org/jira/browse/HIVE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320264#comment-14320264 ] Hive QA commented on HIVE-9683: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698617/HIVE-9683.1.patch {color:green}SUCCESS:{color} +1 7542 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2792/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2792/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2792/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12698617 - PreCommit-HIVE-TRUNK-Build Hive metastore thrift client connections hang indefinitely -- Key: HIVE-9683 URL: https://issues.apache.org/jira/browse/HIVE-9683 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.0.1 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.0.1 Attachments: HIVE-9683.1.patch THRIFT-2788 fixed network-partition problems that affect Thrift client connections. Since hive-1.0 is on thrift-0.9.0 which is affected by the bug, a workaround can be applied to prevent indefinite connection hangs during net-splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
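The hang class described above is a read that blocks forever on a dead peer. A self-contained sketch of the workaround shape using plain `java.net` sockets (Thrift's `TSocket` exposes the same knob, `setTimeout(int)` in thrift-0.9.0; the local never-responding server below just models the net-split):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    // Connect to a local server that accepts but never writes -- a stand-in
    // for a metastore stranded on the far side of a network partition.
    static String readWithTimeout(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
            // Without a read timeout, read() below blocks indefinitely --
            // the hang this issue describes.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return "got data";
            } catch (SocketTimeoutException e) {
                return "read timed out instead of hanging";
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithTimeout(500));
    }
}
```

With a timeout set, a partitioned client surfaces an exception it can retry instead of wedging the caller forever.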
[jira] [Reopened] (HIVE-7787) Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError
[ https://issues.apache.org/jira/browse/HIVE-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar reopened HIVE-7787: Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError - Key: HIVE-7787 URL: https://issues.apache.org/jira/browse/HIVE-7787 Project: Hive Issue Type: Bug Components: Database/Schema, Thrift API Affects Versions: 0.12.0, 0.13.0, 0.12.1, 0.14.0, 0.13.1 Environment: Hive 0.12 CDH 5.1.0, Hadoop 2.3.0 CDH 5.1.0 Reporter: Raymond Lau Assignee: Arup Malakar Priority: Minor Attachments: HIVE-7787.trunk.1.patch When reading a Parquet file whose original Thrift schema contains a struct with an enum, the following error occurs (full stack trace below): {code} java.lang.NoSuchFieldError: DECIMAL. {code} Example Thrift Schema: {code} enum MyEnumType { EnumOne, EnumTwo, EnumThree } struct MyStruct { 1: optional MyEnumType myEnumType; 2: optional string field2; 3: optional string field3; } struct outerStruct { 1: optional list<MyStruct> myStructs } {code} Hive Table: {code} CREATE EXTERNAL TABLE mytable ( mystructs array<struct<myenumtype: string, field2: string, field3: string>> ) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat' ; {code} Error Stack trace: {code} Java stack trace for Hive 0.12: Caused by: java.lang.NoSuchFieldError: DECIMAL at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:146) at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31) at org.apache.hadoop.hive.ql.io.parquet.convert.ArrayWritableGroupConverter.init(ArrayWritableGroupConverter.java:45) at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:34) at 
org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.init(DataWritableGroupConverter.java:64) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.init(DataWritableGroupConverter.java:47) at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:36) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.init(DataWritableGroupConverter.java:64) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.init(DataWritableGroupConverter.java:40) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.init(DataWritableRecordConverter.java:32) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:128) at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142) at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118) at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:92) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.init(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.init(CombineHiveRecordReader.java:65) ... 16 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9138) Add some explain to PTF operator
[ https://issues.apache.org/jira/browse/HIVE-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320373#comment-14320373 ] Hive QA commented on HIVE-9138: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698640/HIVE-9138.5.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7535 tests executed *Failed tests:* {noformat} TestSparkClient - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2793/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2793/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2793/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698640 - PreCommit-HIVE-TRUNK-Build Add some explain to PTF operator Key: HIVE-9138 URL: https://issues.apache.org/jira/browse/HIVE-9138 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9138.1.patch.txt, HIVE-9138.2.patch.txt, HIVE-9138.3.patch.txt, HIVE-9138.4.patch.txt, HIVE-9138.5.patch.txt PTFOperator does not explain anything in the explain statement, making it hard to understand its internal workings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9605) Remove parquet nested objects from wrapper writable objects
[ https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320321#comment-14320321 ] Sergio Peña commented on HIVE-9605: --- This test passes in the 'parquet' branch. The patch required the HIVE-9333 patch in order to run correctly. Remove parquet nested objects from wrapper writable objects --- Key: HIVE-9605 URL: https://issues.apache.org/jira/browse/HIVE-9605 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch Parquet nested types are using an extra wrapper object (ArrayWritable) as a wrapper of map and list elements. This extra object is not needed and causes unnecessary memory allocations. An example of code is in HiveCollectionConverter.java: {noformat} public void end() { parent.set(index, wrapList(new ArrayWritable(Writable.class, list.toArray(new Writable[list.size()])))); } {noformat} This object is later unwrapped in AbstractParquetMapInspector, i.e.: {noformat} final Writable[] mapContainer = ((ArrayWritable) data).get(); final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get(); for (final Writable obj : mapArray) { ... } {noformat} We should get rid of this wrapper object to save time and memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
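The double unwrap quoted from AbstractParquetMapInspector can be illustrated with plain `Object[]` arrays standing in for Hadoop's ArrayWritable (stdlib-only sketch; hadoop classes are not assumed on the classpath):

```java
public class WrapperDemo {
    // Mirrors the inspector's shape: two dereferences to reach the entries,
    // because the entry array is itself wrapped in a one-element container.
    static Object[] unwrapMap(Object data) {
        Object[] mapContainer = (Object[]) data;   // ((ArrayWritable) data).get()
        return (Object[]) mapContainer[0];         // ((ArrayWritable) mapContainer[0]).get()
    }

    public static void main(String[] args) {
        Object[] entries = { "k1=v1", "k2=v2" };
        // Current shape: entries wrapped in a one-element container array.
        Object[] wrapped = { entries };
        System.out.println(unwrapMap(wrapped).length); // 2 -- the extra hop buys nothing
        // Proposed shape: hand the entry array out directly, no wrapper object.
        Object[] direct = entries;
        System.out.println(direct.length);             // 2, one allocation fewer per collection
    }
}
```

Dropping the container removes one object allocation and one indirection per map/list value read, which is exactly the saving the issue targets.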
[jira] [Created] (HIVE-9685) CLIService should create SessionState after logging into kerberos
Brock Noland created HIVE-9685: -- Summary: CLIService should create SessionState after logging into kerberos Key: HIVE-9685 URL: https://issues.apache.org/jira/browse/HIVE-9685 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9686) HiveMetastore.logAuditEvent can be used before sasl server is started
Brock Noland created HIVE-9686: -- Summary: HiveMetastore.logAuditEvent can be used before sasl server is started Key: HIVE-9686 URL: https://issues.apache.org/jira/browse/HIVE-9686 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Metastore listeners can use logAudit before the sasl server is started resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9685) CLIService should create SessionState after logging into kerberos
[ https://issues.apache.org/jira/browse/HIVE-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9685: --- Attachment: HIVE-9685.patch CLIService should create SessionState after logging into kerberos - Key: HIVE-9685 URL: https://issues.apache.org/jira/browse/HIVE-9685 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9685.patch {noformat} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:409) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:230) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:64) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453) at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:123) at org.apache.hive.service.cli.CLIService.init(CLIService.java:81) at org.apache.hive.service.CompositeService.init(CompositeService.java:59) at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9685) CLIService should create SessionState after logging into kerberos
[ https://issues.apache.org/jira/browse/HIVE-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9685: --- Description: {noformat} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:409) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:230) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483) at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:64) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453) at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:123) at org.apache.hive.service.cli.CLIService.init(CLIService.java:81) at org.apache.hive.service.CompositeService.init(CompositeService.java:59) at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} CLIService should create SessionState after logging into kerberos - Key: HIVE-9685 URL: https://issues.apache.org/jira/browse/HIVE-9685 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Brock Noland Assignee: Brock Noland {noformat} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at
[jira] [Commented] (HIVE-7787) Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError
[ https://issues.apache.org/jira/browse/HIVE-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320356#comment-14320356 ] Arup Malakar commented on HIVE-7787: I tried release 1.0 and still have the same problem, so I am going to reopen the JIRA. I will resubmit the patch when I get time. Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError - Key: HIVE-7787 URL: https://issues.apache.org/jira/browse/HIVE-7787 Project: Hive Issue Type: Bug Components: Database/Schema, Thrift API Affects Versions: 0.12.0, 0.13.0, 0.12.1, 0.14.0, 0.13.1 Environment: Hive 0.12 CDH 5.1.0, Hadoop 2.3.0 CDH 5.1.0 Reporter: Raymond Lau Assignee: Arup Malakar Priority: Minor Attachments: HIVE-7787.trunk.1.patch When reading a Parquet file whose original Thrift schema contains a struct with an enum, the following error occurs (full stack trace below): {code} java.lang.NoSuchFieldError: DECIMAL. {code} Example Thrift Schema: {code} enum MyEnumType { EnumOne, EnumTwo, EnumThree } struct MyStruct { 1: optional MyEnumType myEnumType; 2: optional string field2; 3: optional string field3; } struct outerStruct { 1: optional list<MyStruct> myStructs } {code} Hive Table: {code} CREATE EXTERNAL TABLE mytable ( mystructs array<struct<myenumtype: string, field2: string, field3: string>> ) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat' ; {code} Error Stack trace: {code} Java stack trace for Hive 0.12: Caused by: java.lang.NoSuchFieldError: DECIMAL at org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:146) at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31) at org.apache.hadoop.hive.ql.io.parquet.convert.ArrayWritableGroupConverter.init(ArrayWritableGroupConverter.java:45) at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:34) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:47) at org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:36) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:64) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.<init>(DataWritableGroupConverter.java:40) at org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.<init>(DataWritableRecordConverter.java:32) at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:128) at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142) at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118) at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:92) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65) ... 16 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
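The `NoSuchFieldError: DECIMAL` comes from the converter lookup referencing a field that the Parquet library on the classpath does not have. One defensive shape for such a lookup, sketched here in Python purely for illustration (the names and the fallback-to-string behavior are assumptions, not Hive's actual fix), is to degrade unknown logical-type annotations such as ENUM to a string converter instead of failing hard:

```python
# Hypothetical sketch (not Hive's actual fix): dispatch a Parquet logical
# type annotation to a converter factory, falling back to a plain string
# converter for annotations the reader does not know, instead of failing
# the way the DECIMAL lookup does in the stack trace above.

def make_string_converter(field):
    return lambda raw: raw.decode("utf-8")

def make_int_converter(field):
    return lambda raw: int(raw)

# Known logical-type annotations -> converter factories (assumed names).
CONVERTERS = {
    "UTF8": make_string_converter,
    "INT_32": make_int_converter,
}

def get_converter(annotation, field):
    # Unknown annotations (e.g. "ENUM") degrade to string rather than
    # raising an error at converter-construction time.
    factory = CONVERTERS.get(annotation, make_string_converter)
    return factory(field)

conv = get_converter("ENUM", "myenumtype")
print(conv(b"EnumOne"))  # the unknown ENUM value is surfaced as a string
```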
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Attachment: HIVE-6617.15.patch Rebased the patch due to a recent commit on trunk. Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. We need to bring this number down, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30281: HIVE-9333: Move parquet serialize implementation to DataWritableWriter to improve write speeds
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review72398 --- Ship it! Ship It! - Ryan Blue On Feb. 11, 2015, 3:19 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Feb. 11, 2015, 3:19 p.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to DataWritableWriter class in order to save time in materializing data on serialize(). Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9199127735533f9a324c5ef456786dda10766c46 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 1d83bf31a3dbcbaa68b3e75a72cec2ec67e7faa5 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetHiveRecord.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects. 
Class.method                  Before Change (ops/s)   After Change (ops/s)
ParquetHiveSerDe.serialize:   19,113                  249,528  (19x speed increase)
DataWritableWriter.write:     5,033                   5,201    (3.34% speed increase)

2. Write 20 million rows (~1GB file) from Text to Parquet

I wrote a ~1GB file in Textfile format, then converted it to Parquet using the following statement:

CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;

Time (s) it took to write the whole file BEFORE the changes: 93.758 s
Time (s) it took to write the whole file AFTER the changes: 83.903 s

That is about a 10% speed increase.

Thanks, Sergio Pena
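The idea behind the patch — avoid materializing an intermediate writable tree on serialize() and let the writer walk the original row at write time — can be sketched as follows. This is an illustrative Python analogy with assumed names (`ParquetHiveRecordSketch` stands in loosely for the `ParquetHiveRecord` holder added by the diff), not the actual Hive code:

```python
# Illustrative sketch: the old path copied every row into an intermediate
# object tree before writing; the patched path wraps the raw row in a thin
# holder and lets the writer walk it directly.

def serialize_materializing(row):
    # Old approach: build a full copy of the row up front, per row.
    return [str(v) for v in row]

class ParquetHiveRecordSketch:
    """Thin holder, analogous in spirit to ParquetHiveRecord: no copy."""
    def __init__(self, row):
        self.row = row

def write_record(record, out):
    # The writer walks the original row directly at write time.
    for v in record.row:
        out.append(str(v))

out = []
write_record(ParquetHiveRecordSketch((1, "a", 2.5)), out)
```

Both paths emit the same bytes in the end; the win is skipping the per-row intermediate allocation on serialize().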
can you review HIVE-9617 UDF from_utc_timestamp throws NPE ...
UDF from_utc_timestamp throws NPE if the second argument is null https://issues.apache.org/jira/browse/HIVE-9617
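The symptom in HIVE-9617 points at a missing null guard on the second argument. The usual null-safe pattern for a two-argument UDF can be sketched as below; this is a hedged illustration in Python (the toy offset table and function body are assumptions, not Hive's implementation):

```python
# Hedged sketch of the usual null-guard pattern for a two-argument UDF:
# if either argument is NULL, return NULL instead of dereferencing it.

from datetime import datetime, timedelta

# Toy offset table standing in for real timezone handling (assumption).
TZ_OFFSETS = {"UTC": 0, "PST": -8}

def from_utc_timestamp(ts, tz):
    if ts is None or tz is None:   # the guard whose absence causes the NPE
        return None
    offset = TZ_OFFSETS.get(tz)
    if offset is None:
        return None
    return ts + timedelta(hours=offset)

print(from_utc_timestamp(datetime(2015, 2, 13, 12, 0), None))  # None, no crash
```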
[jira] [Commented] (HIVE-9607) Remove unnecessary attach-jdbc-driver execution from package/pom.xml
[ https://issues.apache.org/jira/browse/HIVE-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320495#comment-14320495 ] Alexander Pivovarov commented on HIVE-9607: --- [~xuefuz] Can you commit it? Remove unnecessary attach-jdbc-driver execution from package/pom.xml Key: HIVE-9607 URL: https://issues.apache.org/jira/browse/HIVE-9607 Project: Hive Issue Type: Improvement Components: Build Infrastructure Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9607.1.patch It looks like the build-helper-maven-plugin block, which has the attach-jdbc-driver execution, is not needed in package/pom.xml. package/pom.xml has maven-dependency-plugin, which copies hive-jdbc-standalone to project.build.directory. I removed the build-helper-maven-plugin block and rebuilt Hive; hive-jdbc-standalone.jar is still placed in project.build.directory. {code} $ mvn clean install -Phadoop-2 -Pdist -DskipTests $ find . -name apache-hive*jdbc.jar -exec ls -la {} \; 16844023 Feb 6 17:45 ./packaging/target/apache-hive-1.2.0-SNAPSHOT-jdbc.jar $ find . -name hive-jdbc*standalone.jar -exec ls -la {} \; 16844023 Feb 6 17:45 ./packaging/target/apache-hive-1.2.0-SNAPSHOT-bin/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-jdbc-1.2.0-SNAPSHOT-standalone.jar 16844023 Feb 6 17:45 ./jdbc/target/hive-jdbc-1.2.0-SNAPSHOT-standalone.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9673) Set operationhandle in ATS entities for lookups
[ https://issues.apache.org/jira/browse/HIVE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-9673: - Issue Type: Improvement (was: Bug) Set operationhandle in ATS entities for lookups --- Key: HIVE-9673 URL: https://issues.apache.org/jira/browse/HIVE-9673 Project: Hive Issue Type: Improvement Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 1.2.0 Attachments: HIVE-9673.1.patch, HIVE-9673.2.patch Yarn App Timeline Server (ATS) users can find their query using hive query-id. However, query id is available only through the logs at the moment. Thrift api users such as Hue have another unique id for queries, which the operation handle contains (TExecuteStatementResp.TOperationHandle.THandleIdentifier.guid) . Adding the operationhandle guid to ATS will enable such thrift users to get information from ATS for the queries that they have spawned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9673) Set operationhandle in ATS entities for lookups
[ https://issues.apache.org/jira/browse/HIVE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-9673: - Fix Version/s: 1.2.0 Set operationhandle in ATS entities for lookups --- Key: HIVE-9673 URL: https://issues.apache.org/jira/browse/HIVE-9673 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 1.2.0 Attachments: HIVE-9673.1.patch, HIVE-9673.2.patch Yarn App Timeline Server (ATS) users can find their query using hive query-id. However, query id is available only through the logs at the moment. Thrift api users such as Hue have another unique id for queries, which the operation handle contains (TExecuteStatementResp.TOperationHandle.THandleIdentifier.guid) . Adding the operationhandle guid to ATS will enable such thrift users to get information from ATS for the queries that they have spawned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9673) Set operationhandle in ATS entities for lookups
[ https://issues.apache.org/jira/browse/HIVE-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-9673: - Resolution: Fixed Status: Resolved (was: Patch Available) committed to trunk. thanks [~thejas]! Set operationhandle in ATS entities for lookups --- Key: HIVE-9673 URL: https://issues.apache.org/jira/browse/HIVE-9673 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-9673.1.patch, HIVE-9673.2.patch Yarn App Timeline Server (ATS) users can find their query using hive query-id. However, query id is available only through the logs at the moment. Thrift api users such as Hue have another unique id for queries, which the operation handle contains (TExecuteStatementResp.TOperationHandle.THandleIdentifier.guid) . Adding the operationhandle guid to ATS will enable such thrift users to get information from ATS for the queries that they have spawned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
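The change described in HIVE-9673 amounts to stamping the operation handle's guid onto the timeline entity as an indexed filter, so thrift clients like Hue can look a query up by the id they already hold. A conceptual sketch (the field names below are assumptions, not the actual ATS entity schema):

```python
# Conceptual sketch: add the Thrift operation handle's guid to the
# timeline entity as an indexed primary filter so ATS can answer
# "find the query for this guid". Field names are illustrative.

def make_ats_entity(query_id, operation_guid):
    entity = {
        "entity": query_id,
        "entitytype": "HIVE_QUERY_ID",
        "otherinfo": {},
        "primaryfilters": {"user": []},
    }
    if operation_guid is not None:
        # Indexed so ATS lookups by operation id become possible.
        entity["primaryfilters"]["operationid"] = [operation_guid]
    return entity

e = make_ats_entity("hive_20150213_0001", "a1b2-c3d4-guid")
```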
[jira] [Commented] (HIVE-9645) Constant folding case NULL equality
[ https://issues.apache.org/jira/browse/HIVE-9645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320484#comment-14320484 ] Hive QA commented on HIVE-9645: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698690/HIVE-9645.1.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 7542 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_navfn org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2794/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2794/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2794/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12698690 - PreCommit-HIVE-TRUNK-Build Constant folding case NULL equality --- Key: HIVE-9645 URL: https://issues.apache.org/jira/browse/HIVE-9645 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-9645.1.patch, HIVE-9645.patch The Hive logical optimizer does not follow the null-scan codepath when encountering NULL = 1; NULL = 1 is not evaluated as false in the constant propagation implementation. {code} hive> explain select count(1) from store_sales where null=1; ... TableScan alias: store_sales filterExpr: (null = 1) (type: boolean) Statistics: Num rows: 550076554 Data size: 49570324480 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (null = 1) (type: boolean) Statistics: Num rows: 275038277 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
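The reasoning behind the fix follows SQL's three-valued logic: NULL = 1 folds to NULL (unknown), and a WHERE predicate that folds to NULL can safely be rewritten to FALSE, which is what enables the null-scan codepath. A minimal sketch of that folding rule (illustrative, not Hive's ConstantPropagate code):

```python
# Sketch of SQL three-valued logic for constant folding: NULL = 1 folds
# to NULL, and a predicate that is NULL behaves like false in WHERE.

NULL = None  # SQL NULL marker

def fold_equals(left, right):
    # Any comparison involving NULL yields NULL (unknown), not False.
    if left is NULL or right is NULL:
        return NULL
    return left == right

def filter_keeps_row(value):
    # In a WHERE clause, unknown behaves like false: the row is dropped.
    return value is True

folded = fold_equals(NULL, 1)             # NULL, not False
assert filter_keeps_row(folded) is False  # but it still drops every row
```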
[jira] [Commented] (HIVE-9683) Hive metastore thrift client connections hang indefinitely
[ https://issues.apache.org/jira/browse/HIVE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320492#comment-14320492 ] Vikram Dixit K commented on HIVE-9683: -- +1 for 1.0 branch. Hive metastore thrift client connections hang indefinitely -- Key: HIVE-9683 URL: https://issues.apache.org/jira/browse/HIVE-9683 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.0.1 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.0.1 Attachments: HIVE-9683.1.patch THRIFT-2788 fixed network-partition problems that affect Thrift client connections. Since hive-1.0 is on thrift-0.9.0 which is affected by the bug, a workaround can be applied to prevent indefinite connection hangs during net-splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
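The workaround idea — give the client connection a read timeout so a network partition surfaces as an error instead of an indefinite hang — can be shown with a plain TCP socket. This is a sketch only (hostname and default timeout are invented); the actual HIVE-9683 patch applies the equivalent setting to the Thrift transport:

```python
# Sketch of the workaround: a read timeout on the client socket turns a
# net-split into a socket.timeout error instead of a hang on recv().

import socket

def open_metastore_socket(host, port, timeout_seconds=600):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout_seconds)   # recv()/connect() now raise on timeout
    return s

# No connect() here: the example only shows the timeout configuration.
s = open_metastore_socket("metastore.example.com", 9083)
assert s.gettimeout() == 600
s.close()
```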
[jira] [Updated] (HIVE-9481) allow column list specification in INSERT statement
[ https://issues.apache.org/jira/browse/HIVE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9481: - Fix Version/s: 1.2.0 allow column list specification in INSERT statement --- Key: HIVE-9481 URL: https://issues.apache.org/jira/browse/HIVE-9481 Project: Hive Issue Type: Bug Components: Parser, Query Processor, SQL Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 1.2.0 Attachments: HIVE-9481.2.patch, HIVE-9481.4.patch, HIVE-9481.5.patch, HIVE-9481.6.patch, HIVE-9481.patch Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) select x,y from T. The expectation is that 'x' is written to column 'c', 'y' is written to column 'b', and 'a' is set to NULL, assuming column 'a' is NULLABLE. Hive does not support this. In Hive one has to ensure that the data-producing statement has a schema that matches the target table schema. Since Hive doesn't support DEFAULT values for columns in CREATE TABLE, when the target schema is explicitly provided, missing columns will be set to NULL if they are NULLABLE; otherwise an error will be raised. If/when a DEFAULT clause is supported, this can be enhanced to set the default value rather than NULL. 
Thus, given {noformat} create table source (a int, b int); create table target (x int, y int, z int); create table target2 (x int, y int, z int); {noformat} {noformat}insert into target(y,z) select * from source;{noformat} will mean {noformat}insert into target select null as x, a, b from source;{noformat} and {noformat}insert into target(z,y) select * from source;{noformat} will mean {noformat}insert into target select null as x, b, a from source;{noformat} Also, {noformat} from source insert into target(y,z) select null as x, * insert into target2(y,z) select null as x, source.*; {noformat} and for partitioned tables, given {noformat} CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC; INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link) VALUES ('jsmith', 'mail.com'); {noformat} And with dynamic partitioning {noformat} INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link) VALUES ('jsmith', '2014-09-23', 'mail.com'); {noformat} In all cases, the schema specification contains columns of the target table which are matched by position to the values produced by the VALUES clause/SELECT statement. If the producer side provides values for a dynamic partition column, the column should be in the specified schema. Static partition values are part of the partition spec and thus are not produced by the producer and should not be part of the schema specification. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
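The column-matching rule spelled out above can be sketched as a small function (illustrative only, not Hive's planner code): given the target schema and an explicit column list, produce the effective select list, padding unmentioned columns with NULL.

```python
# Sketch of the rule: insert_columns[i] receives produced_exprs[i];
# every target column not mentioned is filled with NULL.

def effective_select(target_schema, insert_columns, produced_exprs):
    assert len(insert_columns) == len(produced_exprs)
    by_target = dict(zip(insert_columns, produced_exprs))
    return [by_target.get(col, "null") for col in target_schema]

# insert into target(y,z) select a, b from source
#   => insert into target select null, a, b from source
sel = effective_select(["x", "y", "z"], ["y", "z"], ["a", "b"])
```

Reordering the column list reorders the mapping, matching the `target(z,y)` example above.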
[jira] [Updated] (HIVE-9481) allow column list specification in INSERT statement
[ https://issues.apache.org/jira/browse/HIVE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9481: - Resolution: Fixed Status: Resolved (was: Patch Available) allow column list specification in INSERT statement --- Key: HIVE-9481 URL: https://issues.apache.org/jira/browse/HIVE-9481 Project: Hive Issue Type: Bug Components: Parser, Query Processor, SQL Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-9481.2.patch, HIVE-9481.4.patch, HIVE-9481.5.patch, HIVE-9481.6.patch, HIVE-9481.patch Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) select x,y from T. The expectation is that 'x' is written to column 'c', 'y' is written to column 'b', and 'a' is set to NULL, assuming column 'a' is NULLABLE. Hive does not support this. In Hive one has to ensure that the data-producing statement has a schema that matches the target table schema. Since Hive doesn't support DEFAULT values for columns in CREATE TABLE, when the target schema is explicitly provided, missing columns will be set to NULL if they are NULLABLE; otherwise an error will be raised. If/when a DEFAULT clause is supported, this can be enhanced to set the default value rather than NULL. 
Thus, given {noformat} create table source (a int, b int); create table target (x int, y int, z int); create table target2 (x int, y int, z int); {noformat} {noformat}insert into target(y,z) select * from source;{noformat} will mean {noformat}insert into target select null as x, a, b from source;{noformat} and {noformat}insert into target(z,y) select * from source;{noformat} will mean {noformat}insert into target select null as x, b, a from source;{noformat} Also, {noformat} from source insert into target(y,z) select null as x, * insert into target2(y,z) select null as x, source.*; {noformat} and for partitioned tables, given {noformat} CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC; INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link) VALUES ('jsmith', 'mail.com'); {noformat} And with dynamic partitioning {noformat} INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link) VALUES ('jsmith', '2014-09-23', 'mail.com'); {noformat} In all cases, the schema specification contains columns of the target table which are matched by position to the values produced by the VALUES clause/SELECT statement. If the producer side provides values for a dynamic partition column, the column should be in the specified schema. Static partition values are part of the partition spec and thus are not produced by the producer and should not be part of the schema specification. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9350) Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases'
[ https://issues.apache.org/jira/browse/HIVE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-9350: Attachment: HIVE-9350.5.patch Fixes the ClassNotFoundException thrown at runtime from PerfLogger. Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases' --- Key: HIVE-9350 URL: https://issues.apache.org/jira/browse/HIVE-9350 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9350.1.patch, HIVE-9350.2.patch, HIVE-9350.3.patch, HIVE-9350.4.patch, HIVE-9350.5.patch It should be possible for HiveAuthorizer implementations to control whether a user is able to see a table or database in the results of 'show tables' and 'show databases' respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
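The shape of the hook this JIRA adds is: the listing is produced first, then handed to the authorizer to filter before it is returned to the user. A toy sketch (class and method names are assumptions, not the actual HiveAuthorizer API):

```python
# Toy authorizer sketch: 'show tables' results pass through a filter
# method so only objects the user may see are returned.

class AllowPrefixAuthorizer:
    """Toy policy: a user may see tables prefixed with their own name."""
    def __init__(self, user):
        self.user = user

    def filter_table_names(self, db, tables):
        return [t for t in tables if t.startswith(self.user + "_")]

auth = AllowPrefixAuthorizer("alice")
visible = auth.filter_table_names(
    "default", ["alice_sales", "bob_hr", "alice_logs"])
```

Real implementations would consult actual privileges rather than a name prefix; the point is only that the filter runs server-side on the raw listing.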
Re: Review Request 30281: HIVE-9333: Move parquet serialize implementation to DataWritableWriter to improve write speeds
On Feb. 11, 2015, 11:40 p.m., Ryan Blue wrote: Thanks Ryan for your comments. I will add these changes in another JIRA as this one was already merged. I did not add a comment on the JIRA so as to wait for the merge. - Sergio --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/#review72053 --- On Feb. 11, 2015, 11:19 p.m., Sergio Pena wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30281/ --- (Updated Feb. 11, 2015, 11:19 p.m.) Review request for hive, Ryan Blue, cheng xu, and Dong Chen. Bugs: HIVE-9333 https://issues.apache.org/jira/browse/HIVE-9333 Repository: hive-git Description --- This patch moves the ParquetHiveSerDe.serialize() implementation to DataWritableWriter class in order to save time in materializing data on serialize(). Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java ea4109d358f7c48d1e2042e5da299475de4a0a29 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 9199127735533f9a324c5ef456786dda10766c46 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java 060b1b722d32f3b2f88304a1a73eb249e150294b ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java 1d83bf31a3dbcbaa68b3e75a72cec2ec67e7faa5 ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java e52c4bc0b869b3e60cb4bfa9e11a09a0d605ac28 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java a693aff18516d133abf0aae4847d3fe00b9f1c96 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java 667d3671547190d363107019cd9a2d105d26d336 ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 007a665529857bcec612f638a157aa5043562a15 serde/src/java/org/apache/hadoop/hive/serde2/io/ParquetHiveRecord.java PRE-CREATION Diff: https://reviews.apache.org/r/30281/diff/ Testing --- The tests run were the following: 1. 
JMH (Java microbenchmark) This benchmark called parquet serialize/write methods using text writable objects.

Class.method                  Before Change (ops/s)   After Change (ops/s)
ParquetHiveSerDe.serialize:   19,113                  249,528  (19x speed increase)
DataWritableWriter.write:     5,033                   5,201    (3.34% speed increase)

2. Write 20 million rows (~1GB file) from Text to Parquet

I wrote a ~1GB file in Textfile format, then converted it to Parquet using the following statement:

CREATE TABLE parquet STORED AS parquet AS SELECT * FROM text;

Time (s) it took to write the whole file BEFORE the changes: 93.758 s
Time (s) it took to write the whole file AFTER the changes: 83.903 s

That is about a 10% speed increase.

Thanks, Sergio Pena
[jira] [Commented] (HIVE-9683) Hive metastore thrift client connections hang indefinitely
[ https://issues.apache.org/jira/browse/HIVE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320486#comment-14320486 ] Gunther Hagleitner commented on HIVE-9683: -- [~vikram.dixit] ok for 1.0 branch? Hive metastore thrift client connections hang indefinitely -- Key: HIVE-9683 URL: https://issues.apache.org/jira/browse/HIVE-9683 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0, 1.0.1 Reporter: Gopal V Assignee: Gopal V Priority: Minor Fix For: 1.0.1 Attachments: HIVE-9683.1.patch THRIFT-2788 fixed network-partition problems that affect Thrift client connections. Since hive-1.0 is on thrift-0.9.0 which is affected by the bug, a workaround can be applied to prevent indefinite connection hangs during net-splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9481) allow column list specification in INSERT statement
[ https://issues.apache.org/jira/browse/HIVE-9481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320522#comment-14320522 ] Eugene Koifman commented on HIVE-9481: -- Committed to trunk. Thanks [~alangates] for the review. allow column list specification in INSERT statement --- Key: HIVE-9481 URL: https://issues.apache.org/jira/browse/HIVE-9481 Project: Hive Issue Type: Bug Components: Parser, Query Processor, SQL Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-9481.2.patch, HIVE-9481.4.patch, HIVE-9481.5.patch, HIVE-9481.6.patch, HIVE-9481.patch Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) select x,y from T. The expectation is that 'x' is written to column 'c', 'y' is written to column 'b', and 'a' is set to NULL, assuming column 'a' is NULLABLE. Hive does not support this. In Hive one has to ensure that the data-producing statement has a schema that matches the target table schema. Since Hive doesn't support DEFAULT values for columns in CREATE TABLE, when the target schema is explicitly provided, missing columns will be set to NULL if they are NULLABLE; otherwise an error will be raised. If/when a DEFAULT clause is supported, this can be enhanced to set the default value rather than NULL. 
Thus, given {noformat} create table source (a int, b int); create table target (x int, y int, z int); create table target2 (x int, y int, z int); {noformat} {noformat}insert into target(y,z) select * from source;{noformat} will mean {noformat}insert into target select null as x, a, b from source;{noformat} and {noformat}insert into target(z,y) select * from source;{noformat} will mean {noformat}insert into target select null as x, b, a from source;{noformat} Also, {noformat} from source insert into target(y,z) select null as x, * insert into target2(y,z) select null as x, source.*; {noformat} and for partitioned tables, given {noformat} CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC; INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link) VALUES ('jsmith', 'mail.com'); {noformat} And with dynamic partitioning {noformat} INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link) VALUES ('jsmith', '2014-09-23', 'mail.com'); {noformat} In all cases, the schema specification contains columns of the target table which are matched by position to the values produced by the VALUES clause/SELECT statement. If the producer side provides values for a dynamic partition column, the column should be in the specified schema. Static partition values are part of the partition spec and thus are not produced by the producer and should not be part of the schema specification. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9605) Remove parquet nested objects from wrapper writable objects
[ https://issues.apache.org/jira/browse/HIVE-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9605: --- Resolution: Fixed Fix Version/s: parquet-branch Status: Resolved (was: Patch Available) Committed to branch! Remove parquet nested objects from wrapper writable objects --- Key: HIVE-9605 URL: https://issues.apache.org/jira/browse/HIVE-9605 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Sergio Peña Assignee: Sergio Peña Fix For: parquet-branch Attachments: HIVE-9605.3.patch, HIVE-9605.4.patch Parquet nested types are using an extra wrapper object (ArrayWritable) as a wrapper of map and list elements. This extra object is not needed and causes unnecessary memory allocations. An example of the code is in HiveCollectionConverter.java: {noformat} public void end() { parent.set(index, wrapList(new ArrayWritable( Writable.class, list.toArray(new Writable[list.size()])))); } {noformat} This object is later unwrapped in AbstractParquetMapInspector, e.g.: {noformat} final Writable[] mapContainer = ((ArrayWritable) data).get(); final Writable[] mapArray = ((ArrayWritable) mapContainer[0]).get(); for (final Writable obj : mapArray) { ... } {noformat} We should get rid of this wrapper object to save time and memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
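The effect of dropping the wrapper is one fewer unwrap step on every map/list read. A sketch of the two access patterns, with plain Python lists standing in for ArrayWritable (illustrative only):

```python
# Old layout: data -> [container] -> [entries...]; new layout stores the
# entries directly. Both yield the same entries; the second skips one
# allocation and one indirection per collection.

def read_map_wrapped(data):
    map_container = data          # outer ArrayWritable
    map_array = map_container[0]  # extra inner ArrayWritable (the waste)
    return list(map_array)

def read_map_direct(data):
    return list(data)             # entries stored directly, no wrapper

entries = [("k1", 1), ("k2", 2)]
assert read_map_wrapped([entries]) == read_map_direct(entries)
```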
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Open (was: Patch Available) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Patch Available (was: Open) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9684: Attachment: HIVE-9684.1.patch The issue does not happen in trunk. But the check is required for forward compatibility. Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.1.patch, HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in ORC protobuf message to optional field. But DiskRange computation and stream creation code assumes existence of stream kind everywhere. This leads to incorrect calculation of diskranges resulting in out of range exceptions. The proper fix is to check if stream kind exists using stream.hasKind() before adding the stream to disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
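The fix described in HIVE-9684 — check `stream.hasKind()` before letting a stream participate in disk-range computation — can be sketched as follows. The data structures here are illustrative stand-ins, not the ORC reader's real types, and the assumption that an unknown stream's bytes still advance the file cursor is mine:

```python
# Sketch of the hasKind() guard: only streams that carry a stream kind
# contribute disk ranges; streams without one (e.g. written by a newer
# writer) are skipped, but their length still advances the offset so
# later ranges stay correct.

def plan_disk_ranges(streams):
    """streams: list of dicts with 'length' and an optional 'kind'."""
    ranges = []
    offset = 0
    for s in streams:
        if "kind" in s:                 # the stream.hasKind() check
            ranges.append((offset, offset + s["length"]))
        offset += s["length"]           # length still advances the cursor
    return ranges

streams = [
    {"kind": "DATA", "length": 100},
    {"length": 40},                     # unknown kind from a newer writer
    {"kind": "LENGTH", "length": 10},
]
ranges = plan_disk_ranges(streams)
```

Without the guard, the unknown stream would be planned as a range of the wrong kind, which is how the out-of-range exceptions in the description arise.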
[jira] [Updated] (HIVE-9596) move standard getDisplayString impl to GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9596: - Resolution: Fixed Fix Version/s: 1.2.0 Status: Resolved (was: Patch Available) Thanks for cleaning that up, I've committed to trunk. move standard getDisplayString impl to GenericUDF - Key: HIVE-9596 URL: https://issues.apache.org/jira/browse/HIVE-9596 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9596.1.patch, HIVE-9596.2.patch, HIVE-9596.3.patch, HIVE-9596.4.patch 54 GenericUDF-derived classes have a very similar getDisplayString impl which returns fname(child1, child2, ..., childn). instr() and locate() have bugs in their implementation (no comma between children). Instead of having 54 implementations of the same method it's better to move the standard implementation to the base class. affected UDF classes: {code} contrib/src/java/org/apache/hadoop/hive/contrib/genericudf/example/GenericUDFDBOutput.java itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestGetJavaBoolean.java itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestGetJavaString.java itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTestTranslate.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapBop.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFReflect.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAbs.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAssertTrue.java 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseNumeric.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcat.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateAdd.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateSub.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEncode.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloorCeilBase.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFormatNumber.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFGreatest.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInFile.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInitCap.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInstr.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLastDay.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLocate.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLower.java 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMacro.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMapKeys.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMapValues.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNamedStruct.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPower.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPrintf.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
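The shared implementation HIVE-9596 proposes could look roughly like the sketch below: the base class builds "fname(child1, child2, ..., childn)" once, so subclasses no longer need their own copies, and the missing-comma bug in instr()/locate() disappears. The class and method names here are illustrative, not Hive's exact GenericUDF API.

```java
// Illustrative sketch (not Hive source) of a standard getDisplayString
// shared by all GenericUDF subclasses.
public class GetDisplayStringSketch {

    // Builds "name(child1, child2, ..., childn)" from the UDF's function
    // name and the display strings of its children.
    static String getStandardDisplayString(String name, String[] children) {
        StringBuilder sb = new StringBuilder();
        sb.append(name).append('(');
        for (int i = 0; i < children.length; i++) {
            if (i > 0) {
                sb.append(", "); // the comma instr()/locate() were missing
            }
            sb.append(children[i]);
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // prints instr(col1, 'x')
        System.out.println(getStandardDisplayString("instr", new String[]{"col1", "'x'"}));
    }
}
```

Each subclass would then only need to supply its function name rather than reimplementing the formatting loop.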
[jira] [Commented] (HIVE-9350) Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases'
[ https://issues.apache.org/jira/browse/HIVE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320572#comment-14320572 ] Thejas M Nair commented on HIVE-9350: - Updated review board, but it also shows other changes from trunk as part of the diff. Here is the real change in updated patch - https://github.com/thejasmn/hive/commit/b35795441195825218cc32bda814ea7a9369435f Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases' --- Key: HIVE-9350 URL: https://issues.apache.org/jira/browse/HIVE-9350 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9350.1.patch, HIVE-9350.2.patch, HIVE-9350.3.patch, HIVE-9350.4.patch, HIVE-9350.5.patch It should be possible for HiveAuthorizer implementations to control if a user is able to see a table or database in results of 'show tables' and 'show databases' respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6069) Improve error message in GenericUDFRound
[ https://issues.apache.org/jira/browse/HIVE-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6069: - Affects Version/s: 1.0.0 Improve error message in GenericUDFRound Key: HIVE-6069 URL: https://issues.apache.org/jira/browse/HIVE-6069 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.0.0 Reporter: Xuefu Zhang Assignee: Alexander Pivovarov Priority: Trivial Fix For: 1.2.0 Attachments: HIVE-6069.1.patch Suggested in HIVE-6039 review board. https://reviews.apache.org/r/16329/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6069) Improve error message in GenericUDFRound
[ https://issues.apache.org/jira/browse/HIVE-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6069: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~apivovarov]! Improve error message in GenericUDFRound Key: HIVE-6069 URL: https://issues.apache.org/jira/browse/HIVE-6069 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.0.0 Reporter: Xuefu Zhang Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-6069.1.patch Suggested in HIVE-6039 review board. https://reviews.apache.org/r/16329/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9686) HiveMetastore.logAuditEvent can be used before sasl server is started
[ https://issues.apache.org/jira/browse/HIVE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9686: --- Affects Version/s: 1.0.0 Status: Patch Available (was: Open) HiveMetastore.logAuditEvent can be used before sasl server is started - Key: HIVE-9686 URL: https://issues.apache.org/jira/browse/HIVE-9686 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9686.patch Metastore listeners can use logAudit before the sasl server is started resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9689) Store distinct value estimator's bit vectors in metastore
Prasanth Jayachandran created HIVE-9689: --- Summary: Store distinct value estimator's bit vectors in metastore Key: HIVE-9689 URL: https://issues.apache.org/jira/browse/HIVE-9689 Project: Hive Issue Type: New Feature Reporter: Prasanth Jayachandran Hive currently uses the PCSA (Probabilistic Counting and Stochastic Averaging) algorithm to determine distinct cardinality. The NDV value determined from the UDF is stored in the metastore instead of the actual bit vectors. This makes it impossible to estimate the overall NDV across all the partitions (or selected partitions). We should ideally store the bit vectors in the metastore and do server-side merging of the bit vectors. We could also replace the current PCSA algorithm in favour of HyperLogLog if space is a constraint. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
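One reason storing the bit vectors matters: PCSA/FM-style sketches from different partitions combine with a simple bitwise OR, whereas final scalar NDV values cannot be meaningfully added across partitions. A minimal sketch of that merge step (illustrative code, not Hive's NumDistinctValueEstimator):

```java
// Illustrative sketch of server-side bit vector merging. Each partition's
// estimator produces a bit vector; OR-ing them yields the vector the
// estimator would have produced over the union of the partitions.
public class BitVectorMergeSketch {

    // OR-merge two equal-length bit vectors, stored one 64-bit word per slot.
    static long[] merge(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] | b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] partition1 = {0b0101L};
        long[] partition2 = {0b0011L};
        // the merged vector carries the union of distinct-value evidence
        System.out.println(Long.toBinaryString(merge(partition1, partition2)[0])); // prints 111
    }
}
```

HyperLogLog registers merge the same way (element-wise max instead of OR), which is why either representation supports the server-side merging the issue asks for.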
[jira] [Commented] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320625#comment-14320625 ] Prasanth Jayachandran commented on HIVE-9684: - Attached trunk patch as well. Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.1.patch, HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields. But the DiskRange computation and stream creation code assumes the stream kind exists everywhere. This leads to incorrect calculation of disk ranges, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9523) when columns on which tables are partitioned are used in the join condition same join optimizations as for bucketed tables should be applied
[ https://issues.apache.org/jira/browse/HIVE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-9523: - Labels: gsoc2015 (was: ) when columns on which tables are partitioned are used in the join condition same join optimizations as for bucketed tables should be applied Key: HIVE-9523 URL: https://issues.apache.org/jira/browse/HIVE-9523 Project: Hive Issue Type: Improvement Components: Logical Optimizer, Physical Optimizer, SQL Affects Versions: 0.13.0, 0.14.0, 0.13.1 Reporter: Maciek Kocon Labels: gsoc2015 For JOIN conditions where partitioning criteria are used respectively: ⋮ FROM TabA JOIN TabB ON TabA.partCol1 = TabB.partCol2 AND TabA.partCol2 = TabB.partCol2 the optimizer could/should choose to treat it the same way as with bucketed tables: ⋮ FROM TabC JOIN TabD ON TabC.clusteredByCol1 = TabD.clusteredByCol2 AND TabC.clusteredByCol2 = TabD.clusteredByCol2 and use either a Bucket Map Join or, better, a Sort Merge Bucket Map Join. This is based on the fact that, just as buckets translate to separate files, partitions essentially provide the same mapping. When data locality is known, the optimizer could focus only on joining corresponding partitions rather than whole data sets. #side notes: ⦿ Currently, table DDL syntax where partitioning and bucketing are defined at the same time is allowed: CREATE TABLE ⋮ PARTITIONED BY(…) CLUSTERED BY(…) INTO … BUCKETS; But in this case the optimizer never chooses to use a Bucket Map Join or Sort Merge Bucket Map Join, which defeats the purpose of creating BUCKETed tables in such scenarios. Should that be raised as a separate BUG? ⦿ Currently partitioning and bucketing are two separate things but serve the same purpose - shouldn't the concepts be merged (explicit/implicit partitions?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320674#comment-14320674 ] Gopal V commented on HIVE-9684: --- LGTM +1. This needs the extra condition because unknown enum fields default to the first enum value (PRESENT). Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.1.patch, HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf messages to optional fields. But the DiskRange computation and stream creation code assumes the stream kind exists everywhere. This leads to incorrect calculation of disk ranges, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
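The hasKind() guard described in the issue can be sketched as follows; Stream here is a stand-in for the ORC protobuf stream message, not Hive's actual reader code, and plannedBytes is a hypothetical simplification of the disk range computation.

```java
// Illustrative sketch of HIVE-9684's fix: streams whose kind is absent
// (e.g. written by a newer ORC version with an unknown stream kind) must
// be skipped before they enter the disk range computation, otherwise
// their bytes are planned against the wrong default kind.
public class StreamKindGuardSketch {

    // Stand-in for the protobuf Stream message; a null kind models an
    // optional field that was left unset.
    static class Stream {
        final String kind;
        final long length;
        Stream(String kind, long length) { this.kind = kind; this.length = length; }
        boolean hasKind() { return kind != null; }
    }

    // Only streams with a known kind contribute to the planned read size.
    static long plannedBytes(Stream... streams) {
        long total = 0;
        for (Stream s : streams) {
            if (s.hasKind()) { // the fix: check before using the kind
                total += s.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // the 40-byte unknown-kind stream is excluded from the plan
        System.out.println(plannedBytes(new Stream("DATA", 100), new Stream(null, 40))); // prints 100
    }
}
```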
[jira] [Commented] (HIVE-9685) CLIService should create SessionState after logging into kerberos
[ https://issues.apache.org/jira/browse/HIVE-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320720#comment-14320720 ] Xuefu Zhang commented on HIVE-9685: --- +1 CLIService should create SessionState after logging into kerberos - Key: HIVE-9685 URL: https://issues.apache.org/jira/browse/HIVE-9685 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9685.patch {noformat} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:409) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:230) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:64) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453) at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:123) at org.apache.hive.service.cli.CLIService.init(CLIService.java:81) at org.apache.hive.service.CompositeService.init(CompositeService.java:59) at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30575: HIVE-9350 : Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases'
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30575/ --- (Updated Feb. 13, 2015, 7 p.m.) Review request for hive and Jason Dere. Changes --- Fix the classnotfound exception at runtime from perflogger. Bugs: HIVE-9350 https://issues.apache.org/jira/browse/HIVE-9350 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-9350 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 90bcc49 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestFilterHooks.java cceac93 itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/authorization/plugin/TestHiveAuthorizerShowFilters.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/DefaultMetaStoreFilterHookImpl.java b723484 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java 51f63ad ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/AuthorizationMetaStoreFilterHook.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAccessControlException.java d877686 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizationValidator.java 5a5b3d5 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizer.java 1f1eba2 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveAuthorizerImpl.java e615049 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveV1Authorizer.java ac1cc47 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/DummyHiveAuthorizationValidator.java cabc22a ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java 0e093b0 ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java d4e5562 service/src/java/org/apache/hive/service/cli/CLIService.java 883bf9b Diff: https://reviews.apache.org/r/30575/diff/ Testing --- New unit tests. Thanks, Thejas Nair
[jira] [Commented] (HIVE-9350) Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases'
[ https://issues.apache.org/jira/browse/HIVE-9350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320597#comment-14320597 ] Jason Dere commented on HIVE-9350: -- +1 Add ability for HiveAuthorizer implementations to filter out results of 'show tables', 'show databases' --- Key: HIVE-9350 URL: https://issues.apache.org/jira/browse/HIVE-9350 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9350.1.patch, HIVE-9350.2.patch, HIVE-9350.3.patch, HIVE-9350.4.patch, HIVE-9350.5.patch It should be possible for HiveAuthorizer implementations to control if a user is able to see a table or database in results of 'show tables' and 'show databases' respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
review: HIVE-9619 Uninitialized read of numBitVectors in NumDistinctValueEstimator
Hi everyone, can anyone review it? https://issues.apache.org/jira/browse/HIVE-9619 https://reviews.apache.org/r/30789/diff/#
[jira] [Updated] (HIVE-6069) Improve error message in GenericUDFRound
[ https://issues.apache.org/jira/browse/HIVE-6069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6069: - Fix Version/s: 1.2.0 Improve error message in GenericUDFRound Key: HIVE-6069 URL: https://issues.apache.org/jira/browse/HIVE-6069 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 1.0.0 Reporter: Xuefu Zhang Assignee: Alexander Pivovarov Priority: Trivial Fix For: 1.2.0 Attachments: HIVE-6069.1.patch Suggested in HIVE-6039 review board. https://reviews.apache.org/r/16329/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Patch Available (was: Open) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Attachment: (was: HIVE-6617.15.patch) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9685) CLIService should create SessionState after logging into kerberos
[ https://issues.apache.org/jira/browse/HIVE-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9685: --- Status: Patch Available (was: Open) CLIService should create SessionState after logging into kerberos - Key: HIVE-9685 URL: https://issues.apache.org/jira/browse/HIVE-9685 Project: Hive Issue Type: Bug Affects Versions: 1.1.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9685.patch {noformat} javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:409) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:230) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.init(SessionHiveMetaStoreClient.java:74) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:408) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1483) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.init(RetryingMetaStoreClient.java:64) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:74) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2841) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2860) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:453) at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:123) at org.apache.hive.service.cli.CLIService.init(CLIService.java:81) at org.apache.hive.service.CompositeService.init(CompositeService.java:59) at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:92) at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:309) at org.apache.hive.service.server.HiveServer2.access$400(HiveServer2.java:68) at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:523) at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:396) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9686) HiveMetastore.logAuditEvent can be used before sasl server is started
[ https://issues.apache.org/jira/browse/HIVE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9686: --- Attachment: HIVE-9686.patch HiveMetastore.logAuditEvent can be used before sasl server is started - Key: HIVE-9686 URL: https://issues.apache.org/jira/browse/HIVE-9686 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9686.patch Metastore listeners can use logAudit before the sasl server is started resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320654#comment-14320654 ] Hive QA commented on HIVE-6617: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698765/HIVE-6617.15.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7541 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_select_charliteral {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2795/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2795/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2795/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12698765 - PreCommit-HIVE-TRUNK-Build Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9686) HiveMetastore.logAuditEvent can be used before sasl server is started
[ https://issues.apache.org/jira/browse/HIVE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320691#comment-14320691 ] Xuefu Zhang commented on HIVE-9686: --- +1 HiveMetastore.logAuditEvent can be used before sasl server is started - Key: HIVE-9686 URL: https://issues.apache.org/jira/browse/HIVE-9686 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9686.patch Metastore listeners can use logAudit before the sasl server is started resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9691) Include a few more files include the source tarball
Brock Noland created HIVE-9691: -- Summary: Include a few more files include the source tarball Key: HIVE-9691 URL: https://issues.apache.org/jira/browse/HIVE-9691 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-9692) Allocate only parquet selected columns in HiveStructConverter class
[ https://issues.apache.org/jira/browse/HIVE-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9692 started by Sergio Peña. - Allocate only parquet selected columns in HiveStructConverter class --- Key: HIVE-9692 URL: https://issues.apache.org/jira/browse/HIVE-9692 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña The HiveStructConverter class is where Hive converts Parquet objects into Hive writable objects that are later parsed by object inspectors. This class allocates as many writable objects as there are columns in the file schema. {noformat} public HiveStructConverter(final GroupType requestedSchema, final GroupType tableSchema, Map<String, String> metadata) { ... this.writables = new Writable[fileSchema.getFieldCount()]; ... } {noformat} This array is always fully allocated even if we select only a specific number of columns. Say we select 2 columns from a table of 50 columns: 50 objects are allocated, only 2 are used, and 48 are unused. We should allocate only the requested number of columns in order to save memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9692) Allocate only parquet selected columns in HiveStructConverter class
Sergio Peña created HIVE-9692: - Summary: Allocate only parquet selected columns in HiveStructConverter class Key: HIVE-9692 URL: https://issues.apache.org/jira/browse/HIVE-9692 Project: Hive Issue Type: Sub-task Reporter: Sergio Peña Assignee: Sergio Peña The HiveStructConverter class is where Hive converts Parquet objects into Hive writable objects that are later parsed by object inspectors. This class allocates as many writable objects as there are columns in the file schema. {noformat} public HiveStructConverter(final GroupType requestedSchema, final GroupType tableSchema, Map<String, String> metadata) { ... this.writables = new Writable[fileSchema.getFieldCount()]; ... } {noformat} This array is always fully allocated even if we select only a specific number of columns. Say we select 2 columns from a table of 50 columns: 50 objects are allocated, only 2 are used, and 48 are unused. We should allocate only the requested number of columns in order to save memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
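The change the issue suggests amounts to sizing the array by the requested (projected) schema rather than the full file schema. A minimal sketch under that assumption, with hypothetical names rather than Hive's converter API:

```java
// Illustrative sketch of HIVE-9692's proposed allocation: allocate one
// slot per selected column, not one per column in the file schema.
public class ProjectedAllocationSketch {

    // Size the writable array by the projection the query actually asked for.
    static Object[] allocateWritables(int requestedFieldCount) {
        return new Object[requestedFieldCount];
    }

    public static void main(String[] args) {
        int fileSchemaColumns = 50; // full table width
        int requestedColumns = 2;   // e.g. SELECT col_a, col_b
        Object[] writables = allocateWritables(requestedColumns);
        // prints 2 of 50 allocated
        System.out.println(writables.length + " of " + fileSchemaColumns + " allocated");
    }
}
```

The remaining work in the actual converter would be mapping each requested column's index in the file schema to its slot in the smaller array.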
[jira] [Commented] (HIVE-9666) Improve some qtests
[ https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320593#comment-14320593 ] Xuefu Zhang commented on HIVE-9666: --- +1 to patch #2 also. Improve some qtests --- Key: HIVE-9666 URL: https://issues.apache.org/jira/browse/HIVE-9666 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch {code} groupby7_noskew_multi_single_reducer.q groupby_multi_single_reducer3.q parallel_join0.q union3.q union4.q {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Attachment: HIVE-6617.15.patch Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Open (was: Patch Available) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9687) Blink DB style approximate querying in hive
Vikram Dixit K created HIVE-9687: Summary: Blink DB style approximate querying in hive Key: HIVE-9687 URL: https://issues.apache.org/jira/browse/HIVE-9687 Project: Hive Issue Type: New Feature Reporter: Vikram Dixit K http://www.cs.berkeley.edu/~sameerag/blinkdb_eurosys13.pdf There are various pieces here that need to be thought through and implemented, e.g. offline sampling and a run-time sample-selection module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9688) Support SAMPLE operator in hive
Prasanth Jayachandran created HIVE-9688: --- Summary: Support SAMPLE operator in hive Key: HIVE-9688 URL: https://issues.apache.org/jira/browse/HIVE-9688 Project: Hive Issue Type: New Feature Reporter: Prasanth Jayachandran Hive needs a SAMPLE operator to support parallel order-by, skew joins, and count-distinct optimizations. Random, reservoir, and stratified sampling should cover most of the cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
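Of the three strategies mentioned, reservoir sampling is the one most often used for single-pass streams of unknown length. A standalone sketch of Algorithm R in Java (illustrative only, not Hive code): each of the first k rows fills the reservoir, and each subsequent row replaces a random slot with probability k/n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Reservoir sampling (Algorithm R): keep a uniform sample of k rows
// from a stream of unknown length in a single pass.
public class Reservoir {
    public static List<Integer> sample(Iterable<Integer> rows, int k, long seed) {
        List<Integer> reservoir = new ArrayList<>(k);
        Random rnd = new Random(seed);
        int seen = 0;
        for (int row : rows) {
            if (seen < k) {
                reservoir.add(row);              // fill the reservoir first
            } else {
                int j = rnd.nextInt(seen + 1);   // uniform in [0, seen]
                if (j < k) {
                    reservoir.set(j, row);       // replace with probability k/(seen+1)
                }
            }
            seen++;
        }
        return reservoir;
    }
}
```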
[jira] [Updated] (HIVE-9691) Include a few more files include the source tarball
[ https://issues.apache.org/jira/browse/HIVE-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9691: --- Attachment: HIVE-9691.patch Include a few more files include the source tarball --- Key: HIVE-9691 URL: https://issues.apache.org/jira/browse/HIVE-9691 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.1.0 Attachments: HIVE-9691.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320906#comment-14320906 ] Jimmy Xiang commented on HIVE-9659: --- How big is the data set? Does it work with a small data set? 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log, along with a NullPointerException near the end of the log. 
(a) Detail error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler {noformat} (b) Detail error message for NullPointerException: {noformat} 5/02/12 01:29:50 ERROR
[jira] [Updated] (HIVE-9691) Include a few more files include the source tarball
[ https://issues.apache.org/jira/browse/HIVE-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9691: --- Status: Patch Available (was: Open) Include a few more files include the source tarball --- Key: HIVE-9691 URL: https://issues.apache.org/jira/browse/HIVE-9691 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.1.0 Attachments: HIVE-9691.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9691) Include a few more files include the source tarball
[ https://issues.apache.org/jira/browse/HIVE-9691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320881#comment-14320881 ] Chao commented on HIVE-9691: +1 Include a few more files include the source tarball --- Key: HIVE-9691 URL: https://issues.apache.org/jira/browse/HIVE-9691 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 1.1.0 Attachments: HIVE-9691.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9684) Incorrect disk range computation in ORC because of optional stream kind
[ https://issues.apache.org/jira/browse/HIVE-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320930#comment-14320930 ] Hive QA commented on HIVE-9684: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698799/HIVE-9684.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7548 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2797/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2797/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2797/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698799 - PreCommit-HIVE-TRUNK-Build Incorrect disk range computation in ORC because of optional stream kind --- Key: HIVE-9684 URL: https://issues.apache.org/jira/browse/HIVE-9684 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.0.0, 1.1.0, 1.0.1 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-9684.1.patch, HIVE-9684.branch-1.0.patch, HIVE-9684.branch-1.1.patch HIVE-9593 changed all required fields in the ORC protobuf message to optional fields. But the DiskRange computation and stream creation code assumes the existence of the stream kind everywhere. 
This leads to incorrect calculation of disk ranges, resulting in out-of-range exceptions. The proper fix is to check whether the stream kind exists, using stream.hasKind(), before adding the stream to the disk-range computation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
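The proposed guard can be sketched in plain Java. This is not the actual ORC reader code: the Stream class below is a hypothetical stand-in for the generated protobuf message, modeling an unset optional field as a null kind.

```java
import java.util.List;

// Sketch of the HIVE-9684 fix: skip streams whose kind is absent before
// they enter the disk-range computation. Stream is an illustrative stand-in
// for the ORC protobuf stream message, where kind became an optional field.
public class DiskRangePlanner {
    static class Stream {
        final String kind;   // null models a message where the field is unset
        final long offset, length;
        Stream(String kind, long offset, long length) {
            this.kind = kind; this.offset = offset; this.length = length;
        }
        boolean hasKind() { return kind != null; }
    }

    static long plannedBytes(List<Stream> streams) {
        long total = 0;
        for (Stream s : streams) {
            if (!s.hasKind()) {
                continue; // without this guard the range math runs off the end
            }
            total += s.length;
        }
        return total;
    }
}
```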
[jira] [Created] (HIVE-9690) Allow non-numeric arithmetic operations
Jason Dere created HIVE-9690: Summary: Allow non-numeric arithmetic operations Key: HIVE-9690 URL: https://issues.apache.org/jira/browse/HIVE-9690 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Some refactoring for HIVE-5021. The current arithmetic UDFs are specialized for numeric types, and trying to change the logic in the existing UDFs looks a bit complicated. A less intrusive fix would be to create the date-time/interval arithmetic UDFs as a separate UDF class, and to make the plus/minus UDFs act as a wrapper which would invoke the numeric or interval arithmetic UDF depending on the args. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
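The wrapper idea described above can be sketched in a few lines of plain Java. Everything here is illustrative (the class, enum, and method names are not Hive's GenericUDF API): a single "plus" entry point inspects the argument category and delegates to either the numeric or the interval implementation.

```java
// Hypothetical sketch of the HIVE-9690 wrapper approach: one plus() entry
// point that dispatches to a numeric or date-time/interval delegate based
// on the argument types, leaving the numeric UDF logic untouched.
public class PlusDispatcher {
    enum Category { NUMERIC, INTERVAL }

    interface Arith { long eval(long a, long b); }

    // Stand-in for the existing numeric plus UDF.
    static final Arith NUMERIC_PLUS = (a, b) -> a + b;
    // Stand-in for a new interval plus UDF: here b is an interval in days
    // added to a millisecond timestamp.
    static final Arith INTERVAL_PLUS = (a, b) -> a + b * 86_400_000L;

    static long plus(Category cat, long a, long b) {
        // The wrapper picks the delegate once, based on argument category.
        Arith delegate = (cat == Category.NUMERIC) ? NUMERIC_PLUS : INTERVAL_PLUS;
        return delegate.eval(a, b);
    }
}
```

The point of the design is that the numeric path stays exactly as it was; only the thin dispatch layer is new.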
[jira] [Commented] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14320792#comment-14320792 ] Hive QA commented on HIVE-6617: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698792/HIVE-6617.15.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7549 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_select_charliteral {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2796/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2796/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2796/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698792 - PreCommit-HIVE-TRUNK-Build Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. 
Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Attachment: HIVE-6617.16.patch update more golden files Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch, HIVE-6617.16.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Open (was: Patch Available) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch, HIVE-6617.16.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6617) Reduce ambiguity in grammar
[ https://issues.apache.org/jira/browse/HIVE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-6617: -- Status: Patch Available (was: Open) Reduce ambiguity in grammar --- Key: HIVE-6617 URL: https://issues.apache.org/jira/browse/HIVE-6617 Project: Hive Issue Type: Task Reporter: Ashutosh Chauhan Assignee: Pengcheng Xiong Attachments: HIVE-6617.01.patch, HIVE-6617.02.patch, HIVE-6617.03.patch, HIVE-6617.04.patch, HIVE-6617.05.patch, HIVE-6617.06.patch, HIVE-6617.07.patch, HIVE-6617.08.patch, HIVE-6617.09.patch, HIVE-6617.10.patch, HIVE-6617.11.patch, HIVE-6617.12.patch, HIVE-6617.13.patch, HIVE-6617.14.patch, HIVE-6617.15.patch, HIVE-6617.16.patch CLEAR LIBRARY CACHE As of today, antlr reports 214 warnings. Need to bring down this number, ideally to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 31033: HIVE-9690 Refactoring for non-numeric arithmetic operations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31033/ --- Review request for hive. Bugs: HIVE-9690 https://issues.apache.org/jira/browse/HIVE-9690 Repository: hive-git Description --- Moves GenericUDFOPPlus/GenericUDFOPMinus to GenericUDFOPNumericPlus/GenericUDFOPNumericMinus and adds new GenericUDFOPPlus/GenericUDFOPMinus as wrapper UDFs. Keeps the vectorization annotations in GenericUDFOPPlus/GenericUDFOPMinus. Diffs - ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseArithmetic.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseBinary.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java 5c00d36 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseNumeric.java 1daf57e ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPMinus.java 7e225ff ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNumericMinus.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNumericPlus.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPPlus.java 2721e6b Diff: https://reviews.apache.org/r/31033/diff/ Testing --- Thanks, Jason Dere
[jira] [Commented] (HIVE-9619) Uninitialized read of numBitVectors in NumDistinctValueEstimator
[ https://issues.apache.org/jira/browse/HIVE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321057#comment-14321057 ] Gopal V commented on HIVE-9619: --- LGTM - +1. Left a minor comment on the RB; it is not related to these changes, just a note. Uninitialized read of numBitVectors in NumDistinctValueEstimator Key: HIVE-9619 URL: https://issues.apache.org/jira/browse/HIVE-9619 Project: Hive Issue Type: Bug Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9619.1.patch, HIVE-9619.2.patch {code} private int numBitVectors; // Refer to Flajolet-Martin'86 for the value of phi private final double phi = 0.77351; private int[] a; private int[] b; // Uninitialized read of numBitVectors private FastBitSet[] bitVector = new FastBitSet[numBitVectors]; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
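The bug class is worth a minimal demonstration: in Java, instance field initializers run before the constructor body, so an initializer that reads another field sees that field's default value (0 for int). The classes below mirror the snippet's names but are not the actual NumDistinctValueEstimator.

```java
// Illustration of the uninitialized-read pattern: the field initializer
// for bitVector runs before any constructor assigns numBitVectors, so the
// array is allocated with length 0 regardless of the constructor argument.
public class InitOrderDemo {
    static class Buggy {
        private int numBitVectors;
        // Runs before the constructor body: numBitVectors is still 0 here.
        private int[] bitVector = new int[numBitVectors];
        Buggy(int numBitVectors) { this.numBitVectors = numBitVectors; }
        int length() { return bitVector.length; }
    }

    static class Fixed {
        private final int numBitVectors;
        private final int[] bitVector;
        Fixed(int numBitVectors) {
            this.numBitVectors = numBitVectors;       // assign first...
            this.bitVector = new int[numBitVectors];  // ...then allocate
        }
        int length() { return bitVector.length; }
    }
}
```

Moving the allocation into the constructor (after the field assignment) is the straightforward fix.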
[jira] [Commented] (HIVE-9686) HiveMetastore.logAuditEvent can be used before sasl server is started
[ https://issues.apache.org/jira/browse/HIVE-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321055#comment-14321055 ] Hive QA commented on HIVE-9686: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12698802/HIVE-9686.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7535 tests executed *Failed tests:* {noformat} TestCliDriver-skewjoin_mapjoin11.q-udf_least.q-join4.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2798/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2798/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2798/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12698802 - PreCommit-HIVE-TRUNK-Build HiveMetastore.logAuditEvent can be used before sasl server is started - Key: HIVE-9686 URL: https://issues.apache.org/jira/browse/HIVE-9686 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9686.patch Metastore listeners can use logAudit before the sasl server is started resulting in an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9693) Introduce a stats cache for metastore
Vaibhav Gumashta created HIVE-9693: -- Summary: Introduce a stats cache for metastore Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Bug Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9693) Introduce a stats cache for metastore
[ https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9693: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-9452 Introduce a stats cache for metastore - Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9693) Introduce a stats cache for HBase metastore
[ https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9693: --- Summary: Introduce a stats cache for HBase metastore (was: Introduce a stats cache for metastore) Introduce a stats cache for HBase metastore --- Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]
[ https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9693: --- Summary: Introduce a stats cache for HBase metastore [hbase-metastore branch] (was: Introduce a stats cache for HBase metastore) Introduce a stats cache for HBase metastore [hbase-metastore branch] - Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321013#comment-14321013 ] Jimmy Xiang commented on HIVE-9659: --- I can reproduce this issue with a tiny data set. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log, along with a NullPointerException near the end of the log. 
(a) Detail error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler {noformat} (b) Detail error message for NullPointerException: {noformat} 5/02/12 01:29:50 ERROR MapJoinOperator: