[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8458: - Description: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE was: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE Potential null dereference in Utilities#clearWork() --- Key: HIVE-8458 URL: https://issues.apache.org/jira/browse/HIVE-8458 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Ted Yu Assignee: skrho Priority: Minor Attachments: HIVE-8458_001.patch {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509254#comment-14509254 ] Aihua Xu commented on HIVE-10454: - That condition doesn't check how many partitions will be involved. It's just reminding you that you need to provide predicates. We will always have such issue with nondeterministic UDF like unix_timstamp(), even with the query like: select * from t1 where t1.c2 = to_date(date_add(from_unixtime( unix_timestamp() ),1)); For predicate with nondeterministic UDF, the predicate won't be pushed down to TableScanOperator, but currently we only check if TableScanOperator has predicate. So we need not only check if TableScanOperator has predicates but also the child ops (e.g., FilterOperator) to determine if the table has predicate. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10462) CBO (Calcite Return Path): Exception thrown in conversion to MapJoin
[ https://issues.apache.org/jira/browse/HIVE-10462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509251#comment-14509251 ] Hive QA commented on HIVE-10462: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727612/HIVE-10462.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8728 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3544/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3544/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3544/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727612 - PreCommit-HIVE-TRUNK-Build CBO (Calcite Return Path): Exception thrown in conversion to MapJoin Key: HIVE-10462 URL: https://issues.apache.org/jira/browse/HIVE-10462 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10462.patch When the return path is on, the mapjoin conversion optimization fails as some DS in the Join descriptor have not been initialized properly. The failure can be reproduced with auto_join4.q. 
In particular, the following Exception is thrown: {noformat} org.apache.hadoop.hive.ql.parse.SemanticException: Generate Map Join Task Error: null at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinTaskDispatcher.processCurrentTask(CommonJoinTaskDispatcher.java:516) at org.apache.hadoop.hive.ql.optimizer.physical.AbstractJoinTaskDispatcher.dispatch(AbstractJoinTaskDispatcher.java:179) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180) at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125) at org.apache.hadoop.hive.ql.optimizer.physical.CommonJoinResolver.resolve(CommonJoinResolver.java:79) at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107) at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270) at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10084) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:203)
[jira] [Assigned] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-10463: -- Assignee: Jesus Camacho Rodriguez CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 When return path is on. To reproduce the Exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10463: --- Assignee: (was: Jesus Camacho Rodriguez) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Fix For: 1.2.0 When return path is on. To reproduce the Exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10463: --- Assignee: (was: Jesus Camacho Rodriguez) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Fix For: 1.2.0 When return path is on. To reproduce the Exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509420#comment-14509420 ] subhashmv commented on HIVE-9957: - Actually i installed hive 1.0 version and hadoop 2.5 version it is saying that it is incompatible with the software so please tell me how to apply the patch i can't understand in the link provided by Lefty Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Assignee: Sergio Peña Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9957.1.patch Getting this exception while accessing data through Hive. Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider; at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152) at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279) at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9917) After HIVE-3454 is done, make int to timestamp conversion configurable
[ https://issues.apache.org/jira/browse/HIVE-9917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509570#comment-14509570 ] Aihua Xu commented on HIVE-9917: Thanks [~jdere]. After HIVE-3454 is done, make int to timestamp conversion configurable -- Key: HIVE-9917 URL: https://issues.apache.org/jira/browse/HIVE-9917 Project: Hive Issue Type: Improvement Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 1.2.0 Attachments: HIVE-9917.patch After HIVE-3454 is fixed, we will have correct behavior of converting int to timestamp. While the customers are using such incorrect behavior for so long, better to make it configurable so that in one release, it will default to old/inconsistent way and the next release will default to new/consistent way. And then we will deprecate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509346#comment-14509346 ] Ashutosh Chauhan commented on HIVE-10413: - In place of findIn(), ExprNodeDescUtils::indexOf can be used. Other than look good. +1 Tested with given queries. [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.2.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HIVE-9451: --- Assignee: Owen O'Malley Add max size of column dictionaries to ORC metadata --- Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509385#comment-14509385 ] Hive QA commented on HIVE-5672: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727622/HIVE-5672.5.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8729 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml 
file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3545/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3545/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3545/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727622 - PreCommit-HIVE-TRUNK-Build Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great but non local directory don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7711) Error Serializing GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509422#comment-14509422 ] ankush commented on HIVE-7711: -- Could you please let me know how i find the kryo version ? how i find the kryo version that i using ? Please help Error Serializing GenericUDF Key: HIVE-7711 URL: https://issues.apache.org/jira/browse/HIVE-7711 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Dr. Christian Betz Attachments: HIVE-7711.1.patch.txt I get an exception running a job with a GenericUDF in HIVE 0.13.0 (which was ok in HIVE 0.12.0). The org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc is serialized using Kryo, trying to serialize stuff in my GenericUDF which is not serializable (doesn't implement Serializable). Switching to Kryo made the comment in ExprNodeGenericFuncDesc obsolte: /** * In case genericUDF is Serializable, we will serialize the object. * * In case genericUDF does not implement Serializable, Java will remember the * class of genericUDF and creates a new instance when deserialized. This is * exactly what we want. */ Find the stacktrace below, however, the description above should be clear. 
Exception in thread main org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: value (java.util.concurrent.atomic.AtomicReference) state (clojure.lang.Atom) state (udfs.ArraySum) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) mapWork (org.apache.hadoop.hive.ql.plan.MapredWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at
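A common way out of this class of failure is to keep non-serializable state out of the serialized plan. The sketch below is illustrative only (the class and field names are hypothetical; it is not the HIVE-7711 patch): the problematic state is marked transient so Kryo's FieldSerializer skips it, and it is rebuilt lazily after the plan is deserialized on the task side.

```java
import java.util.concurrent.atomic.AtomicReference;

public class UdfStateHolder {
    // Hypothetical UDF state holder: the wrapped state (whose contents fail
    // Kryo serialization in the report above) is excluded from plan
    // serialization via transient and re-created on first use.
    private transient AtomicReference<Long> state;

    AtomicReference<Long> state() {
        if (state == null) {
            state = new AtomicReference<>(0L); // rebuilt after deserialization
        }
        return state;
    }
}
```

Kryo's default FieldSerializer ignores transient fields, so the field is simply absent from the serialized plan rather than causing the UnsupportedOperationException above.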
[jira] [Commented] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509320#comment-14509320 ] Ted Yu commented on HIVE-8343: -- lgtm Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner Key: HIVE-8343 URL: https://issues.apache.org/jira/browse/HIVE-8343 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: JongWon Park Priority: Minor Attachments: HIVE-8343.patch In addEvent() and processVertex(), there are calls such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, the event has not been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
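The fix the report asks for can be sketched as follows (a hypothetical helper, not the actual DynamicPartitionPruner code): act on offer()'s boolean result instead of discarding it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventQueue {
    // offer() returns false when a bounded queue is full; ignoring that value
    // silently drops the event. Surface the result so the caller can react
    // (log, retry with a timeout, or fail the pruning step).
    static <T> boolean enqueue(BlockingQueue<T> queue, T event) {
        boolean queued = queue.offer(event);
        if (!queued) {
            System.err.println("event dropped: queue full");
        }
        return queued;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(1);
        System.out.println(enqueue(q, "a")); // fits in the queue
        System.out.println(enqueue(q, "b")); // queue full, not queued
    }
}
```

Note that for an unbounded LinkedBlockingQueue offer() can only fail at Integer.MAX_VALUE elements, which is why the unchecked call usually goes unnoticed; checking the result costs nothing and makes the bounded case safe.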
[jira] [Commented] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509487#comment-14509487 ] Gunther Hagleitner commented on HIVE-10456: --- This doesn't seem right - the clear method you use leaves a broken partition behind, right? You clean up some stuff but don't nuke the whole container. I think the logic should be: if it has any spilled partitions, throw away the whole container and make sure it's not in the cache (it shouldn't be). If there are no partitions spilled, leave it alone for reuse. Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509489#comment-14509489 ] ankush commented on HIVE-10464: --- thank you Lefty How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I can find the Kryo version that I am using? Please help me on this; we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10463) CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables
[ https://issues.apache.org/jira/browse/HIVE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10463: -- Assignee: Laljo John Pullokkaran CBO (Calcite Return Path): Insert overwrite... select * from... queries failing for bucketed tables --- Key: HIVE-10463 URL: https://issues.apache.org/jira/browse/HIVE-10463 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Laljo John Pullokkaran Fix For: 1.2.0 This occurs when return path is on. To reproduce the exception, take the following excerpt from auto_sortmerge_join_10.q: {noformat} set hive.enforce.bucketing = true; set hive.enforce.sorting = true; set hive.exec.reducers.max = 1; CREATE TABLE tbl1(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS; insert overwrite table tbl1 select * from src where key < 10; {noformat} It produces the following Exception: {noformat} java.lang.Exception: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:446) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:150) ... 
14 more Caused by: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col1] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:416) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:978) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:383) ... 22 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509317#comment-14509317 ] Ashutosh Chauhan commented on HIVE-10416: - Is this patch ready for commit or does it need more work? CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-10464. Resolution: Invalid How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I can find the Kryo version that I am using? Please help me on this; we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509545#comment-14509545 ] Aihua Xu commented on HIVE-10454: - In this query, I'm not filtering the rows but filtering the partitions, so we won't scan all the partitions. Strict mode, by definition, should allow such a query. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 > to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-5672: Attachment: HIVE-5672.5.patch Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509322#comment-14509322 ] Jesus Camacho Rodriguez commented on HIVE-10416: [~ashutoshc], not yet, I need to discuss his comment with John. CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 >= 0) and (cbo_t1.c_int > 0 or cbo_t1.c_float >= 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 >= 0) and (cbo_t2.c_int > 0 or cbo_t2.c_float >= 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q >= 0) and (b > 0 or c_int >= 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509403#comment-14509403 ] Xuefu Zhang commented on HIVE-10454: I think the point of strict mode is to prevent a full scan of all partitions of a table. In your case, while rows are filtered, the scanner will have to scan all partitions, which should be prevented by virtue of strict mode. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 > to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10302: --- Attachment: HIVE-10302.2-spark.patch Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.2-spark.patch, HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
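The idea in this improvement can be sketched with a per-JVM cache (the class and method names below are hypothetical, not the attached patch): the first map-join task to request a small table loads it, and later tasks in the same executor reuse the loaded copy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

public class SmallTableCache {
    // One cache per executor JVM, shared by all task threads.
    private static final ConcurrentMap<String, Object> CACHE = new ConcurrentHashMap<>();

    // computeIfAbsent guarantees the loader runs at most once per key, even
    // when several map-join tasks ask for the same small table concurrently,
    // so each executor pays the load cost a single time.
    static Object getOrLoad(String tableKey, Supplier<Object> loader) {
        return CACHE.computeIfAbsent(tableKey, k -> loader.get());
    }
}
```

A real implementation also needs an eviction story (e.g. drop entries when the query finishes), which the sketch omits.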
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Affects Version/s: (was: 1.1.0) 1.2.0 Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509480#comment-14509480 ] Lefty Leverenz commented on HIVE-10464: --- You can ask this question on the u...@hive.apache.org mailing list. * [User Mailing List | http://hive.apache.org/mailing_lists.html] How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I can find the Kryo version that I am using? Please help me on this; we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10391) CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column
[ https://issues.apache.org/jira/browse/HIVE-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509578#comment-14509578 ] Lefty Leverenz commented on HIVE-10391: --- The commit gives the umbrella jira number (HIVE-9132) instead of this one (HIVE-10391), although the summary text is correct. It's commit 5a576b6fbf1680ab4dd8f275cad484a2614ef2c1. CBO (Calcite Return Path): HiveOpConverter always assumes that HiveFilter does not include a partition column - Key: HIVE-10391 URL: https://issues.apache.org/jira/browse/HIVE-10391 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Pengcheng Xiong Assignee: Laljo John Pullokkaran Fix For: 1.2.0 Attachments: HIVE-10391.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10466) LLAP: fix container sizing configuration for memory
[ https://issues.apache.org/jira/browse/HIVE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10466: Description: We cannot use full machine for LLAP due to config for cache and executors being split brain... please refer to [~gopalv] for details was: This is [~sershe] impersonating :) We cannot use full machine for LLAP due to config for cache and executors being split brain... please refer to [~gopalv] for details LLAP: fix container sizing configuration for memory --- Key: HIVE-10466 URL: https://issues.apache.org/jira/browse/HIVE-10466 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Vikram Dixit K We cannot use full machine for LLAP due to config for cache and executors being split brain... please refer to [~gopalv] for details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10370) Hive does not compile with -Phadoop-1 option
[ https://issues.apache.org/jira/browse/HIVE-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509772#comment-14509772 ] Vaibhav Gumashta commented on HIVE-10370: - +1 Hive does not compile with -Phadoop-1 option Key: HIVE-10370 URL: https://issues.apache.org/jira/browse/HIVE-10370 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Prasanth Jayachandran Priority: Critical Attachments: HIVE-10370.1.patch Running into the below error while running mvn clean install -Pdist -Phadoop-1 {code} [ERROR]hive/serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazySimpleFast.java:[164,33] cannot find symbol symbol: method copyBytes() location: variable serialized of type org.apache.hadoop.io.Text {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509771#comment-14509771 ] Prasanth Jayachandran commented on HIVE-10443: -- LGTM, +1. HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta Fix For: 1.2.0 Attachments: HIVE-10443.1.patch JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10447) Beeline JDBC Driver to support 2 way SSL
[ https://issues.apache.org/jira/browse/HIVE-10447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-10447: - Attachment: HIVE-10447.1.patch cc-ing [~thejas] , [~vgumashta] for review. Beeline JDBC Driver to support 2 way SSL Key: HIVE-10447 URL: https://issues.apache.org/jira/browse/HIVE-10447 Project: Hive Issue Type: Bug Components: Beeline Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10447.1.patch This jira should cover 2-way SSL authentication between the JDBC Client and server which requires the driver to support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10465) whitelist restrictions don't get initialized in new copy of HiveConf
[ https://issues.apache.org/jira/browse/HIVE-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10465: - Summary: whitelist restrictions don't get initialized in new copy of HiveConf (was: whitelist restrictions don't get initialized in initial part of session) whitelist restrictions don't get initialized in new copy of HiveConf Key: HIVE-10465 URL: https://issues.apache.org/jira/browse/HIVE-10465 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Whitelist restrictions use a regex pattern in HiveConf, but when a new HiveConf object copy is created, the regex pattern is not initialized in the new HiveConf copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
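The underlying pitfall can be sketched generically (a hypothetical class, not the actual HiveConf code): a copy constructor clones the raw setting but must also re-derive any compiled state from it, which is the step the issue says is missing.

```java
import java.util.regex.Pattern;

public class RestrictedConf {
    final String whitelistRegex;    // raw setting, copied verbatim
    final Pattern whitelistPattern; // derived state that must be re-initialized

    RestrictedConf(String whitelistRegex) {
        this.whitelistRegex = whitelistRegex;
        this.whitelistPattern = Pattern.compile(whitelistRegex);
    }

    // Copy constructor: the fix is to re-derive the compiled pattern in the
    // copy too; copying only the raw string leaves the copy unrestricted.
    RestrictedConf(RestrictedConf other) {
        this.whitelistRegex = other.whitelistRegex;
        this.whitelistPattern = Pattern.compile(other.whitelistRegex);
    }

    boolean isAllowed(String param) {
        return whitelistPattern.matcher(param).matches();
    }
}
```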
[jira] [Commented] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509719#comment-14509719 ] Gunther Hagleitner commented on HIVE-10456: --- Summary of offline discussion w/ [~prasanth_j]: - Probably best to check if the hash table is in the registry (on abort). If it is: ownership is shared, no need to clean up. If it isn't: MapJoinOp owns the table container and needs to clean up (+ free mem). Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
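The ownership rule from the discussion above can be sketched as follows (hypothetical names, not the Hive operator code): on abort, only a container that is absent from the shared registry belongs solely to the operator and should be cleared.

```java
import java.util.Set;

public class AbortPolicy {
    // Shared ownership: a container found in the registry may be reused by
    // other tasks, so abort must leave it alone. An unregistered container is
    // owned solely by this operator and must be cleared to free its memory.
    static boolean shouldClearOnAbort(Object container, Set<Object> registry) {
        return !registry.contains(container);
    }
}
```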
[jira] [Commented] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509721#comment-14509721 ] Hive QA commented on HIVE-10434: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727680/HIVE-10434.4-spark.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 8721 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more 
- did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/834/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/834/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-834/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727680 - PreCommit-HIVE-SPARK-Build Cancel connection when remote Spark driver process has failed [Spark Branch] - Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-10434.1-spark.patch, HIVE-10434.3-spark.patch, HIVE-10434.4-spark.patch Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for the SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
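The proposed behavior for HIVE-10434 can be sketched like this (hypothetical names, not the SparkClientImpl code): when the remote driver process is observed to exit with a non-zero code before it has connected back, fail the connection promise immediately instead of letting the full connection timeout elapse.

```java
import java.util.concurrent.CompletableFuture;

public class DriverMonitor {
    // Called by a watcher thread when the child driver process exits.
    // A non-zero exit before the connection promise completes means the
    // driver can never connect back, so fail the promise right away and
    // let the caller cancel its timeout task.
    static void onDriverExit(int exitCode, CompletableFuture<Void> connected) {
        if (exitCode != 0 && !connected.isDone()) {
            connected.completeExceptionally(
                new RuntimeException("driver exited with code " + exitCode));
        }
    }
}
```

A clean (code 0) exit is left to the normal timeout path, since the process may have handed off successfully before exiting.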
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Description: This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. was: This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors. Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2resultSetcompressor.zip This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. 
An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: hs2driver-master.zip Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2driver-master.zip, hs2resultSetcompressor.zip This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10438) Architecture for ResultSet Compression via external plugin
[ https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-10438: -- Attachment: readme.txt Architecture for ResultSet Compression via external plugin --- Key: HIVE-10438 URL: https://issues.apache.org/jira/browse/HIVE-10438 Project: Hive Issue Type: New Feature Components: Hive, Thrift API Affects Versions: 1.2.0 Reporter: Rohit Dholakia Assignee: Rohit Dholakia Labels: patch Attachments: CompressorProtocolHS2.patch, Proposal-rscompressor.pdf, TestingIntegerCompression.pdf, hs2driver-master.zip, hs2resultSetcompressor.zip, readme.txt This JIRA proposes an architecture for enabling ResultSet compression which uses an external plugin. The patch has three aspects to it: 0. An architecture for enabling ResultSet compression with external plugins 1. An example plugin to demonstrate end-to-end functionality 2. A container to allow everyone to write and test ResultSet compressors with a query submitter (https://github.com/xiaom/hs2driver) Also attaching a design document explaining the changes, experimental results document, and a pdf explaining how to setup the docker container to observe end-to-end functionality of ResultSet compression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-10443: Attachment: HIVE-10443.1.patch HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta Attachments: HIVE-10443.1.patch JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
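The bug above is about referencing a hadoop-2-only class in a way that breaks compilation on hadoop-1. A minimal sketch of the reflection pattern the fix needs — load the class by name so it never appears at compile time, and degrade gracefully when it is absent — could look like the following (class and method names here are assumptions for illustration, not the actual Hive patch):

```java
import java.lang.reflect.Method;

public class PauseMonitorStarter {
    // Try to start the hadoop-2-only JvmPauseMonitor purely via reflection.
    // Returns true if the monitor was started, false on hadoop-1 where the
    // class (or a matching constructor) does not exist.
    static boolean startPauseMonitor(Object conf) {
        try {
            Class<?> clazz = Class.forName("org.apache.hadoop.util.JvmPauseMonitor");
            Object monitor = clazz.getConstructor(conf.getClass()).newInstance(conf);
            Method start = clazz.getMethod("start");
            start.invoke(monitor);
            return true;
        } catch (ReflectiveOperationException e) {
            // hadoop-1 path: skip pause monitoring instead of failing
            return false;
        }
    }
}
```

Because every reference to the class goes through `Class.forName`, the code compiles against both hadoop lines; only the behavior differs at runtime.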
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-10443: Fix Version/s: 1.2.0 HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta Fix For: 1.2.0 Attachments: HIVE-10443.1.patch JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10227) Concrete implementation of Export/Import based ReplicationTaskFactory
[ https://issues.apache.org/jira/browse/HIVE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509627#comment-14509627 ] Lefty Leverenz commented on HIVE-10227: --- Doc note: HIVE-10264 will document everything related to replication, including the configuration parameter added here (*hive.repl.task.factory*). Concrete implementation of Export/Import based ReplicationTaskFactory - Key: HIVE-10227 URL: https://issues.apache.org/jira/browse/HIVE-10227 Project: Hive Issue Type: Sub-task Components: Import/Export Affects Versions: 1.2.0 Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Fix For: 1.2.0 Attachments: HIVE-10227.2.patch, HIVE-10227.3.patch, HIVE-10227.4.patch, HIVE-10227.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-10443: Attachment: (was: HIVE-10443.1.patch) HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-10443: Attachment: HIVE-10443.1.patch HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
[ https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar resolved HIVE-10423. - Resolution: Fixed HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar Attachments: HIVE-10423.patch HIVE-7948 added a step to download a ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because the target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10434: Attachment: HIVE-10434.4-spark.patch Addressing RB comments #2. Cancel connection when remote Spark driver process has failed [Spark Branch] - Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-10434.1-spark.patch, HIVE-10434.3-spark.patch, HIVE-10434.4-spark.patch Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for the SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
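The fix described above amounts to a process-exit hook: when the driver dies, cancel the pending timeout task and fail the connection promise immediately. A minimal sketch of that idea, with stand-in types (the real SparkClientImpl uses its own RPC promise class, not `CompletableFuture`), might look like:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledFuture;

public class DriverWatchdog {
    // Called when the child driver process exits. On a nonzero exit code,
    // stop the long connection timeout from running out and fail the
    // connection promise right away.
    static void onDriverExit(int exitCode,
                             ScheduledFuture<?> timeoutTask,
                             CompletableFuture<Void> connectPromise) {
        if (exitCode != 0) {
            timeoutTask.cancel(false);  // don't make the user wait out the timeout
            connectPromise.completeExceptionally(
                new IllegalStateException("driver exited with code " + exitCode));
        }
    }
}
```

Anything waiting on the promise then sees the failure as soon as the process dies, rather than after one (or two) full timeout periods.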
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509619#comment-14509619 ] Hive QA commented on HIVE-10302: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727654/HIVE-10302.2-spark.patch {color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 8721 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more 
- did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2 org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/833/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/833/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-833/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 21 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12727654 - PreCommit-HIVE-SPARK-Build Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.2-spark.patch, HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
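The improvement described in HIVE-10302 — load each small table once per executor and share it among map-join tasks — is essentially a JVM-wide cache keyed by the small-table path. A hedged sketch of that idea (illustrative names only, not Hive's actual implementation):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallTableCache {
    // One static map per JVM, i.e. per Spark executor.
    private static final ConcurrentMap<String, Object> CACHE = new ConcurrentHashMap<>();
    static final AtomicInteger loadCount = new AtomicInteger();

    static Object getOrLoad(String path) {
        // computeIfAbsent runs the loader at most once per key, even when
        // several map-join tasks ask for the same small table concurrently.
        return CACHE.computeIfAbsent(path, p -> {
            loadCount.incrementAndGet();       // stands in for the expensive disk load
            return "hash-table-for-" + p;      // stands in for the built hash table
        });
    }
}
```

The first task pays the load cost; later tasks in the same executor get the already-built hash table back.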
[jira] [Updated] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10239: - Attachment: HIVE-10239.03.patch The Oracle installation continues to fail. apt-get update fails to download the oracle binaries. I added some debug to the script to list the content of /var/cache/apt/archives/ and it appears to be empty. + /bin/true + ls -al /var/cache/apt/archives + apt-get install -y --force-yes oracle-xe:i386 For some reason it is unable to download even the 32-bit binaries. For now, I am isolating the changes for postgres + derby from the oracle changes. I will continue to investigate the oracle script. I will file a new jira for this. Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.03.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Attachment: HIVE-10456.2.patch Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch, HIVE-10456.2.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
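The intended closeOp behavior from the description above can be sketched as follows (illustrative names, not Hive's real operator code): on a normal close, spilled partitions are reloaded to finish the join; on abort, they are simply discarded.

```java
import java.util.ArrayList;
import java.util.List;

public class GraceHashJoinCloser {
    final List<String> spilledPartitions = new ArrayList<>();
    final List<String> joinedPartitions = new ArrayList<>();

    void closeOp(boolean abort) {
        if (abort) {
            spilledPartitions.clear();           // clean up spilled data, skip the join
            return;
        }
        joinedPartitions.addAll(spilledPartitions); // load each spill, complete the join
        spilledPartitions.clear();
    }
}
```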
[jira] [Commented] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
[ https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509857#comment-14509857 ] Aswathy Chellammal Sreekumar commented on HIVE-10423: - No, it is not committed yet. HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar Attachments: HIVE-10423.patch HIVE-7948 added a step to download a ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because the target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Attachment: (was: HIVE-10456.2.patch) Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10384) RetryingMetaStoreClient does not retry wrapped TTransportExceptions
[ https://issues.apache.org/jira/browse/HIVE-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509918#comment-14509918 ] Chaoyu Tang commented on HIVE-10384: [~szehon] I think currently the other check only needs the TTransportException since it is the only TException thrown from HiveMetaStoreClient.open during reconnect, no others. Thanks RetryingMetaStoreClient does not retry wrapped TTransportExceptions --- Key: HIVE-10384 URL: https://issues.apache.org/jira/browse/HIVE-10384 Project: Hive Issue Type: Bug Components: Clients Reporter: Eric Liang Assignee: Chaoyu Tang Attachments: HIVE-10384.1.patch, HIVE-10384.patch This bug is very similar to HIVE-9436, in that a TTransportException wrapped in a MetaException will not be retried. RetryingMetaStoreClient has a block of code above the MetaException handler that retries thrift exceptions, but this doesn't work when the exception is wrapped.
{code}
if ((e.getCause() instanceof TApplicationException) ||
    (e.getCause() instanceof TProtocolException) ||
    (e.getCause() instanceof TTransportException)) {
  caughtException = (TException) e.getCause();
} else if ((e.getCause() instanceof MetaException) &&
    e.getCause().getMessage().matches("(?s).*JDO[a-zA-Z]*Exception.*")) {
  caughtException = (MetaException) e.getCause();
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
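The quoted snippet only inspects the immediate cause, which is why a TTransportException buried inside a MetaException slips through. One way to handle the wrapped case — sketched here with stand-in exception classes, not the real Thrift/Hive types — is to walk the whole cause chain:

```java
public class RetriableCheck {
    // Stand-ins for org.apache.thrift.transport.TTransportException and
    // Hive's MetaException, used purely for illustration.
    static class TTransportException extends Exception {}
    static class MetaException extends Exception {
        MetaException(Throwable cause) { super(cause); }
    }

    // True if a retriable transport exception appears anywhere in the chain.
    static boolean isRetriable(Throwable top) {
        for (Throwable t = top; t != null; t = t.getCause()) {
            if (t instanceof TTransportException) {
                return true;
            }
        }
        return false;
    }
}
```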
[jira] [Updated] (HIVE-10419) can't do query on partitioned view with analytic function in strictmode
[ https://issues.apache.org/jira/browse/HIVE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10419: Attachment: (was: HIVE-10419.patch) can't do query on partitioned view with analytic function in strictmode --- Key: HIVE-10419 URL: https://issues.apache.org/jira/browse/HIVE-10419 Project: Hive Issue Type: Bug Components: Hive, Views Affects Versions: 0.13.0, 0.14.0, 1.0.0 Environment: Cloudera 5.3.x. Reporter: Hector Lagos Hey Guys, I created the following table: CREATE TABLE t1 (id int, key string, value string) partitioned by (dt int); And after that i created a view on that table as follow: create view v1 PARTITIONED ON (dt) as SELECT * FROM ( SELECT row_number() over (partition by key order by value asc) as row_n, * FROM t1 ) t WHERE row_n = 1; We are working with hive.mapred.mode=strict and when I try to do the query select * from v1 where dt = 2 , I'm getting the following error: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias v1:t:t1 Table t1 Is this a bug or a limitation of Hive when you use analytic functions in partitioned views? If i remove the row_number function it works without problems. Thanks in advance, any help will be appreciated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10454: Attachment: HIVE-10454.patch Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-10454.patch The following queries fail:
{noformat}
create table t1 (c1 int) PARTITIONED BY (c2 string);
set hive.mapred.mode=strict;
select * from t1 where t1.c2 <= to_date(date_add(from_unixtime( unix_timestamp() ),1));
{noformat}
The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
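As discussed in the comments on this issue, a predicate containing a nondeterministic UDF is not pushed down to the TableScanOperator, so the strict-mode check must also look at the scan's children (e.g., a FilterOperator). A hedged sketch of that fix direction, using an illustrative operator model rather than Hive's real classes:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPredicateCheck {
    // Minimal stand-in for an operator-tree node.
    static class Operator {
        String predicate;                          // null when none attached
        final List<Operator> children = new ArrayList<>();
    }

    // True if the scan itself or any operator below it carries a predicate.
    static boolean hasPredicate(Operator op) {
        if (op.predicate != null) {
            return true;
        }
        for (Operator child : op.children) {       // e.g. a FilterOperator under the scan
            if (hasPredicate(child)) {
                return true;
            }
        }
        return false;
    }
}
```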
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10456: - Attachment: HIVE-10456.2.patch Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch, HIVE-10456.1.patch, HIVE-10456.2.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10413) [CBO] Return path assumes distinct column cant be same as grouping column
[ https://issues.apache.org/jira/browse/HIVE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-10413: -- Attachment: HIVE-10413.3.patch [CBO] Return path assumes distinct column cant be same as grouping column - Key: HIVE-10413 URL: https://issues.apache.org/jira/browse/HIVE-10413 Project: Hive Issue Type: Sub-task Affects Versions: 1.2.0 Reporter: Ashutosh Chauhan Assignee: Laljo John Pullokkaran Attachments: HIVE-10413.1.patch, HIVE-10413.2.patch, HIVE-10413.3.patch, HIVE-10413.patch Found in cbo_udf_udaf.q tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8459) DbLockManager locking table in addition to partitions
[ https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510001#comment-14510001 ] Alan Gates commented on HIVE-8459: -- Changed this to minor as an extra shared lock on the table makes no semantic difference. The lock on the partition would block any xlocks on the table anyway, and a read lock doesn't block other read locks or semi-shared locks. DbLockManager locking table in addition to partitions - Key: HIVE-8459 URL: https://issues.apache.org/jira/browse/HIVE-8459 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Queries and operations on partitioned tables are generating locks on the whole table when they should only be locking the partition. For example:
{code}
select count(*) from concur_orc_tab_part where ds = 'today';
{code}
This should only be locking the partition ds='today'. But instead:
{code}
mysql> select * from HIVE_LOCKS;
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB   | HL_TABLE            | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST            |
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
| 428            | 1              | 0        | default | concur_orc_tab_part | NULL         | a             | r            | 1413311172000     | 1413311171000  | hive    | node-1.example.com |
| 428            | 2              | 0        | default | concur_orc_tab_part | ds=today     | a             | r            | 1413311172000     | 1413311171000  | hive    | node-1.example.com |
+----------------+----------------+----------+---------+---------------------+--------------+---------------+--------------+-------------------+----------------+---------+--------------------+
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
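The compatibility argument in the comment above — the extra table-level read lock is harmless because read locks coexist with other read and semi-shared locks — can be illustrated with a simplified compatibility matrix (this is a hedged sketch, not the DbLockManager implementation):

```java
public class LockCompat {
    enum LockType { SHARED_READ, SEMI_SHARED, EXCLUSIVE }

    static boolean compatible(LockType held, LockType requested) {
        if (held == LockType.EXCLUSIVE || requested == LockType.EXCLUSIVE) {
            return false;                // exclusive conflicts with everything
        }
        // two semi-shared locks conflict; any mix involving a read lock is fine
        return !(held == LockType.SEMI_SHARED && requested == LockType.SEMI_SHARED);
    }
}
```

Under this matrix, the spurious table-level `r` lock blocks nothing that the partition-level lock would not already block.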
[jira] [Updated] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests
[ https://issues.apache.org/jira/browse/HIVE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10467: --- Attachment: HIVE-10467.1.patch Switch to GIT repository on Jenkins precommit tests Key: HIVE-10467 URL: https://issues.apache.org/jira/browse/HIVE-10467 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10467.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests
[ https://issues.apache.org/jira/browse/HIVE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509785#comment-14509785 ] Sergio Peña commented on HIVE-10467: [~szehon] Could you review this small fix? Switch to GIT repository on Jenkins precommit tests Key: HIVE-10467 URL: https://issues.apache.org/jira/browse/HIVE-10467 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10467.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10434: --- Attachment: HIVE-10434.4-spark.patch Cancel connection when remote Spark driver process has failed [Spark Branch] - Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-10434.1-spark.patch, HIVE-10434.3-spark.patch, HIVE-10434.4-spark.patch, HIVE-10434.4-spark.patch Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for the SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests
[ https://issues.apache.org/jira/browse/HIVE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509796#comment-14509796 ] Szehon Ho commented on HIVE-10467: -- +1 thanks for taking care of this Switch to GIT repository on Jenkins precommit tests Key: HIVE-10467 URL: https://issues.apache.org/jira/browse/HIVE-10467 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10467.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10217) LLAP: Support caching of uncompressed ORC data
[ https://issues.apache.org/jira/browse/HIVE-10217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10217: Attachment: HIVE-10127.patch LLAP: Support caching of uncompressed ORC data -- Key: HIVE-10217 URL: https://issues.apache.org/jira/browse/HIVE-10217 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10127.patch {code} Caused by: java.io.IOException: ORC compression buffer size (0) is smaller than LLAP low-level cache minimum allocation size (131072). Decrease the value for hive.llap.io.cache.orc.alloc.min at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:137) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation
[ https://issues.apache.org/jira/browse/HIVE-10426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-10426: Attachment: 10246.out The test failures are due to pre-commit tests trying to still work against svn instead of git. This patch affects only a local part of hcat, so I've attached a test output for the same which shows compilation succeeding across the board, and all affected tests succeeding. Rework/simplify ReplicationTaskFactory instantiation Key: HIVE-10426 URL: https://issues.apache.org/jira/browse/HIVE-10426 Project: Hive Issue Type: Sub-task Components: Import/Export Reporter: Sushanth Sowmyan Assignee: Sushanth Sowmyan Attachments: 10246.out, HIVE-10426.patch Creating a new jira to continue discussions from HIVE-10227 as to what ReplicationTask.Factory instantiation should look like. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-9152: --- Attachment: HIVE-9152.3-spark.patch Restarted working on this JIRA. Rebased and regenerated the old patch. Dynamic Partition Pruning [Spark Branch] Key: HIVE-9152 URL: https://issues.apache.org/jira/browse/HIVE-9152 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Sun Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, HIVE-9152.3-spark.patch Tez implemented dynamic partition pruning in HIVE-7826. This is a nice optimization and we should implement the same in HOS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
[ https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509861#comment-14509861 ] Aswathy Chellammal Sreekumar commented on HIVE-10423: - I think I marked it resolved mistakenly. HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar Attachments: HIVE-10423.patch HIVE-7948 added a step to download a ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because the target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509927#comment-14509927 ] Sushanth Sowmyan commented on HIVE-5672: Agree with Xuefu - the grammar might be simplified by simply making KW_LOCAL optional, since there is no other place in hive code that seems to be making use of TOK_LOCAL_DIR. To wit, we could have:
{code}
KW_LOCAL? KW_DIRECTORY StringLiteral tableRowFormat? tableFileFormat? -> ^(TOK_DIR StringLiteral tableRowFormat? tableFileFormat?)
{code}
This does mean that if at some point, we do still want to have a differentiation between local and non-local writes, we have to go back to Nemon's approach, and his approach is definitely the least-damage-done approach of not trying to remove something that already exists, so his patch makes sense from that point-of-view. We have two approaches here, and I'm +1 on both: a) Nemon's approach b) Xuefu's suggestion: to make KW_LOCAL optional, and then emitting a TOK_DIR instead of a TOK_LOCAL_DIR for that line, and removing any other occurrences of TOK_LOCAL_DIR. We might eventually add it back because we want it, but it's duplicate code pruning in the meanwhile.
Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great but non local directory don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509960#comment-14509960 ] Prasanth Jayachandran commented on HIVE-10456: -- [~hagleitn] Can you take a look again? Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch, HIVE-10456.1.patch, HIVE-10456.2.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8521) Document the ORC format
[ https://issues.apache.org/jira/browse/HIVE-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HIVE-8521. - Resolution: Fixed This was added to the wiki. Document the ORC format --- Key: HIVE-8521 URL: https://issues.apache.org/jira/browse/HIVE-8521 Project: Hive Issue Type: Bug Components: Documentation, File Formats Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: orc-spec.pdf It is past time that we document the ORC file format. I've started and should have a first pass this week. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10423) HIVE-7948 breaks deploy_e2e_artifacts.sh
[ https://issues.apache.org/jira/browse/HIVE-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509816#comment-14509816 ] Eugene Koifman commented on HIVE-10423: --- [~asreekumar], did someone commit this patch? HIVE-7948 breaks deploy_e2e_artifacts.sh Key: HIVE-10423 URL: https://issues.apache.org/jira/browse/HIVE-10423 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Aswathy Chellammal Sreekumar Attachments: HIVE-10423.patch HIVE-7948 added a step to download a ml-1m.zip file and unzip it. This only works if you call deploy_e2e_artifacts.sh once. If you call it again (which is very common in dev) it blocks and asks for additional input from the user because target files already exist. This needs to be changed similarly to what we discussed for HIVE-9272, i.e. place artifacts not under source control in testdist/. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests
[ https://issues.apache.org/jira/browse/HIVE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509867#comment-14509867 ] Sergey Shelukhin commented on HIVE-10467: - Thanks! Switch to GIT repository on Jenkins precommit tests Key: HIVE-10467 URL: https://issues.apache.org/jira/browse/HIVE-10467 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Sergio Peña Assignee: Sergio Peña Fix For: 1.2.0 Attachments: HIVE-10467.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10456) Grace Hash Join should not load spilled partitions on abort
[ https://issues.apache.org/jira/browse/HIVE-10456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10456: --- Attachment: HIVE-10456.1.patch Grace Hash Join should not load spilled partitions on abort --- Key: HIVE-10456 URL: https://issues.apache.org/jira/browse/HIVE-10456 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10456.1.patch, HIVE-10456.1.patch Grace Hash Join loads the spilled partitions to complete the join in closeOp(). This should not happen when closeOp with abort is invoked. Instead it should clean up all the spilled data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10470) LLAP: NPE in IO when returning 0 rows with no projection
[ https://issues.apache.org/jira/browse/HIVE-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10470: Description: Looks like a trivial fix, unless I'm missing something. I may do it later if you don't ;) {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.EncodedTreeReaderFactory.createEncodedTreeReader(EncodedTreeReaderFactory.java:1764) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:92) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:39) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:116) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:36) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:329) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:299) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:55) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {noformat} was:Looks like a trivial fix, unless I'm missing something. I may do it later if you don't ;) LLAP: NPE in IO when returning 0 rows with no projection Key: HIVE-10470 URL: https://issues.apache.org/jira/browse/HIVE-10470 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Prasanth Jayachandran Looks like a trivial fix, unless I'm missing something. 
I may do it later if you don't ;) {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.orc.EncodedTreeReaderFactory.createEncodedTreeReader(EncodedTreeReaderFactory.java:1764) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:92) at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:39) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:116) at org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:36) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:329) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:299) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:55) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9451) Add max size of column dictionaries to ORC metadata
[ https://issues.apache.org/jira/browse/HIVE-9451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-9451: Attachment: HIVE-9451.patch This patch adds the maxDictionarySize and configured stripe size to the metadata of ORC files. I'll need to update the expected results for the qfiles that depend on the size of orc files. Add max size of column dictionaries to ORC metadata --- Key: HIVE-9451 URL: https://issues.apache.org/jira/browse/HIVE-9451 Project: Hive Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 1.2.0 Attachments: HIVE-9451.patch To predict the amount of memory required to read an ORC file we need to know the size of the dictionaries for the columns that we are reading. I propose adding the number of bytes for each column's dictionary to the stripe's column statistics. The file's column statistics would have the maximum dictionary size for each column. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10466) LLAP: fix container sizing configuration for memory
[ https://issues.apache.org/jira/browse/HIVE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509794#comment-14509794 ] Gopal V commented on HIVE-10466: The current memory script does this - it uses NODE_MEM/2 as the per-instance-executor memory and uses 1Gb by default as the cache. Then it goes through a bunch of complex heuristics to produce a complete configuration listing which contains the YARN container.size, the Xmx and the total memory allocated to executors. https://github.com/apache/hive/blob/llap/llap-server/src/main/resources/package.py#L14 This produces a workable configuration, but it misses the total capacity of the node by a significant margin (will be >60%, so no double allocs on a single node, but will be <100%). Even in that script, the yarn min-alloc is missing from the computation, so as we edge closer to the line, the harder it gets to configure this correctly. After that, there's the whole YARN reserved memory fraction to deal with in this, so that we can avoid taking up memory in YARN that LLAP can't use. LLAP: fix container sizing configuration for memory --- Key: HIVE-10466 URL: https://issues.apache.org/jira/browse/HIVE-10466 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Vikram Dixit K We cannot use full machine for LLAP due to config for cache and executors being split brain... please refer to [~gopalv] for details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
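The sizing arithmetic described above can be sketched roughly as follows; the constants (node memory, the 1Gb cache, the yarn min-alloc value) and the rounding rule are illustrative assumptions for this sketch, not the actual logic in package.py:

```java
// Sketch of the container-sizing arithmetic discussed above.
// All constants here are illustrative assumptions, not Hive's defaults.
public class LlapSizingSketch {
    /** Round a request up to a multiple of yarn.scheduler.minimum-allocation-mb. */
    static long roundToMinAlloc(long requestMb, long minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    /** Split node memory: half for the daemon, 1Gb of that reserved as cache. */
    static long[] sizeDaemon(long nodeMemMb, long minAllocMb) {
        long containerMb = roundToMinAlloc(nodeMemMb / 2, minAllocMb);
        long cacheMb = 1024;                 // assumed per-instance cache
        long xmxMb = containerMb - cacheMb;  // heap left for executors
        return new long[] { containerMb, xmxMb, cacheMb };
    }

    public static void main(String[] args) {
        long[] s = sizeDaemon(65536, 3072);  // 64Gb node, 3Gb min-alloc
        System.out.println("container=" + s[0] + " xmx=" + s[1] + " cache=" + s[2]);
    }
}
```

The point of including the min-alloc rounding is the one made in the comment: without it, the requested size and the size YARN actually grants drift apart as the request approaches node capacity.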
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509790#comment-14509790 ] Thejas M Nair commented on HIVE-9957: - [~subhashmv] What error do you get with 1.0 and hadoop 2.5? I wasn't aware of such an issue. To apply the patch, you need to checkout the code, apply the patch and build a new hive package (tar.gz). Additional build instructions are here - https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Assignee: Sergio Peña Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9957.1.patch Getting this exception while accessing data through Hive. Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider; at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152) at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279) at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10384) RetryingMetaStoreClient does not retry wrapped TTransportExceptions
[ https://issues.apache.org/jira/browse/HIVE-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509900#comment-14509900 ] Szehon Ho commented on HIVE-10384: -- Hey Chaoyu, new patch looks good, I'm just wondering does the other check need all that TApplication|TProtocol|TTransport check? RetryingMetaStoreClient does not retry wrapped TTransportExceptions --- Key: HIVE-10384 URL: https://issues.apache.org/jira/browse/HIVE-10384 Project: Hive Issue Type: Bug Components: Clients Reporter: Eric Liang Assignee: Chaoyu Tang Attachments: HIVE-10384.1.patch, HIVE-10384.patch This bug is very similar to HIVE-9436, in that a TTransportException wrapped in a MetaException will not be retried. RetryingMetaStoreClient has a block of code above the MetaException handler that retries thrift exceptions, but this doesn't work when the exception is wrapped. {code} if ((e.getCause() instanceof TApplicationException) || (e.getCause() instanceof TProtocolException) || (e.getCause() instanceof TTransportException)) { caughtException = (TException) e.getCause(); } else if ((e.getCause() instanceof MetaException) && e.getCause().getMessage().matches("(?s).*JDO[a-zA-Z]*Exception.*")) { caughtException = (MetaException) e.getCause(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
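The fix being discussed amounts to walking the full cause chain rather than inspecting only the immediate `e.getCause()`. A minimal sketch of that idea, with `java.io.IOException` standing in for the Thrift exception types (the real client checks TApplicationException/TProtocolException/TTransportException instead):

```java
// Sketch of cause-chain unwrapping for retry decisions, as discussed above.
// java.io.IOException is a stand-in for the Thrift exception types.
import java.io.IOException;

public class RetryCheckSketch {
    /** Walk the whole cause chain looking for a retryable exception type. */
    static boolean isRetryable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof IOException) {  // stand-in for TTransportException
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A transport failure wrapped twice, as a MetaException might wrap it.
        Throwable wrapped = new RuntimeException(
            new RuntimeException(new IOException("broken pipe")));
        System.out.println(isRetryable(wrapped));                    // true
        System.out.println(isRetryable(new RuntimeException("x")));  // false
    }
}
```

With a single-level `e.getCause() instanceof ...` check, the doubly-wrapped case above would be classified as non-retryable, which is the behavior the bug report describes.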
[jira] [Updated] (HIVE-8459) DbLockManager locking table in addition to partitions
[ https://issues.apache.org/jira/browse/HIVE-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8459: - Priority: Minor (was: Major) DbLockManager locking table in addition to partitions - Key: HIVE-8459 URL: https://issues.apache.org/jira/browse/HIVE-8459 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Queries and operations on partitioned tables are generating locks on the whole table when they should only be locking the partition. For example: {code} select count(*) from concur_orc_tab_part where ds = 'today'; {code} This should only be locking the partition ds='today'. But instead: {code}
mysql> select * from HIVE_LOCKS;
| HL_LOCK_EXT_ID | HL_LOCK_INT_ID | HL_TXNID | HL_DB | HL_TABLE | HL_PARTITION | HL_LOCK_STATE | HL_LOCK_TYPE | HL_LAST_HEARTBEAT | HL_ACQUIRED_AT | HL_USER | HL_HOST |
| 428 | 1 | 0 | default | concur_orc_tab_part | NULL | a | r | 1413311172000 | 1413311171000 | hive | node-1.example.com |
| 428 | 2 | 0 | default | concur_orc_tab_part | ds=today | a | r | 1413311172000 | 1413311171000 | hive | node-1.example.com |
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10465) whitelist restrictions don't get initialized in new copy of HiveConf
[ https://issues.apache.org/jira/browse/HIVE-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510010#comment-14510010 ] Daniel Dai commented on HIVE-10465: --- LGTM, +1. whitelist restrictions don't get initialized in new copy of HiveConf Key: HIVE-10465 URL: https://issues.apache.org/jira/browse/HIVE-10465 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10465.1.patch Whitelist restrictions use a regex pattern in HiveConf, but when a new HiveConf object copy is created, the regex pattern is not initialized in the new HiveConf copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
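The underlying pitfall — derived state that a copy constructor forgets to re-initialize — can be sketched generically. The class and field names below are illustrative for this sketch, not HiveConf's actual fields:

```java
// Sketch of the copy-constructor pitfall described above: a field derived
// from a raw setting (here, a compiled regex) is easy to forget when copying.
// Names are illustrative, not HiveConf's.
import java.util.regex.Pattern;

public class ConfCopySketch {
    String whitelistStr;       // raw setting, copied as-is
    Pattern whitelistPattern;  // derived state, must be re-initialized

    ConfCopySketch(String whitelist) {
        this.whitelistStr = whitelist;
        this.whitelistPattern = Pattern.compile(whitelist);
    }

    /** Correct copy: re-derive the pattern instead of leaving it null. */
    ConfCopySketch(ConfCopySketch other) {
        this.whitelistStr = other.whitelistStr;
        this.whitelistPattern = Pattern.compile(other.whitelistStr);
    }

    boolean isAllowed(String param) {
        return whitelistPattern.matcher(param).matches();
    }

    public static void main(String[] args) {
        ConfCopySketch orig = new ConfCopySketch("hive\\.exec\\..*");
        ConfCopySketch copy = new ConfCopySketch(orig);
        System.out.println(copy.isAllowed("hive.exec.parallel"));  // true
    }
}
```

If the copy constructor skipped the `Pattern.compile` line, every parameter check on the copied object would hit a null pattern, which is the shape of the bug reported here.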
[jira] [Updated] (HIVE-10468) Create scripts to do metastore upgrade tests on jenkins for Oracle DB.
[ https://issues.apache.org/jira/browse/HIVE-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-10468: - Description: This JIRA is to isolate the work specific to Oracle DB in HIVE-10239. Because of absence of 64 bit debian packages for oracle-xe, the apt-get install fails on the AWS systems. Create scripts to do metastore upgrade tests on jenkins for Oracle DB. -- Key: HIVE-10468 URL: https://issues.apache.org/jira/browse/HIVE-10468 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam This JIRA is to isolate the work specific to Oracle DB in HIVE-10239. Because of absence of 64 bit debian packages for oracle-xe, the apt-get install fails on the AWS systems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10442) HIVE-10098 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen reassigned HIVE-10442: --- Assignee: Yongzhi Chen HIVE-10098 broke hadoop-1 build --- Key: HIVE-10442 URL: https://issues.apache.org/jira/browse/HIVE-10442 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Yongzhi Chen fs.addDelegationTokens() method does not seem to exist in hadoop 1.2.1. This breaks the hadoop-1 builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10467) Switch to GIT repository on Jenkins precommit tests
[ https://issues.apache.org/jira/browse/HIVE-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509895#comment-14509895 ] Sergio Peña commented on HIVE-10467: This patch was just part of it. There are some properties files on the jenkins instance where this needs to be changed as well. All *.properties files that exist in /usr/local/hiveptest/etc/public have the following lines that must be changed: {noformat} repositoryType = git repository = https://git-wip-us.apache.org/repos/asf/hive.git repositoryName = apache-git-master {noformat} I will leave this for future reference. Switch to GIT repository on Jenkins precommit tests Key: HIVE-10467 URL: https://issues.apache.org/jira/browse/HIVE-10467 Project: Hive Issue Type: Improvement Components: Testing Infrastructure Reporter: Sergio Peña Assignee: Sergio Peña Fix For: 1.2.0 Attachments: HIVE-10467.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10239) Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL
[ https://issues.apache.org/jira/browse/HIVE-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10239: --- Attachment: HIVE-10239.03.patch Create scripts to do metastore upgrade tests on jenkins for Derby, Oracle and PostgreSQL Key: HIVE-10239 URL: https://issues.apache.org/jira/browse/HIVE-10239 Project: Hive Issue Type: Improvement Affects Versions: 1.1.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Attachments: HIVE-10239-donotcommit.patch, HIVE-10239.0.patch, HIVE-10239.0.patch, HIVE-10239.00.patch, HIVE-10239.01.patch, HIVE-10239.02.patch, HIVE-10239.03.patch, HIVE-10239.03.patch, HIVE-10239.patch Need to create DB-implementation specific scripts to use the framework introduced in HIVE-9800 to have any metastore schema changes tested across all supported databases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10443) HIVE-9870 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10443: --- Attachment: HIVE-10443.1.patch HIVE-9870 broke hadoop-1 build -- Key: HIVE-10443 URL: https://issues.apache.org/jira/browse/HIVE-10443 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Vaibhav Gumashta Fix For: 1.2.0 Attachments: HIVE-10443.1.patch, HIVE-10443.1.patch JvmPauseMonitor added in HIVE-9870 is breaking hadoop-1 build. HiveServer2.startPauseMonitor() does not use reflection properly to start JvmPauseMonitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10419) can't do query on partitioned view with analytic function in strictmode
[ https://issues.apache.org/jira/browse/HIVE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-10419: Attachment: HIVE-10419.patch can't do query on partitioned view with analytic function in strictmode --- Key: HIVE-10419 URL: https://issues.apache.org/jira/browse/HIVE-10419 Project: Hive Issue Type: Bug Components: Hive, Views Affects Versions: 0.13.0, 0.14.0, 1.0.0 Environment: Cloudera 5.3.x. Reporter: Hector Lagos Hey Guys, I created the following table: CREATE TABLE t1 (id int, key string, value string) partitioned by (dt int); And after that I created a view on that table as follows: create view v1 PARTITIONED ON (dt) as SELECT * FROM ( SELECT row_number() over (partition by key order by value asc) as row_n, * FROM t1 ) t WHERE row_n = 1; We are working with hive.mapred.mode=strict and when I try to do the query select * from v1 where dt = 2, I'm getting the following error: FAILED: SemanticException [Error 10041]: No partition predicate found for Alias v1:t:t1 Table t1 Is this a bug or a limitation of Hive when you use analytic functions in partitioned views? If I remove the row_number function it works without problems. Thanks in advance, any help will be appreciated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10472) Jenkins HMS upgrade test is not publishing results due to GIT change
[ https://issues.apache.org/jira/browse/HIVE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-10472: --- Attachment: HIVE-10472.1.patch [~szehon] Here's another tiny fix for GIT so that HMS upgrade tests can publish the results. Jenkins HMS upgrade test is not publishing results due to GIT change Key: HIVE-10472 URL: https://issues.apache.org/jira/browse/HIVE-10472 Project: Hive Issue Type: Bug Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-10472.1.patch This error is happening on Jenkins when running the HMS upgrade tests. The class used to publish the results is not found on any directory. + cd /var/lib/jenkins/jobs/PreCommit-HIVE-METASTORE-Test/workspace + set +x Exception in thread main java.lang.NoClassDefFoundError: org/apache/hive/ptest/execution/JIRAService Caused by: java.lang.ClassNotFoundException: org.apache.hive.ptest.execution.JIRAService at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) Could not find the main class: org.apache.hive.ptest.execution.JIRAService. Program will exit. + ret=0 The problem is because the jenkins-execute-hms-test.sh is downloading the code to another directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509440#comment-14509440 ] Xuefu Zhang commented on HIVE-5672: --- Looking at the patch, I'm not sure if I understand the changes correctly. I can see that we modified the grammar to make local optional and the rest is about refactoring. I'm not sure if this is sufficient. Did I miss anything? Also, instead of adding a new grammar rule, we should combine it with the old one. We just need to make KW_LOCAL optional. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509493#comment-14509493 ] ankush commented on HIVE-10464: --- I asked the question on the u...@hive.apache.org mailing list. Thank you How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I find the kryo version that I am using? Please help me on this, we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10434) Cancel connection when remote Spark driver process has failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-10434: Attachment: HIVE-10434.3-spark.patch Addressing RB comments. Cancel connection when remote Spark driver process has failed [Spark Branch] - Key: HIVE-10434 URL: https://issues.apache.org/jira/browse/HIVE-10434 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.2.0 Reporter: Chao Sun Assignee: Chao Sun Attachments: HIVE-10434.1-spark.patch, HIVE-10434.3-spark.patch Currently in HoS, SparkClientImpl first launches a remote Driver process, and then waits for it to connect back to the HS2. However, in certain situations (for instance, a permission issue), the remote process may fail and exit with an error code. In this situation, the HS2 process will still wait for the process to connect, and wait for a full timeout period before it throws the exception. What makes it worse, the user may need to wait for two timeout periods: one for the SparkSetReducerParallelism, and another for the actual Spark job. This could be very annoying. We should cancel the timeout task once we find out that the process has failed, and set the promise as failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
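The proposed behavior — failing the connection promise the moment the child process dies instead of sitting out the full connect timeout — can be sketched with a stand-in for the driver process. Names and the simulated process are assumptions of this sketch, not SparkClientImpl's API:

```java
// Sketch of failing fast when a launched child process dies, as described
// above. The driver process is simulated by a flag; the real code would
// watch the spark-submit child's exit code.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FailFastSketch {
    /**
     * Returns a promise for the driver connection. A watcher thread completes
     * it exceptionally as soon as the (simulated) process exits, so callers
     * don't wait out the full connect timeout.
     */
    static CompletableFuture<String> awaitConnection(boolean processWillCrash) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        Thread watcher = new Thread(() -> {
            if (processWillCrash) {
                // Simulated child exit with a non-zero code: fail immediately.
                promise.completeExceptionally(
                    new RuntimeException("driver exited with code 1"));
            } else {
                promise.complete("connected");
            }
        });
        watcher.start();
        return promise;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(awaitConnection(false).get(5, TimeUnit.SECONDS));
    }
}
```

In the real client the same idea also cancels the pending timeout task, so the exception surfaced to the user reflects the process failure rather than a generic connect timeout.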
[jira] [Commented] (HIVE-10347) Merge spark to trunk 4/15/2015
[ https://issues.apache.org/jira/browse/HIVE-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510449#comment-14510449 ] Lefty Leverenz commented on HIVE-10347: --- Doc note: TODOC-SPARK labels are on the individual JIRA issues. I'll add TODOC-1.2 labels too. Merge spark to trunk 4/15/2015 -- Key: HIVE-10347 URL: https://issues.apache.org/jira/browse/HIVE-10347 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Szehon Ho Assignee: Szehon Ho Fix For: 1.2.0 Attachments: HIVE-10347.2.patch, HIVE-10347.2.patch, HIVE-10347.3.patch, HIVE-10347.4.patch, HIVE-10347.5.patch, HIVE-10347.5.patch, HIVE-10347.6.patch, HIVE-10347.6.patch, HIVE-10347.patch CLEAR LIBRARY CACHE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10465) whitelist restrictions don't get initialized in new copy of HiveConf
[ https://issues.apache.org/jira/browse/HIVE-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-10465: - Attachment: HIVE-10465.1.patch whitelist restrictions don't get initialized in new copy of HiveConf Key: HIVE-10465 URL: https://issues.apache.org/jira/browse/HIVE-10465 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-10465.1.patch Whitelist restrictions use a regex pattern in HiveConf, but when a new HiveConf object copy is created, the regex pattern is not initialized in the new HiveConf copy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4625) HS2 should not attempt to get delegation token from metastore if using embedded metastore
[ https://issues.apache.org/jira/browse/HIVE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509746#comment-14509746 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-4625: - The test failures are unrelated to the change. Thanks Hari HS2 should not attempt to get delegation token from metastore if using embedded metastore - Key: HIVE-4625 URL: https://issues.apache.org/jira/browse/HIVE-4625 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-4625.1.patch, HIVE-4625.2.patch, HIVE-4625.3.patch, HIVE-4625.4.patch, HIVE-4625.5.patch In kerberos secure mode, with doas enabled, Hive server2 tries to get delegation token from metastore even if the metastore is being used in embedded mode. To avoid failure in that case, it uses catch block for UnsupportedOperationException thrown that does nothing. But this leads to an error being logged by lower levels and can mislead users into thinking that there is a problem. It should check if delegation token mode is supported with current configuration before calling the function. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
[ https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510047#comment-14510047 ] Sergey Shelukhin commented on HIVE-10474: - [~hagleitn] [~sseth] [~gopalv] fyi LLAP: investigate why TPCH Q1 1k is slow Key: HIVE-10474 URL: https://issues.apache.org/jira/browse/HIVE-10474 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
[ https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10474: Description: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, on tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. was: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. LLAP: investigate why TPCH Q1 1k is slow Key: HIVE-10474 URL: https://issues.apache.org/jira/browse/HIVE-10474 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. 
On my run, on tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) PTF deserializer fails if values are not used in reducer
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Attachment: HIVE-10451.1.patch Patch that fixes the typeinfo parsing, since the plan is valid and correct. PTF deserializer fails if values are not used in reducer -- Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-10451.1.patch, HIVE-10451.patch In this particular case a Select on top of the PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10451) PTF deserializer fails if values are not used in reducer
[ https://issues.apache.org/jira/browse/HIVE-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-10451: Summary: PTF deserializer fails if values are not used in reducer (was: IdentityProjectRemover removed useful project) PTF deserializer fails if values are not used in reducer -- Key: HIVE-10451 URL: https://issues.apache.org/jira/browse/HIVE-10451 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Gopal V Assignee: Ashutosh Chauhan Attachments: HIVE-10451.1.patch, HIVE-10451.patch In this particular case Select on top of PTF Op is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
[ https://issues.apache.org/jira/browse/HIVE-10036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510096#comment-14510096 ] Owen O'Malley commented on HIVE-10036: -- I understand the problem, but this patch creates more trouble than it fixes. The original design is such that you don't do buffer copies. This patch destroys that design and adds both buffer copies and reallocations. We should set the buffer sizes down for the bit vectors, but this patch is going the wrong way. Writing ORC format big table causes OOM - too many fixed sized stream buffers - Key: HIVE-10036 URL: https://issues.apache.org/jira/browse/HIVE-10036 Project: Hive Issue Type: Improvement Reporter: Selina Zhang Assignee: Selina Zhang Labels: orcfile Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch, HIVE-10036.6.patch ORC writer keeps multiple output streams for each column. Each output stream is allocated a fixed-size ByteBuffer (configurable, defaulting to 256K). For a big table, the memory cost is unbearable, especially when HCatalog dynamic partitioning is involved: several hundred files may be open and writing at the same time (the same problem exists for FileSinkOperator). The global ORC memory manager controls the buffer size, but it only kicks in at 5000-row intervals. An enhancement could be made there, but the problem is that reducing the buffer size introduces worse compression and more IOs in the read path. Sacrificing read performance is never a good choice. I changed the fixed-size ByteBuffer to a dynamically growing buffer that is upper-bounded by the existing configurable buffer size. Most of the streams do not need a large buffer, so performance improved significantly. Compared to Facebook's hive-dwrf, I observed a 2x performance gain with this fix. Solving OOM for ORC completely may need a lot of effort, but this is definitely low-hanging fruit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
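The dynamic-growth idea in the report above can be sketched roughly as follows. This is a hypothetical simplification for illustration, not the actual HIVE-10036 patch: a buffer that starts small and doubles on demand, capped at the existing configured stream buffer size. It also makes visible the reallocation-plus-copy cost that the review comment objects to.

```java
import java.nio.ByteBuffer;

/** Sketch of a grow-on-demand output buffer capped at a configured maximum size. */
public class GrowableStreamBuffer {
    private static final int INITIAL_SIZE = 4 * 1024; // start small instead of 256K
    private final int maxSize;                        // the existing configurable bound
    private ByteBuffer buffer;

    public GrowableStreamBuffer(int maxSize) {
        this.maxSize = maxSize;
        this.buffer = ByteBuffer.allocate(Math.min(INITIAL_SIZE, maxSize));
    }

    /**
     * Appends data, doubling the buffer up to maxSize when needed.
     * The reallocation and copy below is the cost the review objects to.
     */
    public void write(byte[] data) {
        if (buffer.remaining() < data.length) {
            int needed = buffer.position() + data.length;
            if (needed > maxSize) {
                throw new IllegalStateException("buffer full; flush required");
            }
            int newSize = Math.min(Math.max(buffer.capacity() * 2, needed), maxSize);
            ByteBuffer bigger = ByteBuffer.allocate(newSize);
            buffer.flip();
            bigger.put(buffer); // copy old contents into the larger buffer
            buffer = bigger;
        }
        buffer.put(data);
    }

    public int capacity() { return buffer.capacity(); }
    public int position() { return buffer.position(); }
}
```

Streams that never approach the cap (e.g. bit-vector streams) stay small, which is where the memory savings come from; streams that do grow to the cap pay the extra copies.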
[jira] [Updated] (HIVE-10475) LLAP: Minor fixes after tez api enhancements for dag completion
[ https://issues.apache.org/jira/browse/HIVE-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-10475: -- Attachment: HIVE-10475.1.txt LLAP: Minor fixes after tez api enhancements for dag completion --- Key: HIVE-10475 URL: https://issues.apache.org/jira/browse/HIVE-10475 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10475.1.txt TEZ-2212 and TEZ-2361 add APIs to propagate dag completion information to the TaskCommunicator plugin. This jira is for minor fixes to get the llap branch to compile against these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10416) CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite
[ https://issues.apache.org/jira/browse/HIVE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10416: --- Attachment: HIVE-10416.02.patch [~jpullokkaran], I have changed the patch according to our discussion. Could you check it? Thanks CBO (Calcite Return Path): Fix return columns if Sort operator is on top of plan returned by Calcite Key: HIVE-10416 URL: https://issues.apache.org/jira/browse/HIVE-10416 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-10416.01.patch, HIVE-10416.02.patch, HIVE-10416.patch When return path is on, if the plan's top operator is a Sort, we need to produce a SelectOp that will output exactly the columns needed by the FS. The following query reproduces the problem: {noformat} select cbo_t3.c_int, c, count(*) from (select key as a, c_int+1 as b, sum(c_int) as c from cbo_t1 where (cbo_t1.c_int + 1 = 0) and (cbo_t1.c_int 0 or cbo_t1.c_float = 0) group by c_float, cbo_t1.c_int, key order by a) cbo_t1 join (select key as p, c_int+1 as q, sum(c_int) as r from cbo_t2 where (cbo_t2.c_int + 1 = 0) and (cbo_t2.c_int 0 or cbo_t2.c_float = 0) group by c_float, cbo_t2.c_int, key order by q/10 desc, r asc) cbo_t2 on cbo_t1.a=p join cbo_t3 on cbo_t1.a=key where (b + cbo_t2.q = 0) and (b 0 or c_int = 0) group by cbo_t3.c_int, c order by cbo_t3.c_int+c desc, c; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10431) HIVE-9555 broke hadoop-1 build
[ https://issues.apache.org/jira/browse/HIVE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10431: Attachment: HIVE-10431.patch This fixes the build issue for me HIVE-9555 broke hadoop-1 build -- Key: HIVE-10431 URL: https://issues.apache.org/jira/browse/HIVE-10431 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Sergey Shelukhin Attachments: HIVE-10431.patch HIVE-9555 RecordReaderUtils uses direct bytebuffer read from FSDataInputStream which is not present in hadoop-1. This breaks hadoop-1 compilation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
[ https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10474: Description: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) run 2-6 (out of 6) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. was: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. LLAP: investigate why TPCH Q1 1k is slow Key: HIVE-10474 URL: https://issues.apache.org/jira/browse/HIVE-10474 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. 
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) run 2-6 (out of 6) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10474) LLAP: investigate why TPCH Q1 1k is slow
[ https://issues.apache.org/jira/browse/HIVE-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10474: Description: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. was: While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) run 2-6 (out of 6) finished in 25.5sec average; with 16 LLAP daemons in default config it finished in 35.5sec; w/the daemons w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. LLAP: investigate why TPCH Q1 1k is slow Key: HIVE-10474 URL: https://issues.apache.org/jira/browse/HIVE-10474 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin While most queries run faster in LLAP than just Tez with container reuse, TPCH Q1 is much slower. 
On my run, tez with container reuse (current default LLAP configuration but mode == container and no daemons running) runs 2-6 (out of 6 consecutive runs in the same session) finished in 25.5sec average; with 16 LLAP daemons in default config the average was 35.5sec; same w/o IO elevator (to rule out its impact) it took 59.7sec w/strange distribution (later runs were slower than earlier runs, still, fastest run was 49.5sec). So excluding IO elevator it's more than 2x degradation. We need to figure out why this is happening. Is it just slot discrepancy? Regardless, this needs to be addressed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10302: --- Attachment: HIVE-10302.3-spark.patch Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510158#comment-14510158 ] Hive QA commented on HIVE-10302: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12727750/HIVE-10302.3-spark.patch {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 8721 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more 
- did not produce a TEST-*.xml file TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/837/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/837/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-837/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12727750 - PreCommit-HIVE-SPARK-Build Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). 
Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
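The load-once-and-share idea described above can be sketched as a per-JVM (i.e. per-executor) cache keyed by the small table's path. The class and method names below are illustrative only, not the actual HIVE-10302 patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Sketch of a per-executor (per-JVM) cache so that concurrent map-join
 * tasks in the same executor load each small table only once.
 * Hypothetical names; not the actual HIVE-10302 implementation.
 */
public class SmallTableCache {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    /**
     * Returns the cached table for this path, invoking the loader at most
     * once per key even when several tasks race on the same path.
     */
    @SuppressWarnings("unchecked")
    public static <T> T getOrLoad(String path, Supplier<T> loader) {
        return (T) CACHE.computeIfAbsent(path, k -> loader.get());
    }

    /** Drop all cached tables, e.g. when the executor is reused for a new query. */
    public static void clear() {
        CACHE.clear();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` performs the whole check-and-load atomically per key, which is what prevents two tasks from each deserializing their own copy of the same small table.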