[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-7624:
    Status: Patch Available  (was: Reopened)

Reduce operator initialization failed when running multiple MR query on spark
                 Key: HIVE-7624
                 URL: https://issues.apache.org/jira/browse/HIVE-7624
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Rui Li
            Assignee: Rui Li
             Fix For: spark-branch
         Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, HIVE-7624.7-spark.patch, HIVE-7624.patch

The following error occurs when I try to run a query with multiple reduce stages (M-R-R):

{quote}
14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
java.lang.RuntimeException: Reduce operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
	at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
	at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
	at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0]
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
	…
{quote}

I suspect we're applying the reduce functions in the wrong order.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
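A toy sketch (plain Python, not Hive code) of the suspected failure mode: in an M-R-R pipeline each reduce stage's input schema must match the previous stage's output schema, so applying a stage against the wrong upstream output fails field lookup exactly the way the trace shows. The function and field names here are illustrative stand-ins, except `reducesinkkey0` and `_col0`, which come from the error message.

```python
def reduce1(rows):
    # Stage-1 reducer: consumes shuffle keys, emits internal column names.
    return [{"_col0": r["reducesinkkey0"]} for r in rows]

def reduce2(rows):
    # Stage-2 reducer: consumes reduce1's output schema.
    return [{"result": r["_col0"]} for r in rows]

rows = [{"reducesinkkey0": 1}, {"reducesinkkey0": 2}]

# Correct order: schemas line up stage to stage.
correct = reduce2(reduce1(rows))

# Wrong order: a stage is fed output it was never meant to consume,
# and the field lookup fails just like the stack trace.
try:
    reduce1(reduce1(rows))
except KeyError as missing:
    print("cannot find field", missing)  # mirrors "cannot find field reducesinkkey0"
```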
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-7624:
    Attachment: HIVE-7624.7-spark.patch

Reduce operator initialization failed when running multiple MR query on spark
                 Key: HIVE-7624
                 URL: https://issues.apache.org/jira/browse/HIVE-7624

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7757) PTest2 separates test files with spaces while QTestGen uses commas
[ https://issues.apache.org/jira/browse/HIVE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100314#comment-14100314 ]

Szehon Ho commented on HIVE-7757:
+1

PTest2 separates test files with spaces while QTestGen uses commas
                 Key: HIVE-7757
                 URL: https://issues.apache.org/jira/browse/HIVE-7757
             Project: Hive
          Issue Type: Improvement
            Reporter: Brock Noland
            Assignee: Brock Noland
         Attachments: HIVE-7757.1.patch

I noticed in HIVE-7749 that even after the testconfiguration.properties file is updated, TestSparkCliDriver is not being generated correctly: it doesn't include any tests. The issue appears to be that properties in the pom file are separated by commas while the PTest2 properties files are separated by spaces. Since neither commas nor spaces appear in the qtest file names themselves, let's update all the parsing code to accept both comma and space as separators.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
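The proposed fix — accept both comma and space as separators — can be sketched in a few lines. This is a minimal illustration, not the actual PTest2/QTestGen parsing code; the function name is hypothetical.

```python
import re

def split_test_files(value: str):
    """Split a qfile list on commas and/or whitespace, tolerating both
    the pom-style comma separator and the PTest2-style space separator."""
    return [t for t in re.split(r"[,\s]+", value.strip()) if t]

# Both styles, and a mix of the two, parse identically:
print(split_test_files("a.q,b.q,c.q"))
print(split_test_files("a.q b.q c.q"))
print(split_test_files("a.q, b.q  c.q"))
```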
[jira] [Comment Edited] (HIVE-6144) Implement non-staged MapJoin
[ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100309#comment-14100309 ]

Lefty Leverenz edited comment on HIVE-6144 at 8/18/14 6:06 AM:

Review request: *hive.auto.convert.join.use.nonstaged* has been added to the section Optimize Auto Join Conversion in a version-0.13.0 box. Is that the right place for it? Could we have some examples and guidance on when to use it?

* [Optimize Auto Join Conversion | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]

Also in that section, I changed the value of *hive.auto.convert.join.noconditionaltask.size* to match the default (1000) -- it had been 1, which seemed rather small, but if that value was intended please let me know.

Edit: Should this information from the parameter description be included in the version-0.13.0 box in Optimize Auto Join Conversion? -- "Currently, this is not working with vectorization or Tez execution engine."
Implement non-staged MapJoin
                 Key: HIVE-6144
                 URL: https://issues.apache.org/jira/browse/HIVE-6144
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Navis
            Assignee: Navis
            Priority: Minor
              Labels: TODOC13
             Fix For: 0.13.0
         Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt, HIVE-6144.6.patch.txt, HIVE-6144.7.patch.txt, HIVE-6144.8.patch.txt, HIVE-6144.9.patch.txt

For map join, all data in the small aliases is hashed and stored into a temporary file in MapRedLocalTask. But for some aliases without a filter or projection, that seems unnecessary. For example:

{noformat}
select a.* from src a join src b on a.key=b.key;
{noformat}

makes a plan like this:

{noformat}
STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        a
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        a
          TableScan
            alias: a
            HashTable Sink Operator
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 1

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                File Output Operator
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
{noformat}

table src(a) is fetched and stored as-is in the MRLocalTask. With this patch, the plan can be like below:

{noformat}
  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        b
          TableScan
            alias: b
            Map Join Operator
              condition map:
                   Inner Join 0 to 1
              condition expressions:
                0 {key} {value}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0, _col1
              Position of Big Table: 1
              Select Operator
                File Output Operator
      Local Work:
        Map Reduce Local Work
          Alias -> Map Local Tables:
            a
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            a
              TableScan
{noformat}
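The mechanics behind the plans above can be sketched in plain Python (toy data, not Hive's implementation): a map join builds an in-memory hash table from the small alias and streams the big alias against it. The non-staged variant this issue proposes skips dumping that hash table to a local file first and builds it directly where the join runs.

```python
def map_join(small, big, key):
    """Toy map-side hash join: build on the small side, probe with the big side."""
    table = {}
    for row in small:                      # build phase: hash the small alias
        table.setdefault(row[key], []).append(row)
    for row in big:                        # probe phase: stream the big alias, no shuffle
        for match in table.get(row[key], []):
            yield {**match, **row}

small = [{"key": 1, "value": "a"}]
big = [{"key": 1, "b": 2}, {"key": 9, "b": 3}]
print(list(map_join(small, big, "key")))
```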
[jira] [Commented] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands
[ https://issues.apache.org/jira/browse/HIVE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100342#comment-14100342 ]

Hive QA commented on HIVE-7681:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662424/HIVE-7681.4.patch.txt

{color:green}SUCCESS:{color} +1 5817 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/372/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/372/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-372/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662424

qualified tablenames usage does not work with several alter-table commands
                 Key: HIVE-7681
                 URL: https://issues.apache.org/jira/browse/HIVE-7681
             Project: Hive
          Issue Type: Bug
            Reporter: Thejas M Nair
            Assignee: Navis
         Attachments: HIVE-7681.1.patch.txt, HIVE-7681.2.patch.txt, HIVE-7681.3.patch.txt, HIVE-7681.4.patch.txt

Changes were made in HIVE-4064 for use of qualified table names in more types of queries. But several alter-table commands don't work with qualified names:
- alter table default.tmpfoo set tblproperties (bar = bar value)
- ALTER TABLE default.kv_rename_test CHANGE a a STRING
- add/drop partition
- alter index rebuild

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100358#comment-14100358 ]

Hive QA commented on HIVE-7624:

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5915 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/console
Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-54/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12662431

Reduce operator initialization failed when running multiple MR query on spark
                 Key: HIVE-7624
                 URL: https://issues.apache.org/jira/browse/HIVE-7624

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator.[SparkBranch]
Chengxiang Li created HIVE-7761:

             Summary: Failed to analyze stats with CounterStatsAggregator.[SparkBranch]
                 Key: HIVE-7761
                 URL: https://issues.apache.org/jira/browse/HIVE-7761
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Chengxiang Li

CounterStatsAggregator analyzes stats with MR counters; we need to implement another CounterStatsAggregator based on a Spark-specific counter to analyze table stats. Here is the error information:

2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator (CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for null
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
	at org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
	at org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
	at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
	at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
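The ClassCastException comes from unconditionally casting the task to the MR type. A common fix pattern is to dispatch on the actual task type instead; this toy Python sketch uses hypothetical class names, not the real Hive APIs, to illustrate that pattern.

```python
# Stand-in task types (hypothetical, mirroring the names in the trace).
class MapRedTask: pass
class SparkTask: pass

class MRCounterStatsAggregator:
    def connect(self, task):
        return "mr-counters"

class SparkCounterStatsAggregator:
    def connect(self, task):
        return "spark-counters"

def create_stats_aggregator(task):
    """Pick an aggregator by task type rather than casting blindly."""
    if isinstance(task, MapRedTask):
        return MRCounterStatsAggregator()
    if isinstance(task, SparkTask):
        return SparkCounterStatsAggregator()
    raise TypeError(f"no counter aggregator for {type(task).__name__}")
```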
[jira] [Updated] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator.[SparkBranch]
[ https://issues.apache.org/jira/browse/HIVE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengxiang Li updated HIVE-7761:
    Description:

CounterStatsAggregator analyzes stats with MR counters; we need to implement another CounterStatsAggregator based on a Spark-specific counter to analyze table stats. Here is the error information:

{noformat}
2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator (CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for null
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
	at org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
	at org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
	at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
	at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
{noformat}

Failed to analyze stats with CounterStatsAggregator.[SparkBranch]
                 Key: HIVE-7761
                 URL: https://issues.apache.org/jira/browse/HIVE-7761
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Chengxiang Li

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6329:
    Attachment: HIVE-6329.9.patch.txt

Support column level encryption/decryption
                 Key: HIVE-6329
                 URL: https://issues.apache.org/jira/browse/HIVE-6329
             Project: Hive
          Issue Type: New Feature
          Components: Security, Serializers/Deserializers
            Reporter: Navis
            Assignee: Navis
            Priority: Minor
         Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt

We have been receiving some requirements on encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases.

{noformat}
hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    > WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
    > STORED AS TEXTFILE;
OK
Time taken: 0.584 seconds
hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
..
OK
Time taken: 5.121 seconds
hive> select * from encode_test;
OK
100	navis	MDEwLTAwMDAtMDAwMA==	U2VvdWwsIFNlb2Nobw==
Time taken: 0.078 seconds, Fetched: 1 row(s)
hive>
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
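Conceptually, the Base64WriteOnly-style column encoding in the session above base64-encodes the configured columns on write, so a plain SELECT shows encoded bytes. A minimal sketch (toy code, not the SerDe itself) reproduces the encoded `address` value shown in the example:

```python
import base64

def encode_columns(row, columns):
    """Base64-encode the values of the named columns, pass the rest through."""
    return {
        k: base64.b64encode(v.encode()).decode() if k in columns else v
        for k, v in row.items()
    }

row = {"id": "100", "name": "navis", "address": "Seoul, Seocho"}
# 'Seoul, Seocho' encodes to the same string the JIRA example shows.
print(encode_columns(row, {"address"}))  # address -> 'U2VvdWwsIFNlb2Nobw=='
```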
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5799:
    Attachment: HIVE-5799.10.patch.txt

session/operation timeout for hiveserver2
                 Key: HIVE-5799
                 URL: https://issues.apache.org/jira/browse/HIVE-5799
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
            Reporter: Navis
            Assignee: Navis
            Priority: Minor
         Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt

We need some timeout facility to prevent resource leaks caused by unstable or bad clients.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5718:
    Attachment: HIVE-5718.9.patch.txt

Rerun test before commit

Support direct fetch for lateral views, sub queries, etc.
                 Key: HIVE-5718
                 URL: https://issues.apache.org/jira/browse/HIVE-5718
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Navis
            Assignee: Navis
            Priority: Trivial
         Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt

Extend HIVE-2925 with LV and SubQ.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5690:
    Attachment: HIVE-5690.9.patch.txt

Support subquery for single sourced multi query
                 Key: HIVE-5690
                 URL: https://issues.apache.org/jira/browse/HIVE-5690
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Navis
            Assignee: Navis
            Priority: Minor
         Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt

A single-sourced multi-insert query is very useful for various ETL processes, but it does not allow subqueries. For example:

{noformat}
explain
from src
insert overwrite table x1 select * from (select distinct key,value) b order by key
insert overwrite table x2 select * from (select distinct key,value) c order by value;
{noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
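The semantics the example query asks for can be illustrated in plain Python (toy data assumed, not HiveQL): scan the source once, apply the distinct subquery per branch, and write each branch with its own ordering.

```python
# One source, two insert branches, each with its own subquery and ORDER BY.
src = [("b", 2), ("a", 1), ("b", 2), ("a", 3)]

# Branch 1: select distinct key,value ... order by key (ties broken by value).
x1 = sorted(set(src))

# Branch 2: select distinct key,value ... order by value (ties broken by key).
x2 = sorted(set(src), key=lambda r: (r[1], r[0]))

print(x1)
print(x2)
```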
[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-5690:
    Attachment: (was: HIVE-5690.9.patch.txt)

Support subquery for single sourced multi query
                 Key: HIVE-5690
                 URL: https://issues.apache.org/jira/browse/HIVE-5690

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (HIVE-7738) tez select sum(decimal) from union all of decimal and null throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Pivovarov updated HIVE-7738:
    Attachment: HIVE-7738.3.patch

Added test query tez_union_decimal.q

tez select sum(decimal) from union all of decimal and null throws NPE
                 Key: HIVE-7738
                 URL: https://issues.apache.org/jira/browse/HIVE-7738
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.13.1
            Reporter: Alexander Pivovarov
            Assignee: Alexander Pivovarov
         Attachments: HIVE-7738.2.patch, HIVE-7738.2.patch, HIVE-7738.3.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch

If this query is run using the tez engine then hive will throw an NPE:

{code}
select sum(a) from (
  select cast(1.1 as decimal) a from dual
  union all
  select cast(null as decimal) a from dual
) t;
{code}

{code}
hive> select sum(a) from ( select cast(1.1 as decimal) a from dual union all select cast(null as decimal) a from dual ) t;
Query ID = apivovarov_20140814200909_438385b2-4147-47bc-98a0-a01567bbb5c5
Total jobs = 1
Launching Job 1 out of 1

Status: Running (application id: application_1407388228332_5616)

Map 1: -/-	Map 4: -/-	Reducer 3: 0/1
Map 1: 0/1	Map 4: 0/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 0/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 1/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 1/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 1/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 1/1	Reducer 3: 0/1
Map 1: 0/1	Map 4: 1/1	Reducer 3: 0/1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1407388228332_5616_1_02, diagnostics=[Task failed, taskId=task_1407388228332_5616_1_02_00, diagnostics=[AttemptID:attempt_1407388228332_5616_1_02_00_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
	at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:553)
Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:145)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:164)
	... 6 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantHiveDecimalObjectInspector.precision(WritableConstantHiveDecimalObjectInspector.java:61)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.init(GenericUDAFSum.java:106)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:362)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:425)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:121)
	... 7 more
Container released by application, AttemptID:attempt_1407388228332_5616_1_02_00_1 Info:Error:
{code}
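The expected behavior the query relies on is standard SQL: SUM skips NULLs, and SUM over only NULLs is NULL. A small Python sketch (toy values, not Hive's decimal aggregator) of that rule:

```python
from decimal import Decimal

def sql_sum(values):
    """SQL-style SUM: NULLs (None) are skipped; an all-NULL input yields NULL."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

# The union-all of a decimal and a NULL from the bug report:
rows = [Decimal("1.1"), None]
print(sql_sum(rows))
```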
[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5690: Attachment: HIVE-5690.9.patch.txt Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes, but it does not allow subqueries to be included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4788: Attachment: HIVE-4788.2.patch.txt RCFile and bzip2 compression not working Key: HIVE-4788 URL: https://issues.apache.org/jira/browse/HIVE-4788 Project: Hive Issue Type: Bug Components: Compression Affects Versions: 0.10.0 Environment: CDH4.2 Reporter: Johndee Burks Assignee: Navis Priority: Minor Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt The issue is that Bzip2-compressed rcfile data encounters an error when queried, even for the simplest query, select *. The issue is easily reproducible using the following. Create a table and load the sample data below. DDL: create table source_data (a string, b string) row format delimited fields terminated by ','; Sample data: apple,sauce Test: Do the following and you should receive the error listed below for the rcfile table with bz2 compression. create table rc_nobz2 (a string, b string) stored as rcfile; insert into table rc_nobz2 select * from source_txt; SET io.seqfile.compression.type=BLOCK; SET hive.exec.compress.output=true; SET mapred.compress.map.output=true; SET mapred.output.compress=true; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; create table rc_bz2 (a string, b string) stored as rcfile; insert into table rc_bz2 select * from source_txt; hive> select * from rc_bz2; Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '�' hive> select * from rc_nobz2; apple sauce -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 24792: RCFile and bzip2 compression not working
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24792/ --- Review request for hive. Bugs: HIVE-4788 https://issues.apache.org/jira/browse/HIVE-4788 Repository: hive-git Description --- The issue is that Bzip2-compressed rcfile data encounters an error when queried, even for the simplest query, select *. The issue is easily reproducible using the following. Create a table and load the sample data below. DDL: create table source_data (a string, b string) row format delimited fields terminated by ','; Sample data: apple,sauce Test: Do the following and you should receive the error listed below for the rcfile table with bz2 compression. create table rc_nobz2 (a string, b string) stored as rcfile; insert into table rc_nobz2 select * from source_txt; SET io.seqfile.compression.type=BLOCK; SET hive.exec.compress.output=true; SET mapred.compress.map.output=true; SET mapred.output.compress=true; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; create table rc_bz2 (a string, b string) stored as rcfile; insert into table rc_bz2 select * from source_txt; hive> select * from rc_bz2; Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '�' hive> select * from rc_nobz2; apple sauce Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 2a27676 ql/src/test/queries/clientpositive/rcfile_compress.q PRE-CREATION ql/src/test/results/clientpositive/rcfile_compress.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24792/diff/ Testing --- Thanks, Navis Ryu
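The "Stream is not BZip2 formatted: expected 'h' as first byte" message above reflects how Hadoop's BZip2 support validates the stream magic: a BZip2 stream begins with "BZh" plus a block-size digit, and the decompressor consumes "BZ" and then expects 'h'. A minimal standalone sketch of that check (hypothetical helper, not Hive/Hadoop code) illustrates why raw RCFile bytes fail it:

```java
// Hypothetical check mirroring the validation performed by Hadoop's BZip2
// input stream: the magic is "BZh" followed by a block-size digit '1'..'9';
// after consuming "BZ", the next byte must be 'h', hence the error message.
public class BZip2MagicCheck {
    // Returns true if the buffer starts with a plausible BZip2 header.
    public static boolean looksLikeBZip2(byte[] header) {
        return header.length >= 4
                && header[0] == 'B' && header[1] == 'Z' && header[2] == 'h'
                && header[3] >= '1' && header[3] <= '9';
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBZip2(new byte[]{'B', 'Z', 'h', '9'})); // true
        // Bytes that are not BZip2 data (e.g. an uncompressed RCFile block)
        // fail at the 'h' position, matching the exception in the report.
        System.out.println(looksLikeBZip2(new byte[]{'R', 'C', 'F', '1'})); // false
    }
}
```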
[jira] [Updated] (HIVE-7711) Error Serializing GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7711: Attachment: HIVE-7711.1.patch.txt Error Serializing GenericUDF Key: HIVE-7711 URL: https://issues.apache.org/jira/browse/HIVE-7711 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Dr. Christian Betz Attachments: HIVE-7711.1.patch.txt I get an exception running a job with a GenericUDF in HIVE 0.13.0 (which was ok in HIVE 0.12.0). The org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc is serialized using Kryo, trying to serialize stuff in my GenericUDF which is not serializable (doesn't implement Serializable). Switching to Kryo made the comment in ExprNodeGenericFuncDesc obsolete: /** * In case genericUDF is Serializable, we will serialize the object. * * In case genericUDF does not implement Serializable, Java will remember the * class of genericUDF and creates a new instance when deserialized. This is * exactly what we want. */ Find the stack trace below; however, the description above should be clear. 
Exception in thread main org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: value (java.util.concurrent.atomic.AtomicReference) state (clojure.lang.Atom) state (udfs.ArraySum) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) mapWork (org.apache.hadoop.hive.ql.plan.MapredWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at
[jira] [Commented] (HIVE-7711) Error Serializing GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100455#comment-14100455 ] Navis commented on HIVE-7711: - [~cbbetz] Could you try this with the attached patch? Looks like UDFs need some new annotation for Kryo serialization. Error Serializing GenericUDF Key: HIVE-7711 URL: https://issues.apache.org/jira/browse/HIVE-7711 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Dr. Christian Betz Attachments: HIVE-7711.1.patch.txt I get an exception running a job with a GenericUDF in HIVE 0.13.0 (which was ok in HIVE 0.12.0). The org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc is serialized using Kryo, trying to serialize stuff in my GenericUDF which is not serializable (doesn't implement Serializable). Switching to Kryo made the comment in ExprNodeGenericFuncDesc obsolete: /** * In case genericUDF is Serializable, we will serialize the object. * * In case genericUDF does not implement Serializable, Java will remember the * class of genericUDF and creates a new instance when deserialized. This is * exactly what we want. */ Find the stack trace below; however, the description above should be clear. 
Exception in thread main org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: value (java.util.concurrent.atomic.AtomicReference) state (clojure.lang.Atom) state (udfs.ArraySum) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) mapWork (org.apache.hadoop.hive.ql.plan.MapredWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at
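The comment quoted in the report describes the pre-Kryo contract: a Serializable UDF was round-tripped by value, while a non-Serializable one was round-tripped by class name and re-instantiated. Kryo's field serializer instead walks every field, which is what trips over the clojure.lang.Atom state in the trace above. A sketch of the old contract (illustrative only, not Hive's actual code) makes the difference concrete:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch (not Hive's actual implementation) of the contract quoted in the
// JIRA comment: a Serializable UDF round-trips by value; a non-Serializable
// one round-trips by class name, yielding a fresh default-constructed
// instance. Kryo's FieldSerializer skips this check and walks all fields.
public class UdfRoundTrip {
    static Object roundTrip(Object udf) throws Exception {
        if (udf instanceof Serializable) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(udf);                         // serialized by value
            oos.flush();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()));
            return in.readObject();
        }
        // "Java will remember the class ... and creates a new instance":
        return udf.getClass().getDeclaredConstructor().newInstance();
    }

    public static class StatefulUdf {                     // not Serializable
        public Object state = new Object();               // like the Atom in the trace
    }

    public static void main(String[] args) throws Exception {
        StatefulUdf original = new StatefulUdf();
        StatefulUdf copy = (StatefulUdf) roundTrip(original);
        // A fresh instance: the non-serializable state was never touched --
        // exactly the behavior that field-walking Kryo serialization breaks.
        System.out.println(copy != original);             // true
    }
}
```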
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100464#comment-14100464 ] Hive QA commented on HIVE-6329: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662445/HIVE-6329.9.patch.txt {color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 5819 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeCompositeKeyWithoutSeparator org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamily org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII org.apache.hadoop.hive.hbase.TestLazyHBaseObject.testLazyHBaseRow2 org.apache.hadoop.hive.hbase.TestLazyHBaseObject.testLazyHBaseRow3 
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigPopulation org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/373/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/373/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-373/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 21 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662445 Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt Receiving some requirements on encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows); .. 
OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
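The query output above can be sanity-checked by hand: assuming Base64WriteOnly emits plain Base64, decoding the two displayed column values with the standard JDK decoder recovers the inserted phone number and address.

```java
import java.util.Base64;

// Decode the two Base64 values shown in the HIVE-6329 example output to
// confirm they are the plain-Base64 encodings of the inserted columns.
public class DecodeCheck {
    public static void main(String[] args) {
        Base64.Decoder decoder = Base64.getDecoder();
        System.out.println(new String(decoder.decode("MDEwLTAwMDAtMDAwMA=="))); // 010-0000-0000
        System.out.println(new String(decoder.decode("U2VvdWwsIFNlb2Nobw=="))); // Seoul, Seocho
    }
}
```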
[jira] [Updated] (HIVE-7528) Support cluster by and distributed by
[ https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7528: - Attachment: HIVE-7528.spark.patch Support cluster by and distributed by - Key: HIVE-7528 URL: https://issues.apache.org/jira/browse/HIVE-7528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-7528.spark.patch clustered by = distributed by + sort by, so this is related to HIVE-7527. If sort by is in place, the assumption is that we don't need to do anything about distributed by or clustered by. Still, we need to confirm and verify. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7528) Support cluster by and distributed by
[ https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100467#comment-14100467 ] Rui Li commented on HIVE-7528: -- Distribute/cluster by should work with the sort shuffler in place. This patch is mainly some refinement to the current shuffle code. Support cluster by and distributed by - Key: HIVE-7528 URL: https://issues.apache.org/jira/browse/HIVE-7528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-7528.spark.patch clustered by = distributed by + sort by, so this is related to HIVE-7527. If sort by is in place, the assumption is that we don't need to do anything about distributed by or clustered by. Still, we need to confirm and verify. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7528) Support cluster by and distributed by
[ https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7528: - Status: Patch Available (was: Open) Support cluster by and distributed by - Key: HIVE-7528 URL: https://issues.apache.org/jira/browse/HIVE-7528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-7528.spark.patch clustered by = distributed by + sort by, so this is related to HIVE-7527. If sort by is in place, the assumption is that we don't need to do anything about distributed by or clustered by. Still, we need to confirm and verify. -- This message was sent by Atlassian JIRA (v6.2#6252)
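The identity "clustered by = distributed by + sort by" discussed in HIVE-7528 can be sketched as a two-step shuffle: hash-partition rows by key across reducers (DISTRIBUTE BY), then sort within each partition only (SORT BY), giving per-reducer rather than total order. A simplified illustration (not the Hive-on-Spark shuffle code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of CLUSTER BY key: DISTRIBUTE BY key (hash-partition rows
// across reducers) followed by SORT BY key (order within each reducer only).
// The result is per-partition order, not a global total order.
public class ClusterBySketch {
    public static List<List<String>> clusterBy(List<String> rows, int numReducers) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) partitions.add(new ArrayList<>());
        for (String row : rows) {                         // DISTRIBUTE BY: same key -> same reducer
            int p = (row.hashCode() & Integer.MAX_VALUE) % numReducers;
            partitions.get(p).add(row);
        }
        for (List<String> partition : partitions) {       // SORT BY: per-reducer sort
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Equal keys land in the same partition, and each partition is sorted.
        System.out.println(clusterBy(Arrays.asList("b", "a", "c", "a"), 2));
    }
}
```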
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100470#comment-14100470 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662446/HIVE-5799.10.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/374/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/374/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-374/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims-0.23 --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-0.23 --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp/conf [copy] Copying 7 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims-0.23 --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims-0.23 --- [INFO] Tests are skipped. 
[INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims-0.23 --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/hive-shims-0.23-0.14.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-shims-0.23 --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-0.23 --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/hive-shims-0.23-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.23/0.14.0-SNAPSHOT/hive-shims-0.23-0.14.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.23/0.14.0-SNAPSHOT/hive-shims-0.23-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Shims 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-shims --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp/conf [copy] Copying 7 files to /data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-shims --- [INFO] No sources to compile [INFO] [INFO] ---
[jira] [Created] (HIVE-7762) Enhancement while getting partitions via webhcat client
Suhas Vasu created HIVE-7762: Summary: Enhancement while getting partitions via webhcat client Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Hcatalog creates partitions in lower case, whereas getting partitions from hcatalog via webhcat client doesn't handle this. So the client starts throwing exceptions. Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
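Until the server side handles the case mismatch, a client-side workaround along the lines described above is to lower-case the partition-spec keys before calling HCatClient.getPartition. A hypothetical helper (the class and method names are illustrative, not part of HCatalog):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical client-side workaround for the case mismatch in HIVE-7762:
// HCatalog stores partition keys in lower case, so normalize the spec's key
// names (values are left untouched) before calling getPartition.
public class PartitionSpecUtil {
    public static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<String, String> spec = new HashMap<>();
        spec.put("Year", "2014");
        spec.put("Month", "08");
        Map<String, String> fixed = lowerCaseKeys(spec);
        System.out.println(fixed.get("year"));  // 2014
        System.out.println(fixed.get("month")); // 08
    }
}
```

With this in place, the failing spec from the report ("Year", "Month", ...) would be passed to the client as "year", "month", ..., matching the working version shown at the end of the issue.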
[jira] [Created] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
Chengxiang Li created HIVE-7763: --- Summary: Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at 
org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7763: Attachment: HIVE-7763.1-spark.patch Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7763: Status: Patch Available (was: Open) Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly
[ https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6934: Status: Patch Available (was: Open) PartitionPruner doesn't handle top level constant expression correctly -- Key: HIVE-6934 URL: https://issues.apache.org/jira/browse/HIVE-6934 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch You hit this error indirectly, because of how we handle invalid constant comparisons. Consider: {code} create table x(key int, value string) partitioned by (dt int, ts string); -- both these queries hit this issue select * from x where key = 'abc'; select * from x where dt = 'abc'; -- the issue is the comparison gets converted to the constant false -- and the PartitionPruner doesn't handle top level constant exprs correctly {code} Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for HIVE-5376 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly
[ https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6934: Attachment: HIVE-6934.6.patch PartitionPruner doesn't handle top level constant expression correctly -- Key: HIVE-6934 URL: https://issues.apache.org/jira/browse/HIVE-6934 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch You hit this error indirectly, because of how we handle invalid constant comparisons. Consider: {code} create table x(key int, value string) partitioned by (dt int, ts string); -- both these queries hit this issue select * from x where key = 'abc'; select * from x where dt = 'abc'; -- the issue is the comparison gets converted to the constant false -- and the PartitionPruner doesn't handle top level constant exprs correctly {code} Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for HIVE-5376 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly
[ https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-6934: Status: Open (was: Patch Available) PartitionPruner doesn't handle top level constant expression correctly -- Key: HIVE-6934 URL: https://issues.apache.org/jira/browse/HIVE-6934 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch You hit this error indirectly, because of how we handle invalid constant comparisons. Consider: {code} create table x(key int, value string) partitioned by (dt int, ts string); -- both these queries hit this issue select * from x where key = 'abc'; select * from x where dt = 'abc'; -- the issue is the comparison gets converted to the constant false -- and the PartitionPruner doesn't handle top level constant exprs correctly {code} Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for HIVE-5376 -- This message was sent by Atlassian JIRA (v6.2#6252)
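The failure mode described in HIVE-6934 can be illustrated outside Hive: once an invalid comparison such as `dt = 'abc'` (where `dt` is an int) is folded to the constant false, a partition pruner must special-case a top-level constant predicate instead of walking it as an expression tree. The following is a minimal sketch of that handling; all names here are hypothetical and do not reflect Hive's actual PartitionPruner API:

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantPruneSketch {
    // The pruner receives either a real predicate (modeled as null here,
    // meaning "not foldable to a constant") or, after constant folding of
    // an invalid comparison, a bare Boolean constant.
    static List<String> prune(List<String> partitions, Boolean foldedConstant) {
        if (foldedConstant != null) {
            // Top-level constant: false prunes every partition, true keeps all.
            return foldedConstant ? partitions : new ArrayList<>();
        }
        // Real predicate evaluation would go here; keep all as a stand-in.
        return partitions;
    }
}
```

The bug amounts to the constant branch being missing, so the pruner tries to evaluate `false` as a column expression and fails.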
[jira] [Created] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster
Vaibhav Gumashta created HIVE-7764: -- Summary: Support all JDBC-HiveServer2 authentication modes on a secure cluster Key: HIVE-7764 URL: https://issues.apache.org/jira/browse/HIVE-7764 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Currently, HiveServer2 logs in with its keytab only if hive.server2.authentication is set to KERBEROS. However, hive.server2.authentication is the config that determines the auth type an end user will use while authenticating with HiveServer2. There is a valid use case of a user authenticating with HiveServer2 using LDAP, for example, while HiveServer2 runs the query on a kerberized cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
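The gap reported above can be reduced to a single decision: whether HiveServer2's own keytab login should key off the client-facing auth mode or off the cluster's security setting. The sketch below contrasts the two policies; the method names and parameters are illustrative, not HiveServer2's actual code:

```java
public class Hs2KeytabLoginSketch {
    // Reported behavior: the server logs in from its keytab only when the
    // client-facing auth mode is KERBEROS.
    static boolean currentBehavior(String hiveServer2AuthMode) {
        return "KERBEROS".equalsIgnoreCase(hiveServer2AuthMode);
    }

    // Direction of the fix: the cluster's security setting, not the end-user
    // auth mode (e.g. LDAP), decides whether HS2 needs its keytab login.
    static boolean proposedBehavior(String hiveServer2AuthMode, boolean clusterIsKerberized) {
        return clusterIsKerberized;
    }
}
```

Under the current policy a user authenticating via LDAP against a kerberized cluster leaves the server without Kerberos credentials; the proposed policy decouples the two.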
[jira] [Updated] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster
[ https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7764: --- Attachment: HIVE-7764.1.patch Support all JDBC-HiveServer2 authentication modes on a secure cluster - Key: HIVE-7764 URL: https://issues.apache.org/jira/browse/HIVE-7764 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7764.1.patch Currently, HiveServer2 logs in with its keytab only if hive.server2.authentication is set to KERBEROS. However, hive.server2.authentication is the config that determines the auth type an end user will use while authenticating with HiveServer2. There is a valid use case of a user authenticating with HiveServer2 using LDAP, for example, while HiveServer2 runs the query on a kerberized cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster
[ https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7764: --- Status: Patch Available (was: Open) Support all JDBC-HiveServer2 authentication modes on a secure cluster - Key: HIVE-7764 URL: https://issues.apache.org/jira/browse/HIVE-7764 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7764.1.patch Currently, HiveServer2 logs in with its keytab only if hive.server2.authentication is set to KERBEROS. However, hive.server2.authentication is the config that determines the auth type an end user will use while authenticating with HiveServer2. There is a valid use case of a user authenticating with HiveServer2 using LDAP, for example, while HiveServer2 runs the query on a kerberized cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Attachment: HIVE-7353.4.patch Patch rebased on trunk HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch When using the embedded metastore, HiveServer2 creates background threads to run async operations and ends up creating new instances of JDOPersistanceManager rather than using the one from the foreground (handler) thread. Since JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are never GCed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Status: Patch Available (was: Open) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch When using the embedded metastore, HiveServer2 creates background threads to run async operations and ends up creating new instances of JDOPersistanceManager rather than using the one from the foreground (handler) thread. Since JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are never GCed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Status: Open (was: Patch Available) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch When using the embedded metastore, HiveServer2 creates background threads to run async operations and ends up creating new instances of JDOPersistanceManager rather than using the one from the foreground (handler) thread. Since JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are never GCed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Description: When using the embedded metastore, HiveServer2 creates background threads to run async operations and ends up creating new instances of JDOPersistanceManager, which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are cached by JDOPersistanceManagerFactory. (was: While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager rather than using the one from the foreground (handler) thread. Since JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are never GCed.) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch When using the embedded metastore, HiveServer2 creates background threads to run async operations and ends up creating new instances of JDOPersistanceManager, which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
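The leak pattern described in HIVE-7353 — a factory caching one manager instance per thread, with nothing releasing the entry when a pool worker dies — can be modeled in a few lines. This is a generic sketch of the failure mode and the shape of a fix (an explicit release at the end of each background task), not the actual JDOPersistanceManagerFactory code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PmCacheSketch {
    // Models the factory cache: one manager per thread id. If a thread pool
    // kills the worker without an explicit release, the entry (and the
    // manager it holds) stays strongly reachable forever.
    static final Map<Long, Object> cache = new ConcurrentHashMap<>();

    static Object get() {
        return cache.computeIfAbsent(Thread.currentThread().getId(), id -> new Object());
    }

    // The fix direction: release the cached instance when the background
    // task finishes, instead of relying on thread death to clean up.
    static void release() {
        cache.remove(Thread.currentThread().getId());
    }
}
```

Wrapping each async operation in try/finally with `release()` in the finally block keeps the cache bounded by the number of live workers.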
[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.
[ https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100518#comment-14100518 ] Hive QA commented on HIVE-5718: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662448/HIVE-5718.9.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5817 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/375/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/375/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-375/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662448 Support direct fetch for lateral views, sub queries, etc. - Key: HIVE-5718 URL: https://issues.apache.org/jira/browse/HIVE-5718 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt Extend HIVE-2925 with LV and SubQ. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100533#comment-14100533 ] Sushanth Sowmyan commented on HIVE-7341: +1, committing. Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, HIVE-7341.4.patch, HIVE-7341.5.patch The HCatClient currently doesn't provide very much support for replicating HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) instances. Systems similar to Apache Falcon might find the need to replicate partition data between 2 clusters, and keep the HCatalog metadata in sync between the two. This poses a couple of problems: # The definition of the source table might change (in column schema, I/O formats, record-formats, serde-parameters, etc.) The system will need a way to diff 2 tables and update the target-metastore with the changes. E.g. {code} targetTable.resolve( sourceTable, targetTable.diff(sourceTable) ); hcatClient.updateTableSchema(dbName, tableName, targetTable); {code} # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table-schema be resolved *before* partitions with the new schema are added to the table. This is problematic, because it introduces race conditions when 2 partitions with differing column-schemas (e.g. right after a schema change) are copied in parallel. This can be avoided if each HCatAddPartitionDesc kept track of the partition's schema, in flight. # The source and target metastores might be running different/incompatible versions of Hive. The impending patch attempts to address these concerns (with some caveats). 
# {{HCatTable}} now has ## a {{diff()}} method, to compare against another HCatTable instance ## a {{resolve(diff)}} method to copy over specified table-attributes from another HCatTable ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison # {{HCatPartition}} now provides finer-grained control over a Partition's column-schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from source, with the ability to override specific properties if required (e.g. location). # {{HCatClient.updateTableSchema()}} can now update the entire table-definition, not just the column schema. # I've cleaned up and removed most of the redundancy between the HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table-attributes from the add-table-operation's attributes. By providing fluent-interfaces in HCatTable, and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise, HCatPartition and HCatAddPartitionDesc. I'll post a patch for trunk shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100534#comment-14100534 ] Hive QA commented on HIVE-7763: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662464/HIVE-7763.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5915 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/55/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/55/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-55/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12662464 Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at 
org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-7341: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks, Mithun! (@Lefty: There isn't much of a need of end-user documentation for this patch, but possibly a programmer documentation aspect, which should mostly be covered by javadocs and the bug report here) Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, HIVE-7341.4.patch, HIVE-7341.5.patch The HCatClient currently doesn't provide very much support for replicating HCatTable definitions between 2 HCatalog Server (i.e. Hive metastore) instances. Systems similar to Apache Falcon might find the need to replicate partition data between 2 clusters, and keep the HCatalog metadata in sync between the two. This poses a couple of problems: # The definition of the source table might change (in column schema, I/O formats, record-formats, serde-parameters, etc.) The system will need a way to diff 2 tables and update the target-metastore with the changes. E.g. {code} targetTable.resolve( sourceTable, targetTable.diff(sourceTable) ); hcatClient.updateTableSchema(dbName, tableName, targetTable); {code} # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table-schema be resolved *before* partitions with the new schema are added to the table. This is problematic, because it introduces race conditions when 2 partitions with differing column-schemas (e.g. right after a schema change) are copied in parallel. 
This can be avoided if each HCatAddPartitionDesc kept track of the partition's schema, in flight. # The source and target metastores might be running different/incompatible versions of Hive. The impending patch attempts to address these concerns (with some caveats). # {{HCatTable}} now has ## a {{diff()}} method, to compare against another HCatTable instance ## a {{resolve(diff)}} method to copy over specified table-attributes from another HCatTable ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison # {{HCatPartition}} now provides finer-grained control over a Partition's column-schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from source, with the ability to override specific properties if required (e.g. location). # {{HCatClient.updateTableSchema()}} can now update the entire table-definition, not just the column schema. # I've cleaned up and removed most of the redundancy between the HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table-attributes from the add-table-operation's attributes. By providing fluent-interfaces in HCatTable, and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise, HCatPartition and HCatAddPartitionDesc. I'll post a patch for trunk shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Vasu updated HIVE-7762: - Attachment: HIVE-7762.patch Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.patch HCatalog creates partitions in lower case, whereas getting partitions from HCatalog via the webhcat client doesn't handle this, so the client starts throwing exceptions. Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
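Since HCatalog stores partition-key names in lower case, the workaround the reporter found by hand (and the likely shape of a client-side fix) is simply to lower-case the keys of the partition spec before calling getPartition(). A small self-contained sketch of that normalization (the class name here is illustrative, not part of the HCatClient API):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PartitionSpecNormalizer {
    // Lower-case every key in the partition spec so that mixed-case keys
    // like "Year" match HCatalog's lower-cased partition-key names.
    static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return normalized;
    }
}
```

With this, `client.getPartition(database, table, lowerCaseKeys(partitionSpec))` would succeed regardless of the casing used when the table was declared.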
[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Vasu updated HIVE-7762: - Status: Patch Available (was: Open) Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.patch HCatalog creates partitions in lower case, whereas getting partitions from HCatalog via the webhcat client doesn't handle this, so the client starts throwing exceptions. Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100541#comment-14100541 ] Sushanth Sowmyan commented on HIVE-7068: I agree with Nick and Navis - since this is a first addition, I'm good with getting it in and letting people play with it. A basic look-through suggests it implements the Hive interfaces reasonably well, and I'm +1 for inclusion. Josh, could you please rebase the patch to the current Hive trunk and upload it? (It looks like recent changes caused itests/qtest/pom.xml to not patch cleanly.) I'll commit it once the tests pass with the latest rebase. Integrate AccumuloStorageHandler Key: HIVE-7068 URL: https://issues.apache.org/jira/browse/HIVE-7068 Project: Hive Issue Type: New Feature Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to HBase. Some [initial work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done to support querying an Accumulo table using Hive already. It is not a complete solution as, most notably, the current implementation presently lacks support for INSERTs. I would like to polish up the AccumuloStorageHandler (presently based on 0.10), implement missing basic functionality and compare it to the HBaseStorageHandler (to ensure that we follow the same general usage patterns). I've also been in communication with [~bfem] (the initial author) who expressed interest in working on this again. I hope to coordinate efforts with him. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7738) tez select sum(decimal) from union all of decimal and null throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100550#comment-14100550 ] Hive QA commented on HIVE-7738: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662450/HIVE-7738.3.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5818 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_union_decimal org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/376/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/376/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-376/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12662450 tez select sum(decimal) from union all of decimal and null throws NPE - Key: HIVE-7738 URL: https://issues.apache.org/jira/browse/HIVE-7738 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-7738.2.patch, HIVE-7738.2.patch, HIVE-7738.3.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch If you run this query using the tez engine, hive will throw an NPE {code} select sum(a) from ( select cast(1.1 as decimal) a from dual union all select cast(null as decimal) a from dual ) t; {code} {code} hive> select sum(a) from ( select cast(1.1 as decimal) a from dual union all select cast(null as decimal) a from dual ) t; Query ID = apivovarov_20140814200909_438385b2-4147-47bc-98a0-a01567bbb5c5 Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: application_1407388228332_5616) Map 1: -/- Map 4: -/- Reducer 3: 0/1 Map 1: 0/1 Map 4: 0/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 0/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 1/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 1/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 1/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 1/1 Reducer 3: 0/1 Map 1: 0/1 Map 4: 1/1 Reducer 3: 0/1 Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1407388228332_5616_1_02, diagnostics=[Task failed, taskId=task_1407388228332_5616_1_02_00, diagnostics=[AttemptID:attempt_1407388228332_5616_1_02_00_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:553) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:145) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:164) ... 6 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantHiveDecimalObjectInspector.precision(WritableConstantHiveDecimalObjectInspector.java:61) at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.init(GenericUDAFSum.java:106) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:362) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at
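The root cause visible in the trace is WritableConstantHiveDecimalObjectInspector.precision() dereferencing a null constant (the cast(null as decimal) branch of the union). A hedged sketch of the kind of null guard such an inspector needs — a hypothetical stand-in class, not the actual Hive patch, with assumed default precision/scale values:

```java
import java.math.BigDecimal;

// Hypothetical stand-in for WritableConstantHiveDecimalObjectInspector:
// a constant inspector must tolerate a null constant (e.g. cast(null as decimal))
// instead of dereferencing it unconditionally.
public class ConstantDecimalInspector {
    private static final int DEFAULT_PRECISION = 10; // assumed defaults for the sketch
    private static final int DEFAULT_SCALE = 0;

    private final BigDecimal constant; // may be null

    public ConstantDecimalInspector(BigDecimal constant) {
        this.constant = constant;
    }

    public int precision() {
        // Guard: a null constant falls back to the default instead of an NPE.
        return constant == null ? DEFAULT_PRECISION : constant.precision();
    }

    public int scale() {
        return constant == null ? DEFAULT_SCALE : constant.scale();
    }
}
```

The essential point is only the null check before reading precision/scale; the defaults chosen above are illustrative.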
[jira] [Commented] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100589#comment-14100589 ] Hive QA commented on HIVE-5690: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662452/HIVE-5690.9.patch.txt {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/377/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/377/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-377/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12662452 Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt A single-sourced multi-(insert) query is very useful for various ETL processes, but it does not allow subqueries to be included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7673) Authorization api: missing privilege objects in create table/view
[ https://issues.apache.org/jira/browse/HIVE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7673: Attachment: HIVE-7673.2.patch HIVE-7673.2.patch - patch with test fixes and test updates. Authorization api: missing privilege objects in create table/view - Key: HIVE-7673 URL: https://issues.apache.org/jira/browse/HIVE-7673 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7673.1.patch, HIVE-7673.2.patch Issues being addressed: - In case of create-table-as-select query, the database the table belongs to is not among the objects to be authorized. - Create table has the objectName field of the table entry with the database prefix - like testdb.testtable, instead of just the table name. - checkPrivileges(CREATEVIEW) does not include the name of the view being created in outputHObjs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7673) Authorization api: missing privilege objects in create table/view
[ https://issues.apache.org/jira/browse/HIVE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7673: Status: Patch Available (was: Open) Authorization api: missing privilege objects in create table/view - Key: HIVE-7673 URL: https://issues.apache.org/jira/browse/HIVE-7673 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7673.1.patch, HIVE-7673.2.patch Issues being addressed: - In case of create-table-as-select query, the database the table belongs to is not among the objects to be authorized. - Create table has the objectName field of the table entry with the database prefix - like testdb.testtable, instead of just the table name. - checkPrivileges(CREATEVIEW) does not include the name of the view being created in outputHObjs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100627#comment-14100627 ] Hive QA commented on HIVE-4788: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662453/HIVE-4788.2.patch.txt {color:green}SUCCESS:{color} +1 5820 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-378/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12662453 RCFile and bzip2 compression not working Key: HIVE-4788 URL: https://issues.apache.org/jira/browse/HIVE-4788 Project: Hive Issue Type: Bug Components: Compression Affects Versions: 0.10.0 Environment: CDH4.2 Reporter: Johndee Burks Assignee: Navis Priority: Minor Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt The issue is that Bzip2-compressed rcfile data encounters an error when queried, even with the simplest query (select *). The issue is easily reproducible using the following. Create a table and load the sample data below. DDL: create table source_txt (a string, b string) row format delimited fields terminated by ','; Sample data: apple,sauce Test: Do the following and you should receive the error listed below for the rcfile table with bz2 compression. 
create table rc_nobz2 (a string, b string) stored as rcfile; insert into table rc_nobz2 select * from source_txt; SET io.seqfile.compression.type=BLOCK; SET hive.exec.compress.output=true; SET mapred.compress.map.output=true; SET mapred.output.compress=true; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; create table rc_bz2 (a string, b string) stored as rcfile; insert into table rc_bz2 select * from source_txt; hive> select * from rc_bz2; Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '�' hive> select * from rc_nobz2; apple sauce -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
Chris Dragga created HIVE-7765: -- Summary: Null pointer error with UNION ALL on partitioned tables using Tez Key: HIVE-7765 URL: https://issues.apache.org/jira/browse/HIVE-7765 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1. Reporter: Chris Dragga Priority: Minor When executing a UNION ALL query in Tez over partitioned tables where at least one table is empty, Hive fails to execute the query, returning the message FAILED: NullPointerException null. No stack trace accompanies this message. Removing partitioning solves this problem, as does switching to MapReduce as the execution engine. This can be reproduced using a variant of the example tables from the Getting Started documentation on the Hive wiki. To create the schema, use CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); Then, load invites with data (e.g., using the instructions [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) and execute the following: SELECT * FROM invites UNION ALL SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster
[ https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100698#comment-14100698 ] Hive QA commented on HIVE-7764: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662467/HIVE-7764.1.patch {color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 5727 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.hooks.TestHs2Hooks.org.apache.hadoop.hive.hooks.TestHs2Hooks org.apache.hive.beeline.TestBeeLineWithArgs.org.apache.hive.beeline.TestBeeLineWithArgs org.apache.hive.jdbc.TestJdbcDriver2.org.apache.hive.jdbc.TestJdbcDriver2 org.apache.hive.jdbc.TestJdbcWithMiniHS2.org.apache.hive.jdbc.TestJdbcWithMiniHS2 org.apache.hive.jdbc.TestJdbcWithMiniMr.org.apache.hive.jdbc.TestJdbcWithMiniMr org.apache.hive.jdbc.TestSSL.testConnectionMismatch org.apache.hive.jdbc.TestSSL.testInvalidConfig org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty org.apache.hive.jdbc.TestSSL.testSSLConnectionWithURL org.apache.hive.jdbc.TestSSL.testSSLFetch org.apache.hive.jdbc.TestSSL.testSSLFetchHttp org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.jdbc.miniHS2.TestHiveServer2.testGetVariableValue org.apache.hive.jdbc.miniHS2.TestMiniHS2.testConfInSession org.apache.hive.service.auth.TestCustomAuthentication.org.apache.hive.service.auth.TestCustomAuthentication org.apache.hive.service.auth.TestPlainSaslHelper.testDoAsSetting 
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService org.apache.hive.service.cli.TestScratchDir.testLocalScratchDirs org.apache.hive.service.cli.TestScratchDir.testResourceDirs org.apache.hive.service.cli.TestScratchDir.testScratchDirs org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFile org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFileAndConfOverlay org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFileWithUser org.apache.hive.service.cli.session.TestSessionHooks.testProxyUser org.apache.hive.service.cli.session.TestSessionHooks.testSessionHook org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.org.apache.hive.service.cli.thrift.TestThriftHttpCLIService {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/379/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/379/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-379/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 30 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12662467 Support all JDBC-HiveServer2 authentication modes on a secure cluster - Key: HIVE-7764 URL: https://issues.apache.org/jira/browse/HIVE-7764 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.14.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7764.1.patch Currently, HiveServer2 logs in with its keytab only if hive.server2.authentication is set to KERBEROS. However, hive.server2.authentication is the config that determines the auth type an end user will use while authenticating with HiveServer2. There is a valid use case of a user authenticating with HiveServer2 using LDAP, for example, while HiveServer2 runs the query on a kerberized cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with -e
[ https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100701#comment-14100701 ] Naveen Gangam commented on HIVE-7647: - Would someone be able to review this? Thanks in advance Beeline does not honor --headerInterval and --color when executing with -e Key: HIVE-7647 URL: https://issues.apache.org/jira/browse/HIVE-7647 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.14.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7647.1.patch --showHeader is being honored [root@localhost ~]# beeline --showHeader=false -u 'jdbc:hive2://localhost:1/default' -n hive -d org.apache.hive.jdbc.HiveDriver -e select * from sample_07 limit 10; Connecting to jdbc:hive2://localhost:1/default Connected to: Apache Hive (version 0.12.0-cdh5.0.1) Driver: Hive JDBC (version 0.12.0-cdh5.0.1) Transaction isolation: TRANSACTION_REPEATABLE_READ -hiveconf (No such file or directory) +--+--++-+ | 00- | All Occupations | 135185230 | 42270 | | 11- | Management occupations | 6152650| 100310 | | 11-1011 | Chief executives | 301930 | 160440 | | 11-1021 | General and operations managers | 1697690| 107970 | | 11-1031 | Legislators | 64650 | 37980 | | 11-2011 | Advertising and promotions managers | 36100 | 94720 | | 11-2021 | Marketing managers | 166790 | 118160 | | 11-2022 | Sales managers | 333910 | 110390 | | 11-2031 | Public relations managers| 51730 | 101220 | | 11-3011 | Administrative services managers | 246930 | 79500 | +--+--++-+ 10 rows selected (0.838 seconds) Beeline version 0.12.0-cdh5.1.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection --outputFormat is being honored. 
[root@localhost ~]# beeline --outputFormat=csv -u 'jdbc:hive2://localhost:1/default' -n hive -d org.apache.hive.jdbc.HiveDriver -e select * from sample_07 limit 10; Connecting to jdbc:hive2://localhost:1/default Connected to: Apache Hive (version 0.12.0-cdh5.0.1) Driver: Hive JDBC (version 0.12.0-cdh5.0.1) Transaction isolation: TRANSACTION_REPEATABLE_READ 'code','description','total_emp','salary' '00-','All Occupations','135185230','42270' '11-','Management occupations','6152650','100310' '11-1011','Chief executives','301930','160440' '11-1021','General and operations managers','1697690','107970' '11-1031','Legislators','64650','37980' '11-2011','Advertising and promotions managers','36100','94720' '11-2021','Marketing managers','166790','118160' '11-2022','Sales managers','333910','110390' '11-2031','Public relations managers','51730','101220' '11-3011','Administrative services managers','246930','79500' 10 rows selected (0.664 seconds) Beeline version 0.12.0-cdh5.1.0 by Apache Hive Closing: org.apache.hive.jdbc.HiveConnection both --color --headerInterval are being honored when executing using -f option (reads query from a file rather than the commandline) (cannot really see the color here but use the terminal colors) [root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 -u 'jdbc:hive2://localhost:1/default' -n hive -d org.apache.hive.jdbc.HiveDriver -f /tmp/tmp.sql Connecting to jdbc:hive2://localhost:1/default Connected to: Apache Hive (version 0.12.0-cdh5.0.1) Driver: Hive JDBC (version 0.12.0-cdh5.0.1) Transaction isolation: TRANSACTION_REPEATABLE_READ Beeline version 0.12.0-cdh5.1.0 by Apache Hive 0: jdbc:hive2://localhost select * from sample_07 limit 8; +--+--++-+ | code | description | total_emp | salary | +--+--++-+ | 00- | All Occupations | 135185230 | 42270 | | 11- | Management occupations | 6152650| 100310 | | 11-1011 | Chief executives | 301930 | 160440 | | 11-1021 | General and operations managers | 1697690| 107970 | | 
11-1031 | Legislators | 64650 | 37980 | +--+--++-+ | code | description | total_emp | salary | +--+--++-+ | 11-2011 |
Re: Mail bounces from ebuddy.com
Anyone who is an admin on the list (I don't know who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair mailto:the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, Do you guys know how to remove these two invalid emails from the mailing list ? Lars Francke mailto:lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com Lefty Leverenz mailto:leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com Lefty Leverenz mailto:leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke mailto:lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list? Cheers, Lars -- Sent with Postbox http://www.getpostbox.com -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Mail bounces from ebuddy.com
Thanks, Alan, for the hint. I just unsubscribed those two email addresses from ebuddy. On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote: Anyone who is an admin on the list (I don't know who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, Do you guys know how to remove these two invalid emails from the mailing list ? Lars Francke lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com leftylever...@gmail.com Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com leftylever...@gmail.com Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list? 
Cheers, Lars -- Sent with Postbox http://www.getpostbox.com
[jira] [Commented] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly
[ https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100760#comment-14100760 ] Hive QA commented on HIVE-6934: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662466/HIVE-6934.6.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_boolexpr org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/380/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/380/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-380/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662466 PartitionPruner doesn't handle top level constant expression correctly -- Key: HIVE-6934 URL: https://issues.apache.org/jira/browse/HIVE-6934 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch You hit this error indirectly, because of how we handle invalid constant comparisons. 
Consider: {code} create table x(key int, value string) partitioned by (dt int, ts string); -- both these queries hit this issue select * from x where key = 'abc'; select * from x where dt = 'abc'; -- the issue is the comparison gets converted to the constant false -- and the PartitionPruner doesn't handle top level constant exprs correctly {code} Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for HIVE-5376 -- This message was sent by Atlassian JIRA (v6.2#6252)
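The comment in the {code} block above describes the failure mode: constant folding collapses the invalid comparison to the constant false, and the pruner then chokes on a top-level constant expression. A hedged sketch of handling that degenerate case before evaluating a real predicate (hypothetical names and types, not Hive's actual ExprNodeDesc/PartitionPruner API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ConstantAwarePruner {
    // Prune a partition list against a filter expression, special-casing the
    // degenerate results of constant folding: a bare true/false constant.
    public static List<String> prune(List<String> partitions, Object expr) {
        if (Boolean.FALSE.equals(expr)) {
            return new ArrayList<>();           // constant false: no partition matches
        }
        if (Boolean.TRUE.equals(expr) || expr == null) {
            return new ArrayList<>(partitions); // constant true / no filter: keep all
        }
        @SuppressWarnings("unchecked")
        Predicate<String> p = (Predicate<String>) expr; // a real expression
        List<String> kept = new ArrayList<>();
        for (String part : partitions) {
            if (p.test(part)) kept.add(part);
        }
        return kept;
    }
}
```

The point is only the ordering: check for a folded boolean constant first, and never hand a constant to the expression walker.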
[jira] [Created] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]
Brock Noland created HIVE-7766: -- Summary: Cleanup Reduce operator code [Spark Branch] Key: HIVE-7766 URL: https://issues.apache.org/jira/browse/HIVE-7766 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100762#comment-14100762 ] Sergio Peña commented on HIVE-7373: --- There is a problem with how the factor value is stored when serializing the value 0: serializing 0 deserialized back as 0.00. The bug is that the factor was serialized unchanged only when the sign was 1 (positive); for the other signs, 0 and negative, the factor was negated. Then, in the deserialize function, the factor was read back unchanged for positive and zero values, and only a negative value negated the factor back.

(serialize)
int sign = dec.compareTo(HiveDecimal.ZERO);
int factor = dec.precision() - dec.scale();
factor = sign == 1 ? factor : -factor; (BUG)
writeByte(buffer, (byte) (sign + 1), invert);

(deserialize)
int b = buffer.read(invert) - 1;
boolean positive = b != -1;
if (!positive) { factor = -factor; }

Here's a data example of the bug (length=1; scale = factor - length):

value | type | serialized factor | deserialized factor | scale
-1.0 | decimal(1,1) | 0 -> -0 | 0 | 0-1 (-1)
-1 | decimal(1,0) | 1 -> -1 | 1 | 1-1 (0)
0 | decimal(1,0) | 1 -> -1 | -1 | -1-1 (-2) BUG
0.0 | decimal(1,1) | 0 -> -0 | -0 | 0-1 (-1)
1 | decimal(1,0) | 1 -> 1 | 1 | 1-1 (0)
1.0 | decimal(1,1) | 0 -> 0 | 0 | 0-1 (-1)

And with the fix on serialize: factor = sign != -1 ? factor : -factor; (FIX)

value | type | serialized factor | deserialized factor | scale
-1.0 | decimal(1,1) | 0 -> -0 | 0 | 0-1 (-1)
-1 | decimal(1,0) | 1 -> -1 | 1 | 1-1 (0)
0 | decimal(1,0) | 1 -> 1 | 1 | 1-1 (0) FIX
0.0 | decimal(1,1) | 0 -> 0 | 0 | 0-1 (-1)
1 | decimal(1,0) | 1 -> 1 | 1 | 1-1 (0)
1.0 | decimal(1,1) | 0 -> 0 | 0 | 0-1 (-1)

Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.140 has a different semantic meaning from the number 3.14. Removing trailing zeroes loses that meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeroes (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
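The serialize/deserialize logic described in the comment above can be exercised with a self-contained round trip. This is a sketch only: it uses java.math.BigDecimal, whose precision/scale conventions do not exactly match HiveDecimal, but the sign/factor handling mirrors the description, and the commented line marks where the fix replaces the buggy condition:

```java
import java.math.BigDecimal;

public class DecimalFactorRoundTrip {
    // Serialize side: encode (sign, factor) as described in the comment.
    // The fix negates factor only for genuinely negative numbers (sign == -1),
    // not for zero, so that 0 round-trips with a consistent factor.
    static int[] serialize(BigDecimal dec) {
        int sign = dec.compareTo(BigDecimal.ZERO); // -1, 0, or 1
        int factor = dec.precision() - dec.scale();
        factor = sign != -1 ? factor : -factor;    // FIX (buggy version used sign == 1)
        return new int[] { sign + 1, factor };     // sign stored as the byte sign + 1
    }

    // Deserialize side: only a stored negative sign negates factor back.
    static int deserializeFactor(int[] encoded) {
        int b = encoded[0] - 1;
        boolean positive = b != -1;
        int factor = encoded[1];
        return positive ? factor : -factor;
    }

    public static void main(String[] args) {
        // With the fix, 0 serializes factor=1 and deserializes factor=1,
        // matching the FIX row of the table above.
        System.out.println(deserializeFactor(serialize(new BigDecimal("0")))); // prints 1
    }
}
```

Changing the FIX line back to `sign == 1 ? factor : -factor` reproduces the buggy row: 0 would come back with factor -1 and hence the wrong scale.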
[jira] [Updated] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7766: --- Attachment: HIVE-7766.1-spark.patch Cleanup Reduce operator code [Spark Branch] --- Key: HIVE-7766 URL: https://issues.apache.org/jira/browse/HIVE-7766 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li Attachments: HIVE-7766.1-spark.patch This patch https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch over on HIVE-7624. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100769#comment-14100769 ] Brock Noland commented on HIVE-7624: Hi Rui, Hive generally follows one commit = one jira so I moved your patch over to HIVE-7766 and committed it. Thank you!! Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, HIVE-7624.7-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland resolved HIVE-7766. Resolution: Fixed Fix Version/s: spark-branch Thank you for your contribution Rui! I have committed this to spark! Cleanup Reduce operator code [Spark Branch] --- Key: HIVE-7766 URL: https://issues.apache.org/jira/browse/HIVE-7766 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-7766.1-spark.patch This patch https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch over on HIVE-7624. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark
[ https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7624: --- Resolution: Fixed Status: Resolved (was: Patch Available) Reduce operator initialization failed when running multiple MR query on spark - Key: HIVE-7624 URL: https://issues.apache.org/jira/browse/HIVE-7624 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, HIVE-7624.7-spark.patch, HIVE-7624.patch The following error occurs when I try to run a query with multiple reduce works (M-R-R): {quote} 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1) java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53) at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) … {quote} I suspect we're applying the reduce function in wrong order. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7528) Support cluster by and distributed by
[ https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7528: --- Attachment: HIVE-7528.1-spark.patch Re-uploading the same patch under a name which allows pre-commit tests to run. Support cluster by and distributed by - Key: HIVE-7528 URL: https://issues.apache.org/jira/browse/HIVE-7528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-7528.1-spark.patch, HIVE-7528.spark.patch clustered by = distributed by + sort by, so this is related to HIVE-7527. If sort by is in place, the assumption is that we don't need to do anything about distributed by or clustered by. Still, we need to confirm and verify. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7729: --- Summary: Enable q-tests for ANALYZE TABLE feature [Spark Branch] (was: Enable q-tests for ANALYZE TABLE feature.) Enable q-tests for ANALYZE TABLE feature [Spark Branch] --- Key: HIVE-7729 URL: https://issues.apache.org/jira/browse/HIVE-7729 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Enable q-tests for ANALYZE TABLE feature since automatic test environment is ready. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100766#comment-14100766 ] Brock Noland commented on HIVE-7766: +1 tests passed over on HIVE-7624. Cleanup Reduce operator code [Spark Branch] --- Key: HIVE-7766 URL: https://issues.apache.org/jira/browse/HIVE-7766 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li Attachments: HIVE-7766.1-spark.patch This patch https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch over on HIVE-7624. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7766: --- Description: This patch https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch over on HIVE-7624. Cleanup Reduce operator code [Spark Branch] --- Key: HIVE-7766 URL: https://issues.apache.org/jira/browse/HIVE-7766 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Rui Li Attachments: HIVE-7766.1-spark.patch This patch https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch over on HIVE-7624. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator [SparkBranch]
[ https://issues.apache.org/jira/browse/HIVE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7761: --- Summary: Failed to analyze stats with CounterStatsAggregator [SparkBranch] (was: Failed to analyze stats with CounterStatsAggregator.[SparkBranch]) Failed to analyze stats with CounterStatsAggregator [SparkBranch] - Key: HIVE-7761 URL: https://issues.apache.org/jira/browse/HIVE-7761 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li CounterStatsAggregator analyzes stats with MR counters; we need to implement another CounterStatsAggregator based on Spark-specific counters to analyze table stats. Here is the error information: {noformat} 2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator (CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for null java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask at org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46) at org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282) at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142) at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
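The ClassCastException above comes from an unconditional downcast of the source task to MapRedTask. A defensive shape of that connect logic can be sketched like this (minimal stand-in classes; Task, MapRedTask, and SparkTask here are hypothetical stubs, not the real Hive types):

```java
public class StatsAggregatorGuardDemo {
    // Minimal stand-ins for the real Hive task hierarchy (hypothetical).
    static class Task {}
    static class MapRedTask extends Task {}
    static class SparkTask extends Task {}

    // Check the runtime type before casting instead of casting unconditionally;
    // a SparkTask would need a Spark-specific counter-based aggregator instead.
    static boolean connect(Task sourceTask) {
        if (sourceTask instanceof MapRedTask) {
            MapRedTask mrTask = (MapRedTask) sourceTask; // safe: type checked above
            return true;  // here the real code would fetch the MR Job counters
        }
        return false;     // not an MR task: cannot use the MR counter aggregator
    }

    public static void main(String[] args) {
        System.out.println(connect(new MapRedTask())); // true
        System.out.println(connect(new SparkTask()));  // false, no ClassCastException
    }
}
```

The unconditional cast fails at runtime for SparkTask; the instanceof guard degrades gracefully, which matches the direction the report suggests (a separate counter aggregator for Spark).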
[jira] [Updated] (HIVE-7763) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7763: --- Summary: Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch] (was: Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch]) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7763) Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100774#comment-14100774 ] Brock Noland commented on HIVE-7763: +1 Failed to qeury TABLESAMPLE on empty bucket table.[Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7763) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7763: --- Resolution: Fixed Status: Resolved (was: Patch Available) Chengxiang I have committed this to spark! Thank you very much for your contribution!! Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7757) PTest2 separates test files with spaces while QTestGen uses commas
[ https://issues.apache.org/jira/browse/HIVE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7757: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for the review! I have committed this to trunk. PTest2 separates test files with spaces while QTestGen uses commas -- Key: HIVE-7757 URL: https://issues.apache.org/jira/browse/HIVE-7757 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.14.0 Attachments: HIVE-7757.1.patch I noticed in HIVE-7749 that even after the testconfiguration.properties file is updated TestSparkCliDriver is not being generated correctly. Basically it doesn't include any tests. The issue appears to be that in the pom file properties are separated by comma and the PTest2 properties files are separated by spaces. Since both comma and space are not used in the qtest properties files let's update all parsing code to use both comma and space. -- This message was sent by Atlassian JIRA (v6.2#6252)
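The fix proposed above, accepting both comma and space as separators, can be sketched as follows (a hypothetical helper, not the actual PTest2 or QTestGen parsing code):

```java
import java.util.Arrays;
import java.util.List;

public class TestFileListParser {
    // Split a test-file list on commas and/or whitespace, so both
    // "a.q,b.q" (QTestGen/pom style) and "a.q b.q" (PTest2 style) parse alike.
    static List<String> parse(String value) {
        return Arrays.asList(value.trim().split("[,\\s]+"));
    }

    public static void main(String[] args) {
        System.out.println(parse("udf1.q,udf2.q"));  // [udf1.q, udf2.q]
        System.out.println(parse("udf1.q udf2.q"));  // [udf1.q, udf2.q]
        System.out.println(parse("udf1.q, udf2.q")); // [udf1.q, udf2.q]
    }
}
```

Since neither comma nor space can appear inside a qfile name, the combined character class `[,\s]+` handles both conventions (and mixed ones) with a single rule.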
[jira] [Updated] (HIVE-7763) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7763: --- Assignee: Chengxiang Li (was: Brock Noland) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7763) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland reassigned HIVE-7763: -- Assignee: Brock Noland (was: Chengxiang Li) Failed to qeury TABLESAMPLE on empty bucket table [Spark Branch] Key: HIVE-7763 URL: https://issues.apache.org/jira/browse/HIVE-7763 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Brock Noland Attachments: HIVE-7763.1-spark.patch Get the following exception: {noformat} 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100781#comment-14100781 ] Brock Noland commented on HIVE-7702: After looking at this more, I think we should start with the 100 or so tests that tez executes: https://github.com/apache/hive/blob/spark/itests/src/test/resources/testconfiguration.properties#L49 Start running .q file tests on spark Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Spark can currently only support a few queries; however, there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time like so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Mail bounces from ebuddy.com
Thanks Alan and Ashutosh for taking care of this! On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org wrote: Thanks, Alan for the hint. I just unsubscribed those two email addresses from ebuddy. On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote: Anyone who is an admin on the list (I don't who the admins are) can do this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where USERNAME is the name of the bouncing user (see http://untroubled.org/ezmlm/ezman/ezman1.html ) Alan. Thejas Nair the...@hortonworks.com August 17, 2014 at 17:02 I don't know how to do this. Carl, Ashutosh, Do you guys know how to remove these two invalid emails from the mailing list ? Lars Francke lars.fran...@gmail.com August 17, 2014 at 15:41 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA but I'm not sure if they are even needed or if someone from the Hive team can do this? On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com leftylever...@gmail.com Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:43 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail concatenates them so I didn't notice the second one. -- Lefty On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com leftylever...@gmail.com Lefty Leverenz leftylever...@gmail.com August 7, 2014 at 18:36 Curious, I've only been getting one bounce per message. Anyway thanks for bringing this up. -- Lefty Lars Francke lars.fran...@gmail.com August 7, 2014 at 4:38 Hi, every time I send a mail to dev@ I get two bounce mails from two people at ebuddy.com. I don't want to post the E-Mail addresses publicly but I can send them on if needed (and it can be triggered easily by just replying to this mail I guess). Could we maybe remove them from the list? 
Cheers, Lars
[jira] [Assigned] (HIVE-7747) Spark: Submitting a query to Spark from HiveServer2 fails
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti reassigned HIVE-7747: - Assignee: Venki Korukanti Spark: Submitting a query to Spark from HiveServer2 fails - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated HIVE-7747: -- Summary: Submitting a query to Spark from HiveServer2 fails [Spark Branch] (was: Spark: Submitting a query to Spark from HiveServer2 fails) Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works fine from the Hive CLI. Spark tasks fail with the following error: {code}
Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers
[ https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100813#comment-14100813 ] Brock Noland commented on HIVE-7373: Nice. I am +1 on this change. Hive should not remove trailing zeros for decimal numbers - Key: HIVE-7373 URL: https://issues.apache.org/jira/browse/HIVE-7373 Project: Hive Issue Type: Bug Components: Types Affects Versions: 0.13.0, 0.13.1 Reporter: Xuefu Zhang Assignee: Sergio Peña Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch Currently Hive blindly removes trailing zeros of a decimal input number as a sort of standardization. This is questionable in theory and problematic in practice. 1. In a decimal context, the number 3.14 has a different semantic meaning from the number 3.140. Removing trailing zeros loses this meaning. 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing zeros, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a decimal column of (1, 1), inputs such as 0.0, 0.00, and so on become NULL because the column doesn't allow a decimal number with an integer part. Therefore, I propose Hive preserve the trailing zeros (up to what the scale allows). With this, in the above example, 0.0, 0.00, and 0. will be represented as 0.0 (precision=1, scale=1) internally. -- This message was sent by Atlassian JIRA (v6.2#6252)
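The precision/scale effect described in the issue can be reproduced with Python's decimal module — an illustrative analogy, not Hive code: normalize() strips trailing zeros and changes the exponent, which is exactly the (p, s) information the issue proposes to preserve.

```python
from decimal import Decimal

# Trailing zeros carry scale information: 3.140 has scale 3, 3.14 has scale 2.
a = Decimal("3.140")
b = a.normalize()              # strips trailing zeros, like Hive's old behavior
print(a.as_tuple().exponent)   # -3, i.e. scale 3
print(b.as_tuple().exponent)   # -2, i.e. scale 2

# In the extreme case, 0.00 loses its fractional scale entirely,
# so it no longer fits a (1, 1) decimal column.
z = Decimal("0.00").normalize()
print(z)                       # the (1, 1) scale information is gone
```

The analogy is loose (Python decimals are arbitrary precision, Hive decimals are bounded by the column's (p, s)), but the information loss is the same.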
[jira] [Updated] (HIVE-7739) TestSparkCliDriver should not use includeQueryFiles [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7739: --- Summary: TestSparkCliDriver should not use includeQueryFiles [Spark Branch] (was: TestSparkCliDriver should not use includeQueryFiles) TestSparkCliDriver should not use includeQueryFiles [Spark Branch] -- Key: HIVE-7739 URL: https://issues.apache.org/jira/browse/HIVE-7739 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-7739.1-spark.patch Because TestSparkCliDriver uses includeQueryFiles, it cannot be driven by -Dqfile or -Dqfile_regex. These options are very useful, so let's remove the restriction. spark.query.files in testconfiguration.properties will still be used to generate -Dqfiles when run via the pre-commit tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7709) Create SparkReporter [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7709: --- Summary: Create SparkReporter [Spark Branch] (was: Create SparkReporter) Create SparkReporter [Spark Branch] --- Key: HIVE-7709 URL: https://issues.apache.org/jira/browse/HIVE-7709 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Hive operators use Reporter to collect global information. In Hive on Spark mode, we need a new Reporter implementation that collects Hive operator-level information based on Spark-specific Counters. This task depends on HIVE-7551. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7525) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7525: --- Summary: Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] (was: Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] Key: HIVE-7525 URL: https://issues.apache.org/jira/browse/HIVE-7525 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit multiple Spark jobs concurrently using a shared SparkContext. SparkClient's code can be manipulated for this test. Here is the process: 1. Transform rdd1 to rdd2 using some transformation. 2. Call rdd2.cache() to persist it in memory. 3. In two threads, run accordingly: Thread a: rdd2 -> rdd3; rdd3.foreach(). Thread b: rdd2 -> rdd4; rdd4.foreach(). It would also be nice to find out about the monitoring and error-reporting aspects. -- This message was sent by Atlassian JIRA (v6.2#6252)
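Without a Spark cluster at hand, the fan-out pattern in steps 1-3 can be sketched in plain Python: one shared, "cached" intermediate result consumed concurrently by two threads. All names here (rdd1 through rdd4, job) are hypothetical stand-ins; the real test would use a shared SparkContext with RDD transformations and a foreach action.

```python
import threading

# Step 1: transform rdd1 to rdd2 using some transformation.
rdd1 = list(range(10))
rdd2 = [x * 2 for x in rdd1]   # step 2: in Spark this is where rdd2.cache() goes

results = {}

def job(name, transform):
    # Steps 3a/3b: each thread derives its own dataset from the shared rdd2
    # and runs an action on it (foreach in the Spark version, sum here).
    results[name] = sum(transform(x) for x in rdd2)

a = threading.Thread(target=job, args=("rdd3", lambda x: x + 1))
b = threading.Thread(target=job, args=("rdd4", lambda x: x * x))
a.start(); b.start()
a.join(); b.join()
print(results)
```

The open question in the JIRA is whether the Spark scheduler handles such concurrent submissions against one SparkContext; the sketch only illustrates the shape of the experiment.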
[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7702: --- Summary: Start running .q file tests on spark [Spark Branch] (was: Start running .q file tests on spark) Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Spark can currently only support a few queries, however there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat}
mvn clean install -DskipTests -Phadoop-2
cd itests
mvn clean install -DskipTests -Phadoop-2
cd qtest-spark
mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} which would generate XXX.q.out, which we can check in to source control as a golden file. Multiple tests can be run at a given time as so: {noformat}
mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
{noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7674) Update to Spark 1.1 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7674: --- Summary: Update to Spark 1.1 [Spark Branch] (was: Update to Spark 1.1) Update to Spark 1.1 [Spark Branch] -- Key: HIVE-7674 URL: https://issues.apache.org/jira/browse/HIVE-7674 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Priority: Blocker In HIVE-7540 we added a custom repo to use Spark 1.1. Once 1.1 is released we need to remove this repo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7640) Support Hive TABLESAMPLE [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7640: --- Summary: Support Hive TABLESAMPLE [Spark Branch] (was: Support Hive TABLESAMPLE) Support Hive TABLESAMPLE [Spark Branch] --- Key: HIVE-7640 URL: https://issues.apache.org/jira/browse/HIVE-7640 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Research and verify TABLESAMPLE support in Hive on Spark, and research whether it can be merged with Spark sample features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7528) Support cluster by and distributed by [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7528: --- Summary: Support cluster by and distributed by [Spark Branch] (was: Support cluster by and distributed by) Support cluster by and distributed by [Spark Branch] Key: HIVE-7528 URL: https://issues.apache.org/jira/browse/HIVE-7528 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Rui Li Attachments: HIVE-7528.1-spark.patch, HIVE-7528.spark.patch clustered by = distributed by + sort by, so this is related to HIVE-7527. If sort by is in place, the assumption is that we don't need to do anything about distributed by or clustered by. Still, we need to confirm and verify. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7597) Support analyze table [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7597: --- Summary: Support analyze table [Spark Branch] (was: Support analyze table) Support analyze table [Spark Branch] Key: HIVE-7597 URL: https://issues.apache.org/jira/browse/HIVE-7597 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7597.1-spark.patch, HIVE-7597.2-spark.patch, HIVE-7597.3-spark.patch Both MR and Tez have a visitor processing the analyze table ... command. We cloned the code from Tez, but may need to make it fit for Spark, verify, and test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7614) Find solution for closures containing writables [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7614: --- Summary: Find solution for closures containing writables [Spark Branch] (was: Find solution for closures containing writables) Find solution for closures containing writables [Spark Branch] -- Key: HIVE-7614 URL: https://issues.apache.org/jira/browse/HIVE-7614 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Priority: Blocker HIVE-7540 performed a workaround so we could serialize closures with Writables. However, we need a long term solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7708) Fix qtest-spark pom.xml reference to test properties [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7708: --- Summary: Fix qtest-spark pom.xml reference to test properties [Spark Branch] (was: Fix qtest-spark pom.xml reference to test properties) Fix qtest-spark pom.xml reference to test properties [Spark Branch] --- Key: HIVE-7708 URL: https://issues.apache.org/jira/browse/HIVE-7708 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-7708.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7675: --- Summary: Implement native HiveMapFunction [Spark Branch] (was: Implement native HiveMapFunction) Implement native HiveMapFunction [Spark Branch] --- Key: HIVE-7675 URL: https://issues.apache.org/jira/browse/HIVE-7675 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Currently, Hive on Spark depends on ExecMapper to execute operator logic; the full stack looks like: Spark framework => HiveMapFunction => ExecMapper => Hive operators. HiveMapFunction is just a thin wrapper of ExecMapper, which introduces several problems: # ExecMapper is designed for MR's single-process task mode; it does not work well under Spark's multi-threaded task mode. # ExecMapper introduces extra API-level restrictions and processing logic. We need to implement a native HiveMapFunction as the bridge between the Spark framework and Hive operators. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7593: --- Summary: Instantiate SparkClient per user session [Spark Branch] (was: Instantiate SparkClient per user session) Instantiate SparkClient per user session [Spark Branch] --- Key: HIVE-7593 URL: https://issues.apache.org/jira/browse/HIVE-7593 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-7593-spark.patch SparkContext is the main class via which Hive talks to the Spark cluster. SparkClient encapsulates a SparkContext instance. Currently all user sessions share a single SparkClient instance in HiveServer2. While this is good enough for a POC, even for our first two milestones, this is not desirable for a multi-tenancy environment and gives the least flexibility to Hive users. Here is what we propose: 1. Have a SparkClient instance per user session. The SparkClient instance is created when the user executes the first query in the session. It will get destroyed when the user session ends. 2. The SparkClient is instantiated based on the Spark configurations that are available to the user, including those defined at the global level and those overwritten by the user (through the set command, for instance). 3. Ideally, when the user changes any Spark configuration during the session, the old SparkClient instance should be destroyed and a new one based on the new configurations is created. This may turn out to be a little hard, and thus it's a nice-to-have. If not implemented, we need to document that subsequent configuration changes will not take effect in the current session. Please note that there is a thread-safety issue on the Spark side where multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need to work with the Spark community to get this addressed. Besides the above functional requirements, avoiding potential issues is also a consideration. 
For instance, sharing SC among users is bad, as resources (such as jars for UDFs) will also be shared, which is problematic. On the other hand, one SC per job seems too expensive, as the resources need to be re-rendered even when there isn't any change. -- This message was sent by Atlassian JIRA (v6.2#6252)
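The lifecycle in points 1-3 above can be sketched as a session-keyed registry. This is a minimal sketch with hypothetical names (SparkClient here is a stand-in, not Hive's class), and it ignores SPARK-2243's one-context-per-JVM limit, which the real design must address:

```python
class SparkClient:
    """Stand-in for the real client; built from a snapshot of the session's config."""
    def __init__(self, conf):
        self.conf = dict(conf)
        self.closed = False
    def close(self):
        self.closed = True

class SessionManager:
    def __init__(self):
        self._clients = {}
    def client_for(self, session_id, conf):
        # Point 1: lazily create one client per session on its first query.
        # Point 3: if the session's config changed, tear down and rebuild.
        existing = self._clients.get(session_id)
        if existing is None or existing.conf != conf:
            if existing is not None:
                existing.close()
            self._clients[session_id] = SparkClient(conf)
        return self._clients[session_id]
    def end_session(self, session_id):
        # Point 1 (teardown): destroy the client when the session ends.
        client = self._clients.pop(session_id, None)
        if client is not None:
            client.close()
```

Comparing config snapshots on every query is the simple way to get point 3's "recreate on change" behavior; documenting that changes don't take effect mid-session corresponds to skipping that comparison.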
[jira] [Updated] (HIVE-7559) StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7559: --- Summary: StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch] (was: StarterProject: Move configuration from SparkClient to HiveConf) StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch] -- Key: HIVE-7559 URL: https://issues.apache.org/jira/browse/HIVE-7559 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Priority: Minor Labels: StarterProject The SparkClient class has some configuration keys and defaults. These should be moved to HiveConf. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7665: --- Summary: Create TestSparkCliDriver to run test in spark local mode [Spark Branch] (was: Create TestSparkCliDriver to run test in spark local mode) Create TestSparkCliDriver to run test in spark local mode [Spark Branch] Key: HIVE-7665 URL: https://issues.apache.org/jira/browse/HIVE-7665 Project: Hive Issue Type: Sub-task Components: Spark, Testing Infrastructure Reporter: Szehon Ho Assignee: Szehon Ho Fix For: spark-branch Attachments: HIVE-7665-spark.patch, HIVE-7665.2-spark.patch, HIVE-7665.3-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7561) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7561: --- Summary: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch] (was: StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch] - Key: HIVE-7561 URL: https://issues.apache.org/jira/browse/HIVE-7561 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Labels: StarterProject Fix For: spark-branch Attachments: HIVE-7561-spark.patch, HIVE-7561.2-spark.patch, HIVE-7561.3-spark.patch Hive uses the assert keyword all over the place. The problem is that assertions are rarely in effect, since they have to be explicitly enabled at runtime. In the Spark code, e.g. GenSparkUtils, let's use Preconditions.*. -- This message was sent by Atlassian JIRA (v6.2#6252)
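Python has the same pitfall as Java's assert keyword: assert statements vanish under python -O, while an explicit check in the style of Guava's Preconditions.checkArgument always runs. This is an illustrative analogy, not the Hive patch itself:

```python
def set_sample_ratio(ratio):
    # assert 0.0 <= ratio <= 1.0   # silently skipped when run with python -O,
    #                              # just as Java asserts are skipped without -ea
    # Preconditions.checkArgument-style check: always enforced.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1], got %r" % ratio)
    return ratio
```

The Guava call compresses the if/raise into one line (`Preconditions.checkArgument(ratio >= 0 && ratio <= 1)`), which is why it reads as a drop-in replacement for assert.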
[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7580: --- Summary: Support dynamic partitioning [Spark Branch] (was: Support dynamic partitioning) Support dynamic partitioning [Spark Branch] --- Key: HIVE-7580 URL: https://issues.apache.org/jira/browse/HIVE-7580 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam My understanding is that we don't need to do anything special for this. However, this needs to be verified and tested. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7569) Make sure multi-MR queries work [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7569: --- Summary: Make sure multi-MR queries work [Spark Branch] (was: Make sure multi-MR queries work) Make sure multi-MR queries work [Spark Branch] -- Key: HIVE-7569 URL: https://issues.apache.org/jira/browse/HIVE-7569 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao With the latest dev effort, queries that would involve multiple MR jobs should be supported by Spark now, except for sorting, multi-insert, union, and join (map join and SMB might just work). However, this hasn't been verified and tested. This task is to ensure this is the case. Please create JIRAs for problems found. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7560) StarterProject: Fix exception handling in POC code [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7560: --- Summary: StarterProject: Fix exception handling in POC code [Spark Branch] (was: StarterProject: Fix exception handling in POC code) StarterProject: Fix exception handling in POC code [Spark Branch] - Key: HIVE-7560 URL: https://issues.apache.org/jira/browse/HIVE-7560 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Brock Noland Assignee: Chao Labels: StarterProject Fix For: spark-branch Attachments: HIVE-7560.1-spark.patch The POC code just printed exceptions to stderr. We should either: 1) Log at INFO/WARN/ERROR 2) Or rethrow (perhaps wrapped in a runtime exception) anything that is a fatal error -- This message was sent by Atlassian JIRA (v6.2#6252)
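The two options above — log at a real level, or rethrow fatal errors wrapped in a runtime exception — can be sketched like this (submit_job and the logger name are hypothetical, not the actual POC code):

```python
import logging

log = logging.getLogger("spark-client")

def submit_job(job):
    try:
        return job()
    except IOError as e:
        # Option 1: log at an appropriate level instead of printing to stderr.
        log.error("job submission failed: %s", e)
        # Option 2: rethrow fatal errors wrapped in a runtime exception,
        # keeping the original as the cause instead of swallowing it.
        raise RuntimeError("fatal error submitting job") from e
```

The key point of either option is that the failure stays visible to the caller or the operator, rather than disappearing into a stderr printout.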
[jira] [Updated] (HIVE-7465) Implement pre-commit testing [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7465: --- Summary: Implement pre-commit testing [Spark Branch] (was: Implement pre-commit testing) Implement pre-commit testing [Spark Branch] --- Key: HIVE-7465 URL: https://issues.apache.org/jira/browse/HIVE-7465 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Brock Noland Fix For: spark-branch Attachments: HIVE-7465-spark.patch, HIVE-7465-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7503) Support Hive's multi-table insert query with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7503: --- Summary: Support Hive's multi-table insert query with Spark [Spark Branch] (was: Support Hive's multi-table insert query with Spark) Support Hive's multi-table insert query with Spark [Spark Branch] - Key: HIVE-7503 URL: https://issues.apache.org/jira/browse/HIVE-7503 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chao For Hive's multi-insert query (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there may be an MR job for each insert. When we achieve this with Spark, it would be nice if all the inserts can happen concurrently. It seems that this functionality isn't available in Spark. To make things worse, the source of the insert may be re-computed unless it's staged. Even with this, the inserts will happen sequentially, making the performance suffer. This task is to find out what it takes in Spark to enable this without requiring staging of the source and sequential insertion. If this has to be solved in Hive, find out an optimal way to do it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7439: --- Summary: Spark job monitoring and error reporting [Spark Branch] (was: Spark job monitoring and error reporting) Spark job monitoring and error reporting [Spark Branch] --- Key: HIVE-7439 URL: https://issues.apache.org/jira/browse/HIVE-7439 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li After Hive submits a job to the Spark cluster, we need to report job progress, such as the percentage done, to the user. This is especially important for long-running queries. Moreover, if there is an error during job submission or execution, it's also crucial for Hive to fetch the error log and/or stacktrace and feed it back to the user. Please refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7541) Support union all on Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7541: --- Summary: Support union all on Spark [Spark Branch] (was: Support union all on Spark) Support union all on Spark [Spark Branch] - Key: HIVE-7541 URL: https://issues.apache.org/jira/browse/HIVE-7541 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Na Yang Fix For: spark-branch Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, HIVE-7541.3-spark.patch, HIVE-7541.4-spark.patch, HIVE-7541.5-spark.patch, Hive on Spark Union All design.pdf For union all operator, we will use Spark's union transformation. Refer to the design doc on wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)