[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7624:
-

Status: Patch Available  (was: Reopened)

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, 
 HIVE-7624.7-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 stages (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.
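
The "cannot find field reducesinkkey0 from [0:_col0]" cause fits that suspicion: the reducer is initialized expecting the shuffle key schema, but the rows it receives carry the previous stage's output schema. A minimal illustrative sketch of that field lookup (not Hive's actual code; names are stand-ins):

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the lookup that fails in
// ObjectInspectorUtils.getStandardStructFieldRef: the reducer expects the
// shuffle key field "reducesinkkey0", but the incoming rows only have the
// previous stage's output column "_col0", so the lookup throws.
public class FieldLookupSketch {
    static String findField(List<String> schema, String name) {
        for (String f : schema) {
            if (f.equalsIgnoreCase(name)) {
                return f;
            }
        }
        throw new RuntimeException("cannot find field " + name + " from " + schema);
    }

    public static void main(String[] args) {
        // Rows from the wrong (out-of-order) stage carry this schema.
        List<String> wrongSchema = Arrays.asList("_col0");
        try {
            findField(wrongSchema, "reducesinkkey0");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```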



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7624:
-

Attachment: HIVE-7624.7-spark.patch



[jira] [Commented] (HIVE-7757) PTest2 separates test files with spaces while QTestGen uses commas

2014-08-18 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100314#comment-14100314
 ] 

Szehon Ho commented on HIVE-7757:
-

+1

 PTest2 separates test files with spaces while QTestGen uses commas
 --

 Key: HIVE-7757
 URL: https://issues.apache.org/jira/browse/HIVE-7757
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7757.1.patch


 I noticed in HIVE-7749 that even after the testconfiguration.properties file 
 is updated, TestSparkCliDriver is not generated correctly: it doesn't include 
 any tests. The issue appears to be that properties in the pom file are 
 separated by commas while the PTest2 properties files are separated by 
 spaces. Since neither commas nor spaces appear in the qtest file names, 
 let's update all parsing code to accept both commas and spaces as separators.
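
A minimal sketch of that proposal, assuming a split on a comma-or-whitespace character class (class and method names below are illustrative, not Hive's actual parsing code):

```java
import java.util.Arrays;

// Treat both commas and spaces as separators when parsing test lists, so a
// pom-style (comma-separated) value and a PTest2-style (space-separated)
// value parse identically.
public class TestListParser {
    static String[] parseTestList(String value) {
        // Split on any run of commas and/or whitespace; trim() avoids a
        // leading empty token.
        return value.trim().split("[,\\s]+");
    }

    public static void main(String[] args) {
        String fromPom = "join1.q,join2.q,join3.q";
        String fromPTest2 = "join1.q join2.q join3.q";
        System.out.println(Arrays.toString(parseTestList(fromPom)));
        System.out.println(Arrays.toString(parseTestList(fromPTest2)));
        // Both forms yield the same three-element list.
    }
}
```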





[jira] [Comment Edited] (HIVE-6144) Implement non-staged MapJoin

2014-08-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100309#comment-14100309
 ] 

Lefty Leverenz edited comment on HIVE-6144 at 8/18/14 6:06 AM:
---

Review request:  *hive.auto.convert.join.use.nonstaged* has been added to the 
section Optimize Auto Join Conversion in a version-0.13.0 box.  Is that the 
right place for it?  Could we have some examples and guidance on when to use it?

* [Optimize Auto Join Conversion | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]

Also in that section, I changed the value of 
*hive.auto.convert.join.noconditionaltask.size* to match the default (1000) 
-- it had been 1, which seemed rather small, but if that value was intended 
please let me know.

Edit: Should this information from the parameter description be included in 
the version-0.13.0 box in Optimize Auto Join Conversion? -- "Currently, this 
is not working with vectorization or Tez execution engine."



 Implement non-staged MapJoin
 

 Key: HIVE-6144
 URL: https://issues.apache.org/jira/browse/HIVE-6144
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC13
 Fix For: 0.13.0

 Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, 
 HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt, 
 HIVE-6144.6.patch.txt, HIVE-6144.7.patch.txt, HIVE-6144.8.patch.txt, 
 HIVE-6144.9.patch.txt


 For map join, all data in small aliases is hashed and stored into a temporary 
 file by the MapRedLocalTask. But for some aliases without a filter or 
 projection, that seems unnecessary. For example,
 {noformat}
 select a.* from src a join src b on a.key=b.key;
 {noformat}
 makes plan like this.
 {noformat}
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
 alias: a
 HashTable Sink Operator
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   Position of Big Table: 1
   Stage: Stage-3
 Map Reduce
   Alias - Map Operator Tree:
 b 
   TableScan
 alias: b
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   outputColumnNames: _col0, _col1
   Position of Big Table: 1
   Select Operator
 File Output Operator
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
 {noformat}
 Table src (a) is fetched and stored as-is in the MapRedLocalTask. With this 
 patch, the plan can be like below.
 {noformat}
   Stage: Stage-3
 Map Reduce
   Alias - Map Operator Tree:
 b 
   TableScan
 alias: b
 Map Join Operator
   condition map:
Inner Join 0 to 1
   condition expressions:
 0 {key} {value}
 1 
   handleSkewJoin: false
   keys:
 0 [Column[key]]
 1 [Column[key]]
   outputColumnNames: _col0, _col1
   Position of Big Table: 1
   Select Operator
   File Output Operator
   Local Work:
 Map Reduce Local Work
   Alias - Map Local Tables:
 a 
   Fetch Operator
 limit: -1
   Alias - Map Local Operator Tree:
 a 
   TableScan
   

[jira] [Commented] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100342#comment-14100342
 ] 

Hive QA commented on HIVE-7681:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662424/HIVE-7681.4.patch.txt

{color:green}SUCCESS:{color} +1 5817 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/372/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/372/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-372/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662424

 qualified tablenames usage does not work with several alter-table commands
 --

 Key: HIVE-7681
 URL: https://issues.apache.org/jira/browse/HIVE-7681
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Navis
 Attachments: HIVE-7681.1.patch.txt, HIVE-7681.2.patch.txt, 
 HIVE-7681.3.patch.txt, HIVE-7681.4.patch.txt


 Changes were made in HIVE-4064 for use of qualified table names in more types 
 of queries. But several alter table commands don't work with qualified table 
 names:
 - alter table default.tmpfoo set tblproperties ('bar' = 'bar value')
 - ALTER TABLE default.kv_rename_test CHANGE a a STRING
 - add/drop partition
 - alter index rebuild





[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100358#comment-14100358
 ] 

Hive QA commented on HIVE-7624:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5915 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/54/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-54/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662431



[jira] [Created] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator.[SparkBranch]

2014-08-18 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7761:
---

 Summary: Failed to analyze stats with 
CounterStatsAggregator.[SparkBranch]
 Key: HIVE-7761
 URL: https://issues.apache.org/jira/browse/HIVE-7761
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li


CounterStatsAggregator aggregates stats with MR counters; we need to implement 
another CounterStatsAggregator based on Spark-specific counters to aggregate 
table stats. Here is the error information:
2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator 
(CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for null
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask 
cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at 
org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
at 
org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
at 
org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
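
The ClassCastException arises because the aggregator's connect path unconditionally casts the source task to MapRedTask. A hedged sketch of the failure mode and the obvious guard, using stand-in types rather than Hive's real classes:

```java
// Illustrative sketch: CounterStatsAggregator.connect casts its task to
// MapRedTask, which throws when the task is actually a SparkTask. A type
// check avoids the ClassCastException and shows why a Spark-specific,
// counter-based aggregator is needed. All names here are stand-ins.
public class StatsAggregatorSketch {
    interface Task {}
    static class MapRedTask implements Task {}
    static class SparkTask implements Task {}

    static String connect(Task sourceTask) {
        if (sourceTask instanceof MapRedTask) {
            return "aggregate via MR counters";
        }
        // A SparkTask lands here; the real fix would aggregate via
        // Spark's own counters/accumulators instead of casting.
        return "unsupported task type: " + sourceTask.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(connect(new MapRedTask()));
        System.out.println(connect(new SparkTask()));
    }
}
```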





[jira] [Updated] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator.[SparkBranch]

2014-08-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7761:


Description: 
CounterStatsAggregator aggregates stats with MR counters; we need to implement 
another CounterStatsAggregator based on Spark-specific counters to aggregate 
table stats. Here is the error information:
{noformat}
2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator 
(CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for null
java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask 
cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at 
org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
at 
org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
at 
org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
{noformat}



 Failed to analyze stats with CounterStatsAggregator.[SparkBranch]
 -

 Key: HIVE-7761
 URL: https://issues.apache.org/jira/browse/HIVE-7761
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li

 CounterStatsAggregator aggregates stats with MR counters; we need to implement 
 another CounterStatsAggregator based on Spark-specific counters to aggregate 
 table stats. Here is the error information:
 {noformat}
 2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator 
 (CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for 
 null
 java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask 
 cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 at 
 org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
 at 

[jira] [Updated] (HIVE-6329) Support column level encryption/decryption

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6329:


Attachment: HIVE-6329.9.patch.txt

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, 
 HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, 
 HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, 
 HIVE-6329.9.patch.txt


 Receiving some requirements on encryption recently but hive is not supporting 
 it. Before the full implementation via HIVE-5207, this might be useful for 
 some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  WITH SERDEPROPERTIES ('column.encode.columns'='phone,address',
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly')
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select
 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive>
 {noformat}





[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5799:


Attachment: HIVE-5799.10.patch.txt

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, 
 HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, 
 HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 Need some timeout facility for preventing resource leakages from unstable or 
 bad clients.





[jira] [Updated] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5718:


Attachment: HIVE-5718.9.patch.txt

Rerun test before commit

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, 
 HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.





[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5690:


Attachment: HIVE-5690.9.patch.txt

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, 
 HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, 
 HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, 
 HIVE-5690.9.patch.txt


 Single sourced multi (insert) query is very useful for various ETL processes, 
 but it does not allow subqueries to be included. For example:
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}





[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5690:


Attachment: (was: HIVE-5690.9.patch.txt)

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, 
 HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, 
 HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt


 Single sourced multi (insert) query is very useful for various ETL processes, 
 but it does not allow subqueries to be included. For example:
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}





[jira] [Updated] (HIVE-7738) tez select sum(decimal) from union all of decimal and null throws NPE

2014-08-18 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-7738:
--

Attachment: HIVE-7738.3.patch

added test query tez_union_decimal.q

 tez select sum(decimal) from union all of decimal and null throws NPE
 -

 Key: HIVE-7738
 URL: https://issues.apache.org/jira/browse/HIVE-7738
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-7738.2.patch, HIVE-7738.2.patch, HIVE-7738.3.patch, 
 HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch


 If this query is run using the Tez engine, Hive throws an NPE:
 {code}
 select sum(a) from (
   select cast(1.1 as decimal) a from dual
   union all
   select cast(null as decimal) a from dual
 ) t;
 {code}
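
The stack trace below bottoms out in WritableConstantHiveDecimalObjectInspector.precision, where the constant value is null for cast(null as decimal). A hedged sketch of the failure and a null guard, using BigDecimal as a stand-in for HiveDecimal (the default value below is illustrative, not Hive's):

```java
import java.math.BigDecimal;

// Sketch: a constant-decimal inspector computes precision from its stored
// value, which is null for cast(null as decimal). Without a guard,
// constant.precision() throws the NullPointerException in the trace below.
public class DecimalPrecisionSketch {
    static final int DEFAULT_PRECISION = 10; // stand-in default

    static int precision(BigDecimal constant) {
        if (constant == null) {
            // Fall back to a default instead of dereferencing null.
            return DEFAULT_PRECISION;
        }
        return constant.precision();
    }

    public static void main(String[] args) {
        System.out.println(precision(new BigDecimal("1.1"))); // 2
        System.out.println(precision(null)); // 10
    }
}
```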
 {code}
 hive> select sum(a) from (
select cast(1.1 as decimal) a from dual
union all
select cast(null as decimal) a from dual
  ) t;
 Query ID = apivovarov_20140814200909_438385b2-4147-47bc-98a0-a01567bbb5c5
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: application_1407388228332_5616)
 Map 1: -/-Map 4: -/-  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 0/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 0/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Status: Failed
 Vertex failed, vertexName=Map 1, vertexId=vertex_1407388228332_5616_1_02, 
 diagnostics=[Task failed, taskId=task_1407388228332_5616_1_02_00, 
 diagnostics=[AttemptID:attempt_1407388228332_5616_1_02_00_0 Info:Error: 
 java.lang.RuntimeException: java.lang.RuntimeException: Map operator 
 initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
   at 
 org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
   at 
 org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:553)
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:145)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:164)
   ... 6 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantHiveDecimalObjectInspector.precision(WritableConstantHiveDecimalObjectInspector.java:61)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.init(GenericUDAFSum.java:106)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:425)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:121)
   ... 7 more
 Container released by application, 
 AttemptID:attempt_1407388228332_5616_1_02_00_1 Info:Error: 
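 The NPE above originates in WritableConstantHiveDecimalObjectInspector.precision(): the constant
 inspector ends up dereferencing the null decimal produced by cast(null as decimal). A minimal
 sketch of a null-safe fallback in plain Java (illustrative only — the class name, method shape,
 and fallback value are made up for this sketch and are not Hive's actual patch):

```java
import java.math.BigDecimal;

// Sketch: precision lookup that tolerates a null constant value
// (as produced by "cast(null as decimal)") instead of throwing NPE.
// Names and the fallback value are illustrative, not Hive's code.
class DecimalPrecisionSketch {
    static final int MAX_PRECISION = 38; // Hive's decimal maximum

    static int precision(BigDecimal constantValue) {
        if (constantValue == null) {
            return MAX_PRECISION; // null constant: fall back to the widest precision
        }
        return constantValue.precision();
    }

    public static void main(String[] args) {
        System.out.println(precision(null));                  // 38
        System.out.println(precision(new BigDecimal("1.1"))); // 2
    }
}
```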
 

[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5690:


Attachment: HIVE-5690.9.patch.txt

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, 
 HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, 
 HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, 
 HIVE-5690.9.patch.txt


 A single-sourced multi-insert query is very useful for various ETL processes, 
 but it does not allow subqueries to be included. For example, 
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4788) RCFile and bzip2 compression not working

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4788:


Attachment: HIVE-4788.2.patch.txt

 RCFile and bzip2 compression not working
 

 Key: HIVE-4788
 URL: https://issues.apache.org/jira/browse/HIVE-4788
 Project: Hive
  Issue Type: Bug
  Components: Compression
Affects Versions: 0.10.0
 Environment: CDH4.2
Reporter: Johndee Burks
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt


 The issue is that bzip2-compressed RCFile data encounters an error when 
 queried, even with the simplest query (select *). The issue is easily 
 reproducible using the following. 
 Create a table and load the sample data below. 
 DDL: create table source_data (a string, b string) row format delimited 
 fields terminated by ',';
 Sample data: 
 apple,sauce 
 Test: 
 Do the following and you should receive the error listed below for the rcfile 
 table with bz2 compression. 
 create table rc_nobz2 (a string, b string) stored as rcfile; 
 insert into table rc_nobz2 select * from source_txt; 
 SET io.seqfile.compression.type=BLOCK; 
 SET hive.exec.compress.output=true; 
 SET mapred.compress.map.output=true; 
 SET mapred.output.compress=true; 
 SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 
 create table rc_bz2 (a string, b string) stored as rcfile; 
 insert into table rc_bz2 select * from source_txt; 
 hive> select * from rc_bz2; 
 Failed with exception java.io.IOException:java.io.IOException: Stream is not 
 BZip2 formatted: expected 'h' as first byte but got '�' 
 hive> select * from rc_nobz2; 
 apple sauce



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24792: RCFile and bzip2 compression not working

2014-08-18 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24792/
---

Review request for hive.


Bugs: HIVE-4788
https://issues.apache.org/jira/browse/HIVE-4788


Repository: hive-git


Description
---

The issue is that bzip2-compressed RCFile data encounters an error when 
queried, even with the simplest query (select *). The issue is easily 
reproducible using the following. 

Create a table and load the sample data below. 

DDL: create table source_data (a string, b string) row format delimited fields 
terminated by ',';

Sample data: 
apple,sauce 

Test: 

Do the following and you should receive the error listed below for the rcfile 
table with bz2 compression. 

create table rc_nobz2 (a string, b string) stored as rcfile; 
insert into table rc_nobz2 select * from source_txt; 

SET io.seqfile.compression.type=BLOCK; 
SET hive.exec.compress.output=true; 
SET mapred.compress.map.output=true; 
SET mapred.output.compress=true; 
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 

create table rc_bz2 (a string, b string) stored as rcfile; 
insert into table rc_bz2 select * from source_txt; 

hive> select * from rc_bz2; 
Failed with exception java.io.IOException:java.io.IOException: Stream is not 
BZip2 formatted: expected 'h' as first byte but got '�' 
hive> select * from rc_nobz2; 
apple   sauce


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 2a27676 
  ql/src/test/queries/clientpositive/rcfile_compress.q PRE-CREATION 
  ql/src/test/results/clientpositive/rcfile_compress.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/24792/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Updated] (HIVE-7711) Error Serializing GenericUDF

2014-08-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7711:


Attachment: HIVE-7711.1.patch.txt

 Error Serializing GenericUDF
 

 Key: HIVE-7711
 URL: https://issues.apache.org/jira/browse/HIVE-7711
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Dr. Christian Betz
 Attachments: HIVE-7711.1.patch.txt


 I get an exception running a job with a GenericUDF in Hive 0.13.0 (which was 
 ok in Hive 0.12.0).
 The org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc is serialized 
 using Kryo, trying to serialize stuff in my GenericUDF which is not 
 serializable (doesn't implement Serializable).
 Switching to Kryo made the comment in ExprNodeGenericFuncDesc obsolete:
 /**
  * In case genericUDF is Serializable, we will serialize the object.
  *
  * In case genericUDF does not implement Serializable, Java will remember
  * the class of genericUDF and creates a new instance when deserialized.
  * This is exactly what we want.
  */
 Find the stacktrace below; however, the description above should be clear.
 Exception in thread "main" 
 org.apache.hive.com.esotericsoftware.kryo.KryoException: 
 java.lang.UnsupportedOperationException
 Serialization trace:
 value (java.util.concurrent.atomic.AtomicReference)
 state (clojure.lang.Atom)
 state (udfs.ArraySum)
 genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
 colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
 aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
 mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
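  The trace shows Kryo's FieldSerializer descending into runtime state
  (udfs.ArraySum → clojure.lang.Atom → AtomicReference) that Java serialization would simply
  have skipped, re-creating the UDF instance on deserialization instead. The contrast can be
  illustrated with plain Java serialization and a transient field (an illustrative sketch only —
  the class names are made up, and this is not the Hive patch):

```java
import java.io.*;

// Sketch: Java serialization skips transient fields, so a UDF-like object
// holding non-serializable runtime state still round-trips; a serializer
// that walks every field would fail on "state". Names are illustrative.
class TransientSketch {
    static class RuntimeState { }  // not Serializable, like clojure.lang.Atom above

    static class Udf implements Serializable {
        transient RuntimeState state = new RuntimeState(); // skipped when serializing
        int id = 42;
    }

    // Serialize and deserialize with plain Java serialization.
    static Udf roundTrip(Udf u) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(u); // succeeds: state is transient
        return (Udf) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
    }

    public static void main(String[] args) throws Exception {
        Udf copy = roundTrip(new Udf());
        System.out.println(copy.id);    // 42
        System.out.println(copy.state); // null: transient field is not restored
    }
}
```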
 

[jira] [Commented] (HIVE-7711) Error Serializing GenericUDF

2014-08-18 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100455#comment-14100455
 ] 

Navis commented on HIVE-7711:
-

[~cbbetz] Could you try this with the attached patch? Looks like UDFs need 
some new annotation for Kryo serialization.

 Error Serializing GenericUDF
 

 Key: HIVE-7711
 URL: https://issues.apache.org/jira/browse/HIVE-7711
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Dr. Christian Betz
 Attachments: HIVE-7711.1.patch.txt


 I get an exception running a job with a GenericUDF in Hive 0.13.0 (which was 
 ok in Hive 0.12.0).
 The org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc is serialized 
 using Kryo, trying to serialize stuff in my GenericUDF which is not 
 serializable (doesn't implement Serializable).
 Switching to Kryo made the comment in ExprNodeGenericFuncDesc obsolete:
 /**
  * In case genericUDF is Serializable, we will serialize the object.
  *
  * In case genericUDF does not implement Serializable, Java will remember
  * the class of genericUDF and creates a new instance when deserialized.
  * This is exactly what we want.
  */
 Find the stacktrace below; however, the description above should be clear.
 Exception in thread "main" 
 org.apache.hive.com.esotericsoftware.kryo.KryoException: 
 java.lang.UnsupportedOperationException
 Serialization trace:
 value (java.util.concurrent.atomic.AtomicReference)
 state (clojure.lang.Atom)
 state (udfs.ArraySum)
 genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
 colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
 childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
 aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
 mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
   at 
 org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at 
 

[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100464#comment-14100464
 ] 

Hive QA commented on HIVE-6329:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662445/HIVE-6329.9.patch.txt

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 5819 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeCompositeKeyWithoutSeparator
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamily
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
org.apache.hadoop.hive.hbase.TestLazyHBaseObject.testLazyHBaseRow2
org.apache.hadoop.hive.hbase.TestLazyHBaseObject.testLazyHBaseRow3
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigFilterProjection
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.testPigPopulation
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/373/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/373/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-373/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662445

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, 
 HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, 
 HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, 
 HIVE-6329.9.patch.txt


 We have been receiving some requirements on encryption recently, but Hive 
 does not support it. Before the full implementation via HIVE-5207, this 
 might be useful for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
  WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}
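 The Base64WriteOnly serde stores the configured columns base64-encoded, which is why the
 select output above shows encoded values for phone and address. A quick stdlib check in plain
 Java (outside Hive; the class and method names here are made up for illustration):

```java
import java.util.Base64;

// Decodes a base64-encoded column value as shown in the transcript above,
// confirming it is plain Base64 of the inserted string.
class DecodeCheck {
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64));
    }

    public static void main(String[] args) {
        System.out.println(decode("U2VvdWwsIFNlb2Nobw==")); // Seoul, Seocho
    }
}
```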



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7528) Support cluster by and distributed by

2014-08-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7528:
-

Attachment: HIVE-7528.spark.patch

 Support cluster by and distributed by
 -

 Key: HIVE-7528
 URL: https://issues.apache.org/jira/browse/HIVE-7528
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-7528.spark.patch


 clustered by = distributed by + sort by, so this is related to HIVE-7527. If 
 sort by is in place, the assumption is that we don't need to do anything 
 about distributed by or clustered by. Still, we need to confirm and verify.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7528) Support cluster by and distributed by

2014-08-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100467#comment-14100467
 ] 

Rui Li commented on HIVE-7528:
--

Distribute/cluster by should work with the sort shuffler in place. This patch 
is mainly some refinement to the current shuffle code.

 Support cluster by and distributed by
 -

 Key: HIVE-7528
 URL: https://issues.apache.org/jira/browse/HIVE-7528
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-7528.spark.patch


 clustered by = distributed by + sort by, so this is related to HIVE-7527. If 
 sort by is in place, the assumption is that we don't need to do anything 
 about distributed by or clustered by. Still, we need to confirm and verify.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7528) Support cluster by and distributed by

2014-08-18 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7528:
-

Status: Patch Available  (was: Open)

 Support cluster by and distributed by
 -

 Key: HIVE-7528
 URL: https://issues.apache.org/jira/browse/HIVE-7528
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-7528.spark.patch


 clustered by = distributed by + sort by, so this is related to HIVE-7527. If 
 sort by is in place, the assumption is that we don't need to do anything 
 about distributed by or clustered by. Still, we need to confirm and verify.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100470#comment-14100470
 ] 

Hive QA commented on HIVE-5799:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662446/HIVE-5799.10.patch.txt

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/374/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/374/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-374/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-shims-0.23 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-0.23 ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp/conf
 [copy] Copying 7 files to 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-shims-0.23 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-shims-0.23 ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-shims-0.23 ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/hive-shims-0.23-0.14.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-shims-0.23 ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-shims-0.23 
---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/target/hive-shims-0.23-0.14.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.23/0.14.0-SNAPSHOT/hive-shims-0.23-0.14.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/shims/0.23/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/shims/hive-shims-0.23/0.14.0-SNAPSHOT/hive-shims-0.23-0.14.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Shims 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims ---
[INFO] Deleting 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator (includes = 
[datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-shims ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-shims ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-shims ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp/conf
 [copy] Copying 7 files to 
/data/hive-ptest/working/apache-svn-trunk-source/shims/aggregator/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-shims ---
[INFO] No sources to compile
[INFO] 
[INFO] --- 

[jira] [Created] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-18 Thread Suhas Vasu (JIRA)
Suhas Vasu created HIVE-7762:


 Summary: Enhancement while getting partitions via webhcat client
 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor


HCatalog creates partitions in lower case, but getting partitions from 
HCatalog via the WebHCat client doesn't account for this, so the client 
throws exceptions.
Ex:
CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
TEXTFILE LOCATION '/user/suhas/hcat-data/in/';

Then I try to get the partitions with:
{noformat}
String inputTableName = "in_table";
String database = "default";

Map<String, String> partitionSpec = new HashMap<String, String>();
partitionSpec.put("Year", "2014");
partitionSpec.put("Month", "08");
partitionSpec.put("Date", "11");
partitionSpec.put("Hour", "00");
partitionSpec.put("Minute", "00");

HCatClient client = get(catalogUrl);
HCatPartition hCatPartition = client.getPartition(database, 
inputTableName, partitionSpec);
{noformat}

This throws up saying:
{noformat}
Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
Exception occurred while processing HCat request : Invalid partition-key 
specified: year
at 
org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}

The same code works if I do:
{noformat}
partitionSpec.put("year", "2014");
partitionSpec.put("month", "08");
partitionSpec.put("date", "11");
partitionSpec.put("hour", "00");
partitionSpec.put("minute", "00");
{noformat}
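A defensive client-side workaround for this case mismatch is to lower-case the partition keys
before calling getPartition. A sketch of just the key-normalization step (the HCatClient call
itself is omitted; the class and method names here are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: normalize partition-spec keys to lower case so they match the
// lower-cased partition keys HCatalog stores. Names are illustrative.
class PartitionKeySketch {
    static Map<String, String> lowerKeys(Map<String, String> spec) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("Year", "2014");
        spec.put("Month", "08");
        System.out.println(lowerKeys(spec)); // {year=2014, month=08}
    }
}
```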




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]

2014-08-18 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7763:
---

 Summary: Failed to query TABLESAMPLE on empty bucket table. [Spark 
Branch]
 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li


Got the following exception:
{noformat}
2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 
1.0 (TID 0)
java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path 
are inconsistent
at 
org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and 
input path are inconsistent
at org.apache.hadoop.hive.ql.exec
{noformat}
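The final HiveException suggests MapOperator could not match the split's input path against any path registered for the map work. The lookup that fails can be sketched as follows (hypothetical class and method names, not Hive's actual code) — an empty bucket's placeholder path that was never registered triggers the same message:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a mapper resolves the alias for its input split by
// looking the split's path up in a configured path->alias table. A split path
// that was never registered (e.g. an empty-bucket placeholder) fails
// initialization the same way MapOperator.setChildren() does.
public class PathAliasLookup {
    private final Map<String, String> pathToAlias = new HashMap<>();

    public void register(String path, String alias) {
        pathToAlias.put(path, alias);
    }

    public String resolveAlias(String splitPath) {
        String alias = pathToAlias.get(splitPath);
        if (alias == null) {
            throw new RuntimeException("Configuration and input path are inconsistent");
        }
        return alias;
    }
}
```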



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]

2014-08-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7763:


Attachment: HIVE-7763.1-spark.patch

 Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 Got the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]

2014-08-18 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7763:


Status: Patch Available  (was: Open)

 Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 Got the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly

2014-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6934:


Status: Patch Available  (was: Open)

 PartitionPruner doesn't handle top level constant expression correctly
 --

 Key: HIVE-6934
 URL: https://issues.apache.org/jira/browse/HIVE-6934
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, 
 HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch


 You hit this error indirectly, because of how we handle invalid constant 
 comparisons. Consider:
 {code}
 create table x(key int, value string) partitioned by (dt int, ts string);
 -- both these queries hit this issue
 select * from x where key = 'abc';
 select * from x where dt = 'abc';
 -- the issue is the comparison gets converted to the constant false
 -- and the PartitionPruner doesn't handle top level constant exprs correctly
 {code}
 Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for 
 HIVE-5376
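The failure mode can be sketched outside Hive: once a filter like {{dt = 'abc'}} folds to a constant, a pruner has to special-case the constant rather than evaluate it per partition. A minimal illustration (hypothetical classes, not Hive's actual PartitionPruner):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch (not Hive code): when constant folding reduces the
// partition filter to a bare boolean, 'filter' is null and 'constantValue'
// carries the folded value. The pruner must then keep all partitions for
// true and none for false, instead of trying to evaluate a column filter.
public class ConstantAwarePruner {
    public static List<String> prune(List<String> partitions,
                                     Predicate<String> filter,
                                     Boolean constantValue) {
        if (filter == null) {
            // Top-level constant expression: true keeps everything,
            // false prunes everything.
            return constantValue ? new ArrayList<>(partitions) : new ArrayList<>();
        }
        List<String> kept = new ArrayList<>();
        for (String p : partitions) {
            if (filter.test(p)) kept.add(p);
        }
        return kept;
    }
}
```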



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly

2014-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6934:


Attachment: HIVE-6934.6.patch

 PartitionPruner doesn't handle top level constant expression correctly
 --

 Key: HIVE-6934
 URL: https://issues.apache.org/jira/browse/HIVE-6934
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, 
 HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch


 You hit this error indirectly, because of how we handle invalid constant 
 comparisons. Consider:
 {code}
 create table x(key int, value string) partitioned by (dt int, ts string);
 -- both these queries hit this issue
 select * from x where key = 'abc';
 select * from x where dt = 'abc';
 -- the issue is the comparison gets converted to the constant false
 -- and the PartitionPruner doesn't handle top level constant exprs correctly
 {code}
 Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for 
 HIVE-5376



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly

2014-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6934:


Status: Open  (was: Patch Available)

 PartitionPruner doesn't handle top level constant expression correctly
 --

 Key: HIVE-6934
 URL: https://issues.apache.org/jira/browse/HIVE-6934
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, 
 HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch


 You hit this error indirectly, because of how we handle invalid constant 
 comparisons. Consider:
 {code}
 create table x(key int, value string) partitioned by (dt int, ts string);
 -- both these queries hit this issue
 select * from x where key = 'abc';
 select * from x where dt = 'abc';
 -- the issue is the comparison gets converted to the constant false
 -- and the PartitionPruner doesn't handle top level constant exprs correctly
 {code}
 Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for 
 HIVE-5376



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster

2014-08-18 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-7764:
--

 Summary: Support all JDBC-HiveServer2 authentication modes on a 
secure cluster
 Key: HIVE-7764
 URL: https://issues.apache.org/jira/browse/HIVE-7764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0


Currently, HiveServer2 logs in with its keytab only if 
hive.server2.authentication is set to KERBEROS. However, 
hive.server2.authentication is the config that determines the auth type an end 
user uses when authenticating with HiveServer2. There is a valid use case where 
a user authenticates with HiveServer2 using LDAP, for example, while 
HiveServer2 runs the query on a kerberized cluster.
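The intended combination can be sketched as a hive-site.xml fragment — end users authenticate via LDAP while HiveServer2 still keeps a keytab for the kerberized cluster (property names are standard HiveServer2 settings; the values are illustrative assumptions):

```xml
<!-- End-user authentication to HiveServer2: LDAP -->
<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>ldap://ldap.example.com</value>
</property>

<!-- HiveServer2's own identity, needed to run queries on a kerberized cluster -->
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>
```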



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7764:
---

Attachment: HIVE-7764.1.patch

 Support all JDBC-HiveServer2 authentication modes on a secure cluster
 -

 Key: HIVE-7764
 URL: https://issues.apache.org/jira/browse/HIVE-7764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7764.1.patch


 Currently, HiveServer2 logs in with its keytab only if 
 hive.server2.authentication is set to KERBEROS. However, 
 hive.server2.authentication is the config that determines the auth type an end 
 user uses when authenticating with HiveServer2. There is a valid use case 
 where a user authenticates with HiveServer2 using LDAP, for example, while 
 HiveServer2 runs the query on a kerberized cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7764:
---

Status: Patch Available  (was: Open)

 Support all JDBC-HiveServer2 authentication modes on a secure cluster
 -

 Key: HIVE-7764
 URL: https://issues.apache.org/jira/browse/HIVE-7764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7764.1.patch


 Currently, HiveServer2 logs in with its keytab only if 
 hive.server2.authentication is set to KERBEROS. However, 
 hive.server2.authentication is the config that determines the auth type an end 
 user uses when authenticating with HiveServer2. There is a valid use case 
 where a user authenticates with HiveServer2 using LDAP, for example, while 
 HiveServer2 runs the query on a kerberized cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Attachment: HIVE-7353.4.patch

Patch rebased on trunk

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch


 While using an embedded metastore, HiveServer2's background threads for async 
 operations end up creating new instances of JDOPersistanceManager rather than 
 reusing the one from the foreground (handler) thread. Since 
 JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are 
 never GCed.
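The leak pattern described above can be sketched as follows (class and field names are hypothetical stand-ins, not DataNucleus/Hive code): a factory caches one manager per thread, so every short-lived background thread that touches the metastore adds a cache entry that is never evicted, even after the thread dies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a per-thread instance cache that leaks: each new
// pool thread gets its own cached manager, and nothing ever removes entries
// for threads that have exited.
public class PerThreadManagerFactory {
    static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    public static Object getManager() {
        // Keyed by thread name purely for illustration.
        return CACHE.computeIfAbsent(Thread.currentThread().getName(),
                                     name -> new Object());
    }
}
```

Since async operations run on freshly created pool threads, the cache grows without bound; the fix direction described here is to reuse the handler thread's instance (or clean up when a background thread exits).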



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Status: Patch Available  (was: Open)

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch


 While using an embedded metastore, HiveServer2's background threads for async 
 operations end up creating new instances of JDOPersistanceManager rather than 
 reusing the one from the foreground (handler) thread. Since 
 JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are 
 never GCed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Status: Open  (was: Patch Available)

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch


 While using an embedded metastore, HiveServer2's background threads for async 
 operations end up creating new instances of JDOPersistanceManager rather than 
 reusing the one from the foreground (handler) thread. Since 
 JDOPersistanceManagerFactory caches JDOPersistanceManager instances, they are 
 never GCed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-18 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Description: While using an embedded metastore, HiveServer2's background 
threads for async operations end up creating new instances of 
JDOPersistanceManager, which are cached in JDOPersistanceManagerFactory. Even 
when a background thread is killed by the thread pool manager, its 
JDOPersistanceManager is never GCed, because it remains cached by 
JDOPersistanceManagerFactory.  (was: While using embedded metastore, while 
creating background threads to run async operations, HiveServer2 ends up 
creating new instances of JDOPersistanceManager rather than using the one from 
the foreground (handler) thread. Since JDOPersistanceManagerFactory caches 
JDOPersistanceManager instances, they are never GCed.)

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch


 While using an embedded metastore, HiveServer2's background threads for async 
 operations end up creating new instances of JDOPersistanceManager, which are 
 cached in JDOPersistanceManagerFactory. Even when a background thread is 
 killed by the thread pool manager, its JDOPersistanceManager is never GCed, 
 because it remains cached by JDOPersistanceManagerFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5718) Support direct fetch for lateral views, sub queries, etc.

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100518#comment-14100518
 ] 

Hive QA commented on HIVE-5718:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662448/HIVE-5718.9.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5817 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/375/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/375/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-375/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662448

 Support direct fetch for lateral views, sub queries, etc.
 -

 Key: HIVE-5718
 URL: https://issues.apache.org/jira/browse/HIVE-5718
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: D13857.1.patch, D13857.2.patch, D13857.3.patch, 
 HIVE-5718.4.patch.txt, HIVE-5718.5.patch.txt, HIVE-5718.6.patch.txt, 
 HIVE-5718.7.patch.txt, HIVE-5718.8.patch.txt, HIVE-5718.9.patch.txt


 Extend HIVE-2925 with LV and SubQ.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100533#comment-14100533
 ] 

Sushanth Sowmyan commented on HIVE-7341:


+1, committing.

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, 
 HIVE-7341.4.patch, HIVE-7341.5.patch


 The HCatClient currently doesn't provide much support for replicating 
 HCatTable definitions between two HCatalog server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table-schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic, because it introduces race 
 conditions when 2 partitions with differing column-schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc kept track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.
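The diff/resolve flow in point 1 can be sketched with stand-in classes ({{SimpleTable}} and {{TableAttribute}} are hypothetical, not HCatalog types; only the method shapes — {{diff()}} returning the set of differing attributes and {{resolve(source, diff)}} copying them over — mirror the described API):

```java
import java.util.EnumSet;

// Minimal sketch of the diff/resolve pattern: compute which attributes of a
// target table differ from the source, then copy exactly those attributes.
public class TableSync {
    enum TableAttribute { COLUMNS, INPUT_FORMAT }

    static class SimpleTable {
        String columns;
        String inputFormat;

        EnumSet<TableAttribute> diff(SimpleTable source) {
            EnumSet<TableAttribute> d = EnumSet.noneOf(TableAttribute.class);
            if (!columns.equals(source.columns)) d.add(TableAttribute.COLUMNS);
            if (!inputFormat.equals(source.inputFormat)) d.add(TableAttribute.INPUT_FORMAT);
            return d;
        }

        void resolve(SimpleTable source, EnumSet<TableAttribute> d) {
            if (d.contains(TableAttribute.COLUMNS)) columns = source.columns;
            if (d.contains(TableAttribute.INPUT_FORMAT)) inputFormat = source.inputFormat;
        }
    }
}
```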



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100534#comment-14100534
 ] 

Hive QA commented on HIVE-7763:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662464/HIVE-7763.1-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5915 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/55/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/55/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-55/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662464

 Failed to query TABLESAMPLE on empty bucket table. [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 Got the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-18 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-7341:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks, Mithun!

(@Lefty: There isn't much need for end-user documentation for this patch, 
but possibly a programmer-documentation aspect, which should mostly be covered 
by the javadocs and the bug report here)

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, 
 HIVE-7341.4.patch, HIVE-7341.5.patch


 The HCatClient currently doesn't provide much support for replicating 
 HCatTable definitions between two HCatalog server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might find the need to replicate partition 
 data between 2 clusters, and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record-formats, serde-parameters, etc.) The system will need a way 
 to diff 2 tables and update the target-metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table-schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic, because it introduces race 
 conditions when 2 partitions with differing column-schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc kept track of the partition's schema, in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between the HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table-attributes from the add-table-operation's attributes. By 
 providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-18 Thread Suhas Vasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Vasu updated HIVE-7762:
-

Attachment: HIVE-7762.patch

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.patch


 HCatalog stores partition-key names in lower case, but the WebHCat client 
 does not normalize key names when getting partitions, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions by:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}
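 Until the client normalizes key names itself, a caller can work around the 
 mismatch by lower-casing the partition-key names before invoking 
 getPartition(). The sketch below is illustrative only; the helper class and 
 its standalone form are assumptions, not part of the attached patch:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionSpecNormalizer {

    // Lower-case every partition-key name so the spec matches the
    // lower-cased names HCatalog stores; values are left untouched.
    public static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<String, String>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(), e.getValue());
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<String, String> partitionSpec = new HashMap<String, String>();
        partitionSpec.put("Year", "2014");
        partitionSpec.put("Month", "08");

        // Pass `normalized` (not partitionSpec) to client.getPartition(...).
        Map<String, String> normalized = lowerCaseKeys(partitionSpec);
        System.out.println(normalized.get("year") + " " + normalized.get("month"));
    }
}
```

 The proposed patch presumably does this normalization inside the client 
 instead, so callers can keep using whatever casing their DDL used.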



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-18 Thread Suhas Vasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Vasu updated HIVE-7762:
-

Status: Patch Available  (was: Open)

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.patch


 HCatalog stores partition-key names in lower case, but the WebHCat client 
 does not normalize key names when getting partitions, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions by:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7068) Integrate AccumuloStorageHandler

2014-08-18 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100541#comment-14100541
 ] 

Sushanth Sowmyan commented on HIVE-7068:


I agree with Nick and Navis - since this is a first addition, I'm good with 
getting it in and letting people play with it. A quick look-through suggests 
it implements the Hive interfaces reasonably well, and I'm +1 for inclusion.

Josh, could you please rebase the patch to the current Hive trunk and upload 
it? (It looks like recent changes caused itests/qtest/pom.xml to not apply 
cleanly.) I'll commit it once the tests pass with the latest rebase.

 Integrate AccumuloStorageHandler
 

 Key: HIVE-7068
 URL: https://issues.apache.org/jira/browse/HIVE-7068
 Project: Hive
  Issue Type: New Feature
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 0.14.0

 Attachments: HIVE-7068.1.patch, HIVE-7068.2.patch, HIVE-7068.3.patch


 [Accumulo|http://accumulo.apache.org] is a BigTable-clone which is similar to 
 HBase. Some [initial 
 work|https://github.com/bfemiano/accumulo-hive-storage-manager] has been done 
 to support querying an Accumulo table using Hive. It is not a complete 
 solution; most notably, the current implementation lacks support for 
 INSERTs.
 I would like to polish up the AccumuloStorageHandler (presently based on 
 0.10), implement missing basic functionality and compare it to the 
 HBaseStorageHandler (to ensure that we follow the same general usage 
 patterns).
 I've also been in communication with [~bfem] (the initial author) who 
 expressed interest in working on this again. I hope to coordinate efforts 
 with him.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7738) tez select sum(decimal) from union all of decimal and null throws NPE

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100550#comment-14100550
 ] 

Hive QA commented on HIVE-7738:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662450/HIVE-7738.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5818 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_tez_union_decimal
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/376/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/376/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-376/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662450

 tez select sum(decimal) from union all of decimal and null throws NPE
 -

 Key: HIVE-7738
 URL: https://issues.apache.org/jira/browse/HIVE-7738
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.13.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-7738.2.patch, HIVE-7738.2.patch, HIVE-7738.3.patch, 
 HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch, HIVE-7738.patch


 If this query is run using the Tez engine, Hive throws an NPE:
 {code}
 select sum(a) from (
   select cast(1.1 as decimal) a from dual
   union all
   select cast(null as decimal) a from dual
 ) t;
 {code}
 {code}
 hive> select sum(a) from (
select cast(1.1 as decimal) a from dual
union all
select cast(null as decimal) a from dual
  ) t;
 Query ID = apivovarov_20140814200909_438385b2-4147-47bc-98a0-a01567bbb5c5
 Total jobs = 1
 Launching Job 1 out of 1
 Status: Running (application id: application_1407388228332_5616)
 Map 1: -/-Map 4: -/-  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 0/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 0/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Map 1: 0/1Map 4: 1/1  Reducer 3: 0/1  
 Status: Failed
 Vertex failed, vertexName=Map 1, vertexId=vertex_1407388228332_5616_1_02, 
 diagnostics=[Task failed, taskId=task_1407388228332_5616_1_02_00, 
 diagnostics=[AttemptID:attempt_1407388228332_5616_1_02_00_0 Info:Error: 
 java.lang.RuntimeException: java.lang.RuntimeException: Map operator 
 initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
   at 
 org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
   at 
 org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:553)
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:145)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:164)
   ... 6 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantHiveDecimalObjectInspector.precision(WritableConstantHiveDecimalObjectInspector.java:61)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFSum$GenericUDAFSumHiveDecimal.init(GenericUDAFSum.java:106)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:67)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   at 

[jira] [Commented] (HIVE-5690) Support subquery for single sourced multi query

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100589#comment-14100589
 ] 

Hive QA commented on HIVE-5690:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662452/HIVE-5690.9.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5820 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/377/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/377/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-377/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662452

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.2.patch.txt, 
 HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, 
 HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, 
 HIVE-5690.9.patch.txt


 Single-sourced multi-insert queries are very useful for various ETL 
 processes, but they do not allow subqueries. For example:
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7673) Authorization api: missing privilege objects in create table/view

2014-08-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7673:


Attachment: HIVE-7673.2.patch

HIVE-7673.2.patch - patch with test fixes and test updates.


 Authorization api: missing privilege objects in create table/view
 -

 Key: HIVE-7673
 URL: https://issues.apache.org/jira/browse/HIVE-7673
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7673.1.patch, HIVE-7673.2.patch


 Issues being addressed:
 - In case of create-table-as-select query, the database the table belongs to 
 is not among the objects to be authorized.
 - Create table has the objectName field of the table entry with the database 
 prefix - like testdb.testtable, instead of just the table name.
 - checkPrivileges(CREATEVIEW) does not include the name of the view being 
 created in outputHObjs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7673) Authorization api: missing privilege objects in create table/view

2014-08-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7673:


Status: Patch Available  (was: Open)

 Authorization api: missing privilege objects in create table/view
 -

 Key: HIVE-7673
 URL: https://issues.apache.org/jira/browse/HIVE-7673
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7673.1.patch, HIVE-7673.2.patch


 Issues being addressed:
 - In case of create-table-as-select query, the database the table belongs to 
 is not among the objects to be authorized.
 - Create table has the objectName field of the table entry with the database 
 prefix - like testdb.testtable, instead of just the table name.
 - checkPrivileges(CREATEVIEW) does not include the name of the view being 
 created in outputHObjs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100627#comment-14100627
 ] 

Hive QA commented on HIVE-4788:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662453/HIVE-4788.2.patch.txt

{color:green}SUCCESS:{color} +1 5820 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-378/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662453

 RCFile and bzip2 compression not working
 

 Key: HIVE-4788
 URL: https://issues.apache.org/jira/browse/HIVE-4788
 Project: Hive
  Issue Type: Bug
  Components: Compression
Affects Versions: 0.10.0
 Environment: CDH4.2
Reporter: Johndee Burks
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt


 The issue is that Bzip2-compressed RCFile data hits an error when queried, 
 even with the simplest query (select *). The issue is easily reproducible 
 using the following. 
 Create a table and load the sample data below. 
 DDL: create table source_data (a string, b string) row format delimited 
 fields terminated by ',';
 Sample data: 
 apple,sauce 
 Test: 
 Do the following and you should receive the error listed below for the rcfile 
 table with bz2 compression. 
 create table rc_nobz2 (a string, b string) stored as rcfile; 
 insert into table rc_nobz2 select * from source_txt; 
 SET io.seqfile.compression.type=BLOCK; 
 SET hive.exec.compress.output=true; 
 SET mapred.compress.map.output=true; 
 SET mapred.output.compress=true; 
 SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 
 create table rc_bz2 (a string, b string) stored as rcfile; 
 insert into table rc_bz2 select * from source_txt; 
 hive> select * from rc_bz2; 
 Failed with exception java.io.IOException:java.io.IOException: Stream is not 
 BZip2 formatted: expected 'h' as first byte but got '�' 
 hive> select * from rc_nobz2; 
 apple sauce



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez

2014-08-18 Thread Chris Dragga (JIRA)
Chris Dragga created HIVE-7765:
--

 Summary: Null pointer error with UNION ALL on partitioned tables 
using Tez
 Key: HIVE-7765
 URL: https://issues.apache.org/jira/browse/HIVE-7765
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1.
Reporter: Chris Dragga
Priority: Minor


When executing a UNION ALL query in Tez over partitioned tables where at least 
one table is empty, Hive fails to execute the query, returning the message 
"FAILED: NullPointerException null". No stack trace accompanies this message.  
Removing partitioning solves this problem, as does switching to MapReduce as 
the execution engine.

This can be reproduced using a variant of the example tables from the Getting 
Started documentation on the Hive wiki.  To create the schema, use

CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Then, load invites with data (e.g., using the instructions 
[here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations])
 and execute the following:

SELECT * FROM invites
UNION ALL
SELECT * FROM empty_invites;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7764) Support all JDBC-HiveServer2 authentication modes on a secure cluster

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100698#comment-14100698
 ] 

Hive QA commented on HIVE-7764:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662467/HIVE-7764.1.patch

{color:red}ERROR:{color} -1 due to 30 failed/errored test(s), 5727 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hooks.TestHs2Hooks.org.apache.hadoop.hive.hooks.TestHs2Hooks
org.apache.hive.beeline.TestBeeLineWithArgs.org.apache.hive.beeline.TestBeeLineWithArgs
org.apache.hive.jdbc.TestJdbcDriver2.org.apache.hive.jdbc.TestJdbcDriver2
org.apache.hive.jdbc.TestJdbcWithMiniHS2.org.apache.hive.jdbc.TestJdbcWithMiniHS2
org.apache.hive.jdbc.TestJdbcWithMiniMr.org.apache.hive.jdbc.TestJdbcWithMiniMr
org.apache.hive.jdbc.TestSSL.testConnectionMismatch
org.apache.hive.jdbc.TestSSL.testInvalidConfig
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithProperty
org.apache.hive.jdbc.TestSSL.testSSLConnectionWithURL
org.apache.hive.jdbc.TestSSL.testSSLFetch
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.jdbc.authorization.TestHS2AuthzContext.org.apache.hive.jdbc.authorization.TestHS2AuthzContext
org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext.org.apache.hive.jdbc.authorization.TestHS2AuthzSessionContext
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testGetVariableValue
org.apache.hive.jdbc.miniHS2.TestMiniHS2.testConfInSession
org.apache.hive.service.auth.TestCustomAuthentication.org.apache.hive.service.auth.TestCustomAuthentication
org.apache.hive.service.auth.TestPlainSaslHelper.testDoAsSetting
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService
org.apache.hive.service.cli.TestScratchDir.testLocalScratchDirs
org.apache.hive.service.cli.TestScratchDir.testResourceDirs
org.apache.hive.service.cli.TestScratchDir.testScratchDirs
org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFile
org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFileAndConfOverlay
org.apache.hive.service.cli.session.TestSessionGlobalInitFile.testSessionGlobalInitFileWithUser
org.apache.hive.service.cli.session.TestSessionHooks.testProxyUser
org.apache.hive.service.cli.session.TestSessionHooks.testSessionHook
org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService.org.apache.hive.service.cli.thrift.TestThriftBinaryCLIService
org.apache.hive.service.cli.thrift.TestThriftHttpCLIService.org.apache.hive.service.cli.thrift.TestThriftHttpCLIService
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/379/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/379/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-379/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 30 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662467

 Support all JDBC-HiveServer2 authentication modes on a secure cluster
 -

 Key: HIVE-7764
 URL: https://issues.apache.org/jira/browse/HIVE-7764
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7764.1.patch


 Currently, HiveServer2 logs in with its keytab only if 
 hive.server2.authentication is set to KERBEROS. However, 
 hive.server2.authentication is the config that determines the auth type an 
 end user uses when authenticating with HiveServer2. There is a valid use 
 case of a user authenticating with HiveServer2 via LDAP, for example, while 
 HiveServer2 runs the query on a kerberized cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with -e

2014-08-18 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100701#comment-14100701
 ] 

Naveen Gangam commented on HIVE-7647:
-

Would someone be able to review this? Thanks in advance.

 Beeline does not honor --headerInterval and --color when executing with -e
 

 Key: HIVE-7647
 URL: https://issues.apache.org/jira/browse/HIVE-7647
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.14.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7647.1.patch


 --showHeader is being honored
 [root@localhost ~]# beeline --showHeader=false -u 
 'jdbc:hive2://localhost:1/default' -n hive -d 
 org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
 Connecting to jdbc:hive2://localhost:1/default
 Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
 Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 -hiveconf (No such file or directory)
 +--+--++-+
 | 00-  | All Occupations  | 135185230  | 42270   |
 | 11-  | Management occupations   | 6152650| 100310  |
 | 11-1011  | Chief executives | 301930 | 160440  |
 | 11-1021  | General and operations managers  | 1697690| 107970  |
 | 11-1031  | Legislators  | 64650  | 37980   |
 | 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
 | 11-2021  | Marketing managers   | 166790 | 118160  |
 | 11-2022  | Sales managers   | 333910 | 110390  |
 | 11-2031  | Public relations managers| 51730  | 101220  |
 | 11-3011  | Administrative services managers | 246930 | 79500   |
 +--+--++-+
 10 rows selected (0.838 seconds)
 Beeline version 0.12.0-cdh5.1.0 by Apache Hive
 Closing: org.apache.hive.jdbc.HiveConnection
 --outputFormat is being honored.
 [root@localhost ~]# beeline --outputFormat=csv -u 
 'jdbc:hive2://localhost:1/default' -n hive -d 
 org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
 Connecting to jdbc:hive2://localhost:1/default
 Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
 Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 'code','description','total_emp','salary'
 '00-','All Occupations','135185230','42270'
 '11-','Management occupations','6152650','100310'
 '11-1011','Chief executives','301930','160440'
 '11-1021','General and operations managers','1697690','107970'
 '11-1031','Legislators','64650','37980'
 '11-2011','Advertising and promotions managers','36100','94720'
 '11-2021','Marketing managers','166790','118160'
 '11-2022','Sales managers','333910','110390'
 '11-2031','Public relations managers','51730','101220'
 '11-3011','Administrative services managers','246930','79500'
 10 rows selected (0.664 seconds)
 Beeline version 0.12.0-cdh5.1.0 by Apache Hive
 Closing: org.apache.hive.jdbc.HiveConnection
 Both --color and --headerInterval are honored when executing with the -f 
 option (which reads the query from a file rather than the command line). 
 (Cannot really see the color here, but it uses the terminal colors.)
 [root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
 -u 'jdbc:hive2://localhost:1/default' -n hive -d 
 org.apache.hive.jdbc.HiveDriver -f /tmp/tmp.sql  
 Connecting to jdbc:hive2://localhost:1/default
 Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
 Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
 Transaction isolation: TRANSACTION_REPEATABLE_READ
 Beeline version 0.12.0-cdh5.1.0 by Apache Hive
 0: jdbc:hive2://localhost> select * from sample_07 limit 8;
 +--+--++-+
 |   code   | description  | total_emp  | salary  |
 +--+--++-+
 | 00-  | All Occupations  | 135185230  | 42270   |
 | 11-  | Management occupations   | 6152650| 100310  |
 | 11-1011  | Chief executives | 301930 | 160440  |
 | 11-1021  | General and operations managers  | 1697690| 107970  |
 | 11-1031  | Legislators  | 64650  | 37980   |
 +--+--++-+
 |   code   | description  | total_emp  | salary  |
 +--+--++-+
 | 11-2011  | 

Re: Mail bounces from ebuddy.com

2014-08-18 Thread Alan Gates
Anyone who is an admin on the list (I don't know who the admins are) can do 
this by mailing user-unsubscribe-USERNAME=ebuddy@hive.apache.org, where 
USERNAME is the name of the bouncing user (see 
http://untroubled.org/ezmlm/ezman/ezman1.html).


Alan.




Thejas Nair mailto:the...@hortonworks.com
August 17, 2014 at 17:02
I don't know how to do this.

Carl, Ashutosh,
Do you guys know how to remove these two invalid emails from the 
mailing list ?



Lars Francke mailto:lars.fran...@gmail.com
August 17, 2014 at 15:41
Hmm great, I see others mentioning this as well. I'm happy to contact INFRA
but I'm not sure if they are even needed or if someone from the Hive team
can do this?


On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com

Lefty Leverenz mailto:leftylever...@gmail.com
August 7, 2014 at 18:43
(Excuse the spam.) Actually I'm getting two bounces per message, but gmail
concatenates them so I didn't notice the second one.

-- Lefty


On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com

Lefty Leverenz mailto:leftylever...@gmail.com
August 7, 2014 at 18:36
Curious, I've only been getting one bounce per message. Anyway thanks for
bringing this up.

-- Lefty



Lars Francke mailto:lars.fran...@gmail.com
August 7, 2014 at 4:38
Hi,

every time I send a mail to dev@ I get two bounce mails from two people at
ebuddy.com. I don't want to post the E-Mail addresses publicly but I can
send them on if needed (and it can be triggered easily by just replying to
this mail I guess).

Could we maybe remove them from the list?

Cheers,
Lars



--
Sent with Postbox http://www.getpostbox.com

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Mail bounces from ebuddy.com

2014-08-18 Thread Ashutosh Chauhan
Thanks, Alan, for the hint. I just unsubscribed those two email addresses
from ebuddy.


On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote:

 Anyone who is an admin on the list (I don't who the admins are) can do
 this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where
 USERNAME is the name of the bouncing user (see
 http://untroubled.org/ezmlm/ezman/ezman1.html )

 Alan.



   Thejas Nair the...@hortonworks.com
  August 17, 2014 at 17:02
 I don't know how to do this.

 Carl, Ashutosh,
 Do you guys know how to remove these two invalid emails from the mailing
 list ?


   Lars Francke lars.fran...@gmail.com
  August 17, 2014 at 15:41
 Hmm great, I see others mentioning this as well. I'm happy to contact INFRA
 but I'm not sure if they are even needed or if someone from the Hive team
 can do this?


 On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com
 leftylever...@gmail.com

   Lefty Leverenz leftylever...@gmail.com
  August 7, 2014 at 18:43
 (Excuse the spam.) Actually I'm getting two bounces per message, but gmail
 concatenates them so I didn't notice the second one.

 -- Lefty


 On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com

   Lefty Leverenz leftylever...@gmail.com
  August 7, 2014 at 18:36
 Curious, I've only been getting one bounce per message. Anyway thanks for
 bringing this up.

 -- Lefty



   Lars Francke lars.fran...@gmail.com
  August 7, 2014 at 4:38
 Hi,

 every time I send a mail to dev@ I get two bounce mails from two people at
 ebuddy.com. I don't want to post the E-Mail addresses publicly but I can
 send them on if needed (and it can be triggered easily by just replying to
 this mail I guess).

 Could we maybe remove them from the list?

 Cheers,
 Lars





[jira] [Commented] (HIVE-6934) PartitionPruner doesn't handle top level constant expression correctly

2014-08-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100760#comment-14100760
 ] 

Hive QA commented on HIVE-6934:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662466/HIVE-6934.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5820 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_boolexpr
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/380/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/380/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-380/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662466

 PartitionPruner doesn't handle top level constant expression correctly
 --

 Key: HIVE-6934
 URL: https://issues.apache.org/jira/browse/HIVE-6934
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-6934.1.patch, HIVE-6934.2.patch, HIVE-6934.3.patch, 
 HIVE-6934.4.patch, HIVE-6934.5.patch, HIVE-6934.6.patch


 You hit this error indirectly, because of how we handle invalid constant 
 comparisons. Consider:
 {code}
 create table x(key int, value string) partitioned by (dt int, ts string);
 -- both these queries hit this issue
 select * from x where key = 'abc';
 select * from x where dt = 'abc';
 -- the issue is the comparison gets converted to the constant false
 -- and the PartitionPruner doesn't handle top level constant exprs correctly
 {code}
 Thanks to [~hsubramaniyan] for uncovering this as part of adding tests for 
 HIVE-5376
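
 The failure mode can be modeled in a few lines (an illustrative Python
 sketch with invented helper names; Hive's actual PartitionPruner operates on
 ExprNodeDesc trees, not these functions):

```python
def fold_comparison(literal):
    # Hive folds an invalid comparison such as `dt = 'abc'` (dt is int)
    # into the boolean constant false at compile time.
    try:
        return ("eq", "dt", int(literal))   # ordinary comparison expression
    except ValueError:
        return False                        # top-level constant expression

def prune(partitions, expr):
    # A pruner must handle a top-level constant expression, not only
    # comparisons: constant false keeps nothing, constant true keeps all.
    if isinstance(expr, bool):
        return list(partitions) if expr else []
    _, field, value = expr
    return [p for p in partitions if p[field] == value]

parts = [{"dt": 1}, {"dt": 2}]
print(prune(parts, fold_comparison("abc")))  # [] -- constant false handled
print(prune(parts, fold_comparison("2")))    # [{'dt': 2}]
```

 The bug described above corresponds to a pruner that assumes `expr` is always
 a comparison and crashes on the bare constant.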



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7766:
--

 Summary: Cleanup Reduce operator code [Spark Branch]
 Key: HIVE-7766
 URL: https://issues.apache.org/jira/browse/HIVE-7766
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100762#comment-14100762
 ] 

Sergio Peña commented on HIVE-7373:
---

There is a problem with how the factor value is stored when serializing the 
value 0: a serialized 0 was being deserialized as 0.00.

The bug is that the factor was serialized unchanged only when the sign was 1 
(positive); for the other signs, 0 and negative, the factor was negated. In 
the deserialize function, however, positive and zero values left the factor 
unchanged, and only negative values negated it.

(serialize)
int sign = dec.compareTo(HiveDecimal.ZERO);
int factor = dec.precision() - dec.scale();
factor = sign == 1 ? factor : -factor; (BUG)
writeByte(buffer, (byte) ( sign + 1), invert);

(deserialize)
int b = buffer.read(invert) - 1;
boolean positive = b != -1;
if (!positive) {
   factor = -factor;
}

Here's a data example showing the bug (length = 1 in every row; 
scale = factor - length):

value  type          factor    serialized | deserialized   scale
-1.0   decimal(1,1)  factor=0  factor=-0  | factor=0     0-1 = -1
-1     decimal(1,0)  factor=1  factor=-1  | factor=1     1-1 =  0
 0     decimal(1,0)  factor=1  factor=-1  | factor=-1   -1-1 = -2  BUG
 0.0   decimal(1,1)  factor=0  factor=-0  | factor=-0    0-1 = -1
 1     decimal(1,0)  factor=1  factor=1   | factor=1     1-1 =  0
 1.0   decimal(1,1)  factor=0  factor=0   | factor=0     0-1 = -1

And with the fix on serialize:
   factor = sign != -1 ? factor : -factor; (FIX)

value  type          factor    serialized | deserialized   scale
-1.0   decimal(1,1)  factor=0  factor=-0  | factor=0     0-1 = -1
-1     decimal(1,0)  factor=1  factor=-1  | factor=1     1-1 =  0
 0     decimal(1,0)  factor=1  factor=1   | factor=1     1-1 =  0  FIXED
 0.0   decimal(1,1)  factor=0  factor=0   | factor=0     0-1 = -1
 1     decimal(1,0)  factor=1  factor=1   | factor=1     1-1 =  0
 1.0   decimal(1,1)  factor=0  factor=0   | factor=0     0-1 = -1
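
The buggy and fixed branches can be checked with a small standalone sketch 
(illustrative Python; the real serialization code is Java in Hive's serde 
layer, and these helper names are invented):

```python
def serialize_factor(sign, factor, fixed):
    # sign is -1, 0 or 1, as returned by dec.compareTo(HiveDecimal.ZERO).
    if fixed:
        # FIX: negate the factor only for negative decimals
        return factor if sign != -1 else -factor
    # BUG: the factor was negated for sign == 0 as well as sign == -1
    return factor if sign == 1 else -factor

def deserialize_factor(sign, stored):
    # deserialize negates the factor only when the decimal is negative
    return -stored if sign == -1 else stored

# For the value 0 as decimal(1,0): sign = 0, factor = precision - scale = 1
print(deserialize_factor(0, serialize_factor(0, 1, fixed=False)))  # -1 (BUG)
print(deserialize_factor(0, serialize_factor(0, 1, fixed=True)))   # 1  (FIX)
```

With the fix, the factor for 0 round-trips unchanged, so 0 no longer comes 
back with a shifted scale.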

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, 
 HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch


 Currently Hive blindly removes trailing zeros of a decimal input number as a 
 sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In decimal context, the number 3.14 has a different semantic meaning from 
 the number 3.140. Removing the trailing zeros loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zero, and then the number becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1,1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose that Hive preserve the trailing zeros (up to what the 
 scale allows). With this, in the above example, 0.0, 0.00, and 0. will be 
 represented as 0.0 (precision=1, scale=1) internally.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7766:
---

Attachment: HIVE-7766.1-spark.patch

 Cleanup Reduce operator code [Spark Branch]
 ---

 Key: HIVE-7766
 URL: https://issues.apache.org/jira/browse/HIVE-7766
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Attachments: HIVE-7766.1-spark.patch


 This patch, 
 https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch,
  was moved over from HIVE-7624.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100769#comment-14100769
 ] 

Brock Noland commented on HIVE-7624:


Hi Rui,

Hive generally follows one commit = one jira so I moved your patch over to 
HIVE-7766 and committed it. Thank you!!

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, 
 HIVE-7624.7-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland resolved HIVE-7766.


   Resolution: Fixed
Fix Version/s: spark-branch

Thank you for your contribution Rui! I have committed this to spark!

 Cleanup Reduce operator code [Spark Branch]
 ---

 Key: HIVE-7766
 URL: https://issues.apache.org/jira/browse/HIVE-7766
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-7766.1-spark.patch


 This patch, 
 https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch,
  was moved over from HIVE-7624.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7624:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Reduce operator initialization failed when running multiple MR query on spark
 -

 Key: HIVE-7624
 URL: https://issues.apache.org/jira/browse/HIVE-7624
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-7624.2-spark.patch, HIVE-7624.3-spark.patch, 
 HIVE-7624.4-spark.patch, HIVE-7624.5-spark.patch, HIVE-7624.6-spark.patch, 
 HIVE-7624.7-spark.patch, HIVE-7624.patch


 The following error occurs when I try to run a query with multiple reduce 
 works (M-R-R):
 {quote}
 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
 java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
 [0:_col0]
 at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
 …
 {quote}
 I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7528) Support cluster by and distributed by

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7528:
---

Attachment: HIVE-7528.1-spark.patch

Re-uploading the same patch under a name which allows pre-commit tests to run.

 Support cluster by and distributed by
 -

 Key: HIVE-7528
 URL: https://issues.apache.org/jira/browse/HIVE-7528
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-7528.1-spark.patch, HIVE-7528.spark.patch


 clustered by = distributed by + sort by, so this is related to HIVE-7527. If 
 sort by is in place, the assumption is that we don't need to do anything 
 about distributed by or clustered by. Still, we need to confirm and verify.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7729) Enable q-tests for ANALYZE TABLE feature [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7729:
---

Summary: Enable q-tests for ANALYZE TABLE feature [Spark Branch]  (was: 
Enable q-tests for ANALYZE TABLE feature.)

 Enable q-tests for ANALYZE TABLE feature [Spark Branch]
 ---

 Key: HIVE-7729
 URL: https://issues.apache.org/jira/browse/HIVE-7729
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li

 Enable q-tests for the ANALYZE TABLE feature since the automated test 
 environment is ready.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100766#comment-14100766
 ] 

Brock Noland commented on HIVE-7766:


+1

tests passed over on HIVE-7624.

 Cleanup Reduce operator code [Spark Branch]
 ---

 Key: HIVE-7766
 URL: https://issues.apache.org/jira/browse/HIVE-7766
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Attachments: HIVE-7766.1-spark.patch


 This patch, 
 https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch,
  was moved over from HIVE-7624.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7766) Cleanup Reduce operator code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7766:
---

Description: This patch, 
https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch,
 was moved over from HIVE-7624.

 Cleanup Reduce operator code [Spark Branch]
 ---

 Key: HIVE-7766
 URL: https://issues.apache.org/jira/browse/HIVE-7766
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Rui Li
 Attachments: HIVE-7766.1-spark.patch


 This patch, 
 https://issues.apache.org/jira/secure/attachment/12662431/HIVE-7624.7-spark.patch,
  was moved over from HIVE-7624.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7761) Failed to analyze stats with CounterStatsAggregator [SparkBranch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7761:
---

Summary: Failed to analyze stats with CounterStatsAggregator [SparkBranch]  
(was: Failed to analyze stats with CounterStatsAggregator.[SparkBranch])

 Failed to analyze stats with CounterStatsAggregator [SparkBranch]
 -

 Key: HIVE-7761
 URL: https://issues.apache.org/jira/browse/HIVE-7761
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li

 CounterStatsAggregator analyzes stats with MR counters; we need to implement 
 another CounterStatsAggregator based on Spark-specific counters to analyze 
 table stats. Here is the error information:
 {noformat}
 2014-08-17 23:46:34,436 ERROR stats.CounterStatsAggregator 
 (CounterStatsAggregator.java:connect(51)) - Failed to get Job instance for 
 null
 java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.spark.SparkTask 
 cannot be cast to org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 at 
 org.apache.hadoop.hive.ql.stats.CounterStatsAggregator.connect(CounterStatsAggregator.java:46)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.createStatsAggregator(StatsTask.java:282)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:142)
 at 
 org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:118)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1534)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1301)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1113)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:927)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7763:
---

Summary: Failed to query TABLESAMPLE on empty bucket table [Spark Branch]  
(was: Failed to query TABLESAMPLE on empty bucket table.[Spark Branch])

 Failed to query TABLESAMPLE on empty bucket table [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 I get the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table.[Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100774#comment-14100774
 ] 

Brock Noland commented on HIVE-7763:


+1

 Failed to query TABLESAMPLE on empty bucket table.[Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 I get the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7763:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Chengxiang I have committed this to spark! Thank you very much for your 
contribution!!

 Failed to query TABLESAMPLE on empty bucket table [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 I get the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7757) PTest2 separates test files with spaces while QTestGen uses commas

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7757:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you for the review! I have committed this to trunk.

 PTest2 separates test files with spaces while QTestGen uses commas
 --

 Key: HIVE-7757
 URL: https://issues.apache.org/jira/browse/HIVE-7757
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.14.0

 Attachments: HIVE-7757.1.patch


 I noticed in HIVE-7749 that even after the testconfiguration.properties file 
 is updated, TestSparkCliDriver is not generated correctly: it doesn't 
 include any tests. The issue appears to be that properties in the pom file 
 are separated by commas while the PTest2 properties files are separated by 
 spaces. Since neither comma nor space appears in the q-test names 
 themselves, let's update all parsing code to accept both as separators.
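
 The proposed change, accepting both comma and space as separators, could be 
 sketched like this (illustrative Python, not the actual PTest2/QTestGen 
 parsing code):

```python
import re

def split_test_files(prop_value):
    # Accept commas, whitespace, or any mix of the two as separators;
    # neither character appears inside a q-file name, so this is safe.
    return [t for t in re.split(r"[,\s]+", prop_value.strip()) if t]

print(split_test_files("a.q,b.q c.q ,  d.q"))  # ['a.q', 'b.q', 'c.q', 'd.q']
```

 Filtering out empty tokens makes runs of mixed separators (", ") behave the 
 same as a single comma or space.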



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7763:
---

Assignee: Chengxiang Li  (was: Brock Noland)

 Failed to query TABLESAMPLE on empty bucket table [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7763.1-spark.patch


 I get the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}





[jira] [Assigned] (HIVE-7763) Failed to query TABLESAMPLE on empty bucket table [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland reassigned HIVE-7763:
--

Assignee: Brock Noland  (was: Chengxiang Li)

 Failed to query TABLESAMPLE on empty bucket table [Spark Branch]
 

 Key: HIVE-7763
 URL: https://issues.apache.org/jira/browse/HIVE-7763
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Brock Noland
 Attachments: HIVE-7763.1-spark.patch


 Got the following exception:
 {noformat}
 2014-08-18 16:23:15,213 ERROR [Executor task launch worker-0]: 
 executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in 
 stage 1.0 (TID 0)
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
 at org.apache.hadoop.hive.ql.exec
 {noformat}





[jira] [Commented] (HIVE-7702) Start running .q file tests on spark

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100781#comment-14100781
 ] 

Brock Noland commented on HIVE-7702:


After looking at this more, I think we should start with the 100 or so tests 
that Tez executes:

https://github.com/apache/hive/blob/spark/itests/src/test/resources/testconfiguration.properties#L49

 Start running .q file tests on spark
 

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

 Spark currently supports only a few queries; however, some .q file tests will 
 pass today. The basic idea is that we should get some number of these (10-20) 
 actually working so we can start testing the project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a given time like so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}





Re: Mail bounces from ebuddy.com

2014-08-18 Thread Lars Francke
Thanks Alan and Ashutosh for taking care of this!


On Mon, Aug 18, 2014 at 5:45 PM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Thanks, Alan for the hint. I just unsubscribed those two email addresses
 from ebuddy.


 On Mon, Aug 18, 2014 at 8:23 AM, Alan Gates ga...@hortonworks.com wrote:

  Anyone who is an admin on the list (I don't know who the admins are) can do
  this by doing user-unsubscribe-USERNAME=ebuddy@hive.apache.org where
  USERNAME is the name of the bouncing user (see
  http://untroubled.org/ezmlm/ezman/ezman1.html )
 
  Alan.
 
 
 
Thejas Nair the...@hortonworks.com
   August 17, 2014 at 17:02
  I don't know how to do this.
 
  Carl, Ashutosh,
  Do you guys know how to remove these two invalid emails from the mailing
  list ?
 
 
Lars Francke lars.fran...@gmail.com
   August 17, 2014 at 15:41
  Hmm great, I see others mentioning this as well. I'm happy to contact
 INFRA
  but I'm not sure if they are even needed or if someone from the Hive team
  can do this?
 
 
  On Fri, Aug 8, 2014 at 3:43 AM, Lefty Leverenz leftylever...@gmail.com
  leftylever...@gmail.com
 
Lefty Leverenz leftylever...@gmail.com
   August 7, 2014 at 18:43
  (Excuse the spam.) Actually I'm getting two bounces per message, but
 gmail
  concatenates them so I didn't notice the second one.
 
  -- Lefty
 
 
  On Thu, Aug 7, 2014 at 9:36 PM, Lefty Leverenz leftylever...@gmail.com
  leftylever...@gmail.com
 
Lefty Leverenz leftylever...@gmail.com
   August 7, 2014 at 18:36
  Curious, I've only been getting one bounce per message. Anyway thanks for
  bringing this up.
 
  -- Lefty
 
 
 
Lars Francke lars.fran...@gmail.com
   August 7, 2014 at 4:38
  Hi,
 
  every time I send a mail to dev@ I get two bounce mails from two people
 at
  ebuddy.com. I don't want to post the E-Mail addresses publicly but I can
  send them on if needed (and it can be triggered easily by just replying
 to
  this mail I guess).
 
  Could we maybe remove them from the list?
 
  Cheers,
  Lars
 
 
 



[jira] [Assigned] (HIVE-7747) Spark: Submitting a query to Spark from HiveServer2 fails

2014-08-18 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti reassigned HIVE-7747:
-

Assignee: Venki Korukanti

 Spark: Submitting a query to Spark from HiveServer2 fails
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works 
 fine from the Hive CLI.
 Spark tasks fail with the following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}





[jira] [Updated] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-18 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7747:
--

Summary: Submitting a query to Spark from HiveServer2 fails [Spark Branch]  
(was: Spark: Submitting a query to Spark from HiveServer2 fails)

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works 
 fine from the Hive CLI.
 Spark tasks fail with the following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}





[jira] [Commented] (HIVE-7373) Hive should not remove trailing zeros for decimal numbers

2014-08-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100813#comment-14100813
 ] 

Brock Noland commented on HIVE-7373:


Nice. I am +1 on this change.

 Hive should not remove trailing zeros for decimal numbers
 -

 Key: HIVE-7373
 URL: https://issues.apache.org/jira/browse/HIVE-7373
 Project: Hive
  Issue Type: Bug
  Components: Types
Affects Versions: 0.13.0, 0.13.1
Reporter: Xuefu Zhang
Assignee: Sergio Peña
 Attachments: HIVE-7373.1.patch, HIVE-7373.2.patch, HIVE-7373.3.patch, 
 HIVE-7373.4.patch, HIVE-7373.5.patch, HIVE-7373.6.patch, HIVE-7373.6.patch


 Currently Hive blindly removes the trailing zeros of a decimal input number as 
 a sort of standardization. This is questionable in theory and problematic in 
 practice.
 1. In a decimal context, the number 3.14 has a different semantic meaning from 
 the number 3.140. Removing trailing zeros loses that meaning.
 2. In an extreme case, 0.0 has (p, s) of (1, 1). Hive removes the trailing 
 zeros, and the number becomes 0, which has (p, s) of (1, 0). Thus, for a 
 decimal column of (1, 1), input such as 0.0, 0.00, and so on becomes NULL 
 because the column doesn't allow a decimal number with an integer part.
 Therefore, I propose Hive preserve the trailing zeros (up to what the scale 
 allows). With this, in the above example, 0.0, 0.00, and so on will be 
 represented as 0.0 (precision=1, scale=1) internally.
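
The precision/scale loss described above can be reproduced with the JDK's BigDecimal, used here only as a stand-in for Hive's decimal type (a sketch for illustration, not Hive's actual code path):

```java
import java.math.BigDecimal;

public class TrailingZerosDemo {
    public static void main(String[] args) {
        // "3.140" carries (precision=4, scale=3); stripping the trailing
        // zero yields "3.14" with (precision=3, scale=2) -- the extra
        // significance the reporter wants to preserve is lost.
        BigDecimal d = new BigDecimal("3.140");
        System.out.println(d.stripTrailingZeros().toPlainString()); // prints 3.14

        // "0.0" has (precision=1, scale=1); on Java 8+ stripping collapses
        // it to plain 0 with scale 0, which a decimal(1,1) column cannot
        // represent -- hence the NULLs described above.
        BigDecimal zero = new BigDecimal("0.0");
        System.out.println(zero.stripTrailingZeros().scale());
    }
}
```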





[jira] [Updated] (HIVE-7739) TestSparkCliDriver should not use includeQueryFiles [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7739:
---

Summary: TestSparkCliDriver should not use includeQueryFiles [Spark Branch] 
 (was: TestSparkCliDriver should not use includeQueryFiles)

 TestSparkCliDriver should not use includeQueryFiles [Spark Branch]
 --

 Key: HIVE-7739
 URL: https://issues.apache.org/jira/browse/HIVE-7739
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-7739.1-spark.patch


 Because it uses includeQueryFiles, TestSparkCliDriver cannot be used with 
 -Dqfile or -Dqfile_regex. These options are very useful, so let's remove the 
 includeQueryFiles usage.
 spark.query.files in testconfiguration.properties will still be used when run 
 via the pre-commit tests to generate -Dqfiles





[jira] [Updated] (HIVE-7709) Create SparkReporter [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7709:
---

Summary: Create SparkReporter [Spark Branch]  (was: Create SparkReporter)

 Create SparkReporter [Spark Branch]
 ---

 Key: HIVE-7709
 URL: https://issues.apache.org/jira/browse/HIVE-7709
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li

 Hive operators use Reporter to collect global information. In Hive on Spark 
 mode, we need a new implementation of Reporter to collect Hive operator-level 
 information based on Spark-specific counters. This task depends on HIVE-7551.





[jira] [Updated] (HIVE-7525) Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7525:
---

Summary: Research to find out if it's possible to submit Spark jobs 
concurrently using shared SparkContext [Spark Branch]  (was: Research to find 
out if it's possible to submit Spark jobs concurrently using shared 
SparkContext)

 Research to find out if it's possible to submit Spark jobs concurrently using 
 shared SparkContext [Spark Branch]
 

 Key: HIVE-7525
 URL: https://issues.apache.org/jira/browse/HIVE-7525
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao

 Refer to HIVE-7503 and SPARK-2688. Find out if it's possible to submit 
 multiple Spark jobs concurrently using a shared SparkContext. SparkClient's 
 code can be manipulated for this test. Here is the process:
 1. Transform rdd1 to rdd2 using some transformation.
 2. Call rdd2.cache() to persist it in memory.
 3. In two threads, run concurrently:
 Thread a: transform rdd2 to rdd3; call rdd3.foreach()
 Thread b: transform rdd2 to rdd4; call rdd4.foreach()
 It would also be nice to investigate the monitoring and error-reporting aspects.





[jira] [Updated] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7702:
---

Summary: Start running .q file tests on spark [Spark Branch]  (was: Start 
running .q file tests on spark)

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

 Spark currently supports only a few queries; however, some .q file tests will 
 pass today. The basic idea is that we should get some number of these (10-20) 
 actually working so we can start testing the project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}
 which would generate XXX.q.out which we can check-in to source control as a 
 golden file.
 Multiple tests can be run at a given time like so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true 
 -Phadoop-2
 {noformat}





[jira] [Updated] (HIVE-7674) Update to Spark 1.1 [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7674:
---

Summary: Update to Spark 1.1 [Spark Branch]  (was: Update to Spark 1.1)

 Update to Spark 1.1 [Spark Branch]
 --

 Key: HIVE-7674
 URL: https://issues.apache.org/jira/browse/HIVE-7674
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Priority: Blocker

 In HIVE-7540 we added a custom repo to use Spark 1.1. Once 1.1 is released we 
 need to remove this repo.





[jira] [Updated] (HIVE-7640) Support Hive TABLESAMPLE [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7640:
---

Summary: Support Hive TABLESAMPLE [Spark Branch]  (was: Support Hive 
TABLESAMPLE)

 Support Hive TABLESAMPLE [Spark Branch]
 ---

 Key: HIVE-7640
 URL: https://issues.apache.org/jira/browse/HIVE-7640
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 Research and verify TABLESAMPLE support in Hive on Spark, and research 
 whether it can be merged with Spark's sampling features.





[jira] [Updated] (HIVE-7528) Support cluster by and distributed by [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7528:
---

Summary: Support cluster by and distributed by [Spark Branch]  (was: 
Support cluster by and distributed by)

 Support cluster by and distributed by [Spark Branch]
 

 Key: HIVE-7528
 URL: https://issues.apache.org/jira/browse/HIVE-7528
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Rui Li
 Attachments: HIVE-7528.1-spark.patch, HIVE-7528.spark.patch


 clustered by = distributed by + sort by, so this is related to HIVE-7527. If 
 sort by is in place, the assumption is that we don't need to do anything 
 about distributed by or clustered by. Still, we need to confirm and verify.





[jira] [Updated] (HIVE-7597) Support analyze table [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7597:
---

Summary: Support analyze table [Spark Branch]  (was: Support analyze table)

 Support analyze table [Spark Branch]
 

 Key: HIVE-7597
 URL: https://issues.apache.org/jira/browse/HIVE-7597
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7597.1-spark.patch, HIVE-7597.2-spark.patch, 
 HIVE-7597.3-spark.patch


 Both MR and Tez have a visitor processing the analyze table ... command. We 
 cloned the code from Tez, but may need to make it fit Spark, then verify and 
 test.





[jira] [Updated] (HIVE-7614) Find solution for closures containing writables [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7614:
---

Summary: Find solution for closures containing writables [Spark Branch]  
(was: Find solution for closures containing writables)

 Find solution for closures containing writables [Spark Branch]
 --

 Key: HIVE-7614
 URL: https://issues.apache.org/jira/browse/HIVE-7614
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Priority: Blocker

 HIVE-7540 performed a workaround so we could serialize closures with 
 Writables. However, we need a long term solution.





[jira] [Updated] (HIVE-7708) Fix qtest-spark pom.xml reference to test properties [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7708:
---

Summary: Fix qtest-spark pom.xml reference to test properties [Spark 
Branch]  (was: Fix qtest-spark pom.xml reference to test properties)

 Fix qtest-spark pom.xml reference to test properties [Spark Branch]
 ---

 Key: HIVE-7708
 URL: https://issues.apache.org/jira/browse/HIVE-7708
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: spark-branch

 Attachments: HIVE-7708.patch








[jira] [Updated] (HIVE-7675) Implement native HiveMapFunction [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7675:
---

Summary: Implement native HiveMapFunction [Spark Branch]  (was: Implement 
native HiveMapFunction)

 Implement native HiveMapFunction [Spark Branch]
 ---

 Key: HIVE-7675
 URL: https://issues.apache.org/jira/browse/HIVE-7675
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 Currently, Hive on Spark depends on ExecMapper to execute operator logic; the 
 full stack looks like: Spark framework => HiveMapFunction => ExecMapper => 
 Hive operators. HiveMapFunction is just a thin wrapper around ExecMapper, 
 which introduces several problems:
 # ExecMapper is designed for MR's single-process task mode; it does not work 
 well in Spark's multi-threaded task mode.
 # ExecMapper introduces extra API-level restrictions and processing logic.
 We need to implement a native HiveMapFunction as the bridge between the Spark 
 framework and Hive operators.





[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7593:
---

Summary: Instantiate SparkClient per user session [Spark Branch]  (was: 
Instantiate SparkClient per user session)

 Instantiate SparkClient per user session [Spark Branch]
 ---

 Key: HIVE-7593
 URL: https://issues.apache.org/jira/browse/HIVE-7593
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7593-spark.patch


 SparkContext is the main class via which Hive talks to the Spark cluster. 
 SparkClient encapsulates a SparkContext instance. Currently all user sessions 
 share a single SparkClient instance in HiveServer2. While this is good enough 
 for a POC, even for our first two milestones, it is not desirable in a 
 multi-tenant environment and gives the least flexibility to Hive users. Here 
 is what we propose:
 1. Have a SparkClient instance per user session. The SparkClient instance is 
 created when the user executes the first query in the session. It will be 
 destroyed when the user session ends.
 2. The SparkClient is instantiated based on the Spark configurations that are 
 available to the user, including those defined at the global level and those 
 overwritten by the user (through the set command, for instance).
 3. Ideally, when the user changes any Spark configuration during the session, 
 the old SparkClient instance should be destroyed and a new one created based 
 on the new configurations. This may turn out to be a little hard, and thus 
 it's a nice-to-have. If not implemented, we need to document that subsequent 
 configuration changes will not take effect in the current session.
 Please note that there is a thread-safety issue on the Spark side where 
 multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). 
 We need to work with the Spark community to get this addressed.
 Besides the functional requirements above, avoiding potential issues is also a 
 consideration. For instance, sharing a SparkContext among users is bad, as 
 resources (such as jars for UDFs) will also be shared, which is problematic. 
 On the other hand, one SparkContext per job seems too expensive, as the 
 resources need to be re-rendered even if there isn't any change.
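
The session-scoped lifecycle in points 1 and 3 can be sketched as follows. All names here are hypothetical illustrations (not the actual SparkClient/HiveServer2 classes): the client is created lazily on the session's first query and torn down when the session ends or its configuration changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SessionClientRegistry {
    /** Stand-in for the real SparkClient; closing releases cluster resources. */
    public interface Client extends AutoCloseable {
        @Override
        void close();
    }

    private final Map<String, Client> clients = new ConcurrentHashMap<>();

    /** Created lazily when the session runs its first query (point 1). */
    public Client forSession(String sessionId, Function<String, Client> factory) {
        return clients.computeIfAbsent(sessionId, factory);
    }

    /** Destroyed when the session ends, or when its Spark config changes (point 3). */
    public void release(String sessionId) {
        Client client = clients.remove(sessionId);
        if (client != null) {
            client.close();
        }
    }
}
```

After `release`, the next `forSession` call would rebuild the client from the session's current configuration, which is the behavior point 3 asks for.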





[jira] [Updated] (HIVE-7559) StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7559:
---

Summary: StarterProject: Move configuration from SparkClient to HiveConf 
[Spark Branch]  (was: StarterProject: Move configuration from SparkClient to 
HiveConf)

 StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch]
 --

 Key: HIVE-7559
 URL: https://issues.apache.org/jira/browse/HIVE-7559
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Priority: Minor
  Labels: StarterProject

 The SparkClient class has some configuration keys and defaults. These should 
 be moved to HiveConf.





[jira] [Updated] (HIVE-7665) Create TestSparkCliDriver to run test in spark local mode [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7665:
---

Summary: Create TestSparkCliDriver to run test in spark local mode [Spark 
Branch]  (was: Create TestSparkCliDriver to run test in spark local mode)

 Create TestSparkCliDriver to run test in spark local mode [Spark Branch]
 

 Key: HIVE-7665
 URL: https://issues.apache.org/jira/browse/HIVE-7665
 Project: Hive
  Issue Type: Sub-task
  Components: Spark, Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho
 Fix For: spark-branch

 Attachments: HIVE-7665-spark.patch, HIVE-7665.2-spark.patch, 
 HIVE-7665.3-spark.patch








[jira] [Updated] (HIVE-7561) StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7561:
---

Summary: StarterProject: Move from assert to Guava Preconditions.* in Hive 
on Spark [Spark Branch]  (was: StarterProject: Move from assert to Guava 
Preconditions.* in Hive on Spark)

 StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark 
 [Spark Branch]
 -

 Key: HIVE-7561
 URL: https://issues.apache.org/jira/browse/HIVE-7561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Chao
  Labels: StarterProject
 Fix For: spark-branch

 Attachments: HIVE-7561-spark.patch, HIVE-7561.2-spark.patch, 
 HIVE-7561.3-spark.patch


 Hive uses the assert keyword all over the place. The problem is that 
 assertions rarely fire in practice since they must be explicitly enabled (via 
 the JVM's -ea flag). In the Spark code, e.g. GenSparkUtils, let's use 
 Preconditions.* instead.
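
The replacement pattern looks roughly like the sketch below. It uses the JDK's Objects.requireNonNull and IllegalArgumentException, which behave like Guava's Preconditions.checkNotNull and checkArgument; the method shown is a hypothetical example, not actual GenSparkUtils code.

```java
import java.util.Objects;

public class PreconditionsDemo {
    // Before: `assert work != null;` -- a no-op unless the JVM runs with -ea.
    // After: the checks always fire. With Guava this would be
    // Preconditions.checkNotNull(work) and Preconditions.checkArgument(numTasks > 0).
    public static int planTasks(Object work, int numTasks) {
        Objects.requireNonNull(work, "work must not be null");
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive: " + numTasks);
        }
        return numTasks;
    }
}
```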





[jira] [Updated] (HIVE-7580) Support dynamic partitioning [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7580:
---

Summary: Support dynamic partitioning [Spark Branch]  (was: Support dynamic 
partitioning)

 Support dynamic partitioning [Spark Branch]
 ---

 Key: HIVE-7580
 URL: https://issues.apache.org/jira/browse/HIVE-7580
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam

 My understanding is that we don't need to do anything special for this. 
 However, this needs to be verified and tested.





[jira] [Updated] (HIVE-7569) Make sure multi-MR queries work [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7569:
---

Summary: Make sure multi-MR queries work [Spark Branch]  (was: Make sure 
multi-MR queries work)

 Make sure multi-MR queries work [Spark Branch]
 --

 Key: HIVE-7569
 URL: https://issues.apache.org/jira/browse/HIVE-7569
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao

 With the latest dev effort, queries that involve multiple MR jobs should be 
 supported by Spark now, except for sorting, multi-insert, union, and join 
 (map join and SMB might just work). However, this hasn't been verified or 
 tested. This task is to ensure that it is indeed the case. Please create 
 JIRAs for any problems found.





[jira] [Updated] (HIVE-7560) StarterProject: Fix exception handling in POC code [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7560:
---

Summary: StarterProject: Fix exception handling in POC code [Spark Branch]  
(was: StarterProject: Fix exception handling in POC code)

 StarterProject: Fix exception handling in POC code [Spark Branch]
 -

 Key: HIVE-7560
 URL: https://issues.apache.org/jira/browse/HIVE-7560
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Chao
  Labels: StarterProject
 Fix For: spark-branch

 Attachments: HIVE-7560.1-spark.patch


 The POC code just printed exceptions to stderr. We should either:
 1) log them at INFO/WARN/ERROR, or
 2) rethrow anything that is a fatal error (perhaps wrapped in a 
 RuntimeException).
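
 The two options can be sketched as below. The class and method names are 
 illustrative, not from the actual patch; java.util.logging stands in for 
 whichever logging facade the code actually uses:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExceptionHandlingDemo {
    private static final Logger LOG = Logger.getLogger("spark-client");

    // Option 1: the problem is recoverable -- log it and keep going.
    public static void logAndContinue(Exception e) {
        LOG.log(Level.WARNING, "non-fatal error in Spark task", e);
    }

    // Option 2: the problem is fatal -- wrap the (possibly checked)
    // exception in a RuntimeException and rethrow it to the caller,
    // preserving the original as the cause.
    public static void rethrowFatal(Exception e) {
        throw new RuntimeException("fatal error in Spark task", e);
    }
}
```

 Either way, the original stack trace is preserved, unlike the stderr printing 
 in the POC code.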





[jira] [Updated] (HIVE-7465) Implement pre-commit testing [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7465:
---

Summary: Implement pre-commit testing [Spark Branch]  (was: Implement 
pre-commit testing)

 Implement pre-commit testing [Spark Branch]
 ---

 Key: HIVE-7465
 URL: https://issues.apache.org/jira/browse/HIVE-7465
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: spark-branch

 Attachments: HIVE-7465-spark.patch, HIVE-7465-spark.patch








[jira] [Updated] (HIVE-7503) Support Hive's multi-table insert query with Spark [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7503:
---

Summary: Support Hive's multi-table insert query with Spark [Spark Branch]  
(was: Support Hive's multi-table insert query with Spark)

 Support Hive's multi-table insert query with Spark [Spark Branch]
 -

 Key: HIVE-7503
 URL: https://issues.apache.org/jira/browse/HIVE-7503
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chao

 For Hive's multi-insert query 
 (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
 may be an MR job for each insert. When we achieve this with Spark, it would 
 be nice if all the inserts could happen concurrently.
 It seems that this functionality isn't available in Spark. To make things 
 worse, the source of the insert may be re-computed unless it's staged. Even 
 then, the inserts will happen sequentially, hurting performance.
 This task is to find out what it takes in Spark to enable this without 
 staging the source or inserting sequentially. If this has to be solved in 
 Hive, find an optimal way to do it.
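
 The recomputation concern can be illustrated with a plain-Java analogy (no 
 Spark APIs; a counting supplier stands in for the source of the inserts, and 
 the names are hypothetical):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MultiInsertDemo {
    public static final AtomicInteger computations = new AtomicInteger();

    // Stand-in for an unstaged source: every insert that reads it
    // triggers a fresh computation of the whole input.
    public static Supplier<List<Integer>> unstagedSource() {
        return () -> {
            computations.incrementAndGet();
            return List.of(1, 2, 3);
        };
    }

    // Stand-in for a staged (materialized) source: computed once up
    // front, then reused by every subsequent insert.
    public static Supplier<List<Integer>> stagedSource() {
        List<Integer> data = unstagedSource().get();
        return () -> data;
    }
}
```

 Two inserts reading the unstaged source compute it twice; the staged version 
 computes it once. Staging fixes the recomputation but, as noted above, does 
 not by itself make the inserts concurrent.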





[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7439:
---

Summary: Spark job monitoring and error reporting [Spark Branch]  (was: 
Spark job monitoring and error reporting)

 Spark job monitoring and error reporting [Spark Branch]
 ---

 Key: HIVE-7439
 URL: https://issues.apache.org/jira/browse/HIVE-7439
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li

 After Hive submits a job to the Spark cluster, we need to report the job 
 progress, such as the percentage done, to the user. This is especially 
 important for long-running queries. Moreover, if there is an error during job 
 submission or execution, it's also crucial for Hive to fetch the error log 
 and/or stack trace and feed it back to the user.
 Please refer to the design doc on the wiki for more information.





[jira] [Updated] (HIVE-7541) Support union all on Spark [Spark Branch]

2014-08-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7541:
---

Summary: Support union all on Spark [Spark Branch]  (was: Support union all 
on Spark)

 Support union all on Spark [Spark Branch]
 -

 Key: HIVE-7541
 URL: https://issues.apache.org/jira/browse/HIVE-7541
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Na Yang
 Fix For: spark-branch

 Attachments: HIVE-7541.1-spark.patch, HIVE-7541.2-spark.patch, 
 HIVE-7541.3-spark.patch, HIVE-7541.4-spark.patch, HIVE-7541.5-spark.patch, 
 Hive on Spark Union All design.pdf


 For the union all operator, we will use Spark's union transformation. Refer 
 to the design doc on the wiki for more information.
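
 Spark's union transformation fits here because, like UNION ALL, it simply 
 concatenates its inputs without deduplicating. A dependency-free Java sketch 
 of that distinction (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UnionAllDemo {
    // UNION ALL / RDD.union semantics: plain concatenation,
    // duplicates preserved, no shuffle required.
    public static <T> List<T> unionAll(List<T> left, List<T> right) {
        List<T> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // UNION DISTINCT would need an extra dedup step on top
    // (in Spark, an additional shuffle).
    public static <T> List<T> unionDistinct(List<T> left, List<T> right) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(left, right)));
    }
}
```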




