[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-27 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383406#comment-14383406
 ] 

Chengxiang Li commented on HIVE-10073:
--

Committed to the spark branch. Thanks, Jimmy, for this contribution.

 Runtime exception when querying HBase with Spark [Spark Branch]
 ---

 Key: HIVE-10073
 URL: https://issues.apache.org/jira/browse/HIVE-10073
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch

 Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, 
 HIVE-10073.3-spark.patch


 When querying HBase with Spark, we got 
 {noformat}
 Caused by: java.lang.IllegalArgumentException: Must specify table name
     at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276)
     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266)
     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331)
 {noformat}
 But it works fine for MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383219#comment-14383219
 ] 

Xuefu Zhang commented on HIVE-10073:


Okay. Makes sense.



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383238#comment-14383238
 ] 

Chengxiang Li commented on HIVE-10073:
--

+1



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382162#comment-14382162
 ] 

Xuefu Zhang commented on HIVE-10073:


Hi [~jxiang] and [~chengxiang li], before we patch this on the Hive side, I think 
it's better to find the root cause. If the problem is due to Spark, we can 
raise it with that community. So far, I'm not convinced that the 
problem is on the Hive side.



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382253#comment-14382253
 ] 

Jimmy Xiang commented on HIVE-10073:


[~xuefuz], I think it's an issue on the Hive side. In SparkRecordHandler, we use 
the job conf passed in from Hive, so it should be Hive's responsibility to make 
sure it has all the needed information.
[~chengxiang li], though I called checkOutputSpecs for both MapWork and 
ReduceWork, I agree with you that it is better to call it in 
SparkPlanGenerator::generate(BaseWork work). Let me upload a new patch.



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382712#comment-14382712
 ] 

Hive QA commented on HIVE-10073:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12707558/HIVE-10073.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7644 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-807/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12707558 - PreCommit-HIVE-SPARK-Build



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383152#comment-14383152
 ] 

Chengxiang Li commented on HIVE-10073:
--

[~xuefuz], the root cause should be just as Jimmy mentioned: some HBase table 
properties are set on the JobConf during checkOutputSpecs, and this method is not 
invoked in HoS. Spark itself checks output specs when the user builds an RDD graph 
with certain actions, such as PairRDDFunctions::saveAsHadoopDataset and 
PairRDDFunctions::saveAsNewAPIHadoopDataset. In HoS, we use foreach as the action 
and write data to Hadoop storage inside Hive, so it should be Hive's 
responsibility to check output specs.
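
The distinction drawn above, between a save-style action that validates its output and a plain foreach that does not, can be illustrated with a toy model. This is only a sketch under stated assumptions: `ActionSketch`, `Sink`, `saveAsDataset`, `forEach` and the table name `my_table` are hypothetical stand-ins, and plain maps replace JobConf. It is not Spark or Hive code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model: who is responsible for validating the output before writing. */
class ActionSketch {
    // Property name taken from the thread.
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";

    /** Hypothetical sink: validation populates the conf, writing requires it. */
    static class Sink {
        final List<String> written = new ArrayList<>();

        void checkOutputSpecs(Map<String, String> conf) {
            conf.put(OUTPUT_TABLE, "my_table"); // hypothetical table name
        }

        void write(String record, Map<String, String> conf) {
            if (!conf.containsKey(OUTPUT_TABLE)) {
                throw new IllegalArgumentException("Must specify table name");
            }
            written.add(record);
        }
    }

    /** Save-style action: the framework validates the output, then writes. */
    static void saveAsDataset(List<String> records, Sink sink, Map<String, String> conf) {
        sink.checkOutputSpecs(conf);
        for (String r : records) {
            sink.write(r, conf);
        }
    }

    /** foreach-style action: the framework only runs the caller's closure. */
    static void forEach(List<String> records, Consumer<String> body) {
        records.forEach(body);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b");

        Sink validated = new Sink();
        saveAsDataset(records, validated, new HashMap<>());
        System.out.println("save action wrote " + validated.written.size() + " records");

        Sink unvalidated = new Sink();
        Map<String, String> conf = new HashMap<>();
        try {
            // Nothing validates the output here unless the closure's owner does it.
            forEach(records, r -> unvalidated.write(r, conf));
        } catch (IllegalArgumentException e) {
            System.out.println("foreach failed: " + e.getMessage());
        }
    }
}
```

In this model the closure passed to `forEach` owns the write path, so its author has to arrange for the validation step, which mirrors the argument that the check belongs on the Hive side.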



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-26 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381424#comment-14381424
 ] 

Chengxiang Li commented on HIVE-10073:
--

Hi [~jxiang], I saw that you only call checkOutputSpecs for ReduceWork, but there 
may be a FileSinkOperator in a map-only job as well, so we may also need to call 
checkOutputSpecs for MapWork. Besides, checkOutputSpecs is invoked in 
SparkRecordHandler::init, which is executed for each task. 
SparkPlanGenerator::generate(BaseWork work) may be a better place to do this: 
we can call checkOutputSpecs between cloning the JobConf and serializing it, so the 
check runs only once, on the RSC side.
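
The placement trade-off described above can be sketched with a small counter. This is a toy model under stated assumptions, not the actual patch: `PlanTimeCheckSketch`, `perTask`, `atPlanTime` and the table name `my_table` are hypothetical, plain maps stand in for JobConf, and copying into an unmodifiable map stands in for serialization.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy comparison: running the output check per task versus once at plan time. */
class PlanTimeCheckSketch {
    // Property name taken from the thread.
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";
    static int checkCalls = 0;

    /** Stand-in for checkOutputSpecs: counts calls and populates the conf. */
    static void checkOutputSpecs(Map<String, String> conf) {
        checkCalls++;
        conf.put(OUTPUT_TABLE, "my_table"); // hypothetical table name
    }

    /** Check inside each task's init: one call per task. */
    static int perTask(int tasks) {
        checkCalls = 0;
        Map<String, String> shipped = new HashMap<>();
        for (int i = 0; i < tasks; i++) {
            Map<String, String> taskConf = new HashMap<>(shipped);
            checkOutputSpecs(taskConf);
        }
        return checkCalls;
    }

    /** Check between cloning and "serializing" the conf: one call in total. */
    static int atPlanTime(int tasks) {
        checkCalls = 0;
        Map<String, String> cloned = new HashMap<>();
        checkOutputSpecs(cloned);
        Map<String, String> shipped = Collections.unmodifiableMap(new HashMap<>(cloned));
        for (int i = 0; i < tasks; i++) {
            // Every task sees the property without re-running the check.
            if (!shipped.containsKey(OUTPUT_TABLE)) {
                throw new IllegalStateException("missing table name");
            }
        }
        return checkCalls;
    }

    public static void main(String[] args) {
        System.out.println("per-task calls: " + perTask(4));
        System.out.println("plan-time calls: " + atPlanTime(4));
    }
}
```

Because the populated conf is what gets shipped, each task already finds the table name without running the check again.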



[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-24 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378743#comment-14378743
 ] 

Jimmy Xiang commented on HIVE-10073:


It looks like the property hbase.mapred.outputtable is not set for HoS, even 
though it is present in the table properties, which are set properly.

For MR, it works because JobSubmitter (mapred code) calls 
output.checkOutputSpecs. Here the output class is HiveOutputFormatImpl. In the 
checkOutputSpecs function, the HBase-related settings are copied to the JobConf.

However, for Spark, I don't see where output.checkOutputSpecs is called, based 
on the stack trace:

{noformat}
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:103)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:58)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:32)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{noformat}
[~chengxiang li], [~ruili], do you know why checkOutputSpecs isn't called for 
HoS in this case?
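
The failure mode described in this comment can be reproduced in miniature. The following is a minimal sketch, assuming plain maps in place of JobConf and the table properties. `OutputSpecSketch`, `runJob` and the table name `my_table` are hypothetical; only the property name and the exception message come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the failure: the table name reaches the job conf only via checkOutputSpecs. */
class OutputSpecSketch {
    // Property name taken from the thread.
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";

    /** Stand-in for checkOutputSpecs: copies the table-level setting into the job conf. */
    static void checkOutputSpecs(Map<String, String> tableProps, Map<String, String> jobConf) {
        String table = tableProps.get(OUTPUT_TABLE);
        if (table != null) {
            jobConf.put(OUTPUT_TABLE, table);
        }
    }

    /** Stand-in for the output format's setConf: fails when the table name is absent. */
    static String setConf(Map<String, String> jobConf) {
        String table = jobConf.get(OUTPUT_TABLE);
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("Must specify table name");
        }
        return table;
    }

    /** Runs the toy job, with or without the submit-time check. */
    static String runJob(Map<String, String> tableProps, boolean callCheckOutputSpecs) {
        Map<String, String> jobConf = new HashMap<>();
        if (callCheckOutputSpecs) {
            checkOutputSpecs(tableProps, jobConf); // the MR-like path
        }
        return setConf(jobConf); // reached when the sink operator initializes
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put(OUTPUT_TABLE, "my_table"); // hypothetical table name

        System.out.println("with check: " + runJob(tableProps, true));
        try {
            runJob(tableProps, false); // the path where the check is never called
        } catch (IllegalArgumentException e) {
            System.out.println("without check: " + e.getMessage());
        }
    }
}
```

The table properties are identical in both runs; the only difference is whether the copy step ran before the output format was configured.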

