[jira] [Commented] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version

2015-11-18 Thread bit1129 (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013046#comment-15013046 ]

bit1129 commented on SPARK-11021:
---------------------------------

I encountered exactly the same issue with Spark 1.5.0 + Hive 0.14.0. The problem 
went away after I added the configuration Jeff suggested. Thanks, Jeff!

> SparkSQL cli throws exception when using with Hive 0.12 metastore in 
> spark-1.5.0 version
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11021
>                 URL: https://issues.apache.org/jira/browse/SPARK-11021
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: iward
>
> After upgrading Spark from 1.4.1 to 1.5.0, I get the following exception when I 
> set the following properties in spark-defaults.conf:
> {noformat}
> spark.sql.hive.metastore.version=0.12.0
> spark.sql.hive.metastore.jars=hive 0.12 jars and hadoop jars
> {noformat}
> When I run a task, it throws the following exception:
> {noformat}
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:249)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489)
>   at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256)
>   at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211)
>   at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248)
>   at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
>   at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:927)
>   at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:927)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:61)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:165)
>   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move results from 

[jira] [Commented] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version

2015-11-05 Thread Jeff Mink (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991989#comment-14991989 ]

Jeff Mink commented on SPARK-11021:
-----------------------------------

We came across a seemingly similar issue when using older versions of Hive 
through Spark (it's hard to tell, because I don't see the query that caused the 
error above).

We are running Spark 1.5.1 with Hive 1.0. Any time we ran an INSERT OVERWRITE 
or CREATE TABLE AS SELECT through Spark's SQL context, we would see the 
following:
```
15/11/05 09:51:47 INFO output.FileOutputCommitter: Saved output of task 
'attempt_201511050951__m_00_0' to 
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/-ext-1/_temporary/0/task_201511050951__m_00
...
15/11/05 09:51:47 INFO common.FileUtils: deleting  
hdfs://development/user/jmink/jmink.db/test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1
```

The reason it was doing this is that later versions of Hive (1.2, I think) 
introduced a new setting, `hive.exec.stagingdir`, which defaults to 
`.hive-staging`. This causes the staging data to be written to a subdirectory of 
the table we are overwriting, so when our process gets to the OVERWRITE stage it 
deletes the staging folder along with everything else in the table's location.
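
To make the failure mode concrete, here is a rough sketch of the resulting layout 
(the paths are taken from the log above; the listing itself is illustrative, not 
copied from our cluster):
```
# Staging output is written *inside* the target table's directory:
hdfs dfs -ls hdfs://development/user/jmink/jmink.db/test/
#   .../test/.hive-staging_hive_2015-11-05_09-51-46_242_7888543996555678892-1/   <- staging output
#   .../test/part-00000                                                          <- existing table data (illustrative)
# The OVERWRITE stage clears the table location before moving results in,
# so it wipes out the staging directory (and the freshly written results) too.
```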

The fix for this was to edit our `/opt/spark/hive-site.xml` and add the 
following entry (of course, you can set this to whatever works for you):
```
  <property>
    <name>hive.exec.stagingdir</name>
    <value>/tmp/hive/spark-${user.name}</value>
  </property>
```
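
If editing hive-site.xml isn't convenient, the same property can presumably be 
overridden per session as well; `--hiveconf` is the standard Hive CLI mechanism 
that the spark-sql driver also accepts (the target path below is just an example, 
pick whatever suits your environment):
```
# Hypothetical per-session override instead of editing hive-site.xml:
# keep the staging directory outside the table locations being overwritten.
spark-sql --hiveconf hive.exec.stagingdir=/tmp/hive/spark-staging
```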


[jira] [Commented] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version

2015-10-21 Thread Xiu (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-11021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967577#comment-14967577 ]

Xiu commented on SPARK-11021:
-----------------------------

I put the Hive 0.12 jar paths in spark-defaults.conf, and I am able to create 
tables.

Can you share the specific steps you did to encounter this exception? Thanks!
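
For reference, the spark-defaults.conf entries being discussed would look roughly 
like this (the jar locations are placeholders; point them at wherever your Hive 
0.12 and Hadoop jars actually live):
```
# Illustrative spark-defaults.conf entries; metastore.jars is a JVM-style
# colon-separated classpath that must cover Hive 0.12 and your Hadoop version.
spark.sql.hive.metastore.version   0.12.0
spark.sql.hive.metastore.jars      /opt/hive-0.12.0/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/*
```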
