Thanks for pointing out the documentation error :) Opened
https://github.com/apache/spark/pull/6749 to fix this.
On 6/11/15 1:18 AM, James Pirz wrote:
Thanks for your help!
Switching to HiveContext fixed the issue.
Just one side comment:
In the documentation regarding Hive Tables and HiveContext
<https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>,
we see:
// sc is an existing JavaSparkContext.
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
But this is incorrect, as the HiveContext constructor does not
accept a JavaSparkContext but a SparkContext (so the comment is
misleading). The correct code snippet should be:
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
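Put together, a minimal Java driver along those lines (against the Spark 1.3 API; the class and app names are just illustrative) might look like this:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SimpleClient {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Simple SQL Client");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // HiveContext takes a SparkContext, not a JavaSparkContext,
        // so unwrap the underlying SparkContext with sc.sc().
        HiveContext sqlContext = new HiveContext(sc.sc());
        DataFrame res = sqlContext.sql("show tables");
        res.show();
        sc.stop();
    }
}
```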
Thanks again for your help.
On Wed, Jun 10, 2015 at 1:17 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
Hm, this is a common confusion... Although the variable name is
`sqlContext` in Spark shell, it's actually a `HiveContext`, which
extends `SQLContext` and has the ability to communicate with Hive
metastore.
So your program needs to instantiate an
`org.apache.spark.sql.hive.HiveContext` instead.
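Concretely, assuming `sc` is an existing JavaSparkContext, the swap is a one-line change:

```java
// Replace the plain SQLContext with a Hive-aware HiveContext.
// sc.sc() unwraps the underlying SparkContext, which is what
// the HiveContext constructor expects.
org.apache.spark.sql.hive.HiveContext sqlContext =
    new org.apache.spark.sql.hive.HiveContext(sc.sc());
```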
Cheng
On 6/10/15 10:19 AM, James Pirz wrote:
I am using Spark (standalone) to run queries (from a remote
client) against data in tables that are already defined/loaded in
Hive.
I have started the metastore service in Hive successfully, and by
putting hive-site.xml, with the proper metastore URI, in the
$SPARK_HOME/conf directory, I tried to share its config with Spark.
When I start spark-shell, it gives me a default sqlContext, and I
can use that to access my Hive's tables with no problem.
But once I submit a similar query via a Spark application through
'spark-submit', it does not see the tables, and it seems it does
not pick up the hive-site.xml under Spark's conf directory. I
tried the '--files' argument of spark-submit to pass
'hive-site.xml' to the workers, but it did not change anything.
Here is how I try to run the application:
$SPARK_HOME/bin/spark-submit --class "SimpleClient" --master
spark://my-spark-master:7077
--files=$SPARK_HOME/conf/hive-site.xml simple-sql-client-1.0.jar
Here is the simple example code that I try to run (in Java):
SparkConf conf = new SparkConf().setAppName("Simple SQL Client");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
DataFrame res = sqlContext.sql("show tables");
res.show();
Here are the SW versions:
Spark: 1.3
Hive: 1.2
Hadoop: 2.6
Thanks in advance for any suggestion.