Hi Michael,

If I understand correctly, the assembly JAR file is deployed onto HDFS under /user/$USER/.sparkStaging, and those staged files are used by all computing (worker) nodes when people run in yarn-cluster mode. Could you elaborate on what the documentation means by this? It is a bit misleading, and I guess it only applies to standalone mode?

Andrew L
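For anyone who wants to see the staging directory Andrew is describing on their own cluster, here is a minimal sketch (an editor's illustration, not from the Spark docs) that lists the per-application staging directories. It assumes it runs inside spark-shell on the cluster, so the Hadoop configuration is on the classpath, and that the standard .sparkStaging location applies:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    // Spark on YARN uploads the assembly (and application) jars into a
    // per-application subdirectory under this path; YARN then localizes
    // them onto the worker nodes, so no manual copying is needed there.
    val staging = new Path(s"/user/${System.getProperty("user.name")}/.sparkStaging")
    if (fs.exists(staging))
      fs.listStatus(staging).foreach(status => println(status.getPath))

In standalone mode there is no such staging step, which is why the docs ask for the Hive assembly to be present on each worker.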
Date: Fri, 25 Jul 2014 15:25:42 -0700
Subject: RE: Spark SQL and Hive tables
From: ssti...@live.com
To: user@spark.apache.org

Thanks! Will do.

-------- Original message --------
From: Michael Armbrust
Date: 07/25/2014 3:24 PM (GMT-08:00)
To: user@spark.apache.org
Subject: Re: Spark SQL and Hive tables

[S]ince Hive has a large number of dependencies, it is not included in the default Spark assembly. In order to use Hive you must first run ‘SPARK_HIVE=true sbt/sbt assembly/assembly’ (or use -Phive for Maven). This command builds a new assembly jar that includes Hive. Note that this Hive assembly jar must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive.

On Fri, Jul 25, 2014 at 3:20 PM, Sameer Tilak <ssti...@live.com> wrote:

Hi Jerry,

I am having trouble with this. Maybe something is wrong with my import or version, etc.

    scala> import org.apache.spark.sql._
    import org.apache.spark.sql._

    scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    <console>:24: error: object hive is not a member of package org.apache.spark.sql
           val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
                                                      ^

Here is what I see for autocompletion:

    scala> org.apache.spark.sql.
    Row            SQLContext     SchemaRDD      SchemaRDDLike
    api            catalyst       columnar       execution
    package        parquet        test

Date: Fri, 25 Jul 2014 17:48:27 -0400
Subject: Re: Spark SQL and Hive tables
From: chiling...@gmail.com
To: user@spark.apache.org

Hi Sameer,

The blog post you referred to is about Spark SQL; I don't think its intent is to guide you through reading data from Hive via Spark SQL, so don't worry too much about it. The programming guide I referred to demonstrates how to read data from Hive using Spark SQL. It is a good starting point.

Best Regards,

Jerry

On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ssti...@live.com> wrote:

Hi Michael,

Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am using CDH 5.1. Can you please let me know which conf/ directory you are talking about?

From: mich...@databricks.com
Date: Fri, 25 Jul 2014 14:34:53 -0700
Subject: Re: Spark SQL and Hive tables
To: user@spark.apache.org

In particular, have you put your hive-site.xml in the conf/ directory? Also, are you creating a HiveContext instead of a SQLContext?

On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <chiling...@gmail.com> wrote:

Hi Sameer,

Maybe this page will help you: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

Best Regards,

Jerry

On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ssti...@live.com> wrote:

Hi All,

I am trying to load data from Hive tables using Spark SQL. I am using spark-shell. Here is what I see:

    val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender, demographics.birth_year, demographics.income_group FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

    14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
    14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
    java.lang.RuntimeException: Table Not Found: prod

I have these tables in Hive; I used the "show tables" command to confirm this. Can someone please let me know how I can make them accessible here?
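A quick way to confirm whether the running assembly actually includes the Hive integration, before constructing anything (a sketch; this just probes for the Spark 1.x class that Sameer's autocompletion could not find):

    // Throws ClassNotFoundException if the assembly was built without Hive support.
    scala> Class.forName("org.apache.spark.sql.hive.HiveContext")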
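Putting Michael's and Jerry's advice together, here is a minimal sketch of the working flow for a Spark 1.0.x spark-shell started from a Hive-enabled assembly with hive-site.xml in conf/. The table and column names are taken from Sameer's query; note that the aliases p and d are used consistently, since HiveQL requires the alias once a table has been aliased:

    // HiveContext reads hive-site.xml and talks to the Hive metastore.
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._

    // In Spark 1.0.x, hql(...) parses HiveQL and resolves table names against
    // the Hive metastore, which is what makes prod and demographics visible.
    val trainingDataTable = hql("""
      SELECT p.prod_num, d.gender, d.birth_year, d.income_group
      FROM prod p
      JOIN demographics d ON d.user_id = p.user_id
    """)

    trainingDataTable.take(5).foreach(println)

A plain SQLContext's sql(...) only sees tables registered within that context, which is why the original query fails with "Table Not Found: prod".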