> [S]ince Hive has a large number of dependencies, it is not included in
> the default Spark assembly. In order to use Hive you must first run
> 'SPARK_HIVE=true sbt/sbt assembly/assembly' (or use -Phive for maven).
> This command builds a new assembly jar that includes Hive. Note that this
> Hive assembly jar must also be present on all of the worker nodes, as
> they will need access to the Hive serialization and deserialization
> libraries (SerDes) in order to access data stored in Hive.
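For reference, the two build invocations from the passage above, run from
the Spark source root (the maven goals shown are a typical invocation and
an assumption on my part, not quoted from the docs):

    # Build an assembly jar that includes Hive (sbt):
    SPARK_HIVE=true sbt/sbt assembly/assembly

    # Or with maven, via the hive profile:
    mvn -Phive -DskipTests clean package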
On Fri, Jul 25, 2014 at 3:20 PM, Sameer Tilak <ssti...@live.com> wrote:

> Hi Jerry,
>
> I am having trouble with this. Maybe something is wrong with my import or
> version, etc.
>
> scala> import org.apache.spark.sql._;
> import org.apache.spark.sql._
>
> scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> <console>:24: error: object hive is not a member of package
> org.apache.spark.sql
>        val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>                                                   ^
>
> Here is what I see for autocompletion:
>
> scala> org.apache.spark.sql.
> Row            SQLContext     SchemaRDD      SchemaRDDLike  api
> catalyst       columnar       execution      package        parquet
> test
>
> ------------------------------
> Date: Fri, 25 Jul 2014 17:48:27 -0400
> Subject: Re: Spark SQL and Hive tables
> From: chiling...@gmail.com
> To: user@spark.apache.org
>
> Hi Sameer,
>
> The blog post you referred to is about Spark SQL, but I don't think its
> intent is to guide you through reading data from Hive via Spark SQL, so
> don't worry too much about the blog post.
>
> The programming guide I referred to demonstrates how to read data from
> Hive using Spark SQL. It is a good starting point.
>
> Best Regards,
>
> Jerry
>
> On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <ssti...@live.com> wrote:
>
> Hi Michael,
>
> Thanks. I am not creating a HiveContext; I am creating a SQLContext. I am
> using CDH 5.1. Can you please let me know which conf/ directory you are
> talking about?
>
> ------------------------------
> From: mich...@databricks.com
> Date: Fri, 25 Jul 2014 14:34:53 -0700
> Subject: Re: Spark SQL and Hive tables
> To: user@spark.apache.org
>
> In particular, have you put your hive-site.xml in the conf/ directory?
> Also, are you creating a HiveContext instead of a SQLContext?
>
> On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <chiling...@gmail.com> wrote:
>
> Hi Sameer,
>
> Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
> Best Regards,
>
> Jerry
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <ssti...@live.com> wrote:
>
> Hi All,
>
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
> val trainingDataTable = sql("""SELECT prod.prod_num, demographics.gender,
> demographics.birth_year, demographics.income_group FROM prod p JOIN
> demographics d ON d.user_id = p.user_id""")
>
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
> I have these tables in Hive. I used the show tables command to confirm
> this. Can someone please let me know how I can make them accessible here?
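Putting the advice in this thread together: build Spark with Hive support,
put hive-site.xml in Spark's conf/ directory, and create a HiveContext
rather than a plain SQLContext. A minimal spark-shell session for Spark
1.0.x (the version CDH 5.1 ships) might look like the sketch below; it
assumes the prod and demographics tables exist in the Hive metastore, and
it also fixes the alias mismatch in the original query, which references
prod and demographics by full name after aliasing them as p and d.

    // Requires an assembly built with Hive support (-Phive / SPARK_HIVE=true)
    // and hive-site.xml in Spark's conf/ directory.
    import org.apache.spark.sql.hive.HiveContext

    // A HiveContext (not a plain SQLContext) reads the Hive metastore,
    // which is what makes Hive tables visible to Spark SQL.
    val hiveContext = new HiveContext(sc)

    // In Spark 1.0.x, HiveQL queries go through hql(); the aliases are
    // used consistently here, unlike in the original query.
    val trainingDataTable = hiveContext.hql("""
      SELECT p.prod_num, d.gender, d.birth_year, d.income_group
      FROM prod p JOIN demographics d ON d.user_id = p.user_id""")

    // trainingDataTable is a SchemaRDD; inspect a few rows.
    trainingDataTable.take(5).foreach(println)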