Hey guys,
On our Hive/Hadoop ecosystem we are using the Cloudera distribution CDH 5.2.x,
with about 300+ Hive tables. The data is stored as text (slowly moving to
Parquet) on HDFS. I want to use SparkSQL, point it at the Hive metastore, and
be able to define JOINs etc. using a programming structure like this:
import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")
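For instance (the table names below are just made up for illustration), a JOIN
across two of the Hive tables might look like:

// reusing the sqlContext defined above; "orders" and "customers" are
// hypothetical tables assumed to exist in the Hive metastore
val joined = sqlContext.sql("""
  SELECT c.name, SUM(o.amount) AS total
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  GROUP BY c.name
""")
joined.collect().foreach(println)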

Is that the way to go? Any guidance would be great.
thanks
sanjay 

