Evert - Thanks for the instructions. They're generally useful in other scenarios, but I don't think they're what Shahab needs, because |saveAsTable| actually saves the contents of the SchemaRDD into Hive.

Shahab - As Michael has answered in another thread, you may try |HiveThriftServer2.startWithContext|, which is still quite an experimental feature. Here is a quick |spark-shell| sample session:

|import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.catalyst.types._
import java.sql.Date

val sparkContext = sc
import sparkContext._

val sqlContext = new HiveContext(sparkContext)
import sqlContext._

// Build a small SchemaRDD, cache it, and register it as a temporary table
makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")

// Start a Thrift JDBC server backed by this HiveContext
import org.apache.spark.sql.hive.thriftserver._
HiveThriftServer2.startWithContext(sqlContext)
|

Then you can connect to the started server via beeline:

|$ ./bin/beeline -u jdbc:hive2://localhost:10000/default
0: jdbc:hive2://localhost:10000/default> select * from t;
+-----+--------+
| _1  |   _2   |
+-----+--------+
| 1   | hello  |
| 2   | world  |
+-----+--------+
2 rows selected (0.208 seconds)
|
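
If you'd rather query the server programmatically than through beeline, a plain JDBC client works as well. Here is a minimal sketch (assuming the Hive JDBC driver and its dependencies are on the client classpath; the table |t| is the one registered above):

|import java.sql.DriverManager

// Load the Hive JDBC driver (hive-jdbc and its dependencies must be on the classpath)
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT * FROM t")
  while (rs.next()) {
    println(s"${rs.getInt(1)}\t${rs.getString(2)}")
  }
} finally {
  conn.close()
}
|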

Cheng

On 12/20/14 1:09 AM, Evert Lammerts wrote:

Yes you can, using a HiveContext, a metastore and the thriftserver. The metastore persists information about your SchemaRDD (its table metadata), the HiveContext, initialised with the metastore's connection details, interacts with the metastore, and the thriftserver provides JDBC connections using the metastore.

Using MySQL as an example backend for the metastore:

1. Install MySQL
2. Create a database: CREATE database hive_metastore CHARSET latin1;
3. Create a metastore user: GRANT ALL ON hive_metastore.* TO metastore_user IDENTIFIED BY 'password';
4. Create a hive-site.xml in your Spark's conf dir: see http://pastebin.com/VXcmJWdX for an example (there's also a sketch just after these steps)
5. Download the MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/
6. Start the spark-shell with the MySQL driver on the classpath: $ ./bin/spark-shell --driver-class-path mysql-connector-java-5.1.34-bin.jar
7. Register the table using something like (a fuller sketch follows after the beeline example below):
> val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlct.setConf("hive.metastore.warehouse.dir", "/some/path/to/store/tables") // if you're running locally, i.e. not using HDFS
> ... // create your SchemaRDD using sqlct
> rdd.saveAsTable("mytable")
8. Start the thriftserver (which provides the JDBC connection): $ ./sbin/start-thriftserver.sh --driver-class-path mysql-connector-java-5.1.34-bin.jar --conf hive.metastore.warehouse.dir=/some/path/to/store/tables
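
In case the pastebin link in step 4 stops working: a minimal hive-site.xml for a MySQL-backed metastore looks roughly like the sketch below (the database name, user and password match steps 2-3; adjust the host and port to your setup):

<configuration>
  <!-- JDBC connection to the MySQL database created in step 2 -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Credentials for the metastore user created in step 3 -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>metastore_user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>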

Something like that should do it. Now you can connect from, for example, beeline:

$ ./bin/beeline
> !connect jdbc:hive2://localhost:10000
> show tables;
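
For completeness, the "..." in step 7 could look something like this in a spark-shell session (just a sketch, with a made-up Person case class and sample rows):

val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
sqlct.setConf("hive.metastore.warehouse.dir", "/some/path/to/store/tables")  // only needed when running locally
import sqlct.createSchemaRDD  // implicitly turns RDDs of case classes into SchemaRDDs

// A made-up case class and a few sample rows
case class Person(name: String, age: Int)
val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25)))

// Persist the SchemaRDD as a Hive table: the data goes to the warehouse dir,
// the table definition goes into the metastore
people.saveAsTable("mytable")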

This is a good guide re the metastore regardless of your distribution: http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html.



On Fri Dec 19 2014 at 5:34:49 PM shahab <shahab.mok...@gmail.com <mailto:shahab.mok...@gmail.com>> wrote:

    Hi,

    Sorry for repeating the same question, just wanted to clarify the
    issue:

    Is it possible to expose an RDD (or SchemaRDD) to external
    components (outside Spark) so it can be queried over JDBC? (My
    goal is not to put the RDD back into a database, but to use this
    cached RDD to serve JDBC queries.)

    best,

    /shahab
