Thanks, Evert, for detailing the solution; I do appreciate it. But I would like to try Cheng's suggestion first. And thanks, Cheng, for the help. I will let you know if I succeed.
best,
/Shahab

On Sun, Dec 21, 2014 at 12:49 PM, Cheng Lian <[email protected]> wrote:

> Evert - Thanks for the instructions, this is generally useful in other
> scenarios, but I think this isn’t what Shahab needs, because saveAsTable
> actually saves the contents of the SchemaRDD into Hive.
>
> Shahab - As Michael has answered in another thread, you may try
> HiveThriftServer2.startWithContext, which is a quite experimental
> feature. Here is a quick spark-shell sample session:
>
>     import org.apache.spark.sql.hive.HiveContext
>     import org.apache.spark.sql.catalyst.types._
>     import java.sql.Date
>
>     val sparkContext = sc
>     import sparkContext._
>     val sqlContext = new HiveContext(sparkContext)
>     import sqlContext._
>
>     makeRDD((1, "hello") :: (2, "world") :: Nil).toSchemaRDD.cache().registerTempTable("t")
>
>     import org.apache.spark.sql.hive.thriftserver._
>     HiveThriftServer2.startWithContext(sqlContext)
>
> Then you can connect to the started server via beeline:
>
>     $ ./bin/beeline -u jdbc:hive2://localhost:10000/default
>     0: jdbc:hive2://localhost:10000/default> select * from t;
>     +-----+--------+
>     | _1  | _2     |
>     +-----+--------+
>     | 1   | hello  |
>     | 2   | world  |
>     +-----+--------+
>     2 rows selected (0.208 seconds)
>
> Cheng
>
> On 12/20/14 1:09 AM, Evert Lammerts wrote:
>
> Yes you can, using HiveContext, a metastore and the thriftserver. The
> metastore persists information about your SchemaRDD, and the HiveContext,
> initialised with information on the metastore, can interact with the
> metastore. The thriftserver provides JDBC connections using the metastore.
>
> Using MySQL as an example backend for the metastore:
>
> 1. Install MySQL
> 2. Create a database:
>        CREATE database hive_metastore CHARSET latin1;
> 3. Create a metastore user:
>        GRANT ALL ON hive_metastore.* TO metastore_user IDENTIFIED BY 'password';
> 4. Create a hive-site.xml in your Spark's conf dir: see
>    http://pastebin.com/VXcmJWdX for an example
> 5. Download the mysql jdbc driver from
>    http://dev.mysql.com/downloads/connector/j/
> 6. Start the spark-shell with the mysql driver on the classpath:
>        $ ./bin/spark-shell --driver-class-path mysql-connector-java-5.1.34-bin.jar
> 7. Register the table using something like:
>        val sqlct = new org.apache.spark.sql.hive.HiveContext(sc)
>        // if you're local, i.e. not using HDFS
>        sqlct.setConf("hive.metastore.warehouse.dir", "/some/path/to/store/tables")
>        ... // create your schemardd using sqlct
>        rdd.saveAsTable("mytable")
> 8. Start the thriftserver (which provides the JDBC connection):
>        $ ./sbin/start-thriftserver.sh --driver-class-path mysql-connector-java-5.1.34-bin.jar --conf hive.metastore.warehouse.dir=/some/path/to/store/tables
>
> Something like that should do it. Now you can connect from, for example,
> beeline:
>
>     $ ./bin/beeline
>     > !connect jdbc:hive2://localhost:10000
>     > show tables;
>
> This is a good guide re the metastore regardless of your distribution:
> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
>
> On Fri Dec 19 2014 at 5:34:49 PM shahab <[email protected]> wrote:
>
>> Hi,
>>
>> Sorry for repeating the same question, just wanted to clarify the issue:
>>
>> Is it possible to expose an RDD (or SchemaRDD) to external components
>> (outside Spark) so it can be queried over JDBC? (My goal is not to place
>> the RDD back in a database but to use this cached RDD to serve JDBC queries.)
>>
>> best,
>>
>> /shahab
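
For completeness, since the original question was about querying from outside Spark over JDBC: beeline is only one possible client. Below is a minimal sketch of a standalone Scala client, not something posted in this thread. It assumes the Thrift server is listening on localhost:10000 as in the beeline examples, that the temp table "t" from Cheng's session exists, that the server accepts connections without credentials (as `beeline -u` suggests), and that the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) and its dependencies are on the client classpath. The names ThriftServerClient, hiveUrl and queryRows are made up for illustration.

```scala
import java.sql.{Connection, DriverManager}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper object; none of these names come from Spark or Hive.
object ThriftServerClient {

  // Build a HiveServer2-style JDBC URL, the same form beeline is given with -u.
  def hiveUrl(host: String, port: Int, database: String): String =
    s"jdbc:hive2://$host:$port/$database"

  // Run a query and return each row as its column values rendered as strings.
  def queryRows(conn: Connection, sql: String): Seq[Seq[String]] = {
    val stmt = conn.createStatement()
    try {
      val rs = stmt.executeQuery(sql)
      val columnCount = rs.getMetaData.getColumnCount
      val rows = ArrayBuffer.empty[Seq[String]]
      while (rs.next()) {
        // JDBC column indices are 1-based.
        rows += (1 to columnCount).map(i => rs.getString(i))
      }
      rows.toList
    } finally {
      stmt.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the Hive JDBC driver jar is on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Assumption: server on localhost:10000, no credentials required.
    val conn = DriverManager.getConnection(hiveUrl("localhost", 10000, "default"))
    try {
      // "t" is the temp table registered in the spark-shell session above.
      queryRows(conn, "select * from t").foreach(row => println(row.mkString("\t")))
    } finally {
      conn.close()
    }
  }
}
```

With the startWithContext approach the query is served by the same HiveContext that registered and cached "t", so the spark-shell session has to stay alive for as long as clients need the table.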
