This is very experimental and mostly unsupported, but you can start the JDBC server from within your own programs <https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45> by passing it the HiveContext.
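
A rough sketch of what that could look like (untested, and keeping in mind the above caveat about this being experimental): build a HiveContext, register your results as a temp table, then hand that same context to the server. The file name "people.json" and table name "results" below are just placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object EmbeddedThriftServer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("embedded-thrift-server"))
    val hiveContext = new HiveContext(sc)

    // Build up whatever results you want to expose, e.g. from JSON,
    // and register them as a temporary table in this context.
    val results = hiveContext.jsonFile("people.json")
    results.registerTempTable("results")

    // Start the JDBC/Thrift server against this same context, so the
    // temp table is visible to JDBC clients. The server runs in its
    // own threads; you may need to keep the application alive.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}

Once it's up you should be able to point beeline at it (by default the server listens on port 10000, i.e. jdbc:hive2://localhost:10000) and query the "results" table.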
On Sun, Oct 26, 2014 at 12:16 PM, Edward Sargisson <[email protected]> wrote:
> Hi all,
> This feels like a dumb question but bespeaks my lack of understanding:
> what is the Spark thrift-server for? Especially if there's an existing Hive
> installation.
>
> Background:
> We want to use Spark to do some processing starting from files (probably in
> MapRFS). We want to be able to read the result using SQL so that
> we can report the results using Eclipse BIRT.
>
> My confusion:
> Spark 1.1 includes a thrift-server for accessing data via JDBC. However, I
> don't understand how to make data available in it from the rest of Spark.
>
> I have a small program that does what I want in spark-shell. It reads some
> JSON, does some manipulation using SchemaRDDs and then has the data ready.
> If I've started the shell with the hive-site.xml pointing to a Hive
> installation I can use SchemaRDD.saveAsTable to put it into Hive - and then
> I can use beeline to read it.
>
> But that's using the *Hive* thrift-server and not the Spark thrift-server.
> That doesn't seem to be the intention of having a separate thrift-server in
> Spark. Before I started on this I assumed that you could run a Spark
> program (in, say, Java) and then make those results accessible via the JDBC
> interface.
>
> So, please, fill me in. What am I missing?
>
> Many thanks,
> Edward
