This is very experimental and mostly unsupported, but you can start the JDBC server from within your own programs <https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45> by passing it the HiveContext.
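
A rough sketch of what that could look like (untested, and keeping in mind the above caveat about this being experimental): build a HiveContext, register your results as a temp table, then hand that same context to the server. The file name "people.json" and table name "results" below are just placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object EmbeddedThriftServer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("embedded-thrift-server"))
    val hiveContext = new HiveContext(sc)

    // Build up whatever results you want to expose, e.g. from JSON,
    // and register them as a temporary table in this context.
    val results = hiveContext.jsonFile("people.json")
    results.registerTempTable("results")

    // Start the JDBC/Thrift server against this same context, so the
    // temp table is visible to JDBC clients. The server runs in its
    // own threads; you may need to keep the application alive.
    HiveThriftServer2.startWithContext(hiveContext)
  }
}

Once it's up you should be able to point beeline at it (by default the server listens on port 10000, i.e. jdbc:hive2://localhost:10000) and query the "results" table.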
On Sun, Oct 26, 2014 at 12:16 PM, Edward Sargisson <[email protected]> wrote:
> Hi all,
> This feels like a dumb question but bespeaks my lack of understanding:
> what is the Spark thrift-server for? Especially if there's an existing Hive
> installation.
>
> Background:
> We want to use Spark to do some processing starting from files (probably in
> MapRFS). We want to be able to read the result using SQL so that
> we can report the results using Eclipse BIRT.
>
> My confusion:
> Spark 1.1 includes a thrift-server for accessing data via JDBC. However, I
> don't understand how to make data available in it from the rest of Spark.
>
> I have a small program that does what I want in spark-shell. It reads some
> JSON, does some manipulation using SchemaRDDs and then has the data ready.
> If I've started the shell with the hive-site.xml pointing to a Hive
> installation I can use SchemaRDD.saveAsTable to put it into Hive - and then
> I can use beeline to read it.
>
> But that's using the *Hive* thrift-server and not the Spark thrift-server.
> That doesn't seem to be the intention of having a separate thrift-server in
> Spark. Before I started on this I assumed that you could run a Spark
> program (in, say, Java) and then make those results accessible via the JDBC
> interface.
>
> So, please, fill me in. What am I missing?
>
> Many thanks,
> Edward
