Hi all,
This may be a dumb question, but it reflects a gap in my understanding: what
is the Spark thrift-server for, especially when there's already an existing
Hive installation?

Background:
We want to use Spark to do some processing starting from files (probably in
MapRFS). We then want to be able to read the results via SQL so that we can
report on them using Eclipse BIRT.

My confusion:
Spark 1.1 includes a thrift-server for accessing data via JDBC. However, I
don't understand how to make data produced by the rest of Spark available
through it.

I have a small program that does what I want in spark-shell. It reads some
JSON, does some manipulation using SchemaRDDs, and then has the data ready.
If I've started the shell with hive-site.xml pointing to a Hive
installation, I can use SchemaRDD.saveAsTable to put the result into Hive -
and then I can read it with beeline.
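For concreteness, here's a rough sketch of the shell session I mean (the
path, table names, and query are just placeholders for my real ones):

```scala
// spark-shell, Spark 1.1, started with hive-site.xml on the classpath.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // sc is the shell's SparkContext

// Read JSON into a SchemaRDD and register it for SQL queries.
val events = hiveContext.jsonFile("maprfs:///path/to/input.json")
events.registerTempTable("events")

// Do some manipulation (placeholder query).
val result = hiveContext.sql("SELECT * FROM events")

// Persist into the Hive metastore so beeline can see it.
result.saveAsTable("my_results")
```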

But that's reading through the *Hive* thrift-server, not the Spark
thrift-server - which doesn't seem to be the point of Spark shipping its own
separate thrift-server. Before I started on this, I assumed you could run a
Spark program (in, say, Java) and then make its results accessible through
the JDBC interface.

So, please, fill me in. What am I missing?

Many thanks,
Edward
