Hi all,

This feels like a dumb question, but it bespeaks my lack of understanding: what is the Spark thrift-server for, especially if there's an existing Hive installation?
Background: We want to use Spark to do some processing, starting from files (probably in MapRFS). We want to be able to read the results using SQL so that we can report on them with Eclipse BIRT.

My confusion: Spark 1.1 includes a thrift-server for accessing data via JDBC. However, I don't understand how to make data available to it from the rest of Spark.

I have a small program that does what I want in spark-shell. It reads some JSON, does some manipulation using SchemaRDDs, and then has the data ready. If I've started the shell with hive-site.xml pointing to a Hive installation, I can use SchemaRDD.saveAsTable to put the data into Hive, and then I can read it with beeline. But that goes through the *Hive* thrift-server, not the Spark thrift-server, which doesn't seem to be the intention of having a separate thrift-server in Spark.

Before I started on this, I assumed that you could run a Spark program (in, say, Java) and then make those results accessible over the JDBC interface.

So, please, fill me in. What am I missing?

Many thanks,
Edward
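For concreteness, here is roughly what my spark-shell session does (a sketch only; the JSON path, field names, and table name below are made up for illustration, and this assumes the shell was launched with hive-site.xml on the classpath):

```scala
// Run inside spark-shell (Spark 1.1), where `sc` is the shell's SparkContext.
import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)

// Read some JSON into a SchemaRDD (hypothetical path)
val events = hiveCtx.jsonFile("maprfs:///data/events.json")
events.registerTempTable("events")

// Do some manipulation via SQL over the SchemaRDD (hypothetical fields)
val summary = hiveCtx.sql(
  "SELECT category, COUNT(*) AS n FROM events GROUP BY category")

// Persist into the Hive metastore so beeline / JDBC clients can see it
summary.saveAsTable("event_summary")
```

After this, `SELECT * FROM event_summary` works from beeline pointed at the Hive thrift-server, but I don't see how the Spark thrift-server fits into the picture.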
