I'm building a system that collects data using Spark Streaming, does some
processing with it, then saves the data. I want the data to be queried by
multiple applications, and it sounds like the Thrift JDBC/ODBC server might
be the right tool to handle the queries. However, the documentation for the
Thrift server
(http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server)
seems to be written for Hive users who are moving to Spark. I had never used
Hive before I started using Spark, so it is not clear to me how best to use
this.

I've tried putting data into Hive and then serving it with the Thrift server,
but I have not been able to update the data in Hive without first shutting
down the server. This is a problem because new data is always streaming in,
so the data must be updated continuously.
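For reference, here is roughly how I am starting the server and querying it
(the master URL and table name are just placeholders for my setup):

```shell
# Start the Thrift JDBC/ODBC server (script ships with the Spark distribution)
./sbin/start-thriftserver.sh --master local[*]

# Connect with beeline and query a Hive table
# ("my_streamed_data" is a placeholder for my actual table)
./bin/beeline -u jdbc:hive2://localhost:10000 \
  -e "SELECT COUNT(*) FROM my_streamed_data"
```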

The system I'm building is supposed to replace a system that stores the data
in MongoDB. The dataset has now grown so large that the database index does
not fit in memory, which causes major performance problems in MongoDB.

If the Thrift server is the right tool for me, how can I set it up for my
application? If it is not the right tool, what else can I use?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Is-the-Thrift-server-right-for-me-tp21044.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
