Thanks, Michael. OK, will try SharkServer2. But I have some basic questions on a related area:
1) If I have a standalone Spark application that has already built an RDD, how can SharkServer2 (or Shark, for that matter) access *that* RDD and run queries on it? In all the Shark examples I have seen, the RDDs (tables) are created within Shark's own SparkContext and processed there. To simplify, the real problem we have is: "We have a standalone Spark application that processes input DStreams and produces output DStreams. I want to expose that near-real-time DStream data to a 3rd-party app via JDBC, and allow the SharkServer2 CLI to query the DStreams in real time, all from memory." Currently we write the output stream to Cassandra and expose it to the 3rd-party app from there via JDBC, but we want to avoid that extra disk write, which increases latency.

2) I have two applications: one processes an input and computes an output RDD, and another post-processes the resultant RDD into multiple persistent stores and does other things with it. These are intentionally split into separate processes. How do we share the output RDD from the first application with the second without writing to disk? (We are thinking of serializing the RDD and streaming it through Kafka, but then we lose time and all the fault tolerance that an RDD brings.) Is Tachyon the only other way? Are there other models/design patterns for applications that share RDDs, as this may be a very common use case?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-Connectivity-tp6511p6543.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
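For concreteness, the "serialize the RDD and stream through Kafka" idea in (2) boils down to serializing each batch of records on the producing side and deserializing it in the second application. Below is a minimal sketch of just that round-trip using plain Java serialization; `Event` is a hypothetical record type standing in for one element of the output RDD, and the actual Kafka producer/consumer calls (and the fault-tolerance machinery lost along the way) are omitted:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical record type standing in for one element of the output RDD.
case class Event(key: String, value: Long)

// Serialize one batch of records to bytes -- this is what the first
// application would hand to a Kafka producer as a message payload.
def toBytes(batch: Seq[Event]): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(batch)
  oos.close()
  bos.toByteArray
}

// Deserialize on the consuming side (the second application).
def fromBytes(bytes: Array[Byte]): Seq[Event] = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  ois.readObject().asInstanceOf[Seq[Event]]
}

val batch = Seq(Event("a", 1L), Event("b", 2L))
val roundTrip = fromBytes(toBytes(batch))
assert(roundTrip == batch)
```

This only moves the data, of course; lineage is lost, so the second application would have to treat the consumed batches as raw input rather than as a recoverable RDD.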