Short answer: yes.

The Spark Cassandra Connector supports the Data Sources API, so you can create a 
DataFrame that points directly to a Cassandra table and query it using either the 
DataFrame API or the SQL/HiveQL interface.
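A minimal sketch of what that looks like (keyspace, table, and column names here are hypothetical; requires the spark-cassandra-connector on the classpath and an existing SparkContext `sc`):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Create a DataFrame backed directly by a Cassandra table
// via the connector's Data Sources API.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
  .load()

// Query it with the DataFrame API...
df.filter(df("event_time") > "2015-11-01").show()

// ...or register it and use SQL/HiveQL.
df.registerTempTable("my_table")
sqlContext.sql("SELECT count(*) FROM my_table").show()
```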

If you want to see an example, see slides 27 and 28 in this deck that I 
presented at the Cassandra Summit 2015:
http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark


Mohammed

From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: Tuesday, November 10, 2015 7:42 PM
To: Bryan Jeffrey; user
Subject: RE: Cassandra via SparkSQL/Hive JDBC

Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?

Regards,

Bryan Jeffrey
________________________________
From: Bryan Jeffrey<mailto:bryan.jeff...@gmail.com>
Sent: 11/4/2015 11:16 AM
To: user<mailto:user@spark.apache.org>
Subject: Cassandra via SparkSQL/Hive JDBC
Hello.

I have been working to add SparkSQL HDFS support to our application.  We're 
able to process streaming data, append to a persistent Hive table, and have 
that table available via JDBC/ODBC.  Now we're looking to access data in 
Cassandra via SparkSQL.

In reading a number of previous posts, it appears that the way to do this is to 
instantiate a Spark Context, read the data into an RDD using the Cassandra 
Spark Connector, convert the data to a DF and register it as a temporary table. 
 The data will then be accessible via SparkSQL - although I assume that you 
would need to refresh the table on a periodic basis.
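The steps described above can be sketched as follows (connection host, keyspace, table, and the Event case class are hypothetical; requires the spark-cassandra-connector on the classpath):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Case class matching the Cassandra table's columns (illustrative).
case class Event(id: String, eventTime: java.util.Date, value: Double)

val conf = new SparkConf()
  .setAppName("CassandraSQL")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Read the Cassandra table into an RDD, convert it to a DataFrame,
// and register it as a temporary table for SparkSQL.
val rdd = sc.cassandraTable[Event]("my_keyspace", "events")
val df = rdd.toDF()
df.registerTempTable("events")

sqlContext.sql("SELECT id, value FROM events WHERE value > 10").show()
```

Re-running the read/register steps is what a periodic refresh of the temporary table would amount to.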

Is there a more straightforward way to do this?  Is it possible to register the 
Cassandra table with Hive so that the SparkSQL thrift server instance can just 
read data directly?

Regards,

Bryan Jeffrey
