I haven’t tried using Zeppelin with Spark on Cassandra, so I can’t say for sure, 
but it should not be difficult.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Monday, June 22, 2015 2:15 AM
To: Mohammed Guller; shahid ashraf
Cc: user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra

Thanks Mohammed, it’s good to know I’m not alone!

How easy is it to integrate Zeppelin with Spark on Cassandra? It looks like it 
would only support Hadoop out of the box. Is it just a case of dropping the 
Cassandra Connector onto the Spark classpath?

Cheers,
Matthew

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: 20 June 2015 17:27
To: shahid ashraf
Cc: Matthew Johnson; user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra

It is a simple Play-based web application. It exposes a URI for submitting a 
SQL query. It then executes that query using CassandraSQLContext provided by 
Spark Cassandra Connector. Since it is web-based, I added an authentication and 
authorization layer to make sure that only users with the right authorization 
can use it.
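
As a rough illustration of the authorization layer described above, here is a minimal sketch in plain Java. It is not the actual Play application: the class name QueryGateway, the user allow-list, and the executor function are all hypothetical. In a real service the executor would presumably wrap a call such as CassandraSQLContext.sql(...), and authentication (establishing who the user is) would happen before this check.

```java
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: gate SQL submission behind an authorization check.
class QueryGateway {

    private final Set<String> authorizedUsers;
    // Stand-in for whatever actually runs the query (assumption: it returns text).
    private final Function<String, String> executor;

    QueryGateway(Set<String> authorizedUsers, Function<String, String> executor) {
        this.authorizedUsers = authorizedUsers;
        this.executor = executor;
    }

    // Runs the query only if the (already authenticated) user is on the allow-list.
    String submit(String user, String sql) {
        if (user == null || !authorizedUsers.contains(user)) {
            throw new SecurityException("User is not authorized to run queries");
        }
        if (sql == null || sql.trim().isEmpty()) {
            throw new IllegalArgumentException("SQL query must not be empty");
        }
        return executor.apply(sql);
    }
}
```

Keeping the executor behind a function makes it possible to exercise the authorization logic without a running Spark cluster.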

I am happy to open-source that code if there is interest. Just need to carve 
out some time to clean it up and remove all the other services that this web 
application provides.

Mohammed

From: shahid ashraf [mailto:sha...@trialx.com]
Sent: Saturday, June 20, 2015 6:52 AM
To: Mohammed Guller
Cc: Matthew Johnson; user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra


Hi Mohammed,
Can you provide more info about the service you developed?
On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
Hi Matthew,
It looks fine to me. I have built a similar service that allows a user to 
submit a query from a browser and returns the result in JSON format.
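
To make the "result in JSON format" idea concrete, here is a minimal sketch of rendering query rows as a JSON array using only the JDK. It is an illustration under assumptions, not the service's actual code: the class name JsonRows and the row representation (a list of column-name-to-value maps) are hypothetical, and a real service would more likely use a JSON library or Spark's own JSON support.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn rows (column name -> value) into a JSON array string.
class JsonRows {

    // Escapes the characters that JSON forbids inside a string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Numbers, booleans and nulls are emitted bare; everything else is quoted.
    // Limitation: NaN and infinite doubles are not valid JSON and are not handled.
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return "\"" + escape(v.toString()) + "\"";
    }

    // Renders each row as one JSON object inside a JSON array.
    static String toJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) sb.append(',');
            sb.append('{');
            Iterator<Map.Entry<String, Object>> it = rows.get(r).entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Object> e = it.next();
                sb.append('"').append(escape(e.getKey())).append("\":")
                  .append(value(e.getValue()));
                if (it.hasNext()) sb.append(',');
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```

Column order in the output follows the iteration order of each map, so an insertion-ordered map such as LinkedHashMap keeps columns in query order.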

Another alternative is to leave a Spark shell or one of the notebooks (Spark 
Notebook, Zeppelin, etc.) session open and run queries from there. This model 
works only if people give you the queries to execute.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, June 19, 2015 2:20 AM
To: user@spark.apache.org
Subject: Code review - Spark SQL command-line client for Cassandra

Hi all,

I have been struggling with Cassandra’s lack of ad hoc query support (I know 
this is an anti-pattern of Cassandra, but sometimes management come over and 
ask me to run stuff and it’s impossible to explain that it will take me a while 
when it would take about 10 seconds in MySQL) so I have put together the 
following code snippet that bundles DataStax’s Cassandra Spark connector and 
allows you to submit Spark SQL to it, outputting the results in a text file.

Does anyone spot any obvious flaws in this plan? (I have a lot more error 
handling etc. in my code, but removed it here for brevity.)

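    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.cassandra.CassandraSQLContext;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class JavaSparkSQL {

        // Assumption: the original logger declaration is not shown; SLF4J is used here.
        private static final Logger LOG = LoggerFactory.getLogger(JavaSparkSQL.class);
        private final SparkConf conf;

        public JavaSparkSQL(SparkConf conf) {
            this.conf = conf;
        }

        // Runs the query via the Cassandra connector and saves the rows as text.
        private void run(String sqlQuery) {
            SparkContext scc = new SparkContext(conf);
            CassandraSQLContext csql = new CassandraSQLContext(scc);
            DataFrame sql = csql.sql(sqlQuery);
            // saveAsTextFile writes a folder of part files, not a single file.
            String folderName = "/tmp/output_" + System.currentTimeMillis();
            LOG.info("Attempting to save SQL results in folder: " + folderName);
            sql.rdd().saveAsTextFile(folderName);
            LOG.info("SQL results saved");
        }

        public static void main(String[] args) {
            // Positional arguments: Spark master URL, Cassandra contact host, SQL query.
            String sparkMasterUrl = args[0];
            String sparkHost = args[1];
            String sqlQuery = args[2];

            SparkConf conf = new SparkConf();
            conf.setAppName("Java Spark SQL");
            conf.setMaster(sparkMasterUrl);
            conf.set("spark.cassandra.connection.host", sparkHost);

            JavaSparkSQL app = new JavaSparkSQL(conf);

            // run() takes only the query; the stray second argument was removed.
            app.run(sqlQuery);
        }
    }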

I can then submit this to Spark with ‘spark-submit’:


>  ./spark-submit --class com.algomi.spark.JavaSparkSQL --master 
> spark://sales3:7077 
> spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
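
Since the command above passes three positional arguments, one piece of error handling worth having is a check on them before any Spark context is created. The following is a minimal sketch under assumptions: the class name CliArgs and the field name connectionHost (the value destined for spark.cassandra.connection.host) are hypothetical and not part of the posted code.

```java
// Hypothetical sketch: validate the three positional arguments up front.
class CliArgs {

    final String sparkMasterUrl;
    final String connectionHost;
    final String sqlQuery;

    private CliArgs(String sparkMasterUrl, String connectionHost, String sqlQuery) {
        this.sparkMasterUrl = sparkMasterUrl;
        this.connectionHost = connectionHost;
        this.sqlQuery = sqlQuery;
    }

    // Expects exactly: <spark-master-url> <cassandra-host> <sql-query>
    static CliArgs parse(String[] args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <spark-master-url> <cassandra-host> <sql-query>");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException("Arguments must not be blank");
            }
        }
        return new CliArgs(args[0].trim(), args[1].trim(), args[2].trim());
    }
}
```

The exact-count check also catches a query that was not quoted on the command line, because the shell would then split it into several arguments.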

It seems to work pretty well, so I’m pretty happy, but wondering why this isn’t 
common practice (at least I haven’t been able to find much about it on Google) 
– is there something terrible that I’m missing?

Thanks!
Matthew

