Hi,

Zeppelin has a cassandra-spark-connector built into the build. I have not
tried it yet; maybe you could let us know how it goes.

https://github.com/apache/incubator-zeppelin/pull/79

To build a Zeppelin version with the Datastax Spark/Cassandra connector
<https://github.com/datastax/spark-cassandra-connector>:

mvn clean package -Pcassandra-spark-1.x -Dhadoop.version=xxx -Phadoop-x.x
-DskipTests

Right now the Spark/Cassandra connector is available for Spark 1.1 and
Spark 1.2. Support for Spark 1.3 has not been released yet (but you can
build your own Spark/Cassandra connector version 1.3.0-SNAPSHOT). Support
for Spark 1.4 does not exist yet.

Please do not forget to add -Dspark.cassandra.connection.host=xxx to the
ZEPPELIN_JAVA_OPTS parameter in the conf/zeppelin-env.sh file.
Alternatively, you can add this parameter to the parameter list of the
Spark interpreter in the GUI.
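For example, the zeppelin-env.sh entry might look like this (the host
address below is a placeholder for your Cassandra contact point):

```shell
# conf/zeppelin-env.sh
# Tell the Spark/Cassandra connector where to find Cassandra.
# 10.0.0.5 is a placeholder; use your own Cassandra contact point.
export ZEPPELIN_JAVA_OPTS="-Dspark.cassandra.connection.host=10.0.0.5"
```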


-Pawan





On Mon, Jun 22, 2015 at 9:04 AM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

>   Yes, just put the Cassandra connector on the Spark classpath and set
> the connector config properties in the interpreter settings.
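>   One way this could look when launching a Spark shell (the jar path and
> host below are placeholders):
>
> ```shell
> # Put the connector assembly jar on the Spark classpath and point the
> # connector at Cassandra (jar path and host are placeholders).
> ./bin/spark-shell \
>   --jars /path/to/spark-cassandra-connector-assembly.jar \
>   --conf spark.cassandra.connection.host=10.0.0.5
> ```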
>
>   From: Mohammed Guller
> Date: Monday, June 22, 2015 at 11:56 AM
> To: Matthew Johnson, shahid ashraf
>
> Cc: "user@spark.apache.org"
> Subject: RE: Code review - Spark SQL command-line client for Cassandra
>
>   I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say
> for sure, but it should not be difficult.
>
>
>
> Mohammed
>
>
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com
> <matt.john...@algomi.com>]
> *Sent:* Monday, June 22, 2015 2:15 AM
> *To:* Mohammed Guller; shahid ashraf
> *Cc:* user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Thanks Mohammed, it’s good to know I’m not alone!
>
>
>
> How easy is it to integrate Zeppelin with Spark on Cassandra? It looks
> like it would only support Hadoop out of the box. Is it just a case of
> dropping the Cassandra Connector onto the Spark classpath?
>
>
>
> Cheers,
>
> Matthew
>
>
>
> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* 20 June 2015 17:27
> *To:* shahid ashraf
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> It is a simple Play-based web application. It exposes a URI for
> submitting a SQL query. It then executes that query using
> CassandraSQLContext provided by Spark Cassandra Connector. Since it is
> web-based, I added an authentication and authorization layer to make sure
> that only users with the right authorization can use it.
>
>
>
> I am happy to open-source that code if there is interest. Just need to
> carve out some time to clean it up and remove all the other services that
> this web application provides.
>
>
>
> Mohammed
>
>
>
> *From:* shahid ashraf [mailto:sha...@trialx.com <sha...@trialx.com>]
> *Sent:* Saturday, June 20, 2015 6:52 AM
> *To:* Mohammed Guller
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi Mohammed,
> Can you provide more info about the service you developed?
>
> On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>
> Hi Matthew,
>
> It looks fine to me. I have built a similar service that allows a user to
> submit a query from a browser and returns the result in JSON format.
>
>
>
> Another alternative is to leave a Spark shell or one of the notebooks
> (Spark Notebook, Zeppelin, etc.) session open and run queries from there.
> This model works only if people give you the queries to execute.
>
>
>
> Mohammed
>
>
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com]
> *Sent:* Friday, June 19, 2015 2:20 AM
> *To:* user@spark.apache.org
> *Subject:* Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi all,
>
>
>
> I have been struggling with Cassandra’s lack of ad hoc query support (I
> know this is an anti-pattern of Cassandra, but sometimes management come
> over and ask me to run stuff and it’s impossible to explain that it will
> take me a while when it would take about 10 seconds in MySQL) so I have put
> together the following code snippet that bundles DataStax’s Cassandra Spark
> connector and allows you to submit Spark SQL to it, outputting the results
> in a text file.
>
>
>
> Does anyone spot any obvious flaws in this plan? (I have a lot more error
> handling etc in my code, but removed it here for brevity)
>
>
>
>     private void run(String sqlQuery) {
>         SparkContext scc = new SparkContext(conf);
>         CassandraSQLContext csql = new CassandraSQLContext(scc);
>         DataFrame sql = csql.sql(sqlQuery);
>         String folderName = "/tmp/output_" + System.currentTimeMillis();
>         LOG.info("Attempting to save SQL results in folder: " + folderName);
>         sql.rdd().saveAsTextFile(folderName);
>         LOG.info("SQL results saved");
>     }
>
>
>
>     public static void main(String[] args) {
>         String sparkMasterUrl = args[0];
>         String sparkHost = args[1];
>         String sqlQuery = args[2];
>
>         SparkConf conf = new SparkConf();
>         conf.setAppName("Java Spark SQL");
>         conf.setMaster(sparkMasterUrl);
>         conf.set("spark.cassandra.connection.host", sparkHost);
>
>         JavaSparkSQL app = new JavaSparkSQL(conf);
>         app.run(sqlQuery);
>     }
>
>
>
> I can then submit this to Spark with ‘spark-submit’:
>
>
>
> $ ./spark-submit --class com.algomi.spark.JavaSparkSQL \
>     --master spark://sales3:7077 \
>     spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
>     spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
>
>
>
> It seems to work pretty well, so I’m pretty happy, but wondering why this
> isn’t common practice (at least I haven’t been able to find much about it
> on Google) – is there something terrible that I’m missing?
>
>
>
> Thanks!
>
> Matthew
>
>
>
>
>
