RE: Code review - Spark SQL command-line client for Cassandra

2015-06-23 Thread Matthew Johnson
Awesome, thanks Pawan – for now I’ll give spark-notebook a go until
Zeppelin catches up to Spark 1.4 (and when Zeppelin has a binary release –
my PC doesn’t seem too happy about building a Node.js app from source).
Thanks for the detailed instructions!!





*From:* pawan kumar [mailto:pkv...@gmail.com]
*Sent:* 22 June 2015 18:53
*To:* Matthew Johnson
*Cc:* Silvio Fiorito; Mohammed Guller; shahid ashraf; user
*Subject:* Re: Code review - Spark SQL command-line client for Cassandra



Hi Matthew,



you could add the dependencies yourself by using the %dep command in
zeppelin ( https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).
I have not tried it with Zeppelin, but I have used spark-notebook
<https://github.com/andypetrella/spark-notebook> and got the Cassandra
connector working. I have provided samples below.



*In Zeppelin: (Not Tested)*



%dep z.load("com.datastax.spark:spark-cassandra-connector_2.11:1.4.0-M1")



Note: for Spark and Cassandra to work together, the Spark,
Spark-Cassandra-Connector, and notebook Spark versions must all match. (The
tested spark-notebook setup below uses 1.2.0; the 1.4.0-M1 connector above
would need a matching Spark build.)
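The coordinate passed to %dep follows Maven's group:artifact:version form, with the Scala binary version suffixed to the artifact id; that suffix must match the Scala build of your Spark/notebook. A plain-Java sketch of pulling the pieces apart (no Spark dependency; the class name is illustrative):

```java
public class CoordinateCheck {
    public static void main(String[] args) {
        // Maven coordinate as used with %dep: groupId : artifactId_scalaVersion : version
        String coord = "com.datastax.spark:spark-cassandra-connector_2.11:1.4.0-M1";
        String[] parts = coord.split(":");
        String artifact = parts[1];                                          // spark-cassandra-connector_2.11
        String scalaVersion = artifact.substring(artifact.indexOf('_') + 1); // 2.11
        // This Scala suffix must agree with the Scala version your Spark was built for.
        System.out.println(artifact + " -> Scala " + scalaVersion);
    }
}
```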



*If using spark-notebook: (Tested & works)*

Installed :

1.   Apache Spark 1.2.0

2.   Cassandra DSE - 1 node (just Cassandra and no analytics)

3.   Notebook:

wget
https://s3.eu-central-1.amazonaws.com/spark-notebook/tgz/spark-notebook-0.4.3-scala-2.10.4-spark-1.2.0-hadoop-2.4.0.tgz



Once the notebook has been started, browse to:

http://ec2-xx-x-xx-xxx.us-west-x.compute.amazonaws.com:9000/#clusters



Select Standalone:

In SparkConf, update the Spark master IP to the EC2 internal DNS name.



*In Spark Notebook:*

:dp "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.2.0-rc3"



import com.datastax.spark.connector._

import com.datastax.spark.connector.rdd.CassandraRDD



val cassandraHost:String = "localhost"

reset(lastChanges = _.set("spark.cassandra.connection.host", cassandraHost))

val rdd = sparkContext.cassandraTable("excelsior","test")

rdd.toArray.foreach(println)



Note: for Spark and Cassandra to work together, the Spark,
Spark-Cassandra-Connector, and spark-notebook Spark versions must all match.
In the above case it was 1.2.0.











On Mon, Jun 22, 2015 at 9:52 AM, Matthew Johnson 
wrote:

Hi Pawan,



Looking at the changes for that git pull request, it looks like it just
pulls in the dependency (and transitives) for “spark-cassandra-connector”.
Since I am having to build Zeppelin myself anyway, would it be ok to just
add this myself for the connector for 1.4.0 (as found here
http://search.maven.org/#artifactdetails%7Ccom.datastax.spark%7Cspark-cassandra-connector_2.11%7C1.4.0-M1%7Cjar)?
What exactly is it that does not currently exist for Spark 1.4?



Thanks,

Matthew



*From:* pawan kumar [mailto:pkv...@gmail.com]
*Sent:* 22 June 2015 17:19
*To:* Silvio Fiorito
*Cc:* Mohammed Guller; Matthew Johnson; shahid ashraf; user@spark.apache.org
*Subject:* Re: Code review - Spark SQL command-line client for Cassandra



Hi,



Zeppelin has a Cassandra-Spark connector built into the build. I have not
tried it yet; maybe you could let us know.



https://github.com/apache/incubator-zeppelin/pull/79



To build a Zeppelin version with the *Datastax Spark/Cassandra connector
<https://github.com/datastax/spark-cassandra-connector>*

mvn clean package *-Pcassandra-spark-1.x* -Dhadoop.version=xxx -Phadoop-x.x
-DskipTests

Right now the Spark/Cassandra connector is available for *Spark 1.1* and *Spark
1.2*. Support for *Spark 1.3* is not released yet (*but you can build your
own Spark/Cassandra connector version **1.3.0-SNAPSHOT*). Support for *Spark
1.4* does not exist yet.

Please do not forget to add -Dspark.cassandra.connection.host=xxx to the
*ZEPPELIN_JAVA_OPTS* parameter in the *conf/zeppelin-env.sh* file. Alternatively,
you can add this parameter to the parameter list of the *Spark interpreter* in
the GUI.
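As a config sketch, the conf/zeppelin-env.sh entry might look like the following (the property name is from the thread; the host IP is a placeholder to replace with one of your Cassandra nodes):

```shell
# conf/zeppelin-env.sh -- placeholder IP; point at one of your Cassandra nodes
export ZEPPELIN_JAVA_OPTS="-Dspark.cassandra.connection.host=192.168.1.10"
```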



-Pawan











On Mon, Jun 22, 2015 at 9:04 AM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

Yes, just put the Cassandra connector on the Spark classpath and set the
connector config properties in the interpreter settings.



*From: *Mohammed Guller
*Date: *Monday, June 22, 2015 at 11:56 AM
*To: *Matthew Johnson, shahid ashraf


*Cc: *"user@spark.apache.org"
*Subject: *RE: Code review - Spark SQL command-line client for Cassandra



I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say for
sure, but it should not be difficult.



Mohammed



*From:* Matthew Johnson [mailto:matt.john...@algomi.com
]
*Sent:* Monday, June 22, 2015 2:15 AM
*To:* Mohammed Guller; shahid ashraf
*Cc:* user@spark.apache.org
*Subject:* RE: Code review - Spark SQL command-line client for Cassandra



Thanks Mohammed, it’s good to know I’m not alone!



How easy is it to integrate Zeppelin with Spark on Cassandra? It looks like
it would only support Hadoop out of the box. Is it just a case of dropping
the Cassandra Connector onto the Spark classpath?

Re: Code review - Spark SQL command-line client for Cassandra

2015-06-22 Thread shahid ashraf
Hi folks,

I am a newbie to the Spark world; this seems like very interesting work, as
well as an interesting discussion. I have a similar use case.
I usually use MySQL for blocking queries in record linkage, but the data has
now grown so much that it no longer scales. I want to store all my data on
HDFS and expose it via Spark SQL, but I see that it is usually executed as a
batch process. I want to expose Spark SQL as a sort of API, for ad hoc SQL
queries rather than batch processing. Please provide the necessary guidance.

On Mon, Jun 22, 2015 at 10:22 PM, Matthew Johnson 
wrote:

> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* 20 June 2015 17:27
> *To:* shahid ashraf
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> It is a simple Play-based web application. It exposes an URI for
> submitting a SQL query. It then executes that query using
> CassandraSQLContext provided by Spark Cassandra Connector. Since it is
> web-based, I added an authentication and authorization layer to make sure
> that only users with the right authorization can use it.
>
>
>
> I am happy to open-source that code if there is interest. Just need to
> carve out some time to clean it up and remove all the other services that
> this web application provides.
>
>
>
> Mohammed
>
>
>
> *From:* shahid ashraf [mailto:sha...@trialx.com ]
> *Sent:* Saturday, June 20, 2015 6:52 AM
> *To:* Mohammed Guller
> *Cc:* Matthew Johnson; user@spark.apache.org
> *Subject:* RE: Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi Mo

Re: Code review - Spark SQL command-line client for Cassandra

2015-06-22 Thread pawan kumar
Hi,

Zeppelin has a Cassandra-Spark connector built into the build. I have not
tried it yet; maybe you could let us know.

https://github.com/apache/incubator-zeppelin/pull/79

To build a Zeppelin version with the *Datastax Spark/Cassandra connector
<https://github.com/datastax/spark-cassandra-connector>*

mvn clean package *-Pcassandra-spark-1.x* -Dhadoop.version=xxx -Phadoop-x.x
-DskipTests

Right now the Spark/Cassandra connector is available for *Spark 1.1* and *Spark
1.2*. Support for *Spark 1.3* is not released yet (*but you can build your
own Spark/Cassandra connector version 1.3.0-SNAPSHOT*). Support for *Spark
1.4* does not exist yet.

Please do not forget to add -Dspark.cassandra.connection.host=xxx to the
*ZEPPELIN_JAVA_OPTS* parameter in the *conf/zeppelin-env.sh* file. Alternatively,
you can add this parameter to the parameter list of the *Spark interpreter* in
the GUI.


-Pawan






Re: Code review - Spark SQL command-line client for Cassandra

2015-06-22 Thread Silvio Fiorito
Yes, just put the Cassandra connector on the Spark classpath and set the 
connector config properties in the interpreter settings.

From: Mohammed Guller
Date: Monday, June 22, 2015 at 11:56 AM
To: Matthew Johnson, shahid ashraf
Cc: "user@spark.apache.org"
Subject: RE: Code review - Spark SQL command-line client for Cassandra

I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say for sure, 
but it should not be difficult.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Monday, June 22, 2015 2:15 AM
To: Mohammed Guller; shahid ashraf
Cc: user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra

Thanks Mohammed, it’s good to know I’m not alone!

How easy is it to integrate Zeppelin with Spark on Cassandra? It looks like it 
would only support Hadoop out of the box. Is it just a case of dropping the 
Cassandra Connector onto the Spark classpath?

Cheers,
Matthew

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: 20 June 2015 17:27
To: shahid ashraf
Cc: Matthew Johnson; user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra

It is a simple Play-based web application. It exposes an URI for submitting a 
SQL query. It then executes that query using CassandraSQLContext provided by 
Spark Cassandra Connector. Since it is web-based, I added an authentication and 
authorization layer to make sure that only users with the right authorization 
can use it.

I am happy to open-source that code if there is interest. Just need to carve 
out some time to clean it up and remove all the other services that this web 
application provides.

Mohammed

From: shahid ashraf [mailto:sha...@trialx.com]
Sent: Saturday, June 20, 2015 6:52 AM
To: Mohammed Guller
Cc: Matthew Johnson; user@spark.apache.org
Subject: RE: Code review - Spark SQL command-line client for Cassandra


Hi Mohammad
Can you provide more info about the Service u developed
On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
Hi Matthew,
It looks fine to me. I have built a similar service that allows a user to 
submit a query from a browser and returns the result in JSON format.

Another alternative is to leave a Spark shell or one of the notebooks (Spark 
Notebook, Zeppelin, etc.) session open and run queries from there. This model 
works only if people give you the queries to execute.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, June 19, 2015 2:20 AM
To: user@spark.apache.org
Subject: Code review - Spark SQL command-line client for Cassandra

Hi all,

I have been struggling with Cassandra’s lack of adhoc query support (I know 
this is an anti-pattern of Cassandra, but sometimes management come over and 
ask me to run stuff and it’s impossible to explain that it will take me a while 
when it would take about 10 seconds in MySQL) so I have put together the 
following code snippet that bundles DataStax’s Cassandra Spark connector and 
allows you to submit Spark SQL to it, outputting the results in a text file.

Does anyone spot any obvious flaws in this plan?? (I have a lot more error 
handling etc in my code, but removed it here for brevity)

private void run(String sqlQuery) {
    SparkContext scc = new SparkContext(conf);
    CassandraSQLContext csql = new CassandraSQLContext(scc);
    DataFrame sql = csql.sql(sqlQuery);
    String folderName = "/tmp/output_" + System.currentTimeMillis();
    LOG.info("Attempting to save SQL results in folder: " + folderName);
    sql.rdd().saveAsTextFile(folderName);
    LOG.info("SQL results saved");
}

public static void main(String[] args) {

    String sparkMasterUrl = args[0];
    String sparkHost = args[1];
    String sqlQuery = args[2];

    SparkConf conf = new SparkConf();
    conf.setAppName("Java Spark SQL");
    conf.setMaster(sparkMasterUrl);
    conf.set("spark.cassandra.connection.host", sparkHost);

    JavaSparkSQL app = new JavaSparkSQL(conf);

    app.run(sqlQuery);
}

I can then submit this to Spark with ‘spark-submit’:


>  ./spark-submit --class com.algomi.spark.JavaSparkSQL --master 
> spark://sales3:7077 
> spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"

It seems to work pretty well, so I’m pretty happy, but wondering why this isn’t 
common practice (at least I haven’t been able to find much about it on Google) 
– is there something terrible that I’m missing?

Thanks!
Matthew
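The error handling trimmed out of the snippet above for brevity could start with a simple guard on the three positional arguments. A hypothetical, Spark-free sketch (class and message are illustrative, not from the original code):

```java
public class ArgGuard {
    // True when exactly the three expected positional args are present:
    // <sparkMasterUrl> <cassandraHost> <sqlQuery>
    static boolean valid(String[] args) {
        return args.length == 3;
    }

    public static void main(String[] args) {
        // Demonstrate with a sample argument vector like the spark-submit example.
        String[] sample = {"spark://sales3:7077", "sales3",
                           "select * from mykeyspace.operationlog"};
        System.out.println(valid(sample)
                ? "args ok"
                : "usage: JavaSparkSQL <sparkMasterUrl> <cassandraHost> <sqlQuery>");
    }
}
```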




RE: Code review - Spark SQL command-line client for Cassandra

2015-06-22 Thread Mohammed Guller
I haven’t tried using Zeppelin with Spark on Cassandra, so can’t say for sure, 
but it should not be difficult.

Mohammed





RE: Code review - Spark SQL command-line client for Cassandra

2015-06-22 Thread Matthew Johnson
Thanks Mohammed, it’s good to know I’m not alone!



How easy is it to integrate Zeppelin with Spark on Cassandra? It looks like
it would only support Hadoop out of the box. Is it just a case of dropping
the Cassandra Connector onto the Spark classpath?



Cheers,

Matthew
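(One possibility, following up on the Zeppelin question: Zeppelin's %dep interpreter can load extra artifacts at runtime rather than requiring them on the Spark classpath ahead of time. A sketch, untested here; the connector artifact and version are an example and must match your Spark and Scala builds:)

```
%dep
z.load("com.datastax.spark:spark-cassandra-connector_2.10:1.2.0")
```

(The %dep paragraph has to run before the Spark interpreter is first used in the notebook, otherwise the interpreter must be restarted.)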





RE: Code review - Spark SQL command-line client for Cassandra

2015-06-20 Thread Mohammed Guller
It is a simple Play-based web application. It exposes a URI for submitting a 
SQL query, which it then executes using the CassandraSQLContext provided by 
the Spark Cassandra Connector. Since it is web-based, I added an 
authentication and authorization layer to make sure that only users with the 
right authorization can use it.

I am happy to open-source that code if there is interest. Just need to carve 
out some time to clean it up and remove all the other services that this web 
application provides.

Mohammed





RE: Code review - Spark SQL command-line client for Cassandra

2015-06-20 Thread shahid ashraf
Hi Mohammed,
Can you provide more information about the service you developed?


RE: Code review - Spark SQL command-line client for Cassandra

2015-06-19 Thread Mohammed Guller
Hi Matthew,
It looks fine to me. I have built a similar service that lets a user submit a 
query from a browser and returns the result in JSON format.

Another alternative is to leave a Spark shell or one of the notebooks (Spark 
Notebook, Zeppelin, etc.) session open and run queries from there. This model 
works only if people give you the queries to execute.

Mohammed

From: Matthew Johnson [mailto:matt.john...@algomi.com]
Sent: Friday, June 19, 2015 2:20 AM
To: user@spark.apache.org
Subject: Code review - Spark SQL command-line client for Cassandra

Hi all,

I have been struggling with Cassandra’s lack of ad hoc query support (I know 
this is an anti-pattern for Cassandra, but sometimes management comes over and 
asks me to run stuff, and it’s impossible to explain that it will take me a 
while when it would take about 10 seconds in MySQL), so I have put together the 
following code snippet that bundles DataStax’s Cassandra Spark connector and 
lets you submit Spark SQL to it, writing the results to a text file.

Does anyone spot any obvious flaws in this plan? (I have a lot more error 
handling etc. in my code, but removed it here for brevity.)

private final SparkConf conf;

public JavaSparkSQL(SparkConf conf) {
    this.conf = conf;
}

private void run(String sqlQuery) {
    SparkContext sc = new SparkContext(conf);
    CassandraSQLContext csql = new CassandraSQLContext(sc);
    DataFrame result = csql.sql(sqlQuery);
    String folderName = "/tmp/output_" + System.currentTimeMillis();
    LOG.info("Attempting to save SQL results in folder: " + folderName);
    result.rdd().saveAsTextFile(folderName);
    LOG.info("SQL results saved");
}

public static void main(String[] args) {
    String sparkMasterUrl = args[0];
    String cassandraHost = args[1];
    String sqlQuery = args[2];

    SparkConf conf = new SparkConf();
    conf.setAppName("Java Spark SQL");
    conf.setMaster(sparkMasterUrl);
    conf.set("spark.cassandra.connection.host", cassandraHost);

    JavaSparkSQL app = new JavaSparkSQL(conf);
    app.run(sqlQuery);
}

I can then submit this to Spark with ‘spark-submit’:


>  ./spark-submit --class com.algomi.spark.JavaSparkSQL --master 
> spark://sales3:7077 
> spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
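(As an aside: instead of building a jar-with-dependencies, spark-submit in Spark 1.3+ can resolve the connector from Maven at submit time via --packages. A sketch assuming the same class and cluster as above; it is not runnable without a live cluster, and the connector coordinates must match your Spark and Scala versions:)

```
./spark-submit --class com.algomi.spark.JavaSparkSQL \
  --master spark://sales3:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.2.0 \
  spark-on-cassandra-0.0.1-SNAPSHOT.jar \
  spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
```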

It seems to work pretty well, so I’m pretty happy, but I’m wondering why this 
isn’t common practice (at least I haven’t been able to find much about it on 
Google). Is there something terrible that I’m missing?

Thanks!
Matthew