@Todd, I looked at it yesterday. All the dependencies you describe are installed on the DSE node. Do I need to include the Spark and DSE dependencies on the Zeppelin node as well?
I built Zeppelin with no Spark and no Hadoop. To my understanding, Zeppelin will send a request to a remote master at spark://ec2-54-163-181-25.compute-1.amazonaws.com:7077, which is a DSE box. That box has all the dependencies. Do I need to install all the dependencies on the Zeppelin box too?

Thanks,
Pawan Venugopal

On Fri, Apr 3, 2015 at 12:08 PM, Todd Nist <tsind...@gmail.com> wrote:

> @Pawan,
>
> It's been a couple of months since I have had a chance to do anything
> with Zeppelin, but here is a link to a post on what I did to get it working:
> https://groups.google.com/forum/#!topic/zeppelin-developers/mCNdyOXNikI
> This may or may not work with the newer releases of Zeppelin.
>
> -Todd
>
> On Fri, Apr 3, 2015 at 3:02 PM, pawan kumar <pkv...@gmail.com> wrote:
>
>> Hi Todd,
>>
>> Thanks for the help. I was able to get DSE working with Tableau as
>> per the link provided by Mohammed. Now I am trying to figure out whether I
>> can write Spark SQL queries from Tableau and get data from DSE. My end goal
>> is a web-based tool where I can write SQL queries that pull data from
>> Cassandra.
>>
>> With Zeppelin, I was able to build and run it in EC2, but I am not sure the
>> configuration is right. I am pointing to a Spark master on a remote DSE
>> node, and all the Spark and Spark SQL dependencies are on that remote node.
>> I am not sure whether I need to install Spark and its dependencies on the
>> web UI (Zeppelin) node.
>>
>> I am also not sure whether discussing Zeppelin in this thread is appropriate.
>>
>> Thanks once again for all the help.
>>
>> Thanks,
>> Pawan Venugopal
>>
>> On Fri, Apr 3, 2015 at 11:48 AM, Todd Nist <tsind...@gmail.com> wrote:
>>
>>> @Pawan,
>>>
>>> Not sure if you have seen this or not, but here is a good example by
>>> Jonathan Lacefield of DataStax on hooking up Spark SQL with DSE; adding
>>> Tableau is as simple as Mohammed stated with DSE:
>>> https://github.com/jlacefie/sparksqltest
>>>
>>> HTH,
>>> Todd
>>>
>>> On Fri, Apr 3, 2015 at 2:39 PM, Todd Nist <tsind...@gmail.com> wrote:
>>>
>>>> Hi Mohammed,
>>>>
>>>> Not sure if you have tried this or not. You could try using the API
>>>> below to start the Thrift server with an existing context:
>>>>
>>>> https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L42
>>>>
>>>> The one thing that Michael Armbrust @ Databricks recommended was this:
>>>>
>>>>> You can start a JDBC server with an existing context. See my answer
>>>>> here:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Standard-SQL-tool-access-to-SchemaRDD-td20197.html
>>>>
>>>> So something like this, based on an example from Cheng Lian:
>>>>
>>>> *Server*
>>>>
>>>> import org.apache.spark.sql.hive.HiveContext
>>>> import org.apache.spark.sql.catalyst.types._
>>>>
>>>> val sparkContext = sc
>>>> import sparkContext._
>>>> val sqlContext = new HiveContext(sparkContext)
>>>> import sqlContext._
>>>> makeRDD((1, "hello") :: (2, "world") :: Nil)
>>>>   .toSchemaRDD.cache().registerTempTable("t")
>>>> // replace the above with C* + the spark-cassandra-connector to
>>>> // generate the SchemaRDD and registerTempTable
>>>>
>>>> import org.apache.spark.sql.hive.thriftserver._
>>>> HiveThriftServer2.startWithContext(sqlContext)
>>>>
>>>> Then start up:
>>>>
>>>> ./bin/beeline -u jdbc:hive2://localhost:10000/default
>>>> 0: jdbc:hive2://localhost:10000/default> select * from t;
>>>>
>>>> I have not tried this from Tableau yet. My understanding is that the
>>>> temp table is only valid as long as the sqlContext is, so if one
>>>> terminates the code representing the *Server* and then restarts the
>>>> standard Thrift server, sbin/start-thriftserver ..., the table won't be
>>>> available.
>>>>
>>>> Another possibility is to use the tuplejump cash project:
>>>> https://github.com/tuplejump/cash
>>>>
>>>> HTH.
>>>>
>>>> -Todd
>>>>
>>>> On Fri, Apr 3, 2015 at 11:11 AM, pawan kumar <pkv...@gmail.com> wrote:
>>>>
>>>>> Thanks Mohammed. Will give it a try today. We would also need the
>>>>> Spark SQL piece, as we are migrating our data store from Oracle to C*,
>>>>> and it would be easier to maintain all the reports rather than
>>>>> recreating each one from scratch.
>>>>>
>>>>> Thanks,
>>>>> Pawan Venugopal
>>>>>
>>>>> On Apr 3, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:
>>>>>
>>>>>> Hi Todd,
>>>>>>
>>>>>> We are using Apache C* 2.1.3, not DSE. We got Tableau to work
>>>>>> directly with C* using the ODBC driver, but now we would like to add
>>>>>> Spark SQL to the mix. I haven't been able to find any documentation
>>>>>> for how to make this combination work.
>>>>>>
>>>>>> We are using the Spark-Cassandra-Connector in our applications, but
>>>>>> haven't been able to figure out how to get the Spark SQL Thrift Server
>>>>>> to use it and connect to C*. That is the missing piece. Once we solve
>>>>>> that piece of the puzzle, Tableau should be able to see the tables in C*.
>>>>>>
>>>>>> Hi Pawan,
>>>>>>
>>>>>> Tableau + C* is pretty straightforward, especially if you are using
>>>>>> DSE. Create a new DSN in Tableau using the ODBC driver that comes with
>>>>>> DSE. Once you connect, Tableau allows you to use a C* keyspace as the
>>>>>> schema and column families as tables.
>>>>>>
>>>>>> Mohammed
>>>>>>
>>>>>> *From:* pawan kumar [mailto:pkv...@gmail.com]
>>>>>> *Sent:* Friday, April 3, 2015 7:41 AM
>>>>>> *To:* Todd Nist
>>>>>> *Cc:* user@spark.apache.org; Mohammed Guller
>>>>>> *Subject:* Re: Tableau + Spark SQL Thrift Server + Cassandra
>>>>>>
>>>>>> Hi Todd,
>>>>>>
>>>>>> Thanks for the link. I would be interested in this solution. I am
>>>>>> using DSE for Cassandra.
>>>>>> Would you provide me with info on connecting with DSE, either through
>>>>>> Tableau or Zeppelin? The goal here is to query Cassandra through Spark
>>>>>> SQL so that I can perform joins and group-bys in my queries. Are you
>>>>>> able to perform Spark SQL queries with Tableau?
>>>>>>
>>>>>> Thanks,
>>>>>> Pawan Venugopal
>>>>>>
>>>>>> On Apr 3, 2015 5:03 AM, "Todd Nist" <tsind...@gmail.com> wrote:
>>>>>>
>>>>>> What version of Cassandra are you using? Are you using DSE or the
>>>>>> stock Apache Cassandra version? I have connected it with DSE, but have
>>>>>> not attempted it with the standard Apache Cassandra version.
>>>>>>
>>>>>> FWIW,
>>>>>> http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise
>>>>>> provides an ODBC driver for accessing C* from Tableau. Granted, it does
>>>>>> not provide all the goodness of Spark. Are you attempting to leverage
>>>>>> the spark-cassandra-connector for this?
>>>>>>
>>>>>> On Thu, Apr 2, 2015 at 10:20 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>>>>>>
>>>>>> Hi –
>>>>>>
>>>>>> Is anybody using Tableau to analyze data in Cassandra through the
>>>>>> Spark SQL Thrift Server?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Mohammed
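---

Editor's sketch 1. The "replace the above with C* + the spark-cassandra-connector" placeholder in Todd's server snippet could look roughly like the following. This is a hedged sketch, not a tested recipe: it assumes a spark-cassandra-connector of the same era (~1.2), and the keyspace/table names (`my_ks.users`) and the `User` case class are hypothetical, invented for illustration. It mirrors the `toSchemaRDD`/`registerTempTable` pattern already used in the thread.

```scala
import org.apache.spark.sql.hive.HiveContext
import com.datastax.spark.connector._   // adds sc.cassandraTable

// Hypothetical schema: my_ks.users (id int, name text)
case class User(id: Int, name: String)

val sqlContext = new HiveContext(sc)
import sqlContext._   // brings the createSchemaRDD implicit into scope

// Read the C* table as an RDD of case classes, lift it to a SchemaRDD,
// and register it so the embedded Thrift server can see it.
sc.cassandraTable[User]("my_ks", "users")
  .toSchemaRDD
  .cache()
  .registerTempTable("users")

import org.apache.spark.sql.hive.thriftserver._
HiveThriftServer2.startWithContext(sqlContext)
```

As Todd notes above, the temp table lives only as long as this sqlContext does.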
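Editor's sketch 2, on Mohammed's "missing piece" (stock Thrift server + connector): one approach is to pass the connector assembly jar and the C* contact point when starting the standard Thrift server, then register the table from beeline via the connector's data source. All paths, hosts, and the jar version below are placeholders; this is a sketch under those assumptions, not a verified configuration.

```
# Placeholder paths/hosts; the connector assembly jar is built separately.
sbin/start-thriftserver.sh \
  --master spark://spark-master:7077 \
  --jars /path/to/spark-cassandra-connector-assembly-1.2.0.jar \
  --conf spark.cassandra.connection.host=10.0.0.5

# Then, from beeline (data source table syntax; my_ks/users are hypothetical):
#   CREATE TEMPORARY TABLE users
#   USING org.apache.spark.sql.cassandra
#   OPTIONS (keyspace "my_ks", table "users");
```

A temporary table registered this way should then be visible to Tableau through the Thrift server's JDBC/ODBC endpoint, subject to the same session-lifetime caveat as above.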
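Editor's sketch 3, on Pawan's Zeppelin question: Zeppelin builds of this vintage read the master from conf/zeppelin-env.sh, and my understanding (not verified against DSE) is that the Zeppelin box still needs a local Spark client matching the cluster's Spark version even when the master is remote. A sketch, reusing the master URL from this thread; SPARK_HOME is a placeholder path:

```
# conf/zeppelin-env.sh (sketch)
export MASTER=spark://ec2-54-163-181-25.compute-1.amazonaws.com:7077
export SPARK_HOME=/opt/spark   # local Spark client matching the DSE cluster's version
```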