[ANN] SparkSQL support for Cassandra with Calliope

2014-10-03 Thread Rohit Rai
Hi All,

A year ago we started this journey and laid the path for the Spark + Cassandra
stack. We established the groundwork and direction for Spark-Cassandra
connectors, and we have been happy to see the results.

With the release of Spark 1.1.0 and SparkSQL, it is time to take Calliope
(http://tuplejump.github.io/calliope/) to its logical next level, also
paving the way for much more advanced functionality to come.

Yesterday we released the Calliope 1.1.0 Community Tech Preview
(https://twitter.com/tuplejump/status/517739186124627968), which brings
native SparkSQL support for Cassandra. Further details are available
here: http://tuplejump.github.io/calliope/tech-preview.html

This release showcases support for core spark-sql
(http://tuplejump.github.io/calliope/start-with-sql.html), hiveql
(http://tuplejump.github.io/calliope/start-with-hive.html) and the
HiveThriftServer (http://tuplejump.github.io/calliope/calliope-server.html).
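
To give a feel for the programming model, here is a minimal sketch of issuing
a spark-sql query against a Cassandra table from Scala. The Cassandra-aware
context class name and the keyspace/table/column names below are illustrative
placeholders, not the actual Calliope API; please refer to the start-with-sql
guide linked above for the real entry points.

    import org.apache.spark.{SparkConf, SparkContext}
    // Hypothetical import: the real Cassandra-aware SQL context class and
    // package are documented in the Calliope start-with-sql guide.
    import com.tuplejump.calliope.sql.CassandraAwareSQLContext

    object CalliopeSqlSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("calliope-sql-sketch"))

        // A Cassandra-aware SQL context exposes Cassandra keyspaces and tables
        // as SQL tables, so plain Spark SQL queries run directly against them.
        val cassandraSql = new CassandraAwareSQLContext(sc)
        val topUsers = cassandraSql.sql(
          "SELECT user_id, COUNT(*) AS events FROM my_keyspace.events GROUP BY user_id")
        topUsers.collect().foreach(println)

        sc.stop()
      }
    }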

I call it native spark-sql integration because it doesn't rely on
Cassandra's Hive connectors (like Cash or DSE) and saves a level of
indirection through Hive.

It also allows us to harness Spark's analyzer and optimizer in the future to
work out the best execution plan, targeting a balance between Cassandra's
querying restrictions and Spark's in-memory processing.

As far as we know, this is the first and only third-party data store
connector for SparkSQL. This is a CTP release because it relies on Spark
internals that do not yet have a stable developer API; we will work
with the Spark community on documenting the requirements and working
towards a standard, stable API for third-party data store integration.

On another note, we no longer require you to sign up to access the early
access code repository.

We invite all of you to try it and give us your valuable feedback.

Regards,

Rohit
*Founder & CEO, Tuplejump, Inc.*

www.tuplejump.com
*The Data Engineering Platform*


Re: cassandra + spark / pyspark

2014-09-11 Thread Rohit Rai
Hi Oleg,

I am the creator of Calliope. Calliope doesn't force any deployment
model... that means you can run it with Mesos or Hadoop or standalone. To
be fair, I don't think the other libs mentioned here force one either.

Spark cluster HA can be provided using ZooKeeper even in the standalone
deployment mode.
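
For reference, a minimal sketch of enabling ZooKeeper-based master recovery
for a standalone cluster (set in spark-env.sh on every Master node; the
ZooKeeper hosts and chroot path below are placeholders):

    # spark-env.sh on each standalone Master (ZooKeeper hosts are placeholders)
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"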


Can you explain what you mean by in-memory aggregations not being
possible? With Calliope being able to utilize Cassandra's secondary indexes and
also our Stargate indexes (distributed Lucene indexing for C*), I am sure
we can handle any scenario. Calliope is used in production at many large
organizations over very, very big data.

Feel free to mail me directly, and we can work with you to get you started.

Regards,
Rohit


*Founder & CEO, Tuplejump, Inc.*

www.tuplejump.com
*The Data Engineering Platform*

On Thu, Sep 11, 2014 at 8:09 PM, Oleg Ruchovets oruchov...@gmail.com
wrote:

 Ok.
 DataStax and Stratio require Mesos, Hadoop YARN, or other third-party
 systems to get Spark cluster HA.

 What about Calliope?
 Is it sufficient to have Cassandra + Calliope + Spark to be able to process
 aggregations?
 In my case we have quite a lot of data, so doing aggregation only in memory
 is impossible.

 Does Calliope support a not-in-memory mode for Spark?

 Thanks
 Oleg.

 On Thu, Sep 11, 2014 at 9:23 PM, abhinav chowdary 
 abhinav.chowd...@gmail.com wrote:

 Adding to conversation...

 There are 3 great open source options available:

 1. Calliope (http://tuplejump.github.io/calliope/):
 This is the first library that came out, some time late last year (as far as I
 can recall), and I have been using it for a while. Mostly very stable;
 uses Cassandra's Hadoop I/O (note that it doesn't require Hadoop).

 2. DataStax spark-cassandra-connector
 (https://github.com/datastax/spark-cassandra-connector): The main difference
 is that this uses CQL3. Again a great library, though it has a few issues; it is
 by far the most actively developed. It still uses Thrift for minor things, but all
 the heavy lifting is done in CQL3 (see the sketch after this list).

 3. Stratio Deep (https://github.com/Stratio/stratio-deep): Has a lot more to
 offer if you use the whole Stratio stack. Deep is for Spark, Stratio Streaming is
 built on top of Spark Streaming, Stratio Meta is something similar to
 Shark or SparkSQL, and finally there is Stratio Cassandra, a fork of Cassandra
 with advanced Lucene-based indexing.
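
 As a point of comparison for option 2, a minimal sketch of reading a Cassandra
 table with the DataStax connector (the host, keyspace, table and column names
 are placeholders):

     import org.apache.spark.{SparkConf, SparkContext}
     import org.apache.spark.SparkContext._    // pair-RDD ops like reduceByKey
     import com.datastax.spark.connector._     // adds cassandraTable() to SparkContext

     object ConnectorReadSketch {
       def main(args: Array[String]): Unit = {
         val conf = new SparkConf()
           .setAppName("connector-read-sketch")
           .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
         val sc = new SparkContext(conf)

         // Rows come back as CassandraRow objects; columns are read by name.
         val rows = sc.cassandraTable("my_keyspace", "events")
         val countByUser = rows
           .map(row => (row.getString("user_id"), 1L))
           .reduceByKey(_ + _)

         countByUser.take(10).foreach(println)
         sc.stop()
       }
     }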






Re: Data locality with cash

2014-05-26 Thread Rohit Rai
Hi Jens,

Cash builds on Cassandra's Hadoop handlers and thus supports data
locality.


Regards,
Rohit

*Founder & CEO, Tuplejump, Inc.*

www.tuplejump.com
*The Data Engineering Platform*


On Wed, May 21, 2014 at 9:22 PM, Jens Rantil jens.ran...@tink.se wrote:

 Hi,

 I've had a look at the Hive plugin for Cassandra[1]. Does anyone know if
 it supports data locality if I install task trackers and job trackers on my
 Cassandra instances?

 [1] https://github.com/tuplejump/cash

 Thanks,
 Jens