Re: Forming a cluster of embedded Cassandra instances

Binil Thomas Wed, 17 Feb 2016 19:45:42 -0800

Thanks for sharing your experience! I also found a similar solution in
TitanDB[1], but that also seems to be intended for development use. The
consensus here seems to be that one should not embed Cassandra into another
JVM.


> For production, we have to support single node clusters (not
> embedded though), and it has been challenging for pretty much
> all the reasons you find people saying not to do so.

What challenges did you face with single-node Cassandra deployment?

[1]:
https://github.com/thinkaurelius/titan/blob/titan10/titan-cassandra/src/main/java/com/thinkaurelius/titan/diskstorage/cassandra/utils/CassandraDaemonWrapper.java

On Sun, Feb 14, 2016 at 11:05 AM, John Sanda <john.sa...@gmail.com> wrote:

> The project I work on day to day uses an embedded instance of Cassandra,
> but it is intended primarily for development. We embed Cassandra in a
> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I
> personally do not do this. I use and recommend ccm
> <https://github.com/pcmanus/ccm> for development. If you do use WildFly,
> there is also wildfly-cassandra
> <https://github.com/hawkular/wildfly-cassandra>, which deploys Cassandra
> as a custom WildFly extension. In other words, it is deployed in WildFly
> like other subsystems such as EJB, web, etc., not like an application. There
> isn't a whole lot of active development on this, but it could be another
> option.
>
> For production, we have to support single node clusters (not embedded
> though), and it has been challenging for pretty much all the reasons you
> find people saying not to do so.
>
> As for failure detection and cluster membership changes, are you using the
> DataStax driver? You can register an event listener with the driver to
> receive notifications for those things.
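For reference, in the 3.x DataStax Java driver the listener mentioned here is `Host.StateListener` (callbacks `onAdd`, `onUp`, `onDown`, `onRemove`, plus `onRegister`/`onUnregister`), registered with `Cluster.register(...)`. So that the sketch below compiles without the driver on the classpath, it uses a hypothetical stand-in interface, `MembershipListener`, with plain `String` host addresses, and it only maintains the set of hosts currently believed to be up. The event sequence in `main` is simulated. Note that this is the driver's view of the cluster from a client connection, not a hook into Cassandra's internal gossip.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class MembershipTracker {

    // Hypothetical stand-in for the driver's Host.StateListener, using
    // String host addresses so this sketch compiles on its own.
    interface MembershipListener {
        void onAdd(String host);
        void onUp(String host);
        void onDown(String host);
        void onRemove(String host);
    }

    // Maintains the set of hosts currently believed to be up.
    static class LiveHostView implements MembershipListener {
        private final Set<String> live = new TreeSet<>();

        // A newly added host is treated as up.
        @Override public synchronized void onAdd(String host) { live.add(host); }
        @Override public synchronized void onUp(String host) { live.add(host); }
        @Override public synchronized void onDown(String host) { live.remove(host); }
        @Override public synchronized void onRemove(String host) { live.remove(host); }

        public synchronized Set<String> liveHosts() {
            // Return an immutable copy so callers never observe later callbacks.
            return Collections.unmodifiableSet(new TreeSet<>(live));
        }
    }

    public static void main(String[] args) {
        LiveHostView view = new LiveHostView();
        // Simulated events; with the real driver, the driver invokes these.
        view.onAdd("10.0.0.1");
        view.onAdd("10.0.0.2");
        view.onDown("10.0.0.2");
        System.out.println(view.liveHosts()); // prints [10.0.0.1]
    }
}
```

An application could consult `liveHosts()` when deciding, for example, whether its peer is reachable, instead of running a second membership protocol alongside Cassandra.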
>
> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> +1 to what Jack said. Don't mess with embedded till you understand the
>> basics of the db. You're not making your system any less complex; I'd say
>> you're most likely going to shoot yourself in the foot.
>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <jack.krupan...@gmail.com>
>> wrote:
>>
>>> HA requires an odd number of replicas (3, 5, 7) so that split-brain
>>> can be avoided. Two nodes would not support HA. You need to be able to
>>> reach a quorum, which is defined as floor(n/2)+1 where n is the number of
>>> replicas. IOW, you cannot update the data if a quorum cannot be reached.
>>> The data on any given node needs to be replicated on at least two other
>>> nodes.
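The quorum rule can be checked with a little arithmetic. This is a minimal sketch: the helper names are hypothetical and not part of any Cassandra API. It computes n/2+1 with integer division and counts how many replicas may be unreachable while a quorum can still be reached.

```java
public class QuorumMath {

    // Quorum size for n replicas: n/2 + 1, where Java's int division
    // rounds down for non-negative values.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Number of replicas that can be unreachable while a quorum is still reachable.
    static int toleratedFailures(int replicas) {
        return replicas - quorum(replicas);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.printf("replicas=%d quorum=%d tolerated failures=%d%n",
                    n, quorum(n), toleratedFailures(n));
        }
    }
}
```

With two replicas the quorum is 2, so losing either node blocks quorum reads and writes. Four replicas tolerate only one failure, the same as three, which is the usual argument for odd replica counts.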
>>>
>>> Embedded Cassandra is only for extremely sophisticated developers - not
>>> those who are new to Cassandra, with a "superficial understanding".
>>>
>>> As a general proposition, you should not be running application code on
>>> Cassandra nodes.
>>>
>>> That said, if any of the senior Cassandra developers wish to personally
>>> support your efforts towards embedded clusters, they are certainly free to
>>> do so. We'll see if any of them step forward.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <
>>> binil.thomas.pub...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> TL;DR: I have a very superficial understanding of Cassandra and am
>>>> currently evaluating it for a project.
>>>>
>>>> * Can Cassandra be embedded into another JVM application?
>>>> * Can such embedded instances form a cluster?
>>>> * Can the application use the failure detection and cluster
>>>> membership dissemination infrastructure of embedded Cassandra?
>>>>
>>>> ----
>>>>
>>>> I am in the process of re-packaging a SaaS system written in Java to be
>>>> deployed on-premise by customers. The SaaS system currently uses AWS
>>>> DynamoDB. The data storage needs for this application are modest, but I
>>>> would like to keep the deployment complexity to a minimum. Here are three
>>>> different use cases the on-premise system should support:
>>>>
>>>> 1. single-node deployments with minimal complexity
>>>> 2. two-node HA deployments; the data and processing needs dictated by
>>>> the load on the system are well under what a single node can do, but the
>>>> second node is there to satisfy the HA requirement as a hot standby
>>>> 3. a multi-node clustered deployment, where higher operational
>>>> complexity is justified
>>>>
>>>> I am considering Cassandra for these use cases.
>>>>
>>>> For use case #1, I hope to embed Cassandra into the same JVM as my
>>>> application. I read on the web that CassandraDaemon can be used this way.
>>>> Is that accurate? What other applications embed Cassandra this way? I
>>>> *think* JetBrains Upsource does, but do you know other ones? (Incidentally,
>>>> my Java application also embeds the Jetty web server.)
>>>>
>>>> For use case #2, I am hoping that I can deploy two instances of this
>>>> ensemble and have the embedded Cassandra instances form a cluster. If I
>>>> configure every write to be replicated on both nodes synchronously, then it
>>>> will satisfy the HA needs of this use case. Is it feasible to form clusters
>>>> of embedded Cassandra instances?
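The two-node plan can be reasoned about with the usual overlap rule for replicated stores: with replication factor RF, a read from R replicas is guaranteed to see a write acknowledged by W replicas when R + W > RF. The sketch below uses hypothetical helper names (not a Cassandra API) and assumes RF = 2 with one node down; writing synchronously to both nodes corresponds to W = 2.

```java
public class TwoNodeTradeoff {

    // Reads and writes share at least one replica when R + W > RF.
    static boolean overlaps(int r, int w, int rf) {
        return r + w > rf;
    }

    // An operation needing `required` replica responses succeeds only if
    // at least that many of the rf replicas are up.
    static boolean available(int required, int rf, int down) {
        return rf - down >= required;
    }

    public static void main(String[] args) {
        int rf = 2;
        int down = 1; // assumption: one of the two nodes has failed
        for (int w = 1; w <= rf; w++) {
            for (int r = 1; r <= rf; r++) {
                System.out.printf("W=%d R=%d consistent=%b writes ok=%b reads ok=%b%n",
                        w, r, overlaps(r, w, rf),
                        available(w, rf, down), available(r, rf, down));
            }
        }
    }
}
```

Under these assumptions, every (W, R) pair that guarantees overlap needs both nodes for either reads or writes. W = 2 keeps the replicas in sync but blocks writes while the standby is down, while W = 1 with R = 1 stays available but can return stale data.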
>>>>
>>>> For use case #3, I can form a large cluster of the ensemble where all
>>>> writes are replicated synchronously to a quorum of nodes.
>>>>
>>>> Finally, in use cases #2 and #3, I'd like to use the failure detection
>>>> and cluster membership dissemination infrastructure of Cassandra from
>>>> within my application. Is it possible to be notified of membership changes
>>>> when embedding Cassandra? I could use a separate library to do this (say,
>>>> with JGroups or Akka), but I fear that if this library and the embedded
>>>> Cassandra instances disagree, it could lead to subtle bugs.
>>>>
>>>> Thanks,
>>>> Binil
>>>>
>>>> PS: Cross-posted at
>>>> http://stackoverflow.com/questions/35384983/forming-a-cluster-of-embedded-cassandra-instances
>>>>
>>>>
>>>
>
>
> --
>
> - John
>
