Re: Forming a cluster of embedded Cassandra instances

Jack Krupansky Mon, 15 Feb 2016 08:26:24 -0800

But again, you could also simply spawn a process running Cassandra as-is in
its intended form which would eliminate the potential for conflict between
the app heap and Cassandra's JVM heap.
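
As a rough illustration of the spawn-a-process approach, here is a minimal
Java sketch. It assumes a tarball install on Linux or Mac OS X whose start
script is bin/cassandra, and it uses the -f flag to keep Cassandra in the
foreground. The class name CassandraLauncher, its helper methods, and the
/opt/cassandra default path are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Sketch: run a stock Cassandra install as a child process instead of embedding it. */
final class CassandraLauncher {

    /** Builds the launch command; "-f" keeps Cassandra in the foreground. */
    static List<String> command(Path cassandraHome) {
        // Assumption: tarball layout on Linux/Mac OS X, start script at bin/cassandra.
        String script = cassandraHome.resolve("bin").resolve("cassandra").toString();
        return Arrays.asList(script, "-f");
    }

    /** Starts Cassandra in its own JVM, separate from the application's JVM. */
    static Process start(Path cassandraHome) throws IOException {
        return new ProcessBuilder(command(cassandraHome))
                .directory(cassandraHome.toFile())
                .inheritIO() // forward Cassandra's console output to this process
                .start();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default install path; pass the real one as the first argument.
        Path home = Paths.get(args.length > 0 ? args[0] : "/opt/cassandra");
        Process cassandra = start(home);
        // Ask the child process to stop when the application exits.
        Runtime.getRuntime().addShutdownHook(new Thread(cassandra::destroy));
        cassandra.waitFor();
    }
}
```

Because Cassandra runs in its own JVM here, its heap is sized by its own
configuration rather than sharing the application's heap.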


-- Jack Krupansky

On Mon, Feb 15, 2016 at 12:56 AM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi,
>
> the embedded Cassandra may well work for developers as a way to speed up
> getting started with the project; we used it for JUnit tests. But with a
> simple clone and Maven build, I guess it will end up as a single-node
> Cassandra cluster. Remember that Cassandra is a distributed database; one
> needs more than one node to get performance and fault tolerance. Also, I
> would not recommend adding and removing cluster nodes at high frequency
> with application start-stop cycles.
>
> To help in getting things up and running, provide a small readme for
> downloading and starting Cassandra. For Mac and Linux, unpacking the tar.gz
> and running bin/cassandra is not too complicated. Or point to the
> DataStax Community Edition installers. Apart from installing Java, that is a
> five-minute step to a single-node "TestCluster".
>
> Configuring a distributed setup is a bit more or a lot more difficult and
> definitely needs more understanding and planning.
>
> Just as a hint, and off topic: I have seen people using Cassandra as
> application glue for interprocess communication, where every app server
> started a node (for communication, sessions, as a queue, and so on). If
> that is possibly a use case, have a look at Hazelcast.
>
> Jan
>
> Sent from my iPhone
>
> On 14.02.2016 at 23:26, John Sanda <john.sa...@gmail.com> wrote:
>
> The motivation was to make it easy for someone to get up and running
> quickly with the project. Clone the git repo, run the maven build, and then
> you are all set. It definitely does lower the learning curve for someone
> just getting started with a project and who is not really thinking about
> Cassandra. It also is convenient for non-devs who need to quickly get the
> project up and running. For development, we have people working on Linux,
> Mac OS X, and Windows. I am not a Windows user and not even sure if ccm
> works on Windows, so ccm can't be the de facto standard for development.
>
> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> What motivated the use of an embedded instance for development - as
>> opposed to simply spawning a process for Cassandra?
>>
>>
>>
>> -- Jack Krupansky
>>
>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>>> The project I work on day to day uses an embedded instance of Cassandra,
>>> but it is intended primarily for development. We embed Cassandra in a
>>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I
>>> personally do not do this. I use and recommend ccm
>>> <https://github.com/pcmanus/ccm> for development. If you do use
>>> WildFly, there is also wildfly-cassandra
>>> <https://github.com/hawkular/wildfly-cassandra> which deploys Cassandra
>>> as a custom WildFly extension. In other words it is deployed in WildFly
>>> like other subsystems like EJB, web, etc, not like an application. There
>>> isn't a whole lot of active development on this, but it could be another
>>> option.
>>>
>>> For production, we have to support single node clusters (not embedded
>>> though), and it has been challenging for pretty much all the reasons you
>>> find people saying not to do so.
>>>
>>> As for failure detection and cluster membership changes, are you using
>>> the Datastax driver? You can register an event listener with the driver to
>>> receive notifications for those things.
>>>
>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <j...@jonhaddad.com>
>>> wrote:
>>>
>>>> +1 to what jack said. Don't mess with embedded till you understand the
>>>> basics of the db. You're not making your system any less complex, I'd say
>>>> you're most likely going to shoot yourself in the foot.
>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <
>>>> jack.krupan...@gmail.com> wrote:
>>>>
>>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain
>>>>> can be avoided. Two nodes would not support HA. You need to be able to
>>>>> reach a quorum, which is defined as n/2+1 (rounded down), where n is the
>>>>> number of replicas. IOW, you cannot update the data if a quorum cannot be
>>>>> reached.
>>>>> The data on any given node needs to be replicated on at least two other
>>>>> nodes.
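
To make the quorum arithmetic above concrete, here is a minimal Java sketch
of the n/2+1 rule using integer division. The class name QuorumMath and its
method names are invented for this illustration and are not part of any
Cassandra API.

```java
/** Sketch of the quorum arithmetic: quorum = floor(RF / 2) + 1. */
final class QuorumMath {

    /** Number of replicas that must respond for a quorum to be reached. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    /** Number of replicas that can be down while a quorum is still reachable. */
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        // Print the quorum size and failure tolerance for 1 to 5 replicas.
        for (int rf = 1; rf <= 5; rf++) {
            System.out.printf("RF=%d quorum=%d tolerates %d down%n",
                    rf, quorum(rf), tolerableFailures(rf));
        }
    }
}
```

With two replicas the quorum is also two, so losing either replica makes a
quorum unreachable; three replicas is the smallest count that tolerates one
failure.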
>>>>>
>>>>> Embedded Cassandra is only for extremely sophisticated developers -
>>>>> not those who are new to Cassandra, with a "superficial understanding".
>>>>>
>>>>> As a general proposition, you should not be running application code
>>>>> on Cassandra nodes.
>>>>>
>>>>> That said, if any of the senior Cassandra developers wish to
>>>>> personally support your efforts towards embedded clusters, they are
>>>>> certainly free to do so. We'll see if any of them step forward.
>>>>>
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas <
>>>>> binil.thomas.pub...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> TL;DR: I have a very superficial understanding of Cassandra and am
>>>>>> currently evaluating it for a project.
>>>>>>
>>>>>> * Can Cassandra be embedded into another JVM application?
>>>>>> * Can such embedded instances form a cluster?
>>>>>> * Can the application use the failure detection and cluster
>>>>>> membership dissemination infrastructure of embedded Cassandra?
>>>>>>
>>>>>> ----
>>>>>>
>>>>>> I am in the process of re-packaging a SaaS system written in Java to
>>>>>> be deployed on-premise by customers. The SaaS system currently uses AWS
>>>>>> DynamoDB. The data storage needs for this application are modest, but I
>>>>>> would like to keep the deployment complexity to a minimum. Here are three
>>>>>> different usecases the on-premise system should support:
>>>>>>
>>>>>> 1. single-node deployments with minimal complexity
>>>>>> 2. two-node HA deployments; the data and processing needs dictated by
>>>>>> the load on the system are well under what a single node can do, but the
>>>>>> second node is there to satisfy the HA requirement as a hot standby
>>>>>> 3. a multi-node clustered deployment, where higher operational
>>>>>> complexity is justified
>>>>>>
>>>>>> I am considering Cassandra for these usecases.
>>>>>>
>>>>>> For usecase #1, I hope to embed Cassandra into the same JVM as my
>>>>>> application. I read on the web that CassandraDaemon can be used this way.
>>>>>> Is that accurate? What other applications embed Cassandra this way? I
>>>>>> *think* JetBrains Upsource does, but do you know other ones?
>>>>>> (Incidentally, my Java application embeds Jetty webserver also).
>>>>>>
>>>>>> For usecase #2, I am hoping that I can deploy two instances of this
>>>>>> ensemble and have the embedded Cassandra instances form a cluster. If I
>>>>>> configure every write to be replicated on both nodes synchronously, then
>>>>>> it will satisfy the HA needs of this usecase. Is it feasible to form
>>>>>> clusters of embedded Cassandra instances?
>>>>>>
>>>>>> For usecase #3, I can form a large cluster of the ensemble where all
>>>>>> writes are replicated synchronously to a quorum of nodes.
>>>>>>
>>>>>> Finally, in usecase #2 and #3, I'd like to use the failure detection
>>>>>> and cluster membership dissemination infrastructure of Cassandra from
>>>>>> within my application. Is it possible to be notified of membership
>>>>>> changes when embedding Cassandra? I could use a separate library to do
>>>>>> this (say, with JGroups or Akka) but I fear that if this library and the
>>>>>> embedded Cassandra instances disagree, it could lead to subtle bugs.
>>>>>>
>>>>>> Thanks,
>>>>>> Binil
>>>>>>
>>>>>> PS: Cross-posted at
>>>>>> http://stackoverflow.com/questions/35384983/forming-a-cluster-of-embedded-cassandra-instances
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> - John
>>>
>>
>>
>
>
> --
>
> - John
>
>
