Remember: you get concurrent mode failures when the old gen fills up with garbage before CMS can finish its collection cycle. So adding capacity, which reduces the load per machine, is the easiest way to make this a non-issue.
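
To make that reasoning concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not anything Cassandra or the JVM provides: the function name, the old gen size, the occupancy at which CMS starts, and the promotion rates are all made-up assumptions for illustration. Only the 30 second CMS duration comes from this thread.

```python
def cms_finishes_in_time(old_gen_mb, occupancy_at_start, promotion_mb_per_s, cms_cycle_s):
    """Toy model: True if a CMS cycle completes before the old gen fills up.

    All parameters are hypothetical inputs, not values read from a real JVM.
    """
    # Free space left in the old gen at the moment CMS kicks off.
    headroom_mb = old_gen_mb * (1.0 - occupancy_at_start)
    if promotion_mb_per_s <= 0:
        return True  # nothing is being promoted, so the old gen never fills
    # Time until promotions from the young gen exhaust that headroom.
    seconds_until_full = headroom_mb / promotion_mb_per_s
    return seconds_until_full > cms_cycle_s


# Assumed numbers: 6 GB old gen, CMS starting at 75% occupancy, 30 s cycle.
heavy = cms_finishes_in_time(6144, 0.75, promotion_mb_per_s=60, cms_cycle_s=30)
# Twice as many nodes, so roughly half the promotion rate per machine.
light = cms_finishes_in_time(6144, 0.75, promotion_mb_per_s=30, cms_cycle_s=30)
print(heavy, light)
```

With these assumed numbers the first case runs out of headroom before the cycle ends (a concurrent mode failure), while halving the per-node load leaves enough room for CMS to finish. A real heap is messier than this, since the cycle time itself depends on the live set, but the direction of the effect is the same.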

On Wed, Jun 2, 2010 at 12:45 PM, Eric Halpern <e...@dnagamesinc.com> wrote:
>
>
> Ryan King wrote:
>>
>> Why run with so few nodes?
>>
>> -ryan
>>
>> On Tue, Jun 1, 2010 at 4:20 PM, Eric Halpern <e...@dnagamesinc.com> wrote:
>>>
>>> Hello,
>>>
>>> We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32
>>> GB) using EBS storage with 8 GB of heap allocated to the JVM.
>>>
>>> Every couple of hours, each of the nodes does a concurrent mark/sweep that
>>> takes around 30 seconds to complete. During that GC, the node temporarily
>>> drops out of the cluster, usually for about 15 seconds.
>>>
>>> The frequency of the concurrent mark sweeps seems reasonable, but the fact
>>> that the node drops out of the cluster temporarily is a major problem since
>>> this has significant impact on the performance and stability of our
>>> service.
>>>
>>> Has anyone experienced this sort of problem? It would be great to hear from
>>> anyone who has had experience with this sort of issue and/or suggestions for
>>> how to deal with it.
>>>
>>> Thanks, Eric
>>> --
>>
>>
>
> We wanted to start with a small number of nodes to test things out before
> going big. Is there some reason that a small cluster would cause more
> problems in this regard. The actual request load is actually pretty light
> for the cluster.
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5132279.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com