Adding a data center with data already in place

2013-10-25 Thread Oleg Dulin
I am using Cassandra 1.1.11 and plan on upgrading soon, but in the meantime here is what happened. I couldn't run repairs because of a slow WAN pipe, so I removed the second data center from the cluster. Today I need to bring that data center back in. It is now 2-3 days out of date. I have

gossip marking all nodes as down when decommissioning one node.

2013-10-25 Thread John Pyeatt
We are running a 6-node cluster in the Amazon cloud (2 nodes in each availability zone). The EC2 instances are m1.large and we have 256 vnodes on each node. We are using Ec2Snitch, NetworkTopologyStrategy, and a replication factor of 3. When we decommission one node, suddenly reads and writes start to

Re: Heap almost full

2013-10-25 Thread Alain RODRIGUEZ
If you are starting with Cassandra, I really advise you to start with 1.2.11. In 1.2+, bloom filters are off-heap and you can use vnodes... I summed up the bloom filter usage reported by nodetool cfstats in all the CFs and it was under 50 MB. This is quite a small value. Is there no error in your
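The summing step described above (adding up the bloom filter usage reported by nodetool cfstats across all CFs) can be sketched as a small script. This is a minimal sketch, not an official tool: the sample output fragment and its byte values are hypothetical, and the regex assumes the "Bloom Filter Space Used" label that cfstats prints in this era of Cassandra.

```python
import re

# Hypothetical fragment of `nodetool cfstats` output for two CFs.
CFSTATS_OUTPUT = """\
Column Family: users
\t\tBloom Filter Space Used: 1048576
Column Family: events
\t\tBloom Filter Space Used: 2097152
"""

def total_bloom_filter_bytes(cfstats_text):
    """Sum the 'Bloom Filter Space Used' values (bytes) across all CFs."""
    total = 0
    for match in re.finditer(r"Bloom Filter Space Used:\s*(\d+)", cfstats_text):
        total += int(match.group(1))
    return total

total = total_bloom_filter_bytes(CFSTATS_OUTPUT)
print(total / (1024 * 1024), "MB")  # 3.0 MB for the sample above
```

In practice you would pipe the real cfstats output in (e.g. `nodetool cfstats | python sum_bf.py`) rather than hard-coding it.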

Cassandra SSTable deletion/load reporting question

2013-10-25 Thread Jasdeep Hundal
Does anyone have a good explanation or pointers to docs for understanding how Cassandra decides to remove SSTables from disk? After performing a large set of deletes on our cluster, a few hundred gigabytes' worth (essentially cleaning out nearly all old data), we noticed that nodetool reported

Re: Cassandra SSTable deletion/load reporting question

2013-10-25 Thread Robert Coli
On Fri, Oct 25, 2013 at 1:10 PM, Jasdeep Hundal dsjas...@gmail.com wrote: After performing a large set of deletes on our cluster, a few hundred gigabytes' worth (essentially cleaning out nearly all old data), we noticed that nodetool reported about the same load as before. Tombstones are
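The reason deletes do not immediately shrink the reported load is that a delete writes a tombstone, and a tombstone can only be dropped during compaction once it is older than gc_grace_seconds. A rough sketch of that eligibility check, assuming the default 10-day grace period:

```python
import time

GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds: 10 days

def tombstone_purgeable(deletion_timestamp, now=None, gc_grace=GC_GRACE_SECONDS):
    """A tombstone may only be dropped at compaction time once it is older
    than gc_grace_seconds; until then it (and the data it shadows, until
    compacted) still occupies disk and counts toward nodetool's load."""
    if now is None:
        now = time.time()
    return now - deletion_timestamp > gc_grace

# A delete issued one day ago is not yet purgeable with the default grace:
print(tombstone_purgeable(time.time() - 86400))  # False
```

Even after the grace period, space is only reclaimed when a compaction actually rewrites the SSTables containing the deleted rows.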

Re: Cassandra SSTable deletion/load reporting question

2013-10-25 Thread Jasdeep Hundal
Thanks Rob. Will check out the tool you linked to. In our case it's definitely not the tombstones hanging around, since we write entire rows at once and the amount of data in a row is far, far greater than the space a tombstone takes. Jasdeep On Fri, Oct 25, 2013 at 1:14 PM, Robert Coli

Query a datacenter

2013-10-25 Thread srmore
I don't know whether this is possible but was just curious: can you query for the data in the remote datacenter with CL.ONE? There could be a case where one might not have a QUORUM and would like to read the most recent data, which includes the data from the other datacenter. AFAIK, to reliably

Re: Query a datacenter

2013-10-25 Thread Robert Coli
On Fri, Oct 25, 2013 at 2:47 PM, srmore comom...@gmail.com wrote: I don't know whether this is possible but was just curious, can you query for the data in the remote datacenter with CL.ONE? A coordinator at CL.ONE picks which replica(s) to query based in large part on the dynamic snitch.
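The dynamic-snitch behavior Rob describes can be illustrated with a toy model. This is only a sketch of the idea (the coordinator prefers the replica with the best latency score, so at CL.ONE a healthy local replica normally wins over a remote-DC one); the hosts and score values below are made up, and real snitch scoring is more involved.

```python
def pick_replica(replica_scores):
    """Toy dynamic-snitch routing: return the replica with the lowest
    (best) latency score. At CL.ONE only this one replica need answer,
    so a remote-DC replica is rarely chosen while local ones are healthy."""
    return min(replica_scores, key=replica_scores.get)

# Hypothetical latency scores (lower is better):
scores = {
    "10.0.0.1 (local DC)": 0.8,
    "10.0.0.2 (local DC)": 1.1,
    "192.168.0.1 (remote DC)": 9.5,
}
print(pick_replica(scores))  # 10.0.0.1 (local DC)
```

The practical upshot for the original question: CL.ONE does not guarantee a read from the remote datacenter; it reads from whichever single replica the snitch currently favors.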

Read repair

2013-10-25 Thread Baskar Duraikannu
We are thinking through the deployment architecture for our Cassandra cluster. Let us say that we choose to deploy data across three racks. Suppose power to one rack went down for 10 minutes and then came back. As soon as it came back up, due to some human error, rack1 goes down. Now
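The scenario above can be reasoned about with simple quorum arithmetic. A minimal sketch, assuming NetworkTopologyStrategy places one replica of each row per rack with RF=3 across the three racks:

```python
def quorum(rf):
    """QUORUM requires a majority of the replicas: floor(rf/2) + 1."""
    return rf // 2 + 1

def quorum_available(rf, replicas_up):
    """With RF=3 and one replica per rack, QUORUM needs 2 of 3 replicas.
    One rack down still satisfies QUORUM; two overlapping rack outages
    (the second rack failing before the first is repaired) break it."""
    return replicas_up >= quorum(rf)

print(quorum_available(rf=3, replicas_up=2))  # one rack down:  True
print(quorum_available(rf=3, replicas_up=1))  # two racks down: False
```

Note that the rack that was down for 10 minutes also holds stale replicas until hinted handoff, read repair, or anti-entropy repair catches it up, so reads at CL.ONE during that window may return old data.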

manual read repair

2013-10-25 Thread Baskar Duraikannu
We have seen read repair take a very long time even for a few GBs of data, even though we don't see disk or network bottlenecks. Do you use any specific configuration to speed up read repairs?