RE: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-09 Thread Viktor Jevdokimov
For a start: - check the -Xss size in cassandra-env.sh; you may need to increase it for your JVM; - check the -Xms and -Xmx sizes in cassandra-env.sh; you may need to increase them for your data load, bloom filter, and index sizes. Best regards, Viktor Jevdokimov, Senior Developer
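As a rough aid for the check suggested above, the following Python sketch scans the text of a cassandra-env.sh file for -Xss, -Xms, and -Xmx settings. The function name `find_jvm_size_flags` and the `SAMPLE` excerpt are hypothetical and only for illustration; the excerpt is not copied from any real cassandra-env.sh, and a real file may set these flags differently.

```python
import re

# Matches JVM size flags such as -Xss180k, -Xms4G, or -Xmx${MAX_HEAP_SIZE}.
JVM_FLAG = re.compile(r"-(Xss|Xms|Xmx)(\S+)")


def find_jvm_size_flags(env_text):
    """Return {flag: value} for -Xss/-Xms/-Xmx found in non-comment lines.

    If a flag appears more than once, the last occurrence wins.
    """
    flags = {}
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out settings
        for name, value in JVM_FLAG.findall(stripped):
            # Drop the closing quote of JVM_OPTS="..." if it was captured.
            flags[name] = value.rstrip('"')
    return flags


# Hypothetical excerpt (assumption), not taken from a real installation.
SAMPLE = '''
# JVM_OPTS="$JVM_OPTS -Xss128k"
JVM_OPTS="$JVM_OPTS -Xss180k"
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"
'''

if __name__ == "__main__":
    for name, value in find_jvm_size_flags(SAMPLE).items():
        print(f"-{name} = {value}")
```

To inspect a real node, the file's contents could be read and passed in place of `SAMPLE`; its location depends on how Cassandra was installed.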

/proc/sys/vm/zone_reclaim_mode

2013-09-09 Thread Takenori Sato
Hi, I am investigating NUMA issues. I am aware that bin/cassandra tries to use the interleave-all policy if available: https://issues.apache.org/jira/browse/CASSANDRA-2594 https://issues.apache.org/jira/browse/CASSANDRA-3245 So what about /proc/sys/vm/zone_reclaim_mode? Any recommendations?
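Before changing anything, it may help to see what a node currently has set. The sketch below reads the value from the path named in the question; the helper name `read_zone_reclaim_mode` is hypothetical. On Linux a value of 0 means zone reclaim is disabled; the sketch only reports the value and does not recommend a setting.

```python
from pathlib import Path

# Path taken from the question above; it exists only on Linux kernels
# that expose this sysctl.
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the integer stored in the file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode is not available on this system")
    else:
        print(f"zone_reclaim_mode = {mode}")
```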

Re: Best way to track backups/delays for cross DC replication

2013-09-09 Thread srmore
I would be interested to know that too. It would be great if anyone could share how they do (or do not) track or monitor cross-datacenter migrations. Thanks! On Wed, Sep 4, 2013 at 10:13 AM, Anand Somani wrote: > Hi, > > Scenario is a cluster spanning across datacenters and we use Local_quorum

Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-09 Thread srmore
I have a 5 node cluster with a load of around 300GB each. A node went down and does not come up. I can see the following exception in the logs. ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[main,5,main] java.lang.OutOfMemoryError: una

Re: Long running nodetool move operation

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 7:08 PM, Ike Walker wrote: > I've been using nodetool move to rebalance my cluster. Most of the moves > take under an hour, or a few hours at most. The current move has taken 4+ > days so I'm afraid it will never complete. What's the best way to cancel it > and try again?

Long running nodetool move operation

2013-09-09 Thread Ike Walker
I've been using nodetool move to rebalance my cluster. Most of the moves take under an hour, or a few hours at most. The current move has taken 4+ days so I'm afraid it will never complete. What's the best way to cancel it and try again? I'm running a cluster of 12 nodes at AWS. Each node runs

Re: SchemaDisagreementError when launching a new Cassandra (1.2.2) cluster ?

2013-09-09 Thread Alex Heneveld
Robert, Many thanks. Yes, it looks like a bug in 1.2.2. So far (6 runs) v 1.2.9 is acting as I had expected. (BTW re schema, I'm not defining anything myself so it is just the default/empty schema for which I was getting disagreeing versions.) Can I confirm I'm following best practice:

Re: One node out of three not flushing memtables

2013-09-09 Thread Laing, Michael
I have seen something similar. Of course correlation is not causation... Like you, doing testing with heavy writes. I was using a python client to drive the writes using the cql module which is thrift based. The correlation I eventually tracked down was that whichever node my python client(s) c

Re: Cassandra crashes

2013-09-09 Thread Jan Algermissen
Hi John, On 10.09.2013, at 01:06, John Sanda wrote: > Check your file limits - > http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html Did that already - without success. Meanwhil

One node out of three not flushing memtables

2013-09-09 Thread Jan Algermissen
I have a strange pattern: In a cluster with three equally dimensioned and configured nodes I keep losing one because apparently it fails to flush its memtables: http://twitpic.com/dcrtel It is a different node every time. So far I understand that I should expect to see the chain-saw graph wh

Re: Cassandra crashes

2013-09-09 Thread John Sanda
Check your file limits - http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docs&version=1.2&file=#cassandra/troubleshooting/trblshootInsufficientResources_r.html On Friday, September 6, 2013, Jan Algermissen wrote: > > On 06.09.2013, at 13:12, Alex Major > > wrote: >
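A minimal sketch of one way to look at the limits in question, using Python's standard `resource` module (Unix only). The helper names `current_limits` and `describe` are hypothetical. Note the assumption: this reports the limits of the Python process itself, so it reflects what Cassandra sees only if run as the same user and under the same limits configuration.

```python
import resource


def current_limits():
    """Return (soft, hard) limits for open files and, where supported, processes."""
    limits = {"nofile": resource.getrlimit(resource.RLIMIT_NOFILE)}
    # RLIMIT_NPROC is not defined on every platform, so check before using it.
    if hasattr(resource, "RLIMIT_NPROC"):
        limits["nproc"] = resource.getrlimit(resource.RLIMIT_NPROC)
    return limits


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```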

Re: Streaming never completes during nodetool rebuild

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 12:28 PM, Paulo Motta wrote: > I've been trying to add a new data center to our Cassandra 1.1.10 cluster > for the last few days, but I've been unable to successfully rebuild the > nodes on the new DC due to streaming problems. > There are some upstream streaming fixes in 1

Re: making sure 1 copy per availability zone(rack) using EC2Snitch

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 8:56 AM, rash aroskar wrote: > I am planning my new Cassandra 1.2.5 cluster with all nodes in a single > region but divided among 2 availability zones equally. I want to make sure > with replication factor 2 I get 1 copy in every availability zone. As per > my knowledge using p

Streaming never completes during nodetool rebuild

2013-09-09 Thread Paulo Motta
Hello, I've been trying to add a new data center to our Cassandra 1.1.10 cluster for the last few days, but I've been unable to successfully rebuild the nodes on the new DC due to streaming problems. I have followed the procedure described in http://www.datastax.com/docs/1.1/cluster_management#ad

Re: making sure 1 copy per availability zone(rack) using EC2Snitch

2013-09-09 Thread rash aroskar
Thanks for quick response Rob. Are you suggesting deploying 1.2.9 only if using Cassandra "DC" outside of EC2 or if I wish to use rack replication at all? On Mon, Sep 9, 2013 at 12:43 PM, Robert Coli wrote: > On Mon, Sep 9, 2013 at 8:56 AM, rash aroskar wrote: > >> I am planning my new cassandr

Re: FSReadError

2013-09-09 Thread David McNelis
Looks to be the case, getting an IO error when trying to cp the file. That is unfortunate. On the bright side, now we at least have a narrower scope of the problem's source. On Mon, Sep 9, 2013 at 12:54 PM, Robert Coli wrote: > On Mon, Sep 9, 2013 at 6:15 AM, David McNelis wrote: > >> FSR

Re: FSReadError

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 6:15 AM, David McNelis wrote: > FSReadError in > /var/cassandra/data/et/http_request/ks-mycql3table-ic-1799-Data.db > > > Any suggestions on taking care of this? Should I just delete the sstable and > repair? The node keeps starting compactions and running into this afte

Re: SchemaDisagreementError when launching a new Cassandra (1.2.2) cluster ?

2013-09-09 Thread Robert Coli
On Mon, Sep 9, 2013 at 8:07 AM, Alex Heneveld < alex.henev...@cloudsoftcorp.com> wrote: > The problem occurs in about 1 in 4 launches when I start a 2-node cluster, > where the two machines are configured identically with both nodes as the > seeds (apart from the listen_address being different). O

making sure 1 copy per availability zone(rack) using EC2Snitch

2013-09-09 Thread rash aroskar
Hello, I am planning my new Cassandra 1.2.5 cluster with all nodes in a single region but divided among 2 availability zones equally. I want to make sure with replication factor 2 I get 1 copy in every availability zone. As per my knowledge using placement strategy EC2Snitch should take care of this.

SchemaDisagreementError when launching a new Cassandra (1.2.2) cluster ?

2013-09-09 Thread Alex Heneveld
Hi folks, I'm occasionally seeing SchemaDisagreementError on the boot of a *new* cluster. I'm hoping someone can explain what I'm doing wrong, or help me track down the bug if it is one. The problem occurs in about 1 in 4 launches when I start a 2-node cluster, where the two machines are c

FSReadError

2013-09-09 Thread David McNelis
Morning, I'm getting the following error (21 node cluster running 1.2.8) FSReadError in /var/cassandra/data/et/http_request/ks-mycql3table-ic-1799-Data.db at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:93) at org.apache.cas