Re: CentOS - Could not setup cluster(snappy error)

2014-01-02 Thread Víctor Hugo Oliveira Molinar
Thanks for all the answers. The problem was exactly the noexec setting in the fstab file. The Cassandra cluster started successfully after removing that entry. Regards, *Víctor Hugo Molinar* On Mon, Dec 30, 2013 at 6:59 PM, Erik Forkalsud wrote: > > You can add something like this to cassandra-env.sh : >
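A minimal sketch of how one might check an fstab-style file for the noexec option mentioned above. The thread does not say which mount point carried the option, so the `/tmp` entry in the sample is an assumption, and `noexec_mounts` and `SAMPLE_FSTAB` are hypothetical names used only for illustration.

```python
def noexec_mounts(fstab_text):
    """Return the mount points whose option field includes 'noexec'."""
    hits = []
    for raw in fstab_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, options = fields[1], fields[3].split(",")
        if "noexec" in options:
            hits.append(mount_point)
    return hits


# Hypothetical entries for illustration only; not taken from the thread.
SAMPLE_FSTAB = """\
/dev/sda1  /     ext4   defaults                1 1
tmpfs      /tmp  tmpfs  rw,nosuid,nodev,noexec  0 0
"""

if __name__ == "__main__":
    print(noexec_mounts(SAMPLE_FSTAB))
```

On a real host the same function could be fed the contents of /etc/fstab; any mount point it reports is a candidate for the kind of failure described in this thread.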

Nodetool removenode => Stream stuck

2014-01-02 Thread Philippe Dupont
Hi guys, Recently a node in our ring became unreachable due to some AWS EC2 issue, and we decided to remove it using "nodetool removenode" command. As we are using Vnodes on a 28 node cluster, the removenode command generates a lot of streams between all nodes. The problem is that some stream beco

Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
8 node cluster running in aws. Any pointers where I should start looking? No kill -9 in history.

Re: Opscenter Metrics

2014-01-02 Thread Nick Bailey
I believe the answer to your similar question on server fault should answer this: http://serverfault.com/questions/564107/what-does-opscenters-write-requests-count-shown-as-ops-sec-exactly-mean On Tue, Dec 31, 2013 at 12:55 AM, Arun Kumar K wrote: > Hi guys, > > I am using YCSB and using thrift

Nodetool ring

2014-01-02 Thread Vivek Mishra
Hi, I am trying to understand "Owns" here. AFAIK, it is range(part of keyspace). Not able to understand why it is shown as 100%? Is it because of effective ownership? Address Rack Status State Load Owns Token -3074457345618258503 x.x.x.x 3 Up Nor

Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma wrote: > 8 node cluster running in aws. Any pointers where I should start looking? > No kill -9 in history. > You should start looking at instructions as to how to upgrade to at least the top of the 1.1 line... :D =Rob

Re: Nodetool ring

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 10:20 AM, Vivek Mishra wrote: > I am trying to understand "Owns" here. AFAIK, it is range(part of > keyspace). Not able to understand why is it shown as 100%? Is it because of > effective ownership? > When RF=N, effective ownership for each node is 100%. This is almost
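The RF=N case can be checked with a small worked example. This is a sketch under stated assumptions, not Cassandra's own calculation: one token per node, a 2**64 token space as with Murmur3Partitioner, and replicas placed on the next nodes clockwise with no rack or data center awareness. The function name and the three tokens are hypothetical.

```python
from fractions import Fraction

RING = 2 ** 64  # assumed size of the token space (Murmur3Partitioner)


def effective_ownership(tokens, rf):
    """tokens: {node: token}. Returns {node: fraction of the ring it replicates}."""
    ring = sorted(tokens.items(), key=lambda kv: kv[1])
    n = len(ring)
    owned = {node: Fraction(0) for node, _ in ring}
    for i, (node, token) in enumerate(ring):
        prev_token = ring[i - 1][1]
        # Width of the primary range (prev_token, token]; a lone node owns it all.
        width = (token - prev_token) % RING or RING
        # The primary owner and the next rf-1 nodes clockwise hold this range.
        for k in range(min(rf, n)):
            replica = ring[(i + k) % n][0]
            owned[replica] += Fraction(width, RING)
    return owned


# Hypothetical, roughly evenly spaced tokens for a 3-node ring.
TOKENS = {
    "a": -2 ** 63,
    "b": -2 ** 63 + RING // 3,
    "c": -2 ** 63 + 2 * (RING // 3),
}

if __name__ == "__main__":
    for rf in (1, 2, 3):
        shares = effective_ownership(TOKENS, rf)
        print(rf, {node: float(share) for node, share in shares.items()})
```

With RF equal to the node count, every range is replicated on every node, so each node's share sums to the whole ring; with RF=1 each node only holds its own primary range, about a third here.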

Re: Nodetool ring

2014-01-02 Thread Vivek Mishra
Thanks for your quick reply. Even with 2 data centers with 3 data nodes each, I am seeing 100% on the nodes of both data centers. -Vivek On Fri, Jan 3, 2014 at 12:07 AM, Robert Coli wrote: > On Thu, Jan 2, 2014 at 10:20 AM, Vivek Mishra wrote: > >> I am trying to understand "Owns" here. AFAIK, it is

vnode in production

2014-01-02 Thread Arindam Barua
Hello all, Just wanted to check if anyone has any experiences to share regarding 1. the stability of vnodes in production 2. upgrading to vnodes in production We recently upgraded to 1.2.12 in production and were planning to turn on vnodes using the "adding a new datacenter" metho

Re: vnode in production

2014-01-02 Thread Russell Bradberry
VNodes in production are pretty stable. That being said, I have never heard of anyone doing a successful "nodetool shuffle". A few people have skirted the issue by creating a new data center with VNodes enabled and replicating the data over. On January 2, 2014 at 1:52:20 PM, Arindam Barua (a

Re: Nodetool removenode => Stream stuck

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 7:54 AM, Philippe Dupont wrote: > As we are using Vnodes on a 28 node cluster, the removenode command > generates a lot of streams between all nodes. The problem is that some > stream becomes stuck. > It's the third time we use the removenode command and each time, some > s

Re: Nodetool ring

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 10:48 AM, Vivek Mishra wrote: > Thanks for your quick reply. Even with 2 data center with 3 data nodes > each i am seeing 100% on both data center nodes. > Do you have RF=3 in both? =Rob

delay/stall processing reads

2014-01-02 Thread Thunder Stumpges
Hi all, I am seeing a read operation delay in our small (3 node) cluster where I am testing. The "normal" latency for these operations is < 2ms as recorded by our load client. This holds easily beyond several hundred qps. However there are times when all incoming queries (on a node-by-node basis)

Re: delay/stall processing reads

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 2:05 PM, Thunder Stumpges wrote: > I am seeing a read operation delay in our small (3 node) cluster where I > am testing. The "normal" latency for these operations is < 2ms as recorded > by our load client. This holds easily beyond several hundred qps. However > there are t

Re: delay/stall processing reads

2014-01-02 Thread Robert Coli
(D'oh, missed your details in the PS.. :D) I don't know whether the .NET client uses thrift or native protocol.. Re 2.0.2 in production : https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ =Rob On Thu, Jan 2, 2014 at 2:13 PM, Robert Coli wrote: > On Thu, Jan 2, 2014 a

Re: delay/stall processing reads

2014-01-02 Thread Thunder Stumpges
Thanks Rob, we are using Cassandra 2.0.2, CQL3, native protocol. tpstats is nearly all zeros from what I can tell. Even running a load of 100rps I can only ever see 1 or 2 in the active or pending counters, never anything in the blocked. Even in the "blocked all time" column it is zero in all cases

Re: delay/stall processing reads

2014-01-02 Thread Robert Coli
On Thu, Jan 2, 2014 at 2:24 PM, Thunder Stumpges wrote: > Excuse my ignorance, but where would I look for the GC info? What logs > contain this? I will start looking for log files and more clues in them. > system.log contains some basic info, you can enable extended gc info via options to the JV

Re: delay/stall processing reads

2014-01-02 Thread Thunder Stumpges
Thanks Rob, Well from what I see in system.log it does not appear that GC aligns with this delay. Though it does seem like quite a few GCs take place. Here is my system.log around the time of the delay: INFO [ScheduledTasks:1] 2014-01-02 12:30:22,164 GCInspector.java (line 116) GC for Concurrent

Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
The root cause turned out to be a large heap. The Linux OOM Killer (http://linux-mm.org/OOM_Killer) killed the process. It took some time to figure out, but it was very interesting. We knew a large heap is a problem, but had no clue why the process disappeared when the actual heap usage was well within the limit. sy
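Since the OOM Killer leaves nothing in Cassandra's own logs, the kernel log is the place to look. The sketch below scans kernel log text for the "Killed process" message. The sample lines follow the message shape used by Linux kernels of that period, but the host name, PID, and memory figures are made up, and the exact wording varies between kernel versions, so the pattern is an assumption to adapt. `oom_kills` is a hypothetical helper name.

```python
import re

# Matches the kernel's report of the process it actually killed.
# Wording is an assumption; it differs between kernel versions.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_kills(log_text):
    """Return (pid, command) pairs for every OOM kill found in the text."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED.finditer(log_text)]


# Illustrative lines only; all values are invented for this example.
SAMPLE_LOG = """\
Jan  2 07:58:11 host kernel: Out of memory: Kill process 4242 (java) score 912 or sacrifice child
Jan  2 07:58:11 host kernel: Killed process 4242 (java) total-vm:20971520kB, anon-rss:15728640kB, file-rss:0kB
"""

if __name__ == "__main__":
    print(oom_kills(SAMPLE_LOG))
```

The same function can be run over saved `dmesg` output or the system's kernel log file to confirm whether a vanished JVM was killed by the OS.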

Re: Nodetool ring

2014-01-02 Thread Vivek Mishra
Yes. On Fri, Jan 3, 2014 at 12:57 AM, Robert Coli wrote: > On Thu, Jan 2, 2014 at 10:48 AM, Vivek Mishra wrote: > >> Thanks for your quick reply. Even with 2 data center with 3 data nodes >> each i am seeing 100% on both data center nodes. >> > > Do you have RF=3 in both? > > =Rob > >

Re: delay/stall processing reads

2014-01-02 Thread Lee Mighdoll
> > Well from what I see in system.log it does not appear that GC aligns with > this delay. > Though it does seem like quite a few GCs take place. Here is my system.log > around the time of the delay: > It does sound like a lot of CMS runs - you'd like most of your garbage to be collected in new s

Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Nitin Sharma
I would recommend always running Cassandra with -XX:+HeapDumpOnOutOfMemoryError. This dumps out a *.hprof file if the process dies due to OOM. You can later analyze the hprof files using Eclipse Memory Analyzer (Eclipse MAT) to figure out root causes and potential lea

Re: Cassandra 1.1.6 crash without any exception or error in log

2014-01-02 Thread Narendra Sharma
In this case the Java/Cassandra process never ran out of memory. Rather, it had 20% of its heap free. It is the OS that ran out of memory. This is a side effect of running with a large heap. I was aware of Java's inefficiency with large heaps but had to keep it due to a large bloom filter. Note we are still