Confirm whether your write timeouts are client-side socket timeouts or 
TimedOutExceptions from the server. 

Write latency problems are typically related to GC issues, like the ones you 
are seeing. 

I'm not sure how many CPU resources each Cassandra instance has. Is there one 
node on a machine with 6 cores? 
How many rows are on the node and how wide are they? cfstats or 
cfhistograms will help. 
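For example, off the top of my head (substitute your own keyspace and column 
family names): 

    nodetool -h localhost cfstats
    nodetool -h localhost cfhistograms <keyspace> <column_family>

cfhistograms gives you the distribution of row sizes and columns per row, as 
well as read and write latencies. 
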
Enable GC logging, or use something like DataStax OpsCenter, to see how 
low the heap gets after a CMS GC.
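If you do turn on GC logging, the usual options are already in cassandra-env.sh, 
commented out; roughly the following (treat this as a sketch and check your copy 
of the file): 

    # in cassandra-env.sh (the log path is up to you)
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

The heap occupancy reported after each CMS cycle is a good proxy for how much 
live data you are carrying. 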

> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for 
> ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in 
> cassandra.yaml.
That'll do it. 
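For reference, the settings I mean in the 1.2 yaml are these (names and defaults 
from memory, so please check your file): 

    # cassandra.yaml (Cassandra 1.2.x), defaults are roughly 10 seconds
    read_request_timeout_in_ms: 10000
    range_request_timeout_in_ms: 10000
    write_request_timeout_in_ms: 10000
    request_timeout_in_ms: 10000

If requests are still timing out at 20 seconds the node is stalling for a very 
long time, which again points at GC. 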

> To do so we iterate over all rows in the three time-line column families and 
> load the value of the column that is most recent given a cut-off timestamp.
…
> Every night we delete all events that are older than 2 days. Again in batches 
> of 100 rows.
Are you deleting rows from the CFs that you then do a range slice on?
The tombstones may be hurting you on the range scans; can you remove them? 

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/05/2013, at 9:25 PM, Steppacher Ralf 
<ralf.steppac...@derivativepartners.com> wrote:

> Sure, I can do that.  
> 
> My main concern is write latency and the write timeouts we are experiencing. 
> Read latency is secondary, as long as we do not introduce timeouts on read 
> and do not exceed our sampling intervals (see below).
> 
> We are running Cassandra 1.2.1 on Ubuntu 12.04 with JDK 1.7.0_17 (64bit).
> The hardware is virtual but so far we are the only tenant on the physical 
> host. 
> 
> Hardware:
> - 1 x 6 cores at 2.3GHz 
> - 30GB RAM 
> - 1 physical disk for both the tx log and the data files
> - 2 x 1Gbit Ethernet interfaces bonded into one virtual interface
> 
> Cassandra Config:
> Cassandra runs with 
> - 7.5GB of heap and 
> - 600MB of new gen space
> as calculated by the cassandra-env script.
> I have adjusted all cassandra.yaml settings where clear guidance is given, 
> e.g. <factor> x <num_cores>.
> I have tried to increase and decrease heap (between 6 and 8GB) and new gen 
> size (between 300 and 1.1GB).
> I have tried compaction_throughput_mb_per_sec values between 16 and 48.
> I have disabled key caches.
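> For completeness, the heap and new gen overrides go through cassandra-env.sh; 
> with the current values they look roughly like this (the exact numbers varied 
> between the runs mentioned above): 
> 
>     # cassandra-env.sh
>     MAX_HEAP_SIZE="7500M"
>     HEAP_NEWSIZE="600M"
> 
> The compaction throttle is the compaction_throughput_mb_per_sec setting in 
> cassandra.yaml; I believe it can also be changed at runtime with 
> "nodetool setcompactionthroughput <MB/s>".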
> 
> Unfortunately Cassandra has to share the host with other Java processes, the 
> most resource demanding being ActiveMQ 5.8.
> 
> Log Output:
> Over the course of a day (08:00 to 22:00) I see in the logs
> - between 280 and 760 "GC for ParNew" messages per hour (most around 300/h)
> - between 60 and 180 "Completed flushing" messages per hour (most around 100/h)
> - between 17 and 46 "Compacted N sstables to" messages per hour (most around 35/h)
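> (The counts come from grepping system.log; assuming the default log4j pattern 
> with the date and time in the third and fourth fields, something like 
> 
>     grep "GC for ParNew" /var/log/cassandra/system.log \
>         | awk '{print $3, substr($4, 1, 2)}' | sort | uniq -c
> 
> which prints the number of matching lines per hour.)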
> 
> Data Model:
> The data model is made up of 6 column families. 3 are dynamic to capture the 
> time-line of 3 event types; each event creates a new column and the value is 
> the row key of the event. 3 have a static schema and store the event itself.
> The largest event message has 16 attributes. All are short text identifiers, 
> floating-point numbers, and timestamps. For storage in Cassandra every 
> attribute is converted to a string and stored with the UTF8Type validator.
> 
> Timeouts and Memory pressure:
> The write-timeouts correlate with the hours of high (ca. >450/h) "GC for 
> ParNew". I never saw any read-timeouts. I set all timeouts to 20 seconds in 
> cassandra.yaml.
> Cassandra comes under memory pressure ("Flushing CFS X to relieve memory 
> pressure") between 3 and 5 times a day. It tends to happen in the afternoon 
> and evening, but sometimes also right after 08:00 in the morning. In about 
> 75% of the cases it flushes one of the event column families, in 25% a 
> time-line column family.
> 
> Write Load:
> We collect events for a theoretical universe of 2.2 million items, so there 
> are at most 2.2 million rows in each of the time-line column families, but 
> I never saw an estimated row count in cfstats of more than 1 million.
> Roughly 1/3 of the entities receive a maximum of 3 events, one of each event 
> type, per 15-minute interval from 08:00 to 22:00. The other 2/3 receive 3 
> events 3 times a day. About 16'000 entities receive only one event type, but 
> about once every 3 minutes. 
> On a typical day the load adds up to about 70 to 80 million messages.
> Not all messages are original, though. The sources re-send an event in 
> every interval if there are no new events. I do not know the noise ratio; I 
> guesstimate it to be at least 50%. In case of a repeat, the existing time-line 
> column and event row are updated with their previous values.
> 
> Read Load:
> In one hour intervals we sample a time coherent snapshot of the events. To do 
> so we iterate over all rows in the three time-line column families and load 
> the value of the column that is most recent given a cut-off timestamp. The 
> value is the row key of the actual event, which we then load as well. We do 
> that in batches of 100 rows at a time.
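> In pseudocode the access pattern looks roughly like this. This is only an 
> illustrative sketch using a pycassa-style client, not our actual code; the 
> keyspace and CF names are made up and the reversed-slice semantics are from 
> memory: 
> 
>     # illustrative sketch only; names are hypothetical
>     from pycassa.pool import ConnectionPool
>     from pycassa.columnfamily import ColumnFamily
> 
>     pool = ConnectionPool('EventKeyspace', ['localhost:9160'])
>     timeline_cf = ColumnFamily(pool, 'EventTimeline')  # one of the 3 time-line CFs
>     event_cf = ColumnFamily(pool, 'Event')             # matching static event CF
> 
>     def snapshot(cutoff):
>         # range slice over all timeline rows, fetched 100 rows at a time,
>         # asking for the single newest column at or before the cut-off
>         rows = timeline_cf.get_range(buffer_size=100,
>                                      column_start=cutoff,
>                                      column_reversed=True,
>                                      column_count=1)
>         for item_key, columns in rows:
>             event_row_key = list(columns.values())[0]    # column value = event row key
>             yield item_key, event_cf.get(event_row_key)  # load the actual event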
> 
> Deletes:
> Every night we delete all events that are older than 2 days. Again in batches 
> of 100 rows.
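> In the same hypothetical pycassa-style sketch, using the event_cf handle from 
> above, the batched deletes would look something like: 
> 
>     # illustrative sketch: delete expired event rows in batches of 100
>     batch = event_cf.batch(queue_size=100)   # mutator flushes every 100 mutations
>     for key in expired_event_keys:            # keys of events older than 2 days
>         batch.remove(key)
>     batch.send()                              # flush the remainder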
> 
> 
> Thanks for helping!
> Ralf
> 
> 
> From: Alain RODRIGUEZ [arodr...@gmail.com]
> Sent: Thursday, May 02, 2013 09:12
> To: user@cassandra.apache.org
> Subject: Re: How does a healthy node look like?
> 
> Well, maybe you should describe your hardware and the C* release you are 
> using. Also give us some metrics.
> On 30 Apr 2013, at 18:48, "Steppacher Ralf" 
> <ralf.steppac...@derivativepartners.com> wrote:
> Hi,
> 
> I have trouble finding quantitative information on what a healthy 
> Cassandra node should look like (CPU usage, number of flushes, SSTables, 
> compactions, GC), given a certain hardware spec and read/write load. I have 
> trouble gauging whether our first and only Cassandra node needs tuning or is 
> simply overloaded.
> If anyone could point me to some data that would be very helpful.
> 
> (So far I have run the node with the default settings in cassandra.yaml and 
> cassandra-env. The log claims that the server is occasionally under memory 
> pressure and I get frequent timeouts for writes.  I see what I think are many 
> flushes, compactions and GCs in the log. Some toying with heap and new gen 
> sizes, key cache, and the compaction throughput settings did not improve the 
> overall situation much.)
> 
> 
> Thanks!
> Ralf
