Re: Is it a memory issue?
Yes, it does mean you're getting ahead of Cassandra's ability to keep up, although I would probably have expected a higher number of pending compactions before you got serious issues (I've seen numbers in the thousands). I notice from the output you provided that you are using secondary indexes. There are a lot of ways to misuse secondary indexes (versus not very many ways to use them well). I think it's possible that what you are seeing is the result of the secondary index on event_time (I assume a very high-cardinality column). This is a good blog post on secondary indexes: http://www.wentnet.com/blog/?p=77

Cheers
Ben
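As a rough sketch of one way to act on this suggestion, assuming the application only ever reads a device's events for a given day: event_time is already the clustering column of the (deviceId, date) partitions, so a time-range lookup does not need the global index at all. The device id, date and millisecond timestamps below are made-up example values, and cqlsh is assumed to be pointed at a local node.

# Drop the redundant index (name taken from the compactionstats output above):
cqlsh -e "DROP INDEX IF EXISTS cargts.eventdata_event_time_idx;"

# Time-range read served entirely by the partition key and clustering order;
# deviceId, date and the timestamps are hypothetical example values:
cqlsh -e "
SELECT event_time, lat, lon, speed, heading
FROM cargts.eventdata
WHERE deviceId = 12345
  AND date = 20161107
  AND event_time >= 1478476800000
  AND event_time <  1478563200000;"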
Re: Is it a memory issue?
Thanks Ben. I stopped inserting and checked the compaction status as you mentioned. It seems there is a lot of compaction work waiting to be done. Please see below. In this case, is it a sign that I am writing faster than C* can process?

One node:
[root@iZbp11zpafrqfsiys90kzoZ bin]# ./nodetool compactionstats
pending tasks: 195
   id                                     compaction type   keyspace   table                                 completed     total         unit    progress
   5da60b10-a4a9-11e6-88e9-755b5673a02a   Compaction        cargts     eventdata.eventdata_event_time_idx    1699866872    26536427792   bytes   6.41%
                                          Compaction        system     hints                                  10354379      5172210360   bytes   0.20%
Active compaction remaining time : 0h29m48s

Another node:
[root@iZbp1iqnrpsdhoodwii32bZ bin]# ./nodetool compactionstats
pending tasks: 84
   id                                     compaction type   keyspace   table                                 completed     total         unit    progress
   28a9d010-a4a7-11e6-b985-979fea8d6099   Compaction        cargts     eventdata                              656141400     1424412420   bytes   46.06%
   7c034840-a48e-11e6-b985-979fea8d6099   Compaction        cargts     eventdata.eventdata_event_time_idx    32098562606   42616107664   bytes   75.32%
Active compaction remaining time : 0h11m12s
Re: Is it a memory issue?
This sounds to me like your writes are getting ahead of the compactions trying to keep up, which can eventually cause issues. Keep an eye on nodetool compactionstats; if the number of pending compactions continually climbs, then you are writing faster than Cassandra can actually process. If this is happening, then you need to either add more processing capacity (nodes) to your cluster or throttle writes on the client side.

It could also be related to conditions like an individual partition growing too big, but I'd check for backed-up compactions first.

Cheers
Ben
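A minimal sketch of how one might "keep an eye on" the backlog, assuming nodetool is on the PATH and this is run on each node in turn; if the pending count keeps climbing while the load is running, ingest is outrunning compaction:

# Print the pending compaction count once a minute:
while true; do
    echo "$(date)  $(nodetool compactionstats | grep 'pending tasks')"
    sleep 60
done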
Is it a memory issue?
Hi All,
We have one issue in C* testing. At first the inserting was very fast and the TPS was about 30K/s, but when the number of rows reached 2 billion, the insertion rate dropped badly and the TPS was 20K/s. When the number of rows reached 2.3 billion, the TPS decreased to 0.5K/s and write timeouts appeared. Finally an OOM issue happened on some nodes and the C* daemon on some nodes crashed. In production we have about 8 billion rows. My test cluster settings are below. My question is whether memory is the main issue. Do I need to increase the memory, and what are the right settings for MAX_HEAP_SIZE and HEAP_NEWSIZE?

My cluster setting:
C* cluster with 3 nodes in Aliyun Cloud
CPU: 4 cores
Memory: 8G
Disk: 500G
MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=500M

My table schema:

CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};
use cargts;
CREATE TABLE eventdata (
    deviceId int,
    date int,
    event_time bigint,
    lat decimal,
    lon decimal,
    speed int,
    heading int,
    PRIMARY KEY ((deviceId, date), event_time)
)
WITH CLUSTERING ORDER BY (event_time ASC);
CREATE INDEX ON eventdata (event_time);

Best Regards,
-Simon Wu
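For reference, both settings live in conf/cassandra-env.sh. The values below are only an assumed experiment for an 8 GB, 4-core node, not a recommendation; the stock script would compute roughly a 2 GB heap and a 400 MB new generation for this hardware, and a bigger heap does not by itself fix writes outrunning compaction.

# conf/cassandra-env.sh -- illustrative override only (assumed values):
# leave at least half of the 8 GB of RAM to the OS page cache,
# and keep the new generation around 100 MB per core.
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"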