subject:"\[jira\] \[Comment Edited\] \(CASSANDRA\-7949\) LCS compaction low performance, many pending compactions, nodes are almost idle"

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-16 Thread Nikolai Grigoriev (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174702#comment-14174702
]

Nikolai Grigoriev edited comment on CASSANDRA-7949 at 10/17/14 3:57 AM:

Update:

Using the property from CASSANDRA-6621 does help to get out of this state. My
cluster is slowly digesting the large sstables and creating bunch of nice small
sstables from them. It is slower than using sstablesplit, I believe, because it
actually does real compactions and, thus, processes and reprocesses different
sets of sstables. My understanding is that every time I get new bunch of L0
sstables there is a phase for updating other levels and it repeats and repeats.

With that property set I see that my total number of sstables grows, my number
of huge sstables decreases and the average size of the sstable decreases as
result.

My conclusions so far:

1. STCS fallback in LCS is a double-edged sword. It is needed to prevent the
flooding the node with tons of small sstables resulting from ongoing writes.
These small ones are often much smaller than the configured target size and hey
need to be merged. But also the use of STCS results in generation of the
super-sized sstables. These become a large headache when the fallback stops and
LCS is supposed to resume normal operations. It appears to me (my humble
opinion) that fallback should be done to some kind of specialized rescue STCS
flavor that merges the small sstables to approximately the LCS target sstable
size BUT DOES NOT create sstables that are much larger than the target size.
With this approach the LCS will resume normal operations much faster than the
cause for the fallback (abnormally high write load) is gone.

2. LCS has major (performance?) issue when you have super-large sstables in the
system. It often gets stuck with single long (many hours) compaction stream
that, by itself, will increase the probability of another STCS fallback even
with reasonable write load. As a possible workaround I was recommended to
consider running multiple C* instances on our relatively powerful machines - to
significantly reduce the amount of data per node and increase compaction
throughput.

3. In the existing systems, depending on the severity of the STCS fallback
work, the fix from CASSANDRA-6621 may help to recover while keeping the nodes
up. It will take a very long time to recover but the nodes will be online.

4. Recovery (see above) is very long. It is much much longer than the duration
of the stress period that causes the condition. In my case I was writing like
crazy for about 4 days and it's been over a week of compactions after that. I
am still very far from 0 pending compactions. Considering this it makes sense
to artificially throttle the write speed when generating the data (like in the
use case I described in previous comments). Extra time spent on writing the
data will be still significantly shorter than the amount of time required to
recover from the consequences of abusing the available write bandwidth.

was (Author: ngrigor...@gmail.com):
Update:

With that property set I see that my total number of sstables grows, my number
of huge sstables decreases and the average size of the sstable decreases as
result.

My conclusions so far:

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-10-12 Thread Nikolai Grigoriev (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168822#comment-14168822
]

Nikolai Grigoriev edited comment on CASSANDRA-7949 at 10/12/14 11:59 PM:
-

I did another round of testing and I can confirm my previous suspicion. If LCS
goes into STCS fallback mode there seems to be some kind of point of no
return. After loading fairly large amount of data I end up with a number of
large (from few Gb to 200+Gb) sstables. After that the cluster simply goes
downhill - it never recovers. Even if there is no traffic except the repair
service (DSE OpsCenter) the number of pending compactions never declines. It
actually grows. Sstables also grow and grow in size until the moment one of the
compactions runs out of disk space and crashes the node.

Also I believe once in this state there is no way out. sstablesplit tool, as
far as I understand, cannot be used with the live node. And the tool splits the
data in single thread. I have measured its performance on my system, it
processes about 13Mb/s on average, thus, to split all these large sstables it
would take many DAYS.

I have got an idea that might actually help. That JVM property from
CASSANDRA-6621 - it seems to be what I need right now. I have tried it and it
seems (so far) that when compacting my nodes produce only the sstables of the
target size, i.e (I may be wrong but so far it seems so) it is splitting the
large sstables into the small ones while the nodes are on. If it continues like
this I may hope to eventually get rid of mega-huge-sstables and then LCS
performance should be back to normal. Will provide an update later.

was (Author: ngrigor...@gmail.com):
I did another round of testing and I can confirm my previous suspicion. If LCS
goes into STCS fallback mode there seems to be some kind of point of no
return. After loading fairly large amount of data I end up with a number of
large (from few Gb to 200+Gb) sstables. After that the cluster simply goes
downhill - it never recovers. Even if there is no traffic except the repair
service (DSE OpsCenter) the number of pending compactions never declines. It
actually grows. Sstables also grow and grow in size until the moment one of the
compactions runs out of disk space and crashes the node.

LCS compaction low performance, many pending compactions, nodes are almost
idle
---

Key: CASSANDRA-7949
URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: DSE 4.5.1-1, Cassandra 2.0.8
Reporter: Nikolai Grigoriev
Attachments: iostats.txt, nodetool_compactionstats.txt,
nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-22 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143357#comment-14143357
 ] 

Marcus Eriksson edited comment on CASSANDRA-7949 at 9/22/14 4:20 PM:
-

so, if you switch to STCS and then back to LCS and let it compact, you are 
bound to do a lot of L0 to L1 compaction in the beginning since all sstables 
are in level 0 and need to pass through L1 before making it to the higher 
levels.

L0 to L1 compactions usually include _all_ L1 sstables, this means that only 
one can proceed at a time.

Looking at your compactionstats, you have one 2TB compaction going on, probably 
between L0 and L1, that needs to finish before it can continue doing higher 
level compactions


was (Author: krummas):
so, if you switch to STCS and let it compact, you are bound to do a lot of L0 
to L1 compaction in the beginning since all sstables are in level 0 and need to 
pass through L1 before making it to the higher levels.

L0 to L1 compactions usually include _all_ L1 sstables, this means that only 
one can proceed at a time.

Looking at your compactionstats, you have one 2TB compaction going on, probably 
between L0 and L1, that needs to finish before it can continue doing higher 
level compactions

 LCS compaction low performance, many pending compactions, nodes are almost 
 idle
 ---

 Key: CASSANDRA-7949
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: DSE 4.5.1-1, Cassandra 2.0.8
Reporter: Nikolai Grigoriev
 Attachments: iostats.txt, nodetool_compactionstats.txt, 
 nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt


 I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks + 
 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the 
 load similar to the load in our future product. Before running the simulator 
 I had to pre-generate enough data. This was done using Java code and DataStax 
 Java driver. To avoid going deep into details, two tables have been 
 generated. Each table currently has about 55M rows and between few dozens and 
 few thousands of columns in each row.
 This data generation process was generating massive amount of non-overlapping 
 data. Thus, the activity was write-only and highly parallel. This is not the 
 type of the traffic that the system will have ultimately to deal with, it 
 will be mix of reads and updates to the existing data in the future. This is 
 just to explain the choice of LCS, not mentioning the expensive SSD disk 
 space.
 At some point while generating the data I have noticed that the compactions 
 started to pile up. I knew that I was overloading the cluster but I still 
 wanted the genration test to complete. I was expecting to give the cluster 
 enough time to finish the pending compactions and get ready for real traffic.
 However, after the storm of write requests have been stopped I have noticed 
 that the number of pending compactions remained constant (and even climbed up 
 a little bit) on all nodes. After trying to tune some parameters (like 
 setting the compaction bandwidth cap to 0) I have noticed a strange pattern: 
 the nodes were compacting one of the CFs in a single stream using virtually 
 no CPU and no disk I/O. This process was taking hours. After that it would be 
 followed by a short burst of few dozens of compactions running in parallel 
 (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for 
 many hours doing one compaction at time. So it looks like this:
 # nodetool compactionstats
 pending tasks: 3351
   compaction typekeyspace   table   completed 
   total  unit  progress
Compaction  myks table_list1 66499295588   
 1910515889913 bytes 3.48%
 Active compaction remaining time :n/a
 # df -h
 ...
 /dev/sdb1.5T  637G  854G  43% /cassandra-data/disk1
 /dev/sdc1.5T  425G  1.1T  29% /cassandra-data/disk2
 /dev/sdd1.5T  429G  1.1T  29% /cassandra-data/disk3
 # find . -name **table_list1**Data** | grep -v snapshot | wc -l
 1310
 Among these files I see:
 1043 files of 161Mb (my sstable size is 160Mb)
 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb
 263 files of various sized - between few dozens of Kb and 160Mb
 I've been running the heavy load for about 1,5days and it's been close to 3 
 days after that and the number of pending compactions does not go down.
 I have applied one of the not-so-obvious recommendations to disable 
 multithreaded compactions and that seems to be helping a bit - I see some 
 nodes started to have fewer pending

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

2014-09-16 Thread Jeremiah Jordan (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136712#comment-14136712
]

Jeremiah Jordan edited comment on CASSANDRA-7949 at 9/17/14 3:27 AM:
-

For the initial load you probably want to disable STCS in L0. CASSANDRA-6621.
Or maybe use STCS and then switch to LCS when the load is over.

But basically it's working as expected when overloaded. LCS does not deal well
with being overloaded.

was (Author: jjordan):
For the initial load you probably want to disable STCS in L0. CASSANDRA-6621.
Or maybe use STCS and then switch to LCS when the load is over.

LCS compaction low performance, many pending compactions, nodes are almost
idle
---

I've been evaluating new cluster of 15 nodes (32 core, 6x800Gb SSD disks +
2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates the
load similar to the load in our future product. Before running the simulator
I had to pre-generate enough data. This was done using Java code and DataStax
Java driver. To avoid going deep into details, two tables have been
generated. Each table currently has about 55M rows and between few dozens and
few thousands of columns in each row.
This data generation process was generating massive amount of non-overlapping
data. Thus, the activity was write-only and highly parallel. This is not the
type of the traffic that the system will have ultimately to deal with, it
will be mix of reads and updates to the existing data in the future. This is
just to explain the choice of LCS, not mentioning the expensive SSD disk
space.
At some point while generating the data I have noticed that the compactions
started to pile up. I knew that I was overloading the cluster but I still
wanted the genration test to complete. I was expecting to give the cluster
enough time to finish the pending compactions and get ready for real traffic.
However, after the storm of write requests have been stopped I have noticed
that the number of pending compactions remained constant (and even climbed up
a little bit) on all nodes. After trying to tune some parameters (like
setting the compaction bandwidth cap to 0) I have noticed a strange pattern:
the nodes were compacting one of the CFs in a single stream using virtually
no CPU and no disk I/O. This process was taking hours. After that it would be
followed by a short burst of few dozens of compactions running in parallel
(CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for
many hours doing one compaction at time. So it looks like this:
# nodetool compactionstats
pending tasks: 3351
compaction typekeyspace table completed
total unit progress
Compaction myks table_list1 66499295588
1910515889913 bytes 3.48%
Active compaction remaining time :n/a
# df -h
...
/dev/sdb1.5T 637G 854G 43% /cassandra-data/disk1
/dev/sdc1.5T 425G 1.1T 29% /cassandra-data/disk2
/dev/sdd1.5T 429G 1.1T 29% /cassandra-data/disk3
# find . -name **table_list1**Data** | grep -v snapshot | wc -l
1310
Among these files I see:
1043 files of 161Mb (my sstable size is 160Mb)
9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb
263 files of various sized - between few dozens of Kb and 160Mb
I've been running the heavy load for about 1,5days and it's been close to 3
days after that and the number of pending compactions does not go down.
I have applied one of the not-so-obvious recommendations to disable
multithreaded compactions and that seems to be helping a bit - I see some
nodes started to have fewer pending compactions. About half of the cluster,
in fact. But even there I see they are sitting idle most of the time lazily
compacting in one stream with CPU at ~140% and occasionally doing the bursts
of compaction work for few minutes.
I am wondering if this is really a bug or something in the LCS logic that
would manifest itself only in such an edge case scenario where I have loaded
lots of unique data quickly.
By the way, I see this pattern only for one of two tables - the one that has
about 4 times more data than another (space-wise, number of rows is the
same). Looks like all these pending compactions are really only for that
larger table.

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

[jira] [Comment Edited] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle

4 matches

Site Navigation

Mail list logo

Footer information