Re: Bursts of Thrift threads make cluster unresponsive
> Is there an order in which the events you described happened, or is the
> order with which you presented them the order you notice things going
> wrong?

At first, the Thrift thread count starts increasing. After 2 or 3 minutes
these threads consume all CPU cores. After that, simultaneously: message
drops occur, read latency increases, and active read tasks show up.

Fri, 28 Jun 2019 at 01:40, Avinash Mandava:

> Yeah i skimmed too fast, don't add more work if CPU is pegged, and if
> using thrift protocol NTR would not have values.
>
> Is there an order in which the events you described happened, or is the
> order with which you presented them the order you notice things going
> wrong?
>
> On Thu, Jun 27, 2019 at 1:29 PM Dmitry Simonov wrote:
>
>> Thanks for your reply!
>>
>> > Have you tried increasing concurrent reads until you see more activity
>> > in disk?
>> When the problem occurs, the 1.2k - 2k freshly created Thrift threads
>> consume all CPU on all cores.
>> Could increasing concurrent reads help in this situation?
>>
>> > org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>> This metric is 0 on all cluster nodes.
>>
>> Fri, 28 Jun 2019 at 00:34, Avinash Mandava:
>>
>>> Have you tried increasing concurrent reads until you see more activity
>>> in disk? If you've always got 32 active reads and high pending reads it
>>> could just be dropping the reads because the queues are saturated. Could be
>>> artificially bottlenecking at the C* process level.
>>>
>>> Also what does this metric show over time:
>>>
>>> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>>>
>>> On Thu, Jun 27, 2019 at 1:52 AM Dmitry Simonov wrote:
>>>
>>>> Hello!
>>>>
>>>> We have run into the following problem several times.
>>>>
>>>> A Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
>>>> - all CPUs have 100% load (normally we have LA 5 on a 16-core machine)
>>>> - Cassandra's thread count rises from 300 to 1300 - 2000, most of them
>>>> Thrift threads in the java.net.SocketInputStream.socketRead0(Native
>>>> Method) method; the count of other threads doesn't increase
>>>> - some Read messages are dropped
>>>> - read latency (p99.9) increases to 20-30 seconds
>>>> - there are up to 32 active Read Tasks, up to 3k - 6k pending Read Tasks
>>>>
>>>> The problem starts synchronously on all nodes of the cluster.
>>>> I cannot tie this problem to increased load from clients ("read rate"
>>>> doesn't increase during the problem).
>>>> Also it looks like there is no problem with disks (I/O latencies are OK).
>>>>
>>>> Could anybody please give some advice on further troubleshooting?
>>>
>>> --
>>> www.vorstella.com
>>> 408 691 8402

--
Best Regards,
Dmitry Simonov
Re: Bursts of Thrift threads make cluster unresponsive
Thanks for your reply!

> Have you tried increasing concurrent reads until you see more activity
> in disk?

When the problem occurs, the 1.2k - 2k freshly created Thrift threads
consume all CPU on all cores.
Could increasing concurrent reads help in this situation?

> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count

This metric is 0 on all cluster nodes.

Fri, 28 Jun 2019 at 00:34, Avinash Mandava:

> Have you tried increasing concurrent reads until you see more activity in
> disk? If you've always got 32 active reads and high pending reads it could
> just be dropping the reads because the queues are saturated. Could be
> artificially bottlenecking at the C* process level.
>
> Also what does this metric show over time:
>
> org.apache.cassandra.metrics.type=ThreadPools.path=transport.scope=Native-Transport-Requests.name=TotalBlockedTasks.Count
>
> On Thu, Jun 27, 2019 at 1:52 AM Dmitry Simonov wrote:
>
>> Hello!
>>
>> We have run into the following problem several times.
>>
>> A Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
>> - all CPUs have 100% load (normally we have LA 5 on a 16-core machine)
>> - Cassandra's thread count rises from 300 to 1300 - 2000, most of them
>> Thrift threads in the java.net.SocketInputStream.socketRead0(Native
>> Method) method; the count of other threads doesn't increase
>> - some Read messages are dropped
>> - read latency (p99.9) increases to 20-30 seconds
>> - there are up to 32 active Read Tasks, up to 3k - 6k pending Read Tasks
>>
>> The problem starts synchronously on all nodes of the cluster.
>> I cannot tie this problem to increased load from clients ("read rate"
>> doesn't increase during the problem).
>> Also it looks like there is no problem with disks (I/O latencies are OK).
>>
>> Could anybody please give some advice on further troubleshooting?

--
Best Regards,
Dmitry Simonov
Bursts of Thrift threads make cluster unresponsive
Hello!

We have run into the following problem several times.

A Cassandra cluster (5 nodes) becomes unresponsive for ~30 minutes:
- all CPUs have 100% load (normally we have LA 5 on a 16-core machine)
- Cassandra's thread count rises from 300 to 1300 - 2000, most of them
Thrift threads in the java.net.SocketInputStream.socketRead0(Native Method)
method; the count of other threads doesn't increase
- some Read messages are dropped
- read latency (p99.9) increases to 20-30 seconds
- there are up to 32 active Read Tasks, up to 3k - 6k pending Read Tasks

The problem starts synchronously on all nodes of the cluster.
I cannot tie this problem to increased load from clients ("read rate"
doesn't increase during the problem).
Also it looks like there is no problem with disks (I/O latencies are OK).

Could anybody please give some advice on further troubleshooting?

--
Best Regards,
Dmitry Simonov
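A burst like this is easiest to confirm from a thread dump. Below is a small sketch (not from this thread; the thread names and the `jstack`-style header format are assumptions) that counts JVM threads per pool, so a Thrift explosion from ~300 to 1300-2000 threads stands out:

```python
import re
from collections import Counter

def count_threads_by_pool(jstack_output: str) -> Counter:
    """Count threads per pool from `jstack <pid>` output, grouping by the
    thread-name prefix (trailing ids like ':1' or '-2' stripped)."""
    counts = Counter()
    for line in jstack_output.splitlines():
        m = re.match(r'^"([^"]+)"', line)  # jstack thread headers start with a quoted name
        if m:
            pool = re.sub(r'[-:#]?\d+.*$', '', m.group(1)).strip()
            counts[pool] += 1
    return counts

# Fabricated three-thread dump for illustration:
dump = '''
"Thrift:1" #101 daemon prio=5 tid=0x1 nid=0x1 runnable
"Thrift:2" #102 daemon prio=5 tid=0x2 nid=0x2 runnable
"ReadStage-1" #7 daemon prio=5 tid=0x3 nid=0x3 waiting on condition
'''
print(count_threads_by_pool(dump))  # e.g. Counter({'Thrift': 2, 'ReadStage': 1})
```

Sampling these counts once a minute would show whether only the Thrift pool grows during an incident, as described in the thread.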
cqlsh COPY ... TO ... doesn't work if one node down
Hello!

I have a Cassandra cluster with 5 nodes. There is a (relatively small)
keyspace X with RF=5. One node goes down.

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.82   253.64 MB  256     100.0%            839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
UN  10.0.0.154  255.92 MB  256     100.0%            ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
UN  10.0.0.76   461.26 MB  256     100.0%            c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
UN  10.0.0.94   575.78 MB  256     100.0%            9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
DN  10.0.0.47   ?          256     100.0%            7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1

I try to export some data using COPY TO, but it fails after long retries.
Why does it fail? How can I make a copy? There must be 4 copies of each row
on the other (alive) replicas.

cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"
Using 1 child processes

Starting copy of X.Y with columns [key, column1, value].
2018-06-29 19:12:23,661 Failed to create connection pool for new host 10.0.0.47:
Traceback (most recent call last):
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 2476, in run_add_or_renew_pool
    new_pool = HostConnection(host, distance, self)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py", line 332, in __init__
    self._connection = session.cluster.connection_factory(host.address)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 1205, in connection_factory
    return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 332, in factory
    conn = cls(host, *args, **kwargs)
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py", line 344, in __init__
    self._connect_socket()
  File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 371, in _connect_socket
    raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47, scheduling retry in 2.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47, scheduling retry in 4.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47, scheduling retry in 8.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47, scheduling retry in 16.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47, scheduling retry in 32.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47, scheduling retry in 64.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47, scheduling retry in 128.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47, scheduling retry in 256.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47, scheduling retry in 512.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
:1:(4, 'Interrupted system call')
IOError: IOError: IOError: IOError: IOError:

--
Best Regards,
Dmitry Simonov
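The retry delays in the log above double from 2.0 s up to 512.0 s, which matches an exponential-backoff reconnection policy in the driver. A minimal sketch of that schedule (an illustration of the observed behaviour, not the driver's actual code):

```python
def reconnect_schedule(base_delay=2.0, max_delay=512.0, attempts=10):
    """Exponential backoff: base_delay * 2**n, capped at max_delay."""
    return [min(base_delay * 2 ** n, max_delay) for n in range(attempts)]

print(reconnect_schedule())
# [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0, 512.0]
```

Note the retries only explain the timing in the log: with RF=5 every live node holds a full replica, so the data is reachable; the COPY appears to fail because the driver keeps trying to build a connection pool to the down host, not because rows are missing.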
Re: Network problems during repair make it hang on "Wait for validation to complete"
In the previous message I pasted source code from Cassandra 2.2.8 by
mistake. I re-checked against the 2.2.11 source; these lines are the same.

2018-06-21 2:49 GMT+05:00 Dmitry Simonov:

> Hello!
>
> Using Cassandra 2.2.11, I observe behaviour that is very similar to
> https://issues.apache.org/jira/browse/CASSANDRA-12860
>
> Steps to reproduce:
> 1. Set up a cluster: ccm create five -v 2.2.11 && ccm populate -n 5
> --vnodes && ccm start
> 2. Import some keyspace into it (approx 50 MB of data)
> 3. Start repair on one node: ccm node2 nodetool repair KEYSPACE
> 4. While repair is still running, disconnect node3: sudo iptables -I
> INPUT -p tcp -d 127.0.0.3 -j DROP
> 5. This repair hangs.
> 6. Restore network connectivity.
> 7. Repair is still hanging.
> 8. Following repairs will also hang.
>
> In tpstats I see tasks that make no progress:
>
> $ for i in {1..5}; do echo node$i; ccm node$i nodetool tpstats | grep "Repair#"; done
> node1
> Repair#1    1    2255    1       0    0
> node2
> Repair#1    1    2335    26      0    0
> node3
> node4
> Repair#3    1    147     2175    0    0
> node5
> Repair#1    1    2335    17      0    0
>
> In jconsole I see that Repair threads are blocked here:
>
> Name: Repair#1:1
> State: WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
> Total blocked: 0  Total waited: 242
>
> Stack trace:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
> com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1371)
> org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>
> According to the source code, they are waiting for validations to complete:
>
> # ./apache-cassandra-2.2.8-src/src/java/org/apache/cassandra/repair/RepairJob.java
>  74     public void run()
>  75     {
> ...
> 166         // Wait for validation to complete
> 167         Futures.getUnchecked(validations);
>
> https://issues.apache.org/jira/browse/CASSANDRA-11824 says that the
> problem was fixed in 2.2.7, but I use 2.2.11.
>
> Restarting all Cassandra nodes that have hanging tasks (one by one) makes
> these tasks disappear from tpstats. After that, repairs work well (until
> the next network problem).
>
> I also suppose that long GC pauses on one node (as well as network issues)
> during repair may lead to the same problem.
>
> Is it a known issue?

--
Best Regards,
Dmitry Simonov
Network problems during repair make it hang on "Wait for validation to complete"
Hello!

Using Cassandra 2.2.11, I observe behaviour that is very similar to
https://issues.apache.org/jira/browse/CASSANDRA-12860

Steps to reproduce:
1. Set up a cluster: ccm create five -v 2.2.11 && ccm populate -n 5 --vnodes && ccm start
2. Import some keyspace into it (approx 50 MB of data)
3. Start repair on one node: ccm node2 nodetool repair KEYSPACE
4. While repair is still running, disconnect node3: sudo iptables -I INPUT -p tcp -d 127.0.0.3 -j DROP
5. This repair hangs.
6. Restore network connectivity.
7. Repair is still hanging.
8. Following repairs will also hang.

In tpstats I see tasks that make no progress:

$ for i in {1..5}; do echo node$i; ccm node$i nodetool tpstats | grep "Repair#"; done
node1
Repair#1    1    2255    1       0    0
node2
Repair#1    1    2335    26      0    0
node3
node4
Repair#3    1    147     2175    0    0
node5
Repair#1    1    2335    17      0    0

In jconsole I see that Repair threads are blocked here:

Name: Repair#1:1
State: WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
Total blocked: 0  Total waited: 242

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1371)
org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

According to the source code, they are waiting for validations to complete:

# ./apache-cassandra-2.2.8-src/src/java/org/apache/cassandra/repair/RepairJob.java
 74     public void run()
 75     {
...
166         // Wait for validation to complete
167         Futures.getUnchecked(validations);

https://issues.apache.org/jira/browse/CASSANDRA-11824 says that the problem
was fixed in 2.2.7, but I use 2.2.11.

Restarting all Cassandra nodes that have hanging tasks (one by one) makes
these tasks disappear from tpstats. After that, repairs work well (until
the next network problem).

I also suppose that long GC pauses on one node (as well as network issues)
during repair may lead to the same problem.

Is it a known issue?

--
Best Regards,
Dmitry Simonov
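The stuck state described above can also be detected without jconsole, by scanning a dump for Repair threads reported as WAITING. A sketch (hypothetical helper; it parses the jconsole-style `Name:`/`State:` lines shown above, not raw jstack output):

```python
def find_stuck_repair_threads(dump: str):
    """Return names of Repair# threads whose reported state is WAITING."""
    stuck, current = [], None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            current = line.split("Name:", 1)[1].strip()
        elif (line.startswith("State: WAITING")
              and current and current.startswith("Repair#")):
            stuck.append(current)
            current = None  # count each thread once
    return stuck

# Two-thread sample in the format quoted in the message:
sample = """
Name: Repair#1:1
State: WAITING on com.google.common.util.concurrent.AbstractFuture$Sync@73c5ab7e
Name: ReadStage-2
State: RUNNABLE
"""
print(find_stuck_repair_threads(sample))  # ['Repair#1:1']
```

Running such a check periodically (and alerting when the same Repair thread stays WAITING across samples) would catch the hang earlier than noticing stalled repairs.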
Re: Many SSTables only on one node
Hi, Evelyn!

I've found the following messages:

INFO RepairRunnable.java Starting repair command #41, repairing keyspace XXX with repair options (parallelism: parallel, primary range: false, incremental: false, job threads: 1, ColumnFamilies: [YYY], dataCenters: [], hosts: [], # of ranges: 768)
INFO CompactionExecutor:6 CompactionManager.java Starting anticompaction for XXX.YYY on 5132/5846 sstables

After that, many similar messages follow:

SSTable BigTableReader(path='/mnt/cassandra/data/XXX/YYY-4c12fd9029e611e8810ac73ddacb37d1/lb-12688-big-Data.db') fully contained in range (-9223372036854775808,-9223372036854775808], mutating repairedAt instead of anticompacting

Does it mean that anticompaction is not the cause?

2018-04-05 18:01 GMT+05:00 Evelyn Smith:

> It might not be what caused it here. But check your logs for
> anti-compactions.
>
> On 5 Apr 2018, at 8:35 pm, Dmitry Simonov wrote:
>
> Thank you!
> I'll check this out.
>
> 2018-04-05 15:00 GMT+05:00 Alexander Dejanovski:
>
>> 40 pending compactions is pretty high and you should have way less than
>> that most of the time, otherwise it means that compaction is not keeping up
>> with your write rate.
>>
>> If you indeed have SSDs for data storage, increase your compaction
>> throughput to 100 or 200 (depending on how the CPUs handle the load). You
>> can experiment with compaction throughput using: nodetool
>> setcompactionthroughput 100
>>
>> You can raise the number of concurrent compactors as well and set it to a
>> value between 4 and 6 if you have at least 8 cores and the CPUs aren't
>> overwhelmed.
>>
>> I'm not sure why you ended up with only one node having 6k SSTables and
>> not the others, but you should apply the above changes so that you can
>> lower the number of pending compactions and see if it prevents the issue
>> from happening again.
>>
>> Cheers,
>>
>> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov wrote:
>>
>>> Hi, Alexander!
>>>
>>> SizeTieredCompactionStrategy is used for all CFs in the problematic keyspace.
>>> Current compaction throughput is 16 MB/s (the default value).
>>>
>>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>>> in "tpstats", mostly because of another (bigger) keyspace in this cluster.
>>> But the situation is the same on each node.
>>>
>>> According to "nodetool compactionhistory", compactions on this CF do run
>>> (sometimes several times per day, sometimes once per day; the last run
>>> was yesterday).
>>> We run "repair -full" regularly for this keyspace (every 24 hours on
>>> each node), because gc_grace_seconds is set to 24 hours.
>>>
>>> Should we consider increasing compaction throughput and
>>> "concurrent_compactors" (as recommended for SSDs) to keep
>>> "CompactionExecutor" pending tasks low?
>>>
>>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski:
>>>
>>>> Hi Dmitry,
>>>>
>>>> could you tell us which compaction strategy that table is currently
>>>> using?
>>>> Also, what is the compaction max throughput, and is auto-compaction
>>>> correctly enabled on that node?
>>>>
>>>> Did you recently run repair?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> Could you please give some ideas on the following problem?
>>>>>
>>>>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>>>>
>>>>> We've recently discovered high CPU usage on one cluster node; after
>>>>> some investigation we found that the number of sstables for one CF on it
>>>>> is very big: 5800 sstables, versus 3 sstables on the other nodes.
>>>>>
>>>>> Data size in this keyspace was not very big, ~100-200 MB per node.
>>>>>
>>>>> There is no such problem with other CFs of that keyspace.
>>>>>
>>>>> nodetool compact solved the issue as a quick fix.
>>>>>
>>>>> But I'm wondering, what was the cause? How to prevent it from repeating?
>>>>
>>>> --
>>>> -
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com

--
Best Regards,
Dmitry Simonov
Re: Many SSTables only on one node
Thank you! I'll check this out.

2018-04-05 15:00 GMT+05:00 Alexander Dejanovski:

> 40 pending compactions is pretty high and you should have way less than
> that most of the time, otherwise it means that compaction is not keeping up
> with your write rate.
>
> If you indeed have SSDs for data storage, increase your compaction
> throughput to 100 or 200 (depending on how the CPUs handle the load). You
> can experiment with compaction throughput using: nodetool
> setcompactionthroughput 100
>
> You can raise the number of concurrent compactors as well and set it to a
> value between 4 and 6 if you have at least 8 cores and the CPUs aren't
> overwhelmed.
>
> I'm not sure why you ended up with only one node having 6k SSTables and
> not the others, but you should apply the above changes so that you can
> lower the number of pending compactions and see if it prevents the issue
> from happening again.
>
> Cheers,
>
> On Thu, Apr 5, 2018 at 11:33 AM Dmitry Simonov wrote:
>
>> Hi, Alexander!
>>
>> SizeTieredCompactionStrategy is used for all CFs in the problematic keyspace.
>> Current compaction throughput is 16 MB/s (the default value).
>>
>> We always have about 40 pending and 2 active "CompactionExecutor" tasks
>> in "tpstats", mostly because of another (bigger) keyspace in this cluster.
>> But the situation is the same on each node.
>>
>> According to "nodetool compactionhistory", compactions on this CF do run
>> (sometimes several times per day, sometimes once per day; the last run
>> was yesterday).
>> We run "repair -full" regularly for this keyspace (every 24 hours on each
>> node), because gc_grace_seconds is set to 24 hours.
>>
>> Should we consider increasing compaction throughput and
>> "concurrent_compactors" (as recommended for SSDs) to keep
>> "CompactionExecutor" pending tasks low?
>>
>> 2018-04-05 14:09 GMT+05:00 Alexander Dejanovski:
>>
>>> Hi Dmitry,
>>>
>>> could you tell us which compaction strategy that table is currently
>>> using?
>>> Also, what is the compaction max throughput, and is auto-compaction
>>> correctly enabled on that node?
>>>
>>> Did you recently run repair?
>>>
>>> Thanks,
>>>
>>> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov wrote:
>>>
>>>> Hello!
>>>>
>>>> Could you please give some ideas on the following problem?
>>>>
>>>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>>>
>>>> We've recently discovered high CPU usage on one cluster node; after
>>>> some investigation we found that the number of sstables for one CF on it
>>>> is very big: 5800 sstables, versus 3 sstables on the other nodes.
>>>>
>>>> Data size in this keyspace was not very big, ~100-200 MB per node.
>>>>
>>>> There is no such problem with other CFs of that keyspace.
>>>>
>>>> nodetool compact solved the issue as a quick fix.
>>>>
>>>> But I'm wondering, what was the cause? How to prevent it from repeating?

--
Best Regards,
Dmitry Simonov
Re: Many SSTables only on one node
Hi, Alexander!

SizeTieredCompactionStrategy is used for all CFs in the problematic keyspace.
Current compaction throughput is 16 MB/s (the default value).

We always have about 40 pending and 2 active "CompactionExecutor" tasks in
"tpstats", mostly because of another (bigger) keyspace in this cluster.
But the situation is the same on each node.

According to "nodetool compactionhistory", compactions on this CF do run
(sometimes several times per day, sometimes once per day; the last run was
yesterday).
We run "repair -full" regularly for this keyspace (every 24 hours on each
node), because gc_grace_seconds is set to 24 hours.

Should we consider increasing compaction throughput and
"concurrent_compactors" (as recommended for SSDs) to keep
"CompactionExecutor" pending tasks low?

2018-04-05 14:09 GMT+05:00 Alexander Dejanovski:

> Hi Dmitry,
>
> could you tell us which compaction strategy that table is currently using?
> Also, what is the compaction max throughput, and is auto-compaction
> correctly enabled on that node?
>
> Did you recently run repair?
>
> Thanks,
>
> On Thu, Apr 5, 2018 at 10:53 AM Dmitry Simonov wrote:
>
>> Hello!
>>
>> Could you please give some ideas on the following problem?
>>
>> We have a cluster with 3 nodes, running Cassandra 2.2.11.
>>
>> We've recently discovered high CPU usage on one cluster node; after some
>> investigation we found that the number of sstables for one CF on it is
>> very big: 5800 sstables, versus 3 sstables on the other nodes.
>>
>> Data size in this keyspace was not very big, ~100-200 MB per node.
>>
>> There is no such problem with other CFs of that keyspace.
>>
>> nodetool compact solved the issue as a quick fix.
>>
>> But I'm wondering, what was the cause? How to prevent it from repeating?

--
Best Regards,
Dmitry Simonov
Many SSTables only on one node
Hello!

Could you please give some ideas on the following problem?

We have a cluster with 3 nodes, running Cassandra 2.2.11.

We've recently discovered high CPU usage on one cluster node; after some
investigation we found that the number of sstables for one CF on it is very
big: 5800 sstables, versus 3 sstables on the other nodes.

Data size in this keyspace was not very big, ~100-200 MB per node.

There is no such problem with other CFs of that keyspace.

nodetool compact solved the issue as a quick fix.

But I'm wondering, what was the cause? How to prevent it from repeating?

--
Best Regards,
Dmitry Simonov
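As a rough sanity check of the compaction-throughput advice given earlier in this thread: if compaction were purely throughput-bound, even the default 16 MB/s would clear this keyspace's ~200 MB quickly, which supports the idea that the long pending queue comes from the other, bigger keyspace. A sketch of the arithmetic (the throughput-bound model is an assumption; it ignores merge amplification and concurrent compactors):

```python
def drain_time_s(pending_bytes: float, throughput_mb_s: float) -> float:
    """Seconds to clear a compaction backlog if compaction is purely
    limited by the configured throughput."""
    return pending_bytes / (throughput_mb_s * 1024 * 1024)

backlog = 200 * 1024 * 1024  # ~200 MB per node, from the thread
print(drain_time_s(backlog, 16))   # 12.5  (default 16 MB/s)
print(drain_time_s(backlog, 100))  # 2.0   (after nodetool setcompactionthroughput 100)
```

The same formula applied to the bigger keyspace's backlog would show whether raising the throughput to 100-200 MB/s is enough to keep the pending count near zero.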
Re: "READ messages were dropped ... for internal timeout" after big amount of writes
Thank you for the recommendation!

Most of the pending compactions are for another (~100 times larger)
keyspace. They are always running in the background.

2018-03-16 13:28 GMT+05:00 Nicolas Guyomar:

> Hi,
>
> You also have 62 pending compactions at the same time, which is odd for
> such a small dataset IMHO. Are you triggering 'nodetool compact' with some
> kind of cron you may have forgotten after a test, or something else?
> Do you have any monitoring in place? If not, you could let 'dstat -tnrvl 10'
> run for a while and look for inconsistencies (huge I/O wait at some point,
> blocked procs, etc.).
>
> On 16 March 2018 at 07:33, Dmitry Simonov wrote:
>
>> Hello!
>>
>> We are experiencing problems with Cassandra 2.2.8.
>> There is a cluster with 3 nodes.
>> The problematic keyspace has RF=3 and contains 3 tables (current table
>> sizes: 1 GB, 700 MB, 12 KB).
>>
>> Several times per day there are bursts of "READ messages were dropped ...
>> for internal timeout" messages in the logs (on every Cassandra node),
>> lasting 5 - 15 minutes.
>>
>> During periods of drops there is always a queue of pending ReadStage
>> tasks:
>>
>> Pool Name            Active  Pending  Completed  Blocked  All time blocked
>> ReadStage                32       67  297654841        0                 0
>> CompactionExecutor        2       62     802136        0                 0
>>
>> All other Active and Pending counters in tpstats are 0.
>>
>> During drops, iostat says there are no read requests to the disks,
>> probably because all data fits in the disk cache:
>>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>           56,53   0,94    39,84     0,01    0,00   2,68
>>
>> Device:  rrqm/s  wrqm/s   r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sda        0,00   11,00  0,00  26,00    0,00   9,09    715,92      0,78  30,31     0,00    30,31   2,46   6,40
>> sdb        0,00   11,00  0,00  33,00    0,00  10,57    655,70      0,83  26,00     0,00    26,00   2,00   6,60
>> sdc        0,00    1,00  0,00  30,50    0,00  10,98    737,07      0,91  30,49     0,00    30,49   2,10   6,40
>> sdd        0,00   31,50  0,00  35,00    0,00  11,17    653,50      0,98  28,17     0,00    28,17   1,83   6,40
>> sde        0,00   31,50  0,00  34,50    0,00  10,82    642,10      0,67  19,54     0,00    19,54   1,39   4,80
>> sdf        0,00    1,00  0,00  24,50    0,00   9,71    811,78      0,60  24,33     0,00    24,33   1,88   4,60
>> sdg        0,00    1,00  0,00  23,00    0,00   8,93    795,15      0,51  22,26     0,00    22,26   1,91   4,40
>> sdh        0,00    1,00  0,00  21,50    0,00   8,37    797,05      0,45  21,02     0,00    21,02   1,86   4,00
>>
>> Disks are SSDs.
>>
>> Before the drops, the "Local write count" for the problematic table
>> increases very fast (10k-30k/sec, while the ordinary write rate is
>> 10-30/sec) during 1 minute. After that, the drops start.
>>
>> I tried using probabilistic tracing to determine which requests cause
>> the "write count" to increase, but I see no "batch_mutate" queries at
>> all, only reads!
>>
>> There are no GC warnings about long pauses.
>>
>> Could you please help troubleshoot the issue?

--
Best Regards,
Dmitry Simonov
"READ messages were dropped ... for internal timeout" after big amount of writes
Hello!

We are experiencing problems with Cassandra 2.2.8.
There is a cluster with 3 nodes.
The problematic keyspace has RF=3 and contains 3 tables (current table
sizes: 1 GB, 700 MB, 12 KB).

Several times per day there are bursts of "READ messages were dropped ...
for internal timeout" messages in the logs (on every Cassandra node),
lasting 5 - 15 minutes.

During periods of drops there is always a queue of pending ReadStage tasks:

Pool Name            Active  Pending  Completed  Blocked  All time blocked
ReadStage                32       67  297654841        0                 0
CompactionExecutor        2       62     802136        0                 0

All other Active and Pending counters in tpstats are 0.

During drops, iostat says there are no read requests to the disks, probably
because all data fits in the disk cache:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          56,53   0,94    39,84     0,01    0,00   2,68

Device:  rrqm/s  wrqm/s   r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0,00   11,00  0,00  26,00    0,00   9,09    715,92      0,78  30,31     0,00    30,31   2,46   6,40
sdb        0,00   11,00  0,00  33,00    0,00  10,57    655,70      0,83  26,00     0,00    26,00   2,00   6,60
sdc        0,00    1,00  0,00  30,50    0,00  10,98    737,07      0,91  30,49     0,00    30,49   2,10   6,40
sdd        0,00   31,50  0,00  35,00    0,00  11,17    653,50      0,98  28,17     0,00    28,17   1,83   6,40
sde        0,00   31,50  0,00  34,50    0,00  10,82    642,10      0,67  19,54     0,00    19,54   1,39   4,80
sdf        0,00    1,00  0,00  24,50    0,00   9,71    811,78      0,60  24,33     0,00    24,33   1,88   4,60
sdg        0,00    1,00  0,00  23,00    0,00   8,93    795,15      0,51  22,26     0,00    22,26   1,91   4,40
sdh        0,00    1,00  0,00  21,50    0,00   8,37    797,05      0,45  21,02     0,00    21,02   1,86   4,00

Disks are SSDs.

Before the drops, the "Local write count" for the problematic table
increases very fast (10k-30k/sec, while the ordinary write rate is
10-30/sec) during 1 minute. After that, the drops start.

I tried using probabilistic tracing to determine which requests cause the
"write count" to increase, but I see no "batch_mutate" queries at all, only
reads!

There are no GC warnings about long pauses.

Could you please help troubleshoot the issue?

--
Best Regards,
Dmitry Simonov
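When hunting the mystery writes with probabilistic tracing, it helps to check how many traced requests to expect: with independent sampling the expectation is simply rate times probability, which is the model behind `nodetool settraceprobability`. A small sketch (the rates are taken from this message; the 0.001 probability is an example value):

```python
def expected_traces_per_sec(request_rate: float, trace_probability: float) -> float:
    """Expected traced requests per second when each request is sampled
    independently with the given probability."""
    return request_rate * trace_probability

print(expected_traces_per_sec(30_000, 0.001))  # ~30 traces/s during a write burst
print(expected_traces_per_sec(30, 0.001))      # ~0.03 traces/s at the normal rate
```

So even a low sampling probability should surface the burst traffic in system_traces; if only reads appear there, the extra "Local write count" may not be coming from client batch_mutate calls at all.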