Re: cassandra 3.11.4 deadlock
https://issues.apache.org/jira/browse/CASSANDRA-15898 > Sent: Tuesday, June 23, 2020 at 4:16 PM > From: "John Doe" > To: user@cassandra.apache.org > Subject: Re: cassandra 3.11.4 deadlock > > > Yes > here is one more example, happended on another node > > using TWCS > compaction = {'class': > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', > 'compaction_window_size': '30', > 'compaction_window_unit': 'MINUTES', > 'max_threshold': '32', > 'min_threshold': '4'} > > ## > "CompactionExecutor:244228": > waiting to lock Monitor@0x7bb425df7ee8 (Object@0x7fd71f91e2d8, a > org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy), > which is held by "CompactionExecutor:244235" > "CompactionExecutor:244235": > waiting to lock Monitor@0x7bb3d83ea188 (Object@0x7fd487f937b0, a > java/io/ExpiringCache), > which is held by "CompactionExecutor:244228" > > > I didnt know which thread is it, so I got all CompactionExecutor and > SizeTieredCompactionStrategy threads. > > > ### CompactionExecutor Threads > > Thread 111461: (state = BLOCKED) > - java.io.ExpiringCache.get(java.lang.String) @bci=0, line=75 (Compiled > frame) > - java.io.UnixFileSystem.canonicalize(java.lang.String) @bci=17, line=152 > (Compiled frame) > - java.io.File.getCanonicalPath() @bci=27, line=618 (Compiled frame) > - java.io.FilePermission$1.run() @bci=103, line=215 (Compiled frame) > - java.io.FilePermission$1.run() @bci=1, line=203 (Compiled frame) > - > java.security.AccessController.doPrivileged(java.security.PrivilegedAction) > @bci=0 (Compiled frame) > - java.io.FilePermission.init(int) @bci=97, line=203 (Compiled frame) > - java.io.FilePermission.(java.lang.String, java.lang.String) @bci=10, > line=277 (Compiled frame) > - java.lang.SecurityManager.checkRead(java.lang.String) @bci=8, line=888 > (Compiled frame) > - java.io.File.length() @bci=13, line=969 (Compiled frame) > - org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly() > @bci=87, line=284 (Compiled frame) > - > org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(org.apache.cassandra.db.DecoratedKey) > @bci=111, line=179 (Compiled frame) > - > org.apache.cassandra.io.sstable.SSTableRewriter.append(org.apache.cassandra.db.rows.UnfilteredRowIterator) > @bci=9, line=134 (Compiled frame) > - > org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(org.apache.cassandra.db.rows.UnfilteredRowIterator) > @bci=5, line=65 (Compiled frame) > - > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(org.apache.cassandra.db.rows.UnfilteredRowIterator) > @bci=12, line=142 (Compiled frame) > - org.apache.cassandra.db.compaction.CompactionTask.runMayThrow() @bci=490, > line=201 (Compiled frame) > - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 (Compiled > frame) > - > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) > @bci=6, line=85 (Compiled frame) > - > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) > @bci=2, line=61 (Compiled frame) > - > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run() > @bci=128, line=266 (Compiled frame) > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 > (Compiled frame) > - java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame) > - > java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) > @bci=95, line=1149 (Compiled frame) > - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 > (Compiled frame) > - > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) > @bci=1, line=81 (Compiled frame) > - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6.run() @bci=4 > (Compiled frame) > - java.lang.Thread.run() @bci=11, line=748 (Compiled frame) > Thread 111416: (state = BLOCKED) > - java.io.ExpiringCache.get(java.lang.String) @bci=0, line=75 (Compiled > frame) > - java.io.UnixFileSystem.canonicalize(java.lang.Stri
Re: cassandra 3.11.4 deadlock
ava.lang.Runnable) @bci=1, line=81 (Compiled frame) - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6.run() @bci=4 (Compiled frame) - java.lang.Thread.run() @bci=11, line=748 (Compiled frame) Sent: Tuesday, June 23, 2020 at 3:56 PM From: "Jeff Jirsa" To: "cassandra" Subject: Re: cassandra 3.11.4 deadlock Can you open a JIRA? https://issues.apache.org/jira/projects/CASSANDRA/issues On Tue, Jun 23, 2020 at 11:56 AM John Doe mailto:john@discofan.com]> wrote:We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G -Xmx32G -Xmn8G using CMS. after running couple of days one of the node become unresponsive and threaddump (jstack -F) shows deadlock. Found one Java-level deadlock: = "Native-Transport-Requests-144": waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268, a java/io/ExpiringCache), which is held by "CompactionExecutor:115134" "CompactionExecutor:115134": waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0, a ch/qos/logback/core/joran/spi/ConfigurationWatchList), which is held by "Native-Transport-Requests-144" Found a total of 1 deadlock. I have seen this couple of time now with different nodes with following in system.log IndexSummaryRedistribution.java:77 - Redistributing index summaries NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576 also lookin in gc log there has not been a ParNew collection for last 10hrs, only CMS collections. 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)] 1739842.375: [Rescan (parallel) , 0.0614157 secs] 1739842.437: [weak refs processing, 0.994 secs] 1739842.437: [class unloading, 0.0231076 secs] 1739842.460: [scrub symbol table, 0.0061049 secs] 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 sys=0.00, real=0.09 secs] 1739842.471: [CMS-concurrent-sweep-start] 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 sys=0.00, real=6.10 secs] 1739848.573: [CMS-concurrent-reset-start] 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: 17696837K(25165824K)] 20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 secs] 1739858.713: [CMS-concurrent-mark-start] 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 sys=0.00, real=1.79 secs] 1739860.497: [CMS-concurrent-preclean-start] 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] 1739860.567: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: user=5.80 sys=0.00, real=5.76 secs] Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE (1.8.0_162-b12) Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k free) CommandLine flags: -XX:+AlwaysPreTouch -XX:CICompilerCount=15 -XX:+CMSClassUnloadingEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSInitiatingOccupancyFraction=40 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSWaitDuration=1 -XX:ConcGCThreads=50 -XX:+CrashOnOutOfMemoryError -XX:GCLogFileSize=10485760 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=34359738368 -XX:InitialTenuringThreshold=1 -XX:+ManagementServer -XX:MaxHeapSize=34359738368 -XX:MaxNewSize=8589934592 -XX:MaxTenuringThreshold=1 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=8589934592 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16 -XX:OldSize=25769803776 -XX:OnOutOfMemoryError=kill -9 %p -XX:ParallelGCThreads=50 -XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+ResizeTLAB -XX:StringTableSize=103 -XX:SurvivorRatio=8 -XX:ThreadPriorityPolicy=42 -XX:ThreadStackSize=256 -XX:-UseBiasedLocking -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark -XX:+UseFastUnorderedTimeStamps -XX:+UseGCLogFileRotation -XX:+UseNUMA -XX:+UseNUMAInterleaving -XX:+UseParNewGC -XX:+UseTLAB -XX:+UseThreadPriorities --Thanks - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org[mailto:user-unsubscr...@cassandra.apache.org] For additional commands, e-mail: user-h...@cassandra.apache.org[mailto:user-h...@cassandra.apache.org] - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Re: cassandra 3.11.4 deadlock
Can you open a JIRA? https://issues.apache.org/jira/projects/CASSANDRA/issues On Tue, Jun 23, 2020 at 11:56 AM John Doe wrote: > We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G > -Xmx32G -Xmn8G using CMS. > after running couple of days one of the node become unresponsive and > threaddump (jstack -F) shows deadlock. > > Found one Java-level deadlock: > = > > "Native-Transport-Requests-144": > waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268, > a java/io/ExpiringCache), > which is held by "CompactionExecutor:115134" > "CompactionExecutor:115134": > waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0, > a ch/qos/logback/core/joran/spi/ConfigurationWatchList), > which is held by "Native-Transport-Requests-144" > > Found a total of 1 deadlock. > > I have seen this couple of time now with different nodes with following in > system.log > > IndexSummaryRedistribution.java:77 - Redistributing index summaries > NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot > allocate chunk of 1048576 > > also lookin in gc log there has not been a ParNew collection for last > 10hrs, only CMS collections. > > 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)] > 1739842.375: [Rescan (parallel) , 0.0614157 secs] > 1739842.437: [weak refs processing, 0.994 secs] > 1739842.437: [class unloading, 0.0231076 secs] > 1739842.460: [scrub symbol table, 0.0061049 secs] > 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: > 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: > user=2.95 sys=0.00, real=0.09 secs] > 1739842.471: [CMS-concurrent-sweep-start] > 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 > sys=0.00, real=6.10 secs] > 1739848.573: [CMS-concurrent-reset-start] > 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 > sys=0.00, real=0.08 secs] > 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: > 17696837K(25165824K)] 20409111K(32715584K), 0.0584838 secs] [Times: > user=2.68 sys=0.00, real=0.06 secs] > 1739858.713: [CMS-concurrent-mark-start] > 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 > sys=0.00, real=1.79 secs] > 1739860.497: [CMS-concurrent-preclean-start] > 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 > sys=0.00, real=0.07 secs] > 1739860.567: [CMS-concurrent-abortable-preclean-start] > CMS: abort preclean due to time > 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: > user=5.80 sys=0.00, real=5.76 secs] > > Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE > (1.8.0_162-b12) > Memory: 4k page, physical 792290076k(2780032k free), swap > 16777212k(16693756k free) > > CommandLine flags: > -XX:+AlwaysPreTouch > -XX:CICompilerCount=15 > -XX:+CMSClassUnloadingEnabled > -XX:+CMSEdenChunksRecordAlways > -XX:CMSInitiatingOccupancyFraction=40 > -XX:+CMSParallelInitialMarkEnabled > -XX:+CMSParallelRemarkEnabled > -XX:CMSWaitDuration=1 > -XX:ConcGCThreads=50 > -XX:+CrashOnOutOfMemoryError > -XX:GCLogFileSize=10485760 > -XX:+HeapDumpOnOutOfMemoryError > -XX:InitialHeapSize=34359738368 > -XX:InitialTenuringThreshold=1 > -XX:+ManagementServer > -XX:MaxHeapSize=34359738368 > -XX:MaxNewSize=8589934592 > -XX:MaxTenuringThreshold=1 > -XX:MinHeapDeltaBytes=196608 > -XX:NewSize=8589934592 > -XX:NumberOfGCLogFiles=10 > -XX:OldPLABSize=16 > -XX:OldSize=25769803776 > -XX:OnOutOfMemoryError=kill -9 %p > -XX:ParallelGCThreads=50 > -XX:+PerfDisableSharedMem > -XX:+PrintGC > -XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > -XX:+ResizeTLAB > -XX:StringTableSize=103 > -XX:SurvivorRatio=8 > -XX:ThreadPriorityPolicy=42 > -XX:ThreadStackSize=256 > -XX:-UseBiasedLocking > -XX:+UseCMSInitiatingOccupancyOnly > -XX:+UseConcMarkSweepGC > -XX:+UseCondCardMark > -XX:+UseFastUnorderedTimeStamps > -XX:+UseGCLogFileRotation > -XX:+UseNUMA > -XX:+UseNUMAInterleaving > -XX:+UseParNewGC > -XX:+UseTLAB > -XX:+UseThreadPriorities > > --Thanks > > - > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org > For additional commands, e-mail: user-h...@cassandra.apache.org > >
cassandra 3.11.4 deadlock
We are running apache-cassandra-3.11.4, 10 node cluster with -Xms32G -Xmx32G -Xmn8G using CMS. after running couple of days one of the node become unresponsive and threaddump (jstack -F) shows deadlock. Found one Java-level deadlock: = "Native-Transport-Requests-144": waiting to lock Monitor@0x7cd5142e4d08 (Object@0x7f6e00348268, a java/io/ExpiringCache), which is held by "CompactionExecutor:115134" "CompactionExecutor:115134": waiting to lock Monitor@0x7f6bcaf130f8 (Object@0x7f6dff31faa0, a ch/qos/logback/core/joran/spi/ConfigurationWatchList), which is held by "Native-Transport-Requests-144" Found a total of 1 deadlock. I have seen this couple of time now with different nodes with following in system.log IndexSummaryRedistribution.java:77 - Redistributing index summaries NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576 also lookin in gc log there has not been a ParNew collection for last 10hrs, only CMS collections. 1739842.375: [GC (CMS Final Remark) [YG occupancy: 2712269 K (7549760 K)] 1739842.375: [Rescan (parallel) , 0.0614157 secs] 1739842.437: [weak refs processing, 0.994 secs] 1739842.437: [class unloading, 0.0231076 secs] 1739842.460: [scrub symbol table, 0.0061049 secs] 1739842.466: [scrub string table, 0.0043847 secs][1 CMS-remark: 17696837K(25165824K)] 20409107K(32715584K), 0.0953750 secs] [Times: user=2.95 sys=0.00, real=0.09 secs] 1739842.471: [CMS-concurrent-sweep-start] 1739848.572: [CMS-concurrent-sweep: 6.101/6.101 secs] [Times: user=6.13 sys=0.00, real=6.10 secs] 1739848.573: [CMS-concurrent-reset-start] 1739848.645: [CMS-concurrent-reset: 0.072/0.072 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] 1739858.653: [GC (CMS Initial Mark) [1 CMS-initial-mark: 17696837K(25165824K)] 20409111K(32715584K), 0.0584838 secs] [Times: user=2.68 sys=0.00, real=0.06 secs] 1739858.713: [CMS-concurrent-mark-start] 1739860.496: [CMS-concurrent-mark: 1.784/1.784 secs] [Times: user=84.77 sys=0.00, real=1.79 secs] 1739860.497: [CMS-concurrent-preclean-start] 1739860.566: [CMS-concurrent-preclean: 0.070/0.070 secs] [Times: user=0.07 sys=0.00, real=0.07 secs] 1739860.567: [CMS-concurrent-abortable-preclean-start] CMS: abort preclean due to time 1739866.333: [CMS-concurrent-abortable-preclean: 5.766/5.766 secs] [Times: user=5.80 sys=0.00, real=5.76 secs] Java HotSpot(TM) 64-Bit Server VM (25.162-b12) for linux-amd64 JRE (1.8.0_162-b12) Memory: 4k page, physical 792290076k(2780032k free), swap 16777212k(16693756k free) CommandLine flags: -XX:+AlwaysPreTouch -XX:CICompilerCount=15 -XX:+CMSClassUnloadingEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSInitiatingOccupancyFraction=40 -XX:+CMSParallelInitialMarkEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSWaitDuration=1 -XX:ConcGCThreads=50 -XX:+CrashOnOutOfMemoryError -XX:GCLogFileSize=10485760 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=34359738368 -XX:InitialTenuringThreshold=1 -XX:+ManagementServer -XX:MaxHeapSize=34359738368 -XX:MaxNewSize=8589934592 -XX:MaxTenuringThreshold=1 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=8589934592 -XX:NumberOfGCLogFiles=10 -XX:OldPLABSize=16 -XX:OldSize=25769803776 -XX:OnOutOfMemoryError=kill -9 %p -XX:ParallelGCThreads=50 -XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+ResizeTLAB -XX:StringTableSize=103 -XX:SurvivorRatio=8 -XX:ThreadPriorityPolicy=42 -XX:ThreadStackSize=256 -XX:-UseBiasedLocking -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseConcMarkSweepGC -XX:+UseCondCardMark -XX:+UseFastUnorderedTimeStamps -XX:+UseGCLogFileRotation -XX:+UseNUMA -XX:+UseNUMAInterleaving -XX:+UseParNewGC -XX:+UseTLAB -XX:+UseThreadPriorities --Thanks - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org