Not sure what is going on there either. Roland - can you open an issue with the information above: https://issues.apache.org/jira/browse/CASSANDRA
On Thu, Apr 13, 2017 at 7:49 PM, benjamin roth <brs...@gmail.com> wrote:

> What I can tell you from that trace - given that this is the correct
> thread and it really hangs there:
>
> The validation is stuck when reading from an SSTable.
> Unfortunately I am no caffeine expert. It looks like the read is cached
> and after the read caffeine tries to drain its read buffer and this is
> stuck. I don't see the reason from that stack trace.
> Someone would have to dig deeper into caffeine to find the root cause.
>
> 2017-04-13 9:27 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:
>
>> i had a closer look at the validation executor thread (i hope that's what
>> you meant)
>>
>> it seems the thread is always repeating stuff in
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
>>
>> here is the full stack trace ...
>>
>> i am sorry .. but i have no clue what's happening there ..
>>
>> com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$64/2098345091.accept(Unknown Source)
>> com.github.benmanes.caffeine.cache.BoundedBuffer$RingBuffer.drainTo(BoundedBuffer.java:104)
>> com.github.benmanes.caffeine.cache.StripedBuffer.drainTo(StripedBuffer.java:160)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.drainReadBuffer(BoundedLocalCache.java:964)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:918)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:903)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:2680)
>> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:875)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:748)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1783)
>> com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97)
>> com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66)
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:213)
>> org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65)
>> org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59)
>> org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88)
>> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
>> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
>> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
>> org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:420)
>> org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:610)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:575)
>> org.apache.cassandra.db.rows.UnfilteredSerializer$$Lambda$84/898489541.accept(Unknown Source)
>> org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1222)
>> org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1177)
>> org.apache.cassandra.db.Columns.apply(Columns.java:377)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:571)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:440)
>> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:95)
>> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:73)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:122)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:178)
>> org.apache.cassandra.repair.Validator.rowHash(Validator.java:221)
>> org.apache.cassandra.repair.Validator.add(Validator.java:160)
>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1364)
>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:85)
>> org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:933)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/1371495133.run(Unknown Source)
>> java.lang.Thread.run(Thread.java:745)
>>
>> On Thu, 2017-04-13 at 08:47 +0200, benjamin roth wrote:
>>
>> You should connect to the node with JConsole and see where the compaction
>> thread is stuck
>>
>> 2017-04-13 8:34 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:
>>
>> hi,
>>
>> we have the following issue on our 3.10 development cluster.
>>
>> we are doing regular repairs with thelastpickle's fork of creaper.
>> sometimes the repair (it is a full repair in that case) hangs because
>> of a stuck validation compaction
>>
>> nodetool compactionstats gives me
>> a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event 805955242 841258085 bytes 95.80%
>> there has been no more progress for hours
>>
>> nodetool tpstats shows
>> ValidationExecutor 1 1 16186 0 0
>>
>> i checked the logs on the affected node and could not find any
>> suspicious errors.
>>
>> has anyone already had this issue and knows how to cope with it?
>>
>> a restart of the node helps to finish the repair ... but i am not sure
>> whether that somehow breaks the full repair
>>
>> bg,
>> roland
>>
>

-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com
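
For anyone hitting the same hang: the JConsole step suggested in the thread (find where the validation thread is stuck) can also be scripted, which makes it easier to take several dumps in a row and attach them to the JIRA issue. What follows is only a minimal sketch under stated assumptions, not anything from the Cassandra code base. `StuckThreadDump` and `dump` are hypothetical names, and the "ValidationExecutor" name fragment is assumed from the pool name shown by nodetool tpstats. It uses the standard `java.lang.management.ThreadMXBean` interface.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra or Caffeine.
final class StuckThreadDump {

    /**
     * Returns one formatted stack trace per live thread whose name
     * contains nameFragment, as reported by the given ThreadMXBean.
     */
    static List<String> dump(ThreadMXBean mx, String nameFragment) {
        List<String> out = new ArrayList<>();
        // false, false: skip lock/monitor details, only names, states and stacks.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" [").append(info.getThreadState()).append("]\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    ").append(frame).append('\n');
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: the thread name contains the tpstats pool name.
        String fragment = args.length > 0 ? args[0] : "ValidationExecutor";
        // This inspects the JVM the program itself runs in.
        ThreadMXBean local = ManagementFactory.getThreadMXBean();
        dump(local, fragment).forEach(System.out::println);
    }
}
```

As written, `main` only looks at its own JVM. To inspect a Cassandra node the same `dump` method would need a `ThreadMXBean` proxy obtained over a remote JMX connection (for example via `ManagementFactory.newPlatformMXBeanProxy`), which is what JConsole does under the hood. Comparing two or three dumps taken a few seconds apart should show whether the thread is truly spinning in the same frames or still making slow progress.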