Not sure what is going on there either. Roland - can you open an issue with
the information above:
https://issues.apache.org/jira/browse/CASSANDRA
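
If it helps when collecting details for the ticket, the stack of the stuck thread can also be captured programmatically instead of copying it out of JConsole. The following is only a minimal sketch: the class name `StuckThreadDump` is made up for illustration, the name fragment `ValidationExecutor` is taken from the tpstats output quoted below, and the code inspects the JVM it runs in. Pointing it at a running Cassandra node would need a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra: prints the stack of every
// live thread whose name contains a given fragment.
public class StuckThreadDump {

    /** Returns one formatted stack trace per live thread whose name contains nameFragment. */
    static List<String> dump(ThreadMXBean threads, String nameFragment) {
        List<String> out = new ArrayList<>();
        // false, false: skip lock and monitor details; names, states and frames suffice here
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" [").append(info.getThreadState()).append("]\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    ").append(frame).append('\n');
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: the thread of interest has "ValidationExecutor" in its name
        String fragment = args.length > 0 ? args[0] : "ValidationExecutor";
        dump(ManagementFactory.getThreadMXBean(), fragment).forEach(System.out::print);
    }
}
```

Taking two or three dumps a few seconds apart and comparing the top frames would show whether the thread is really looping in the same place or just slow.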

On Thu, Apr 13, 2017 at 7:49 PM, benjamin roth <brs...@gmail.com> wrote:

> What I can tell you from that trace - given that this is the correct
> thread and it really hangs there:
>
> The validation is stuck when reading from an SSTable.
> Unfortunately I am no Caffeine expert. It looks like the read is cached
> and after the read Caffeine tries to drain the cache's read buffer and
> this is stuck. I don't see the reason from that stack trace.
> Someone would have to dig deeper into Caffeine to find the root cause.
>
> 2017-04-13 9:27 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:
>
>> i had a closer look at the validation executor thread (i hope that's what
>> you meant)
>>
>> it seems the thread is always repeating stuff in
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
>>
>> here is the full stack trace ...
>>
>> i am sorry .. but i have no clue what's happening there ..
>>
>> com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$64/2098345091.accept(Unknown Source)
>> com.github.benmanes.caffeine.cache.BoundedBuffer$RingBuffer.drainTo(BoundedBuffer.java:104)
>> com.github.benmanes.caffeine.cache.StripedBuffer.drainTo(StripedBuffer.java:160)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.drainReadBuffer(BoundedLocalCache.java:964)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:918)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(BoundedLocalCache.java:903)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run(BoundedLocalCache.java:2680)
>> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.scheduleDrainBuffers(BoundedLocalCache.java:875)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.afterRead(BoundedLocalCache.java:748)
>> com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1783)
>> com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97)
>> com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66)
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:235)
>> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:213)
>> org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65)
>> org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59)
>> org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:88)
>> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:66)
>> org.apache.cassandra.io.util.RebufferingInputStream.readFully(RebufferingInputStream.java:60)
>> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:402)
>> org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:420)
>> org.apache.cassandra.db.rows.Cell$Serializer.deserialize(Cell.java:245)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:610)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.lambda$deserializeRowBody$1(UnfilteredSerializer.java:575)
>> org.apache.cassandra.db.rows.UnfilteredSerializer$$Lambda$84/898489541.accept(Unknown Source)
>> org.apache.cassandra.utils.btree.BTree.applyForwards(BTree.java:1222)
>> org.apache.cassandra.utils.btree.BTree.apply(BTree.java:1177)
>> org.apache.cassandra.db.Columns.apply(Columns.java:377)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:571)
>> org.apache.cassandra.db.rows.UnfilteredSerializer.deserialize(UnfilteredSerializer.java:440)
>> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:95)
>> org.apache.cassandra.io.sstable.SSTableSimpleIterator$CurrentFormatIterator.computeNext(SSTableSimpleIterator.java:73)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:122)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
>> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
>> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360)
>> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>> org.apache.cassandra.db.rows.UnfilteredRowIterators.digest(UnfilteredRowIterators.java:178)
>> org.apache.cassandra.repair.Validator.rowHash(Validator.java:221)
>> org.apache.cassandra.repair.Validator.add(Validator.java:160)
>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1364)
>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:85)
>> org.apache.cassandra.db.compaction.CompactionManager$13.call(CompactionManager.java:933)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/1371495133.run(Unknown Source)
>> java.lang.Thread.run(Thread.java:745)
>>
>> On Thu, 2017-04-13 at 08:47 +0200, benjamin roth wrote:
>>
>> You should connect to the node with JConsole and see where the compaction
>> thread is stuck
>>
>> 2017-04-13 8:34 GMT+02:00 Roland Otta <roland.o...@willhaben.at>:
>>
>> hi,
>>
>> we have the following issue on our 3.10 development cluster.
>>
>> we are doing regular repairs with thelastpickle's fork of creaper.
>> sometimes the repair (it is a full repair in that case) hangs because
>> of a stuck validation compaction
>>
>> nodetool compactionstats gives me
>> a1bb45c0-1fc6-11e7-81de-0fb0b3f5a345 Validation bds ad_event 805955242 841258085 bytes 95.80%
>> there has been no further progress for hours
>>
>> nodetool tpstats shows
>> ValidationExecutor                1         1          16186         0                0
>>
>> i checked the logs on the affected node and could not find any
>> suspicious errors.
>>
>> has anyone already had this issue and knows how to deal with it?
>>
>> a restart of the node helps to finish the repair ... but i am not sure
>> whether that somehow breaks the full repair
>>
>> bg,
>> roland
>>
>>
>>
>


-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com
