[ https://issues.apache.org/jira/browse/OAK-7867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16667364#comment-16667364 ]

Michael Dürig commented on OAK-7867:
------------------------------------

Here's my patch *Reduce lock granularity*: 
[https://github.com/mduerig/jackrabbit-oak/commits/OAK-7867]. To make it more 
readable I split it into two commits, which I would eventually squash:
 * 
[https://github.com/mduerig/jackrabbit-oak/commit/19723e8421b63a78977dead6e5ee8446647751a0]:
 push the write operation ({{WriteOperationHandler#execute}}) into the 
{{SegmentWriter}}. This has the effect that {{SegmentBufferWriter}} instances 
are only acquired for the shortest time necessary.
 * 
[https://github.com/mduerig/jackrabbit-oak/commit/f0cbca3da7a47ba929587c0273afef7cf8afe684]:
 improve how the {{GCGeneration}} associated with write operations is handled. 
This fixes a problem that the previous commit introduces: a 
{{SegmentWriter.write}} call consisting of multiple records could end up in 
segments of different generations should a compaction finish in the middle of 
the write operation. To fix this, {{WriteOperationHandler#execute}} needs to 
include the desired {{GCGeneration}}. IMO this change makes it more explicit 
what is going on with respect to the GC generation, while at the same time 
allowing us to remove some special casing in 
{{SegmentBufferWriterPool#borrowWriter}}. See the sketch after this list.
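
To illustrate the combined effect of the two commits, here is a minimal 
sketch. The types below are simplified stand-ins, not the actual Oak 
signatures: {{WriteOperationHandler#execute}} takes the desired 
{{GCGeneration}} explicitly, and the pooled {{SegmentBufferWriter}} is held 
only for the duration of a single operation.

{code:java}
import java.io.IOException;

// Simplified stand-ins for the Oak classes discussed above; the real Oak
// signatures differ, this only illustrates the shape of the change.
final class GCGeneration {
    final int generation;
    GCGeneration(int generation) { this.generation = generation; }
}

final class RecordId { }

final class SegmentBufferWriter {
    final GCGeneration generation;
    SegmentBufferWriter(GCGeneration generation) { this.generation = generation; }
}

interface WriteOperation {
    RecordId execute(SegmentBufferWriter writer) throws IOException;
}

interface WriteOperationHandler {
    // The desired GC generation is an explicit argument, so a compaction
    // finishing mid-call can no longer cause one logical write to span
    // segments of different generations (second commit).
    RecordId execute(GCGeneration generation, WriteOperation operation)
            throws IOException;
}

class SegmentBufferWriterPool implements WriteOperationHandler {
    @Override
    public RecordId execute(GCGeneration generation, WriteOperation operation)
            throws IOException {
        // Borrow a writer pinned to the requested generation and hold it
        // only for this single operation (first commit): it is returned in
        // the finally block instead of staying checked out by the caller.
        SegmentBufferWriter writer = borrowWriter(generation);
        try {
            return operation.execute(writer);
        } finally {
            returnWriter(writer);
        }
    }

    private SegmentBufferWriter borrowWriter(GCGeneration generation) {
        return new SegmentBufferWriter(generation);  // pooling elided
    }

    private void returnWriter(SegmentBufferWriter writer) {
        // pooling elided; a real pool would make the writer available again
    }
}
{code}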

[~frm], could you have a look? 

 

> Flush thread gets stuck when input stream of binaries blocks
> ------------------------------------------------------------
>
>                 Key: OAK-7867
>                 URL: https://issues.apache.org/jira/browse/OAK-7867
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Critical
>              Labels: candidate_oak_1_6, candidate_oak_1_8
>             Fix For: 1.10
>
>
> This issue tackles the root cause of the severe data loss that has been 
> reported in OAK-7852:
> When the input stream of a binary value blocks indefinitely on read, the 
> flush thread of the segment store gets blocked:
> {noformat}
> "pool-2-thread-1" #15 prio=5 os_prio=31 tid=0x00007fb0f21e3000 nid=0x5f03 
> waiting on condition [0x000070000a46d000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000000076bba62b0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at com.google.common.util.concurrent.Monitor.await(Monitor.java:963)
> at com.google.common.util.concurrent.Monitor.enterWhen(Monitor.java:402)
> at 
> org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.safeEnterWhen(SegmentBufferWriterPool.java:179)
> at 
> org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.flush(SegmentBufferWriterPool.java:138)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.flush(DefaultSegmentWriter.java:138)
> at 
> org.apache.jackrabbit.oak.segment.file.FileStore.lambda$doFlush$8(FileStore.java:307)
> at 
> org.apache.jackrabbit.oak.segment.file.FileStore$$Lambda$22/1345968304.flush(Unknown Source)
> at 
> org.apache.jackrabbit.oak.segment.file.TarRevisions.doFlush(TarRevisions.java:237)
> at 
> org.apache.jackrabbit.oak.segment.file.TarRevisions.flush(TarRevisions.java:195)
> at 
> org.apache.jackrabbit.oak.segment.file.FileStore.doFlush(FileStore.java:306)
> at org.apache.jackrabbit.oak.segment.file.FileStore.flush(FileStore.java:318)
> {noformat}
> The flush thread is waiting on the condition {{0x000000076bba62b0}} for the 
> following thread to return its {{SegmentBufferWriter}}, which will never 
> happen if {{InputStream.read(...)}} does not progress (see the sketch 
> following this trace).
> {noformat}
> "pool-1-thread-1" #14 prio=5 os_prio=31 tid=0x00007fb0f223a800 nid=0x5d03 
> runnable [0x000070000a369000
> ] java.lang.Thread.State: RUNNABLE
> at com.google.common.io.ByteStreams.read(ByteStreams.java:833)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.internalWriteStream(DefaultSegmentWriter.java:641)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeStream(DefaultSegmentWriter.java:618)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeBlob(DefaultSegmentWriter.java:577)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:691)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:677)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNodeUncached(DefaultSegmentWriter.java:900)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNode(DefaultSegmentWriter.java:799)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.access$800(DefaultSegmentWriter.java:252)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$8.execute(DefaultSegmentWriter.java:240)
> at 
> org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.execute(SegmentBufferWriterPool.java:105)
> at 
> org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.writeNode(DefaultSegmentWriter.java:235)
> at 
> org.apache.jackrabbit.oak.segment.SegmentWriter.writeNode(SegmentWriter.java:79)
> {noformat}
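> For illustration, a stream along the following lines (a hypothetical repro 
> sketch, not code from the report) is enough to trigger the hang: the write 
> thread parks in {{read(...)}} while holding its pooled 
> {{SegmentBufferWriter}}, and the flush thread waits on it forever.
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.util.concurrent.CountDownLatch;
> 
> // A binary stream whose read() never returns, modelling the misbehaving
> // input stream described above. Writing a blob backed by such a stream
> // keeps the pooled SegmentBufferWriter checked out indefinitely.
> class BlockingInputStream extends InputStream {
>     private final CountDownLatch never = new CountDownLatch(1);
> 
>     @Override
>     public int read() throws IOException {
>         try {
>             never.await(); // blocks forever
>         } catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             throw new IOException(e);
>         }
>         return -1; // unreachable
>     }
> }
> {code}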
>  
> This issue is critical because such a misbehaving input stream causes the 
> flush thread to get stuck, preventing transient segments from being flushed 
> and thus causing data loss.
>  


