[ https://issues.apache.org/jira/browse/OAK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750926#comment-16750926 ]

Csaba Varga commented on OAK-6749:
----------------------------------

{quote}Csaba Varga, for the sake of completeness, can you describe how you 
worked around the problem?
{quote}
Sure!

I used the "console" mode of oak-run to run a custom Groovy script against the 
Segment-Tar version of the repository, i.e. directly after completing the 
sidegrade from Mongo. The script used the Oak API directly to recreate the 
affected blobs (blobs whose ID makes InMemoryDataRecord.isInstance() return 
true). Blobs created by the Segment-Tar implementation are never 
InMemoryDataRecord instances (at least as of 1.6), so this procedure let me 
sidestep the sync issue for those blobs entirely.

The aggressive de-duplicating behavior of Oak, which is normally a good thing, 
caused me some headaches while writing the script. I ended up creating new 
properties on the affected nodes with a predictable name (a fixed string 
prepended to the original name) and deleting the affected properties to make 
sure the new blobs are referenced and the old "in-memory" blob IDs are left 
unreferenced. Then, as a second phase, I re-created the original properties 
based on the prefixed ones and deleted the prefixed ones. The end result was a 
Segment-Tar repository with the exact same semantics as the original, but with 
no "in-memory" blob IDs referenced by the head revision.
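
The two-phase rewrite can be sketched in plain Java. This is only a simulation 
of the logic over a map standing in for node properties, not the actual Oak API 
calls; the "0x" ID convention, the TEMP_PREFIX value, and the recreateBlob() 
helper are illustrative assumptions, with InMemoryDataRecord.isInstance() and a 
real blob re-upload taking their place in the real script:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPhaseRewrite {
    // Illustrative prefix; the actual script used its own fixed string.
    static final String TEMP_PREFIX = "oak6749-tmp:";

    // Stand-in for InMemoryDataRecord.isInstance(id); the "0x" convention is
    // an assumption for this sketch, not the real Oak ID format.
    static boolean isInMemoryId(String blobId) {
        return blobId.startsWith("0x");
    }

    // Stand-in for re-uploading the blob's bytes so the blob store assigns a
    // fresh, regular (non-"in-memory") ID.
    static String recreateBlob(String blobId) {
        return "regular:" + Integer.toHexString(blobId.hashCode());
    }

    static Map<String, String> rewrite(Map<String, String> props) {
        // Phase 1: reference the recreated blob under a prefixed property name,
        // then delete the original property, leaving the old ID unreferenced.
        for (Map.Entry<String, String> e : new LinkedHashMap<>(props).entrySet()) {
            if (isInMemoryId(e.getValue())) {
                props.put(TEMP_PREFIX + e.getKey(), recreateBlob(e.getValue()));
                props.remove(e.getKey());
            }
        }
        // Phase 2: restore the original property names from the prefixed copies.
        for (Map.Entry<String, String> e : new LinkedHashMap<>(props).entrySet()) {
            if (e.getKey().startsWith(TEMP_PREFIX)) {
                props.put(e.getKey().substring(TEMP_PREFIX.length()), e.getValue());
                props.remove(e.getKey());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("jcr:data", "0xcafebabe");     // affected: "in-memory" ID
        props.put("jcr:mimeType", "text/plain"); // unaffected
        System.out.println(rewrite(props));
    }
}
```

The point of the detour through the prefixed names is that the new blob is 
referenced before the old property is deleted, so de-duplication never gets a 
chance to collapse the rewrite back onto the old "in-memory" ID.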

I can share the Groovy script itself if you're interested in the gory details.

> Segment-Tar standby sync fails with "in-memory" blobs present in the source 
> repo
> --------------------------------------------------------------------------------
>
>                 Key: OAK-6749
>                 URL: https://issues.apache.org/jira/browse/OAK-6749
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob, tarmk-standby
>    Affects Versions: 1.6.2
>            Reporter: Csaba Varga
>            Assignee: Francesco Mari
>            Priority: Major
>             Fix For: 1.10.1, 1.8.12, 1.10
>
>         Attachments: OAK-6749-01.patch, OAK-6749-02.patch
>
>
> We ran into an issue when trying to transition from an active/active Mongo 
> NodeStore cluster to a single Segment-Tar server with a cold standby. The 
> issue manifests when the standby server tries to pull changes from the 
> primary after the first round of online revision GC.
> Let me summarize the way we ended up with the current state, and my 
> hypothesis about what happened, based on my debugging so far:
> # We started with a Mongo NodeStore and an external FileDataStore as the blob 
> store. The FileDataStore was set up with minRecordLength=4096. The Mongo 
> store stores blobs below minRecordLength as special "in-memory" blobIDs where 
> the data itself is baked into the ID string in hex.
> # We have executed a sidegrade of the Mongo store into a Segment-Tar store. 
> Our datastore is over 1TB in size, so copying the binaries wasn't an option. 
> The new repository is simply reusing the existing datastore. The "in-memory" 
> blobIDs still look like external blobIDs to the sidegrade process, so they 
> were copied into the Segment-Tar repository as-is, instead of being converted 
> into the efficient in-line format.
> # The server started up without issues on the new Segment-Tar store. The 
> migrated "in-memory" blob IDs seemed to work fine, if a bit sub-optimally.
> # At this point, we have created a cold standby instance by copying the files 
> of the stopped primary instance and making the necessary config changes on 
> both servers.
> # Everything worked fine until the primary server started its first round of 
> online revision GC. After that process completed, the standby node started 
> throwing exceptions about missing segments, and eventually stopped 
> altogether. In the meantime, the following warning showed up in the primary 
> log:
> {code:java}
> 29.09.2017 06:12:08.088 *WARN* [nioEventLoopGroup-3-10] org.apache.jackrabbit.oak.segment.standby.server.ExceptionHandler Exception caught on the server
> io.netty.handler.codec.TooLongFrameException: frame length (8208) exceeds the allowed maximum (8192)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:146)
>         at io.netty.handler.codec.LineBasedFrameDecoder.fail(LineBasedFrameDecoder.java:142)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:99)
>         at io.netty.handler.codec.LineBasedFrameDecoder.decode(LineBasedFrameDecoder.java:75)
>         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:345)
>         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:352)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:611)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:552)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:466)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
>         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> This is what seems to be happening:
> # The revision GC creates brand new segments, and the standby instance starts 
> pulling them into its own store.
> # When the standby sees an "in-memory" blobID, it decides that it doesn't 
> have this blob in its own blobstore, so it proceeds to ask for the bytes of 
> the blob from the primary, even though they are encoded in the ID itself.
> # The longest blobID can be more than 8K in size (the 4K blob gets doubled by 
> hex encoding). When such a long blobID is submitted to the primary, the 
> request gets rejected because of excessive length. The secondary keeps 
> waiting until the request times out, and no progress is made in syncing.
> The issue doesn't pop up with repositories that started as Segment-Tar since 
> Segment-Tar always inlines blobs below some hardcoded threshold (16K if I 
> remember correctly).
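> The length arithmetic is easy to check: hex encoding doubles the byte count, 
> so a blob just under the 4096-byte minRecordLength already yields an ID of 
> nearly 8192 characters before any request framing, consistent with the 
> 8208-byte frame in the log once the protocol's per-request overhead is added 
> (the exact overhead here is not specified in the log, so this sketch only 
> checks the payload side):
> 
> ```java
> public class BlobIdLength {
>     // Hex encoding turns each byte into two characters.
>     static int hexIdChars(int blobBytes) {
>         return blobBytes * 2;
>     }
> 
>     public static void main(String[] args) {
>         int minRecordLength = 4096; // FileDataStore setting from this report
>         int frameLimit = 8192;      // maximum frame length reported in the log
>         int idChars = hexIdChars(minRecordLength - 1); // largest "in-memory" blob
>         System.out.println("ID length: " + idChars + " chars, frame limit: " + frameLimit);
>         // The bare hex payload alone nearly fills the frame budget, so the
>         // request line (ID plus any command prefix/terminator) cannot fit.
>         System.out.println("overflows with >= " + (frameLimit - idChars + 1) + " bytes of framing");
>     }
> }
> ```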
> I think there could be multiple ways to approach this, not mutually exclusive:
> * Special-case the "in-memory" BlobIDs during sidegrade and replace them with 
> the "native" segment values. If hardcoding knowledge about this 
> implementation detail isn't desired, there could be a new option for the 
> sidegrade process, to force "inlining" of blobs below a certain threshold, 
> even if they aren't in-line in the source repo.
> * Special-case the "in-memory" BlobIDs in StandbyDiff so they aren't 
> requested from the primary, but are either kept as-is or get converted to the 
> "native" format.
> * Increase the network packet size limit in the sync protocol, or make it 
> configurable. This is the least efficient option, but also the one with the 
> least impact on the code.
> I can work on detailed reproduction steps if needed, but I'd rather not do 
> that up front, because the issue is rather cumbersome to reproduce.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
