Re: Checkpoints and copying NodeStore instances (aka RepositorySidegrade)

Julian Sedding Wed, 05 Aug 2015 08:46:11 -0700

Hi Alex

Thanks for your comments.


On Wed, Aug 5, 2015 at 3:48 PM, Alex Parvulescu
<[email protected]> wrote:
> Hi,
>
> Just a few clarifications on the error you see
>
>> My interpretation is that the AsyncIndexUpdate is trying to retrieve
>> the previous checkpoint as stored in /:async/async. Of course this
>> checkpoint is not present in the copied NodeStore and thus cannot be
>> retrieved.
>
> The error comes from DocumentMk trying to parse the reference checkpoint
> value. Basically what fails here is 'Revision.fromString' receiving a
> malformed checkpoint value because it comes from the SegmentMk. The quick
> fix is to manually remove the properties on the "/:async" hidden node. This
> will indeed trigger a full reindex, but will help you get past this
> issue.

Agreed. In this case parsing the revision is the first thing that
fails. When copying a DocumentNodeStore to a SegmentNodeStore a similar
situation would arise, because no snapshot with the provided ID exists
in the target.
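
To illustrate the mismatch: the checkpoint in the stack trace below
(91f7e218-6cf5-4a44-a324-f094c29898e6) is UUID-shaped, which is what
Revision.fromString rejects. A minimal sketch of a pre-flight check that
a migration tool could run on the stored value before starting the
target store. The class and method names are hypothetical, and treating
"parses as a UUID" as "came from the SegmentMk" is an assumption, not
something Oak guarantees.

```java
import java.util.UUID;

public class CheckpointIdCheck {

    /**
     * Returns true if the stored checkpoint reference parses as a
     * canonical 36-character UUID, like the value in the stack trace.
     * Assumption: such a value was not written by the DocumentMk.
     */
    static boolean looksLikeUuid(String checkpoint) {
        // UUID.fromString is lenient about short components,
        // so check the canonical length first.
        if (checkpoint == null || checkpoint.length() != 36) {
            return false;
        }
        try {
            UUID.fromString(checkpoint);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

If the check returns true for the value in /:async/async, the options
discussed in this thread apply: remove the property and accept a full
reindex, or copy the checkpoint along with the content.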

>
>> IMHO it would be desirable to (optionally) copy the checkpoints as
>> well. In the case of AsyncIndexUpdate, having the checkpoint can save
>> a full re-index.
>
> This is very tricky, as the 2 representations of checkpoints between
> SegmentMk and DocumentMk are quite different. I would strongly suggest
> going for the reindex, after all you'd only migrate once, so you can
> prepare for this lengthy process.

I'm experimenting with the following approach:
* retrieve the first checkpoint and copy the NodeState tree at that
revision (available via CheckpointMBean impls)
* after copying the tree, merge and create a checkpoint (expiration
time can be calculated)
* rinse and repeat until the head revision is reached
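
The steps above can be sketched with a toy in-memory model. This is not
the Oak API: ToyStore, its fields and copyWithCheckpoints are
hypothetical stand-ins, and a real implementation would apply only the
diff between revisions rather than rewriting the whole tree. Returning
a mapping from old to new checkpoint IDs is also an assumption; the
idea is that the reference in /:async/async would have to be rewritten
to the new ID.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CheckpointCopySketch {

    /** Hypothetical stand-in for a NodeStore: path -> value plus checkpoints. */
    static final class ToyStore {
        final String prefix;
        final Map<String, String> head = new TreeMap<>();
        // Insertion order is kept so checkpoints can be replayed oldest first.
        final Map<String, Map<String, String>> checkpoints = new LinkedHashMap<>();
        private int counter = 0;

        ToyStore(String prefix) {
            this.prefix = prefix;
        }

        /** Snapshots the current head and returns a store-specific ID. */
        String checkpoint() {
            String id = prefix + "-" + (++counter);
            checkpoints.put(id, new TreeMap<>(head));
            return id;
        }
    }

    /**
     * Copies each source checkpoint in order, creating a matching
     * checkpoint in the target, then copies the head revision.
     * Returns the mapping from source to target checkpoint IDs.
     */
    static Map<String, String> copyWithCheckpoints(ToyStore source, ToyStore target) {
        Map<String, String> idMapping = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : source.checkpoints.entrySet()) {
            // Bring the target head to the state at this checkpoint ("merge").
            target.head.clear();
            target.head.putAll(e.getValue());
            // Create the equivalent checkpoint in the target store.
            idMapping.put(e.getKey(), target.checkpoint());
        }
        // Finally bring the target up to the source's head revision.
        target.head.clear();
        target.head.putAll(source.head);
        return idMapping;
    }
}
```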

My aim is to shorten the critical path for migrating one NodeStore
(including JR2) to another. Indexing (especially async indexing) is a
big part of the time, so if I can move that out of the critical path,
it can save a lot of downtime.

My current approach for a migration from JR2 to MongoMK is to:
* copy JR2 to TarMK (TarMK is a lot faster for creating indexes etc.
than MongoMK)
* repeat the JR2 to TarMK copy every week or every 24h using
incremental copy. This saves on CommitHook execution time; in theory
it can reduce the time for one run to a single full repository
traversal.
* finally, on the day when the systems should be switched over, run a
last JR2 to TarMK copy and then a TarMK to MongoMK copy. This is the
critical path.

Due to the above, copying at least the checkpoint of the async index
will likely speed up the critical path. Of course measuring execution
times will provide the definitive answer to this question.

Regards
Julian

>
> best,
> alex
>
>
> On Wed, Aug 5, 2015 at 3:35 PM, Julian Sedding <[email protected]> wrote:
>
>> Hi all
>>
>> I am working on a scenario where I need to copy a SegmentNodeStore
>> (TarMK) to a DocumentNodeStore (MongoDB).
>>
>> It is pretty straightforward to simply copy the NodeStore via the
>> API. No problems here.
>>
>> In a recent experiment I successfully copied the NodeStore and got an
>> exception in the logs (stacktrace below the email).
>>
>> My interpretation is that the AsyncIndexUpdate is trying to retrieve
>> the previous checkpoint as stored in /:async/async. Of course this
>> checkpoint is not present in the copied NodeStore and thus cannot be
>> retrieved.
>>
>> IMHO it would be desirable to (optionally) copy the checkpoints as
>> well. In the case of AsyncIndexUpdate, having the checkpoint can save
>> a full re-index.
>>
>> The question that remains is how the internal state of
>> AsyncIndexUpdate should be modified:
>> * implementing the logic in oak-upgrade would be pragmatic, but
>> distributes knowledge about AsyncIndexUpdate implementation details to
>> different modules
>> * having a CommitHook/Editor in oak-core that can be used in
>> oak-upgrade might be cleaner, but would only get used in oak-upgrade
>>
>> Other ideas and opinions regarding this feature are more than welcome!
>>
>> Regards
>> Julian
>>
>>
>> 05.08.2015 00:03:19.133 *ERROR* [pool-6-thread-2]
>> org.apache.sling.commons.scheduler.impl.QuartzScheduler Exception
>> during job execution of
>> org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate@471e4b4b :
>> 91f7e218-6cf5-4a44-a324-f094c29898e6
>> java.lang.IllegalArgumentException: 91f7e218-6cf5-4a44-a324-f094c29898e6
>>         at org.apache.jackrabbit.oak.plugins.document.Revision.fromString(Revision.java:236)
>>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.retrieve(DocumentNodeStore.java:1570)
>>         at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:279)
>>         at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
>>         at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>
