[
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986971#comment-14986971
]
Tomek Rękawek edited comment on OAK-3559 at 12/2/15 1:27 PM:
-------------------------------------------------------------
h4. New bulk update method
The patch adds a new {{createOrUpdate(Collection<T> collection, List<UpdateOp>
updateOps)}} method to the {{DocumentStore}} interface. The MongoDB
implementation uses the Bulk API. The RDB and Memory document stores have been
extended with a naive implementation that iterates over {{updateOps}}. The Mongo
implementation works as follows:
1. For each {{UpdateOp}}, try to read the corresponding document from the cache.
Add the documents found to {{oldDocs}}.
2. Prepare a list of all {{UpdateOps}} that don't have their documents yet and
read those documents in one {{find()}} call. Add the results to {{oldDocs}}.
3. Prepare a bulk update. For each remaining {{UpdateOp}}, add the following
operation:
* Find the document with the same id and the same {{mod_count}} as in
{{oldDocs}}.
* Apply the changes from the {{UpdateOp}}.
4. Execute the bulk update.
If another process modifies the target documents between points 2 and 3, their
{{mod_count}} will be increased as well, so the bulk update will fail for the
concurrently modified docs. The method will then remove the failed documents
from {{oldDocs}} and restart the process from point 2. It will stop after the
3rd iteration.
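The steps above can be sketched in plain Java. This is only an illustration of the optimistic {{mod_count}} check and the retry loop, not the code from the patch: {{Doc}}, {{Op}}, the in-memory {{store}} and {{cache}} maps and the {{MAX_ATTEMPTS}} constant are hypothetical stand-ins, and the per-document loop stands in for a single bulk call to MongoDB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BulkUpdateSketch {
    /** Hypothetical stand-in for a stored document: a value plus a modification counter. */
    static final class Doc {
        final String value;
        final long modCount;
        Doc(String value, long modCount) { this.value = value; this.modCount = modCount; }
    }

    /** Hypothetical stand-in for UpdateOp: sets a new value on the document with the given id. */
    static final class Op {
        final String id;
        final String newValue;
        Op(String id, String newValue) { this.id = id; this.newValue = newValue; }
    }

    static final int MAX_ATTEMPTS = 3;

    // In-memory stand-ins for the backend collection and the document cache.
    final Map<String, Doc> store = new HashMap<>();
    final Map<String, Doc> cache = new HashMap<>();

    /** Returns the ops that could not be applied within MAX_ATTEMPTS iterations. */
    List<Op> createOrUpdate(List<Op> ops) {
        Map<String, Doc> oldDocs = new HashMap<>();
        // Step 1: take whatever the cache already knows.
        for (Op op : ops) {
            Doc cached = cache.get(op.id);
            if (cached != null) {
                oldDocs.put(op.id, cached);
            }
        }
        List<Op> remaining = new ArrayList<>(ops);
        for (int attempt = 0; attempt < MAX_ATTEMPTS && !remaining.isEmpty(); attempt++) {
            // Step 2: read the documents that are still missing (one find() in a real store).
            for (Op op : remaining) {
                Doc stored = store.get(op.id);
                if (!oldDocs.containsKey(op.id) && stored != null) {
                    oldDocs.put(op.id, stored);
                }
            }
            // Steps 3 and 4: apply each op only if the stored mod count still matches.
            List<Op> failed = new ArrayList<>();
            for (Op op : remaining) {
                Doc expected = oldDocs.get(op.id);
                Doc current = store.get(op.id);
                boolean matches = (expected == null)
                        ? current == null
                        : current != null && current.modCount == expected.modCount;
                if (matches) {
                    long nextModCount = (expected == null) ? 1 : expected.modCount + 1;
                    Doc updated = new Doc(op.newValue, nextModCount);
                    store.put(op.id, updated);
                    cache.put(op.id, updated);
                } else {
                    // Concurrent change detected: forget the stale copy and retry from step 2.
                    oldDocs.remove(op.id);
                    failed.add(op);
                }
            }
            remaining = failed;
        }
        return remaining;
    }
}
```

In this sketch a stale cache entry behaves like a concurrent modification: the first attempt fails for that document, the stale copy is dropped, and the next iteration re-reads it from the store.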
h4. Changes in the Commit class
The new method is now used in {{Commit#applyToDocumentStore}}. If it
fails (e.g. there have been more than 3 unsuccessful retries in the Mongo
implementation), it falls back to the classic approach of applying one
update after another.
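A minimal sketch of such a fallback, under the assumption that the bulk call reports which operations it could not apply and that only those are retried one by one. {{FallbackSketch}}, {{applyAll}} and both callbacks are hypothetical names and do not come from the patch.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class FallbackSketch {
    /**
     * Tries the bulk path first. The bulk function is assumed to return the
     * operations it could not apply; those are then applied sequentially.
     */
    static <T> void applyAll(List<T> ops,
                             Function<List<T>, List<T>> bulk,
                             Consumer<T> single) {
        List<T> unapplied = bulk.apply(ops);
        for (T op : unapplied) {
            // Classic approach: one update after another.
            single.accept(op);
        }
    }
}
```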
h4. Changes in the CommitQueue and ConflictException
Introducing bulk updates means that we may have conflicts in many revisions at
the same time. That's why the {{ConflictException}} now contains a list of
revisions rather than a single revision. In order to resolve
conflicts in the {{DocumentNodeStoreBranch#merge0}} method, the
{{CommitQueue#suspendUntil()}} has been extended as well. It now accepts
a list of revisions and suspends execution until all of them are visible.
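Waiting for a whole set of revisions can be illustrated with a plain Java monitor. This is a sketch of the idea only, not the {{CommitQueue}} implementation: {{VisibilityGate}}, {{markVisible}} and {{suspendUntilAll}} are hypothetical names, revisions are modelled as strings, and the timeout is an addition made to keep the example safe to run.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class VisibilityGate {
    private final Set<String> visible = new HashSet<>();

    /** Marks a revision as visible and wakes up all waiting threads. */
    synchronized void markVisible(String revision) {
        visible.add(revision);
        notifyAll();
    }

    /**
     * Blocks until every given revision is visible.
     * Returns false if the timeout expires or the thread is interrupted.
     */
    synchronized boolean suspendUntilAll(Collection<String> revisions, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!visible.containsAll(revisions)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The caller resumes only once the last of the listed revisions has become visible; seeing just some of them is not enough.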
was (Author: tomek.rekawek):
The pull request has been created here:
https://github.com/apache/jackrabbit-oak/pull/43
The patch can be downloaded from:
https://patch-diff.githubusercontent.com/raw/apache/jackrabbit-oak/pull/43.diff
> Bulk document updates in MongoDocumentStore
> -------------------------------------------
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: mongomk
> Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3559.patch
>
>
> Using the MongoDB [Bulk
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk] implement
> the [batch version of createOrUpdate method|OAK-3662].