[
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nuno Santos resolved OAK-10778.
-------------------------------
Fix Version/s: 1.64.0
Resolution: Done
> Indexing job: support parallel download from MongoDB with two connections in
> Pipelined strategy
> -----------------------------------------------------------------------------------------------
>
> Key: OAK-10778
> URL: https://issues.apache.org/jira/browse/OAK-10778
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing
> Reporter: Nuno Santos
> Priority: Major
> Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single
> connection/thread to download from MongoDB. We can further increase the
> download speed by using an additional MongoDB connection. A typical MongoDB
> replica set has 1 primary and 2 secondaries, so in principle we could have 1
> connection to each secondary, potentially doubling the download speed.
> There are a few points to observe:
> - Connections should go to different secondaries. If both connections go to
> the same secondary, there's a high chance that they will be limited by what a
> single replica can provide and that they will overload that replica. So each
> secondary should have one and only one connection.
> - The range of documents to download must be partitioned between the two
> threads. We are already downloading from Mongo in order of
> {{(_modified, _id)}}. A simple and effective partition strategy for 2
> connections is for one to download in ascending order and the other in
> descending order, stopping where the two meet.
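
The ascending/descending partition described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that went into Oak: an in-memory list sorted by (_modified, _id) stands in for the two MongoDB cursors, and the names Frontier, download and parallel_download are hypothetical. It shows one possible way for the two threads to notice that they have met, so that every document is downloaded exactly once.

```python
import threading
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]  # (_modified, _id); unique because _id is unique


class Frontier:
    """Hypothetical helper: records how far each direction has progressed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._low: Optional[Key] = None   # highest key taken by the ascending thread
        self._high: Optional[Key] = None  # lowest key taken by the descending thread

    def claim(self, key: Key, ascending: bool) -> bool:
        """Return True if this thread may take the document, False if the
        other thread has already reached or passed it."""
        with self._lock:
            if ascending:
                if self._high is not None and key >= self._high:
                    return False
                self._low = key
            else:
                if self._low is not None and key <= self._low:
                    return False
                self._high = key
            return True


def download(ordered: List[Dict], ascending: bool,
             frontier: Frontier, out: List[Dict]) -> None:
    # Assumption: in a real job this loop would iterate a MongoDB cursor
    # sorted by (_modified, _id) in the given direction.
    source = ordered if ascending else reversed(ordered)
    for doc in source:
        if not frontier.claim((doc["_modified"], doc["_id"]), ascending):
            break  # the two threads have met; stop downloading
        out.append(doc)


def parallel_download(docs: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    ordered = sorted(docs, key=lambda d: (d["_modified"], d["_id"]))
    frontier = Frontier()
    asc_out: List[Dict] = []
    desc_out: List[Dict] = []
    threads = [
        threading.Thread(target=download, args=(ordered, True, frontier, asc_out)),
        threading.Thread(target=download, args=(ordered, False, frontier, desc_out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return asc_out, desc_out
```

Where the two threads meet depends on their relative speed, so a faster connection naturally takes a larger share of the range. No split point has to be chosen up front.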