[ 
https://issues.apache.org/jira/browse/OAK-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke reopened OAK-10778:
----------------------------------

> Indexing job: support parallel download from MongoDB with two connections in 
> Pipelined strategy
> -----------------------------------------------------------------------------------------------
>
>                 Key: OAK-10778
>                 URL: https://issues.apache.org/jira/browse/OAK-10778
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Nuno Santos
>            Priority: Major
>             Fix For: 1.64.0
>
>
> The current version of the Pipelined download strategy uses a single 
> connection/thread to download from MongoDB. We can further increase the 
> download speed by using an additional MongoDB connection. A Mongo deployment 
> has 1 primary and 2 secondaries, so in principle we could have 1 connection 
> to each secondary, effectively doubling the download speed.
> There are a few points to observe:
>  - Connections should go to different secondaries. If both connections go to 
> the same secondary, there's a high chance that they will be limited by what a 
> single replica can provide and that they will overload that replica. So each 
> secondary should have one and only one connection.
>  - How to partition the range of documents to download between two threads. 
> We are already downloading from Mongo in order of {{(_modified, _id)}}. A 
> simple and effective partition strategy for 2 connections is for one to 
> download in ascending and the other in descending order.
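
The ascending/descending partition described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Oak implementation: the class TwoEndedDownload, the Frontier helper and the in-memory list standing in for the MongoDB collection are all hypothetical, and no MongoDB driver calls are made. It assumes the documents are already sorted by (_modified, _id) and that the two threads can share a small piece of state to detect when their ranges meet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TwoEndedDownload {

    /** Hypothetical helper: tracks how far each direction has advanced over the sorted range. */
    static final class Frontier {
        private int nextAscending;
        private int nextDescending;

        Frontier(int size) {
            this.nextAscending = 0;
            this.nextDescending = size - 1;
        }

        /** Returns the next index for the ascending thread, or -1 once the two ends have met. */
        synchronized int claimAscending() {
            return nextAscending <= nextDescending ? nextAscending++ : -1;
        }

        /** Returns the next index for the descending thread, or -1 once the two ends have met. */
        synchronized int claimDescending() {
            return nextAscending <= nextDescending ? nextDescending-- : -1;
        }
    }

    /** Downloads every document exactly once using one ascending and one descending thread. */
    static List<String> download(List<String> sortedDocs) {
        Frontier frontier = new Frontier(sortedDocs.size());
        List<String> fromAscending = new ArrayList<>();
        List<String> fromDescending = new ArrayList<>();

        // Each list is written by a single thread and only read after join().
        Thread ascending = new Thread(() -> {
            for (int i = frontier.claimAscending(); i >= 0; i = frontier.claimAscending()) {
                fromAscending.add(sortedDocs.get(i));
            }
        });
        Thread descending = new Thread(() -> {
            for (int i = frontier.claimDescending(); i >= 0; i = frontier.claimDescending()) {
                fromDescending.add(sortedDocs.get(i));
            }
        });

        ascending.start();
        descending.start();
        try {
            ascending.join();
            descending.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while downloading", e);
        }

        // The descending thread collected the tail of the range in reverse order.
        Collections.reverse(fromDescending);
        List<String> merged = new ArrayList<>(fromAscending);
        merged.addAll(fromDescending);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(download(List.of("a", "b", "c", "d", "e")));
    }
}
```

How the work is split between the two threads depends on scheduling, but because each thread claims a contiguous run from its own end, the merged result always comes back in the original sort order with no document downloaded twice. With real MongoDB connections there is no shared index to claim; one option would be for each thread to compare the last (_modified, _id) it has read against the other thread's position and stop once they cross.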



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
