Nuno Santos created OAK-10778:
---------------------------------

             Summary: Indexing job: support parallel download from MongoDB with 
two connections in Pipelined strategy
                 Key: OAK-10778
                 URL: https://issues.apache.org/jira/browse/OAK-10778
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
            Reporter: Nuno Santos


The current version of the Pipelined download strategy uses a single 
connection/thread to download from MongoDB. We can further increase the 
download speed by using an additional MongoDB connection. A typical Mongo 
deployment has 1 primary and 2 secondaries, so in principle we could have 1 
connection to each secondary, potentially doubling the download speed.

There are a few points to observe:
 - Connections should go to different secondaries. If both connections go to 
the same secondary, there's a high chance that they will be limited by what a 
single replica can provide, and they risk overloading that replica. So each 
secondary should have one and only one connection.

 - We need to decide how to partition the range of documents to download 
between the two threads. We are already downloading from Mongo in order of 
{{(_modified, _id)}}. A simple and effective partition strategy for 2 
connections is for one to download in ascending and the other in descending 
order, with each stopping once it reaches documents the other has already 
downloaded.
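
The two points above can be sketched roughly as follows. This is only an 
illustration in Python, not the Oak implementation: the function names 
(pick_secondaries, download_from_both_ends, doc_key) are hypothetical, the 
collection is replaced by an in-memory list, and the member list is assumed to 
look like the `members` array returned by MongoDB's replSetGetStatus command 
(entries with `name` and `stateStr`). It also assumes that the 
(_modified, _id) key is unique per document, which holds because _id is unique.

```python
import threading


def pick_secondaries(members, connections=2):
    """Return one distinct secondary host per connection.

    Assumption: `members` is shaped like the `members` array of
    replSetGetStatus (each entry has `name` and `stateStr`).
    """
    secondaries = [m["name"] for m in members if m.get("stateStr") == "SECONDARY"]
    if len(secondaries) < connections:
        raise ValueError(
            f"need {connections} secondaries, found {len(secondaries)}"
        )
    return secondaries[:connections]


def doc_key(doc):
    """Sort key matching the download order (_modified, _id)."""
    return (doc["_modified"], doc["_id"])


def download_from_both_ends(docs):
    """Split `docs` between an ascending and a descending worker.

    `docs` is an in-memory stand-in for the Mongo collection. A worker
    claims a document (under a lock) only if the other worker's frontier
    has not reached it yet, so every document is claimed exactly once.
    """
    ordered = sorted(docs, key=doc_key)  # stands in for the sorted query
    lock = threading.Lock()
    frontier = {"asc": None, "desc": None}  # last key claimed per direction
    claimed = {"asc": [], "desc": []}

    def worker(direction, other, cursor):
        for doc in cursor:
            key = doc_key(doc)
            with lock:
                limit = frontier[other]
                if limit is not None:
                    crossed = key >= limit if direction == "asc" else key <= limit
                    if crossed:
                        return  # the other worker already covers the rest
                frontier[direction] = key
            claimed[direction].append(doc)

    threads = [
        threading.Thread(target=worker, args=("asc", "desc", iter(ordered))),
        threading.Thread(target=worker, args=("desc", "asc", reversed(ordered))),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed["asc"], claimed["desc"]
```

In a real job each worker would read from its own cursor on its own secondary 
(the hosts returned by something like pick_secondaries), but the crossing check 
would stay the same: a shared frontier tells each side where the other has 
already been, so neither the split point nor the relative speed of the two 
replicas has to be known in advance.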



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
