Nuno Santos created OAK-10778:
---------------------------------

             Summary: Indexing job: support parallel download from MongoDB with two connections in Pipelined strategy
                 Key: OAK-10778
                 URL: https://issues.apache.org/jira/browse/OAK-10778
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
            Reporter: Nuno Santos

The current version of the Pipelined download strategy uses a single connection/thread to download from MongoDB. We can further increase the download speed by using an additional MongoDB connection. A typical Mongo deployment has 1 primary and 2 secondaries, so in principle we could have 1 connection to each secondary, which could roughly double the download speed.

There are a few points to observe:
- Connections should go to different secondaries. If both connections go to the same secondary, there's a high chance that they will be limited by what a single replica can provide and that they will overload that replica. So each secondary should have one and only one connection.
- We need to decide how to partition the range of documents to download between the two threads. We are already downloading from Mongo in order of {{(_modified, _id)}}. A simple and effective partition strategy for 2 connections is for one to download in ascending and the other in descending order.
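
The ascending/descending partition can be illustrated with a minimal sketch. It is not the Oak implementation: the class and method names ({{TwoWayDownloadSketch}}, {{MeetingPoint}}, {{downloadAll}}) are hypothetical, an in-memory list sorted by {{(_modified, _id)}} stands in for the two MongoDB queries, and the rule that each thread stops once it reaches a key the other thread has already taken is an assumption about how the two downloads would meet. It does not address how to route each connection to a different secondary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TwoWayDownloadSketch {

    /** Sort key of a document: (_modified, _id), compared in that order. */
    static final class Key implements Comparable<Key> {
        final long modified;
        final String id;
        Key(long modified, String id) { this.modified = modified; this.id = id; }
        @Override public int compareTo(Key o) {
            int c = Long.compare(modified, o.modified);
            return c != 0 ? c : id.compareTo(o.id);
        }
        @Override public String toString() { return "(" + modified + ", " + id + ")"; }
    }

    /** Shared state deciding whether a downloader may keep the key it just read. */
    static final class MeetingPoint {
        private Key lastAscending;  // highest key accepted by the ascending side
        private Key lastDescending; // lowest key accepted by the descending side

        synchronized boolean acceptAscending(Key k) {
            // Refuse keys the descending side has already reached.
            if (lastDescending != null && k.compareTo(lastDescending) >= 0) return false;
            lastAscending = k;
            return true;
        }

        synchronized boolean acceptDescending(Key k) {
            // Refuse keys the ascending side has already reached.
            if (lastAscending != null && k.compareTo(lastAscending) <= 0) return false;
            lastDescending = k;
            return true;
        }
    }

    /** Stand-in for one connection: walks the sorted keys in one direction until refused. */
    static List<Key> downloadOneDirection(List<Key> sorted, boolean ascending, MeetingPoint mp) {
        List<Key> out = new ArrayList<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            Key k = ascending ? sorted.get(i) : sorted.get(n - 1 - i);
            boolean accepted = ascending ? mp.acceptAscending(k) : mp.acceptDescending(k);
            if (!accepted) break; // the two downloads have met
            out.add(k);
        }
        return out;
    }

    /** Runs both directions in parallel and returns all keys in ascending order. */
    static List<Key> downloadAll(List<Key> sorted) {
        MeetingPoint mp = new MeetingPoint();
        List<Key> asc = new ArrayList<>();
        List<Key> desc = new ArrayList<>();
        Thread t1 = new Thread(() -> asc.addAll(downloadOneDirection(sorted, true, mp)));
        Thread t2 = new Thread(() -> desc.addAll(downloadOneDirection(sorted, false, mp)));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        }
        Collections.reverse(desc); // descending part back into ascending order
        List<Key> all = new ArrayList<>(asc);
        all.addAll(desc);
        return all;
    }

    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        for (int i = 0; i < 20; i++) keys.add(new Key(i / 5, "doc" + (i % 5)));
        System.out.println(downloadAll(keys).size() + " documents downloaded");
    }
}
```

Because both acceptance checks run under the same lock, every key is accepted by exactly one thread regardless of how the two threads interleave, so no document is downloaded twice or skipped in this sketch. How the work splits between the two threads depends on their relative speed.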