Nuno Santos created OAK-11158:
---------------------------------

             Summary: indexing-job/downloader - Move the conversion of Mongo 
responses to NodeDocument from the download to the transform threads
                 Key: OAK-11158
                 URL: https://issues.apache.org/jira/browse/OAK-11158
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: indexing
            Reporter: Nuno Santos


Currently, the download thread iterates over the response received from 
Mongo and converts it to NodeDocument instances. This is a fairly expensive 
operation, which can account for more than 50% of the time of the download 
threads. While the download thread is processing a response, it is blocked 
from requesting more data from Mongo, and the download is often the bottleneck.

We can instead decode each Mongo document as a RawBsonDocument, which is just 
a copy of the binary buffer representing that document. This is a very fast 
operation, as it only requires making a copy of the binary buffer. We can then 
pass these RawBsonDocuments to the transform threads, which will convert them 
to NodeDocument instances.

This moves the heavy work of parsing the response away from the download 
threads, which should significantly improve the download speed, as the download 
threads will take less time to process each Mongo response and will send the 
next request sooner. To deal with the extra load on the transform threads, we 
can increase their number, which is currently set to 2.
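
The hand-off described above could look roughly like the sketch below. It is 
only an illustration using the JDK, not the Oak or Mongo driver code: a byte[] 
stands in for RawBsonDocument, parse() stands in for the conversion to 
NodeDocument, and the names RawHandoffSketch, run, parse and POISON are all 
hypothetical. The download side only copies each buffer and enqueues it, while 
a configurable number of transform threads does the expensive parsing.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RawHandoffSketch {
    // Hypothetical end-of-stream marker; compared by identity, never parsed.
    private static final byte[] POISON = new byte[0];

    // Stand-in for the expensive conversion to NodeDocument (assumption).
    static String parse(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).toUpperCase(Locale.ROOT);
    }

    // The calling thread plays the download thread; the pool plays the
    // transform threads. Result order is not guaranteed.
    static List<String> run(List<byte[]> responses, int transformThreads)
            throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
        ConcurrentLinkedQueue<String> parsed = new ConcurrentLinkedQueue<>();
        ExecutorService transformers = Executors.newFixedThreadPool(transformThreads);

        for (int i = 0; i < transformThreads; i++) {
            transformers.submit(() -> {
                try {
                    // Heavy work happens here, off the download thread.
                    for (byte[] raw = queue.take(); raw != POISON; raw = queue.take()) {
                        parsed.add(parse(raw));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Download side: only a cheap copy of the binary buffer, no parsing.
        for (byte[] response : responses) {
            queue.put(Arrays.copyOf(response, response.length));
        }
        // One marker per transform thread so that every worker terminates.
        for (int i = 0; i < transformThreads; i++) {
            queue.put(POISON);
        }

        transformers.shutdown();
        transformers.awaitTermination(1, TimeUnit.MINUTES);
        return new ArrayList<>(parsed);
    }

    public static void main(String[] args) throws InterruptedException {
        List<byte[]> responses = List.of(
                "doc-1".getBytes(StandardCharsets.UTF_8),
                "doc-2".getBytes(StandardCharsets.UTF_8));
        System.out.println(run(responses, 2).size() + " documents parsed");
    }
}
```

The bounded queue is a design choice of this sketch: it applies back-pressure 
to the download side if the transform threads fall behind, which is the case 
where raising the number of transform threads would help.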



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
