Nuno Santos created OAK-11158:
---------------------------------
Summary: indexing-job/downloader - Move the conversion of Mongo
responses to NodeDocument from the download to the transform threads
Key: OAK-11158
URL: https://issues.apache.org/jira/browse/OAK-11158
Project: Jackrabbit Oak
Issue Type: Bug
Components: indexing
Reporter: Nuno Santos
Currently, the download thread iterates over the response received from
Mongo and converts it to NodeDocument instances. This is a fairly
expensive operation that can account for more than 50% of the time of the
download threads. While the download thread is processing the response, it is
blocked from requesting more data from Mongo, which is often the bottleneck.
We can instead convert the Mongo documents to RawBsonDocument instances, each
of which is just a copy of the binary buffer representing a Mongo document.
This is a very fast operation, as it requires only making a copy of the binary
buffer. We can then pass these RawBsonDocuments to the transform threads, which
will convert them to NodeDocument.
This moves the heavy work of parsing the response away from the download
threads, which should significantly improve the download speed, as the download
threads will take less time to process each Mongo response and will send the
next request sooner. To deal with the extra load on the transform threads, we
can increase their number, which is currently set to 2.
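
The hand-off described above can be sketched roughly as follows. This is a
minimal illustration using only the JDK, not the actual Oak code: the class
RawHandoffSketch, the parse method, the queue capacity and the poison-pill
shutdown are all hypothetical, a plain byte[] stands in for RawBsonDocument,
and a String stands in for NodeDocument. The download side only copies the
bytes and enqueues them; the transform threads do the expensive conversion.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the downloader pipeline; these are not Oak classes.
final class RawHandoffSketch {
    // Sentinel telling a transform thread to stop (compared by identity).
    private static final byte[] POISON = new byte[0];

    // Stand-in for the expensive raw-bytes-to-NodeDocument conversion.
    static String parse(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Runs the download loop on the calling thread and the parsing on
    // 'transformThreads' worker threads; returns the parsed results sorted.
    static List<String> run(List<byte[]> responses, int transformThreads) {
        if (transformThreads < 1) {
            throw new IllegalArgumentException("need at least one transform thread");
        }
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
        ConcurrentLinkedQueue<String> parsed = new ConcurrentLinkedQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < transformThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Transform side: take raw buffers and do the heavy parsing.
                    for (byte[] raw = queue.take(); raw != POISON; raw = queue.take()) {
                        parsed.add(parse(raw));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "transform-" + i);
            t.start();
            workers.add(t);
        }

        try {
            // Download side: only copy the buffer and enqueue it, no parsing.
            for (byte[] response : responses) {
                queue.put(Arrays.copyOf(response, response.length));
            }
            // One sentinel per worker so that every worker terminates.
            for (int i = 0; i < transformThreads; i++) {
                queue.put(POISON);
            }
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        }

        List<String> out = new ArrayList<>(parsed);
        Collections.sort(out); // workers finish in arbitrary order
        return out;
    }
}
```

Calling RawHandoffSketch.run(buffers, 4) instead of run(buffers, 2) corresponds
to raising the number of transform threads to absorb the parsing work that no
longer happens on the download thread.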