Nuno Santos created OAK-10682:
---------------------------------
Summary: [Indexing job] Improve Mongo regex filter to only use
positive conditions (no negations)
Key: OAK-10682
URL: https://issues.apache.org/jira/browse/OAK-10682
Project: Jackrabbit Oak
Issue Type: Improvement
Components: indexing
Environment: The current implementation of filtering excluded paths
and custom regex is using a condition like
{noformat}
{ _id: { $nin: [ /^[0-9]{1,3}:\/content\/dam\/.*$/ ] } }
{noformat}
Mongo cannot evaluate this condition without retrieving the full document,
because a value of {{null}} would also match this condition and the index does
not contain {{null}} values. Therefore, when the index contains excluded paths,
the download will be much slower because Mongo has to retrieve every single
document to evaluate the condition.
As a workaround, we can transform the regex into one that matches the
complement of the original regex, using a [negative
lookahead|https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex].
This allows rewriting the filter condition using only positive conditions,
which can be evaluated using only the index.
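A minimal sketch of the transformation, for illustration only. The class and method names ({{PositiveRegexFilter}}, {{complementOf}}, {{accepts}}) are hypothetical and not part of Oak, and the sketch assumes that every excluded pattern is anchored with {{^}}, as in the example above. Each excluded pattern becomes one alternative inside a single negative lookahead, so the resulting regex matches exactly the ids that none of the excluded patterns match.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not Oak code: builds a regex matching the complement
// of a set of excluded patterns, so the filter needs no $nin / $not.
class PositiveRegexFilter {

    // Assumption: each excluded pattern is anchored at the start with '^'.
    static String complementOf(List<String> excludedPatterns) {
        if (excludedPatterns.isEmpty()) {
            throw new IllegalArgumentException("at least one excluded pattern is required");
        }
        // Wrap each pattern in a non-capturing group and join with alternation.
        String alternatives = excludedPatterns.stream()
                .map(p -> "(?:" + p + ")")
                .collect(Collectors.joining("|"));
        // The negative lookahead at the start of the string succeeds only
        // when none of the excluded patterns match from that position.
        return "^(?!" + alternatives + ")";
    }

    // Mimics an unanchored regex filter on _id: true when the regex is found in the id.
    static boolean accepts(String complementRegex, String id) {
        return Pattern.compile(complementRegex).matcher(id).find();
    }
}
```

With the pattern from the example, {{complementOf(List.of("^[0-9]{1,3}:/content/dam/.*$"))}} yields {{^(?!(?:^[0-9]{1,3}:/content/dam/.*$))}}, which could then be used in a positive condition of the form {{{ _id: { $regex: ... } }}}. Whether Mongo evaluates such a condition from the index alone should be confirmed with {{explain()}} on a real collection.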
Reporter: Nuno Santos
--
This message was sent by Atlassian Jira
(v8.20.10#820010)