Nuno Santos created OAK-10682:
---------------------------------

             Summary: [Indexing job] Improve Mongo regex filter to only use positive conditions (no negations)
                 Key: OAK-10682
                 URL: https://issues.apache.org/jira/browse/OAK-10682
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
         Environment: The current implementation of filtering excluded paths 
and custom regex is using a condition like
{noformat}
{ _id: { $nin: [ /^[0-9]{1,3}:\/content\/dam\/.*$/ ] } }
{noformat}
Mongo cannot evaluate this condition without retrieving the full document, 
because a {{null}} value would also match this condition and the index does 
not contain {{null}} values. Therefore, when the index definition contains excluded paths, 
the download is much slower, because Mongo has to retrieve every single 
document to evaluate the condition.
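To make the semantics concrete, here is a minimal in-memory Java model of the {{$nin}} filter above. It is a sketch, not Oak's code: the class and method names ({{NinFilterSketch}}, {{passesNin}}) are hypothetical, and it assumes {{_id}} values of the form {{<depth>:<path>}}, as the regex suggests. It shows that a missing or {{null}} {{_id}} passes the filter, which is the case the issue says the index alone cannot answer.

```java
import java.util.List;
import java.util.regex.Pattern;

public class NinFilterSketch {
    // Excluded path pattern from the issue, written as a Java string.
    // The JavaScript literal escapes '/' as '\/'; Java regexes do not need that.
    static final List<Pattern> EXCLUDED =
            List.of(Pattern.compile("^[0-9]{1,3}:/content/dam/.*$"));

    // Hypothetical in-memory model of { _id: { $nin: [ /.../ ] } }:
    // a document passes if its _id matches none of the excluded regexes.
    static boolean passesNin(String id) {
        // A missing or null value also satisfies $nin in MongoDB; per the issue,
        // this is why Mongo must fetch the document to decide the condition.
        if (id == null) {
            return true;
        }
        // MongoDB regexes search rather than full-match, so use find().
        return EXCLUDED.stream().noneMatch(p -> p.matcher(id).find());
    }

    public static void main(String[] args) {
        System.out.println(passesNin("3:/content/dam/image.png")); // false
        System.out.println(passesNin("2:/content/site"));          // true
        System.out.println(passesNin(null));                       // true
    }
}
```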

As a workaround, we can transform the regex into an equivalent one that matches 
the complement of the original regex using [negative lookahead|https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex].
This allows rewriting the filter condition using only positive conditions, 
which can be evaluated using only the index.
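A minimal Java sketch of the proposed rewrite: a hypothetical helper {{complementOf}} (not part of Oak) turns a list of excluded regexes into one regex that uses a negative lookahead to match exactly the values that none of them match. It assumes every excluded regex is anchored as {{^...$}}, like the one above, and that a trailing {{$}} is always an anchor rather than an escaped character.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ComplementRegexSketch {
    // Hypothetical helper: given regexes anchored as ^...$, build one regex
    // that matches exactly the strings matched by NONE of them.
    // Assumption: a trailing '$' is always an anchor (never an escaped "\$").
    static String complementOf(List<String> excluded) {
        String alternatives = excluded.stream()
                .map(ComplementRegexSketch::stripAnchors)
                // Wrap each body so a top-level '|' inside it stays contained.
                .map(body -> "(?:" + body + ")")
                .collect(Collectors.joining("|"));
        // At the start of the string, assert via negative lookahead that the
        // whole value is not one of the excluded forms.
        return "^(?!(?:" + alternatives + ")$)";
    }

    // Removes a leading '^' and a trailing '$' from an anchored regex.
    static String stripAnchors(String regex) {
        String body = regex.startsWith("^") ? regex.substring(1) : regex;
        return body.endsWith("$") ? body.substring(0, body.length() - 1) : body;
    }

    public static void main(String[] args) {
        String positive = complementOf(List.of("^[0-9]{1,3}:/content/dam/.*$"));
        System.out.println(positive);
        Pattern p = Pattern.compile(positive);
        System.out.println(p.matcher("2:/content/site").find());          // true
        System.out.println(p.matcher("3:/content/dam/image.png").find()); // false
    }
}
```

The resulting string could then be used as a single positive {{$regex}} condition on {{_id}} in place of the {{$nin}} list. Whether Mongo evaluates it from the index alone is the claim made in this issue; the sketch only checks that the rewritten regex selects the same values.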
            Reporter: Nuno Santos






--
This message was sent by Atlassian Jira
(v8.20.10#820010)
