Hello team,

When we trigger a reindex by setting `reindex=true` on the index
definition, the algorithm scans the whole repository and issues "node
modified/added" events to the specified index.

While this works with small repositories, it doesn't really scale to
big ones.

To take an extreme example: say we have a repository of 2 million nodes
with only 1 node carrying the required property. The reindex will keep
going until all 2 million nodes have been scanned. And on very active
repositories, where a lot of nodes change, manually or not, we could
virtually have an endless reindexing.

Based on my experience with content repositories, clients are normally
interested in querying only parts of them, for example /content.

I was thinking that it could add real value if we introduced an
additional property on the index definition: reindexPaths (multi-valued
String).

When this property is specified, the reindex will happen only on those
paths, in the order in which they are specified, and it could
potentially make the content indexed so far available to the query
engine, so that it can return partial results each time a path is
completed.
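
To make the idea a bit more concrete, here is a rough sketch of the
ordered, path-restricted scan. It runs on a toy in-memory tree and does
not use any repository API; Node, resolve, scan, reindex and
sampleRepository are names I made up for the illustration, and
overlapping entries in reindexPaths are not de-duplicated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: nothing here is real repository API.
class ReindexPathsSketch {

    static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        String indexedValue; // null when the node lacks the indexed property

        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    // Resolves an absolute path such as "/content/site"; null if it is missing.
    static Node resolve(Node root, String path) {
        Node current = root;
        for (String name : path.split("/")) {
            if (name.isEmpty()) {
                continue; // skips the empty segment before the leading slash
            }
            current = current.children.get(name);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Depth-first scan of one subtree; counts every visited node.
    static void scan(Node node, String path, List<String> indexed, int[] visited) {
        visited[0]++;
        if (node.indexedValue != null) {
            indexed.add(path);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String childPath = path.equals("/") ? "/" + e.getKey() : path + "/" + e.getKey();
            scan(e.getValue(), childPath, indexed, visited);
        }
    }

    // Scans only the subtrees listed in reindexPaths, in the given order,
    // and returns how many nodes were visited in total.
    static int reindex(Node root, List<String> reindexPaths, List<String> indexed) {
        int[] visited = {0};
        for (String path : reindexPaths) {
            Node start = resolve(root, path);
            if (start != null) {
                scan(start, path, indexed, visited);
            }
            // Assumption: this is where the entries collected so far could be
            // exposed to the query engine for partial results.
        }
        return visited[0];
    }

    static Node sampleRepository() {
        Node root = new Node();
        root.child("content").child("site").child("page").indexedValue = "x";
        root.child("content").child("assets");
        root.child("var").child("audit").child("entry1");
        root.child("var").child("audit").child("entry2");
        return root;
    }

    public static void main(String[] args) {
        Node root = sampleRepository();
        List<String> indexed = new ArrayList<>();
        int restricted = reindex(root, Arrays.asList("/content"), indexed);
        int full = reindex(root, Arrays.asList("/"), new ArrayList<>());
        System.out.println("restricted: " + restricted + " nodes, full: " + full + " nodes");
        System.out.println("indexed: " + indexed);
    }
}
```

On this sample tree the scan restricted to /content visits 4 nodes
instead of 9 and still finds the single node that carries the property.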

A single path could be just a path or a glob/regex. I'm for using a Java
regex, as it gives the end user a lot of power for fine tuning, but on
the other hand regex evaluation is pretty slow...
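
One way to limit the regex cost could be to compile each pattern once
and to keep plain paths on a cheap prefix check. A minimal sketch,
assuming a made-up "regex:" prefix to tell the two kinds of entries
apart (ReindexPathMatcher and that prefix are hypothetical, not an
existing convention):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a node path is covered by reindexPaths.
class ReindexPathMatcher {

    private static final String REGEX_PREFIX = "regex:"; // assumed marker

    private final List<String> plainRoots = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    ReindexPathMatcher(List<String> reindexPaths) {
        for (String entry : reindexPaths) {
            if (entry.startsWith(REGEX_PREFIX)) {
                // Compile once here, not once per visited node.
                patterns.add(Pattern.compile(entry.substring(REGEX_PREFIX.length())));
            } else {
                plainRoots.add(entry);
            }
        }
    }

    boolean matches(String path) {
        // Cheap checks first: the path is a plain root or one of its descendants.
        for (String root : plainRoots) {
            if (root.equals("/") || path.equals(root) || path.startsWith(root + "/")) {
                return true;
            }
        }
        // Regex entries must match the whole path.
        for (Pattern pattern : patterns) {
            if (pattern.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReindexPathMatcher matcher = new ReindexPathMatcher(
                Arrays.asList("/content", "regex:/etc/[^/]+/config(/.*)?"));
        for (String path : Arrays.asList("/content/site", "/content-old/x",
                "/etc/app/config", "/etc/app/other")) {
            System.out.println(path + " -> " + matcher.matches(path));
        }
    }
}
```

Note that the prefix check compares against root + "/" so that /content
does not accidentally cover /content-old.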

Thoughts?

Cheers
Davide


