Hi Davide,
So what would happen to the already-indexed content which wasn't in
one of the reindexPaths?

For example, let's say I'm building an index of a property called
"keywords". In the repo, I have:

/content/foo@keywords=something
/content/bar/one@keywords=something
/content/bar/two@keywords=something

And then I trigger a reindex with reindexPaths = /content/bar.

Would //element(*)[@keywords='something'] still return /content/foo?
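
To make the question concrete, here is a rough sketch of the two
behaviours I can imagine. It is a toy model only: the class and method
names (ReindexSketch, reindexDroppingAll, reindexKeepingRest) are
hypothetical and are not part of Oak, and the "index" is just a map from
path to the indexed "keywords" value.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model, NOT Oak's API: an index is a map from node path to the
// indexed value of the "keywords" property.
class ReindexSketch {

    // The three nodes from the example above.
    static Map<String, String> repo() {
        Map<String, String> r = new TreeMap<>();
        r.put("/content/foo", "something");
        r.put("/content/bar/one", "something");
        r.put("/content/bar/two", "something");
        return r;
    }

    // True if path is one of the roots or a descendant of one.
    static boolean isUnder(String path, List<String> roots) {
        for (String root : roots) {
            if (path.equals(root) || path.startsWith(root + "/")) {
                return true;
            }
        }
        return false;
    }

    // Behaviour A (assumption): the old index is discarded and only
    // nodes under reindexPaths are indexed again.
    static Map<String, String> reindexDroppingAll(
            Map<String, String> repo, List<String> reindexPaths) {
        Map<String, String> fresh = new TreeMap<>();
        for (Map.Entry<String, String> e : repo.entrySet()) {
            if (isUnder(e.getKey(), reindexPaths)) {
                fresh.put(e.getKey(), e.getValue());
            }
        }
        return fresh;
    }

    // Behaviour B (assumption): entries outside reindexPaths are kept,
    // entries under reindexPaths are rebuilt from the repository.
    static Map<String, String> reindexKeepingRest(
            Map<String, String> oldIndex, Map<String, String> repo,
            List<String> reindexPaths) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : oldIndex.entrySet()) {
            if (!isUnder(e.getKey(), reindexPaths)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        for (Map.Entry<String, String> e : repo.entrySet()) {
            if (isUnder(e.getKey(), reindexPaths)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Stand-in for //element(*)[@keywords='...'] against the index.
    static Set<String> query(Map<String, String> index, String value) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (value.equals(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> repo = repo();
        Map<String, String> oldIndex = new TreeMap<>(repo); // fully indexed
        List<String> paths = List.of("/content/bar");
        System.out.println(query(reindexDroppingAll(repo, paths), "something"));
        System.out.println(query(reindexKeepingRest(oldIndex, repo, paths), "something"));
    }
}
```

Under behaviour A the query would no longer see /content/foo; under
behaviour B it still would. Which of the two (if either) is intended?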

Regards,
Justin


On Tue, Aug 26, 2014 at 6:04 AM, Davide Giannella <dav...@apache.org> wrote:
> Hello team,
>
> When we trigger a reindex by setting `reindex=true` on the index
> definition, the algorithm scans the entire repository and issues the
> "node modified/added" to the specified index.
>
> While this works with small repositories, it doesn't really scale to
> big ones.
>
> To take an extreme example, say we have a repository of 2 million
> nodes with only 1 node carrying the required property. The reindex
> will keep going until all 2 million nodes have been scanned. And with
> very active repositories, where a lot of nodes change, manually or
> not, we could virtually have an endless reindexing.
>
> Based on my experience with content repositories, clients are normally
> interested in querying only parts of them, for example /content.
>
> I was thinking that it could be a good added value if we could add an
> additional property to the index definition: reindexPaths (multivalue,
> String).
>
> When this property is specified, the reindex will happen only on those
> paths, in the order in which they are specified, and it could
> potentially make the content indexed so far available to the query
> engine, returning partial results as each path is completed.
>
> A single path could be just a path or a glob/regex. I'm for using a
> Java regex as it gives the end user a lot of power for fine-tuning, but
> on the other hand regex evaluation is pretty slow...
>
> thoughts?
>
> Cheers
> Davide
>
>
>
