Hi all

On a freshly migrated Oak setup (AEM 6.1), I recently observed that
async indexing was running all the time. At first I did not worry,
because there were ~14 million nodes to be indexed, but eventually I got
the impression that there was an endless loop.

Here's my take on what's happening, and please feel free to correct
any wrong assumptions I make:

- after a migration there is no checkpoint for async indexing to start
at, so it indexes everything
- a migration is a single commit, so async indexing is all or nothing
(not sure the single commit is relevant, anyone?)
- due to an oddity in the metadata of a PDF file, async indexing
failed with an exception
- async indexing starts over on every subsequent run to see whether the
error persists
- rinse and repeat
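To make the suspected loop concrete, here is a toy Java model of it. The
class, its methods and the "broken.pdf" path are made up for
illustration and are not Oak's AsyncIndexUpdate API. The model simply
assumes that the checkpoint only advances when a whole run succeeds, so
one bad document resets every run back to "nothing indexed":

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the all-or-nothing loop; NOT Oak's AsyncIndexUpdate API.
public class AllOrNothingIndexing {
    // Assumption: the checkpoint stays null until a run completes without errors.
    static String checkpoint = null;

    // Indexes every path in one go; throws on the first "bad" document.
    static void runOnce(List<String> paths, List<String> index) {
        List<String> pending = new ArrayList<>();
        for (String p : paths) {
            if (p.endsWith("broken.pdf")) {
                // Stand-in for the PDF metadata oddity.
                throw new IllegalStateException("cannot extract metadata: " + p);
            }
            pending.add(p);
        }
        // Only reached if no document failed: publish results and advance the checkpoint.
        index.addAll(pending);
        checkpoint = "cp-" + paths.size();
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/content/a.pdf", "/content/broken.pdf", "/content/c.pdf");
        List<String> index = new ArrayList<>();
        // Each scheduled run starts from the same (missing) checkpoint and fails again.
        for (int run = 1; run <= 3; run++) {
            try {
                runOnce(paths, index);
            } catch (IllegalStateException e) {
                System.out.println("run " + run + " failed: " + e.getMessage()
                        + ", checkpoint=" + checkpoint + ", indexed=" + index.size());
            }
        }
    }
}
```

Every run reports checkpoint=null and indexed=0, which is the "rinse and
repeat" behaviour I think I am seeing.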

If my interpretation is correct, I would suggest reviewing the error handling.

If an error is not recoverable, the current behaviour basically
prevents any documents from being indexed, and AsyncIndexUpdate stops
making any progress.

It may be a better trade-off to report the paths of failing documents
and continue despite the failure.
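Roughly what I have in mind, again as a toy Java sketch rather than a
patch: the names (`SkipAndReportIndexing`, `extract`, `RunResult`) are
hypothetical, and the per-document `extract` just stands in for whatever
text extraction actually fails. Each document is handled in its own
try/catch, failures are collected by path, and the run still finishes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "report failing paths and continue"; names are illustrative, not Oak API.
public class SkipAndReportIndexing {

    // Outcome of one run: the paths that were indexed and the paths that failed.
    record RunResult(List<String> indexed, List<String> failedPaths) {}

    // Stand-in for per-document text extraction; fails on a made-up malformed PDF.
    static String extract(String path) {
        if (path.endsWith("broken.pdf")) {
            throw new IllegalStateException("cannot extract metadata: " + path);
        }
        return "text of " + path;
    }

    static RunResult runOnce(List<String> paths) {
        List<String> indexed = new ArrayList<>();
        List<String> failed = new ArrayList<>();
        for (String p : paths) {
            try {
                extract(p);
                indexed.add(p);
            } catch (RuntimeException e) {
                // Record the failing path and keep going instead of aborting the whole run.
                failed.add(p);
                System.err.println("skipping " + p + ": " + e.getMessage());
            }
        }
        return new RunResult(indexed, failed);
    }

    public static void main(String[] args) {
        RunResult r = runOnce(List.of("/content/a.pdf", "/content/broken.pdf", "/content/c.pdf"));
        System.out.println("indexed=" + r.indexed() + " failed=" + r.failedPaths());
    }
}
```

The downside is that the index is knowingly incomplete for the reported
paths, but the checkpoint could advance and the failures would be
visible in the logs for follow-up, instead of blocking everything else.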

What do others think?

Regards
Julian
