I'm assuming that this is only safe when the repository is not open through Jackrabbit; otherwise, concurrent havoc will ensue.

Sébastien Launay wrote:
Hi Paco,

If you are not afraid to get your hands dirty, you can use Luke [1]
and analyze the indexes found in repository/workspaces/*/index.
You might want to search the field named '_:FULLTEXT' (told you it
would get dirty ;)).

[1] http://code.google.com/p/luke/

2009/11/24 Paco Avila <[email protected]>:
Thanks, this is the expected answer :(

Anyway, is there any way to detect a failed text extraction? I know
I can check the log, but the failure is not associated with a file or path.

Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
on Jackrabbit, it is not indexed. Office documents seem to be
especially problematic due to their proprietary format. The problem is
that I don't know which documents had problems with their text
extraction, especially if I use extractorPoolSize > 1.
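To illustrate the kind of traceability being asked for, here is a minimal sketch of an application-level wrapper that keeps each extraction task tied to its document path, so a failure on any pool thread is recorded against the right file. This is not Jackrabbit's API and does not hook into its indexer: `ExtractionTracker`, `extractAll`, and the `extractor` function are hypothetical names, and `poolSize` only mirrors the role of extractorPoolSize. It also assumes an empty result counts as a failed extraction.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Jackrabbit): runs text extraction for
 * several documents on a thread pool and records which path failed, so
 * failures stay traceable even when poolSize > 1.
 */
public class ExtractionTracker {

    /** Returns a map of document path -> failure reason for every failed extraction. */
    public static Map<String, String> extractAll(List<String> paths,
                                                 Function<String, String> extractor,
                                                 int poolSize) {
        Map<String, String> failures = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String path : paths) {
            // Each task captures its own path, so an exception can be tied back to it
            pool.submit(() -> {
                try {
                    String text = extractor.apply(path);
                    // Assumption: an empty result is treated as a failed extraction
                    if (text == null || text.isEmpty()) {
                        failures.put(path, "no text extracted");
                    }
                } catch (RuntimeException e) {
                    failures.put(path, e.getClass().getSimpleName() + ": " + e.getMessage());
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in extractor: pretends that .doc files cannot be parsed
        Function<String, String> fakeExtractor = path -> {
            if (path.endsWith(".doc")) {
                throw new IllegalStateException("cannot parse " + path);
            }
            return "some text";
        };
        Map<String, String> failures = extractAll(
                List.of("/docs/a.pdf", "/docs/b.doc", "/docs/c.txt"), fakeExtractor, 2);
        failures.forEach((p, msg) -> System.out.println(p + " -> " + msg));
    }
}
```

Running `main` should print a single line for `/docs/b.doc`. The point of the design is that the path travels with the task rather than being reconstructed from interleaved log lines afterwards.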

Perhaps this question should be sent to the development list? I think
this could be a very useful improvement to Jackrabbit.

