Re: How can I access to the TextExtractor result?

Dave Brosius Tue, 24 Nov 2009 17:35:35 -0800

I'm assuming that this is only safe when the repository is not open through Jackrabbit; otherwise concurrent havoc will ensue.


Sébastien Launay wrote:

Hi Paco,


If you are not afraid to get your hands dirty you can use Luke [1]
and analyze the indexes found in repository/workspaces/*/index.
You might want to search the field named '_:FULLTEXT' (told you it
will get dirty ;)).

[1] http://code.google.com/p/luke/
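
As a rough sketch of the first step, the snippet below lists the index
directories matching repository/workspaces/*/index so that each one can
be opened in Luke. The class name IndexDirFinder and the default
"repository" home directory are assumptions made for illustration; only
the standard Java library is used, and nothing here reads the Lucene
index itself. As noted in the reply above, it is probably safest to
inspect the indexes only while the repository is not open.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit or Luke.
public class IndexDirFinder {

    // Returns every <repoHome>/workspaces/<name>/index directory that exists, sorted by path.
    static List<Path> findIndexDirs(Path repoHome) throws IOException {
        List<Path> result = new ArrayList<>();
        Path workspaces = repoHome.resolve("workspaces");
        if (!Files.isDirectory(workspaces)) {
            return result; // no workspaces folder, nothing to report
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(workspaces)) {
            for (Path workspace : stream) {
                Path index = workspace.resolve("index");
                if (Files.isDirectory(index)) {
                    result.add(index);
                }
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: a repository home named "repository" in the working directory.
        Path repoHome = Paths.get(args.length > 0 ? args[0] : "repository");
        for (Path dir : findIndexDirs(repoHome)) {
            System.out.println(dir);
        }
    }
}
```

Run it with the repository home as the only argument, then point Luke at
one of the printed directories and look at the '_:FULLTEXT' field.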

2009/11/24 Paco Avila <[email protected]>:

Thanks, this is the expected answer :(

Anyway, is there any way to detect a failed text extraction? I know
I can look at the log, but the failure is not associated with a file or path.

Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
on Jackrabbit, it is not indexed. Office documents seem to be
especially problematic due to their proprietary format. And the problem is
that I don't know which document had problems in its text
extraction, especially if I use extractorPoolSize > 1.

Perhaps this question should be sent to the development list? I think
this could be a very useful improvement to Jackrabbit.
