Re: How can I access to the TextExtractor result?

Sébastien Launay Tue, 24 Nov 2009 11:04:49 -0800

Hi Paco,

If you are not afraid to get your hands dirty you can use Luke [1]
and analyze the indexes found in repository/workspaces/*/index.
You might want to search the field named '_:FULLTEXT' (told you it
will get dirty ;)).
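
To find the index directories to open in Luke, something like the
following could help. It is only a minimal sketch: the function name
find_index_dirs and the repository home "repository" are assumptions
for illustration, and it only lists the directories matching the
repository/workspaces/*/index layout mentioned above. It does not read
the Lucene index itself.

```python
from pathlib import Path


def find_index_dirs(repo_home):
    """Return the per-workspace index directories under a repository home.

    Hypothetical helper: it only walks the workspaces/*/index layout and
    does not open or parse the index.
    """
    workspaces = Path(repo_home) / "workspaces"
    if not workspaces.is_dir():
        # No workspaces directory means there is nothing to list.
        return []
    # Keep only real directories named "index", one level below each workspace.
    return sorted(p for p in workspaces.glob("*/index") if p.is_dir())


if __name__ == "__main__":
    # "repository" is an assumed repository home; adjust it to your setup.
    for index_dir in find_index_dirs("repository"):
        print(index_dir)
```

Each printed path is a candidate directory to open in Luke.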


[1] http://code.google.com/p/luke/

2009/11/24 Paco Avila <[email protected]>:
> Thanks, this is the expected answer :(
>
> Anyway, is there any way to detect a failed text extraction? I know
> I can check the log, but the failure is not associated with a file or path.
>
> Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
> on Jackrabbit, it is not indexed. Office documents seem to be
> especially problematic due to their proprietary format. And the problem is
> that I don't know which documents had problems in their text
> extraction, especially if I use extractorPoolSize > 1.
>
> Perhaps this question should be sent to the development list? I think
> this could be a very useful improvement to Jackrabbit.

-- 
Sébastien Launay
