Re: How can I access to the TextExtractor result?

Paco Avila Tue, 24 Nov 2009 11:52:01 -0800

Very interesting :)

On Tue, Nov 24, 2009 at 8:04 PM, Sébastien Launay
<[email protected]> wrote:
> Hi Paco,
>
> If you are not afraid to get your hands dirty you can use Luke [1]
> and analyze the indexes found in repository/workspaces/*/index.
> You might want to search the field named '_:FULLTEXT' (told you it
> will get dirty ;)).
>
> [1] http://code.google.com/p/luke/
>
> 2009/11/24 Paco Avila <[email protected]>:
>> Thanks, this is the expected answer :(
>>
>> Anyway, is there any way to detect a failed text extraction? I know I
>> can check the log, but the failure is not associated with a file or
>> path.
>>
>> Sometimes when I upload a document (Word, PDF, etc.) to my DMS built
>> on Jackrabbit, it is not indexed. Office documents seem to be
>> especially problematic due to their proprietary format. And the
>> problem is that I don't know which documents had problems in their
>> text extraction, especially if I use extractorPoolSize > 1.
>>
>> Perhaps this question should be sent to the development list? I think
>> this could be a very useful improvement to Jackrabbit.
>
> --
> Sébastien Launay
>

-- 
Paco Avila
OpenKM
http://www.openkm.com
http://www.guia-ubuntu.org
