Re: Problem with textExtractor

JOSE FELIX HERNANDEZ BARRIO Wed, 28 Apr 2010 02:19:56 -0700

I don't want to index the content of PDFs for full-text search.
Can I disable it using the configuration below?


<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses"
           value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
    <param name="extractorPoolSize" value="2"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>
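
A stray space inside a name attribute (for example "extractorPoolSize " instead of "extractorPoolSize") is easy to miss in a pasted config. The sketch below is a hypothetical helper, not part of Jackrabbit: it uses only the JDK's DOM API to read a <SearchIndex> fragment, prints each param name in brackets so surrounding whitespace is visible, and flags names that differ from their trimmed form. It assumes such a name most likely won't match the intended SearchIndex property; the ${wsp.home} placeholder is left unexpanded.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: inspects <param> elements of a <SearchIndex> fragment.
public class SearchIndexParamCheck {

    // Returns the params in document order, keyed by the name attribute
    // exactly as written (surrounding whitespace is preserved).
    public static Map<String, String> readParams(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList params = doc.getElementsByTagName("param");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            result.put(p.getAttribute("name"), p.getAttribute("value"));
        }
        return result;
    }

    // True when the name has leading or trailing whitespace.
    public static boolean hasStrayWhitespace(String name) {
        return !name.equals(name.trim());
    }

    public static void main(String[] args) throws Exception {
        // Demo input: note the deliberate trailing space in one name.
        String xml =
            "<SearchIndex class=\"org.apache.jackrabbit.core.query.lucene.SearchIndex\">"
            + "<param name=\"path\" value=\"${wsp.home}/index\"/>"
            + "<param name=\"textFilterClasses\""
            + " value=\"org.apache.jackrabbit.extractor.PlainTextExtractor\"/>"
            + "<param name=\"extractorPoolSize \" value=\"2\"/>"
            + "<param name=\"supportHighlighting\" value=\"true\"/>"
            + "</SearchIndex>";
        for (Map.Entry<String, String> e : readParams(xml).entrySet()) {
            String name = e.getKey();
            String warning = hasStrayWhitespace(name) ? "  <-- stray whitespace" : "";
            System.out.println("[" + name + "] = " + e.getValue() + warning);
        }
    }
}
```

Running it against the demo input flags only the "extractorPoolSize " entry; the other three names print cleanly.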



2010/4/28 Jukka Zitting <[email protected]>

> Hi,
>
> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
> <[email protected]> wrote:
> > I'm inserting PDFs into the repository and get this exception:
> >
> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
> >      at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>
> The underlying PDFBox library is having trouble with your PDF file,
> which results in a warning being logged. This is not too serious; the
> only downside is that this PDF might not show up in full-text
> searches.
>
> You may want to report this to [email protected] or to the
> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>
> BR,
>
> Jukka Zitting
>



-- 
Jose Hernandez
675599600
Isthari
http://www.isthari.com
