Re: Problem con textExtractor

JOSE FELIX HERNANDEZ BARRIO Wed, 28 Apr 2010 02:24:14 -0700

Is there any limit on the size of PDF the extractor can handle?

We're working with files of around 16 MB.





2010/4/28 JOSE FELIX HERNANDEZ BARRIO <[email protected]>

> I don't want to index the content of the PDFs for full-text search.
> Can I disable it using the configuration below?
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>     <param name="path" value="${wsp.home}/index"/>
>     <param name="textFilterClasses"
>            value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
>     <param name="extractorPoolSize" value="2"/>
>     <param name="supportHighlighting" value="true"/>
> </SearchIndex>
>
>
>
> 2010/4/28 Jukka Zitting <[email protected]>
>
>> Hi,
>>
>> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
>> <[email protected]> wrote:
>> > I'm inserting PDFs into the repository and getting this exception:
>> >
>> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
>> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
>> >      at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>>
>> The underlying PDFBox library is having trouble with your PDF file,
>> which results in a warning being logged. This is not too serious; the
>> only downside is that this PDF might not show up in full-text
>> searches.
>>
>> You may want to report this to [email protected] or to the
>> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>>
>> BR,
>>
>> Jukka Zitting
>>
>
>
>
> --
> Jose Hernandez
> 675599600
> Isthari
> http://www.isthari.com
>



-- 
Jose Hernandez
675599600
Isthari
http://www.isthari.com
