Is there any limit on the size of PDF the extractor can handle? We're working with files of around 16 MB.
2010/4/28 JOSE FELIX HERNANDEZ BARRIO <[email protected]>

> I don't want to index the content of PDFs for full-text search. Can I
> disable it using the configuration below?
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>   <param name="path" value="${wsp.home}/index"/>
>   <param name="textFilterClasses"
>          value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
>   <param name="extractorPoolSize" value="2"/>
>   <param name="supportHighlighting" value="true"/>
> </SearchIndex>
>
> 2010/4/28 Jukka Zitting <[email protected]>
>
>> Hi,
>>
>> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
>> <[email protected]> wrote:
>> > I'm inserting PDFs into the repository and getting this exception:
>> >
>> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
>> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
>> >     at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>>
>> The underlying PDFBox library is having trouble with your PDF file,
>> which results in a warning being logged. This is not too serious; the
>> only downside is that this PDF might not show up in full-text
>> searches.
>>
>> You may want to report this to [email protected] or to the
>> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>>
>> BR,
>>
>> Jukka Zitting
>
> --
> Jose Hernandez
> 675599600
> Isthari
> http://www.isthari.com
