The error you see comes from PDFBox trying to extract the text content of the PDF 
for the search index. Unfortunately this seems to happen quite often, though not with 
all PDFs. It is not really significant unless you absolutely need your 
documents indexed for search.

-will

On 18.06.2010, at 12:04, Klein, Ingeborg wrote:

> Hi,
>  
> While activating a page containing PDFs in Magnolia 4.3.1, a 
> NullPointerException was thrown with the message “Failed to extract PDF text 
> content”.
> The page was activated properly and the PDF is not corrupt, neither in the 
> author instance nor in the public instance, so why is this exception thrown?
>  
> The page was imported from Magnolia version 4.1. During the import, the 
> same error was thrown, but the page was imported successfully and the PDF 
> is not corrupt.
>  
> This error also occurs when a PDF is uploaded.
>  
> This error was not thrown in Magnolia 4.1.
>  
>  
> Regards
>  
> Inge
>  
> 1)       : Failed to extract PDF text content
> java.lang.NullPointerException
>         at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>         at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>         at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
>         at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
>         at org.apache.jackrabbit.extractor.PdfTextExtractor.extractText(PdfTextExtractor.java:75)
>         at org.apache.jackrabbit.extractor.CompositeTextExtractor.extractText(CompositeTextExtractor.java:90)
>         at org.apache.jackrabbit.core.query.lucene.JackrabbitTextExtractor.extractText(JackrabbitTextExtractor.java:195)
>         at org.apache.jackrabbit.core.query.lucene.TextExtractorJob$1.call(TextExtractorJob.java:93)
>         at EDU.oswego.cs.dl.util.concurrent.FutureResult$1.run(Unknown Source)
>         at org.apache.jackrabbit.core.query.lucene.TextExtractorJob.run(TextExtractorJob.java:172)
>         at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-06-18 09:48:04,990 WARN  org.apache.jackrabbit.extractor.PdfTextExtractor  : Failed to extract PDF text content
> 2010-06-18 09:48:05,006 INFO  info.magnolia.module.exchangesimple.ReceiveFilter : User superuser successfuly activated news on magnoliaPublic.
>  
>  
> I found a posting about this error with the following solution:
>  
> “I replaced the version of pdfbox (0.6.4) that is bundled with the jackrabbit 
> war file with a more recent version (0.7.3 and fontbox 0.1) and it worked fine. 
> The bundled versions should be upgraded.”
>  
> However, after checking Magnolia's WEB-INF/lib, I found that Magnolia 
> 4.3.1 already includes pdfbox-0.7.3.jar and fontbox-0.1.0.jar.
> 
> 
> Regards
>  
> Inge
> 
> 
> ----------------------------------------------------------------
> For list details see
> http://www.magnolia-cms.com/home/community/mailing-lists.html
> To unsubscribe, E-mail to: <[email protected]>
> ----------------------------------------------------------------
