[ https://issues.apache.org/jira/browse/JCR-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger resolved JCR-2388.
-----------------------------------

       Resolution: Invalid
    Fix Version/s:     (was: 2.0-beta2)

As of Jackrabbit 2.0, the jackrabbit-text-extractors module has been replaced 
with a dependency on Apache Tika 0.4, which includes PDFBox 0.7.3.

If you are using Jackrabbit 1.x, I suggest you write your own text 
extractor that uses PDFBox 0.8.0 and configure it accordingly in 
workspace.xml.
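As a rough sketch of the configuration side, assuming Jackrabbit 1.x, where text extractors implement `org.apache.jackrabbit.extractor.TextExtractor` and are registered through the `textFilterClasses` parameter of the `SearchIndex` element: the bundled `org.apache.jackrabbit.extractor.PdfTextExtractor` would be dropped from the list and a custom extractor added in its place. The class name `com.example.Pdfbox08TextExtractor` is hypothetical, and the list shown is shortened; keep whichever other extractors you still need.

```xml
<!-- workspace.xml (Jackrabbit 1.x), sketch only.
     com.example.Pdfbox08TextExtractor is a hypothetical custom extractor
     built against PDFBox 0.8.0 (org.apache.pdfbox.* packages).
     The bundled PdfTextExtractor is intentionally left out of the list. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PlainTextExtractor,com.example.Pdfbox08TextExtractor"/>
</SearchIndex>
```

Both the custom extractor class and the PDFBox 0.8.0 jar must be on the repository's classpath for this to take effect.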

For Jackrabbit 2.0, we'd have to wait for Tika 0.5, which will include PDFBox 
0.8.0 (http://issues.apache.org/jira/browse/TIKA-158).

> Upgrade PDFBox to version 0.8.0
> -------------------------------
>
>                 Key: JCR-2388
>                 URL: https://issues.apache.org/jira/browse/JCR-2388
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: jackrabbit-text-extractors
>    Affects Versions: 2.0-beta1
>            Reporter: William Woodward
>
> The most recent version of PDFBox fixes a bug in its PDFParser class that 
> caused a NullPointerException when extracting text from documents created 
> with Acrobat Pro version 9 (see 
> https://issues.apache.org/jira/browse/PDFBOX-361). Since this is their first 
> Apache Incubator release, they have also changed the package names. Therefore, 
> simply dropping in the new PDFBox is not an option, because the Jackrabbit text 
> extractor references the old package names.
> This is a MAJOR problem for us, since our user community recently upgraded to 
> Acrobat 9 (and we have no control over this decision). Our users produce 
> time-sensitive reports. Without an updated Jackrabbit (with an updated PDFBox), 
> we can no longer extract and index text from our users' PDFs.
> Thank you for your consideration in this matter,
> Bill Woodward
> Developer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
