[ https://issues.apache.org/jira/browse/TIKA-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reassigned TIKA-1994:
---------------------------------

    Assignee: Tim Allison

> Integrate OCR with PDFParser
> ----------------------------
>
>                 Key: TIKA-1994
>                 URL: https://issues.apache.org/jira/browse/TIKA-1994
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>
> Users can now run OCR on individual images embedded inline in PDFs, provided
> they configure the parser to do so.
> It might be useful to run OCR against each rendered page (instead of the
> component images).
> Integrating OCR is on the roadmap for PDFBox 2.1 (PDFBOX-1912). This issue will
> allow us to experiment with strategies until the cleaner integration is
> available with PDFBox 2.1.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)