[jira] [Commented] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653028#comment-15653028 ] Hudson commented on TIKA-2175: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1138 (See

[ANNOUNCE] Apache Tika 1.14 release

2016-11-09 Chris Mattmann
Hi, The Apache Tika project is pleased to announce the release of Apache Tika 1.14. The release contents have been pushed out to the main Apache release site and to the Central sync, so the releases should be available as soon as the mirrors get the syncs. Apache Tika is a toolkit for detecting

[jira] [Commented] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652951#comment-15652951 ] Tim Allison commented on TIKA-2175: --- Fixed this in trunk. Will fix in 2.x tomorrow. Thank you,

[jira] [Created] (TIKA-2175) Enable extraction of inlined jp2/jpx from PDF

2016-11-09 Tim Allison (JIRA)
Tim Allison created TIKA-2175: - Summary: Enable extraction of inlined jp2/jpx from PDF Key: TIKA-2175 URL: https://issues.apache.org/jira/browse/TIKA-2175 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652937#comment-15652937 ] Hudson commented on TIKA-2174: -- SUCCESS: Integrated in Jenkins build tika-2.x #172 (See

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652940#comment-15652940 ] Hudson commented on TIKA-2159: -- SUCCESS: Integrated in Jenkins build tika-2.x #172 (See

[jira] [Commented] (TIKA-2173) Add extractInlineImages to PDFParser to enable parameter setting via config

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652938#comment-15652938 ] Hudson commented on TIKA-2173: -- SUCCESS: Integrated in Jenkins build tika-2.x #172 (See

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652931#comment-15652931 ] Hudson commented on TIKA-2159: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1137 (See

tika-2.x-windows - Build # 73 - Still Failing

2016-11-09 Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #73) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/73/ to view the results.

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652875#comment-15652875 ] Hudson commented on TIKA-2174: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #73 (See

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652877#comment-15652877 ] Hudson commented on TIKA-2159: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #73 (See

[jira] [Commented] (TIKA-2173) Add extractInlineImages to PDFParser to enable parameter setting via config

2016-11-09 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652876#comment-15652876 ] Hudson commented on TIKA-2173: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #73 (See

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652870#comment-15652870 ] Tim Allison commented on TIKA-2159: --- bq. ParsingEmbeddedDocumentExtractor already has some non-ideal

[RESULT] [VOTE] Apache Tika 1.14 Release Candidate #1

2016-11-09 Chris Mattmann
Hi, This VOTE has PASSED with the following tallies: +1 Chris Mattmann* Tim Allison* Bob Paulin* Konstantin Gribov* I’ll go ahead and push to the mirrors and update the website. Thanks to all who VOTEd! Cheers, Chris On 10/19/16, 11:48 AM, "Chris Mattmann" wrote:

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652861#comment-15652861 ] Tim Allison commented on TIKA-2159: --- Subclassing won't work because many (most?) parsers don't process

[jira] [Commented] (TIKA-2096) Tika 2.0 -- Supply AutoDetectParser for embedded documents if user forgets to pass it in via ParseContext

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652854#comment-15652854 ] Tim Allison commented on TIKA-2096: --- We may want to accelerate this and put it into Tika 1.15. I just

[jira] [Resolved] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2174. --- Resolution: Fixed There's more work to be done on jpx/jp2 extraction from PDFs, but I've added those

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651543#comment-15651543 ] Tim Allison commented on TIKA-2159: --- #1 it is. Thank you, [~gagravarr]. > Handle pre-parse embedded

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651535#comment-15651535 ] Nick Burch commented on TIKA-2159: -- Given that we don't control all the parsers, I'm worried things my

[jira] [Comment Edited] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648562#comment-15648562 ] Tim Allison edited comment on TIKA-2159 at 11/9/16 5:30 PM: For the general

[jira] [Commented] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651511#comment-15651511 ] Tim Allison commented on TIKA-2159: --- [~gagravarr] or other devs any recommendations/preference? > Handle

[jira] [Comment Edited] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648562#comment-15648562 ] Tim Allison edited comment on TIKA-2159 at 11/9/16 5:26 PM: For the general

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651415#comment-15651415 ] Tim Allison commented on TIKA-2174: --- Ok, y, we're seeing the same thing. I asked

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651347#comment-15651347 ] Matthew Caruana Galizia commented on TIKA-2174: --- That issue went away once I added 'jp2' and

updated Wiki on OCR for PDFs

2016-11-09 Allison, Timothy B.
https://wiki.apache.org/tika/PDFParser%20%28Apache%20PDFBox%29 Please update as you see fit.

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651077#comment-15651077 ] Tim Allison commented on TIKA-2174: --- Thank you. If you could share the stacktrace on this issue that you

[jira] [Commented] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650892#comment-15650892 ] Matthew Caruana Galizia commented on TIKA-2174: --- Both on inline and independent files. I've

[jira] [Updated] (TIKA-2174) Too few formats in support declared by TesseractOCRParser

2016-11-09 Matthew Caruana Galizia (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Caruana Galizia updated TIKA-2174: -- Description: A complete install of Leptonica with Tesseract will add support for

[jira] [Commented] (TIKA-2174) JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser

2016-11-09 Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650860#comment-15650860 ] Tim Allison commented on TIKA-2174: --- Thank you for opening this. Will fix. To confirm, you're running ocr

[jira] [Created] (TIKA-2174) JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser

2016-11-09 Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2174: - Summary: JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser Key: TIKA-2174 URL: https://issues.apache.org/jira/browse/TIKA-2174