[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515169#comment-15515169 ] Hudson commented on TIKA-2093: -- SUCCESS: Integrated in Jenkins build tika-2.x #148 (See

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515128#comment-15515128 ] Hudson commented on TIKA-2093: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #52 (See

tika-2.x-windows - Build # 52 - Still Failing

2016-09-22 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #52) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/52/ to view the results.

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515082#comment-15515082 ] Hudson commented on TIKA-2093: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1106 (See

[jira] [Resolved] (TIKA-1627) Authentication for fileUrl

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1627. --- Resolution: Won't Fix > Authentication for fileUrl > -- > >

[jira] [Commented] (TIKA-1627) Authentication for fileUrl

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515073#comment-15515073 ] Tim Allison commented on TIKA-1627: --- We removed fileUrl in Tika 1.10 because it was a [security

[jira] [Comment Edited] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515035#comment-15515035 ] Tim Allison edited comment on TIKA-2093 at 9/23/16 1:26 AM: [~epugh], I made a

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515035#comment-15515035 ] Tim Allison commented on TIKA-2093: --- [~epugh], I made a few modifications. The biggest was parsing hocr

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515007#comment-15515007 ] ASF GitHub Bot commented on TIKA-2093: -- Github user asfgit closed the pull request at:

[GitHub] tika pull request #133: add hOCR output format to TesseractParser TIKA-2093

2016-09-22 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/133

[jira] [Assigned] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2093: - Assignee: Tim Allison > Add hOCR output type to the TesseractOCRParser >

[jira] [Updated] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2093: -- Description: I've tweaked the TesseractOCRParser and TesseractOCRConfig to add the "txt" or "hocr"

[jira] [Updated] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2093: -- Description: FI've tweaked the TesseractOCRParser and TesseractOCRConfig to add the "txt" or "hocr"

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514642#comment-15514642 ] Tim Allison commented on TIKA-2093: --- On mobile, can't do full review. If hocr output is xhtml, we'll prob

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514215#comment-15514215 ] ASF GitHub Bot commented on TIKA-2093: -- GitHub user epugh opened a pull request:

[GitHub] tika pull request #133: add hOCR output format to TesseractParser TIKA-2093

2016-09-22 Thread epugh
GitHub user epugh opened a pull request: https://github.com/apache/tika/pull/133 add hOCR output format to TesseractParser TIKA-2093 Small change to Tesseract OCR code to add the hOCR outputType. In the future we can add `pdf` and `tsv` as output types as well. First

[jira] [Comment Edited] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514157#comment-15514157 ] Tim Allison edited comment on TIKA-2091 at 9/22/16 7:07 PM: This particular

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514163#comment-15514163 ] Rodrigo Rosenfeld Rosas commented on TIKA-2091: --- Thanks for your investigation efforts, but I

[jira] [Resolved] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2091. --- Resolution: Not A Problem Fix Version/s: (was: 1.7) This particular exception is caused by

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514142#comment-15514142 ] Rodrigo Rosenfeld Rosas commented on TIKA-2091: --- Great, good job :) Anyway, there's no hurry

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514131#comment-15514131 ] Tim Allison commented on TIKA-2091: --- Y, I'm able to reproduce it in Solr trunk. The issue is caused by

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514103#comment-15514103 ] Rodrigo Rosenfeld Rosas commented on TIKA-2091: --- I just tried running on Solr 6.1.0, which

[jira] [Comment Edited] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513977#comment-15513977 ] Rodrigo Rosenfeld Rosas edited comment on TIKA-2091 at 9/22/16 5:51 PM:

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513977#comment-15513977 ] Rodrigo Rosenfeld Rosas commented on TIKA-2091: --- I just confirmed it happens with the main

[jira] [Created] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Eric Pugh (JIRA)
Eric Pugh created TIKA-2093: --- Summary: Add hOCR output type to the TesseractOCRParser Key: TIKA-2093 URL: https://issues.apache.org/jira/browse/TIKA-2093 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-2092) Integrate Math equation image extraction

2016-09-22 Thread Craig Pfeifer (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Pfeifer updated TIKA-2092: Description: "A general-purpose, deep learning-based system to decompile an image into

[jira] [Created] (TIKA-2092) Integrate Math equation image extraction

2016-09-22 Thread Craig Pfeifer (JIRA)
Craig Pfeifer created TIKA-2092: --- Summary: Integrate Math equation image extraction Key: TIKA-2092 URL: https://issues.apache.org/jira/browse/TIKA-2092 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513795#comment-15513795 ] Rodrigo Rosenfeld Rosas commented on TIKA-2091: --- Hmm, I'll try to get more details about it,

[jira] [Commented] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513790#comment-15513790 ] Tim Allison commented on TIKA-2091: --- Y, this is the place. Thank you. I'm not able to reproduce with

[jira] [Created] (TIKA-2091) regression: Zip bomb detected! for HTML file

2016-09-22 Thread Rodrigo Rosenfeld Rosas (JIRA)
Rodrigo Rosenfeld Rosas created TIKA-2091: - Summary: regression: Zip bomb detected! for HTML file Key: TIKA-2091 URL: https://issues.apache.org/jira/browse/TIKA-2091 Project: Tika

RE: Tika 1.14?

2016-09-22 Thread Allison, Timothy B.
Thank you, Chris! -----Original Message----- From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Thursday, September 22, 2016 12:25 PM To: dev@tika.apache.org Subject: Re: Tika 1.14? Sounds great to me Tim. If you tell me when the tests are done, I’d be happy to RC a release! On

Re: Tika 1.14?

2016-09-22 Thread Chris Mattmann
Sounds great to me Tim. If you tell me when the tests are done, I’d be happy to RC a release! On 9/21/16, 11:31 AM, "Allison, Timothy B." wrote: All, PDFBox 2.0.3 is now integrated, I'm about to push the integration with POI-3.15. I have a few cleanup things

[jira] [Commented] (TIKA-2090) Extract javascript from PDActions in PDFs

2016-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513632#comment-15513632 ] Tim Allison commented on TIKA-2090: --- How hard could it be? :)

[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents

2016-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513354#comment-15513354 ] Hudson commented on TIKA-2069: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1105 (See

[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents

2016-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513351#comment-15513351 ] Hudson commented on TIKA-2069: -- SUCCESS: Integrated in Jenkins build tika-2.x #147 (See

tika-2.x-windows - Build # 51 - Still Failing

2016-09-22 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #51) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/51/ to view the results.

[jira] [Closed] (TIKA-2088) Library conflict with Parser-Tika Plugin and Lib Folder

2016-09-22 Thread Christian Weber (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Weber closed TIKA-2088. - Resolution: Not A Problem Somehow I messed up picking the correct Project. I'm sorry for the