[jira] [Created] (TIKA-2788) Upgrade to PDFBox 2.0.13 when available

2018-11-26 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2788:
Summary: Upgrade to PDFBox 2.0.13 when available
Key: TIKA-2788
URL: https://issues.apache.org/jira/browse/TIKA-2788
Project: Tika
Issue Type: Task

[jira] [Created] (TIKA-2787) Make WriteLimitReachedException public

2018-11-26 Thread Dmitry Goldenberg (JIRA)
Dmitry Goldenberg created TIKA-2787:
Summary: Make WriteLimitReachedException public
Key: TIKA-2787
URL: https://issues.apache.org/jira/browse/TIKA-2787
Project: Tika
Issue Type: Bug

[jira] [Updated] (TIKA-2787) Make WriteLimitReachedException public and not subclass of SAXException

2018-11-26 Thread Dmitry Goldenberg (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Goldenberg updated TIKA-2787: Summary: Make WriteLimitReachedException public and not subclass of SAXException (was:

Comparing extracted text with pdftotext

2018-11-26 Thread Tim Allison
All, I just finished drafting a high-level "lab report" comparing pdftotext and Tika/PDFBox on the PDFs in our refreshed regression corpus: https://wiki.apache.org/tika/ComparisonTikaAndPDFToText201811. The more interesting bits are in the actual reports from tika-eval and/or the comparison

[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699108#comment-16699108 ] Tim Allison commented on TIKA-2776: Three cheers for logging, and thank you for your patience in

[jira] [Comment Edited] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2018-11-26 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698539#comment-16698539 ] Hans Brende edited comment on TIKA-2038 at 11/26/18 2:38 PM: The success of

[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Mario Bisonti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698892#comment-16698892 ] Mario Bisonti commented on TIKA-2776: Now I tried to start Tika with the command: java

[jira] [Commented] (TIKA-2776) Tika server child restart

2018-11-26 Thread Mario Bisonti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698868#comment-16698868 ] Mario Bisonti commented on TIKA-2776: Hello Tim, now I have both logs for the Tika server: parent