[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766083#comment-15766083 ] Pascal Essiembre commented on TIKA-1946: It now throws a TikaException as you suggest. For child

[jira] [Comment Edited] (TIKA-1788) message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

2016-12-20 Thread Derek Hardison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766074#comment-15766074 ] Derek Hardison edited comment on TIKA-1788 at 12/21/16 4:24 AM: The *

[jira] [Commented] (TIKA-1788) message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

2016-12-20 Thread Derek Hardison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766074#comment-15766074 ] Derek Hardison commented on TIKA-1788: -- The * indicates the name is wrapped and it can be any number

[jira] [Closed] (TIKA-2094) Error parsing .doc file with visio embed

2016-12-20 Thread wangruochan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangruochan closed TIKA-2094. - verified as complete > Error parsing .doc file with visio embed > >

[jira] [Commented] (TIKA-2190) Add "preserve_interword_spaces" option of tesseract

2016-12-20 Thread Bipul Kumar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765820#comment-15765820 ] Bipul Kumar commented on TIKA-2190: --- Hi Tim, if you are okay with it, should I take this up? I want to

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765696#comment-15765696 ] Luis Filipe Nassif commented on TIKA-1946: -- Thank you, Pascal! I think it may be better to throw

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765514#comment-15765514 ] Hudson commented on TIKA-2219: -- SUCCESS: Integrated in Jenkins build tika-2.x #185 (See

[jira] [Commented] (TIKA-2189) Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765501#comment-15765501 ] Hudson commented on TIKA-2189: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1164 (See

[jira] [Commented] (TIKA-2190) Add "preserve_interword_spaces" option of tesseract

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765500#comment-15765500 ] Hudson commented on TIKA-2190: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1164 (See

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765412#comment-15765412 ] Hudson commented on TIKA-2219: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #86 (See

tika-2.x-windows - Build # 86 - Still Failing

2016-12-20 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #86) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/86/ to view the results.

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765359#comment-15765359 ] Tim Allison commented on TIKA-1946: --- W00t! Christmas came early. I'll take a look tomorrow. Thank you!

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765348#comment-15765348 ] Pascal Essiembre commented on TIKA-1946: I finally had a bit of time to port the WordPerfect parser

[jira] [Comment Edited] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765348#comment-15765348 ] Pascal Essiembre edited comment on TIKA-1946 at 12/20/16 9:51 PM: -- I

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765347#comment-15765347 ] Tim Allison commented on TIKA-2219: --- Looks like they aren't twiddling with the confidence scores any more

[jira] [Comment Edited] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765347#comment-15765347 ] Tim Allison edited comment on TIKA-2219 at 12/20/16 9:51 PM: - Looks like they

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765343#comment-15765343 ] Tim Allison commented on TIKA-2219: --- Great. Thank you! > CharsetDetector no longer detects windows-1252

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765330#comment-15765330 ] ASF GitHub Bot commented on TIKA-1946: -- GitHub user essiembre opened a pull request:

[GitHub] tika pull request #141: New WordPerfect and QuattroPro parsers for TIKA-1946...

2016-12-20 Thread essiembre
GitHub user essiembre opened a pull request: https://github.com/apache/tika/pull/141 New WordPerfect and QuattroPro parsers for TIKA-1946 contributed by pascal.essiembre You can merge this pull request into a Git repository by running: $ git pull

[jira] [Resolved] (TIKA-2189) Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2189. --- Resolution: Fixed Fix Version/s: 1.15 Thank you! > Default value mismatch for

[jira] [Resolved] (TIKA-2190) Add "preserve_interword_spaces" option of tesseract

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2190. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Thank you! > Add

[GitHub] tika pull request #139: [TIKA-2189] fix for Default value mismatch for "enab...

2016-12-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/139 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[jira] [Commented] (TIKA-2189) Default value mismatch for "enableImageProcessing" in TesseractOCRConfig.properties and TesseractOCRConfig.java

2016-12-20 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765300#comment-15765300 ] ASF GitHub Bot commented on TIKA-2189: -- Github user asfgit closed the pull request at:

[jira] [Commented] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765208#comment-15765208 ] Hudson commented on TIKA-2221: -- UNSTABLE: Integrated in Jenkins build tika-2.x #184 (See

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765207#comment-15765207 ] Hudson commented on TIKA-2219: -- UNSTABLE: Integrated in Jenkins build tika-2.x #184 (See

[jira] [Assigned] (TIKA-2190) Add "preserve_interword_spaces" option of tesseract

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2190: - Assignee: Tim Allison > Add "preserve_interword_spaces" option of tesseract >

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765144#comment-15765144 ] Hudson commented on TIKA-2219: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #85 (See

tika-2.x-windows - Build # 85 - Still Failing

2016-12-20 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #85) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/85/ to view the results.

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765129#comment-15765129 ] Hudson commented on TIKA-2219: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1163 (See

tika-2.x - Build # 183 - Failure

2016-12-20 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #183) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/183/ to view the results.

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765059#comment-15765059 ] Pascal Essiembre commented on TIKA-2219: BTW, I tested and can confirm your fix works just fine. >

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765034#comment-15765034 ] Pascal Essiembre commented on TIKA-2219: I am relying on CharsetDetector. Thanks for the fix! >

[jira] [Resolved] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2219. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > CharsetDetector no longer

[jira] [Commented] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765007#comment-15765007 ] Hudson commented on TIKA-2221: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #84 (See

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765010#comment-15765010 ] Tim Allison commented on TIKA-2219: --- Y, your diagnosis is correct. Thank you. So that we capture the

tika-2.x-windows - Build # 84 - Still Failing

2016-12-20 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #84) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/84/ to view the results.

[jira] [Commented] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764968#comment-15764968 ] Hudson commented on TIKA-2221: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1162 (See

[jira] [Commented] (TIKA-2220) Refactor/merge new experimental docx/pptx components

2016-12-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764967#comment-15764967 ] Hudson commented on TIKA-2220: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1162 (See

[jira] [Resolved] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2221. --- Resolution: Fixed Fix Version/s: 2.0 Thank you! > poi.EncryptedDocumentException not wrapped

[jira] [Resolved] (TIKA-2220) Refactor/merge new experimental docx/pptx components

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2220. --- Resolution: Fixed Fix Version/s: 1.15 We may want to split these out again in the future... >

[jira] [Created] (TIKA-2221) poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException

2016-12-20 Thread Matthew Caruana Galizia (JIRA)
Matthew Caruana Galizia created TIKA-2221: - Summary: poi.EncryptedDocumentException not wrapped in tika.exception.EncryptedDocumentException Key: TIKA-2221 URL:

Re: Apache Tika issue review (TIKA-2190 & TIKA-2189)

2016-12-20 Thread Chris Mattmann
Moving dev-owner to BCC. I think you meant to send this to dev@tika.apache.org, so sending there :) From: Bipul Kumar Date: Tuesday, December 20, 2016 at 1:54 AM To: "dev-ow...@tika.apache.org", "talli...@mitre.org"

[jira] [Created] (TIKA-2220) Refactor/merge new experimental docx/pptx components

2016-12-20 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2220: - Summary: Refactor/merge new experimental docx/pptx components Key: TIKA-2220 URL: https://issues.apache.org/jira/browse/TIKA-2220 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764363#comment-15764363 ] Tim Allison commented on TIKA-2219: --- Thank you for opening this. This was caused by our "upgrade" to our

[jira] [Commented] (TIKA-2201) OutOfMemoryError on a reasonably sized document

2016-12-20 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764283#comment-15764283 ] Tim Allison commented on TIKA-2201: --- I saved a single slide from the test document, and I'm getting an