[jira] [Created] (TIKA-2922) Regression issue with detecting .dotx and .xlam MS Office mime-types

2019-08-12 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2922: -- Summary: Regression issue with detecting .dotx and .xlam MS Office mime-types Key: TIKA-2922 URL: https://issues.apache.org/jira/browse/TIKA-2922 Project: Tika

[jira] [Commented] (TIKA-2490) Turn off stderr warnings in Tika-app

2018-02-03 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351487#comment-16351487 ] Pascal Essiembre commented on TIKA-2490: I still believe the warnings should be there by default

[jira] [Created] (TIKA-2530) OutlookExtractor "buffer underrun" when parsing .msg with embedded .msg

2017-12-16 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2530: -- Summary: OutlookExtractor "buffer underrun" when parsing .msg with embedded .msg Key: TIKA-2530 URL: https://issues.apache.org/jira/browse/TIKA-2530 Project:

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-04 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997267#comment-15997267 ] Pascal Essiembre commented on TIKA-2352: I also checked some of the Quattro Pro ones, and those I

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-04 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997238#comment-15997238 ] Pascal Essiembre commented on TIKA-2352: FYI,

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-04 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997216#comment-15997216 ] Pascal Essiembre commented on TIKA-2352: I had time to look further at one of the files in lists:

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-03 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995584#comment-15995584 ] Pascal Essiembre commented on TIKA-2352: No problem. I'd be curious to know how many problematic

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-03 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995549#comment-15995549 ] Pascal Essiembre commented on TIKA-2352: Must have got lost in the mail! :-) I just made a pull

[jira] [Commented] (TIKA-2352) Incorrect EOF exception in WordPerfect parser

2017-05-03 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995451#comment-15995451 ] Pascal Essiembre commented on TIKA-2352: Found the cause. My assumption was wrong that the opening

[jira] [Commented] (TIKA-2232) Add JBIG2 image parsing support

2017-01-11 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818741#comment-15818741 ] Pascal Essiembre commented on TIKA-2232: Either way. I think the most important is not to have

[jira] [Updated] (TIKA-2232) Add JBIG2 image parsing support

2017-01-05 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-2232: --- Component/s: (was: detector) > Add JBIG2 image parsing support >

[jira] [Created] (TIKA-2232) Add JBIG2 image parsing support

2017-01-05 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2232: -- Summary: Add JBIG2 image parsing support Key: TIKA-2232 URL: https://issues.apache.org/jira/browse/TIKA-2232 Project: Tika Issue Type: New Feature

[jira] [Created] (TIKA-2228) WordPerfect parser update to support 5.x

2016-12-23 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2228: -- Summary: WordPerfect parser update to support 5.x Key: TIKA-2228 URL: https://issues.apache.org/jira/browse/TIKA-2228 Project: Tika Issue Type:

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-22 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770744#comment-15770744 ] Pascal Essiembre commented on TIKA-1946: This is code imported from one of our existing project

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769350#comment-15769350 ] Pascal Essiembre commented on TIKA-1946: FYI, I found relevant information about 5.x file format.

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768959#comment-15768959 ] Pascal Essiembre commented on TIKA-1946: I also like the idea of an {{UnsupportedFormatException}}

[jira] [Updated] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-1946: --- Attachment: wordperfect_signatures_by_versions.xlsx In case you are curious, I am attaching a

[jira] [Updated] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-1946: --- Attachment: TIKA-1946-pascal.essiembre-01.patch I created a patch that will now throw a

[jira] [Comment Edited] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768146#comment-15768146 ] Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:03 PM: --

[jira] [Comment Edited] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768146#comment-15768146 ] Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 9:02 PM: --

[jira] [Comment Edited] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768146#comment-15768146 ] Pascal Essiembre edited comment on TIKA-1946 at 12/21/16 8:57 PM: --

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768146#comment-15768146 ] Pascal Essiembre commented on TIKA-1946: WordPerfect extensions vary quite a bit. But the parser I

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768100#comment-15768100 ] Pascal Essiembre commented on TIKA-1946: I also checked. Looks like a version issue. Files that

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767939#comment-15767939 ] Pascal Essiembre commented on TIKA-1946: So what would be the percentage that are parsed properly?

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767725#comment-15767725 ] Pascal Essiembre commented on TIKA-1946: H2 works for me. I downloaded the files you shared and

[jira] [Created] (TIKA-2222) Contributing a XFDL Parser

2016-12-21 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2222: -- Summary: Contributing a XFDL Parser Key: TIKA-2222 URL: https://issues.apache.org/jira/browse/TIKA-2222 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767695#comment-15767695 ] Pascal Essiembre commented on TIKA-1946: I am not sure when I may have time to benefit from it, but

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767676#comment-15767676 ] Pascal Essiembre commented on TIKA-1946: Thanks! > Add mime detection and parser for WordPerfect >

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767641#comment-15767641 ] Pascal Essiembre commented on TIKA-1946: You are welcome! I am glad to contribute to such a great

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767599#comment-15767599 ] Pascal Essiembre commented on TIKA-1946: I noticed you have some corporate copyright notices

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-21 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15767581#comment-15767581 ] Pascal Essiembre commented on TIKA-1946: I am OK to remove it as you are correct, .wb2 is not

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15766083#comment-15766083 ] Pascal Essiembre commented on TIKA-1946: It now throws a TikaException as you suggest. For child

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765348#comment-15765348 ] Pascal Essiembre commented on TIKA-1946: I finally had a bit of time to port the WordPerfect parser

[jira] [Comment Edited] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765348#comment-15765348 ] Pascal Essiembre edited comment on TIKA-1946 at 12/20/16 9:51 PM: -- I

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765059#comment-15765059 ] Pascal Essiembre commented on TIKA-2219: BTW, I tested and can confirm your fix works just fine. >

[jira] [Commented] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765034#comment-15765034 ] Pascal Essiembre commented on TIKA-2219: I am relying on CharsetDetector. Thanks for the fix! >

[jira] [Updated] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-19 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-2219: --- Description: Starting with Tika 1.14, windows-1252 is no longer detected, as ISO-8859-1 is

[jira] [Created] (TIKA-2219) CharsetDetector no longer detects windows-1252 charset

2016-12-19 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-2219: -- Summary: CharsetDetector no longer detects windows-1252 charset Key: TIKA-2219 URL: https://issues.apache.org/jira/browse/TIKA-2219 Project: Tika Issue

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-16 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148914#comment-15148914 ] Pascal Essiembre commented on TIKA-1607: In the case of XFA forms, the form IS the content. One

[jira] [Created] (TIKA-1857) Enhance PDFParser to extract text from XFA forms

2016-02-15 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-1857: -- Summary: Enhance PDFParser to extract text from XFA forms Key: TIKA-1857 URL: https://issues.apache.org/jira/browse/TIKA-1857 Project: Tika Issue Type:

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140212#comment-15140212 ] Pascal Essiembre commented on TIKA-741: --- Awesome, thanks! > "Zip bomb" (XML nesting) detection is too

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140217#comment-15140217 ] Pascal Essiembre commented on TIKA-741: --- So the best way to submit PDFBox 2.0.0 related

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140376#comment-15140376 ] Pascal Essiembre commented on TIKA-741: --- Got it, thanks! > "Zip bomb" (XML nesting) detection is too

[jira] [Issue Comment Deleted] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-741: -- Comment: was deleted (was: Got it. Thanks!) > "Zip bomb" (XML nesting) detection is too strict >

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-08 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137497#comment-15137497 ] Pascal Essiembre commented on TIKA-741: --- What? That easy? Those two simple lines did it in my local

[jira] [Commented] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-06 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136128#comment-15136128 ] Pascal Essiembre commented on TIKA-741: --- It looks like maxDepth 100 is not enough. I am using Tika

[jira] [Commented] (TIKA-1837) HtmlEncodingDetector wrongly detects charset from commented meta

2016-01-20 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108892#comment-15108892 ] Pascal Essiembre commented on TIKA-1837: How often? It was the first and only time I have

[jira] [Created] (TIKA-1837) HtmlEncodingDetector wrongly detects charset from commented meta

2016-01-19 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-1837: -- Summary: HtmlEncodingDetector wrongly detects charset from commented meta Key: TIKA-1837 URL: https://issues.apache.org/jira/browse/TIKA-1837 Project: Tika

[jira] [Created] (TIKA-1620) OUTPUT_FILE_TOKEN not being replaced in ExternalParser

2015-04-30 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-1620: -- Summary: OUTPUT_FILE_TOKEN not being replaced in ExternalParser Key: TIKA-1620 URL: https://issues.apache.org/jira/browse/TIKA-1620 Project: Tika Issue

[jira] [Updated] (TIKA-1286) Adding MS Visio VSDX to mime-types detection

2015-03-12 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-1286: --- Attachment: TIKA-1286.zip Here you go. One for each type. They do not hold real/significant

[jira] [Created] (TIKA-1286) Adding MS Visio VSDX to mime-types detection

2014-04-29 Thread Pascal Essiembre (JIRA)
Pascal Essiembre created TIKA-1286: -- Summary: Adding MS Visio VSDX to mime-types detection Key: TIKA-1286 URL: https://issues.apache.org/jira/browse/TIKA-1286 Project: Tika Issue Type: