[jira] [Commented] (TIKA-2420) Jackcess toSQLString throws UnsupportedOperationException for unknown query type

2017-07-06 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077062#comment-16077062 ] Nick C commented on TIKA-2420: -- I only noticed the Unknown type throwing that exception. The try/catch works

[jira] [Updated] (TIKA-2421) HTML Encoding Detector should ignore UTF-16 and UTF-32

2017-07-06 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick C updated TIKA-2421: - Attachment: test.html Here is an example file that, when run through Tika, gives a bunch of Chinese characters.

[jira] [Created] (TIKA-2421) HTML Encoding Detector should ignore UTF-16 and UTF-32

2017-07-05 Thread Nick C (JIRA)
Nick C created TIKA-2421: Summary: HTML Encoding Detector should ignore UTF-16 and UTF-32 Key: TIKA-2421 URL: https://issues.apache.org/jira/browse/TIKA-2421 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2420) Jackcess toSQLString throws UnsupportedOperationException for unknown query type

2017-07-05 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075572#comment-16075572 ] Nick C commented on TIKA-2420: -- I'm currently unable to share the document that causes the issue. > Jackcess

[jira] [Created] (TIKA-2420) Jackcess toSQLString throws UnsupportedOperationException for unknown query type

2017-07-05 Thread Nick C (JIRA)
Nick C created TIKA-2420: Summary: Jackcess toSQLString throws UnsupportedOperationException for unknown query type Key: TIKA-2420 URL: https://issues.apache.org/jira/browse/TIKA-2420 Project: Tika

[jira] [Commented] (TIKA-1885) Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar

2016-05-08 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275792#comment-15275792 ] Nick C commented on TIKA-1885: -- Just saw the changes and noticed a bug; you need to add a mark(1) and reset
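
The mark(1)/reset point in the comment above can be illustrated with plain java.io. This is only a minimal sketch, not the actual detector code from the patch: the class name ZeroLengthCheck and the method isEmpty are hypothetical, and it assumes the caller passes a stream that supports mark/reset. The idea is that a zero-length check has to read one byte, so it should mark first and reset afterwards so later detectors and parsers still see the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Tika.
public class ZeroLengthCheck {

    /**
     * Returns true if the stream has no bytes left to read.
     * The stream position is restored before returning.
     */
    public static boolean isEmpty(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);                  // remember the position; we read at most 1 byte
        try {
            return in.read() == -1;  // -1 means end of stream, i.e. nothing to read
        } finally {
            in.reset();              // rewind so the probed byte is not lost
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        InputStream data = new ByteArrayInputStream(new byte[] {42});
        System.out.println("empty stream: " + isEmpty(empty));
        System.out.println("one-byte stream: " + isEmpty(data));
        // The byte consumed by the probe is still available after reset().
        System.out.println("first byte after check: " + data.read());
    }
}
```

Without the mark/reset pair, the single read() would consume the first byte of every non-empty file, which is the kind of bug the comment describes.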

[jira] [Commented] (TIKA-1885) Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar

2016-05-08 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275758#comment-15275758 ] Nick C commented on TIKA-1885: -- I was looking at the code for ZeroSizeFileDetector and noticed the use of

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-25 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256870#comment-15256870 ] Nick C commented on TIKA-1513: -- Tested more files using the full regex and haven't had any false positives. :D

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249129#comment-15249129 ] Nick C commented on TIKA-1513: -- Sounds good. I'll be running this on more files this week and will report back

[jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463 ] Nick C edited comment on TIKA-1513 at 4/19/16 7:33 PM: --- I was running this on more

[jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463 ] Nick C edited comment on TIKA-1513 at 4/19/16 7:32 PM: --- I was running this on more

[jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463 ] Nick C edited comment on TIKA-1513 at 4/19/16 7:31 PM: --- I was running this on more

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463 ] Nick C commented on TIKA-1513: -- I was running this on more data and ran into a text file that matched. It

[jira] [Commented] (TIKA-1953) tika-server NullPointerException while processing rtfs

2016-04-18 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246846#comment-15246846 ] Nick C commented on TIKA-1953: -- I usually use a TransformerHandler instead of the ToXMLContentHandler but the

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-17 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245081#comment-15245081 ] Nick C commented on TIKA-1513: -- Did some more testing and simplified the rules enough that it could be made in

[jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-17 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245081#comment-15245081 ] Nick C edited comment on TIKA-1513 at 4/18/16 3:09 AM: --- Did some more testing and

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-13 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239960#comment-15239960 ] Nick C commented on TIKA-1513: -- bq. Well, you know there's still plenty of time to get that into Tika 2.0

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-13 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239836#comment-15239836 ] Nick C commented on TIKA-1513: -- I added the license header. I think some of the checks could be removed. I'll

[jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-11 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236171#comment-15236171 ] Nick C edited comment on TIKA-1513 at 4/11/16 11:10 PM: Some of my checks may be a

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-11 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236171#comment-15236171 ] Nick C commented on TIKA-1513: -- Some of my checks may be a little strict because you can have extra bytes at

[jira] [Commented] (TIKA-1948) Catch exceptions per page in PDFParser

2016-04-11 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236151#comment-15236151 ] Nick C commented on TIKA-1948: -- This is a good change. I had a patch to do something similar but I only

[jira] [Commented] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-04-11 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235631#comment-15235631 ] Nick C commented on TIKA-1946: -- Some is always better than none. Could also try to use wpd2html from

[jira] [Updated] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-04-10 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick C updated TIKA-1946: - Description: I noticed some code on github for parsing WordPerfect files (https://github.com/Norconex/importer)

[jira] [Created] (TIKA-1946) Add mime detection and parser for WordPerfect

2016-04-10 Thread Nick C (JIRA)
Nick C created TIKA-1946: Summary: Add mime detection and parser for WordPerfect Key: TIKA-1946 URL: https://issues.apache.org/jira/browse/TIKA-1946 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-10 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234276#comment-15234276 ] Nick C commented on TIKA-1513: -- I wrote the detector from scratch a couple months ago because 0x03 caused too

[jira] [Commented] (TIKA-1945) Powerpoint parser doesn't extract text from diagrams

2016-04-09 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233797#comment-15233797 ] Nick C commented on TIKA-1945: -- Also, while looking into the code I noticed AbstractOOXMLExtractor.getXHTML

[jira] [Created] (TIKA-1945) Powerpoint parser doesn't extract text from diagrams

2016-04-09 Thread Nick C (JIRA)
Nick C created TIKA-1945: Summary: Powerpoint parser doesn't extract text from diagrams Key: TIKA-1945 URL: https://issues.apache.org/jira/browse/TIKA-1945 Project: Tika Issue Type: Bug

[jira] [Created] (TIKA-1944) Add mime magic for apple single/double files

2016-04-09 Thread Nick C (JIRA)
Nick C created TIKA-1944: Summary: Add mime magic for apple single/double files Key: TIKA-1944 URL: https://issues.apache.org/jira/browse/TIKA-1944 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-09 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233743#comment-15233743 ] Nick C commented on TIKA-1513: -- I ended up building a detector that tries to validate the dbf header instead

[jira] [Commented] (TIKA-1927) NPE in JDBCTableReader

2016-04-03 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223562#comment-15223562 ] Nick C commented on TIKA-1927: -- One thing I noticed but didn't fix was the handling of nulls for primitive

[jira] [Created] (TIKA-1927) NPE in JDBCTableReader

2016-04-03 Thread Nick C (JIRA)
Nick C created TIKA-1927: Summary: NPE in JDBCTableReader Key: TIKA-1927 URL: https://issues.apache.org/jira/browse/TIKA-1927 Project: Tika Issue Type: Bug Components: parser Affects

[jira] [Commented] (TIKA-1916) NPE in OpenDocumentParser

2016-04-01 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222652#comment-15222652 ] Nick C commented on TIKA-1916: -- I manually created that one because I didn't have the original file. I'll try

[jira] [Updated] (TIKA-1916) NPE in OpenDocumentParser

2016-04-01 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick C updated TIKA-1916: - Attachment: MissingMeta.odt Attached is a test file. Nice catch; I totally overlooked the closing of the

[jira] [Commented] (TIKA-1914) ExecutableParser doesn't call start document

2016-04-01 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1502#comment-1502 ] Nick C commented on TIKA-1914: -- This error happens when using saxon instead of xalan. Xalan doesn't error

[jira] [Updated] (TIKA-1916) NPE in OpenDocumentParser

2016-03-30 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick C updated TIKA-1916: - External issue URL: https://github.com/apache/tika/pull/94 External issue ID:

[jira] [Created] (TIKA-1916) NPE in OpenDocumentParser

2016-03-30 Thread Nick C (JIRA)
Nick C created TIKA-1916: Summary: NPE in OpenDocumentParser Key: TIKA-1916 URL: https://issues.apache.org/jira/browse/TIKA-1916 Project: Tika Issue Type: Bug Components: parser

[jira] [Updated] (TIKA-1914) ExecutableParser doesn't call start document

2016-03-30 Thread Nick C (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick C updated TIKA-1914: - External issue URL: https://github.com/apache/tika/pull/93 External issue ID:

[jira] [Created] (TIKA-1914) ExecutableParser doesn't call start document

2016-03-30 Thread Nick C (JIRA)
Nick C created TIKA-1914: Summary: ExecutableParser doesn't call start document Key: TIKA-1914 URL: https://issues.apache.org/jira/browse/TIKA-1914 Project: Tika Issue Type: Bug