[
https://issues.apache.org/jira/browse/TIKA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077062#comment-16077062
]
Nick C commented on TIKA-2420:
--
I only noticed the Unknown type throwing that exception. The try/catch works
[
https://issues.apache.org/jira/browse/TIKA-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick C updated TIKA-2421:
-
Attachment: test.html
Here is an example file that, when run through Tika, produces a bunch of Chinese
characters.
Nick C created TIKA-2421:
Summary: HTML Encoding Detector should ignore UTF-16 and UTF-32
Key: TIKA-2421
URL: https://issues.apache.org/jira/browse/TIKA-2421
Project: Tika
Issue Type: Bug
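The idea in the summary can be sketched in code. A charset label found inside an HTML `<meta>` tag was read as ASCII-compatible bytes, so a declared UTF-16 or UTF-32 label is very unlikely to describe the real encoding. The following is a minimal sketch under that assumption, not Tika's detector; the names `MetaCharsetFilter` and `usableMetaCharset` are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of Tika.
final class MetaCharsetFilter {

    /** Returns the declared charset unless it is a UTF-16/UTF-32 variant or unknown. */
    static Optional<Charset> usableMetaCharset(String declared) {
        if (declared == null) {
            return Optional.empty();
        }
        String name = declared.trim().toUpperCase(Locale.ROOT);
        // Assumption: a meta tag that could be parsed cannot really be UTF-16/32.
        if (name.startsWith("UTF-16") || name.startsWith("UTF-32")) {
            return Optional.empty();
        }
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(usableMetaCharset("utf-16").isPresent());
        System.out.println(usableMetaCharset("utf-8").isPresent());
    }
}
```

A caller would fall back to its other detection steps whenever the result is empty.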
[
https://issues.apache.org/jira/browse/TIKA-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075572#comment-16075572
]
Nick C commented on TIKA-2420:
--
I'm currently unable to share the document that causes the issue.
> Jackcess
Nick C created TIKA-2420:
Summary: Jackcess toSQLString throws UnsupportedOperationException
for unknown query type
Key: TIKA-2420
URL: https://issues.apache.org/jira/browse/TIKA-2420
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275792#comment-15275792
]
Nick C commented on TIKA-1885:
--
Just saw the changes and noticed a bug; you need to add a mark(1) and reset
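For context, a check that peeks at a stream to see whether it is empty has to restore the stream position afterward, which is what `mark(1)` and `reset` are for. Below is a minimal sketch of that pattern using plain `java.io`, assuming the stream supports mark/reset. `EmptyStreamCheck` and `isEmpty` are hypothetical names; this is not the ZeroSizeFileDetector source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class EmptyStreamCheck {

    /** Returns true if the stream has no bytes left, without consuming any. */
    static boolean isEmpty(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);              // remember the current position, 1 byte of read-ahead
        try {
            return in.read() == -1;
        } finally {
            in.reset();          // give the peeked byte back to later readers
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {42});
        System.out.println(isEmpty(in)); // false
        System.out.println(in.read());   // 42, the peeked byte was not consumed
    }
}
```

Without the `mark`/`reset` pair, the first byte would be lost to whatever detector or parser reads the stream next.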
[
https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275758#comment-15275758
]
Nick C commented on TIKA-1885:
--
I was looking at the code for ZeroSizeFileDetector and noticed the use of
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256870#comment-15256870
]
Nick C commented on TIKA-1513:
--
Tested more files using the full regex and haven't had any false positives. :D
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249129#comment-15249129
]
Nick C commented on TIKA-1513:
--
Sounds good. I'll be running this on more files this week and will report back
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463
]
Nick C edited comment on TIKA-1513 at 4/19/16 7:33 PM:
---
I was running this on more
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463
]
Nick C edited comment on TIKA-1513 at 4/19/16 7:32 PM:
---
I was running this on more
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463
]
Nick C edited comment on TIKA-1513 at 4/19/16 7:31 PM:
---
I was running this on more
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463
]
Nick C commented on TIKA-1513:
--
I was running this on more data and ran into a text file that matched. It
[
https://issues.apache.org/jira/browse/TIKA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246846#comment-15246846
]
Nick C commented on TIKA-1953:
--
I usually use a TransformerHandler instead of the ToXMLContentHandler but the
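As a rough illustration of the approach mentioned above, a JAXP `TransformerHandler` is itself a SAX `ContentHandler`, so it can receive SAX events and serialize them as XML. The sketch below uses only standard JDK APIs and fires the events by hand; with Tika, the handler would presumably be passed to a parser instead. The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class, for illustration only.
final class TransformerHandlerSketch {

    /** Serializes a single <p> element containing the given text. */
    static String serialize(String text)
            throws TransformerConfigurationException, SAXException {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Events a parser would normally emit, fired manually here.
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("hello"));
    }
}
```

Note that the handler expects `startDocument` before any element events; a source that skips it may behave differently depending on the transformer implementation in use.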
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245081#comment-15245081
]
Nick C commented on TIKA-1513:
--
Did some more testing and simplified the rules enough that it could be made in
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245081#comment-15245081
]
Nick C edited comment on TIKA-1513 at 4/18/16 3:09 AM:
---
Did some more testing and
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239960#comment-15239960
]
Nick C commented on TIKA-1513:
--
bq. Well, you know there's still plenty of time to get that into Tika 2.0
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239836#comment-15239836
]
Nick C commented on TIKA-1513:
--
I added the license header. I think some of the checks could be removed. I'll
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236171#comment-15236171
]
Nick C edited comment on TIKA-1513 at 4/11/16 11:10 PM:
Some of my checks may be a
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236171#comment-15236171
]
Nick C commented on TIKA-1513:
--
Some of my checks may be a little strict because you can have extra bytes at
[
https://issues.apache.org/jira/browse/TIKA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236151#comment-15236151
]
Nick C commented on TIKA-1948:
--
This is a good change. I had a patch to do something similar but I only
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235631#comment-15235631
]
Nick C commented on TIKA-1946:
--
Some is always better than none. Could also try to use wpd2html from
[
https://issues.apache.org/jira/browse/TIKA-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick C updated TIKA-1946:
-
Description: I noticed some code on github for parsing WordPerfect files
(https://github.com/Norconex/importer)
Nick C created TIKA-1946:
Summary: Add mime detection and parser for WordPerfect
Key: TIKA-1946
URL: https://issues.apache.org/jira/browse/TIKA-1946
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234276#comment-15234276
]
Nick C commented on TIKA-1513:
--
I wrote the detector from scratch a couple months ago because 0x03 caused too
[
https://issues.apache.org/jira/browse/TIKA-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233797#comment-15233797
]
Nick C commented on TIKA-1945:
--
Also, while looking into the code, I noticed AbstractOOXMLExtractor.getXHTML
Nick C created TIKA-1945:
Summary: Powerpoint parser doesn't extract text from diagrams
Key: TIKA-1945
URL: https://issues.apache.org/jira/browse/TIKA-1945
Project: Tika
Issue Type: Bug
Nick C created TIKA-1944:
Summary: Add mime magic for apple single/double files
Key: TIKA-1944
URL: https://issues.apache.org/jira/browse/TIKA-1944
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233743#comment-15233743
]
Nick C commented on TIKA-1513:
--
I ended up building a detector that tries to validate the dbf header instead
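To make the idea of validating a header concrete, here is a minimal sketch that checks several fields rather than a single magic byte. It is not the detector from this issue. The field offsets follow the commonly documented dBase III layout (version at byte 0, last-update month and day at bytes 2 and 3, little-endian header length at bytes 8 to 9, record length at bytes 10 to 11), which is an assumption here, and `DbfHeaderCheck` and `looksLikeDbf` are hypothetical names.

```java
// Hypothetical helper, for illustration only; offsets assume the dBase III layout.
final class DbfHeaderCheck {

    /** Returns true if the first 32 bytes are plausible as a dBase III header. */
    static boolean looksLikeDbf(byte[] h) {
        if (h == null || h.length < 32) {
            return false;
        }
        if ((h[0] & 0xFF) != 0x03) {            // version byte
            return false;
        }
        int month = h[2] & 0xFF;                // last-update month
        int day = h[3] & 0xFF;                  // last-update day
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            return false;
        }
        // 16-bit little-endian lengths
        int headerLen = (h[8] & 0xFF) | ((h[9] & 0xFF) << 8);
        int recordLen = (h[10] & 0xFF) | ((h[11] & 0xFF) << 8);
        // 32-byte fixed header plus at least the terminator byte
        return headerLen >= 33 && recordLen >= 1;
    }

    public static void main(String[] args) {
        byte[] header = new byte[32];
        header[0] = 0x03;
        header[2] = 4;      // April
        header[3] = 11;     // 11th
        header[8] = 65;     // header length 65
        header[10] = 10;    // record length 10
        System.out.println(looksLikeDbf(header)); // true
    }
}
```

Each extra field check lowers the chance that an arbitrary file that happens to start with 0x03 is misidentified.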
[
https://issues.apache.org/jira/browse/TIKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223562#comment-15223562
]
Nick C commented on TIKA-1927:
--
One thing I noticed but didn't fix was the handling of nulls for primitive
Nick C created TIKA-1927:
Summary: NPE in JDBCTableReader
Key: TIKA-1927
URL: https://issues.apache.org/jira/browse/TIKA-1927
Project: Tika
Issue Type: Bug
Components: parser
Affects
[
https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222652#comment-15222652
]
Nick C commented on TIKA-1916:
--
I manually created that one because I didn't have the original file. I'll try
[
https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick C updated TIKA-1916:
-
Attachment: MissingMeta.odt
Attached is a test file. Nice catch; I totally overlooked the closing of the
[
https://issues.apache.org/jira/browse/TIKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1502#comment-1502
]
Nick C commented on TIKA-1914:
--
This error happens when using Saxon instead of Xalan. Xalan doesn't error
[
https://issues.apache.org/jira/browse/TIKA-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick C updated TIKA-1916:
-
External issue URL: https://github.com/apache/tika/pull/94
External issue ID:
Nick C created TIKA-1916:
Summary: NPE in OpenDocumentParser
Key: TIKA-1916
URL: https://issues.apache.org/jira/browse/TIKA-1916
Project: Tika
Issue Type: Bug
Components: parser
[
https://issues.apache.org/jira/browse/TIKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick C updated TIKA-1914:
-
External issue URL: https://github.com/apache/tika/pull/93
External issue ID:
Nick C created TIKA-1914:
Summary: ExecutableParser doesn't call start document
Key: TIKA-1914
URL: https://issues.apache.org/jira/browse/TIKA-1914
Project: Tika
Issue Type: Bug
38 matches