[jira] [Comment Edited] (TIKA-1657) Allow easier dumping of TikaConfig file from tika-core

2015-08-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723941#comment-14723941 ] Tim Allison edited comment on TIKA-1657 at 8/31/15 7:50 PM: I looked at this a

[jira] [Updated] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-08-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1657: -- Summary: Allow easier XML serialization of TikaConfig (was: Allow easier dumping of TikaConfig file

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723352#comment-14723352 ] Tim Allison commented on TIKA-1723: --- My personal preference would be to add to whatever metadata we have

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-08-31 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723347#comment-14723347 ] Tim Allison commented on TIKA-1723: --- Agreed on complexity of multilingual lang id. You would definitely

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725330#comment-14725330 ] Tim Allison commented on TIKA-1657: --- Plan C: abandon the notion of full round-trip-ability and serialize

[jira] [Commented] (TIKA-1729) OCR in PDF files

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729048#comment-14729048 ] Tim Allison commented on TIKA-1729: --- Are you able to share an example file? > OCR in PDF files >

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729480#comment-14729480 ] Tim Allison commented on TIKA-1657: --- >> then I'd say that's what we ought to give them back! I think we

[jira] [Comment Edited] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729480#comment-14729480 ] Tim Allison edited comment on TIKA-1657 at 9/3/15 6:08 PM: --- bq. then I'd say

[jira] [Updated] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1657: -- Attachment: TIKA-1657v1.patch First very rough draft of code... Will need more input on whether the

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729638#comment-14729638 ] Tim Allison commented on TIKA-1723: --- Y, I agree...that's a potential mess/challenge/opportunity. We

[jira] [Comment Edited] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729480#comment-14729480 ] Tim Allison edited comment on TIKA-1657 at 9/3/15 6:10 PM: --- bq. then I'd say

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729517#comment-14729517 ] Tim Allison commented on TIKA-1657: --- Hmmm...not sure I see the difference. I do see a difference

[jira] [Comment Edited] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729517#comment-14729517 ] Tim Allison edited comment on TIKA-1657 at 9/3/15 6:27 PM: --- Hmmm...not sure I see

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729593#comment-14729593 ] Tim Allison commented on TIKA-1657: --- Ah, ok, got it. Thank you. Y, current plan was just the latter.

[jira] [Comment Edited] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729452#comment-14729452 ] Tim Allison edited comment on TIKA-1657 at 9/3/15 5:48 PM: --- Current version of

[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729468#comment-14729468 ] Tim Allison commented on TIKA-1723: --- Makes sense. I proposed moving it over just so that we didn't lose

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732013#comment-14732013 ] Tim Allison commented on TIKA-1728: --- Should we look into seeing if the author of

[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734808#comment-14734808 ] Tim Allison commented on TIKA-1728: --- Opened separate ticket for potential integration: TIKA-1731. >

[jira] [Created] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1731: - Summary: Try to integrate java-hwp into Tika Key: TIKA-1731 URL: https://issues.apache.org/jira/browse/TIKA-1731 Project: Tika Issue Type: New Feature

[jira] [Updated] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1731: -- Description: Now that we have detection working for hwp files, it would be great to add a parser.

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734821#comment-14734821 ] Tim Allison commented on TIKA-1726: --- My preference would be for {{getPath()}} and

[jira] [Commented] (TIKA-1729) OCR in PDF files

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729036#comment-14729036 ] Tim Allison commented on TIKA-1729: --- Where is it failing? Are you able to extract the inline images into

[jira] [Updated] (TIKA-1729) OCR in PDF files

2015-09-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1729: -- Issue Type: Improvement (was: Bug) > OCR in PDF files > > > Key:

[jira] [Commented] (TIKA-1513) Add mime detection and parsing for dbf files

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734927#comment-14734927 ] Tim Allison commented on TIKA-1513: --- Hi [~iryndin], I wanted to check in to see if you've had a chance to

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734812#comment-14734812 ] Tim Allison commented on TIKA-1731: --- One other library

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-08 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734812#comment-14734812 ] Tim Allison edited comment on TIKA-1731 at 9/8/15 1:46 PM: --- One other library

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738660#comment-14738660 ] Tim Allison commented on TIKA-1731: --- Great. Thank you so much! It would be helpful to keep this

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738663#comment-14738663 ] Tim Allison commented on TIKA-1731: --- [~mungeol], out of curiosity, what is your gut

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738638#comment-14738638 ] Tim Allison commented on TIKA-1731: --- Thank you for looking into this. bq. can Tika+POI as they are

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738660#comment-14738660 ] Tim Allison edited comment on TIKA-1731 at 9/10/15 12:26 PM: - Great. Thank you

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736676#comment-14736676 ] Tim Allison commented on TIKA-1731: --- [~mungeol], on another note...did hwp ever go the ooxml route after

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736663#comment-14736663 ] Tim Allison edited comment on TIKA-1731 at 9/9/15 11:08 AM: Thank you for the

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736663#comment-14736663 ] Tim Allison commented on TIKA-1731: --- Thank you for the feedback! Are there other options? I'm a bit

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737046#comment-14737046 ] Tim Allison commented on TIKA-1732: --- Any chance there's an old version of POI on your class path? How

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736959#comment-14736959 ] Tim Allison commented on TIKA-1732: --- Odd...What happens if you call TikaInputStream.get() on the actual

[jira] [Comment Edited] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736959#comment-14736959 ] Tim Allison edited comment on TIKA-1732 at 9/9/15 2:56 PM: --- Odd...What happens if

[jira] [Commented] (TIKA-1733) RuntimeException when parsing some word (.doc) documents

2015-09-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739005#comment-14739005 ] Tim Allison commented on TIKA-1733: --- Can't figure out what's going wrong, I've opened:

[jira] [Comment Edited] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738660#comment-14738660 ] Tim Allison edited comment on TIKA-1731 at 9/16/15 10:52 AM: - Great. Thank you

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14747338#comment-14747338 ] Tim Allison edited comment on TIKA-1607 at 9/16/15 11:31 AM: - Thank you,

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14747338#comment-14747338 ] Tim Allison commented on TIKA-1607: --- Thank you, [~rgauss], for your thoughtful responses and example

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14747292#comment-14747292 ] Tim Allison commented on TIKA-1731: --- Please don't stop watching. We can use your help! Many thanks for

[jira] [Commented] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-11 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740625#comment-14740625 ] Tim Allison commented on TIKA-1731: --- Based on only a very cursory look at the examples+specs you sent,

[jira] [Updated] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1755: -- Attachment: TIKA-1755.patch Initial patch > Make ppt and pptx paragraph/div breaks more consistent >

[jira] [Created] (TIKA-1755) Make ppt and pptx paragraph/div breaks more consistent

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1755: - Summary: Make ppt and pptx paragraph/div breaks more consistent Key: TIKA-1755 URL: https://issues.apache.org/jira/browse/TIKA-1755 Project: Tika Issue Type:

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938893#comment-14938893 ] Tim Allison commented on TIKA-1757: --- Sorry about that. Will fix shortly. Thank you! > tika-batch tests

[jira] [Assigned] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1757: - Assignee: Tim Allison > tika-batch tests fail on systems with whitespace or special chars in

[jira] [Commented] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938913#comment-14938913 ] Tim Allison commented on TIKA-1757: --- Y, won't be able to fix for a few hours, but I can replicate the

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936698#comment-14936698 ] Tim Allison commented on TIKA-1748: --- Thank you! Will commit today. > Upgrade to POI 3.13-final when

[jira] [Comment Edited] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939102#comment-14939102 ] Tim Allison edited comment on TIKA-1758 at 10/1/15 12:27 AM: - r1706178. Thank

[jira] [Resolved] (TIKA-1758) BatchCommandLineBuilder fails on systems with whitespace in path

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1758. --- Resolution: Fixed r1706178. Thank you, [~kunda]. > BatchCommandLineBuilder fails on systems with

[jira] [Resolved] (TIKA-1757) tika-batch tests fail on systems with whitespace or special chars in folder name

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1757. --- Resolution: Fixed Mea culpa. Tests pass for me on Windows with space in path and Linux. Let me know

[jira] [Resolved] (TIKA-1756) Update forbiddenapis to v2.0

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1756. --- Resolution: Fixed r1706242. Thank you, [~thetaphi]! > Update forbiddenapis to v2.0 >

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939881#comment-14939881 ] Tim Allison commented on TIKA-1759: --- If we don't want to call all of the above {{dc:contributor}}...

[jira] [Updated] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1759: -- Attachment: contributors.zip I've created test files for MSOffice docs. If anyone would be willing to

[jira] [Created] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1759: - Summary: Extract contributor metadata from supporting file formats Key: TIKA-1759 URL: https://issues.apache.org/jira/browse/TIKA-1759 Project: Tika Issue Type:

[jira] [Updated] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1759: -- Description: Many common file formats store information about contributors (broadly speaking) to a

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940091#comment-14940091 ] Tim Allison commented on TIKA-1759: --- Y, it is. I think it would be useful to try to get other broader

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-01 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940009#comment-14940009 ] Tim Allison commented on TIKA-1759: --- [~tilman], I think we're good with PDAnnotationMarkup's

[jira] [Commented] (TIKA-1285) Upgrade to PDFBox 2.0.0 when available

2015-10-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942248#comment-14942248 ] Tim Allison commented on TIKA-1285: --- Completely agree. If I update the PDFBox 2.0 branch of Tika on my

[jira] [Commented] (TIKA-1759) Extract contributor metadata from supporting file formats

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940983#comment-14940983 ] Tim Allison commented on TIKA-1759: --- Will do. Thank you! > Extract contributor metadata from supporting

[jira] [Assigned] (TIKA-1761) Error Parsing PPT (97-2003) files with password protection against modification which were created using Office 2013

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1761: - Assignee: Tim Allison > Error Parsing PPT (97-2003) files with password protection against >

[jira] [Commented] (TIKA-1760) PDF index fulltext fails.

2015-10-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940981#comment-14940981 ] Tim Allison commented on TIKA-1760: --- Thank you for raising this issue. I'm not sure there's anything we

[jira] [Commented] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933186#comment-14933186 ] Tim Allison commented on TIKA-1748: --- As [~kunda] pointed out, you're using a future version of POI. :)

[jira] [Commented] (TIKA-1736) Bouncy Castle version binary incompatibility

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933175#comment-14933175 ] Tim Allison commented on TIKA-1736: --- Should be fixed when

[jira] [Comment Edited] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933186#comment-14933186 ] Tim Allison edited comment on TIKA-1748 at 9/28/15 11:40 AM: - As [~kunda]

[jira] [Assigned] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1747: - Assignee: Tim Allison > Change file->path in tika-batch throughout >

[jira] [Resolved] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1747. --- Resolution: Fixed r1706060 > Change file->path in tika-batch throughout >

[jira] [Resolved] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1754. --- Resolution: Fixed Fixed with TIKA-1747. > tika-batch's FileListCrawler truncates the first character

[jira] [Commented] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937440#comment-14937440 ] Tim Allison commented on TIKA-1754: --- Y, probably. This particular issue is fixed for now. Will take a

[jira] [Assigned] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-1744: - Assignee: Tim Allison > Use java.nio.file.Path in TikaInputStream >

[jira] [Created] (TIKA-1754) tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X:

2015-09-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1754: - Summary: tika-batch's FileListCrawler truncates the first character of the fileList if the root is e.g. X: Key: TIKA-1754 URL: https://issues.apache.org/jira/browse/TIKA-1754

[jira] [Resolved] (TIKA-1707) Upgrade to Apache POI 3.13 Beta 2

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1707. --- Resolution: Fixed r1706079. Thank you, [~kiwiwings]! Apologies to you and [~gagravarr] for not

[jira] [Resolved] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1742. --- Resolution: Fixed r1706086 > StackOverflowError parsing a PDF with ExtractInlineImages=true >

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937720#comment-14937720 ] Tim Allison commented on TIKA-1742: --- Thank you, [~nated], for raising this, and thank you [~tilman] for a

[jira] [Updated] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1748: -- Attachment: TIKA-1748.patch Y, not too much work. All tests pass, what could possibly go wrong? I

[jira] [Commented] (TIKA-1732) TikaException "Failed to close temporary resources" with AutoDetectParser on Windows

2015-09-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737440#comment-14737440 ] Tim Allison commented on TIKA-1732: --- NP. Thank you for closing the loop! Your test doc made me realize

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-10-05 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944295#comment-14944295 ] Tim Allison commented on TIKA-1737: --- [~alanbur], over on TIKA-1285, I posted a link for my github fork of

[jira] [Commented] (TIKA-1743) NetworkParser can create Unbounded Number of Threads

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904371#comment-14904371 ] Tim Allison commented on TIKA-1743: --- Oh, I wish I had time to finish off TIKA-1657 and

[jira] [Comment Edited] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904388#comment-14904388 ] Tim Allison edited comment on TIKA-1744 at 9/23/15 12:06 PM: - Thank you,

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904360#comment-14904360 ] Tim Allison commented on TIKA-1742: --- The HORROR! If it were a second rate conference, it would be one

[jira] [Created] (TIKA-1747) Change file->path in tika-batch throughout

2015-09-23 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1747: - Summary: Change file->path in tika-batch throughout Key: TIKA-1747 URL: https://issues.apache.org/jira/browse/TIKA-1747 Project: Tika Issue Type: Sub-task

[jira] [Commented] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-23 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904388#comment-14904388 ] Tim Allison commented on TIKA-1744: --- Thank you, [~kunda]! I think this was part of the earlier

[jira] [Created] (TIKA-1748) Upgrade to POI 3.13-final when available

2015-09-23 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1748: - Summary: Upgrade to POI 3.13-final when available Key: TIKA-1748 URL: https://issues.apache.org/jira/browse/TIKA-1748 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906238#comment-14906238 ] Tim Allison commented on TIKA-1742: --- [~tilman] fixed this over in PDFBox 1.8.x (already not an issue in

[jira] [Comment Edited] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902528#comment-14902528 ] Tim Allison edited comment on TIKA-1737 at 9/22/15 4:16 PM: bq. there were

[jira] [Comment Edited] (TIKA-1667) Upgrade to POI 3.13-beta1 when available

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907914#comment-14907914 ] Tim Allison edited comment on TIKA-1667 at 9/25/15 11:02 AM: - Which issue? Can

[jira] [Commented] (TIKA-1667) Upgrade to POI 3.13-beta1 when available

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907914#comment-14907914 ] Tim Allison commented on TIKA-1667: --- Which issue? > Upgrade to POI 3.13-beta1 when available >

[jira] [Commented] (TIKA-1753) Improper word concatenation when extracting pdf

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907917#comment-14907917 ] Tim Allison commented on TIKA-1753: --- Y. I defer to [~lehmi] on PDFBOX-2991 for whether this is fixable

[jira] [Commented] (TIKA-1657) Allow easier XML serialization of TikaConfig

2015-09-25 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907925#comment-14907925 ] Tim Allison commented on TIKA-1657: --- Thank you, [~gagravarr], for moving this forward...and for code that

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900508#comment-14900508 ] Tim Allison commented on TIKA-1737: --- Thank you for raising this issue. I don't think we've seen this

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902592#comment-14902592 ] Tim Allison commented on TIKA-1740: --- Oops. Nick beat me to it. That was plan B. [~gagravarr], do you

[jira] [Resolved] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1734. --- Resolution: Fixed r1704620. Thank you, [~kunda]! > Use java.nio.file.Path in TemporaryResources >

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902613#comment-14902613 ] Tim Allison commented on TIKA-1726: --- Thank you, [~kkrugler]. [~kunda], is there enough consensus on this

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902622#comment-14902622 ] Tim Allison commented on TIKA-1737: --- Could we have done something at the Tika level to cause this...I

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902659#comment-14902659 ] Tim Allison commented on TIKA-1737: --- bq. dating back as far as 1992 Y, I just confirmed that I can't

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902528#comment-14902528 ] Tim Allison commented on TIKA-1737: --- bq. there were many more that just had a single line of error Try

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902522#comment-14902522 ] Tim Allison commented on TIKA-1737: --- Thank you, [~tilman]! > PDFBox 1.8.10 is still a basket case >

[jira] [Commented] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902567#comment-14902567 ] Tim Allison commented on TIKA-1734: --- About to commit, unless you'd like to. :) > Use java.nio.file.Path

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902591#comment-14902591 ] Tim Allison commented on TIKA-1740: --- How about we store a list of pairs instead of

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902835#comment-14902835 ] Tim Allison commented on TIKA-1737: --- See PDFBOX-2986 for a resource leak discovered through testing
