[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903340#comment-14903340 ] Nick Burch commented on TIKA-1739: -- I'm not sure that the cTAKES parser should be creating an

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903252#comment-14903252 ] Chris A. Mattmann commented on TIKA-1739: - OK [~totaro] I implemented your solution (see attached

[jira] [Comment Edited] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903252#comment-14903252 ] Chris A. Mattmann edited comment on TIKA-1739 at 9/22/15 7:13 PM: -- OK

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903188#comment-14903188 ] Chris A. Mattmann commented on TIKA-1739: - Thanks Giuseppe, so I will try this fix now and update

[jira] [Created] (TIKA-1741) Include CTAKESConfig.properties within tika-parsers resources by default

2015-09-22 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created TIKA-1741: -- Summary: Include CTAKESConfig.properties within tika-parsers resources by default Key: TIKA-1741 URL: https://issues.apache.org/jira/browse/TIKA-1741

[jira] [Comment Edited] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902528#comment-14902528 ] Tim Allison edited comment on TIKA-1737 at 9/22/15 4:16 PM: bq. there were

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903123#comment-14903123 ] Giuseppe Totaro commented on TIKA-1739: --- Hi [~chrismattmann], Hi [~gagravarr], I looked at the last

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903389#comment-14903389 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDetectParser}} in the config,

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Alan Burlison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903422#comment-14903422 ] Alan Burlison commented on TIKA-1737: - bq. Re the ArrayIndexOutOfBoundsException - are you using

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903391#comment-14903391 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDetectParser}} in the config,

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903390#comment-14903390 ] Nick Burch commented on TIKA-1739: -- We explicitly don't let you set an {{AutoDetectParser}} in the config,

[jira] [Created] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-22 Thread Nathan Dire (JIRA)
Nathan Dire created TIKA-1742: - Summary: StackOverflowError parsing a PDF with ExtractInlineImages=true Key: TIKA-1742 URL: https://issues.apache.org/jira/browse/TIKA-1742 Project: Tika Issue

[jira] [Updated] (TIKA-1744) Use java.nio.file.Path in TikaInputStream

2015-09-22 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1744: -- Attachment: TIKA-1744.patch > Use java.nio.file.Path in TikaInputStream >

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1739: - Comment: was deleted (was: We explicitly don't let you set an {{AutoDetectParser}} in the config, it's

[jira] [Created] (TIKA-1743) NetworkParser can create Unbounded Number of Threads

2015-09-22 Thread Bob Paulin (JIRA)
Bob Paulin created TIKA-1743: Summary: NetworkParser can create Unbounded Number of Threads Key: TIKA-1743 URL: https://issues.apache.org/jira/browse/TIKA-1743 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903537#comment-14903537 ] Tilman Hausherr commented on TIKA-1737: --- No, PDFBOX-2987 is another one I fixed for you. The NPE in

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1739: - Comment: was deleted (was: We explicitly don't let you set an {{AutoDetectParser}} in the config, it's

[jira] [Updated] (TIKA-1742) StackOverflowError parsing a PDF with ExtractInlineImages=true

2015-09-22 Thread Nathan Dire (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Dire updated TIKA-1742: -- Description: Here's the file: http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf Code to repro

[jira] [Commented] (TIKA-1743) NetworkParser can create Unbounded Number of Threads

2015-09-22 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903878#comment-14903878 ] Tyler Palsulich commented on TIKA-1743: --- [Copied from the list] This sounds like a great idea! We

Re: [jira] [Created] (TIKA-1743) NetworkParser can create Unbounded Number of Threads

2015-09-22 Thread Tyler Palsulich
This sounds like a great idea! We should make the size of the pool configurable with TikaConfig. On Tue, Sep 22, 2015, 3:04 PM Bob Paulin (JIRA) wrote: > Bob Paulin created TIKA-1743: > > > Summary: NetworkParser can create

[jira] [Updated] (TIKA-1745) Add methods accepting java.nio.file.Path to org.apache.tika.Tika and org.apache.tika.parser.ParsingReader

2015-09-22 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1745: -- Attachment: TIKA-1745.patch > Add methods accepting java.nio.file.Path to org.apache.tika.Tika and >

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903670#comment-14903670 ] Chris A. Mattmann commented on TIKA-1739: - So, I'm going to take this to the list, but here is the

[jira] [Resolved] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-1739. - Resolution: Won't Fix Nick suggested a work-around, works fine. > cTAKESParser doesn't

[jira] [Updated] (TIKA-1746) modify TikaFileTypeDetector to use new detect method accepting java.nio.file.Path

2015-09-22 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1746: -- Attachment: TIKA-1746.patch > modify TikaFileTypeDetector to use new detect method accepting >

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902592#comment-14902592 ] Tim Allison commented on TIKA-1740: --- Oops. Nick beat me to it. That was plan B. [~gagravarr], do you

[jira] [Comment Edited] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Alan Burlison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902657#comment-14902657 ] Alan Burlison edited comment on TIKA-1737 at 9/22/15 1:51 PM: -- bq. Could we

[jira] [Commented] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902668#comment-14902668 ] Hudson commented on TIKA-1734: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #852 (See

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902601#comment-14902601 ] Nick Burch commented on TIKA-1739: -- I can't actually use the cTAKES parser on my machine - I tried

[jira] [Resolved] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1734. --- Resolution: Fixed r1704620. Thank you, [~kunda]! > Use java.nio.file.Path in TemporaryResources >

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902613#comment-14902613 ] Tim Allison commented on TIKA-1726: --- Thank you, [~kkrugler]. [~kunda], is there enough consensus on this

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902622#comment-14902622 ] Tim Allison commented on TIKA-1737: --- Could we have done something at the Tika level to cause this...I

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902659#comment-14902659 ] Tim Allison commented on TIKA-1737: --- bq. dating back as far as 1992 Y, I just confirmed that I can't

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Alan Burlison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902657#comment-14902657 ] Alan Burlison commented on TIKA-1737: - bq. Could we have done something at the Tika level to cause

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Andrea (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902694#comment-14902694 ] Andrea commented on TIKA-1740: -- Thanks for your reply. Of course I can create my own Recursive parser, but it

Re: [jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-22 Thread Yaniv Kunda
Yes, using getPath() for the getFile() counterpart. I'll prepare patches in a few hours. On Sep 22, 2015 4:35 PM, "Tim Allison (JIRA)" wrote: > > [ >

RE: [DISCUSS] Release Tika 1.11?

2015-09-22 Thread Allison, Timothy B.
Thank _you_ for all of your work in modernizing us. With your efforts, we'll be able to deprecate TikaInputStream#get(PunchCard pc) soon. :) >>Regarding FilenameUtils.getName() - I believe that its functionality can be >>replaced by Path.getFileName() - and in a platform-aware manner, as each

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902528#comment-14902528 ] Tim Allison commented on TIKA-1737: --- bq. there were many more that just had a single line of error Try

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Alan Burlison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902580#comment-14902580 ] Alan Burlison commented on TIKA-1737: - The heap dump is huge and the profiler struggles to cope so I

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902522#comment-14902522 ] Tim Allison commented on TIKA-1737: --- Thank you, [~tilman]! > PDFBox 1.8.10 is still a basket case >

[jira] [Commented] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902567#comment-14902567 ] Tim Allison commented on TIKA-1734: --- About to commit, unless you'd like to. :) > Use java.nio.file.Path

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902585#comment-14902585 ] Nick Burch commented on TIKA-1740: -- You might be better off writing your own Recursion handler. Take a

[jira] [Created] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Andrea (JIRA)
Andrea created TIKA-1740: Summary: RecursiveParserWrapper returning ContentHandler-s Key: TIKA-1740 URL: https://issues.apache.org/jira/browse/TIKA-1740 Project: Tika Issue Type: Wish

[jira] [Commented] (TIKA-1734) Use java.nio.file.Path in TemporaryResources

2015-09-22 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902553#comment-14902553 ] Bob Paulin commented on TIKA-1734: -- +1 from me on this [~kunda] > Use java.nio.file.Path in

[jira] [Commented] (TIKA-1740) RecursiveParserWrapper returning ContentHandler-s

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902591#comment-14902591 ] Tim Allison commented on TIKA-1740: --- How about we store a list of pairs instead of

[jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902823#comment-14902823 ] Chris A. Mattmann commented on TIKA-1739: - Nick I wonder if the approval got lost in email or in

[jira] [Commented] (TIKA-1737) PDFBox 1.8.10 is still a basket case

2015-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902835#comment-14902835 ] Tim Allison commented on TIKA-1737: --- See PDFBOX-2986 for a resource leak discovered through testing