[jira] [Commented] (TIKA-3048) Tika unable to parse html files with non UTF-8 charset

2020-02-24 Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17044199#comment-17044199 ] Akash commented on TIKA-3048: - Tested on different languages again. Here is the observation. Extraction

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043924#comment-17043924 ] Tim Allison commented on TIKA-3035: --- [~epugh] as I look back on this, how were you getting json with the

Re: Miredot License Key for Apache Tika Project

2020-02-24 Tyler Bui-Palsulich
Hello, Thank you for the license! It looks like it expired at the end of January. Would it be possible to renew it? Thanks, Tyler On Wed, Jul 23, 2014, 10:38 AM Lewis John Mcgibbney < lewis.mcgibb...@gmail.com> wrote: > Fantastic, thank you. > Best > Lewis > > > On Wed, Jul 23, 2014 at 12:00

[jira] [Commented] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-24 David Pilato (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043830#comment-17043830 ] David Pilato commented on TIKA-3006: Is it possible to get a SNAPSHOT version of this? Or do I have to

[jira] [Issue Comment Deleted] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3035: -- Comment: was deleted (was: I have no position on this. [~sorend] did not bring any further

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043805#comment-17043805 ] Tilman Hausherr commented on TIKA-3035: --- I have no position on this. [~sorend] did not bring any

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Soren Daugaard (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043802#comment-17043802 ] Soren Daugaard commented on TIKA-3035: -- I feel like this should be fixed since listing the embedded

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-24 David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043796#comment-17043796 ] David Eric Pugh commented on TIKA-3037: --- [~tallison] did you see the gettingstarted.apt patch file?

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043795#comment-17043795 ] Tim Allison commented on TIKA-3035: --- Should we close as "not a problem" or is there something we need to

[jira] [Commented] (TIKA-3039) Remove mvn dockerfile:build goal from tika-server

2020-02-24 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043788#comment-17043788 ] ASF GitHub Bot commented on TIKA-3039: -- tballison commented on issue #311: TIKA-3039 Remove

[jira] [Commented] (TIKA-3039) Remove mvn dockerfile:build goal from tika-server

2020-02-24 ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043789#comment-17043789 ] ASF GitHub Bot commented on TIKA-3039: -- tballison commented on issue #311: TIKA-3039 Remove

[jira] [Resolved] (TIKA-3031) NumberFormatException while parsing a certain PDF document

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3031. --- Resolution: Fixed Upgraded to PDFBox 2.0.19 today. > NumberFormatException while parsing a certain

[jira] [Resolved] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3037. --- Fix Version/s: 1.24 Resolution: Fixed I think we merged this? Please reopen if I botched

[jira] [Commented] (TIKA-2837) Performance/Stability problem in ToHTMLContentHandler

2020-02-24 Cristian Vat (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043776#comment-17043776 ] Cristian Vat commented on TIKA-2837: I guess this could be closed? It serves as documentation, if

[jira] [Commented] (TIKA-3038) Miredot license key expired

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043774#comment-17043774 ] Tim Allison commented on TIKA-3038: --- Thank you [~epugh]. [~tpalsulich], do I remember correctly that

[jira] [Resolved] (TIKA-3026) Consider extracting structure/tags where possible in PDFs with the PDFMarkedContentExtractor

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3026. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Consider extracting

[jira] [Resolved] (TIKA-3040) PDF inline OCR: Exception while processing certain image (others in same PDF work)

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3040. --- Fix Version/s: 1.24 Resolution: Fixed [~Mandalka], please reopen if {{branch_1x}} doesn't work

[jira] [Resolved] (TIKA-3017) OOM in XSLFSheet.java

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3017. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > OOM in XSLFSheet.java >

[jira] [Resolved] (TIKA-3006) Regression in PDF keywords extraction since 1.23

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3006. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed [~dadoonet] we should be

[jira] [Resolved] (TIKA-3047) Upgrade to POI 4.1.2

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3047. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Upgrade to POI 4.1.2 >

[jira] [Resolved] (TIKA-3033) Upgrade to PDFBox 2.0.19 when available

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3033. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed Thank you [~lehmi] and

[jira] [Resolved] (TIKA-3050) Add xmp extraction to psd files

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3050. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Add xmp extraction to

[jira] [Resolved] (TIKA-3045) Allow users to run custom parsing of xfa and xmp

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3045. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Allow users to run

[jira] [Resolved] (TIKA-3052) [Dependency] Unsafe Dependancy Resolution in com.beust:jcommander 1.35

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3052. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > [Dependency] Unsafe

[jira] [Resolved] (TIKA-3054) [Dependency] Cross-site Scripting (XSS) in org.apache.cxf:cxf-rt-transports-http 3.3.2

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3054. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > [Dependency] Cross-site

[jira] [Resolved] (TIKA-2952) Vulnerable "metadata-extractor 2.11.0" is present in tika 1.22.

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2952. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Vulnerable

[jira] [Commented] (TIKA-3056) General upgrades for 1.24

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043726#comment-17043726 ] Tim Allison commented on TIKA-3056: --- We shouldn't upgrade commons codec just yet:

[jira] [Created] (TIKA-3056) General upgrades for 1.24

2020-02-24 Tim Allison (Jira)
Tim Allison created TIKA-3056: - Summary: General upgrades for 1.24 Key: TIKA-3056 URL: https://issues.apache.org/jira/browse/TIKA-3056 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Resolved] (TIKA-2956) Stack Overflow issue reported on metadata-extractor used version by Tika

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2956. --- Resolution: Duplicate Will be fixed in 1.24 > Stack Overflow issue reported on metadata-extractor

[jira] [Comment Edited] (TIKA-2952) Vulnerable "metadata-extractor 2.11.0" is present in tika 1.22.

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043639#comment-17043639 ] Tim Allison edited comment on TIKA-2952 at 2/24/20 4:28 PM: I've pushed a

[jira] [Commented] (TIKA-2952) Vulnerable "metadata-extractor 2.11.0" is present in tika 1.22.

2020-02-24 Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043639#comment-17043639 ] Tim Allison commented on TIKA-2952: --- I've pushed a shaded version of adobe's library to maven central