[jira] [Updated] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-05-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1907: -- Fix Version/s: 3.0.0 > Big Pdf parsing to text - Out of memory >

[jira] [Comment Edited] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845590#comment-17845590 ] Tilman Hausherr edited comment on TIKA-4254 at 5/12/24 9:40 AM: THausherr

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845566#comment-17845566 ] Tilman Hausherr commented on TIKA-4254: --- Why would we ever run the test twice in the same

[jira] [Commented] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840922#comment-17840922 ] Tilman Hausherr commented on TIKA-4245: --- The file claims to be utf-16 but it isn't. If I change it

[jira] [Commented] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840908#comment-17840908 ] Tilman Hausherr commented on TIKA-4245: --- Happens also with the tika app GUI. > Tika does not get

[jira] [Updated] (TIKA-4245) Tika does not get html content properly

2024-04-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4245: -- Description: We use org.apache.tika.parser.AutoDetectParser to get the content of html files.  

[jira] [Comment Edited] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839745#comment-17839745 ] Tilman Hausherr edited comment on TIKA-4166 at 4/22/24 3:27 PM: It turned

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839745#comment-17839745 ] Tilman Hausherr commented on TIKA-4166: --- It turned out to be something different than the missing

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-04-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839652#comment-17839652 ] Tilman Hausherr commented on TIKA-4166: --- The latest Apache parent update means a javadoc update and

[jira] [Commented] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836236#comment-17836236 ] Tilman Hausherr commented on TIKA-4240: --- I prefer daily but if more people feel pressured or annoyed

[jira] [Updated] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4240: -- Component/s: build > Change dependabot to weekly > --- > >

[jira] [Commented] (TIKA-4240) Change dependabot to weekly

2024-04-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836224#comment-17836224 ] Tilman Hausherr commented on TIKA-4240: --- Not a burden (that was Eric, sort-of), I just don't have

[jira] [Commented] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834529#comment-17834529 ] Tilman Hausherr commented on TIKA-4238: --- This was a low-hanging fruit. I could also have done

[jira] [Comment Edited] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834529#comment-17834529 ] Tilman Hausherr edited comment on TIKA-4238 at 4/6/24 2:12 PM: --- This was a

[jira] [Updated] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4218: -- Affects Version/s: 2.9.1 > Run regression tests to support 2.9.2 release >

[jira] [Resolved] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4218. --- Assignee: Tim Allison Resolution: Fixed > Run regression tests to support 2.9.2 release

[jira] [Assigned] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4171: - Assignee: Tim Allison > Tika server only returns last value for PDFs that have multiple

[jira] [Updated] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4218: -- Fix Version/s: 2.9.2 > Run regression tests to support 2.9.2 release >

[jira] [Resolved] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4171. --- Resolution: Fixed > Tika server only returns last value for PDFs that have multiple of the

[jira] [Resolved] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4238. --- Resolution: Fixed > replace some deprecated code > > >

[jira] [Created] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4239: - Summary: Update to 2.9.3 Key: TIKA-4239 URL: https://issues.apache.org/jira/browse/TIKA-4239 Project: Tika Issue Type: Task Components: build

[jira] [Updated] (TIKA-4239) Update to 2.9.3

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4239: -- Affects Version/s: 2.9.2 > Update to 2.9.3 > --- > > Key: TIKA-4239

[jira] [Resolved] (TIKA-4162) Update to 2.9.2

2024-04-06 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4162. --- Assignee: Tilman Hausherr Resolution: Fixed > Update to 2.9.2 > --- > >

[jira] [Created] (TIKA-4238) replace some deprecated code

2024-04-06 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4238: - Summary: replace some deprecated code Key: TIKA-4238 URL: https://issues.apache.org/jira/browse/TIKA-4238 Project: Tika Issue Type: Task Affects

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.3 > tika-parser-nlp-module has an unnecessary Guava dependency >

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: (was: 2.9.2) > tika-parser-nlp-module has an unnecessary Guava dependency >

[jira] [Resolved] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4236. --- Assignee: Tilman Hausherr Resolution: Fixed > tika-parser-nlp-module has an unnecessary

[jira] [Updated] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4236: -- Fix Version/s: 2.9.2 3.0.0 > tika-parser-nlp-module has an unnecessary Guava

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834385#comment-17834385 ] Tilman Hausherr commented on TIKA-4236: --- I found only a test dependency mentioned directly. It's

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834282#comment-17834282 ] Tilman Hausherr commented on TIKA-4236: --- https://tika.apache.org/ "The Apache Tika PMC has set

[jira] [Comment Edited] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr edited comment on TIKA-4236 at 4/5/24 12:21 PM: Is this

[jira] [Commented] (TIKA-4236) tika-parser-nlp-module has an unnecessary Guava dependency

2024-04-05 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834277#comment-17834277 ] Tilman Hausherr commented on TIKA-4236: --- Is this what you had in mind?

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-03 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833807#comment-17833807 ] Tilman Hausherr commented on TIKA-4231: --- Yes it is text, but the PDF is using a feature that we

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-04-02 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833385#comment-17833385 ] Tilman Hausherr commented on TIKA-4231: --- No this is not being worked on. You'll have to use OCR. >

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832291#comment-17832291 ] Tilman Hausherr commented on TIKA-4231: --- I have attached an extraction with pdfbox 2.0.31:

[jira] [Updated] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4231: -- Attachment: arabic-pdfbox.txt > Parsing Arabic PDF is returning bad data >

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832284#comment-17832284 ] Tilman Hausherr commented on TIKA-4231: --- This doesn't change my argument. The latest version is

[jira] [Commented] (TIKA-4231) Parsing Arabic PDF is returning bad data

2024-03-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832258#comment-17832258 ] Tilman Hausherr commented on TIKA-4231: --- The current tika version is 2.9.1, soon to be 2.9.2. There

[jira] [Updated] (TIKA-4228) Tika parser crashes JVM when it gets metadata and embedded objects from pdf

2024-03-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4228: -- Affects Version/s: 2.9.0 > Tika parser crashes JVM when it gets metadata and embedded objects

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-26 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830954#comment-17830954 ] Tilman Hausherr commented on TIKA-4218: --- 6FOMNUPGPA6IG66Z4NIUEQIVOR5ON46Q (an MP4 file) has a loss

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830604#comment-17830604 ] Tilman Hausherr commented on TIKA-4218: --- To be honest I didn't look further, because these problems

[jira] [Comment Edited] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr edited comment on TIKA-4171 at 3/23/24 5:50 PM: We have a

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: testPDF_XFA_govdocs1_258578.pdf.html > Tika server only returns last value for PDFs

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830113#comment-17830113 ] Tilman Hausherr commented on TIKA-4171: --- Proposed change: add these 3 lines before the last one in

[jira] [Commented] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830110#comment-17830110 ] Tilman Hausherr commented on TIKA-4171: --- We have a regression with the file [^876503.pdf] in the

[jira] [Updated] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4171: -- Attachment: 876503.pdf > Tika server only returns last value for PDFs that have multiple of the

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830105#comment-17830105 ] Tilman Hausherr commented on TIKA-4218: --- Follow up in TIKA-4171 > Run regression tests to support

[jira] [Reopened] (TIKA-4171) Tika server only returns last value for PDFs that have multiple of the same key

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reopened TIKA-4171: --- > Tika server only returns last value for PDFs that have multiple of the same > key >

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830097#comment-17830097 ] Tilman Hausherr commented on TIKA-4218: --- Confirmed, I reverted just that change and then the text

[jira] [Comment Edited] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr edited comment on TIKA-4218 at 3/23/24 3:59 PM: Oops, or

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830094#comment-17830094 ] Tilman Hausherr commented on TIKA-4218: --- Oops, or it's part of XFA, I just found it too. > Run

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830093#comment-17830093 ] Tilman Hausherr commented on TIKA-4218: --- I found one difference: "Enter the full name of the

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830079#comment-17830079 ] Tilman Hausherr commented on TIKA-4218: --- The word "party" appears 36 times in the json file, 18

[jira] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218 ] Tilman Hausherr deleted comment on TIKA-4218: --- was (Author: tilman): There are also improvements not in my own test results, e.g. the "FOP" pdf file. Either something went wrong with my

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830071#comment-17830071 ] Tilman Hausherr commented on TIKA-4218: --- There are also improvements not in my own test results,

[jira] [Commented] (TIKA-4218) Run regression tests to support 2.9.2 release

2024-03-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830069#comment-17830069 ] Tilman Hausherr commented on TIKA-4218: --- Weird indeed, 876503.pdf didn't appear in the PDFBox

[jira] [Updated] (TIKA-4206) Variation on Zip Bomb

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4206: -- Description: I see TIKA-216 which aims to prevent Zip bombs, but I'm seeing what looks like a

[jira] [Closed] (TIKA-4214) Update apache compress in tika to 1.26+ for CVE-2024-26308.

2024-03-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4214. - Resolution: Duplicate Duplicate of TIKA-4199. > Update apache compress in tika to 1.26+ for

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-14 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826996#comment-17826996 ] Tilman Hausherr commented on TIKA-4199: --- The original error you reported wasn't really a bug in

[jira] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166 ] Tilman Hausherr deleted comment on TIKA-4166: --- was (Author: tilman): I've reverted it and will investigate / fix this later. Seems to be a problem with angus-activation. > dependency

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824953#comment-17824953 ] Tilman Hausherr commented on TIKA-4166: --- I've reverted it and will investigate / fix this later.

[jira] [Resolved] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4199. --- Resolution: Fixed Commons-Compress has been updated to 1.26.1, I have reverted the workaround

[jira] [Assigned] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr reassigned TIKA-4199: - Assignee: Tilman Hausherr > commons-compress 1.26.0 breaks Apache Tika 2.9.1 >

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Fix Version/s: 3.0.0 > Add @deprecated annotation where needed >

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Affects Version/s: 3.0.0 > Add @deprecated annotation where needed >

[jira] [Resolved] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4203. --- Resolution: Fixed > Add @deprecated annotation where needed >

[jira] [Created] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4203: - Summary: Add @deprecated annotation where needed Key: TIKA-4203 URL: https://issues.apache.org/jira/browse/TIKA-4203 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4199: -- Fix Version/s: 2.9.2 3.0.0 > commons-compress 1.26.0 breaks Apache Tika

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937 ] Tilman Hausherr commented on TIKA-4199: --- I tried another solution {code:java} if

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818873#comment-17818873 ] Tilman Hausherr commented on TIKA-4201: --- Yeah, makes sense. > Add hard limit to stream reading in

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 3:37 PM: {quote}I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr commented on TIKA-4199: --- {quote}I'm not declaring this a problem with

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823 ] Tilman Hausherr commented on TIKA-4199: --- After merging I discovered that the SevenZWrapper class is

[jira] [Closed] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4200. - Resolution: Duplicate Our CI is failing because of the CVE :-( Duplicate of TIKA-4199. I'm still

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 11:57 AM: - I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr commented on TIKA-4199: --- I'm working on it

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tika, tika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tika, tika_exception

[jira] [Updated] (TIKA-3841) An exception occurred when parsing some word documents using tikatika_exception

2024-02-09 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3841: -- Summary: An exception occurred when parsing some word documents using tikatika_exception (was:

[jira] [Closed] (TIKA-4183) Update jackson-databind jar to 2.16.0 or higher (CVE-2023-35116)

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4183. - Resolution: Duplicate duplicate of TIKA-4162, it was done there on 17.11.2023 in

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2024-01-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Fix Version/s: 2.9.2 > Update to 2.9.2 > --- > > Key: TIKA-4162 >

[jira] [Updated] (TIKA-4162) Update to 2.9.2

2023-12-27 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4162: -- Affects Version/s: 2.9.1 > Update to 2.9.2 > --- > > Key: TIKA-4162

[jira] [Closed] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-12-16 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4172. - Resolution: Not A Bug > Apple binary file incorrectly identified as text/x-sql due to filename >

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796450#comment-17796450 ] Tilman Hausherr commented on TIKA-4173: --- It wasn't really a problem locally, I only had to change

[jira] [Commented] (TIKA-4173) Fix dev version in main branch

2023-12-13 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796431#comment-17796431 ] Tilman Hausherr commented on TIKA-4173: --- I noticed that it didn't have the correct version, but I

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-25 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789647#comment-17789647 ] Tilman Hausherr commented on TIKA-4172: --- Your file starts with 00 14 64 30.

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789542#comment-17789542 ] Tilman Hausherr commented on TIKA-4172: --- application/octet-stream is defined as the default by the

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-23 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789318#comment-17789318 ] Tilman Hausherr commented on TIKA-4172: --- https://tika.apache.org/2.1.0/detection.html "Where the

[jira] [Comment Edited] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788982#comment-17788982 ] Tilman Hausherr edited comment on TIKA-4172 at 11/23/23 5:05 AM: - Which

[jira] [Commented] (TIKA-4172) Apple binary file incorrectly identified as text/x-sql due to filename

2023-11-22 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788982#comment-17788982 ] Tilman Hausherr commented on TIKA-4172: --- Which tika call are you using? Have you tried detecting

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2023-11-04 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782915#comment-17782915 ] Tilman Hausherr commented on TIKA-4166: --- The zookeeper update worked locally, but not on the CI :-(

[jira] [Created] (TIKA-4166) dependency updates for Tika 3.0

2023-11-03 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4166: - Summary: dependency updates for Tika 3.0 Key: TIKA-4166 URL: https://issues.apache.org/jira/browse/TIKA-4166 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4162) Update to 2.9.2

2023-10-21 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4162: - Summary: Update to 2.9.2 Key: TIKA-4162 URL: https://issues.apache.org/jira/browse/TIKA-4162 Project: Tika Issue Type: Task Components: build

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770571#comment-17770571 ] Tilman Hausherr commented on TIKA-4135: --- Yes, but how to make sure it happens only in the test? I

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770559#comment-17770559 ] Tilman Hausherr commented on TIKA-4135: --- There must be some way to run THIS test in a US locale, but

[jira] [Commented] (TIKA-4135) Remove xerces from Tika 3.x/main branch?

2023-09-29 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770550#comment-17770550 ] Tilman Hausherr commented on TIKA-4135: --- The build fails in Germany: Running

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2023-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768361#comment-17768361 ] Tilman Hausherr commented on TIKA-4137: --- I've modified the jdk18 build on the ci to a jdk21 build

[jira] [Comment Edited] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2023-09-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17768361#comment-17768361 ] Tilman Hausherr edited comment on TIKA-4137 at 9/24/23 9:05 AM: I've

[jira] [Closed] (TIKA-4136) Upgrade Commons compress to 1.24.x

2023-09-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4136. - Fix Version/s: (was: 2.9.1) Resolution: Duplicate Thanks, but this was done in

[jira] [Commented] (TIKA-4123) Update to 2.9.1

2023-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765755#comment-17765755 ] Tilman Hausherr commented on TIKA-4123: --- Yes that's fine. > Update to 2.9.1 > --- > >

[jira] [Comment Edited] (TIKA-4123) Update to 2.9.1

2023-09-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765743#comment-17765743 ] Tilman Hausherr edited comment on TIKA-4123 at 9/15/23 5:48 PM: Yes... I
