[jira] [Commented] (TIKA-4448) Downgrade junit5 to 5.13.2 until 6.0.0 is released

2025-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024577#comment-18024577 ] Tim Allison commented on TIKA-4448: --- Thank you. Fixed. I hope. > Downgrade ju

[jira] [Resolved] (TIKA-4328) Update or remove tika-deployment snaps

2025-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4328. --- Fix Version/s: 4.0.0 Resolution: Fixed > Update or remove tika-deployment sn

[jira] [Resolved] (TIKA-4332) Consider removing dotnet module in 4.x/main

2025-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4332. --- Fix Version/s: 4.0.0 Resolution: Fixed > Consider removing dotnet module in 4.x/m

[jira] [Resolved] (TIKA-4503) Refactor serialization

2025-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4503. --- Fix Version/s: 4.0.0 Resolution: Fixed Breaking changes prevent this from going into 3.x

[jira] [Commented] (TIKA-4490) Fix trivial runtime problems found when adding Tika to oss-fuzz

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024360#comment-18024360 ] Tim Allison commented on TIKA-4490: --- Team, I'm sorry for all the noise on this

[jira] [Commented] (TIKA-4489) Add Tika to ossfuzz

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024348#comment-18024348 ] Tim Allison commented on TIKA-4489: --- I just opened this: https://github.com/google

[jira] [Commented] (TIKA-4307) Text in header not extracted for Microsoft Word doc file

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024343#comment-18024343 ] Tim Allison commented on TIKA-4307: --- I don't have enough knowledge of the

[jira] [Updated] (TIKA-4499) Remove dl4j components from 4.x

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4499: -- Description: dl4j hasn't been updated since 2022. People integrating modern llm-based processing

[jira] [Resolved] (TIKA-4501) remove tika-dotnet from 4.x

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4501. --- Resolution: Duplicate > remove tika-dotnet from

[jira] [Created] (TIKA-4502) Remove tika-deployment/snaps from 4.x/main

2025-10-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4502: - Summary: Remove tika-deployment/snaps from 4.x/main Key: TIKA-4502 URL: https://issues.apache.org/jira/browse/TIKA-4502 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4500) Remove advancedmedia module and package from 4.x

2025-10-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4500: - Summary: Remove advancedmedia module and package from 4.x Key: TIKA-4500 URL: https://issues.apache.org/jira/browse/TIKA-4500 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4499) Remove dl4j components from 4.x

2025-10-02 Thread Tim Allison (Jira)
Tim Allison created TIKA-4499: - Summary: Remove dl4j components from 4.x Key: TIKA-4499 URL: https://issues.apache.org/jira/browse/TIKA-4499 Project: Tika Issue Type: Task Reporter

[jira] [Commented] (TIKA-4343) Remove agepredictor in 4.x

2025-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024328#comment-18024328 ] Tim Allison commented on TIKA-4343: --- This has been open for nearly a year. I'

[jira] [Created] (TIKA-4498) Allow passing back some metadata from an emit in tika-pipes

2025-10-01 Thread Tim Allison (Jira)
Tim Allison created TIKA-4498: - Summary: Allow passing back some metadata from an emit in tika-pipes Key: TIKA-4498 URL: https://issues.apache.org/jira/browse/TIKA-4498 Project: Tika Issue Type

[jira] [Created] (TIKA-4497) Allow per parse timeouts via ParseContext in tika-pipes

2025-10-01 Thread Tim Allison (Jira)
Tim Allison created TIKA-4497: - Summary: Allow per parse timeouts via ParseContext in tika-pipes Key: TIKA-4497 URL: https://issues.apache.org/jira/browse/TIKA-4497 Project: Tika Issue Type

[jira] [Resolved] (TIKA-4496) Bump jdk23 github workflow to jdk25

2025-10-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4496. --- Resolution: Fixed > Bump jdk23 github workflow to jd

[jira] [Commented] (TIKA-4493) Text extracted from PDF appears vertical when using Apache Tika

2025-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022411#comment-18022411 ] Tim Allison commented on TIKA-4493: --- [~tilman] should we make {{detectAngles}}=

[jira] [Updated] (TIKA-4495) What do we do with ossindex?

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4495: -- Description: The ossindex plugin now requires an api key. I don't know if we can get one for

[jira] [Commented] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023094#comment-18023094 ] Tim Allison commented on TIKA-4494: --- The child process is intended to go

[jira] [Comment Edited] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023085#comment-18023085 ] Tim Allison edited comment on TIKA-4494 at 9/26/25 12:1

[jira] [Assigned] (TIKA-4334) Move tika pipes components in tika-core to tika-pipes-core in 4.x

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-4334: - Assignee: Tim Allison > Move tika pipes components in tika-core to tika-pipes-core in

[jira] [Commented] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023101#comment-18023101 ] Tim Allison commented on TIKA-4494: --- K. Thank you. I responded on CONNECTORS-

[jira] [Comment Edited] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023094#comment-18023094 ] Tim Allison edited comment on TIKA-4494 at 9/26/25 12:4

[jira] [Commented] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023103#comment-18023103 ] Tim Allison commented on TIKA-4494: --- Maybe related? https://issues.apache.org/

[jira] [Commented] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023095#comment-18023095 ] Tim Allison commented on TIKA-4494: --- Is the child process behavior different bet

[jira] [Comment Edited] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023085#comment-18023085 ] Tim Allison edited comment on TIKA-4494 at 9/26/25 12:4

[jira] [Updated] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4494: -- Attachment: testEXCEL_space.xlsx > org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed

[jira] [Commented] (TIKA-4494) org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST index

2025-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18023085#comment-18023085 ] Tim Allison commented on TIKA-4494: --- I can develop an example triggering file from

[jira] [Resolved] (TIKA-4392) Incorrect "org.apache.xerces.util"-entry in Manifest file

2025-09-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4392. --- Fix Version/s: 4.0.0 3.3.0 Resolution: Fixed > Incorr

[jira] [Updated] (TIKA-4483) Release a minimal 3.2.3

2025-09-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4483: -- Description: We should get a fix out for TIKA-4482 as soon as possible. On the dev list[1], I proposed

[ANNOUNCE] Apache Tika 3.2.3 released

2025-09-20 Thread Tim Allison
Apache Tika, visit the project home page: https://tika.apache.org/ -- Tim Allison, on behalf of the Apache Tika community

[jira] [Commented] (TIKA-1180) Better Matroska MKV and WEBM Detection

2025-09-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019425#comment-18019425 ] Tim Allison commented on TIKA-1180: --- The PR that I merged with some fixes (and the

[jira] [Resolved] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-09-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4469. --- Fix Version/s: 3.2.3 Resolution: Fixed > After upgrading to 3.2.2 most files are incorrec

[jira] [Updated] (TIKA-4154) Make DEFAULT_MAX_STRING_LEN in StreamReadConstraints configurable

2025-09-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4154: -- Fix Version/s: 3.0.0 2.9.2 > Make DEFAULT_MAX_STRING_LEN in StreamReadConstrai

[jira] [Created] (TIKA-4483) Release a minimal 3.2.3

2025-09-19 Thread Tim Allison (Jira)
Tim Allison created TIKA-4483: - Summary: Release a minimal 3.2.3 Key: TIKA-4483 URL: https://issues.apache.org/jira/browse/TIKA-4483 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Commented] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-18 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019165#comment-18019165 ] Tim Allison commented on TIKA-4471: --- Great. Thank you! > Add unit te

[jira] [Created] (TIKA-4490) Fix trivial runtime problems found when adding Tika to oss-fuzz

2025-09-18 Thread Tim Allison (Jira)
Tim Allison created TIKA-4490: - Summary: Fix trivial runtime problems found when adding Tika to oss-fuzz Key: TIKA-4490 URL: https://issues.apache.org/jira/browse/TIKA-4490 Project: Tika Issue

[jira] [Created] (TIKA-4489) Add Tika to ossfuzz

2025-09-18 Thread Tim Allison (Jira)
Tim Allison created TIKA-4489: - Summary: Add Tika to ossfuzz Key: TIKA-4489 URL: https://issues.apache.org/jira/browse/TIKA-4489 Project: Tika Issue Type: Task Reporter: Tim Allison

[jira] [Updated] (TIKA-4483) Release a minimal 3.2.3

2025-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4483: -- Description: We should get a fix out for TIKA-4482 as soon as possible. On the dev list[1], I proposed

Mitigating CVE-2025-54988 on Tika 2.x

2025-09-17 Thread Tim Allison
Since we made the announcement about CVE-2025-54988, we've learned that there are some mitigations available if you need to stay with the Tika 2.x branch. We regret that 2.x reached EOL in April 2025 (see: https://tika.apache.org/), and the Tika project has no plans to make another 2.x release. Ma

[jira] [Resolved] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4471. --- Resolution: Fixed > Add unit tests for XMLReaderUtil to confirm secure configurati

[jira] [Created] (TIKA-4487) Update Tika logo to reflect ASF's new logo

2025-09-17 Thread Tim Allison (Jira)
Tim Allison created TIKA-4487: - Summary: Update Tika logo to reflect ASF's new logo Key: TIKA-4487 URL: https://issues.apache.org/jira/browse/TIKA-4487 Project: Tika Issue Type:

[jira] [Created] (TIKA-4486) Update ASF logo on our site

2025-09-17 Thread Tim Allison (Jira)
Tim Allison created TIKA-4486: - Summary: Update ASF logo on our site Key: TIKA-4486 URL: https://issues.apache.org/jira/browse/TIKA-4486 Project: Tika Issue Type: Task Reporter: Tim

[jira] [Commented] (TIKA-3629) Keywords are not extracted anymore from PDF documents

2025-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020948#comment-18020948 ] Tim Allison commented on TIKA-3629: --- This may have been fixed on TIKA-

[jira] [Resolved] (TIKA-4486) Update ASF logo on our site

2025-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4486. --- Resolution: Fixed > Update ASF logo on our site > --- > >

[jira] [Resolved] (TIKA-4483) Release a minimal 3.2.3

2025-09-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4483. --- Resolution: Fixed Cleaned up the tags and ran the release of tika-docker. I'll open a foll

[jira] [Created] (TIKA-4484) Release tika-helm for 3.2.3

2025-09-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-4484: - Summary: Release tika-helm for 3.2.3 Key: TIKA-4484 URL: https://issues.apache.org/jira/browse/TIKA-4484 Project: Tika Issue Type: Task Components: tika

[jira] [Updated] (TIKA-4483) Release a minimal 3.2.3

2025-09-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4483: -- Fix Version/s: 3.2.3 > Release a minimal 3.2.3 > --- > >

[jira] [Commented] (TIKA-4483) Release a minimal 3.2.3

2025-09-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020331#comment-18020331 ] Tim Allison commented on TIKA-4483: --- Artifacts released and site updated. I'l

[RESULT][VOTE] Release Apache Tika 3.2.3 Candidate #1

2025-09-15 Thread Tim Allison
The vote has passed with three PMC +1s and no -1s. +1s: Oleg Tikhonov Tilman Hausherr Tim Allison I'll update the website and release the artifacts shortly. Thank you, all! Best, Tim On Fri, Sep 12, 2025 at 3:45 AM Tilman Hausherr wrote: > > +1 successful build on 11.0

[jira] [Reopened] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-09-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4469: --- Assignee: Tim Allison Even though I still think this is "not a problem", I think we sho

[VOTE] Release Apache Tika 3.2.3 Candidate #1

2025-09-12 Thread Tim Allison
A candidate for the Tika 3.2.3 release is available at: https://dist.apache.org/repos/dist/dev/tika/3.2.3 The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/3.2.3-rc1/ The SHA-512 checksum of the archive is 83ca35af53977364c1163eea3c65e9d7479e2f0f0b94b2b
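For anyone checking a release candidate against the SHA-512 checksum quoted in a vote mail, the comparison can be sketched in plain Java as below. This is only an illustration using the JDK's `MessageDigest`; the class and method names (`Sha512Sketch`, `sha512Hex`) are made up for this example and are not part of Tika or its release tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: hashes a stream with SHA-512.
final class Sha512Sketch {

    // Returns the lowercase hex SHA-512 digest of everything read from `in`.
    static String sha512Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            byte[] buf = new byte[8192];
            int n;
            // Feed the stream to the digest in chunks so large archives
            // do not have to fit in memory.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder(128);
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is required of every Java platform implementation.
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }
}
```

In use, one would open the downloaded source archive with `Files.newInputStream(...)`, pass it to `sha512Hex`, and compare the result (ignoring case) with the checksum published in the vote thread.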

[jira] [Comment Edited] (TIKA-1180) Better Matroska MKV and WEBM Detection

2025-09-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019425#comment-18019425 ] Tim Allison edited comment on TIKA-1180 at 9/10/25 5:4

[jira] [Commented] (TIKA-4483) Release a minimal 3.2.3

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019618#comment-18019618 ] Tim Allison commented on TIKA-4483: --- metadata count diffs are largely in bmp.

[jira] [Resolved] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4471. --- Fix Version/s: 4.0.0 3.3.0 Resolution: Fixed Please reopen if the tests are

[jira] [Commented] (TIKA-4113) Better Matroska MKA Detection

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019427#comment-18019427 ] Tim Allison commented on TIKA-4113: --- We now have a MatroskaDetector. If it is

[jira] [Resolved] (TIKA-4482) tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4482. --- Fix Version/s: 4.0.0 3.2.3 Resolution: Fixed > tika-server and ot

Re: next 3.x should be minor version increment: 3.3.0?

2025-09-11 Thread Tim Allison
In the absence of objections, I just pushed the update from 3.2.3-SNAPSHOT to 3.3.0-SNAPSHOT https://github.com/apache/tika/commit/934bbd874b0f62dfb474a09327160ff166a1f8db On Mon, Aug 18, 2025 at 3:59 PM Tim Allison wrote: > > With the changes on TIKA-4465 and the potential upcoming chan

[jira] [Commented] (TIKA-4483) Release a minimal 3.2.3

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019616#comment-18019616 ] Tim Allison commented on TIKA-4483: --- Reports attached. Not surprisingly. There are

[jira] [Updated] (TIKA-4483) Release a minimal 3.2.3

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4483: -- Attachment: 3.2.3-reports.tgz > Release a minimal 3.

[jira] [Updated] (TIKA-4483) Release a minimal 3.2.3

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4483: -- Description: We should get a fix out for TIKA-4482 as soon as possible. On the dev list[1], I proposed

[jira] [Comment Edited] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019367#comment-18019367 ] Tim Allison edited comment on TIKA-4469 at 9/11/25 12:3

[jira] [Commented] (TIKA-4482) tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019586#comment-18019586 ] Tim Allison commented on TIKA-4482: --- https://issues.apache.org/jira/browse/TIKA-

[jira] [Commented] (TIKA-4482) tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA

2025-09-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019549#comment-18019549 ] Tim Allison commented on TIKA-4482: --- Y. That's it. On the dev list, I proposed

[jira] [Updated] (TIKA-4482) tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4482: -- Summary: tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA (was

[jira] [Created] (TIKA-4482) Update stax configuration to account for woodstox not handling XMLConstants.ACCESS_EXTERNAL_DTD

2025-09-10 Thread Tim Allison (Jira)
Tim Allison created TIKA-4482: - Summary: Update stax configuration to account for woodstox not handling XMLConstants.ACCESS_EXTERNAL_DTD Key: TIKA-4482 URL: https://issues.apache.org/jira/browse/TIKA-4482
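As background for the issue title above: the StAX API lets `XMLInputFactory.setProperty` throw `IllegalArgumentException` when an implementation does not support a property, so a hardening routine cannot assume every property is accepted by every parser on the classpath. The sketch below shows one defensive pattern under that assumption. It is not the fix that went into Tika; `StaxHardeningSketch`, `trySet`, and `hardenedFactory` are hypothetical names used only for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;

// Hypothetical sketch, not Tika's actual code: set each security-related
// property separately and tolerate implementations that reject one of them.
final class StaxHardeningSketch {

    // Returns true if the factory accepted the property, false if the
    // implementation reported it as unsupported.
    static boolean trySet(XMLInputFactory factory, String key, Object value) {
        try {
            factory.setProperty(key, value);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX properties defined on XMLInputFactory itself.
        trySet(factory, XMLInputFactory.SUPPORT_DTD, false);
        trySet(factory, XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        // JAXP 1.5 property; a given StAX implementation may not recognize it,
        // so a rejection here must not abort the rest of the configuration.
        trySet(factory, XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return factory;
    }
}
```

The design point is simply that each property is applied independently, so a parser that rejects one setting still receives the others; a real implementation would likely also log which properties were refused.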

Re: Next 3.x release?

2025-09-10 Thread Tim Allison
Sorry... 3.3.3 -> 3.2.3 WDYT? If we go this route, anything else we'd want to merge into a 3.2.3 release? On Wed, Sep 10, 2025 at 10:41 AM Tim Allison wrote: > > All, > > Once we get the fix for TIKA-4482 in, I think that we should aim for > a 3.2.3 release. Without

[jira] [Updated] (TIKA-4476) Audio only MP4 files should be typed audio/mp4

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4476: -- Issue Type: Improvement (was: Bug) > Audio only MP4 files should be typed audio/

[jira] [Resolved] (TIKA-4154) Make DEFAULT_MAX_STRING_LEN in StreamReadConstraints configurable

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4154. --- Resolution: Fixed > Make DEFAULT_MAX_STRING_LEN in StreamReadConstraints configura

[jira] [Resolved] (TIKA-4476) Audio only MP4 files should be typed audio/mp4

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4476. --- Fix Version/s: 4.0.0 3.3.0 Resolution: Fixed > Audio only MP4 files sho

Next 3.x release?

2025-09-10 Thread Tim Allison
All, Once we get the fix for TIKA-4482 in, I think that we should aim for a 3.2.3 release. Without that fix, users of tika-server will not be able to process PDFs with XFA. We have several changes in the 3.x branch that are significant enough for a 3.3.0 release. I propose that we hold off on

[jira] [Updated] (TIKA-4482) tika-server and other modules that bring in woodstox can no longer parse a PDF with XFA

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4482: -- Description: Original title: Update stax configuration to account for woodstox not handling

[jira] [Updated] (TIKA-4482) Update stax configuration to account for woodstox not handling XMLConstants.ACCESS_EXTERNAL_DTD

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4482: -- Priority: Blocker (was: Minor) > Update stax configuration to account for woodstox not handl

[jira] [Commented] (TIKA-4482) Update stax configuration to account for woodstox not handling XMLConstants.ACCESS_EXTERNAL_DTD

2025-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019348#comment-18019348 ] Tim Allison commented on TIKA-4482: --- This is a problem not just for users who happe

[jira] [Created] (TIKA-4480) Add -Dossindex.skip to 2.x workflow

2025-09-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4480: - Summary: Add -Dossindex.skip to 2.x workflow Key: TIKA-4480 URL: https://issues.apache.org/jira/browse/TIKA-4480 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019145#comment-18019145 ] Tim Allison commented on TIKA-4471: --- Thank you, [~tilman] . Hopefully that PR w

[jira] [Commented] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018852#comment-18018852 ] Tim Allison commented on TIKA-4471: --- Ha. Looks like jdk11 didn'

[jira] [Comment Edited] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018834#comment-18018834 ] Tim Allison edited comment on TIKA-4471 at 9/8/25 2:18 PM: ---

[jira] [Commented] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-09-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018834#comment-18018834 ] Tim Allison commented on TIKA-4471: --- I'm concerned that the tests in the P

Re: Tika End-of-life (EOL) and support

2025-09-05 Thread Tim Allison
+1 Great idea. Go forth! On Thu, Sep 4, 2025 at 11:29 PM lewis john mcgibbney wrote: > Hi dev@, > > It recently came to my attention that accurate $title information would be > useful for our community. > > Tika (EOL) data could be useful and important for several reasons: > > - Security Manage

[jira] [Commented] (TIKA-4467) Problem with XADES files format in Apache Tika

2025-08-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015989#comment-18015989 ] Tim Allison commented on TIKA-4467: --- Based on [https://camel.apache.org/compon

[jira] [Comment Edited] (TIKA-4467) Problem with XADES files format in Apache Tika

2025-08-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015979#comment-18015979 ] Tim Allison edited comment on TIKA-4467 at 8/25/25 12:3

[jira] [Commented] (TIKA-4467) Problem with XADES files format in Apache Tika

2025-08-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015980#comment-18015980 ] Tim Allison commented on TIKA-4467: --- TIKA-1379 is slightly different in that the

[jira] [Commented] (TIKA-4467) Problem with XADES files format in Apache Tika

2025-08-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015979#comment-18015979 ] Tim Allison commented on TIKA-4467: --- We should be able to add a "root-XML"

[jira] [Resolved] (TIKA-4472) Extract macros by default in tika-app's cli when run against a single file

2025-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4472. --- Fix Version/s: 4.0.0 3.3.0 Resolution: Fixed Apologies for the two PRs. The

[jira] [Created] (TIKA-4472) Extract macros by default in tika-app's cli when run against a single file

2025-08-21 Thread Tim Allison (Jira)
Tim Allison created TIKA-4472: - Summary: Extract macros by default in tika-app's cli when run against a single file Key: TIKA-4472 URL: https://issues.apache.org/jira/browse/TIKA-4472 Project:

[jira] [Resolved] (TIKA-4460) Prep for 3.2.2 release

2025-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4460. --- Fix Version/s: 3.2.2 Resolution: Fixed > Prep for 3.2.2 rele

Re: FW: OSS-Fuzz integration

2025-08-21 Thread Tim Allison
All, Over the last two years, I've worked quite a bit with Jazzer and oss-fuzz on my $dayjob. Dominik Stadler has done an amazing job with fuzz harnesses for POI[0], and there are some rudimentary harnesses for PDFBox [1]. Commons-compress, of course, is very well represented[2]. I was initia

[jira] [Commented] (TIKA-4470) tika-batch-*-tests.jar contain test output XML files

2025-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015419#comment-18015419 ] Tim Allison commented on TIKA-4470: --- Y, when moving from 1.x to 2.x, I thought that

[jira] [Created] (TIKA-4471) Add unit tests for XMLReaderUtil to confirm secure configurations

2025-08-21 Thread Tim Allison (Jira)
Tim Allison created TIKA-4471: - Summary: Add unit tests for XMLReaderUtil to confirm secure configurations Key: TIKA-4471 URL: https://issues.apache.org/jira/browse/TIKA-4471 Project: Tika

[jira] [Commented] (TIKA-4470) tika-batch-*-tests.jar contain test output XML files

2025-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015414#comment-18015414 ] Tim Allison commented on TIKA-4470: --- tika-batch is entirely removed from 4.x/main.

[jira] [Commented] (TIKA-4460) Prep for 3.2.2 release

2025-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015411#comment-18015411 ] Tim Allison commented on TIKA-4460: --- I built and pushed the docker images for 3.2

[jira] [Resolved] (TIKA-4466) OPFParser: Only the last dc:identifier is parsed, while multiple are valid.

2025-08-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4466. --- Fix Version/s: 4.0.0 3.3.0 Resolution: Fixed I focused on ODT and PDF for

[jira] [Comment Edited] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-08-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015108#comment-18015108 ] Tim Allison edited comment on TIKA-4469 at 8/20/25 10:4

[jira] [Comment Edited] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-08-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015108#comment-18015108 ] Tim Allison edited comment on TIKA-4469 at 8/20/25 10:4

[jira] [Commented] (TIKA-4469) After upgrading to 3.2.2 most files are incorrectly treated as Archive's by AutoDetectParser

2025-08-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18015108#comment-18015108 ] Tim Allison commented on TIKA-4469: --- How are you managing dependencies and is com

[jira] [Comment Edited] (TIKA-1295) Make some Dublin Core items multi-valued

2025-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014930#comment-18014930 ] Tim Allison edited comment on TIKA-1295 at 8/19/25 3:46 PM: ---

[jira] [Commented] (TIKA-4466) OPFParser: Only the last dc:identifier is parsed, while multiple are valid.

2025-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014939#comment-18014939 ] Tim Allison commented on TIKA-4466: --- Based on this: [https://exiftool.org/TagN

[jira] [Comment Edited] (TIKA-4466) OPFParser: Only the last dc:identifier is parsed, while multiple are valid.

2025-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014934#comment-18014934 ] Tim Allison edited comment on TIKA-4466 at 8/19/25 3:2

[jira] [Commented] (TIKA-4466) OPFParser: Only the last dc:identifier is parsed, while multiple are valid.

2025-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18014935#comment-18014935 ] Tim Allison commented on TIKA-4466: --- What isn't clear to me is extracting
