[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849561#comment-17849561 ] ASF GitHub Bot commented on TIKA-4252: -- nddipiazza opened a new pull request, #1778: URL: https

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849560#comment-17849560 ] ASF GitHub Bot commented on TIKA-4252: -- nddipiazza closed pull request #1774: TIKA-4252 fetch tuple

[jira] [Closed] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza closed TIKA-4262. --- Assignee: Nicholas DiPiazza Resolution: Invalid never mind - this was an issue in my

[jira] [Updated] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4262: Description: tika configuration when saving a fetcher with a list of strings will look like

[jira] [Created] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4262: --- Summary: In pipes XML config, List serializes incorrect causing the parameters to be empty when read Key: TIKA-4262 URL: https://issues.apache.org/jira/browse/TIKA-4262

[jira] [Commented] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849394#comment-17849394 ] Hudson commented on TIKA-4261: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1638 (See

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849384#comment-17849384 ] ASF GitHub Bot commented on TIKA-4260: -- tballison commented on PR #1776: URL: https://github.com

[jira] [Commented] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849379#comment-17849379 ] ASF GitHub Bot commented on TIKA-4261: -- tballison merged PR #1777: URL: https://github.com/apache

[jira] [Commented] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849369#comment-17849369 ] ASF GitHub Bot commented on TIKA-4261: -- tballison opened a new pull request, #1777: URL: https

[jira] [Created] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread Tim Allison (Jira)
Tim Allison created TIKA-4261: - Summary: Add attachment type metadata filter Key: TIKA-4261 URL: https://issues.apache.org/jira/browse/TIKA-4261 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849321#comment-17849321 ] Hudson commented on TIKA-4259: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1637 (See

[jira] [Resolved] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4259. --- Fix Version/s: 3.0.0 Resolution: Fixed > Decouple xml parser stuff from ParseCont

[jira] [Commented] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849299#comment-17849299 ] ASF GitHub Bot commented on TIKA-4259: -- tballison merged PR #1775: URL: https://github.com/apache

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849298#comment-17849298 ] Tim Allison commented on TIKA-4260: --- That PR currently only works on tika-core. More needs to be done

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849296#comment-17849296 ] ASF GitHub Bot commented on TIKA-4260: -- tballison opened a new pull request, #1776: URL: https

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849297#comment-17849297 ] ASF GitHub Bot commented on TIKA-4260: -- tballison commented on PR #1776: URL: https://github.com

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849288#comment-17849288 ] Tim Allison commented on TIKA-4243: --- [~ndipiazza], I added parseContext to fetchers and emitters

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM: Proposed basic

[jira] [Created] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4260: - Summary: Add parse context to the fetcher interface in 3.x Key: TIKA-4260 URL: https://issues.apache.org/jira/browse/TIKA-4260 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849117#comment-17849117 ] ASF GitHub Bot commented on TIKA-4259: -- tballison opened a new pull request, #1775: URL: https

[jira] [Created] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4259: - Summary: Decouple xml parser stuff from ParseContext Key: TIKA-4259 URL: https://issues.apache.org/jira/browse/TIKA-4259 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849114#comment-17849114 ] Tim Allison commented on TIKA-4243: --- I'm going to start working on PRs that will be generally helpful

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849108#comment-17849108 ] Tim Allison commented on TIKA-4243: --- The downsides we see: a) if there's agreement to add jackson

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103 ] Tim Allison commented on TIKA-4243: --- Proposed basic roadmap: Serialize ParseContext as is... Allow

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849101#comment-17849101 ] Tim Allison commented on TIKA-4243: --- Fellow devs, in chatting with Nicholas, we're thinking

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848960#comment-17848960 ] Nicholas DiPiazza commented on TIKA-4243: - Sure that sounds good. When we chat later today

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848959#comment-17848959 ] ASF GitHub Bot commented on TIKA-4252: -- nddipiazza commented on PR #1774: URL: https://github.com

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848808#comment-17848808 ] ASF GitHub Bot commented on TIKA-4252: -- nddipiazza opened a new pull request, #1774: URL: https

[jira] [Resolved] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4258. --- Resolution: Fixed Just pushed 2.9.2.1/*-latest Thank you, all! > Multi-arch support for doc

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848341#comment-17848341 ] ASF GitHub Bot commented on TIKA-4258: -- tballison closed pull request #19: Add Github CI workflows

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848342#comment-17848342 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-05-21 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848338#comment-17848338 ] Hudson commented on TIKA-4166: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1636 (See

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848087#comment-17848087 ] ASF GitHub Bot commented on TIKA-4258: -- nextgens commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847996#comment-17847996 ] Hudson commented on TIKA-4257: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1635 (See

[jira] [Commented] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847981#comment-17847981 ] ASF GitHub Bot commented on TIKA-4257: -- tballison merged PR #1773: URL: https://github.com/apache

[jira] [Commented] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847980#comment-17847980 ] Tim Allison commented on TIKA-4255: --- Thank you for opening this PR. Are you able to add a small unit

[jira] [Resolved] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4256. --- Fix Version/s: 3.0.0 Resolution: Fixed > Allow inlining of ocr'd text in container docum

[jira] [Commented] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847972#comment-17847972 ] ASF GitHub Bot commented on TIKA-4257: -- tballison opened a new pull request, #1773: URL: https

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847950#comment-17847950 ] Tim Allison commented on TIKA-4258: --- I'm sure I'll need to modify the PR when I actually go to run

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847949#comment-17847949 ] Tim Allison commented on TIKA-4258: --- Let's give it a day for fellow devs to weigh

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847947#comment-17847947 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847945#comment-17847945 ] ASF GitHub Bot commented on TIKA-4258: -- hegerdes commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847943#comment-17847943 ] Tim Allison commented on TIKA-4258: --- And here's the full version: https://hub.docker.com/layers/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847937#comment-17847937 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847931#comment-17847931 ] Tim Allison commented on TIKA-4243: --- Separately, but related to this and also to TIKA-4252 -- should we

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847929#comment-17847929 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847909#comment-17847909 ] Hudson commented on TIKA-4256: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1634 (See

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847905#comment-17847905 ] ASF GitHub Bot commented on TIKA-4258: -- fpiesche commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847896#comment-17847896 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847895#comment-17847895 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847890#comment-17847890 ] Hudson commented on TIKA-4166: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1633 (See

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847887#comment-17847887 ] ASF GitHub Bot commented on TIKA-4258: -- nextgens commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847884#comment-17847884 ] ASF GitHub Bot commented on TIKA-4258: -- tballison commented on PR #19: URL: https://github.com/apache

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847883#comment-17847883 ] Tim Allison commented on TIKA-4258: --- Helpful links from #infra: https://infra.apache.org/docker-hub

[jira] [Commented] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847882#comment-17847882 ] Tim Allison commented on TIKA-4258: --- If fellow devs with better knowledge of github actions and docker

[jira] [Created] (TIKA-4258) Multi-arch support for docker images

2024-05-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-4258: - Summary: Multi-arch support for docker images Key: TIKA-4258 URL: https://issues.apache.org/jira/browse/TIKA-4258 Project: Tika Issue Type: Task

[jira] [Updated] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread Luca Bentivoglio (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Bentivoglio updated TIKA-4257: --- Description: Tika detect method sometimes recognizes p7m files as format application/x-dbf

[jira] [Commented] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847874#comment-17847874 ] ASF GitHub Bot commented on TIKA-4256: -- tballison merged PR #1762: URL: https://github.com/apache

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847868#comment-17847868 ] Hudson commented on TIKA-4166: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1632 (See

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847856#comment-17847856 ] Hudson commented on TIKA-4166: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk11 #1631 (See

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-05-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847827#comment-17847827 ] Hudson commented on TIKA-4166: -- ABORTED: Integrated in Jenkins build Tika » tika-main-jdk11 #1630 (See

[jira] [Updated] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread Luca Bentivoglio (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Bentivoglio updated TIKA-4257: --- Description: Tika detect method sometimes recognizes p7m files as format x-dbf

[jira] [Updated] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread Luca Bentivoglio (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Bentivoglio updated TIKA-4257: --- Summary: Tika detect() recognizes some p7m files as format x-dbf (was: Tika detect

[jira] [Updated] (TIKA-4257) Tika detect() recognizes some p7m files as format x-dbf

2024-05-20 Thread Luca Bentivoglio (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Bentivoglio updated TIKA-4257: --- Summary: Tika detect() recognizes some p7m files as format x-dbf (was: Tika detect

[jira] [Updated] (TIKA-4257) Tika detect recognizes some p7m files as format x-dbf

2024-05-20 Thread Luca Bentivoglio (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Bentivoglio updated TIKA-4257: --- Summary: Tika detect recognizes some p7m files as format x-dbf (was: Recognition of file

[jira] [Created] (TIKA-4257) Recognition of p7m files

2024-05-20 Thread Luca Bentivoglio (Jira)
Luca Bentivoglio created TIKA-4257: -- Summary: Recognition of p7m files Key: TIKA-4257 URL: https://issues.apache.org/jira/browse/TIKA-4257 Project: Tika Issue Type: Bug Components

[jira] [Commented] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847335#comment-17847335 ] ASF GitHub Bot commented on TIKA-4256: -- tballison opened a new pull request, #1762: URL: https

[jira] [Commented] (TIKA-696) Extract watermarks from Word documents

2024-05-16 Thread Alexey Pismenskiy (Jira)
[ https://issues.apache.org/jira/browse/TIKA-696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847018#comment-17847018 ] Alexey Pismenskiy commented on TIKA-696: Hey [~nick] , we would be interested in this - any updates

[jira] [Updated] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4256: -- Description: For legacy tika, we're inlining all content from embedded files including ocr content

[jira] [Created] (TIKA-4256) Allow inlining of ocr'd text in container document

2024-05-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4256: - Summary: Allow inlining of ocr'd text in container document Key: TIKA-4256 URL: https://issues.apache.org/jira/browse/TIKA-4256 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846908#comment-17846908 ] ASF GitHub Bot commented on TIKA-4255: -- axeld opened a new pull request, #1761: URL: https

[jira] [Created] (TIKA-4255) TextAndCSVParser ignores Metadata.CONTENT_ENCODING

2024-05-16 Thread Axel Dörfler (Jira)
Axel Dörfler created TIKA-4255: -- Summary: TextAndCSVParser ignores Metadata.CONTENT_ENCODING Key: TIKA-4255 URL: https://issues.apache.org/jira/browse/TIKA-4255 Project: Tika Issue Type: Bug

[jira] [Updated] (TIKA-1907) Big Pdf parsing to text - Out of memory

2024-05-15 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1907: -- Fix Version/s: 3.0.0 > Big Pdf parsing to text - Out of mem

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846697#comment-17846697 ] Tim Allison commented on TIKA-4137: --- Y, done just now. > Building current Tika main branch fails un

[jira] [Updated] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4137: -- Fix Version/s: 2.9.3 > Building current Tika main branch fails under Java 20

[jira] [Commented] (TIKA-4137) Building current Tika main branch fails under Java 20/21

2024-05-15 Thread Roberto Franchini (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846672#comment-17846672 ] Roberto Franchini commented on TIKA-4137: - Could you please backport this small fix on 2.9.x

[jira] [Commented] (TIKA-4170) Tika to extract Apple Key files

2024-05-13 Thread Tika User (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846142#comment-17846142 ] Tika User commented on TIKA-4170: - Any update on this ? > Tika to extract Apple Key fi

[jira] [Comment Edited] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845590#comment-17845590 ] Tilman Hausherr edited comment on TIKA-4254 at 5/12/24 9:40 AM: THausherr

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845649#comment-17845649 ] ASF GitHub Bot commented on TIKA-4254: -- kaiyaok2 commented on PR #1754: URL: https://github.com

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845623#comment-17845623 ] ASF GitHub Bot commented on TIKA-4252: -- nddipiazza commented on code in PR #1753: URL: https

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845595#comment-17845595 ] ASF GitHub Bot commented on TIKA-4254: -- kaiyaok2 commented on PR #1754: URL: https://github.com

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845590#comment-17845590 ] ASF GitHub Bot commented on TIKA-4254: -- THausherr commented on PR #1754: URL: https://github.com

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845586#comment-17845586 ] ASF GitHub Bot commented on TIKA-4254: -- kaiyaok2 commented on PR #1754: URL: https://github.com

[jira] [Updated] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Kaiyao Ke (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kaiyao Ke updated TIKA-4254: Description: ### Brief Description of the Bug The test `TestMimeTypes#testJavaRegex` is non-idempotent

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845583#comment-17845583 ] ASF GitHub Bot commented on TIKA-4252: -- tballison commented on code in PR #1753: URL: https

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845581#comment-17845581 ] ASF GitHub Bot commented on TIKA-4254: -- tballison commented on PR #1754: URL: https://github.com

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Kaiyao Ke (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845571#comment-17845571 ] Kaiyao Ke commented on TIKA-4254: - [~tilman] The main idea is to ensure unit tests are self-contained

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845566#comment-17845566 ] Tilman Hausherr commented on TIKA-4254: --- Why would we ever run the test twice in the same

[jira] [Commented] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845560#comment-17845560 ] ASF GitHub Bot commented on TIKA-4254: -- kaiyaok2 opened a new pull request, #1754: URL: https

[jira] [Created] (TIKA-4254) The test `TestMimeTypes#testJavaRegex` is not idempotent, as it passes in the first run and fails in repeated runs in the same environment.

2024-05-11 Thread Kaiyao Ke (Jira)
://issues.apache.org/jira/browse/TIKA-4254 Project: Tika Issue Type: Bug Reporter: Kaiyao Ke ### Brief Description of the Bug The test `TestMimeTypes#testJavaRegex` is non-idempotent, as it passes in the first run but fails in the second run in the same environment

[jira] [Updated] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-4232: --- Fix Version/s: 2.9.3 > Create and execute unit tests for tika-h

[jira] [Resolved] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved TIKA-4232. Resolution: Fixed > Create and execute unit tests for tika-h

[jira] [Closed] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed TIKA-4232. -- > Create and execute unit tests for tika-h

[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845510#comment-17845510 ] ASF GitHub Bot commented on TIKA-4232: -- lewismc commented on PR #17: URL: https://github.com/apache

[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845509#comment-17845509 ] ASF GitHub Bot commented on TIKA-4232: -- lewismc merged PR #17: URL: https://github.com/apache/tika

[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845508#comment-17845508 ] ASF GitHub Bot commented on TIKA-4232: -- lewismc opened a new pull request, #17: URL: https

[jira] [Commented] (TIKA-4232) Create and execute unit tests for tika-helm

2024-05-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845507#comment-17845507 ] ASF GitHub Bot commented on TIKA-4232: -- lewismc closed pull request #17: TIKA-4232 Create and execute

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-10 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845302#comment-17845302 ] ASF GitHub Bot commented on TIKA-4252: -- tballison commented on code in PR #1753: URL: https
