[jira] [Created] (TIKA-4208) OOM error in SAS7BDATParser

2024-03-08 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4208: Summary: OOM error in SAS7BDATParser Key: TIKA-4208 URL: https://issues.apache.org/jira/browse/TIKA-4208 Project: Tika Issue Type: Bug Affects Versions

[jira] [Created] (TIKA-4207) PipesParser should have option to extract raw bytes of embedded files

2024-03-08 Thread Tim Allison (Jira)
Tim Allison created TIKA-4207: - Summary: PipesParser should have option to extract raw bytes of embedded files Key: TIKA-4207 URL: https://issues.apache.org/jira/browse/TIKA-4207 Project: Tika

[jira] [Commented] (TIKA-3353) Tika Server Production ready monitoring (Prometheus and JMX)

2024-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824496#comment-17824496 ] ASF GitHub Bot commented on TIKA-3353: -- lewismc commented on PR #429: URL: https://github.com/apache

[jira] [Commented] (TIKA-3353) Tika Server Production ready monitoring (Prometheus and JMX)

2024-03-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824328#comment-17824328 ] ASF GitHub Bot commented on TIKA-3353: -- Opa- commented on PR #429: URL: https://github.com/apache

[jira] [Closed] (TIKA-496) Language identifier profile comparison favors large profiles

2024-03-06 Thread Jira
[ https://issues.apache.org/jira/browse/TIKA-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl closed TIKA-496. Resolution: Won't Do Closing as this is only a problem for the original TIKA langid which is superseded

[jira] [Commented] (TIKA-3353) Tika Server Production ready monitoring (Prometheus and JMX)

2024-03-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823740#comment-17823740 ] ASF GitHub Bot commented on TIKA-3353: -- tballison commented on PR #429: URL: https://github.com

[jira] [Commented] (TIKA-3353) Tika Server Production ready monitoring (Prometheus and JMX)

2024-03-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823734#comment-17823734 ] ASF GitHub Bot commented on TIKA-3353: -- Opa- commented on PR #429: URL: https://github.com/apache

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-03-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823577#comment-17823577 ] Hudson commented on TIKA-4199: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1540 (See

[jira] [Created] (TIKA-4206) Variation on Zip Bomb

2024-03-03 Thread Gregory Lepore (Jira)
Gregory Lepore created TIKA-4206: Summary: Variation on Zip Bomb Key: TIKA-4206 URL: https://issues.apache.org/jira/browse/TIKA-4206 Project: Tika Issue Type: Bug Affects Versions: 3.0.0

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-02-29 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822078#comment-17822078 ] Hudson commented on TIKA-4166: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1535 (See

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821773#comment-17821773 ] Hudson commented on TIKA-4202: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See

[jira] [Commented] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821772#comment-17821772 ] Hudson commented on TIKA-4204: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See

[jira] [Commented] (TIKA-4205) Add more columns to profiles table in tika-eval Profile mode

2024-02-28 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821774#comment-17821774 ] Hudson commented on TIKA-4205: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1533 (See

[jira] [Resolved] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4204. --- Fix Version/s: 2.9.2 3.0.0 Resolution: Fixed > ChmExtractor una

[jira] [Commented] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821735#comment-17821735 ] Tim Allison commented on TIKA-4204: --- I just cherry-picked the fix(es) back to {{branch_2x

[jira] [Resolved] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4202. --- Fix Version/s: 3.0.0 Resolution: Fixed > Add page count of OCR'd pages in metadata for

[jira] [Resolved] (TIKA-4205) Add more columns to profiles table in tika-eval Profile mode

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4205. --- Fix Version/s: 3.0.0 Resolution: Fixed > Add more columns to profiles table in tika-e

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821706#comment-17821706 ] ASF GitHub Bot commented on TIKA-4202: -- tballison merged PR #1630: URL: https://github.com/apache

[jira] [Commented] (TIKA-4205) Add more columns to profiles table in tika-eval Profile mode

2024-02-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821705#comment-17821705 ] ASF GitHub Bot commented on TIKA-4205: -- tballison merged PR #1629: URL: https://github.com/apache

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821703#comment-17821703 ] Tim Allison commented on TIKA-4202: --- The most recent commit actually increments the counter. I've also

[jira] [Commented] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821697#comment-17821697 ] Tim Allison commented on TIKA-4204: --- Ugh. I accidentally pushed to main instead of a dev branch. Sorry

[jira] [Commented] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821698#comment-17821698 ] Tim Allison commented on TIKA-4204: --- [~bossymr] thank you for opening this issue, thoroughly diagnosing

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821692#comment-17821692 ] ASF GitHub Bot commented on TIKA-4202: -- tballison opened a new pull request, #1630: URL: https

[jira] [Commented] (TIKA-4205) Add more columns to profiles table in tika-eval Profile mode

2024-02-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821690#comment-17821690 ] ASF GitHub Bot commented on TIKA-4205: -- tballison opened a new pull request, #1629: URL: https

[jira] [Created] (TIKA-4205) Add more columns to profiles table in tika-eval Profile mode

2024-02-28 Thread Tim Allison (Jira)
Tim Allison created TIKA-4205: - Summary: Add more columns to profiles table in tika-eval Profile mode Key: TIKA-4205 URL: https://issues.apache.org/jira/browse/TIKA-4205 Project: Tika Issue

[jira] [Assigned] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-4204: - Assignee: Tim Allison > ChmExtractor unable to decompress f

[jira] [Updated] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-27 Thread Robert Fromholz (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Fromholz updated TIKA-4204: -- Attachment: 3HAC050917_TRM_RAPID_RW_6-en.chm Environment: The file I am trying to parse

[jira] [Created] (TIKA-4204) ChmExtractor unable to decompress file

2024-02-27 Thread Robert Fromholz (Jira)
Robert Fromholz created TIKA-4204: - Summary: ChmExtractor unable to decompress file Key: TIKA-4204 URL: https://issues.apache.org/jira/browse/TIKA-4204 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820366#comment-17820366 ] Hudson commented on TIKA-4203: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1528 (See

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Fix Version/s: 3.0.0 > Add @deprecated annotation where nee

[jira] [Updated] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4203: -- Affects Version/s: 3.0.0 > Add @deprecated annotation where nee

[jira] [Resolved] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr resolved TIKA-4203. --- Resolution: Fixed > Add @deprecated annotation where nee

[jira] [Created] (TIKA-4203) Add @deprecated annotation where needed

2024-02-24 Thread Tilman Hausherr (Jira)
Tilman Hausherr created TIKA-4203: - Summary: Add @deprecated annotation where needed Key: TIKA-4203 URL: https://issues.apache.org/jira/browse/TIKA-4203 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-23 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820223#comment-17820223 ] Hudson commented on TIKA-4202: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1526 (See

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820203#comment-17820203 ] ASF GitHub Bot commented on TIKA-4202: -- tballison merged PR #1621: URL: https://github.com/apache

[jira] [Commented] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820165#comment-17820165 ] ASF GitHub Bot commented on TIKA-4202: -- tballison opened a new pull request, #1621: URL: https

[jira] [Updated] (TIKA-4202) Add page count of OCR'd pages in metadata for PDF files

2024-02-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4202: -- Summary: Add page count of OCR'd pages in metadata for PDF files (was: Add page count of OCR'd pages

[jira] [Created] (TIKA-4202) Add page count of OCR'd pages in PDF's metadata

2024-02-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4202: - Summary: Add page count of OCR'd pages in PDF's metadata Key: TIKA-4202 URL: https://issues.apache.org/jira/browse/TIKA-4202 Project: Tika Issue Type: New Feature

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819093#comment-17819093 ] Hudson commented on TIKA-4199: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1520 (See

[jira] [Updated] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-4199: -- Fix Version/s: 2.9.2 3.0.0 > commons-compress 1.26.0 breaks Apache T

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818937#comment-17818937 ] Tilman Hausherr commented on TIKA-4199: --- I tried another solution {code:java

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818916#comment-17818916 ] Hudson commented on TIKA-4201: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1518 (See

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818907#comment-17818907 ] ASF GitHub Bot commented on TIKA-4201: -- tballison merged PR #1608: URL: https://github.com/apache

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818905#comment-17818905 ] Hudson commented on TIKA-4199: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1517 (See

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818906#comment-17818906 ] Hudson commented on TIKA-4198: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk11 #1517 (See

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818882#comment-17818882 ] ASF GitHub Bot commented on TIKA-4198: -- tballison merged PR #1607: URL: https://github.com/apache

[jira] [Resolved] (TIKA-4198) Skip blob fields in geopkg files

2024-02-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4198. --- Fix Version/s: 3.0.0 Resolution: Fixed > Skip blob fields in geopkg fi

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818880#comment-17818880 ] ASF GitHub Bot commented on TIKA-4201: -- tballison opened a new pull request, #1608: URL: https

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818877#comment-17818877 ] Hudson commented on TIKA-4199: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1516 (See

[jira] [Commented] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818873#comment-17818873 ] Tilman Hausherr commented on TIKA-4201: --- Yeah, makes sense. > Add hard limit to stream read

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818871#comment-17818871 ] Tim Allison commented on TIKA-4199: --- I opened TIKA-4201 to add a hard limit to the read

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 3:37 PM: {quote}I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818867#comment-17818867 ] Tilman Hausherr commented on TIKA-4199: --- {quote}I'm not declaring this a problem with commons

[jira] [Created] (TIKA-4201) Add hard limit to stream reading in IWorksParser#detectType

2024-02-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-4201: - Summary: Add hard limit to stream reading in IWorksParser#detectType Key: TIKA-4201 URL: https://issues.apache.org/jira/browse/TIKA-4201 Project: Tika Issue Type

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818853#comment-17818853 ] Tim Allison commented on TIKA-4199: --- As I look at the IWorkPackageParser and the detectType(), I think

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818846#comment-17818846 ] Tim Allison commented on TIKA-4199: --- Thank you [~tilman] for working on this! I'm sorry I opened

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818823#comment-17818823 ] Tilman Hausherr commented on TIKA-4199: --- After merging I discovered that the SevenZWrapper class

[jira] [Closed] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr closed TIKA-4200. - Resolution: Duplicate Our CI is failing because of the CVE :-( Duplicate of TIKA-4199. I'm still

[jira] [Commented] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818790#comment-17818790 ] Tim Allison commented on TIKA-4200: --- Argh. Sorry. [~tilman] is already working on this: https

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818781#comment-17818781 ] ASF GitHub Bot commented on TIKA-4198: -- tballison opened a new pull request, #1607: URL: https

[jira] [Created] (TIKA-4200) Fix broken build after upgrade to commons-compress

2024-02-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-4200: - Summary: Fix broken build after upgrade to commons-compress Key: TIKA-4200 URL: https://issues.apache.org/jira/browse/TIKA-4200 Project: Tika Issue Type: Bug

[jira] [Comment Edited] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr edited comment on TIKA-4199 at 2/20/24 11:57 AM: - I'm

[jira] [Commented] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818774#comment-17818774 ] Tilman Hausherr commented on TIKA-4199: --- I'm working on it https://github.com/apache/pdfbox/pull

[jira] [Created] (TIKA-4199) commons-compress 1.26.0 breaks Apache Tika 2.9.1

2024-02-20 Thread Alexander Veit (Jira)
Alexander Veit created TIKA-4199: Summary: commons-compress 1.26.0 breaks Apache Tika 2.9.1 Key: TIKA-4199 URL: https://issues.apache.org/jira/browse/TIKA-4199 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818069#comment-17818069 ] Gregory Lepore commented on TIKA-4198: -- For this set of data from the Bureau of Land Management

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818068#comment-17818068 ] Tim Allison commented on TIKA-4198: --- [~g...@rhobard.com], based on your knowledge of the format, should

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Gregory Lepore (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818018#comment-17818018 ] Gregory Lepore commented on TIKA-4198: -- This would make a huge difference in my agency's ability

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817988#comment-17817988 ] Tim Allison commented on TIKA-4198: --- On one 130MB file, the processing time went from 320 seconds ->

[jira] [Commented] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817981#comment-17817981 ] Tim Allison commented on TIKA-4198: --- Turns out there's a "geom" field and also a "

[jira] [Created] (TIKA-4198) Skip blob fields in geopkg files

2024-02-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4198: - Summary: Skip blob fields in geopkg files Key: TIKA-4198 URL: https://issues.apache.org/jira/browse/TIKA-4198 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-4166) dependency updates for Tika 3.0

2024-02-16 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817970#comment-17817970 ] Hudson commented on TIKA-4166: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1512 (See

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-13 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816896#comment-17816896 ] Lonzak commented on TIKA-3784: -- One other possibility would be to combine both approaches: Usage

[jira] [Comment Edited] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-13 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811 ] Lonzak edited comment on TIKA-3784 at 2/13/24 8:16 AM: --- PKCS12 is not the easiest

[jira] [Comment Edited] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-13 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811 ] Lonzak edited comment on TIKA-3784 at 2/13/24 8:14 AM: --- PKCS12 is not the easiest

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816827#comment-17816827 ] Tim Allison commented on TIKA-3784: --- Well, sure, if you want to make it easy! Y, let's go with something

[jira] [Comment Edited] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-12 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811 ] Lonzak edited comment on TIKA-3784 at 2/12/24 11:16 PM: PKCS12 is not the easiest

[jira] [Comment Edited] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-12 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811 ] Lonzak edited comment on TIKA-3784 at 2/12/24 11:15 PM: PKCS12 is not the easiest

[jira] [Updated] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lonzak updated TIKA-4194: - Description: We use tika to detect the type of a file which is uploaded. In most cases this works quite well

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-12 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816811#comment-17816811 ] Lonzak commented on TIKA-3784: -- PKCS12 is not the easiest format :-| The oid for pkcs12 starts

[jira] [Commented] (TIKA-4191) tika-core and other deps should be "provided" in non-app contexts

2024-02-12 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816804#comment-17816804 ] Hudson commented on TIKA-4191: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1505 (See

[jira] [Commented] (TIKA-3784) Detector returns "application/x-x509-key" when scanning a .p12 file

2024-02-12 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816788#comment-17816788 ] Nick Burch commented on TIKA-3784: -- From [https://datatracker.ietf.org/doc/rfc7292/] it looks l

[jira] [Commented] (TIKA-4196) Add a BOM charset detector

2024-02-12 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816779#comment-17816779 ] Hudson commented on TIKA-4196: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See

[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816781#comment-17816781 ] Hudson commented on TIKA-4194: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See

[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816780#comment-17816780 ] Hudson commented on TIKA-4195: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See

[jira] [Commented] (TIKA-4191) tika-core and other deps should be "provided" in non-app contexts

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816715#comment-17816715 ] ASF GitHub Bot commented on TIKA-4191: -- tballison merged PR #1575: URL: https://github.com/apache

[jira] [Resolved] (TIKA-4197) Downgrade jackrabbit in 2.x

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4197. --- Fix Version/s: 2.9.2 Resolution: Fixed > Downgrade jackrabbit in

[jira] [Created] (TIKA-4197) Downgrade jackrabbit in 2.x

2024-02-12 Thread Tim Allison (Jira)
Tim Allison created TIKA-4197: - Summary: Downgrade jackrabbit in 2.x Key: TIKA-4197 URL: https://issues.apache.org/jira/browse/TIKA-4197 Project: Tika Issue Type: Bug Reporter: Tim

[jira] [Updated] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4195: -- Description: The JSoupParser runs encoding detection on the InputStream. If the result is null

[jira] [Resolved] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4195. --- Fix Version/s: 3.0.0 Resolution: Fixed > JSoupParser conceals null from the EncodingDetec

[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816694#comment-17816694 ] ASF GitHub Bot commented on TIKA-4195: -- tballison merged PR #1591: URL: https://github.com/apache

[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816689#comment-17816689 ] Tim Allison commented on TIKA-4194: --- Merged and cherry-picked into branch_2x. [~tom_1st] if you do have

[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816687#comment-17816687 ] ASF GitHub Bot commented on TIKA-4194: -- tballison merged PR #1589: URL: https://github.com/apache

[jira] [Commented] (TIKA-4196) Add a BOM charset detector

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816680#comment-17816680 ] ASF GitHub Bot commented on TIKA-4196: -- tballison merged PR #1590: URL: https://github.com/apache

[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816679#comment-17816679 ] ASF GitHub Bot commented on TIKA-4195: -- tballison opened a new pull request, #1591: URL: https

[jira] [Updated] (TIKA-4196) Add a BOM charset detector

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4196: -- Description: The ICU4j and the StandardHtmlEncodingDetector detectors include a bom detector

[jira] [Commented] (TIKA-4196) Add a BOM charset detector

2024-02-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816668#comment-17816668 ] ASF GitHub Bot commented on TIKA-4196: -- tballison opened a new pull request, #1590: URL: https

[jira] [Created] (TIKA-4196) Add a BOM charset detector

2024-02-12 Thread Tim Allison (Jira)
Tim Allison created TIKA-4196: - Summary: Add a BOM charset detector Key: TIKA-4196 URL: https://issues.apache.org/jira/browse/TIKA-4196 Project: Tika Issue Type: New Feature Reporter

[jira] [Commented] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816661#comment-17816661 ] Tim Allison commented on TIKA-4194: --- Thank you for this! I'll try to take a look later today

[jira] [Created] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

2024-02-12 Thread Tim Allison (Jira)
Tim Allison created TIKA-4195: - Summary: JSoupParser conceals null from the EncodingDetector Key: TIKA-4195 URL: https://issues.apache.org/jira/browse/TIKA-4195 Project: Tika Issue Type

[jira] [Comment Edited] (TIKA-4194) tika fails to detect certain pkcs12 keystores types p12 pfx

2024-02-12 Thread Lonzak (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816607#comment-17816607 ] Lonzak edited comment on TIKA-4194 at 2/12/24 1:47 PM: --- Interestingly
