[jira] [Commented] (TIKA-3795) General upgrades for 2.4.2

2022-08-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576143#comment-17576143 ] Hudson commented on TIKA-3795: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #732 (See

[jira] [Commented] (TIKA-3831) Allow for retries in S3Fetcher

2022-08-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576078#comment-17576078 ] Hudson commented on TIKA-3831: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #731 (See

[jira] [Comment Edited] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Tika User (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575945#comment-17575945 ] Tika User edited comment on TIKA-3827 at 8/5/22 11:20 PM: -- Okay was (Author:

[jira] [Updated] (TIKA-3831) Allow for retries in S3Fetcher

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3831: -- Summary: Allow for retries in S3Fetcher (was: Small improvements to S3Fetcher) > Allow for retries in

[jira] [Updated] (TIKA-3831) Small improvements to S3Fetcher

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3831: -- Description: We should allow for retries. (was: When using the s3fetcher with aws public datasets, no

[jira] [Updated] (TIKA-3831) Small improvements to S3Fetcher

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3831: -- Summary: Small improvements to S3Fetcher (was: S3Fetcher does not need to require credentials) >

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Lakatos Gyula (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575951#comment-17575951 ] Lakatos Gyula commented on TIKA-3832: - [~tallison] Thanks a lot for fixing the problem! Tika is

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Tika User (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575945#comment-17575945 ] Tika User commented on TIKA-3827: - When will this fix be available? In the next version? > Word Document

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575936#comment-17575936 ] Hudson commented on TIKA-3832: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #730 (See

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575935#comment-17575935 ] Hudson commented on TIKA-3827: -- SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #730 (See

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575915#comment-17575915 ] Hudson commented on TIKA-3832: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #244 (See

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575891#comment-17575891 ] Tim Allison commented on TIKA-3827: --- For now, I've added a mediatype hint that the bytes are of type

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575890#comment-17575890 ] Tim Allison commented on TIKA-3827: --- That's the client code, but we don't know what "getImageData()" is

[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575883#comment-17575883 ] Tim Allison edited comment on TIKA-3829 at 8/5/22 2:47 PM: --- You can exclude

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

2022-08-05 Thread Tika User (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575881#comment-17575881 ] Tika User commented on TIKA-3827: - Below is the code: You can easily extract text from the document

[jira] [Commented] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575883#comment-17575883 ] Tim Allison commented on TIKA-3829: --- You can exclude parsers and exclude specific mime types from

[jira] [Resolved] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3832. --- Fix Version/s: 1.28.5 2.4.2 Resolution: Fixed Thank you [~Laxika] for

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575856#comment-17575856 ] Tim Allison commented on TIKA-3832: --- We have to defend against cycles in BookMarks... Facepalm, we do in

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575847#comment-17575847 ] Tim Allison commented on TIKA-3832: --- Thank you for sharing the file. PDFBox's ExtractText has no

[jira] [Commented] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575814#comment-17575814 ] Nick Burch commented on TIKA-3832: -- Any chance you could try with Apache PDFBox directly? They've got a

[jira] [Created] (TIKA-3832) Required array length is too large when reading a PDF file

2022-08-05 Thread Lakatos Gyula (Jira)
Lakatos Gyula created TIKA-3832: --- Summary: Required array length is too large when reading a PDF file Key: TIKA-3832 URL: https://issues.apache.org/jira/browse/TIKA-3832 Project: Tika Issue

[jira] [Updated] (TIKA-3832) Required array length is too large (OOM) error when reading a PDF file

2022-08-05 Thread Lakatos Gyula (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lakatos Gyula updated TIKA-3832: Summary: Required array length is too large (OOM) error when reading a PDF file (was: Required

[jira] [Comment Edited] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread John (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575607#comment-17575607 ] John edited comment on TIKA-3829 at 8/5/22 7:01 AM: Ok. Will check and get back to you if

[jira] [Commented] (TIKA-3829) java.lang.IllegalArgumentException: The document is really a XLS file exception while parsing doc file

2022-08-05 Thread John (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575607#comment-17575607 ] John commented on TIKA-3829: Ok. Will check and get back to you if we face this problem again. There is any

[GitHub] [tika] THausherr merged pull request #641: Bump aws.version from 1.12.275 to 1.12.276

2022-08-05 Thread GitBox
THausherr merged PR #641: URL: https://github.com/apache/tika/pull/641 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

[GitHub] [tika] THausherr merged pull request #640: Bump google-cloud-storage from 2.11.0 to 2.11.2

2022-08-05 Thread GitBox
THausherr merged PR #640: URL: https://github.com/apache/tika/pull/640

[GitHub] [tika] THausherr merged pull request #642: Bump maven-site-plugin from 3.12.0 to 3.12.1

2022-08-05 Thread GitBox
THausherr merged PR #642: URL: https://github.com/apache/tika/pull/642