[jira] [Commented] (PDFBOX-5415) Infinite loop in ExtractText in 2.x branch on a specific pdf

2022-04-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521382#comment-17521382 ] Tim Allison commented on PDFBOX-5415: Michael Demey's diagnosis:

[jira] [Updated] (PDFBOX-5415) Infinite loop in ExtractText in 2.x branch on a specific pdf

2022-04-12 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-5415: Attachment: PDFBOX-5415-TIKA-3718-p10.pdf > Infinite loop in ExtractText in 2.x branch on

[jira] [Updated] (PDFBOX-5415) Infinite loop in ExtractText in 2.x branch on a specific pdf

2022-04-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-5415: Affects Version/s: 2.0.26 > Infinite loop in ExtractText in 2.x branch on a specific pdf >

[jira] [Updated] (PDFBOX-5415) Infinite loop in ExtractText in 2.x branch on a specific pdf

2022-04-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated PDFBOX-5415: Component/s: Parsing > Infinite loop in ExtractText in 2.x branch on a specific pdf >

[jira] [Created] (PDFBOX-5415) Infinite loop in ExtractText in 2.x branch on a specific pdf

2022-04-12 Thread Tim Allison (Jira)
Tim Allison created PDFBOX-5415: Summary: Infinite loop in ExtractText in 2.x branch on a specific pdf Key: PDFBOX-5415 URL: https://issues.apache.org/jira/browse/PDFBOX-5415 Project: PDFBox

Re: 2.0.26 release

2022-04-12 Thread Tilman Hausherr
Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf. There is some sort of problem with an incremental save: part of the multi-content stream is missing or has a new object number. Let's wait and see whether it is related to PDFBOX-5413. (The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf is an

Re: 2.0.26 release

2022-04-12 Thread Tilman Hausherr
Only commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M and commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5 have a different text extraction. With the other two it's attachment file names or doc info. Tilman On 12.04.2022 at 08:16, Tilman Hausherr wrote: After having looked at the content

[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-04-12 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520935#comment-17520935 ] ASF subversion and git services commented on PDFBOX-4892: Commit 1899764 from

[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2022-04-12 Thread ASF subversion and git services (Jira)
[ https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520934#comment-17520934 ] ASF subversion and git services commented on PDFBOX-4892: Commit 1899763 from

[jira] [Updated] (PDFBOX-5413) Field text missing

2022-04-12 Thread Jira
[ https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler updated PDFBOX-5413: Attachment: WYPJNTD5KQNODSXWK4GABURXRTTD5P4H.pdf

[jira] [Reopened] (PDFBOX-5413) Field text missing

2022-04-12 Thread Jira
[ https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Lehmkühler reopened PDFBOX-5413: There is another regression and it looks like the root cause is the same. > Field

Re: 2.0.26 release

2022-04-12 Thread Tilman Hausherr
After having looked at the content differences and trying to rule out the /Names differences, there are 4 files with content in TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation. commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M govdocs1/365/365260.pdf

Re: 2.0.26 release

2022-04-12 Thread Andreas Lehmkuehler
Thanks Tim! Looks like there are 5 new exceptions left. I'm going to check the first two: commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4 and commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H. The others are thrown within Jempbox. Andreas On 11.04.22 at 12:40, Tim Allison wrote: