[
https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521382#comment-17521382
]
Tim Allison commented on PDFBOX-5415:
-
Michael Demey's diagnosis:
[
https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-5415:
Attachment: PDFBOX-5415-TIKA-3718-p10.pdf
> Infinite loop in ExtractText in 2.x branch on
[
https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated PDFBOX-5415:
Affects Version/s: 2.0.26
> Infinite loop in ExtractText in 2.x branch on a specific pdf
>
[
https://issues.apache.org/jira/browse/PDFBOX-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated PDFBOX-5415:
Component/s: Parsing
> Infinite loop in ExtractText in 2.x branch on a specific pdf
>
Tim Allison created PDFBOX-5415:
---
Summary: Infinite loop in ExtractText in 2.x branch on a specific
pdf
Key: PDFBOX-5415
URL: https://issues.apache.org/jira/browse/PDFBOX-5415
Project: PDFBox
Only one left: 7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M.pdf.
There is some sort of problem with an incremental save: a part of the
multi-content stream is missing or has a new object number. Let's wait and
see whether it is related to PDFBOX-5413.
(The other one, HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5.pdf is an
Only
commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
commoncrawl3/HO/HOAZTST4E26NPA7HL72WCIVMNRQ3E4M5
have a different text extraction.
With the other two, it's attachment file names or doc info.
Tilman
On 12.04.2022 at 08:16, Tilman Hausherr wrote:
After having looked at the content
[
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520935#comment-17520935
]
ASF subversion and git services commented on PDFBOX-4892:
-
Commit 1899764 from
[
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17520934#comment-17520934
]
ASF subversion and git services commented on PDFBOX-4892:
-
Commit 1899763 from
[
https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler updated PDFBOX-5413:
---
Attachment: WYPJNTD5KQNODSXWK4GABURXRTTD5P4H.pdf
[
https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler reopened PDFBOX-5413:
There is another regression and it looks like the root cause is the same.
> Field
After having looked at the content differences and trying to rule out
the /Names differences, there are 4 files with content in
TOP_10_MORE_IN_A that feel suspicious and IMHO need investigation.
commoncrawl3/7L/7LRS5U6CAFMN2P6JPTZVNBUW6XOFYH4M
govdocs1/365/365260.pdf
Thanks Tim!
Looks like there are 5 new exceptions left.
I'm going to check the first two:
commoncrawl3/ZC/ZCY5MCL7KI6QXVMXUZ2AJKXICQIT4TL4
commoncrawl3/WY/WYPJNTD5KQNODSXWK4GABURXRTTD5P4H
The others are thrown within Jempbox
Andreas
On 11.04.22 at 12:40, Tim Allison wrote: