[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578537#comment-17578537
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


OK, let's wait to hear what [~lehmi] has to say about that, since he is the one 
working on the parser (among other areas). It looks like we need a somewhat 
extensible event data model to handle the different needs.

> Add reconstruction information to the PDDocument
> 
>
> Key: PDFBOX-5490
> URL: https://issues.apache.org/jira/browse/PDFBOX-5490
> Project: PDFBox
> Issue Type: Wish
> Components: Parsing
> Reporter: Tim Allison
> Priority: Minor
>
> When the xref has to be rebuilt or there are other anomalies in the parsing 
> of the PDDocument, the results are currently logged. In a multithreaded 
> environment it is not easy to reconstruct which documents had which problems.
> When a PDF loads successfully, it would be helpful for the document to include 
> information about what had to be fixed in order to load it. 
> Rebuilding the xref table certainly comes to mind, but any other info would 
> also be useful.
> This is a wish for 3.x. I don't think I'll have time to contribute. :(



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578510#comment-17578510
 ] 

Tim Allison commented on PDFBOX-5490:
-

My initial request would be to record whether or not the xref table had to be 
rebuilt, largely because I'm somewhat interested in that at the moment. 

Any info at the pre-DOM stage about what had to be guessed or assumed would also 
help, e.g. alleged object stream length != actual object stream length.

Other places where PDFBox currently logs warnings (missing font, missing 
Unicode mappings, etc.) after the DOM has been built would also be useful.
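
One way to picture the information requested above is a list of typed events kept per loaded document instead of in a shared log. The sketch below is only an illustration of that idea: `ParseEventSketch`, `Kind`, `ParseEvent` and `ParseEvents` are hypothetical names, not PDFBox API, and the event kinds are simply the examples named in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParseEventSketch {
    /** Hypothetical categories, taken from the examples named in this thread. */
    enum Kind { XREF_REBUILT, OBJECT_STREAM_LENGTH_MISMATCH, MISSING_FONT, MISSING_UNICODE_MAPPING }

    /** One thing the parser had to repair or assume; immutable. */
    static final class ParseEvent {
        final Kind kind;
        final String detail;

        ParseEvent(Kind kind, String detail) {
            this.kind = kind;
            this.detail = detail;
        }

        @Override
        public String toString() {
            return kind + ": " + detail;
        }
    }

    /** One instance per document, so concurrent loads never mix their findings. */
    static final class ParseEvents {
        private final List<ParseEvent> events = new ArrayList<>();

        void record(Kind kind, String detail) {
            events.add(new ParseEvent(kind, detail));
        }

        /** Read-only view of everything recorded so far. */
        List<ParseEvent> all() {
            return Collections.unmodifiableList(events);
        }

        boolean has(Kind kind) {
            return events.stream().anyMatch(e -> e.kind == kind);
        }
    }

    public static void main(String[] args) {
        ParseEvents events = new ParseEvents();
        events.record(Kind.XREF_REBUILT, "xref table had to be rebuilt");
        events.record(Kind.MISSING_FONT, "font not found, fallback used");
        System.out.println("xref rebuilt? " + events.has(Kind.XREF_REBUILT));
        events.all().forEach(System.out::println);
    }
}
```

Because each document would carry its own `ParseEvents`, a caller in a multithreaded pipeline could ask a loaded document what was repaired without correlating log lines. New kinds can be added to the enum without changing callers, which is one simple reading of "extensible".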




[jira] [Comment Edited] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578480#comment-17578480
 ] 

Maruan Sahyoun edited comment on PDFBOX-5490 at 8/11/22 1:45 PM:
-

[~lehmi] thoughts? I could do a small patch for an initial PoC, maybe 
initially using the FOP events package, but I haven't looked into it yet.

[~tallison] what information would you like to capture? Just the fact that 
there was some repair, or is there more information you are looking for?

Maybe it would be wise to postpone that until after 3.0.





[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578480#comment-17578480
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


[~lehmi] thoughts? I could do a small patch for an initial PoC, maybe 
initially using the FOP events package, but I haven't looked into it yet.

[~tallison] what information would you like to capture? Just the fact that 
there was some repair, or is there more information you are looking for?




[jira] [Commented] (PDFBOX-5485) Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream

2022-08-11 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578399#comment-17578399
 ] 

Andreas Lehmkühler commented on PDFBOX-5485:


[~omcgovern] thanks for the report and the input, especially the test. I've 
fixed the stack overflow. You can check the fix using the next 
snapshot.

> Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream
> ---
>
> Key: PDFBOX-5485
> URL: https://issues.apache.org/jira/browse/PDFBOX-5485
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 3.0.0 PDFBox
> Environment: MacOS, but likely not OS specific.
> Reporter: Owen McGovern
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> Version:  org.apache.pdfbox:pdfbox:3.0.0-alpha3
>  
> For a subset of the PDFs I process, I cannot extract a range of pages and 
> write them out to a new PDF (as part of test code).
> Here's the Kotlin code I use:
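> {code:java}
> import java.nio.file.Path
> import java.nio.file.Paths
> import org.apache.pdfbox.Loader
> import org.apache.pdfbox.multipdf.PageExtractor
>
> // Extract the given page range into a new PDF next to the source file.
> fun extractPages(documentName: String, fromPage: Int, toPage: Int): Path {
>     val pdfFile = Paths.get("data", "input", "PDFS", "${documentName}.pdf")
>     val pdfPagesFile = Paths.get("data", "input", "PDFS", "${documentName}_Page_$fromPage-$toPage.pdf")
>     val pdfDoc = Loader.loadPDF(pdfFile.toFile())
>     val pageExtractor = PageExtractor(pdfDoc, fromPage, toPage)
>     val pdfPages = pageExtractor.extract()
>     // The StackOverflowError is thrown while saving the extracted pages.
>     pdfPages.save(pdfPagesFile.toFile())
>     return pdfPagesFile
> }{code}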
> It doesn't occur with all PDFs, maybe 10-20% of the PDFs I use. 
>  
> A slice of the stack trace is:
> {code:java}
> java.lang.StackOverflowError
>     at java.base/java.util.HashMap.tableSizeFor(HashMap.java:380)
>     at java.base/java.util.HashMap.<init>(HashMap.java:453)
>     at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:347)
>     at java.base/java.util.HashSet.<init>(HashSet.java:162)
>     at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
>     at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:380)
>     at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1225)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:336)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
>     at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
> {code}
> As I mentioned, this hits some PDFs, not all.
> I legally cannot share the original source PDFs, but it looks like a recursive 
> loop in writeCOSDictionary and writeObject in COSWriterObjectStream.
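
For readers unfamiliar with this failure mode: a recursive writer that descends into dictionaries and arrays never terminates if the object graph contains a cycle (for example, a child that points back to its parent). The following is a minimal sketch of one common guard, an identity-based "in progress" set. It uses plain `Map` and `List` as stand-ins for COS dictionaries and arrays; `CycleSafeWriter` and the `<ref>` placeholder are hypothetical, and this is not necessarily how PDFBox addressed the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleSafeWriter {
    // Containers currently being written further up the call stack.
    // Identity comparison avoids calling hashCode() on cyclic structures.
    private final Set<Object> inProgress = Collections.newSetFromMap(new IdentityHashMap<>());
    private final StringBuilder out = new StringBuilder();

    /** Writes maps and lists recursively; a container already in progress is emitted as "<ref>". */
    void write(Object node) {
        boolean container = node instanceof Map || node instanceof List;
        if (container && !inProgress.add(node)) {
            out.append("<ref>"); // back-reference: do not descend again
            return;
        }
        if (node instanceof Map) {
            out.append("<<");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                out.append('/').append(e.getKey()).append(' ');
                write(e.getValue());
                out.append(' ');
            }
            out.append(">>");
        } else if (node instanceof List) {
            out.append('[');
            for (Object item : (List<?>) node) {
                write(item);
                out.append(' ');
            }
            out.append(']');
        } else {
            out.append(node);
        }
        if (container) {
            inProgress.remove(node); // finished this branch
        }
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        // A dictionary whose array entry points back at the dictionary itself.
        Map<String, Object> page = new LinkedHashMap<>();
        List<Object> kids = new ArrayList<>();
        page.put("Kids", kids);
        kids.add(page);

        CycleSafeWriter writer = new CycleSafeWriter();
        writer.write(page);
        System.out.println(writer.result());
    }
}
```

Without the `inProgress` check, the example in `main` would recurse through dictionary, array, dictionary, and so on until the stack is exhausted, which matches the alternating frames in the trace above. A real PDF writer would emit an indirect object reference at the point where this sketch writes `<ref>`.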




[jira] [Commented] (PDFBOX-5485) Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578398#comment-17578398
 ] 

ASF subversion and git services commented on PDFBOX-5485:
-

Commit 1903349 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1903349 ]

PDFBOX-5485: add test as proposed by Owen McGovern


[jira] [Commented] (PDFBOX-5485) Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578396#comment-17578396
 ] 

ASF subversion and git services commented on PDFBOX-5485:
-

Commit 1903348 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1903348 ]

PDFBOX-5485: avaoid StackOverflowException


[jira] [Commented] (PDFBOX-5485) Stackoverflow writing out a subset of PDF pages - COSWriterObjectStream

2022-08-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578387#comment-17578387
 ] 

ASF subversion and git services commented on PDFBOX-5485:
-

Commit 1903345 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1903345 ]

PDFBOX-5485: preserve origin key
