[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578904#comment-17578904 ] Tim Allison commented on PDFBOX-5490: - Y. Completely understand. I don't want to impede 3.0.0. Thank you! > Add reconstruction information to the PDDocument > > > Key: PDFBOX-5490 > URL: https://issues.apache.org/jira/browse/PDFBOX-5490 > Project: PDFBox > Issue Type: Wish > Components: Parsing >Reporter: Tim Allison >Priority: Minor > > When the xref has to be rebuilt or there are other anomalies in the parsing > of the PDDocument, the results are currently logged. In a multithreaded > environment it is not easy to reconstruct which documents had which problems. > It would be helpful if a PDF was able to be successfully loaded to include > information about what had to be fixed in order to load it successfully. > Certainly, rebuilding the xref table comes to mind, but any other info would > also be useful. > This is a wish for 3.x. I don't think I'll have time to contribute. :( -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578848#comment-17578848 ] Andreas Lehmkühler commented on PDFBOX-5490: First of all, that seems to be more than a couple of lines of code. We should stop adding new features to 3.0.0, as I'd like to release a first beta version in September or so if nobody objects. We should define the possible kinds of information to be collected by the new feature to determine where to put the code. If it is limited to things like the xref table, the parser is the right place, but if one is interested in things like missing fonts or Unicode maps, the code has to be somewhere else.
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578537#comment-17578537 ] Maruan Sahyoun commented on PDFBOX-5490: OK - let's wait for what [~lehmi] has to say about that, as he's the one doing the parser (among other areas). It looks like we need an extensible Event data model in order to deal with different needs ...
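For illustration only, an extensible event data model of the kind mentioned here might look roughly like the sketch below. Everything in it is an assumption made for this example: the class names, the `Stage` split between parser and DOM, and the event ids are hypothetical and are not part of PDFBox or of the FOP events package. The idea is that a string id plus a parameter map lets new kinds of events be added without changing the model.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseEventModelSketch {

    /** Where the anomaly was noticed (hypothetical split: parser vs. built DOM). */
    enum Stage { PARSER, DOM }

    /** Hypothetical generic event: a string id plus free-form parameters. */
    static final class ParseEvent {
        private final String id;
        private final Stage stage;
        private final Map<String, Object> params;

        ParseEvent(String id, Stage stage, Map<String, Object> params) {
            this.id = id;
            this.stage = stage;
            // Defensive copy so later changes to the caller's map do not leak in.
            this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
        }

        String getId() { return id; }
        Stage getStage() { return stage; }
        Map<String, Object> getParams() { return params; }
    }

    /** The xref table had to be rebuilt (no parameters needed). */
    static ParseEvent xrefRebuilt() {
        return new ParseEvent("xref.rebuilt", Stage.PARSER,
                Collections.<String, Object>emptyMap());
    }

    /** The declared stream length differs from the actual one. */
    static ParseEvent streamLengthMismatch(long declared, long actual) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("declared", declared);
        params.put("actual", actual);
        return new ParseEvent("stream.lengthMismatch", Stage.PARSER, params);
    }

    /** A font was missing; this is only known after the DOM has been built. */
    static ParseEvent missingFont(String fontName) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("fontName", fontName);
        return new ParseEvent("font.missing", Stage.DOM, params);
    }
}
```

A consumer that only cares about parser repairs could filter on `getStage()`, while one interested in fonts could match on the id.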
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578510#comment-17578510 ] Tim Allison commented on PDFBOX-5490: My initial request would be whether or not the xref table had to be rebuilt, largely because I'm somewhat interested in that at the moment. Any info at the pre-DOM stage about what had to be guessed or assumed would also help, e.g. alleged object stream length != actual object stream length. Other places where PDFBox currently logs warnings (missing font, missing Unicode mappings, etc.) after the DOM has been built would also be useful.
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578480#comment-17578480 ] Maruan Sahyoun commented on PDFBOX-5490: [~lehmi] thoughts? I could do a small patch for an initial PoC - maybe initially using the FOP events package, but I haven't looked into it. [~tallison] what's the information you'd like to capture? Just the fact that there was some repair, or is there more information you are looking for?
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578129#comment-17578129 ] Tim Allison commented on PDFBOX-5490: Oh, that looks great.
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578115#comment-17578115 ] Maruan Sahyoun commented on PDFBOX-5490: Apache FOP has something like that [https://xmlgraphics.apache.org/fop/2.7/events.html] which looks similar to what I had in mind.
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578065#comment-17578065 ] Michael Klink commented on PDFBOX-5490: Sounds like a good idea. That would also allow implementing some customized parsing strictness if an exception thrown by the listener is interpreted as a rejected repair...
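The "exception as rejected repair" idea could be sketched as follows. This is a minimal illustration under stated assumptions, not PDFBox code: `RepairListener`, `RepairRejectedException`, the repair ids, and the `load` stand-in (which simulates a parser by walking a list of needed repairs) are all hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictParsingSketch {

    /** Hypothetical callback, invoked before a repair is applied. Not a PDFBox API. */
    interface RepairListener {
        void beforeRepair(String kind);
    }

    /** Hypothetical exception a listener throws to reject a repair. */
    static class RepairRejectedException extends RuntimeException {
        RepairRejectedException(String message) {
            super(message);
        }
    }

    /**
     * Stand-in for a parser run: 'neededRepairs' simulates the anomalies found.
     * Returns the repairs that were applied; a listener exception aborts the load.
     */
    static List<String> load(List<String> neededRepairs, RepairListener listener) {
        List<String> applied = new ArrayList<>();
        for (String kind : neededRepairs) {
            listener.beforeRepair(kind); // may throw to veto the repair
            applied.add(kind);
        }
        return applied;
    }

    /** Lenient policy: accept every repair. */
    static final RepairListener LENIENT = kind -> { };

    /** Strict policy: refuse to load anything whose xref had to be rebuilt. */
    static final RepairListener NO_XREF_REBUILD = kind -> {
        if ("xref.rebuilt".equals(kind)) {
            throw new RepairRejectedException("rejected repair: " + kind);
        }
    };

    public static void main(String[] args) {
        List<String> repairs = new ArrayList<>();
        repairs.add("stream.lengthMismatch");
        repairs.add("xref.rebuilt");

        System.out.println(load(repairs, LENIENT));
        try {
            load(repairs, NO_XREF_REBUILD);
        } catch (RepairRejectedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, strictness becomes a caller-side policy: the same load path serves lenient and strict callers, and only the listener differs.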
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578055#comment-17578055 ] Tim Allison commented on PDFBOX-5490: A Listener would be great. Any mechanism that would allow programmatic retrieval of the problems encountered during the parse, per file, would work.
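A rough sketch of per-file retrieval in a multithreaded setting, assuming a listener could be attached to each load: every file gets its own collecting listener, so problems never have to be untangled from a shared log. The interface, the `fakeLoad` stand-in (which pretends that files named "broken*" need an xref rebuild), and all other names are hypothetical and do not exist in PDFBox.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerFileProblemsSketch {

    /** Hypothetical listener interface; not part of PDFBox. */
    interface ParseProblemListener {
        void onProblem(String description);
    }

    /** Collects the problems of exactly one load, so nothing is shared across files. */
    static final class CollectingListener implements ParseProblemListener {
        private final List<String> problems = new ArrayList<>();

        @Override
        public void onProblem(String description) {
            problems.add(description);
        }

        List<String> getProblems() {
            return Collections.unmodifiableList(problems);
        }
    }

    /** Stand-in for loading a file: pretends files named "broken*" need an xref rebuild. */
    static void fakeLoad(String fileName, ParseProblemListener listener) {
        if (fileName.startsWith("broken")) {
            listener.onProblem("xref table rebuilt");
        }
    }

    /** Loads all files on a thread pool and returns the problems keyed by file name. */
    static Map<String, List<String>> loadAll(List<String> fileNames) {
        Map<String, List<String>> report = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String name : fileNames) {
            pool.execute(() -> {
                CollectingListener listener = new CollectingListener(); // one per file
                fakeLoad(name, listener);
                report.put(name, listener.getProblems());
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for loads", e);
        }
        return report;
    }
}
```

Because each listener instance is confined to the task that created it, the listener itself needs no synchronization; only the shared report map is concurrent.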
[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument
[ https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578037#comment-17578037 ] Maruan Sahyoun commented on PDFBOX-5490: Do you think this should be part of the PDDocument, or what about being able to attach a listener? Preferences?