[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-12 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578904#comment-17578904
 ] 

Tim Allison commented on PDFBOX-5490:
-

Yes, I completely understand. I don't want to impede 3.0.0. Thank you!

> Add reconstruction information to the PDDocument
> ------------------------------------------------
>
>                 Key: PDFBOX-5490
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5490
>             Project: PDFBox
>          Issue Type: Wish
>          Components: Parsing
>            Reporter: Tim Allison
>            Priority: Minor
>
> When the xref has to be rebuilt or there are other anomalies in the parsing 
> of the PDDocument, the results are currently logged.  In a multithreaded 
> environment it is not easy to reconstruct which documents had which problems.
> It would be helpful if a PDF that could be loaded successfully also carried 
> information about what had to be fixed in order to load it.  
> Certainly, rebuilding the xref table comes to mind, but any other info would 
> also be useful.
> This is a wish for 3.x.  I don't think I'll have time to contribute. :(



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578848#comment-17578848
 ] 

Andreas Lehmkühler commented on PDFBOX-5490:


First of all, that seems to be more than a couple of lines of code. We should 
stop adding new features to 3.0.0 as I'd like to release a first beta version 
in September or so if nobody objects.

We should define the kinds of information the new feature is supposed to 
collect in order to determine where to put the code. If it is limited to things 
like the xref table, the parser is the right place, but if one is interested in 
things like missing fonts or Unicode maps, the code has to go somewhere else.




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578537#comment-17578537
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


OK - let's wait for what [~lehmi] has to say about that, as he's the one who, 
apart from other areas, is doing the parser. It looks like we need a somewhat 
extensible event data model in order to deal with different needs ...
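
To make the "extensible event data model" idea concrete, here is a minimal sketch. It assumes an event is identified by a string ID and carries a map of named parameters, so new kinds of events need no new classes. `ParseEvent`, `Severity` and the event IDs are hypothetical names for illustration only; nothing like this exists in PDFBox today.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class EventModelSketch {

    /** Hypothetical severity levels. */
    enum Severity { INFO, WARN, ERROR }

    /** Hypothetical immutable event: the ID names the kind, the params carry kind-specific details. */
    static final class ParseEvent {
        private final String id;
        private final Severity severity;
        private final Map<String, Object> params;

        private ParseEvent(String id, Severity severity, Map<String, Object> params) {
            this.id = id;
            this.severity = severity;
            // Defensive copy so that later changes to the caller's map do not leak in.
            this.params = Collections.unmodifiableMap(new LinkedHashMap<>(params));
        }

        /** Creates an event without parameters. */
        static ParseEvent of(String id, Severity severity) {
            return new ParseEvent(id, severity, Collections.emptyMap());
        }

        /** Returns a copy of this event with one more named parameter. */
        ParseEvent with(String name, Object value) {
            Map<String, Object> copy = new LinkedHashMap<>(params);
            copy.put(name, value);
            return new ParseEvent(id, severity, copy);
        }

        String id() { return id; }
        Severity severity() { return severity; }
        Map<String, Object> params() { return params; }

        @Override
        public String toString() {
            return severity + " " + id + " " + params;
        }
    }

    public static void main(String[] args) {
        // A parser-level event that needs no extra data.
        ParseEvent xref = ParseEvent.of("parser.xrefRebuilt", Severity.WARN);
        // A later-stage event that carries one detail.
        ParseEvent font = ParseEvent.of("font.missing", Severity.WARN)
                .with("fontName", "ExampleFont");
        System.out.println(xref);
        System.out.println(font);
    }
}
```

With a model like this, the parser and the font code could emit events through the same type, and consumers could filter on the ID or the severity.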




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578510#comment-17578510
 ] 

Tim Allison commented on PDFBOX-5490:
-

My initial request would be to know whether or not the xref table had to be 
rebuilt, largely because I'm somewhat interested in that at the moment. 

Any info from the pre-DOM stage about what had to be guessed or assumed would 
help as well, e.g. alleged object stream length != actual object stream length.

Other places where PDFBox currently logs warnings (missing font, missing 
Unicode mappings, etc.) after the DOM has been built would also be useful.
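
The kinds of information listed above could be catalogued roughly as follows. This is only a sketch: the enum names and the split into a pre-DOM and a post-DOM stage are assumptions made for illustration, not existing PDFBox constants.

```java
import java.util.EnumSet;
import java.util.Set;

public class AnomalyKindsSketch {

    /** Hypothetical stages at which an anomaly can be detected. */
    enum Stage { PRE_DOM, POST_DOM }

    /** Hypothetical catalogue of anomaly kinds, each tagged with its stage. */
    enum AnomalyKind {
        XREF_REBUILT(Stage.PRE_DOM),
        OBJECT_STREAM_LENGTH_MISMATCH(Stage.PRE_DOM),
        MISSING_FONT(Stage.POST_DOM),
        MISSING_UNICODE_MAPPING(Stage.POST_DOM);

        private final Stage stage;

        AnomalyKind(Stage stage) {
            this.stage = stage;
        }

        Stage stage() {
            return stage;
        }
    }

    /** Returns all anomaly kinds that belong to the given stage. */
    static Set<AnomalyKind> kindsIn(Stage stage) {
        Set<AnomalyKind> result = EnumSet.noneOf(AnomalyKind.class);
        for (AnomalyKind kind : AnomalyKind.values()) {
            if (kind.stage() == stage) {
                result.add(kind);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("pre-DOM:  " + kindsIn(Stage.PRE_DOM));
        System.out.println("post-DOM: " + kindsIn(Stage.POST_DOM));
    }
}
```

Tagging each kind with a stage would make it explicit which anomalies can be reported by the parser alone and which need hooks elsewhere.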




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-11 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578480#comment-17578480
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


[~lehmi] thoughts? I could do a small patch for an initial PoC, maybe 
initially using the FOP events package, but I haven't looked into it.

[~tallison] what information would you like to capture? Just the fact that 
there was some repair, or is there more information you are looking for?




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578129#comment-17578129
 ] 

Tim Allison commented on PDFBOX-5490:
-

Oh, that looks great.




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-10 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578115#comment-17578115
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


Apache FOP has something like that 
[https://xmlgraphics.apache.org/fop/2.7/events.html], which looks similar to 
what I had in mind.

 

 




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-10 Thread Michael Klink (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578065#comment-17578065
 ] 

Michael Klink commented on PDFBOX-5490:
---

Sounds like a good idea.

That would also allow implementing some customized parsing strictness if an 
exception thrown by the listener is interpreted as a rejected repair...
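
A rough sketch of how that could look, assuming the parser calls the listener before it applies a repair and treats an exception from the listener as a veto. `RepairListener`, `RepairRejectedException` and the `load` stand-in are hypothetical; they are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictnessSketch {

    /** Hypothetical exception a listener throws to veto a repair. */
    static final class RepairRejectedException extends RuntimeException {
        RepairRejectedException(String message) {
            super(message);
        }
    }

    /** Hypothetical callback invoked before the parser applies a repair. */
    interface RepairListener {
        void beforeRepair(String repairKind);
    }

    /** Lenient policy: records every repair and lets it happen. */
    static final class LenientListener implements RepairListener {
        final List<String> accepted = new ArrayList<>();

        @Override
        public void beforeRepair(String repairKind) {
            accepted.add(repairKind);
        }
    }

    /** Strict policy: any repair is rejected. */
    static final class StrictListener implements RepairListener {
        @Override
        public void beforeRepair(String repairKind) {
            throw new RepairRejectedException("repair not allowed: " + repairKind);
        }
    }

    /** Stand-in for a parser; returns true if the document loads, false if a repair was rejected. */
    static boolean load(boolean xrefDamaged, RepairListener listener) {
        try {
            if (xrefDamaged) {
                listener.beforeRepair("XREF_REBUILD"); // may throw to veto the repair
                // ... the actual rebuild would happen here ...
            }
            return true;
        } catch (RepairRejectedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(load(true, new LenientListener()));  // prints true
        System.out.println(load(true, new StrictListener()));   // prints false
        System.out.println(load(false, new StrictListener()));  // prints true
    }
}
```

The strictness policy then lives entirely in the listener, so callers could choose between lenient and strict loading without any change to the parser.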




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-10 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578055#comment-17578055
 ] 

Tim Allison commented on PDFBOX-5490:
-

A listener would be great, as would any other mechanism that allows 
programmatic retrieval of the problems encountered during the parse, per file.
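
For illustration, a minimal sketch of per-file retrieval in a multithreaded setup. It assumes a hypothetical `ParseEventListener` callback and a stand-in `parse` method that only simulates a parser reporting a repair; neither exists in PDFBox. Each parse gets its own listener instance, so the problems stay tied to one file no matter which thread did the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerFileListenerSketch {

    /** Hypothetical callback; PDFBox does not define this interface. */
    interface ParseEventListener {
        void onProblem(String description);
    }

    /** Collects every problem reported during exactly one parse. */
    static final class CollectingListener implements ParseEventListener {
        private final List<String> problems = new ArrayList<>();

        @Override
        public void onProblem(String description) {
            problems.add(description);
        }

        List<String> problems() {
            return List.copyOf(problems);
        }
    }

    /** Stand-in for a parser: pretends that files named "broken*" need an xref rebuild. */
    static void parse(String fileName, ParseEventListener listener) {
        if (fileName.startsWith("broken")) {
            listener.onProblem("xref table rebuilt");
        }
    }

    /** Parses all files on a thread pool and returns the problems keyed by file name. */
    static Map<String, List<String>> parseAll(List<String> fileNames) {
        Map<String, List<String>> problemsByFile = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String name : fileNames) {
            pool.submit(() -> {
                CollectingListener listener = new CollectingListener(); // one listener per parse
                parse(name, listener);
                problemsByFile.put(name, listener.problems());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("parsing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for parses", e);
        }
        return problemsByFile;
    }

    public static void main(String[] args) {
        Map<String, List<String>> result =
                parseAll(List.of("ok.pdf", "broken-1.pdf", "broken-2.pdf"));
        result.forEach((file, problems) -> System.out.println(file + " -> " + problems));
    }
}
```

Unlike a shared log, the map answers the question "which documents had which problems" directly.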




[jira] [Commented] (PDFBOX-5490) Add reconstruction information to the PDDocument

2022-08-10 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578037#comment-17578037
 ] 

Maruan Sahyoun commented on PDFBOX-5490:


Do you think this should be part of the PDDocument, or would you rather be 
able to attach a listener? Preferences?
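
For comparison, the "part of the document" option could look roughly like this. `LoadedDocument` is a hypothetical stand-in used only to show the shape of the idea, and the `load` method merely simulates a repair; it does not reflect the real PDDocument API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DocumentReportSketch {

    /** Hypothetical stand-in for a loaded document that remembers what was repaired. */
    static final class LoadedDocument {
        private final String name;
        private final List<String> repairs;

        LoadedDocument(String name, List<String> repairs) {
            this.name = name;
            // Copy and wrap so the report cannot change after loading.
            this.repairs = Collections.unmodifiableList(new ArrayList<>(repairs));
        }

        String name() {
            return name;
        }

        List<String> repairs() {
            return repairs;
        }

        boolean wasRepaired() {
            return !repairs.isEmpty();
        }
    }

    /** Stand-in loader: records an xref rebuild when the caller says the xref is damaged. */
    static LoadedDocument load(String name, boolean xrefDamaged) {
        List<String> repairs = new ArrayList<>();
        if (xrefDamaged) {
            repairs.add("xref table rebuilt");
        }
        return new LoadedDocument(name, repairs);
    }

    public static void main(String[] args) {
        LoadedDocument doc = load("broken.pdf", true);
        System.out.println(doc.name() + " repaired=" + doc.wasRepaired() + " " + doc.repairs());
    }
}
```

The trade-off is that a report stored on the document is only available once loading has finished, whereas a listener is notified while parsing is still in progress.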
