[jira] [Created] (PDFBOX-5010) How to cancel/interrupt pdfbox 2 API call

2020-11-04 Thread Kostya Samarin (Jira)
Kostya Samarin created PDFBOX-5010:
--

 Summary: How to cancel/interrupt pdfbox 2 API call 
 Key: PDFBOX-5010
 URL: https://issues.apache.org/jira/browse/PDFBOX-5010
 Project: PDFBox
  Issue Type: Improvement
Reporter: Kostya Samarin


We use a FixedThreadPool to process PDFs using the PDFBox 2 APIs. It looks like 
there is no way to cancel or interrupt long-running calls gracefully. So:
 * Would it be worth having a timeout for that, or some sort of progress callback?
 * Could you please provide recommendations on how to cancel or interrupt 
long-running calls gracefully now?
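
For illustration only, one common way to bound a call from the outside is to wait on the Future with a timeout and cancel it when the timeout expires. This is a minimal sketch using only java.util.concurrent; the class and method names (TimedPdfTask, runWithTimeout) are hypothetical, and the sleeping lambda is a stand-in for a PDF operation. Note that cancel(true) only sets the worker thread's interrupt flag; whether a given PDFBox call reacts to that flag is not confirmed anywhere in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedPdfTask
{
    /**
     * Submits the task to the pool and waits at most timeoutMillis for it.
     * Returns the task's result, or null if the timeout expired.
     */
    public static <T> T runWithTimeout(ExecutorService pool, Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException
    {
        Future<T> future = pool.submit(task);
        try
        {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e)
        {
            // Requests interruption of the worker thread. Code that never
            // checks the interrupt flag or blocks will keep running.
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception
    {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try
        {
            // Stand-in for a slow PDF operation (assumption, not a PDFBox call)
            String slow = runWithTimeout(pool, () -> { Thread.sleep(5_000); return "done"; }, 100);
            System.out.println("slow task result: " + slow);

            String fast = runWithTimeout(pool, () -> "done", 1_000);
            System.out.println("fast task result: " + fast);
        }
        finally
        {
            pool.shutdownNow();
        }
    }
}
```

The stand-in task stops promptly because Thread.sleep responds to interruption. A CPU-bound call that ignores the flag would keep occupying its pool thread after the caller has given up, which is why a timeout or callback inside the library is being asked about here.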



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5009:

Fix Version/s: 3.0.0 PDFBox
   2.0.22

> Corrupt PDF can lead to a StackOverflow
> ---
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
>  Issue Type: Task
>  Components: Text extraction
>Affects Versions: 2.0.21
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.0.22, 3.0.0 PDFBox
>
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]






[jira] [Commented] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226501#comment-17226501
 ] 

Tilman Hausherr commented on PDFBOX-5009:
-

I'm able to catch this by using a set to prevent a recursive call with the same 
parameter:
{code:java}
private final class PageIterator implements Iterator<PDPage>
{
    private final Queue<COSDictionary> queue = new ArrayDeque<>();
    // Nodes already visited, used to detect cycles in a corrupt page tree
    private Set<COSDictionary> set = new HashSet<>();

    private PageIterator(COSDictionary node)
    {
        enqueueKids(node);
    }

    private void enqueueKids(COSDictionary node)
    {
        if (isPageTreeNode(node))
        {
            List<COSDictionary> kids = getKids(node);
            for (COSDictionary kid : kids)
            {
                // ** NEW **
                if (set.contains(kid))
                {
                    LOG.error("This node has already been visited");
                    continue;
                }
                else
                {
                    set.add(kid);
                }

                enqueueKids(kid);
            }
        }
        else
        {
            queue.add(node);
        }
    }

    // remaining Iterator methods omitted
}
 {code}
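
The visited-set idea above can be tried out in isolation. The following is a minimal sketch under stated assumptions: Node is a hypothetical stand-in for a page tree node (it is not a PDFBox class), and an identity-based set is used here so that membership does not depend on how equals/hashCode are defined for the node type. That choice is an assumption of this sketch, not something taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CycleSafeTraversal
{
    /** Hypothetical stand-in for a page tree node; a node without kids is a leaf (a page). */
    public static final class Node
    {
        public final String name;
        public final List<Node> kids = new ArrayList<>();

        public Node(String name)
        {
            this.name = name;
        }
    }

    /** Collects leaf names depth-first, skipping any node that was already visited. */
    public static List<String> collectLeaves(Node root)
    {
        List<String> leaves = new ArrayList<>();
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        visited.add(root);
        enqueueKids(root, visited, leaves);
        return leaves;
    }

    private static void enqueueKids(Node node, Set<Node> visited, List<String> leaves)
    {
        if (node.kids.isEmpty())
        {
            leaves.add(node.name);
            return;
        }
        for (Node kid : node.kids)
        {
            // Set.add returns false if the node is already present,
            // i.e. the tree loops back on itself
            if (!visited.add(kid))
            {
                continue;
            }
            enqueueKids(kid, visited, leaves);
        }
    }

    public static void main(String[] args)
    {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        root.kids.add(a);
        root.kids.add(b);
        a.kids.add(new Node("p1"));
        a.kids.add(root); // corrupt: a kid points back to the root
        b.kids.add(new Node("p2"));
        System.out.println(collectLeaves(root));
    }
}
```

Without the visited check, the back-reference from a to root would recurse until the stack overflows. The check only guards against cycles; a very deep but acyclic tree could still exhaust the stack with a recursive traversal.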
 

> Corrupt PDF can lead to a StackOverflow
> ---
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
>  Issue Type: Task
>  Components: Text extraction
>Affects Versions: 2.0.21
>Reporter: Tim Allison
>Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]






[jira] [Commented] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226486#comment-17226486
 ] 

Tilman Hausherr commented on PDFBOX-5009:
-

I added some logging and stack tracing to see when it starts:
{noformat}
2020-11-05 05:19:14 WARN  PDPageTree:154 - i = 4, element is: COSObject{207, 0}
2020-11-05 05:19:14 WARN  PDPageTree:155 - COSDictionary expected, but got null
java.lang.Exception
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:157)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:41)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:184)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:173)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:167)
    at org.apache.pdfbox.pdmodel.PDPageTree.iterator(PDPageTree.java:126)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:289)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:241)
    at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:364)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:267)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:98)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:57)
2020-11-05 05:19:14 WARN  PDPageTree:154 - i = 5, element is: COSObject{214, 0}
2020-11-05 05:19:14 WARN  PDPageTree:155 - COSDictionary expected, but got null
java.lang.Exception
    at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:157)
    at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:41)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:184)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:187)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:173)
    at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.<init>(PDPageTree.java:167)
    at org.apache.pdfbox.pdmodel.PDPageTree.iterator(PDPageTree.java:126)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:289)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:241)
    at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:364)
    at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:267)
    at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:98)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:57)
{noformat}

> Corrupt PDF can lead to a StackOverflow
> ---
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
>  Issue Type: Task
>  Components: Text extraction
>Affects Versions: 2.0.21
>Reporter: Tim Allison
>Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]






[jira] [Commented] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids

2020-11-04 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226417#comment-17226417
 ] 

Tim Allison commented on PDFBOX-3953:
-

Related?

> StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
> --
>
> Key: PDFBOX-3953
> URL: https://issues.apache.org/jira/browse/PDFBOX-3953
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.7
>Reporter: Jorge Spinsanti
>Priority: Major
>
> I got an StackOverflowError in 
> org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
> {code}
> java.lang.StackOverflowError
>   at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
>   at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
> ...
> {code}






[jira] [Updated] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Maruan Sahyoun (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-5009:
---
Affects Version/s: 2.0.21

> Corrupt PDF can lead to a StackOverflow
> ---
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
>  Issue Type: Task
>  Components: Text extraction
>Affects Versions: 2.0.21
>Reporter: Tim Allison
>Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]






[jira] [Updated] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Maruan Sahyoun (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun updated PDFBOX-5009:
---
Component/s: Text extraction

> Corrupt PDF can lead to a StackOverflow
> ---
>
> Key: PDFBOX-5009
> URL: https://issues.apache.org/jira/browse/PDFBOX-5009
> Project: PDFBox
>  Issue Type: Task
>  Components: Text extraction
>Reporter: Tim Allison
>Priority: Minor
>
> See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
> on the file posted on the Tika issue.
> cc [~dadoonet]






[jira] [Created] (PDFBOX-5009) Corrupt PDF can lead to a StackOverflow

2020-11-04 Thread Tim Allison (Jira)
Tim Allison created PDFBOX-5009:
---

 Summary: Corrupt PDF can lead to a StackOverflow
 Key: PDFBOX-5009
 URL: https://issues.apache.org/jira/browse/PDFBOX-5009
 Project: PDFBox
  Issue Type: Task
Reporter: Tim Allison


See TIKA-3224.  I confirmed this with 2.0.21 by calling the app's ExtractText 
on the file posted on the Tika issue.

cc [~dadoonet]






[jira] [Commented] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids

2020-11-04 Thread Michael Klink (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226357#comment-17226357
 ] 

Michael Klink commented on PDFBOX-3953:
---

The PDF file embedded in that docx file appears to have originally been a 
4509210-byte PDF, the first 4496523 bytes of which have been overwritten 
with a different PDF (a linearized PDF-1.3 file with cross reference streams... 
ahem). Thus, the cross reference table of the original file points to 
completely random locations in the slightly smaller file. This can result in 
arbitrary exceptions...

> StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
> --
>
> Key: PDFBOX-3953
> URL: https://issues.apache.org/jira/browse/PDFBOX-3953
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.7
>Reporter: Jorge Spinsanti
>Priority: Major
>
> I got an StackOverflowError in 
> org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
> {code}
> java.lang.StackOverflowError
>   at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
>   at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
> ...
> {code}






[jira] [Commented] (PDFBOX-1529) Exchange hard-coded values for variables and provide command-line options in TextToPDF component

2020-11-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226319#comment-17226319
 ] 

Tilman Hausherr commented on PDFBOX-1529:
-

4) has been done in PDFBOX-4025.

> Exchange hard-coded values for variables and provide command-line options in 
> TextToPDF component
> 
>
> Key: PDFBOX-1529
> URL: https://issues.apache.org/jira/browse/PDFBOX-1529
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 1.7.1
>Reporter: Dave Powell
>Assignee: Andreas Lehmkühler
>Priority: Minor
>  Labels: features, newbie, patch
> Attachments: 
> patch-pdfbox-src-main-java-org-apache-pdfbox-TextToPDF.java.diff
>
>
> Exchange hard-coded values for variables and provide command-line options in 
> TextToPDF component
> 1) Enable the margins to be individually set from the command-line
> 2) Enable the font size to be represented as a floating-point value, e.g. 
> 10.5 or 11.5
> 3) Allow the line-spacing to be changed from the command-line
> 4) Allow the page size to be changed from the command-line, e.g. A4, A3, 
> US-Letter
> I will provide a patch for review for this added functionality






[jira] [Commented] (PDFBOX-5008) Wrong page dimensions

2020-11-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226318#comment-17226318
 ] 

Tilman Hausherr commented on PDFBOX-5008:
-

{code}
PDRectangle mediaBox = doc.getPage(0).getMediaBox();
// Print the media box followed by its height/width ratio
System.out.println(mediaBox + " " + mediaBox.getHeight() / mediaBox.getWidth());
{code}
output:
{noformat}
[0.0,0.0,595.0,842.0] 1.4151261
{noformat}

You won't find the MediaBox in the page's COSDictionary because it is defined 
higher up in the page tree and inherited (this is unusual, but allowed).
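
To illustrate what "higher up" means, here is a minimal sketch of an inherited-attribute lookup that walks from a page towards the root of the page tree. TreeNode and findInherited are hypothetical names for this illustration and do not reflect PDFBox's internal implementation; the box values are the ones from the output above.

```java
import java.util.HashMap;
import java.util.Map;

public class InheritedMediaBox
{
    /** Hypothetical, simplified page tree node: a dictionary plus a parent link. */
    public static final class TreeNode
    {
        public final Map<String, float[]> dict = new HashMap<>();
        public final TreeNode parent;

        public TreeNode(TreeNode parent)
        {
            this.parent = parent;
        }
    }

    /**
     * Walks from the given node up towards the root and returns the first
     * value found for the key, or null if no ancestor defines it.
     */
    public static float[] findInherited(TreeNode node, String key)
    {
        for (TreeNode n = node; n != null; n = n.parent)
        {
            float[] value = n.dict.get(key);
            if (value != null)
            {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args)
    {
        // The MediaBox is set on the parent node only, not on the page itself
        TreeNode pages = new TreeNode(null);
        pages.dict.put("MediaBox", new float[] { 0f, 0f, 595f, 842f });
        TreeNode page = new TreeNode(pages);

        System.out.println("on page itself: " + page.dict.containsKey("MediaBox"));
        float[] box = findInherited(page, "MediaBox");
        float ratio = (box[3] - box[1]) / (box[2] - box[0]);
        System.out.println("inherited height/width ratio: " + ratio);
    }
}
```

Looking only at the page's own dictionary finds nothing in this setup, while the upward walk finds the A4-sized box on the parent.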


> Wrong page dimensions
> -
>
> Key: PDFBOX-5008
> URL: https://issues.apache.org/jira/browse/PDFBOX-5008
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.21
> Environment: Java 11, Windows 10
>Reporter: m m
>Priority: Major
>
> For certain PDF files the dimensions seem incorrect when read, in comparison 
> to what other tools like Adobe Acrobat Reader give (when inspecting document 
> properties).
> I will attach a PDF file as an example. With Acrobat Reader I get the normal 
> page dimensions (297 mm / 210 mm = *1.41*), but with PDFBox, for each page, with 
> Crop box and Media box I get 792/612 = *1.29*.






[jira] [Closed] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5007.
---
Resolution: Not A Bug

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Reopened] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reopened PDFBOX-5007:
-

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Commented] (PDFBOX-3953) StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids

2020-11-04 Thread Mathaus Erich Ulbrich (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226037#comment-17226037
 ] 

Mathaus Erich Ulbrich commented on PDFBOX-3953:
---

I hit the same problem when using Apache Tika in Elasticsearch to extract a 
PDF embedded in a Word file.

https://discuss.elastic.co/t/stackoverflow-on-elasticsearch-file-indexation-with-ingest-attachment/253455/4

> StackOverflowError in org.apache.pdfbox.pdmodel.PDPageTree.getKids
> --
>
> Key: PDFBOX-3953
> URL: https://issues.apache.org/jira/browse/PDFBOX-3953
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.7
>Reporter: Jorge Spinsanti
>Priority: Major
>
> I got an StackOverflowError in 
> org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
> {code}
> java.lang.StackOverflowError
>   at org.apache.pdfbox.pdmodel.PDPageTree.getKids(PDPageTree.java:135)
>   at org.apache.pdfbox.pdmodel.PDPageTree.access$200(PDPageTree.java:38)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:166)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
>   at 
> org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.enqueueKids(PDPageTree.java:169)
> ...
> {code}






[jira] [Resolved] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Maruan Sahyoun (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maruan Sahyoun resolved PDFBOX-5007.

Resolution: Not A Bug

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Commented] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226020#comment-17226020
 ] 

Maruan Sahyoun commented on PDFBOX-5007:


Well, a dynamic XFA basically consists of a template, data and scripting. An 
XFA processor takes these and renders them into what you see on screen (using a 
supporting viewer), so the resulting document is generated on the fly. E.g. a 
template might have a definition for a single data line of a purchase order. 
When data for multiple lines is supplied (or entered), the XFA processor takes 
the data, creates a runtime model of the template and binds the data to it, 
producing what might now be a purchase order with several hundred lines.

What changes are required:
- use an XFA processor to render the XFA, together with the data it holds, into 
a PDF document, or
- save the form in Adobe Form Designer as a *static* PDF. You can then merge it.

Added note: I work a lot with XFA-based forms and workflows for customers. 
With the proper software this can be a good technology. But PDFBox is not (and 
likely will not be) an XFA processor. For Firefox or others you need to ask the 
appropriate projects.

I will be closing the ticket. For further questions please use the users 
mailing list https://pdfbox.apache.org/mailinglists.html but be aware that, 
again, PDFBox can't help.

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Comment Edited] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Ashish Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226010#comment-17226010
 ] 

Ashish Yadav edited comment on PDFBOX-5007 at 11/4/20, 12:01 PM:
-

Can you explain what is meant by rendering the XFA into a regular PDF first?

What changes are required to achieve this?


was (Author: 703251012):
Can you explain what is meant by rendering the XFA into a regular PDF first?

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Commented] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Ashish Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226010#comment-17226010
 ] 

Ashish Yadav commented on PDFBOX-5007:
--

Can you explain what is meant by rendering the XFA into a regular PDF first?

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Closed] (PDFBOX-5008) Wrong page dimensions

2020-11-04 Thread m m (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

m m closed PDFBOX-5008.
---
Resolution: Not A Bug

The page's COSDictionary was missing the MediaBox attribute.

> Wrong page dimensions
> -
>
> Key: PDFBOX-5008
> URL: https://issues.apache.org/jira/browse/PDFBOX-5008
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.21
> Environment: Java 11, Windows 10
>Reporter: m m
>Priority: Major
>
> For certain PDF files the dimensions seem incorrect when read, in comparison 
> to what other tools like Adobe Acrobat Reader give (when inspecting document 
> properties).
> I will attach a PDF file as an example. With Acrobat Reader I get the normal 
> page dimensions (297 mm / 210 mm = *1.41*), but with PDFBox, for each page, with 
> Crop box and Media box I get 792/612 = *1.29*.






[jira] [Updated] (PDFBOX-5008) Wrong page dimensions

2020-11-04 Thread m m (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

m m updated PDFBOX-5008:

Attachment: (was: draw-report.pdf)

> Wrong page dimensions
> -
>
> Key: PDFBOX-5008
> URL: https://issues.apache.org/jira/browse/PDFBOX-5008
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.21
> Environment: Java 11, Windows 10
>Reporter: m m
>Priority: Major
>
> For certain PDF files the dimensions seem incorrect when read, in comparison 
> to what other tools like Adobe Acrobat Reader give (when inspecting document 
> properties).
> I will attach a PDF file as an example. With Acrobat Reader I get the normal 
> page dimensions (297 mm / 210 mm = *1.41*), but with PDFBox, for each page, with 
> Crop box and Media box I get 792/612 = *1.29*.






[jira] [Commented] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225949#comment-17225949
 ] 

Maruan Sahyoun commented on PDFBOX-5007:


We cannot merge a dynamic XFA with another PDF. That would require us to do 
XFA rendering first, which would take several months of development and is not 
something we have the resources for. You need to render the XFA into a 
regular PDF first. With dynamic XFA, the PDF is only a container that holds the 
XFA content.

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Commented] (PDFBOX-5007) Content not visible after merging pdf with another pdf

2020-11-04 Thread Ashish Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225938#comment-17225938
 ] 

Ashish Yadav commented on PDFBOX-5007:
--

Can you resolve the merging issue?

We faced a merging issue with a fillable PDF earlier, and it was resolved 
after a version update.

> Content not visible after merging pdf with another pdf
> --
>
> Key: PDFBOX-5007
> URL: https://issues.apache.org/jira/browse/PDFBOX-5007
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Reporter: Ashish Yadav
>Priority: Major
>  Labels: XFA
> Attachments: Accredo pdf blank .pdf, image-2020-11-04-11-11-50-629.png
>
>
> Pdf content is not visible after merging the pdf with another pdf. Please 
> find the attached error message while viewing the pdf file.






[jira] [Created] (PDFBOX-5008) Wrong page dimensions

2020-11-04 Thread m m (Jira)
m m created PDFBOX-5008:
---

 Summary: Wrong page dimensions
 Key: PDFBOX-5008
 URL: https://issues.apache.org/jira/browse/PDFBOX-5008
 Project: PDFBox
  Issue Type: Bug
  Components: PDModel
Affects Versions: 2.0.21
 Environment: Java 11, Windows 10
Reporter: m m
 Attachments: draw-report.pdf

For certain PDF files the dimensions seem incorrect when read, in comparison to 
what other tools like Adobe Acrobat Reader give (when inspecting document 
properties).

I will attach a PDF file as an example. With Acrobat Reader I get the normal 
page dimensions (297 mm / 210 mm = *1.41*), but with PDFBox, for each page, with 
Crop box and Media box I get 792/612 = *1.29*.


