[jira] [Comment Edited] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023589#comment-17023589
 ] 

Tilman Hausherr edited comment on PDFBOX-4750 at 1/25/20 8:42 PM:
--

The test was also meant to show whether we increased code coverage. Sadly, 
overall coverage dropped from 46% to 45.6% (maybe because of other commits that 
balanced things differently). The code coverage of the class went up from 43.4% 
to 79.6%.

TODOs:
- use only page 8 - done
- same test for 2.0
- loading of file through pom
- find a file that has a dictionary (isn't covered) - done
- correct formatting for dictionary (probably unneeded space) - done


was (Author: tilman):
The test was also meant to show whether we increased code coverage. Sadly, 
overall coverage dropped from 46% to 45.6% (maybe because of other commits that 
balanced things differently). The code coverage of the class went up from 43.4% 
to 79.6%.

TODOs:
- use only page 8
- same test for 2.0
- loading of file through pom
- find a file that has a dictionary (isn't covered)
- correct formatting for dictionary (probably unneeded space)

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 2.0.8, 2.0.18
> Reporter: tomas kochan
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> PDFBOX-4750-test.pdf, contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content from a specific document (content that is 
> bracketed by the BDC and EMC operators), we run into an error when writing 
> the changed set of tokens into a PDStream. 
> The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  // try-with-resources flushes and closes the stream, even if writeTokens fails
>  try (OutputStream out = updatedStream.getCOSObject().createRawOutputStream())
>  {
>      ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>      tokenWriter.writeTokens(result);
>  }
>  page.setContents(updatedStream);
>   
> The following exception occurs at the line 'tokenWriter.writeTokens(result);':
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>      at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>      at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>      at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>      at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>      at de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>      at de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>
> Our analysis turned up two issues:
> 1. We assume that the PDF document itself is corrupted. At one place it 
> contains the operator BI, which according to the PDF Reference 1.7 begins an 
> inline image object. This operator is not followed by an "ID" or "EI" 
> operator. 
> Extract from the list of tokens:
>  next PDFOperator{Do}
>  next COSFloat{0.016674607}
>  next COSInt{0}
>  next COSInt{0}
>  next COSFloat{0.061831153}
>  next COSFloat{0.070509767}
>  next COSFloat{-0.302021403}
>  next PDFOperator{cm}
>  next PDFOperator{BI}
>  next PDFOperator{Q}
>  next PDFOperator{Q}
>  next COSName{OC}
>  next COSName{eAkteOptionalContent7}
>  next PDFOperator{BDC}
> Moreover, the "DP" entry in the "BI" operator's COSDictionary contains a 
> COSArray with COSNull values. We assume, however, that COSNull values are 
> not forbidden in PDF content. 
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite the wrong content in the PDF document (described above), the 
> PDFBox API fails when storing these operators into a PDStream, because the 
> method org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) 
> does not recognize COSNull.
>  
> Our assumption is that the method "writeObject" was not written to accept 
> COSNull as a valid input. org.apache.pdfbox.cos.COSNull.NULL is a valid 
> object that is used broadly by the PDFBox API itself.
> In PDFBox 2.0.8 and also in 2.0.18, the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) does not 
> cover the COSNull case in its if/else conditions; instead it throws 
> new IOException( "Error:Unknown type in content stream:" + o ). 
>  
> Could you confirm that the method writeObject contains a bug and should be 
> corrected to also cover the COSNull object? If so, in which version can we 
> expect the fix?
> Thank you
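
The missing branch described in the report can be illustrated with a small, self-contained sketch. It does not use PDFBox: `NullAwareTokenWriter`, `PDF_NULL` and `render` are hypothetical stand-ins invented for this example, and the actual fix in `ContentStreamWriter` may well look different. The only point is that the type dispatch gets one more case, which emits the PDF keyword `null`, before it falls through to the "unknown type" error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a content stream writer; NOT the PDFBox class.
final class NullAwareTokenWriter
{
    // Hypothetical marker for the PDF null object (plays the role of COSNull.NULL).
    static final Object PDF_NULL = new Object();

    private final OutputStream out;

    NullAwareTokenWriter(OutputStream out)
    {
        this.out = out;
    }

    // Writes one token followed by a single space, dispatching on its type.
    void writeObject(Object o) throws IOException
    {
        if (o == PDF_NULL)
        {
            // The case the report says is not covered: emit the keyword "null".
            write("null ");
        }
        else if (o instanceof Integer || o instanceof Double)
        {
            write(o + " ");
        }
        else if (o instanceof List)
        {
            // Arrays are written recursively, so nested nulls reach the branch above.
            write("[");
            for (Object element : (List<?>) o)
            {
                writeObject(element);
            }
            write("] ");
        }
        else
        {
            throw new IOException("Error:Unknown type in content stream:" + o);
        }
    }

    private void write(String s) throws IOException
    {
        out.write(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Convenience helper for the example: renders a token list to a string.
    static String render(List<?> tokens) throws IOException
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        NullAwareTokenWriter writer = new NullAwareTokenWriter(buffer);
        for (Object token : tokens)
        {
            writer.writeObject(token);
        }
        return buffer.toString("ISO-8859-1");
    }

    public static void main(String[] args) throws IOException
    {
        // Mirrors the shape of the DP array from the report: an array of two nulls.
        System.out.println(render(Arrays.asList(35, Arrays.asList(PDF_NULL, PDF_NULL))));
    }
}
```

With the extra branch, the token list above renders as `35 [null null ] ` instead of failing; a token of an unsupported type (a `String` in this sketch) still raises the `IOException`.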

[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023632#comment-17023632
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873162 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873162 ]

PDFBOX-4750: replace test file with one that also writes a dictionary to 
increase test coverage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4750:

Attachment: PDFBOX-4750-test.pdf




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023631#comment-17023631
 ] 

Tilman Hausherr commented on PDFBOX-4750:
-

Test file of page 8 from the original PDF and the file from PDFBOX-1724 which 
has a dictionary in the content stream, to increase test coverage.




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023620#comment-17023620
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873160 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873160 ]

PDFBOX-4750: don't output space after writeObject() which does this itself
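
The effect of the redundant space can be shown with a minimal sketch. This is not the PDFBox code: `DictionarySpacingSketch` and its methods are hypothetical, and the only assumption taken from the commit message is that `writeObject()` already appends a separator after each object, so a caller that appends one as well doubles it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the PDFBox implementation.
final class DictionarySpacingSketch
{
    // Appends one token plus its trailing space, as writeObject() is said to do.
    static void writeObject(StringBuilder out, String token)
    {
        out.append(token).append(' ');
    }

    // Caller that adds its own space after each writeObject(): separators are doubled.
    static String writeWithExtraSpace(Map<String, String> dict)
    {
        StringBuilder out = new StringBuilder("<<");
        for (Map.Entry<String, String> entry : dict.entrySet())
        {
            writeObject(out, entry.getKey());
            out.append(' ');
            writeObject(out, entry.getValue());
            out.append(' ');
        }
        return out.append(">>").toString();
    }

    // Caller that relies on writeObject() for the separator: one space per token.
    static String writeWithoutExtraSpace(Map<String, String> dict)
    {
        StringBuilder out = new StringBuilder("<<");
        for (Map.Entry<String, String> entry : dict.entrySet())
        {
            writeObject(out, entry.getKey());
            writeObject(out, entry.getValue());
        }
        return out.append(">>").toString();
    }

    public static void main(String[] args)
    {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("/W", "35");
        dict.put("/H", "26");
        System.out.println(writeWithExtraSpace(dict));
        System.out.println(writeWithoutExtraSpace(dict));
    }
}
```

Both variants produce syntactically equivalent output, since runs of whitespace between tokens are treated as a single separator; the second is simply more compact.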




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023618#comment-17023618
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873158 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1873158 ]

PDFBOX-4750: don't output space after writeObject() which does this itself




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023619#comment-17023619
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873159 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873159 ]

PDFBOX-4750: don't output space after writeObject() which does this itself




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023589#comment-17023589
 ] 

Tilman Hausherr commented on PDFBOX-4750:
-

Test was also to see whether we increased code coverage. Sadly it dropped from 
46% to 45.6% (maybe because of other commits that balanced things differently). 
The code coverage of the class went up to 79.6% from 43.4%.

TODOs:
- use only page 8
- same test for 2.0
- loading of file through pom
- find whether file that has dictionary (isn't covered)
- correct formatting for dictionary (probably unneeded space)

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
> Issue Type: Bug
> Components: Writing
> Affects Versions: 2.0.8, 2.0.18
> Reporter: tomas kochan
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content that is delimited by the BDC and EMC
> operators from a specific document, we hit an error while writing the
> modified list of tokens into a PDStream.
>  The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at the line 'tokenWriter.writeTokens(result);':
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>  at de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>  at de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>  
> After analysis we found two issues:
>  1. We assume the PDF document itself is corrupt. At one point it contains
> the operator BI, which according to the PDF Reference 1.7 begins an inline
> image object. This operator is not followed by an "ID" or "EI" operator.
>  Extract from the list of tokens:
>  next PDFOperator{Do}
>  next COSFloat{0.016674607}
>  next COSInt{0}
>  next COSInt{0}
>  next COSFloat{0.061831153}
>  next COSFloat{0.070509767}
>  next COSFloat{-0.302021403}
>  next PDFOperator{cm}
>  next PDFOperator{BI}
>  next PDFOperator{Q}
>  next PDFOperator{Q}
>  next COSName{OC}
>  next COSName{eAkteOptionalContent7}
>  next PDFOperator{BDC}
> Moreover, the "DP" entry in the "BI" operator's COSDictionary contains a
> COSArray with COSNull values. However, we assume that COSNull values are
> not forbidden in PDF content.
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite the faulty content in the PDF document (described above), the
> PDFBox API fails when storing these operators into the PDStream because it
> does not recognize COSNull in the method
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object)
>  
> Our assumption is that the method "writeObject" does not cover COSNull as a
> valid input. org.apache.pdfbox.cos.COSNull.NULL is a valid object that is
> widely used by the PDFBox API itself.
> In PDFBox 2.0.8 and also in 2.0.18, the method
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) does not
> cover the COSNull case in its if/else conditions; instead it throws
> new IOException( "Error:Unknown type in content stream:" + o ).
>  
> Could you confirm that the method writeObject contains a bug and should be
> corrected to also cover COSNull objects? If so, in which version can we
> expect the fix?
> Thank you
>  
>  
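The first point of the report (a BI operator with no closing EI) can be expressed as a simple scan over operator names. The sketch below is an assumption-laden illustration: InlineImageCheck and firstUnterminatedInlineImage are hypothetical names, and the input is a plain list of operator strings rather than PDFBox tokens. A parser may attach the inline image dictionary and data to the BI token itself instead of emitting separate ID and EI tokens, in which case this check would not apply to its token list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper working on plain operator names; not part of PDFBox.
final class InlineImageCheck {
    /**
     * Returns the index of the first "BI" that is not closed by an "EI"
     * before the next "BI" or the end of the list, or -1 if every inline
     * image is terminated.
     */
    static int firstUnterminatedInlineImage(List<String> operators) {
        int openIndex = -1;
        for (int i = 0; i < operators.size(); i++) {
            String op = operators.get(i);
            if ("BI".equals(op)) {
                if (openIndex >= 0) {
                    return openIndex; // a new BI started before the previous one ended
                }
                openIndex = i;
            } else if ("EI".equals(op)) {
                openIndex = -1;
            }
        }
        return openIndex;
    }

    public static void main(String[] args) {
        // Operator names in the order they appear in the extract quoted above.
        List<String> extract = Arrays.asList("Do", "cm", "BI", "Q", "Q", "BDC");
        System.out.println(firstUnterminatedInlineImage(extract));
    }
}
```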






[jira] [Comment Edited] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023589#comment-17023589
 ] 

Tilman Hausherr edited comment on PDFBOX-4750 at 1/25/20 3:21 PM:
--

Test was also to see whether we increased code coverage. Sadly it dropped from 
46% to 45.6% (maybe because of other commits that balanced things differently). 
The code coverage of the class went up to 79.6% from 43.4%.

TODOs:
- use only page 8
- same test for 2.0
- loading of file through pom
- find a file that has dictionary (isn't covered)
- correct formatting for dictionary (probably unneeded space)


was (Author: tilman):
Test was also to see whether we increased code coverage. Sadly it dropped from 
46% to 45.6% (maybe because of other commits that balanced things differently). 
The code coverage of the class went up to 79.6% from 43.4%.

TODOs:
- use only page 8
- same test for 2.0
- loading of file through pom
- find whether file that has dictionary (isn't covered)
- correct formatting for dictionary (probably unneeded space)


[jira] [Commented] (PDFBOX-4569) Implement an ondemand Parser

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023586#comment-17023586
 ] 

ASF subversion and git services commented on PDFBOX-4569:
-

Commit 1873154 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873154 ]

PDFBOX-4569: make load methods static

> Implement an ondemand Parser
> 
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 3.0.0 PDFBox
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser






[jira] [Commented] (PDFBOX-4569) Implement an ondemand Parser

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023585#comment-17023585
 ] 

ASF subversion and git services commented on PDFBOX-4569:
-

Commit 1873153 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873153 ]

PDFBOX-4569: re-added 2 load methods for convenience, marked as deprecated




Jenkins build is back to normal : PDFBox-sonar2 #133

2020-01-25 Thread Apache Jenkins Server
See 






Jenkins build is back to normal : PDFBox-sonar2 » Apache PDFBox #133

2020-01-25 Thread Apache Jenkins Server
See 






[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023562#comment-17023562
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873152 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873152 ]

PDFBOX-4750: use new Loader class




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023560#comment-17023560
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873151 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873151 ]

PDFBOX-4750: use new Loader class




Build failed in Jenkins: PDFBox-sonar2 #132

2020-01-25 Thread Apache Jenkins Server
See 


Changes:

[lehmi] PDFBOX-4569: move PDF load methods to Loader class

[lehmi] PDFBOX-4569: remove no longer needed method COSParser#getDocument

[tilman] PDFBOX-4750: add test


--
[...truncated 29.24 KB...]
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-maven-version) @ 
xmpbox ---
[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.5:prepare-agent (default) @ xmpbox ---
[INFO] surefireArgLine set to 
-javaagent:/home/jenkins/jenkins-slave/maven-repositories/1/org/jacoco/org.jacoco.agent/0.8.5/org.jacoco.agent-0.8.5-runtime.jar=destfile=
[WARNING] Failed to getClass for org.apache.maven.plugins.source.SourceJarMojo
[INFO] 
[INFO] <<< maven-source-plugin:3.0.1:jar (attach-sources) < generate-sources @ 
xmpbox <<<
[INFO] 
[INFO] 
[INFO] --- maven-source-plugin:3.0.1:jar (attach-sources) @ xmpbox ---
[INFO] Building jar: 

[INFO] 
[INFO] --< org.apache.pdfbox:pdfbox >--
[INFO] Building Apache PDFBox 3.0.0-SNAPSHOT [4/12]
[INFO] ---[ bundle ]---
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (default) @ pdfbox ---
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-maven-version) @ 
pdfbox ---
[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.5:prepare-agent (default) @ pdfbox ---
[INFO] surefireArgLine set to 
-javaagent:/home/jenkins/jenkins-slave/maven-repositories/1/org/jacoco/org.jacoco.agent/0.8.5/org.jacoco.agent-0.8.5-runtime.jar=destfile=
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox ---
[INFO] 
[INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ pdfbox 
---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 21 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ pdfbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 606 source files to 

[WARNING] 
:[192,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[429,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[431,30]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[446,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[395,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[340,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[317,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[304,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[INFO] 
:
 Some input files use unchecked or unsafe operations.
[INFO] 
:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1031-1) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1031-2) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1065-1) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget 

Build failed in Jenkins: PDFBox-sonar2 » Apache PDFBox #132

2020-01-25 Thread Apache Jenkins Server
See 


Changes:

[lehmi] PDFBOX-4569: move PDF load methods to Loader class

[lehmi] PDFBOX-4569: remove no longer needed method COSParser#getDocument

[tilman] PDFBOX-4750: add test


--
[INFO] 
[INFO] --< org.apache.pdfbox:pdfbox >--
[INFO] Building Apache PDFBox 3.0.0-SNAPSHOT [4/12]
[INFO] ---[ bundle ]---
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (default) @ pdfbox ---
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-maven-version) @ 
pdfbox ---
[INFO] 
[INFO] --- jacoco-maven-plugin:0.8.5:prepare-agent (default) @ pdfbox ---
[INFO] surefireArgLine set to 
-javaagent:/home/jenkins/jenkins-slave/maven-repositories/1/org/jacoco/org.jacoco.agent/0.8.5/org.jacoco.agent-0.8.5-runtime.jar=destfile=
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
@ pdfbox ---
[INFO] 
[INFO] --- maven-resources-plugin:3.1.0:resources (default-resources) @ pdfbox 
---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 21 resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.7.0:compile (default-compile) @ pdfbox ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 606 source files to 

[WARNING] 
:[192,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[429,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[431,30]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[446,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[395,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[340,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[317,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[WARNING] 
:[304,18]
 getHeight(int) in org.apache.pdfbox.pdmodel.font.PDFontLike has been deprecated
[INFO] 
:
 Some input files use unchecked or unsafe operations.
[INFO] 
:
 Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1031-1) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1031-2) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1065-1) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1065-2) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1100-1) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-1100-2) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-3208) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-3656) @ pdfbox ---
[INFO] File already exist, skipping
[INFO] 
[INFO] --- download-maven-plugin:1.3.0:wget (PDFBOX-3682) @ pdfbox 

[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023552#comment-17023552
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873150 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873150 ]

PDFBOX-4750: add test

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.8, 2.0.18
>Reporter: tomas kochan
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> By removing some optional content for specific document, which is bordered 
> with Operator BDC and EMC, we are facing an issue by writing the changed set 
> of tokens into PDStream. 
>  The code looks like:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at line 'tokenWriter.writeTokens(result);' :
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>  at 
> de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>  at 
> de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>  
> After analysis we identified two issues:
>  1. We assume the PDF document itself is corrupted. At one point it contains 
> the operator BI, which according to the PDF Reference 1.7 begins an inline 
> image object. This operator is not followed by an "ID" or "EI" operator. 
>  Extract from the list of tokens:
> next PDFOperator{Do}
> next COSFloat{0.016674607}
> next COSInt{0}
> next COSInt{0}
> next COSFloat{0.061831153}
> next COSFloat{0.070509767}
> next COSFloat{-0.302021403}
> next PDFOperator{cm}
> next PDFOperator{BI}
> next PDFOperator{Q}
> next PDFOperator{Q}
> next COSName{OC}
> next COSName{eAkteOptionalContent7}
> next PDFOperator{BDC}
> Moreover, one "DP" entry in the "BI" operator's COSDictionary contains a 
> COSArray with COSNull values. However, our assumption is that COSNull 
> values are not forbidden in PDF content. 
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite the faulty content in the PDF document (described above), the PDFBox 
> API fails when storing these operators into a PDStream because it does not 
> recognize COSNull in the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object)
>  
> Our assumption is that the method "writeObject" does not 
> cover COSNull as a valid input. org.apache.pdfbox.cos.COSNull.NULL is a 
> valid object, which is widely used by the PDFBox API itself.
> In PDFBox 2.0.8 and also in 2.0.18, the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) 
> doesn't cover the COSNull case in its if/else 
> conditions; instead it throws new IOException( "Error:Unknown type in 
> content stream:" + o ). 
>  
> Could you confirm that the method writeObject contains a bug and should be 
> corrected to also cover the COSNull object? If so, in which version can we 
> expect the fix?
> Thank you
>  
>  
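
The failure mode described in the report can be illustrated with a small stand-alone sketch. It is not the PDFBox ContentStreamWriter: the class TokenWriterSketch, the NULL_TOKEN marker (standing in for COSNull.NULL) and the choice of plain Java types as tokens are all assumptions made for illustration. It only shows how a type-dispatching writer rejects a null object nested in an array unless it has an explicit branch for it.

```java
import java.util.List;

/** Hypothetical stand-in for a token writer; not the PDFBox ContentStreamWriter. */
final class TokenWriterSketch {
    /** Hypothetical marker for the PDF null object, playing the role of COSNull.NULL. */
    static final Object NULL_TOKEN = new Object();

    /** Serializes one token followed by a space; throws on unsupported types. */
    static void writeObject(Object o, StringBuilder out) {
        if (o == NULL_TOKEN) {
            out.append("null ");                      // the branch the report says is missing
        } else if (o instanceof Integer || o instanceof Boolean) {
            out.append(o).append(' ');
        } else if (o instanceof String) {
            out.append('/').append(o).append(' ');    // strings are treated as PDF names here
        } else if (o instanceof List) {
            out.append('[');
            for (Object item : (List<?>) o) {
                writeObject(item, out);               // recursion is how nested nulls are reached
            }
            out.append("] ");
        } else {
            throw new IllegalArgumentException("Unknown type in content stream: " + o);
        }
    }

    /** Convenience wrapper returning the serialized form of a single token. */
    static String write(Object token) {
        StringBuilder out = new StringBuilder();
        writeObject(token, out);
        return out.toString();
    }
}
```

Without the first branch, an array such as the "DP" entry above would fall through to the final else and raise the error, even though every other element type is handled.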



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4569) Implement an ondemand Parser

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023551#comment-17023551
 ] 

ASF subversion and git services commented on PDFBOX-4569:
-

Commit 1873149 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873149 ]

PDFBOX-4569: remove no longer needed method COSParser#getDocument

> Implement an ondemand Parser
> 
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser






[jira] [Comment Edited] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023527#comment-17023527
 ] 

Tilman Hausherr edited comment on PDFBOX-4750 at 1/25/20 12:17 PM:
---

Snapshot available at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.19-SNAPSHOT/

I also did a test on the attached file, with this code:
{code}
try (PDDocument doc = PDFParser.load(new File("/PDFBOX-4750.pdf")))
{
    PDFRenderer r = new PDFRenderer(doc);
    for (int i = 0; i < doc.getNumberOfPages(); ++i)
    {
        BufferedImage bim1 = r.renderImageWithDPI(i, 96);
        ImageIO.write(bim1, "png", new File("/in/PDFBOX-4750-saved.pdf-" + (i+1) + ".png"));
        PDPage page = doc.getPage(i);
        PDStream newContent = new PDStream(doc);
        try (InputStream is = page.getContents();
             OutputStream os = newContent.createOutputStream(COSName.FLATE_DECODE))
        {
            PDFStreamParser parser = new PDFStreamParser(is);
            parser.parse();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(os);
            tokenWriter.writeTokens(parser.getTokens());
        }
        page.setContents(newContent);
    }
    doc.save(new File("/PDFBOX-4750-saved.pdf"));
}
TestPDFToImage testPDFToImage = new TestPDFToImage(TestPDFToImage.class.getName());
if (!testPDFToImage.doTestFile(new File("/PDFBOX-4750-saved.pdf"), "/in", "/out"))
{
    fail("Rendering failed or is not identical");
}
{code}
and it succeeded. It also succeeded after the first commit. The following ones 
are for clarity and to prevent having too many unneeded spaces in the modified 
content stream.

You should use the test code above to create your own test. It creates a first 
rendering set in the "in" directory and then modifies the file, renders again 
in the "out" directory and then compares the renderings.
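
The comparison step of such a test can be sketched with the JDK alone. The class RenderingCompareSketch and its method are hypothetical and are not how TestPDFToImage is implemented; the sketch only assumes that two renderings count as identical when they have the same dimensions and the same pixel values.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper; not PDFBox's TestPDFToImage. */
final class RenderingCompareSketch {
    /** Returns true when both images have the same size and identical ARGB pixels. */
    static boolean identical(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;   // first differing pixel is enough to fail
                }
            }
        }
        return true;
    }
}
```

A caller would load the page image from the "in" directory and its counterpart from the "out" directory (for example with ImageIO.read) and fail the test when identical(...) returns false.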


> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.8, 2.0.18
>Reporter: tomas kochan
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content, delimited by the operators BDC and EMC, from 
> a specific document, we hit an issue when writing the changed set 
> of tokens into a PDStream. 
>  The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at line 'tokenWriter.writeTokens(result);' :
>  java.io.IOException: Error:Unknown type in 

[jira] [Updated] (PDFBOX-4751) Removing field / annotation does not work

2020-01-25 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-4751:

Fix Version/s: 3.0.0 PDFBox
   2.0.19

> Removing field / annotation does not work
> -
>
> Key: PDFBOX-4751
> URL: https://issues.apache.org/jira/browse/PDFBOX-4751
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.18
> Environment: macOS Mojave 10.14.6
>Reporter: Jesse
>Priority: Minor
> Fix For: 2.0.19, 3.0.0 PDFBox
>
>
> After upgrading to `2.0.18`, the following code doesn't work for me anymore:
> {code:java}
> private void removeField() throws IOException {
>     PDAcroForm form = this.field.getAcroForm();
>     List<PDField> allFields = form.getFields();
>     PDAnnotationWidget widget = field.getWidgets().get(0);
>     List<PDAnnotation> allAnnotations = widget.getPage().getAnnotations();
>     System.out.println("BEFORE: Fields size: " + allFields.size() + "; Field to array: " + COSArrayList.converterToCOSArray(allFields));
>     for(PDField field : allFields) {
>         if (field.getFullyQualifiedName().equals(this.field.getFullyQualifiedName())) {
>             allFields.remove(field);
>             break;
>         }
>     }
>     System.out.println("AFTER: Fields size: " + allFields.size() + "; Field to array: " + COSArrayList.converterToCOSArray(allFields));
>     System.out.println("BEFORE: Annots size: " + allAnnotations.size() + "; Annots to array: " + COSArrayList.converterToCOSArray(allAnnotations));
>     for(PDAnnotation annotation : allAnnotations) {
>         if(annotation.getCOSObject().equals(widget.getCOSObject())) {
>             allAnnotations.remove(annotation);
>             break;
>         }
>     }
>     System.out.println("AFTER: Annots size: " + allAnnotations.size() + "; Annots to array: " + COSArrayList.converterToCOSArray(allAnnotations));
> }
> {code}
> For whatever reason, the array in `COSArrayList` is not updating. The 
> internal `ArrayList` is updating, but not the internal `COSArray`. Here is 
> some output from above that might be helpful after trying to remove a field from 
> a pdf with a single field.
> {code:java}
> // 2.0.17
> BEFORE: Fields size: 1; Field to array: COSArray{[COSObject{10, 0}]}
> AFTER: Fields size: 0; Field to array: COSArray{[]}
> BEFORE: Annots size: 1; Annots to array: COSArray{[COSObject{10, 0}]}
> AFTER: Annots size: 0; Annots to array: COSArray{[]}
> // 2.0.18
> BEFORE: Fields size: 1; Field to array: COSArray{[COSObject{10, 0}]}
> AFTER: Fields size: 0; Field to array: COSArray{[COSObject{10, 0}]}
> BEFORE: Annots size: 1; Annots to array: COSArray{[COSObject{10, 0}]}
> AFTER: Annots size: 0; Annots to array: COSArray{[COSObject{10, 0}]}
> {code}
> I can definitely attach one of the PDFs I was working with if needed... But 
> this was happening with every PDF I tried it on, so I don't think it's 
> something special about the pdfs I was working with.






[jira] [Commented] (PDFBOX-4751) Removing field / annotation does not work

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023535#comment-17023535
 ] 

Tilman Hausherr commented on PDFBOX-4751:
-

targeting to 2.0.19 because that's obviously important.

> Removing field / annotation does not work
> -
>
> Key: PDFBOX-4751
> URL: https://issues.apache.org/jira/browse/PDFBOX-4751
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.18
> Environment: macOS Mojave 10.14.6
>Reporter: Jesse
>Priority: Minor
> Fix For: 2.0.19, 3.0.0 PDFBox
>
>
> After upgrading to `2.0.18`, the following code doesn't work for me anymore:
> {code:java}
> private void removeField() throws IOException {
>     PDAcroForm form = this.field.getAcroForm();
>     List<PDField> allFields = form.getFields();
>     PDAnnotationWidget widget = field.getWidgets().get(0);
>     List<PDAnnotation> allAnnotations = widget.getPage().getAnnotations();
>     System.out.println("BEFORE: Fields size: " + allFields.size() + "; Field to array: " + COSArrayList.converterToCOSArray(allFields));
>     for(PDField field : allFields) {
>         if (field.getFullyQualifiedName().equals(this.field.getFullyQualifiedName())) {
>             allFields.remove(field);
>             break;
>         }
>     }
>     System.out.println("AFTER: Fields size: " + allFields.size() + "; Field to array: " + COSArrayList.converterToCOSArray(allFields));
>     System.out.println("BEFORE: Annots size: " + allAnnotations.size() + "; Annots to array: " + COSArrayList.converterToCOSArray(allAnnotations));
>     for(PDAnnotation annotation : allAnnotations) {
>         if(annotation.getCOSObject().equals(widget.getCOSObject())) {
>             allAnnotations.remove(annotation);
>             break;
>         }
>     }
>     System.out.println("AFTER: Annots size: " + allAnnotations.size() + "; Annots to array: " + COSArrayList.converterToCOSArray(allAnnotations));
> }
> {code}
> For whatever reason, the array in `COSArrayList` is not updating. The 
> internal `ArrayList` is updating, but not the internal `COSArray`. Here is 
> some output from above that might be helpful after trying to remove a field from 
> a pdf with a single field.
> {code:java}
> // 2.0.17
> BEFORE: Fields size: 1; Field to array: COSArray{[COSObject{10, 0}]}
> AFTER: Fields size: 0; Field to array: COSArray{[]}
> BEFORE: Annots size: 1; Annots to array: COSArray{[COSObject{10, 0}]}
> AFTER: Annots size: 0; Annots to array: COSArray{[]}
> // 2.0.18
> BEFORE: Fields size: 1; Field to array: COSArray{[COSObject{10, 0}]}
> AFTER: Fields size: 0; Field to array: COSArray{[COSObject{10, 0}]}
> BEFORE: Annots size: 1; Annots to array: COSArray{[COSObject{10, 0}]}
> AFTER: Annots size: 0; Annots to array: COSArray{[COSObject{10, 0}]}
> {code}
> I can definitely attach one of the PDFs I was working with if needed... But 
> this was happening with every PDF I tried it on, so I don't think it's 
> something special about the pdfs I was working with.






[jira] [Commented] (PDFBOX-4569) Implement an ondemand Parser

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023534#comment-17023534
 ] 

ASF subversion and git services commented on PDFBOX-4569:
-

Commit 1873147 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873147 ]

PDFBOX-4569: move PDF load methods to Loader class

> Implement an ondemand Parser
> 
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023532#comment-17023532
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1873146 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873146 ]

PDFBOX-4071: simplify code

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: pdfbox-screenshot-bad.png, pdfbox-screenshot-good.png
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023531#comment-17023531
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1873145 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873145 ]

PDFBOX-4071: simplify code

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: pdfbox-screenshot-bad.png, pdfbox-screenshot-good.png
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023530#comment-17023530
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1873144 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1873144 ]

PDFBOX-4071: simplify code

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: pdfbox-screenshot-bad.png, pdfbox-screenshot-good.png
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023529#comment-17023529
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1873142 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873142 ]

PDFBOX-4071: simplify code

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: pdfbox-screenshot-bad.png, pdfbox-screenshot-good.png
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4071) Improve code quality (3)

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023528#comment-17023528
 ] 

ASF subversion and git services commented on PDFBOX-4071:
-

Commit 1873141 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873141 ]

PDFBOX-4071: remove exception that isn't thrown

> Improve code quality (3)
> 
>
> Key: PDFBOX-4071
> URL: https://issues.apache.org/jira/browse/PDFBOX-4071
> Project: PDFBox
>  Issue Type: Task
>Affects Versions: 2.0.8
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: pdfbox-screenshot-bad.png, pdfbox-screenshot-good.png
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> [SonarQube 
> report|https://analysis.apache.org/dashboard/index/org.apache.pdfbox:pdfbox-reactor],
>  hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-2852, which was getting too long.






[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023527#comment-17023527
 ] 

Tilman Hausherr commented on PDFBOX-4750:
-

Snapshot available at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.19-SNAPSHOT/

I also did a test on the attached file, with this code:
{code}
try (PDDocument doc = PDFParser.load(new File("/PDFBOX-4750.pdf")))
{
    PDFRenderer r = new PDFRenderer(doc);
    for (int i = 0; i < doc.getNumberOfPages(); ++i)
    {
        BufferedImage bim1 = r.renderImageWithDPI(i, 96);
        ImageIO.write(bim1, "png", new File("/in/PDFBOX-4750-saved.pdf-" + (i+1) + ".png"));
        PDPage page = doc.getPage(i);
        PDStream newContent = new PDStream(doc);
        try (InputStream is = page.getContents();
             OutputStream os = newContent.createOutputStream(COSName.FLATE_DECODE))
        {
            PDFStreamParser parser = new PDFStreamParser(is);
            parser.parse();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(os);
            tokenWriter.writeTokens(parser.getTokens());
        }
        page.setContents(newContent);
    }
    doc.save(new File("/PDFBOX-4750-saved.pdf"));
}
TestPDFToImage testPDFToImage = new TestPDFToImage(TestPDFToImage.class.getName());
if (!testPDFToImage.doTestFile(new File("/PDFBOX-4750-saved.pdf"), "/in", "/out"))
{
    fail("Rendering failed or is not identical");
}
{code}
and it succeeded. It also succeeded after the first commit. The following ones 
are for clarity and to prevent having too many unneeded spaces in the modified 
content stream. You should use the test code above to create your own test. It 
creates a first rendering set in the "in" directory and then modifies the file 
and then compares the renderings.

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.8, 2.0.18
>Reporter: tomas kochan
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content, delimited by the operators BDC and EMC, from 
> a specific document, we hit an issue when writing the changed set 
> of tokens into a PDStream. 
>  The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at line 'tokenWriter.writeTokens(result);' :
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>  at 
> de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>  at 
> de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>  
> After analysis we identified two issues:
>  1. We assume the PDF document itself is corrupted. At one point it contains 
> the operator BI, which according to the PDF Reference 1.7 begins an inline 
> image object. This operator is not followed by an "ID" or "EI" operator. 
>  Extract from the list of tokens:
> next PDFOperator{Do}
> next COSFloat{0.016674607}
> next COSInt{0}
> next COSInt{0}
> next COSFloat{0.061831153}
> next COSFloat{0.070509767}
> next COSFloat{-0.302021403}
> next PDFOperator{cm}
> next PDFOperator{BI}
> next PDFOperator{Q}
> next PDFOperator{Q}
> next COSName{OC}
> next COSName{eAkteOptionalContent7}
> next PDFOperator{BDC}
> Moreover, one "DP" entry in the "BI" operator's COSDictionary contains a 
> COSArray with COSNull values. However, our assumption is that COSNull 
> values are not forbidden in PDF content. 
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite wrong 

[jira] [Commented] (PDFBOX-4569) Implement an ondemand Parser

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023522#comment-17023522
 ] 

ASF subversion and git services commented on PDFBOX-4569:
-

Commit 1873140 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873140 ]

PDFBOX-4569: introduce Loader class, move FDF/XFDF load methods, refactor 
FDFParser

> Implement an ondemand Parser
> 
>
> Key: PDFBOX-4569
> URL: https://issues.apache.org/jira/browse/PDFBOX-4569
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Parsing
>Affects Versions: 3.0.0 PDFBox
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: PDFBOX-1084.pdf
>
>
> There is a need to replace the big bang parser with an ondemand parser






[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023477#comment-17023477
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873136 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1873136 ]

PDFBOX-4750: don't output space after writeObject() which does this itself; put 
space after "]"; remove space before "null"; put LF after "BI".

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.8, 2.0.18
>Reporter: tomas kochan
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content, delimited by the operators BDC and EMC, from 
> a specific document, we hit an issue when writing the changed set 
> of tokens into a PDStream. 
>  The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at line 'tokenWriter.writeTokens(result);' :
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>  at 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>  at 
> de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>  at 
> de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>  
> After analysis we identified two issues:
>  1. We assume the PDF document itself is corrupted. At one point it contains 
> the operator BI, which according to the PDF Reference 1.7 begins an inline 
> image object. This operator is not followed by an "ID" or "EI" operator. 
>  Extract from the list of tokens:
> next PDFOperator{Do}
> next COSFloat{0.016674607}
> next COSInt{0}
> next COSInt{0}
> next COSFloat{0.061831153}
> next COSFloat{0.070509767}
> next COSFloat{-0.302021403}
> next PDFOperator{cm}
> next PDFOperator{BI}
> next PDFOperator{Q}
> next PDFOperator{Q}
> next COSName{OC}
> next COSName{eAkteOptionalContent7}
> next PDFOperator{BDC}
> Moreover, one "DP" entry in the "BI" operator's COSDictionary contains a 
> COSArray with COSNull values. However, our assumption is that COSNull 
> values are not forbidden in PDF content. 
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite the faulty content in the PDF document (described above), the PDFBox 
> API fails when storing these operators into a PDStream because it does not 
> recognize COSNull in the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object)
>  
> Our assumption is that the method "writeObject" does not 
> cover COSNull as a valid input. org.apache.pdfbox.cos.COSNull.NULL is a 
> valid object, which is widely used by the PDFBox API itself.
> In PDFBox 2.0.8 and also in 2.0.18, the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) 
> doesn't cover the COSNull case in its if/else 
> conditions; instead it throws new IOException( "Error:Unknown type in 
> content stream:" + o ). 
>  
> Could you confirm that the method writeObject contains a bug and should be 
> corrected to also cover the COSNull object? If so, in which version can we 
> expect the fix?
> Thank you
>  
>  
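
The removal step mentioned at the start of the report (dropping optional content delimited by BDC and EMC before writing the tokens back) could look roughly like the sketch below. It is not the reporter's code and does not use PDFBox types: Token, op(...), arg(...) and removeOptionalContent(...) are hypothetical names, tokens are reduced to plain text, and inline images (BI ... ID ... EI) are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical filter over a flat token list; not the reporter's implementation. */
final class MarkedContentFilterSketch {
    /** Hypothetical token: an operator (e.g. "BDC") or an operand (e.g. "/OC"). */
    static final class Token {
        final String text;
        final boolean operator;
        Token(String text, boolean operator) { this.text = text; this.operator = operator; }
    }

    static Token op(String text) { return new Token(text, true); }
    static Token arg(String text) { return new Token(text, false); }

    /** Drops every "/OC <name> BDC ... EMC" section, including nested marked content. */
    static List<Token> removeOptionalContent(List<Token> tokens) {
        List<Token> result = new ArrayList<>();
        List<Token> operands = new ArrayList<>();   // operands waiting for their operator
        int depth = 0;                              // > 0 while inside a removed section
        for (Token t : tokens) {
            if (!t.operator) {
                operands.add(t);
                continue;
            }
            boolean begins = t.text.equals("BDC") || t.text.equals("BMC");
            if (depth > 0) {
                if (begins) {
                    depth++;                        // nested section inside the removed one
                } else if (t.text.equals("EMC")) {
                    depth--;
                }
            } else if (t.text.equals("BDC") && !operands.isEmpty()
                    && operands.get(0).text.equals("/OC")) {
                depth = 1;                          // start skipping, operands included
            } else {
                result.addAll(operands);
                result.add(t);
            }
            operands.clear();
        }
        if (depth == 0) {
            result.addAll(operands);                // keep dangling operands at the end
        }
        return result;
    }
}
```

Operands are buffered until their operator arrives because in a content stream the operands come first; that is what allows the "/OC" name and the property name to be dropped together with the BDC operator.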






[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023480#comment-17023480
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873138 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873138 ]

PDFBOX-4750: don't output space after writeObject() which does this itself; put 
space after "]"; remove space before "null"; put LF after "BI".

> java.io.IOException: Error:Unknown type in content stream:COSNull{}
> ---
>
> Key: PDFBOX-4750
> URL: https://issues.apache.org/jira/browse/PDFBOX-4750
> Project: PDFBox
>  Issue Type: Bug
>  Components: Writing
>Affects Versions: 2.0.8, 2.0.18
>Reporter: tomas kochan
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.19, 3.0.0 PDFBox
>
> Attachments: 01 - K17 - Was dahinter steckt - dsb.pdf, 
> contentAllOperatorsOfCorruptedPage.txt
>
>
> When removing optional content, delimited by the operators BDC and EMC, from 
> a specific document, we hit an issue when writing the changed set 
> of tokens into a PDStream. 
>  The code looks like this:
>  PDStream updatedStream = new PDStream(document);
>  OutputStream out = updatedStream.getCOSObject().createRawOutputStream();
>  ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
>  tokenWriter.writeTokens(result);
>  out.flush();
>  out.close();
>  page.setContents(updatedStream);
>   
>  The following exception occurs at line 'tokenWriter.writeTokens(result);':
>  java.io.IOException: Error:Unknown type in content stream:COSNull{}
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:199)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:146)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(ContentStreamWriter.java:181)
>  at org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeTokens(ContentStreamWriter.java:109)
>  at de.justiz.eip.pdf.tools.PdfContext.getOrRemoveOptionalTextContentfromPage(PdfContext.java:429)
>  at de.justiz.eip.pdf.tools.paging.PagingInfoInterpreterPdfContext.removePagingInfo(PagingInfoInterpreterPdfContext.java:325)
>  
> After analysis we found two issues:
>  1. We assume the PDF document itself is corrupted. At one point it contains 
> the operator BI, which according to the PDF Reference 1.7 begins an inline 
> image object. This operator is not followed by an "ID" or "EI" operator. 
>  Extract from the list of tokens:
>  next PDFOperator{Do}
>  next COSFloat{0.016674607}
>  next COSInt{0}
>  next COSInt{0}
>  next COSFloat{0.061831153}
>  next COSFloat{0.070509767}
>  next COSFloat{-0.302021403}
>  next PDFOperator{cm}
>  next PDFOperator{BI}
>  next PDFOperator{Q}
>  next PDFOperator{Q}
>  next COSName{OC}
>  next COSName{eAkteOptionalContent7}
>  next PDFOperator{BDC}
> Moreover, one "DP" entry in the "BI" operator's COSDictionary contains a 
> COSArray with COSNull values. However, we assume that COSNull values are 
> not forbidden in PDF content. 
>  
> COSDictionary{COSName{Interpolate}:true;COSName{W}:COSInt{35};COSName{H}:COSInt{26};COSName{CS}:COSName{RGB};COSName{BPC}:COSInt{8};COSName{F}:COSArray{[COSName{A85},COSName{DCT}]};COSName{DP}:COSArray{[COSNull{},COSNull{}]};}
> 2. Despite the invalid content in the PDF document (described above), the 
> PDFBox API crashes when storing these operators into a PDStream, because the 
> method org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) 
> does not recognize COSNull.
>  
> Our assumption is that the method "writeObject" fails to cover COSNull as a 
> valid input. org.apache.pdfbox.cos.COSNull.NULL is a valid object that is 
> widely used by the PDFBox API itself.
> In PDFBox 2.0.8, and also in 2.0.18, the method 
> org.apache.pdfbox.pdfwriter.ContentStreamWriter.writeObject(Object) does not 
> cover the COSNull case in its if/else conditions; instead it throws 
> new IOException( "Error:Unknown type in content stream:" + o ). 
>  
> Could you confirm that the method writeObject contains a bug and should be 
> corrected to also cover the COSNull object? If so, in which version can we 
> expect the fix?
> Thank you
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023478#comment-17023478
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873137 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873137 ]

PDFBOX-4750: don't output space after writeObject() which does this itself; put 
space after "]"; remove space before "null"; put LF after "BI".




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023468#comment-17023468
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873135 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873135 ]

PDFBOX-4750: allow to rewrite null object, as suggested by Tomas Kochan
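
For readers following the thread, the idea behind this commit message can be pictured as one extra branch in a type dispatch. The following is only a minimal sketch under stated assumptions, not the committed code: NullTokenSketch, Op, NULL, write and serialize are hypothetical stand-ins invented for illustration, and none of them are PDFBox classes or methods. The spacing rules are likewise illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a content stream writer; not PDFBox code.
public class NullTokenSketch {
    // Stand-in for the PDF null object (the issue refers to COSNull.NULL).
    public static final Object NULL = new Object();

    // Stand-in for an operator token such as "cm" or "Q".
    public static final class Op {
        final String name;
        public Op(String name) { this.name = name; }
    }

    // Appends one token followed by a space; operators end the line instead.
    public static void write(Object token, StringBuilder out) {
        if (token == NULL) {
            // The branch whose absence produced "Unknown type in content stream".
            out.append("null ");
        } else if (token instanceof Number) {
            out.append(token).append(' ');
        } else if (token instanceof List) {
            out.append('[');
            for (Object item : (List<?>) token) {
                write(item, out); // nested values, including nulls, recurse here
            }
            out.append("] ");
        } else if (token instanceof Op) {
            out.append(((Op) token).name).append('\n');
        } else {
            throw new IllegalArgumentException("Unknown type in content stream: " + token);
        }
    }

    // Serializes a whole token list into a string.
    public static String serialize(List<?> tokens) {
        StringBuilder out = new StringBuilder();
        for (Object token : tokens) {
            write(token, out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An array holding two nulls, like the "DP" entry quoted in the issue.
        List<Object> tokens = Arrays.asList(Arrays.asList(NULL, NULL), 35, new Op("Q"));
        System.out.print(serialize(tokens)); // prints "[null null ] 35 Q" and a newline
    }
}
```

Without the first branch, the array of nulls would fall through to the final else and raise an error, which mirrors the failure mode described in the issue.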




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023467#comment-17023467
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873134 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1873134 ]

PDFBOX-4750: allow to rewrite null object, as suggested by Tomas Kochan




[jira] [Commented] (PDFBOX-4750) java.io.IOException: Error:Unknown type in content stream:COSNull{}

2020-01-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023466#comment-17023466
 ] 

ASF subversion and git services commented on PDFBOX-4750:
-

Commit 1873133 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1873133 ]

PDFBOX-4750: allow to rewrite null object, as suggested by Tomas Kochan
