[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases

2021-06-08 Thread LI MING (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359746#comment-17359746
 ] 

LI MING commented on PDFBOX-5209:
-

OK, I saw the updated remark, thanks. I look forward to a way to solve this in 
the future.

> Using Chinese character make the file size increases 
> -
>
> Key: PDFBOX-5209
> URL: https://issues.apache.org/jira/browse/PDFBOX-5209
> Project: PDFBox
>  Issue Type: Improvement
>  Components: AcroForm
>Affects Versions: 2.0.15
> Environment: java jdk 1.8
>Reporter: LI MING
>Priority: Blocker
>  Labels: FileSize
>
> As the title says, we use Chinese characters to generate a PDF form file. 
> Generation succeeds, but the file size is larger than 10 MB. Apart from 
> changing the font file, is there any other way we can solve this problem?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-5209) Using Chinese character make the file size increases

2021-06-08 Thread LI MING (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI MING updated PDFBOX-5209:

Issue Type: Improvement  (was: Bug)







[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases

2021-06-08 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359737#comment-17359737
 ] 

Tilman Hausherr commented on PDFBOX-5209:
-

(This is related to the last 4 remarks of PDFBOX-4629. Using a subsetted font 
for acroform doesn't work, because the appearance content stream doesn't know 
its own PDDocument so it isn't put into the subset list, so the font ends up 
not being embedded at all)

> Using Chinese character make the file size increases 
> -
>
> Key: PDFBOX-5209
> URL: https://issues.apache.org/jira/browse/PDFBOX-5209
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.15
> Environment: java jdk 1.8
>Reporter: LI MING
>Priority: Blocker
>  Labels: FileSize
>
> As the title says, we use Chinese characters to generate a PDF form file. 
> Generation succeeds, but the file size is larger than 10 MB. Apart from 
> changing the font file, is there any other way we can solve this problem?






[jira] [Commented] (PDFBOX-5209) Using Chinese character make the file size increases

2021-06-08 Thread LI MING (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359735#comment-17359735
 ] 

LI MING commented on PDFBOX-5209:
-

It is a little similar to this question in the comments.







[jira] [Created] (PDFBOX-5209) Using Chinese character make the file size increases

2021-06-08 Thread LI MING (Jira)
LI MING created PDFBOX-5209:
---

 Summary: Using Chinese character make the file size increases 
 Key: PDFBOX-5209
 URL: https://issues.apache.org/jira/browse/PDFBOX-5209
 Project: PDFBox
  Issue Type: Bug
  Components: AcroForm
Affects Versions: 2.0.15
 Environment: java jdk 1.8
Reporter: LI MING


As the title says, we use Chinese characters to generate a PDF form file. 
Generation succeeds, but the file size is larger than 10 MB. Apart from 
changing the font file, is there any other way we can solve this problem?






[jira] [Commented] (PDFBOX-4629) WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'

2021-06-08 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359731#comment-17359731
 ] 

Tilman Hausherr commented on PDFBOX-4629:
-

Use a smaller font :-(

> WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'
> --
>
> Key: PDFBOX-4629
> URL: https://issues.apache.org/jira/browse/PDFBOX-4629
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.15, 2.0.16
>Reporter: Nina Sutter
>Priority: Minor
> Attachments: image-2019-08-12-08-39-13-652.png, 
> image-2019-08-12-08-43-29-117.png, image-2019-08-12-08-44-58-434.png, 
> image-2019-08-12-09-25-11-355.png, image-2019-08-12-09-27-27-416.png
>
>
> Hey everyone,
> I use pdfbox version 2.0.15 deployed on AWS Lambda.
> My template is done in LibreOffice and the fields are set-up in font 
> Liberation Sans.
> When I fill the fields in the pdf I get the following log message on 
> CloudWatch:
> WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'
>  
> This message doesn't tell me what the actual problem is. The PDF still 
> gets filled with the required data; however, this message puzzles me. Maybe 
> you can help me understand the issue.
>  
>  
> Best regards






[jira] [Commented] (PDFBOX-4629) WARNING: Using fallback font 'LiberationSans' for 'LiberationSans'

2021-06-08 Thread LI MING (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359729#comment-17359729
 ] 

LI MING commented on PDFBOX-4629:
-

Excuse me, regarding the problem that the rendered PDF file is larger than 
10 MB: is there any way to solve it?







[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359502#comment-17359502
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1890618 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1890618 ]

PDFBOX-4892: optimize, as suggested by valerybokov

> Improve code quality (4)
> 
>
> Key: PDFBOX-4892
> URL: https://issues.apache.org/jira/browse/PDFBOX-4892
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.20
>Reporter: Tilman Hausherr
>Priority: Minor
>
> This is a long-term issue for improving code quality by using the 
> [SonarQube report|https://sonarcloud.io/project/issues?id=pdfbox-reactor], 
> hints in different IDEs, the FindBugs tool and other code quality tools.
> This is a follow-up of PDFBOX-4071, which was getting too long.






[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359501#comment-17359501
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1890617 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1890617 ]

PDFBOX-4892: optimize, as suggested by valerybokov







[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359477#comment-17359477
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1890615 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1890615 ]

PDFBOX-4892: optimize, as suggested by valerybokov







[jira] [Commented] (PDFBOX-4892) Improve code quality (4)

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359478#comment-17359478
 ] 

ASF subversion and git services commented on PDFBOX-4892:
-

Commit 1890616 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1890616 ]

PDFBOX-4892: optimize, as suggested by valerybokov







Re: [VOTE] Release Apache PDFBox 2.0.24

2021-06-08 Thread Tilman Hausherr

Thanks!

+1

Tilman

On 07.06.2021 at 18:51, Andreas Lehmkuehler wrote:

Hi,

a candidate for the PDFBox 2.0.24 release is available at:

    https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/

The release candidate is a zip archive of the sources in:

    http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/

The SHA-512 checksum of the archive is 
5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642.
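
A minimal sketch of how a voter might compare a downloaded archive against the SHA-512 value above, using only the JDK. The class name Sha512Check, its method names and the two command-line arguments are illustrative assumptions and are not part of any PDFBox release tooling. The checksum in the mail is followed by a sentence period, so the sketch strips a trailing "." before comparing.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: computes and compares SHA-512 digests.
final class Sha512Check {

    // Returns the SHA-512 digest of the given bytes as lowercase hex.
    static String sha512Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a standard algorithm name on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Compares the digest of the data with a published value, ignoring case,
    // surrounding whitespace and a trailing sentence period.
    static boolean matches(byte[] data, String published) {
        String expected = published.trim();
        if (expected.endsWith(".")) {
            expected = expected.substring(0, expected.length() - 1);
        }
        return sha512Hex(data).equalsIgnoreCase(expected);
    }

    // Usage (assumed): java Sha512Check.java <archive> <published-checksum>
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <archive> <published-checksum>");
            return;
        }
        // Reads the whole archive into memory, which is acceptable for a source zip.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

A matching checksum only shows that the download is intact. It does not replace checking the release signature.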


Please vote on releasing this package as Apache PDFBox 2.0.24.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 PDFBox PMC votes are cast.

    [ ] +1 Release this package as Apache PDFBox 2.0.24
    [ ] -1 Do not release this package because...


Here is my +1

Andreas

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org







Re: [VOTE] Release Apache PDFBox 2.0.24

2021-06-08 Thread sahy...@fileaffairs.de
+1

Thanks for preparing the release.

Maruan

On Monday, 07.06.2021 at 18:51 +0200, Andreas Lehmkuehler wrote:
> Hi,
> 
> a candidate for the PDFBox 2.0.24 release is available at:
> 
>  https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/
> 
> The release candidate is a zip archive of the sources in:
> 
>  http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/
> 
> The SHA-512 checksum of the archive is 
> 5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642.
> 
> Please vote on releasing this package as Apache PDFBox 2.0.24.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
> 
>  [ ] +1 Release this package as Apache PDFBox 2.0.24
>  [ ] -1 Do not release this package because...
> 
> 
> Here is my +1
> 
> Andreas
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org
> 






[jira] [Commented] (PDFBOX-5198) When merging multiple pdf ua documents, Tags become nested

2021-06-08 Thread Matthew Jung (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359359#comment-17359359
 ] 

Matthew Jung commented on PDFBOX-5198:
--

Hi Hausherr,
It looks like the issue happens when the PDF file is not tagged correctly as 
PDF/UA. If the PDF is tagged correctly, it works fine.
Matt
On Monday, June 7, 2021, 10:31:01 PM EDT, Tilman Hausherr (Jira) wrote:

    [ 
https://issues.apache.org/jira/browse/PDFBOX-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358997#comment-17358997
 ] 

Tilman Hausherr commented on PDFBOX-5198:
-

The release build is planned for today, so if you get examples later, please 
create a new issue.






> When merging multiple pdf ua documents, Tags become nested
> --
>
> Key: PDFBOX-5198
> URL: https://issues.apache.org/jira/browse/PDFBOX-5198
> Project: PDFBox
>  Issue Type: Wish
>  Components: Utilities
>Affects Versions: 2.0.21, 2.0.23
>Reporter: Matthew Jung
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.24, 3.0.0 PDFBox
>
> Attachments: 1622000586495blob.jpg, 1622120149457blob.jpg, 
> 1622120149457blob.jpg, 1622123253165blob.jpg, 1622123790854blob.jpg, 
> 1623105725988blob.jpg, 1623105725988blob.jpg, 1623115281967blob.jpg, 
> Binder1.pdf, PDFA3A-merged-new.pdf, PDFUA-in-a-Nutshell-PDFUA_1.pdf, 
> nested_tags_4documents_merged_using_pdfbox.tif, 
> non_nested_tags_4documents_combined_using+adobe_pro.tif, screenshot-1.png
>
>
> When merging PDF/UA documents, the tags seen in Adobe Reader are nested. If 
> 200 documents are merged, the tags are nested 200 levels deep. This does not 
> appear to prevent the JAWS reader from reading the document, but it may slow 
> down performance when loading into a content repository.
>
> When using Adobe DC to merge multiple documents, the tags are flattened.






Re: [VOTE] Release Apache PDFBox 2.0.24

2021-06-08 Thread Tim Allison
+1

Thank you!

On Mon, Jun 7, 2021 at 12:52 PM Andreas Lehmkuehler  wrote:
>
> Hi,
>
> a candidate for the PDFBox 2.0.24 release is available at:
>
>  https://dist.apache.org/repos/dist/dev/pdfbox/2.0.24/
>
> The release candidate is a zip archive of the sources in:
>
>  http://svn.apache.org/repos/asf/pdfbox/tags/2.0.24/
>
> The SHA-512 checksum of the archive is
> 5d55b3cadbbae266d90c47f5b10c9b09b6dc16f53b77a0cf15c78e62fc69afc7b6eab5a4329608ecdf25de9194b38db1f7d23e7d71af473cc1bf7b09b0028642.
>
> Please vote on releasing this package as Apache PDFBox 2.0.24.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
>
>  [ ] +1 Release this package as Apache PDFBox 2.0.24
>  [ ] -1 Do not release this package because...
>
>
> Here is my +1
>
> Andreas
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: dev-h...@pdfbox.apache.org
>




[jira] [Updated] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation

2021-06-08 Thread Andreas Lehmkühler (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5207:
---
Fix Version/s: 2.0.25

> Page not rendered / extracted, Unknown type in array for TJ operation
> -
>
> Key: PDFBOX-5207
> URL: https://issues.apache.org/jira/browse/PDFBOX-5207
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.23
>Reporter: Tilman Hausherr
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Fix For: 2.0.25
>
> Attachments: ContentStream.txt, evince-395-0.zip-0.pdf
>
>
> Worked in 2.0.23, no longer now. The weird thing is that the content stream 
> (attached) is the same. It contains a "[" in an array at offset 4211.






[jira] [Assigned] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation

2021-06-08 Thread Andreas Lehmkühler (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5207:
--

Assignee: Andreas Lehmkühler

> Page not rendered / extracted, Unknown type in array for TJ operation
> -
>
> Key: PDFBOX-5207
> URL: https://issues.apache.org/jira/browse/PDFBOX-5207
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.23
>Reporter: Tilman Hausherr
>Assignee: Andreas Lehmkühler
>Priority: Major
>  Labels: regression
> Attachments: ContentStream.txt, evince-395-0.zip-0.pdf
>
>
> Worked in 2.0.23, no longer now. The weird thing is that the content stream 
> (attached) is the same. It contains a "[" in an array at offset 4211.






[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359068#comment-17359068
 ] 

ASF subversion and git services commented on PDFBOX-5207:
-

Commit 1890583 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1890583 ]

PDFBOX-5207: skip nested arrays instead of throwing an IOException

> Page not rendered / extracted, Unknown type in array for TJ operation
> -
>
> Key: PDFBOX-5207
> URL: https://issues.apache.org/jira/browse/PDFBOX-5207
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.23
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: ContentStream.txt, evince-395-0.zip-0.pdf
>
>
> Worked in 2.0.23, no longer now. The weird thing is that the content stream 
> (attached) is the same. It contains a "[" in an array at offset 4211.






[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation

2021-06-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359070#comment-17359070
 ] 

ASF subversion and git services commented on PDFBOX-5207:
-

Commit 1890584 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1890584 ]

PDFBOX-5207: skip nested arrays instead of throwing an IOException







[jira] [Commented] (PDFBOX-5207) Page not rendered / extracted, Unknown type in array for TJ operation

2021-06-08 Thread Andreas Lehmkühler (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17359065#comment-17359065
 ] 

Andreas Lehmkühler commented on PDFBOX-5207:


AFAIK nested arrays are not allowed as operands of a TJ operator. The PDF in 
question has at least one array which is malformed (nested array, unbalanced 
number of square brackets). Before PDFBOX-5190 those arrays were skipped; now 
the parser reads as much as possible, and the nested arrays lead to an 
IOException in 
{{org.apache.pdfbox.contentstream.PDFStreamEngine.showTextStrings(COSArray)}}. 
I'm thinking about skipping such nested arrays and continuing with the 
remaining part. In the current case this improves the rendering.
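
The skip-and-continue idea can be sketched with plain Java collections. This is not the PDFBox code: the class name TjOperandSketch is hypothetical. List, String and Number stand in for COSArray, COSString and COSNumber. Throwing an unchecked exception for other unknown element types is an assumption made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the TJ operand handling; not the PDFBox implementation.
final class TjOperandSketch {

    // Concatenates the strings of a TJ-style operand array.
    // Numbers are positioning adjustments and contribute no text.
    // Nested arrays are malformed input: they are skipped so that the
    // remaining elements are still processed instead of aborting the page.
    static String showTextStrings(List<?> operand) {
        StringBuilder text = new StringBuilder();
        for (Object element : operand) {
            if (element instanceof String) {
                text.append((String) element);
            } else if (element instanceof Number) {
                // A real engine would adjust the text position here.
            } else if (element instanceof List) {
                // Skip the nested array and continue with the remaining part.
            } else {
                throw new IllegalArgumentException(
                        "Unknown type in array for TJ operation: " + element);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        List<Object> operand =
                Arrays.asList("Hel", -120, Arrays.asList("junk"), "lo");
        System.out.println(showTextStrings(operand));
    }
}
```

In this sketch, skipping loses whatever text sat inside the nested array but keeps the text around it, whereas throwing loses the whole operand.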

BTW: we should think about refactoring 
{{org.apache.pdfbox.pdfparser.PDFStreamParser}}. It uses COS objects when 
parsing a content stream. Although the content of such a stream is very 
similar to COS objects, it does not consist of COS objects. A refactoring 
should simplify the parsing and reduce the resources used. But that is another 
story ...



