[jira] [Commented] (PDFBOX-5664) 3.0.0: PDFCloneUtility needs a protected constructor to be useable outside of PDFBox when using Java 9 JPMS

2023-10-26 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779840#comment-17779840
 ] 

Emmeran Seehuber commented on PDFBOX-5664:
--

[~lehmi] Thanks for pointing that out. I'll have to look into that the next 
time I get around to doing something on the Graphics2D library.

> 3.0.0: PDFCloneUtility needs a protected constructor to be useable outside of 
> PDFBox when using Java 9 JPMS
> ---
>
> Key: PDFBOX-5664
> URL: https://issues.apache.org/jira/browse/PDFBOX-5664
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Utilities
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.1 PDFBox, 4.0.0
>
>
> The constructor of PDFCloneUtility is package private. I did not have a 
> problem with this, because I did an ugly workaround in my pdfbox-graphics2d 
> 3.0.0 branch. I created a derived class InternalDeprecatedCOSCloner in the 
> org.apache.pdfbox.multipdf package inside my project. And could access the 
> constructor.
> This works fine as long as you don't plan to use the JPMS modules introduced 
> with Java 9, which I personally don't ever plan to do.
> But it seems Apache POI is going to use those JPMS modules, at least 
> [~fanningpj] is trying to get POI working with PDFBox 3.0.0 and my 
> pdfbox-graphics2d with version 3.0.0. And now he gets a not so nice 
>  
> {{/Users/pj.fanning/svn/poi/poi-ooxml/src/main/java9/module-info.java:18: 
> error: module org.apache.poi.ooxml reads package org.apache.pdfbox.multipdf 
> from both de.rototor.pdfbox.graphics2d and org.apache.pdfbox}}
> The reason is that the - to be honest rather dirty - workaround done by me no 
> longer works with JPMS.
> You can find the concrete usage for the cloner here 
> [https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DPaintApplier.java].
>  Just search for PDFCloneUtility. I use it to clone PDShading when I'm 
> "rewriting" PDFs. I.e. I use PDFBox to draw on my Graphics2D adapter to 
> create new PDFs and filter / change stuff in the PDF on the fly. Mostly to 
> split PDFs for Separation colors and similar things.
> Just making the PDFCloneUtility constructor public again would of course 
> work. But I'm not sure that this is the right solution. AFAIR it was made 
> package private because of many problems with users who did not really 
> understand what this class was for.
> Maybe a solution could be to make the constructor protected and create a 
> package private getCloner() factory method? That would allow me to derive 
> from the class from outside the original package but would also prevent 
> people who don't know for sure that they really want to use this class from 
> using it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5664) 3.0.0: PDFCloneUtility needs a protected constructor to be useable outside of PDFBox when using Java 9 JPMS

2023-09-03 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761564#comment-17761564
 ] 

Emmeran Seehuber commented on PDFBOX-5664:
--

[~mkl] Copying the class should work, but I would like to avoid that. Yes, the 
class does not change much, so as long as there is no radical structural change, 
the copied version should not get out of sync. So this could be a working 
solution.

But that's not the solution I would prefer, as cloning COS objects is an 
"internal" operation of PDFBox, which IMHO should not be replicated outside of 
it. Just give me a protected constructor; then I can derive from it in my 
project and can again access the functionality without breaking JPMS rules.

Yes, the class and its usage are not something the usual PDFBox user should 
stumble over, as that is likely going to cause confusion.







[jira] [Created] (PDFBOX-5664) 3.0.0: PDFCloneUtility needs a protected constructor to be useable outside of PDFBox when using Java 9 JPMS

2023-08-21 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5664:


 Summary: 3.0.0: PDFCloneUtility needs a protected constructor to 
be useable outside of PDFBox when using Java 9 JPMS
 Key: PDFBOX-5664
 URL: https://issues.apache.org/jira/browse/PDFBOX-5664
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.0 PDFBox
Reporter: Emmeran Seehuber


The constructor of PDFCloneUtility is package private. I did not have a problem 
with this, because I did an ugly workaround in my pdfbox-graphics2d 3.0.0 
branch. I created a derived class InternalDeprecatedCOSCloner in the 
org.apache.pdfbox.multipdf package inside my project. And could access the 
constructor.

This works fine as long as you don't plan to use the JPMS modules introduced 
with Java 9, which I personally don't ever plan to do.

But it seems Apache POI is going to use those JPMS modules, at least 
[~fanningpj] is trying to get POI working with PDFBox 3.0.0 and my 
pdfbox-graphics2d with version 3.0.0. And now he gets a not so nice 
 
{{/Users/pj.fanning/svn/poi/poi-ooxml/src/main/java9/module-info.java:18: 
error: module org.apache.poi.ooxml reads package org.apache.pdfbox.multipdf 
from both de.rototor.pdfbox.graphics2d and org.apache.pdfbox}}

The reason is that the - to be honest rather dirty - workaround done by me no 
longer works with JPMS.
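
For illustration, the split-package situation behind that error can be sketched as a module descriptor. The module names are taken from the error message; the two requires lines are an assumption about roughly what the consuming descriptor contains, not a copy of POI's actual module-info.java.

```java
// module-info.java of the consuming module (illustrative sketch, not POI's real descriptor)
module org.apache.poi.ooxml {
    // Ships the package org.apache.pdfbox.multipdf.
    requires org.apache.pdfbox;
    // Also contains a class in org.apache.pdfbox.multipdf (the workaround subclass),
    // so the same package becomes readable from two modules, which JPMS rejects.
    requires de.rototor.pdfbox.graphics2d;
}
```

On the plain classpath the same two jars merge into one package without complaint, which is why the workaround only breaks once modules are used.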

You can find the concrete usage for the cloner here 
[https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DPaintApplier.java].
 Just search for PDFCloneUtility. I use it to clone PDShading when I'm 
"rewriting" PDFs. I.e. I use PDFBox to draw on my Graphics2D adapter to create 
new PDFs and filter / change stuff in the PDF on the fly. Mostly to split PDFs 
for Separation colors and similar things.

Just making the PDFCloneUtility constructor public again would of course work. 
But I'm not sure that this is the right solution. AFAIR it was made package 
private because of many problems with users who did not really understand what 
this class was for.

Maybe a solution could be to make the constructor protected and create a 
package private getCloner() factory method? That would allow me to derive from 
the class from outside the original package but would also prevent people who 
don't know for sure that they really want to use this class from using it.
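
The protected-constructor idea can be sketched roughly as follows. LibraryCloner and DownstreamCloner are hypothetical stand-ins, not PDFBox classes; in a real setup they would live in different packages (and modules), and they share one file here only so that the sketch compiles on its own.

```java
// Hypothetical stand-in for a library class such as PDFCloneUtility.
class LibraryCloner {
    private final String targetName;

    // protected: subclasses in other packages may call it via super(...),
    // but unrelated code outside the package cannot write "new LibraryCloner(...)".
    protected LibraryCloner(String targetName) {
        this.targetName = targetName;
    }

    // Package-private factory for the library's own internal callers.
    static LibraryCloner getCloner(String targetName) {
        return new LibraryCloner(targetName);
    }

    public String describe() {
        return "cloner for " + targetName;
    }
}

// Hypothetical downstream class; it would sit in its own package,
// so no class has to be placed into the library's package.
class DownstreamCloner extends LibraryCloner {
    public DownstreamCloner(String targetName) {
        super(targetName);
    }
}
```

With this shape a downstream project opts in by deliberately writing a subclass, while casual users never see a public constructor.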






[jira] [Comment Edited] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2023-08-21 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757093#comment-17757093
 ] 

Emmeran Seehuber edited comment on PDFBOX-5149 at 8/21/23 9:03 PM:
---

Sorry, we are not done here :(

The constructor of PDFCloneUtility is package private. I did not have a problem 
with this, because I did an ugly workaround in my pdfbox-graphics2d 3.0.0 
branch. I created a derived class InternalDeprecatedCOSCloner in the 
org.apache.pdfbox.multipdf package inside my project. And could access the 
constructor.

This works fine as long as you don't plan to use the JPMS modules introduced 
with Java 9, which I personally don't ever plan to do.

But it seems Apache POI is going to use those JPMS modules, at least 
[~fanningpj] is trying to get POI working with PDFBox 3.0.0 and my 
pdfbox-graphics2d with version 3.0.0. And now he gets a not so nice 
 
{{/Users/pj.fanning/svn/poi/poi-ooxml/src/main/java9/module-info.java:18: 
error: module org.apache.poi.ooxml reads package org.apache.pdfbox.multipdf 
from both de.rototor.pdfbox.graphics2d and org.apache.pdfbox}}

The reason is that the - to be honest rather dirty - workaround done by me no 
longer works with JPMS.

You can find the concrete usage for the cloner here 
[https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DPaintApplier.java].
 Just search for PDFCloneUtility. I use it to clone PDShading when I'm 
"rewriting" PDFs. I.e. I use PDFBox to draw on my Graphics2D adapter to create 
new PDFs and filter / change stuff in the PDF on the fly. Mostly to split PDFs 
for Separation colors and similar things.

Just making the PDFCloneUtility constructor public again would of course work. 
But I'm not sure that this is the right solution. AFAIR it was made package 
private because of many problems with users who did not really understand what 
this class was for.

The "downstream" bug is here: 
[https://github.com/rototor/pdfbox-graphics2d/issues/56]

Maybe a solution could be to make the constructor protected and create a 
package private getCloner() factory method? That would allow me to derive from 
the class from outside the original package but would also prevent people who 
don't know for sure that they really want to use this class from using it.


> 3.0.0-RC1: PDFCloneUtility is no longer accessible
> --
>
> Key: PDFBOX-5149
> URL: https://issues.apache.org/jira/browse/PDFBOX-5149
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> PDFCloneUtility is no longer accessible by default. This is low level 
> functionality, which is sometimes required and useful. PDFBox itself is low 
> level, so I don't see why you would restrict access to this. Also this API is 
> not really complicated 

[jira] [Reopened] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2023-08-21 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber reopened PDFBOX-5149:
--

Sorry, we are not done here :(

The constructor of PDFCloneUtility is package private. I did not have a problem 
with this, because I did an ugly workaround in my pdfbox-graphics2d 3.0.0 
branch. I created a derived class InternalDeprecatedCOSCloner in the 
org.apache.pdfbox.multipdf package inside my project. And could access the 
constructor.

This works fine as long as you don't plan to use the JPMS modules introduced 
with Java 9, which I personally don't ever plan to do.

But it seems Apache POI is going to use those JPMS modules, at least 
[~fanningpj] is trying to get POI working with PDFBox 3.0.0 and my 
pdfbox-graphics2d with version 3.0.0. And now he gets a not so nice 
 
{{/Users/pj.fanning/svn/poi/poi-ooxml/src/main/java9/module-info.java:18: 
error: module org.apache.poi.ooxml reads package org.apache.pdfbox.multipdf 
from both de.rototor.pdfbox.graphics2d and org.apache.pdfbox}}

The reason is that the - to be honest rather dirty - workaround done by me no 
longer works with JPMS.

You can find the concrete usage for the cloner here 
[https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DPaintApplier.java].
 Just search for PDFCloneUtility. I use it to clone PDShading when I'm 
"rewriting" PDFs. I.e. I use PDFBox to draw on my Graphics2D adapter to create 
new PDFs and filter / change stuff in the PDF on the fly. Mostly to split PDFs 
for Separation colors and similar things.

Just making the PDFCloneUtility constructor public again would of course work. 
But I'm not sure that this is the right solution. AFAIR it was made package 
private because of many problems with users who did not really understand what 
this class was for.

The "downstream" bug is here: 
https://github.com/rototor/pdfbox-graphics2d/issues/56 

> 3.0.0-RC1: PDFCloneUtility is no longer accessible
> --
>
> Key: PDFBOX-5149
> URL: https://issues.apache.org/jira/browse/PDFBOX-5149
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 3.0.0 PDFBox
>
>
> PDFCloneUtility is no longer accessible by default. This is low level 
> functionality, which is sometimes required and useful. PDFBox itself is low 
> level, so I don't see why you would restrict access to this. Also this API is 
> not really complicated nor unstable.
> For now I did a dirty workaround to access it (see 
> [here|https://github.com/rototor/pdfbox-graphics2d/commit/5986bc653f83b2c06e5218ac906b9a9bc75f724e#diff-2113e77a03390c0cf920587a642fe7693e5b3c8402de783223035a79e13c2209R1]).
>  I would rather like to get rid of this workaround soon.
> Other than that, PDFBox-Graphics2d seems to work fine with PDFBox 3.0.0-RC1. 
> I just released a 3.0.0-RC1 version of it.






[jira] [Commented] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2022-07-03 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561834#comment-17561834
 ] 

Emmeran Seehuber commented on PDFBOX-5149:
--

Looks good to me. It may make sense to make cloneForNewDocument generic. That 
would save some casts. And of course it should be public again. I.e. I would 
like to have this patch applied:
{code:java}
--- a/pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFCloneUtility.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/multipdf/PDFCloneUtility.java
@@ -39,7 +39,7 @@ import org.apache.pdfbox.pdmodel.common.COSObjectable;
  * Utility class used to clone PDF objects. It keeps track of objects it has already cloned.
  *
  */
-class PDFCloneUtility
+public class PDFCloneUtility
 {
 private static final Log LOG = LogFactory.getLog(PDFCloneUtility.class);

@@ -77,7 +77,8 @@ class PDFCloneUtility
  * @return the cloned instance of the base object
  * @throws IOException if an I/O error occurs
  */
-COSBase cloneForNewDocument(COSBase base) throws IOException
+@SuppressWarnings("unchecked")
+public <TCOSBase extends COSBase> TCOSBase cloneForNewDocument(TCOSBase base) throws IOException
 {
 if (base == null)
 {
@@ -87,7 +88,7 @@ class PDFCloneUtility
 if (retval != null)
 {
 // we are done, it has already been converted.
-return retval;
+return (TCOSBase)retval;
 }
 if (clonedValues.contains(base))
 {
@@ -97,7 +98,7 @@ class PDFCloneUtility
 retval = cloneCOSBaseForNewDocument((COSBase)base);
 clonedVersion.put(base, retval);
 clonedValues.add(retval);
-return retval;
+return (TCOSBase)retval;
 }

{code}
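
To show what a generic signature buys at the call site, here is a minimal self-contained sketch. Node and StreamNode are hypothetical stand-ins for COSBase and one of its subtypes, and the identity map is an assumption for illustration; none of this is the actual PDFBox implementation.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for COSBase.
class Node {
    final String label;
    Node(String label) { this.label = label; }
    Node copy() { return new Node(label); }
}

// Hypothetical stand-in for a subtype such as a stream object.
class StreamNode extends Node {
    StreamNode(String label) { super(label); }
    @Override
    StreamNode copy() { return new StreamNode(label); }
}

class GenericCloneSketch {
    // Remembers which originals were already cloned, keyed by object identity.
    private final Map<Node, Node> clonedVersion = new IdentityHashMap<>();

    // The type parameter lets the caller get back the same static type it passed in.
    @SuppressWarnings("unchecked")
    <T extends Node> T cloneForNewDocument(T base) {
        if (base == null) {
            return null;
        }
        Node retval = clonedVersion.get(base);
        if (retval == null) {
            retval = base.copy();
            clonedVersion.put(base, retval);
        }
        // Unchecked, but safe here as long as copy() returns the receiver's own type.
        return (T) retval;
    }
}
```

A caller can then write StreamNode copy = cloner.cloneForNewDocument(stream); without a cast, and a second call with the same object returns the cached clone.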
No idea regarding the MergeUtility. I don't use that directly. (And regarding 
merging ContentStream COSStream: I had to learn the hard way, that you can only 
place ContentStreams of Pages in an array. It is not allowed for XForms. PDFBox 
incorrectly just accepts that, but Acrobat etc. doesn't really like that...)







[jira] [Created] (PDFBOX-5362) [PATCH] Replace finalize() with Cleaner

2022-01-15 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5362:


 Summary: [PATCH] Replace finalize() with Cleaner
 Key: PDFBOX-5362
 URL: https://issues.apache.org/jira/browse/PDFBOX-5362
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.0 PDFBox
Reporter: Emmeran Seehuber
 Attachments: replace_finalizer_with_cleaner_v1.patch

Finalizers (method finalize()) are going to be deprecated for removal with JDK 
18. See [https://openjdk.java.net/jeps/421] for details.

The best way to replace the finalize() methods is by using the JDK 9 
java.lang.ref.Cleaner. Since PDFBox 3 targets JDK 8, this cannot be used directly.

The attached patch implements a Cleaner using finalizers for JDK <= 8 and using 
java.lang.ref.Cleaner by reflection for JDK 9+. 

The two remaining finalize() implementing classes are migrated to the new 
Cleaner.
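
For reference, a minimal sketch of the java.lang.ref.Cleaner pattern on JDK 9+, called directly rather than by reflection as the attached patch does. TrackedResource and its State class are hypothetical names and are not taken from the patch.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class TrackedResource implements AutoCloseable {
    // One shared Cleaner; it runs cleanup actions on its own daemon thread.
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup action is a static nested class so that it holds no reference
    // to the TrackedResource; otherwise the resource could never become unreachable.
    private static final class State implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            // Release native handles, temp files, etc. here.
            released.set(true);
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    // Explicit close runs the action right away; clean() invokes it at most once.
    @Override
    public void close() {
        cleanable.clean();
    }

    boolean isReleased() {
        return state.released.get();
    }
}
```

If close() is never called, the same action runs some time after the object becomes phantom reachable, which is the role finalize() used to play.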

I’m not really happy with the name and package 
org.apache.fontbox.util.PdfBoxInternalCleaner of the cleaner. Maybe you have an 
idea for a better place and name.

In theory this patch could be backported to PDFBox 2, but I’m not sure if this 
is worth the risk.






[jira] [Commented] (PDFBOX-5322) PDFDebugger: Strange zoom depending fill/clipping artefact

2021-11-23 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447877#comment-17447877
 ] 

Emmeran Seehuber commented on PDFBOX-5322:
--

Thank you [~tilman] for reporting this upstream to the JDK.

> PDFDebugger: Strange zoom depending fill/clipping artefact
> --
>
> Key: PDFBOX-5322
> URL: https://issues.apache.org/jira/browse/PDFBOX-5322
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering, Swing GUI
>Affects Versions: 2.0.24
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: PDFBOX-5322_reduced.pdf, 
> buggy_shape_operation_order_fixed.pdf, image-2021-11-15-19-00-01-859.png, 
> image-2021-11-15-19-00-24-192.png, kartenvorschau2871641668946630670.pdf
>
>
> The attached (test) PDF is not rendered correctly in the PDFBox Debugger, at 
> least on MacOS with both Azul-17 and Azul-11 JDKs. 
> How it is misrendered depends on the zoom level. It looks fine at 
> 150%, but is buggy at all other zoom levels.
> 100%:
> !image-2021-11-15-19-00-01-859.png|width=335,height=184!
> 200%:
> !image-2021-11-15-19-00-24-192.png|width=365,height=195!
>  
> It renders fine on Adobe Acrobat and MacOS Preview. (The text is exported as 
> vector shapes)






[jira] [Comment Edited] (PDFBOX-5322) PDFDebugger: Strange zoom depending fill/clipping artefact

2021-11-16 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444508#comment-17444508
 ] 

Emmeran Seehuber edited comment on PDFBOX-5322 at 11/16/21, 12:33 PM:
--

[~mkl] OK, that's news to me that setting properties is forbidden after defining 
a path. Where in the PDF spec (1.7) is this specified? I only found in 8.5.2.1 
that a path construction "{*}may{*} be concluded with the application of 
a path-painting operator". As a non-native English speaker I assume this means 
that a path construction does not require the path to be used immediately.

Nevertheless, I changed the ordering in my pdfbox-graphics2d bridge, so that 
a fill or stroke is done immediately after the path construction, and all 
properties / graphics states are defined before the path construction starts. 
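
The ordering described above can be sketched as a tiny generator for a content stream fragment. PathOrderSketch is a hypothetical helper, not code from pdfbox-graphics2d; it only uses the standard PDF operators q, rg, re, f and Q.

```java
import java.util.Locale;

class PathOrderSketch {
    // Emits: save state, set fill color, construct the path, paint it, restore state.
    // Nothing that touches the graphics state sits between construction and painting.
    static String filledRectangle(float r, float g, float b,
                                  float x, float y, float w, float h) {
        StringBuilder sb = new StringBuilder();
        sb.append("q\n");
        // Graphics state first: non-stroking RGB color.
        sb.append(String.format(Locale.ROOT, "%.2f %.2f %.2f rg\n", r, g, b));
        // Path construction: a single rectangle.
        sb.append(String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f re\n", x, y, w, h));
        // Path painting right after construction.
        sb.append("f\n");
        sb.append("Q\n");
        return sb.toString();
    }
}
```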

The problem still persists. See [^buggy_shape_operation_order_fixed.pdf]

Is there anything else in the PDF which is broken and could cause that problem?








[jira] [Commented] (PDFBOX-5322) PDFDebugger: Strange zoom depending fill/clipping artefact

2021-11-16 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444508#comment-17444508
 ] 

Emmeran Seehuber commented on PDFBOX-5322:
--

[~mkl] OK, that's news to me that setting properties is forbidden after defining 
a path. Where in the PDF spec (1.7) is this specified? I only found in 8.5.2.1 
that a path construction "{*}may{*} be concluded with the application of 
a path-painting operator". As a non-native English speaker I assume this means 
that a path construction does not require the path to be used immediately.

Nevertheless, I changed the ordering in my pdfbox-graphics2d bridge, so that 
a fill or stroke is done immediately after the path construction, and all 
properties / graphics states are defined before the path construction starts. 

The problem still persists. See [^buggy_shape_operation_order_fixed.pdf]

Is there anything else in the PDF which is broken?







[jira] [Updated] (PDFBOX-5322) PDFDebugger: Strange zoom depending fill/clipping artefact

2021-11-16 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-5322:
-
Attachment: buggy_shape_operation_order_fixed.pdf







[jira] [Created] (PDFBOX-5322) PDFDebugger: Strange zoom depending fill/clipping artefact

2021-11-15 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5322:


 Summary: PDFDebugger: Strange zoom depending fill/clipping artefact
 Key: PDFBOX-5322
 URL: https://issues.apache.org/jira/browse/PDFBOX-5322
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering, Swing GUI
Affects Versions: 2.0.24
Reporter: Emmeran Seehuber
 Attachments: image-2021-11-15-19-00-01-859.png, 
image-2021-11-15-19-00-24-192.png, kartenvorschau2871641668946630670.pdf

The attached (test) PDF is not rendered correctly in the PDFBox Debugger, at 
least on MacOS with both Azul-17 and Azul-11 JDKs. 

How it is misrendered depends on the zoom level. It looks fine at 
150%, but is buggy at all other zoom levels.

100%:

!image-2021-11-15-19-00-01-859.png|width=335,height=184!

200%:

!image-2021-11-15-19-00-24-192.png|width=365,height=195!

 

It renders fine on Adobe Acrobat and MacOS Preview. (The text is exported as 
vector shapes)






[jira] [Created] (PDFBOX-5250) Colors in PDF Tile Patterns are off

2021-07-30 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5250:


 Summary: Colors in PDF Tile Patterns are off
 Key: PDFBOX-5250
 URL: https://issues.apache.org/jira/browse/PDFBOX-5250
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.24
Reporter: Emmeran Seehuber
 Attachments: kachel4.pdf, pattern_colors_broken.pdf

I'm implementing some "text with pattern" features for my customer. The 
attached PDF shows that in action.

There is a color mapping problem in the patterned text in the bottom left 
corner in the PDFBox debugger. The colors are way too bright. Acrobat and MacOS 
Preview render this fine. FoxIt Reader also has some rendering issues with this 
(image seems distorted...).

I've also attached the "base" PDF used for the pattern.

I think the way I embed this should be fine:
{code:java}
return new OurPDColorCarrier((PDPageContentStream contentStream) -> {
   PDImageResult pdImage = throwing(() -> 
textureImage.getPDImageXObject(document, colorSpaceRegistry));
   Objects.requireNonNull(pdImage);

   PDTilingPattern pattern = new PDTilingPattern();
   pattern.setPaintType(PDTilingPattern.PAINT_COLORED);
   pattern.setTilingType(PDTilingPattern.TILING_NO_DISTORTION);

   var anchorInfo = TextRenderer.getTextureTileAnchor(textureImage);

   float textureImageWidth = (float) anchorInfo.originalWidth;
   float textureImageHeight = (float) anchorInfo.originalHeight;
   pattern.setBBox(
 new PDRectangle(0f, 0f, (float) dpi2mm(textureImageWidth), (float) 
dpi2mm(textureImageHeight)));
   pattern.setXStep((float) anchorInfo.anchor.getWidth());
   pattern.setYStep((float) anchorInfo.anchor.getHeight());

   double scaleFactor = pageWidth / renderTarget.width;

   /* According to the spec, the matrix is there for the phase shift of the pattern. */
   double countPhasen = (renderTarget.pageSize.getHeight() / 
anchorInfo.anchor.getHeight());
   double rest = renderTarget.pageSize.getHeight()
 - DoubleMath.roundToInt(countPhasen, RoundingMode.DOWN) * 
anchorInfo.anchor.getHeight();

   pattern.setMatrix(AffineTransform.getTranslateInstance(scaleFactor * 
currentTransform.getTranslateX(),
 renderTarget.pageSize.getHeight() - (scaleFactor * 
currentTransform.getTranslateY()) + rest));

   PDAppearanceStream appearance = new PDAppearanceStream(document);
   appearance.setResources(pattern.getResources());
   appearance.setBBox(pattern.getBBox());

   PDPageContentStream imageContentStream = new PDPageContentStream(document, 
appearance,
 ((COSStream) pattern.getCOSObject()).createOutputStream());

   if (pdImage.image instanceof PDFormXObject) {
  imageContentStream.transform(Matrix.getScaleInstance((float) 
(anchorInfo.originalWidth),
(float) (anchorInfo.originalHeight)));
  pattern.setBBox(new PDRectangle(0f, 0f, (float) 
(anchorInfo.originalWidth),
(float) (anchorInfo.originalHeight)));
  imageContentStream.drawForm((PDFormXObject) pdImage.image);
   } else {
  pattern.setMatrix(AffineTransform.getTranslateInstance(
scaleFactor * currentTransform.getTranslateX(), 
renderTarget.pageSize.getHeight()
  - (scaleFactor * currentTransform.getTranslateY()) + rest / 
scaleFactor));
  imageContentStream.transform(
Matrix.getScaleInstance((float) (anchorInfo.anchor.getWidth() / 
textureImageWidth),
  (float) (anchorInfo.anchor.getHeight() / 
textureImageHeight)));
  imageContentStream.drawImage((PDImageXObject) pdImage.image, 0, 0);
   }
   imageContentStream.close();

   PDResources resources = ModelUtils.getPrivateField(contentStream, 
"resources");
   COSName tilingPatternName = resources.add(pattern);
   return new PDColor(tilingPatternName, new PDPattern(null));
});

 {code}
Yes, in the third line from the bottom the 
{code:java}
PDResources resources = ModelUtils.getPrivateField(contentStream, "resources"); 
{code}
is an ugly reflection hack - any reason why the ContentStream does not expose 
its resources with a public API?
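
For readers unfamiliar with this kind of hack, here is a minimal sketch of what a helper like ModelUtils.getPrivateField might do. ModelUtils itself is not shown in this report, so the class PrivateFieldAccess, its method body, and the stand-in classes BaseStream and PageStream are assumptions for illustration only, not the actual helper and not PDFBox API.

```java
import java.lang.reflect.Field;

final class PrivateFieldAccess {
    private PrivateFieldAccess() {
    }

    /**
     * Reads a (possibly private, possibly inherited) field by name.
     * Hypothetical helper: walks up the class hierarchy until a field
     * with the given name is declared.
     */
    @SuppressWarnings("unchecked")
    static <T> T getPrivateField(Object target, String fieldName) {
        for (Class<?> type = target.getClass(); type != null; type = type.getSuperclass()) {
            try {
                Field field = type.getDeclaredField(fieldName);
                // May throw an unchecked exception if a JPMS module does not
                // open the declaring package to the caller.
                field.setAccessible(true);
                return (T) field.get(target);
            } catch (NoSuchFieldException e) {
                // Not declared on this class; continue with the superclass.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + fieldName, e);
            }
        }
        throw new IllegalArgumentException(
                "No field named " + fieldName + " on " + target.getClass().getName());
    }

    /** Stand-in for a library class that hides its state (hypothetical). */
    static class BaseStream {
        private final String resources = "shared-resources";
    }

    /** Stand-in subclass, so the lookup has to climb one level (hypothetical). */
    static class PageStream extends BaseStream {
    }

    public static void main(String[] args) {
        String value = getPrivateField(new PageStream(), "resources");
        System.out.println(value);
    }
}
```

Access like this breaks as soon as the field is renamed, and with JPMS it additionally needs the declaring package to be opened, which is why a public accessor would be preferable.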

This is with Java 11 on MacOS BigSur.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-5250) Colors in PDF Tile Patterns are off

2021-07-30 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-5250:
-
Attachment: pattern_colors_broken.pdf
kachel4.pdf

> Colors in PDF Tile Patterns are off
> ---
>
> Key: PDFBOX-5250
> URL: https://issues.apache.org/jira/browse/PDFBOX-5250
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.24
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: kachel4.pdf, pattern_colors_broken.pdf
>
>
> I'm implementing some "text with pattern" features for my customer. The 
> attached PDF shows that in action.
> There is a color mapping problem in the patterned text in the bottom left 
> corner in the PDFBox debugger. The colors are way too bright. Acrobat and 
> MacOS Preview render this fine. FoxIt Reader also has some rendering issues 
> with this (image seems distorted...).
> I've also attached the "base" PDF used for the pattern.
> I think the way I embed this should be fine:
> {code:java}
> return new OurPDColorCarrier((PDPageContentStream contentStream) -> {
>PDImageResult pdImage = throwing(() -> 
> textureImage.getPDImageXObject(document, colorSpaceRegistry));
>Objects.requireNonNull(pdImage);
>PDTilingPattern pattern = new PDTilingPattern();
>pattern.setPaintType(PDTilingPattern.PAINT_COLORED);
>pattern.setTilingType(PDTilingPattern.TILING_NO_DISTORTION);
>var anchorInfo = TextRenderer.getTextureTileAnchor(textureImage);
>float textureImageWidth = (float) anchorInfo.originalWidth;
>float textureImageHeight = (float) anchorInfo.originalHeight;
>pattern.setBBox(
>  new PDRectangle(0f, 0f, (float) dpi2mm(textureImageWidth), (float) 
> dpi2mm(textureImageHeight)));
>pattern.setXStep((float) anchorInfo.anchor.getWidth());
>pattern.setYStep((float) anchorInfo.anchor.getHeight());
>double scaleFactor = pageWidth / renderTarget.width;
>/* According to the spec, the matrix is there for the phase shift of the pattern. */
>double countPhasen = (renderTarget.pageSize.getHeight() / 
> anchorInfo.anchor.getHeight());
>double rest = renderTarget.pageSize.getHeight()
>  - DoubleMath.roundToInt(countPhasen, RoundingMode.DOWN) * 
> anchorInfo.anchor.getHeight();
>pattern.setMatrix(AffineTransform.getTranslateInstance(scaleFactor * 
> currentTransform.getTranslateX(),
>  renderTarget.pageSize.getHeight() - (scaleFactor * 
> currentTransform.getTranslateY()) + rest));
>PDAppearanceStream appearance = new PDAppearanceStream(document);
>appearance.setResources(pattern.getResources());
>appearance.setBBox(pattern.getBBox());
>PDPageContentStream imageContentStream = new PDPageContentStream(document, 
> appearance,
>  ((COSStream) pattern.getCOSObject()).createOutputStream());
>if (pdImage.image instanceof PDFormXObject) {
>   imageContentStream.transform(Matrix.getScaleInstance((float) 
> (anchorInfo.originalWidth),
> (float) (anchorInfo.originalHeight)));
>   pattern.setBBox(new PDRectangle(0f, 0f, (float) 
> (anchorInfo.originalWidth),
> (float) (anchorInfo.originalHeight)));
>   imageContentStream.drawForm((PDFormXObject) pdImage.image);
>} else {
>   pattern.setMatrix(AffineTransform.getTranslateInstance(
> scaleFactor * currentTransform.getTranslateX(), 
> renderTarget.pageSize.getHeight()
>   - (scaleFactor * currentTransform.getTranslateY()) + rest / 
> scaleFactor));
>   imageContentStream.transform(
> Matrix.getScaleInstance((float) (anchorInfo.anchor.getWidth() / 
> textureImageWidth),
>   (float) (anchorInfo.anchor.getHeight() / 
> textureImageHeight)));
>   imageContentStream.drawImage((PDImageXObject) pdImage.image, 0, 0);
>}
>imageContentStream.close();
>PDResources resources = ModelUtils.getPrivateField(contentStream, 
> "resources");
>COSName tilingPatternName = resources.add(pattern);
>return new PDColor(tilingPatternName, new PDPattern(null));
> });
>  {code}
> Yes, in the third line from the bottom the 
> {code:java}
> PDResources resources = ModelUtils.getPrivateField(contentStream, 
> "resources"); {code}
> is an ugly reflection hack - any reason why the ContentStream does not expose 
> its resources with a public API?
> This is with Java 11 on MacOS BigSur.






[jira] [Commented] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2021-04-06 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315794#comment-17315794
 ] 

Emmeran Seehuber commented on PDFBOX-5149:
--

I've renamed my accessor class in PDFBox-Graphics2D to 
InternalDeprecatedCOSCloner now and added a @Deprecated annotation to it. So 
hopefully no one will find / use it...

From my POV this is fine for now, as it works as long as PDFBox is not built 
and used as a Java 9 module. Once it is, this will no longer work.

So I don't think that there is an urgent need at the moment to find a better 
solution.

> 3.0.0-RC1: PDFCloneUtility is no longer accessible
> --
>
> Key: PDFBOX-5149
> URL: https://issues.apache.org/jira/browse/PDFBOX-5149
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Priority: Major
>
> PDFCloneUtility is no longer accessible by default. This is low level 
> functionality, which is sometimes required and useful. PDFBox itself is low 
> level, so I don't see why you would restrict access to this. Also this API is 
> not really complicated nor unstable.
> For now I did a dirty workaround to access it (see 
> [here|https://github.com/rototor/pdfbox-graphics2d/commit/5986bc653f83b2c06e5218ac906b9a9bc75f724e#diff-2113e77a03390c0cf920587a642fe7693e5b3c8402de783223035a79e13c2209R1]).
>  I would rather like to get rid of this workaround soon.
> Other than that, PDFBox-Graphics2d seems to work fine with PDFBox 3.0.0-RC1. 
> I just released a 3.0.0-RC1 version of it.






[jira] [Commented] (PDFBOX-5150) 3.0.0-RC1: PDComboBox.setValue() throws IllegalArgumentException: /DA is a required entry

2021-04-06 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315781#comment-17315781
 ] 

Emmeran Seehuber commented on PDFBOX-5150:
--

Ah ok, now it works. So this is fine for me now. Thanks!

But I think this should be mentioned in the migration guide.

Also changing the exception text from
{code:java}
java.lang.IllegalArgumentException: /DA is a required entry
{code}
to
{code:java}
java.lang.IllegalArgumentException: /DA is a required entry. Please set an 
appearance first.
{code}
could avoid some future questions.

> 3.0.0-RC1: PDComboBox.setValue() throws IllegalArgumentException: /DA is a 
> required entry
> -
>
> Key: PDFBOX-5150
> URL: https://issues.apache.org/jira/browse/PDFBOX-5150
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Priority: Major
>
> While porting openhtmltopdf to PDFBox 3.0.0-RC1 I got exceptions in some 
> tests:
> The tests 
> com.openhtmltopdf.nonvisualregressiontests.NonVisualRegressionTest#testInputWithoutNameAttribute,
>  com.openhtmltopdf.testcases.CssPropertiesTest#testFormControls and 
> com.openhtmltopdf.testcases.TestcaseRunnerTest#runTestcaseRunner cause the 
> „field.setValue()“ on a PDComboBox in 
> [https://github.com/rototor/openhtmltopdf/blob/open-dev-v1-pdfbox-3.0.0/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfBoxForm.java#L363]
>  to throw this exception:
> java.lang.IllegalArgumentException: /DA is a required entry
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:78)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:115)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDComboBox.constructAppearances(PDComboBox.java:82)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:210)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDChoice.setValue(PDChoice.java:381)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxForm.processSelectControl(PdfBoxForm.java:363)
>  at com.openhtmltopdf.pdfboxout.PdfBoxForm.process(PdfBoxForm.java:807)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxPerDocumentFormState.processControls(PdfBoxPerDocumentFormState.java:179)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.processControls(PdfBoxFastOutputDevice.java:299)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.finish(PdfBoxFastOutputDevice.java:904)
> You can get the project here: 
> [https://github.com/rototor/openhtmltopdf/tree/open-dev-v1-pdfbox-3.0.0] 
> mvn test
> will show you the failing test cases.






[jira] [Commented] (PDFBOX-5150) 3.0.0-RC1: PDComboBox.setValue() throws IllegalArgumentException: /DA is a required entry

2021-04-06 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315337#comment-17315337
 ] 

Emmeran Seehuber commented on PDFBOX-5150:
--

I just tried your example default appearance string, but I then get this exception:

{code}
Caused by: java.io.IOException: Could not process default appearance string 
'/Helv 0 Tf 0. 0. 0. rg' for field 'null': Could not find font: 
/Helv
at 
org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:121)
at 
org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:261)
at 
org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:210)
at 
org.apache.pdfbox.pdmodel.interactive.form.PDTextField.setValue(PDTextField.java:218)
{code}

It seems Helv cannot be resolved as a font. Shouldn't this be a built-in font, 
which can always be found?

I've updated the branch open-dev-v1-pdfbox-3.0.0 with the changes to use Helv 
as the default if no font is given and to set the appearance before the value.

> 3.0.0-RC1: PDComboBox.setValue() throws IllegalArgumentException: /DA is a 
> required entry
> -
>
> Key: PDFBOX-5150
> URL: https://issues.apache.org/jira/browse/PDFBOX-5150
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Priority: Major
>
> While porting openhtmltopdf to PDFBox 3.0.0-RC1 I got exceptions in some 
> tests:
> The tests 
> com.openhtmltopdf.nonvisualregressiontests.NonVisualRegressionTest#testInputWithoutNameAttribute,
>  com.openhtmltopdf.testcases.CssPropertiesTest#testFormControls and 
> com.openhtmltopdf.testcases.TestcaseRunnerTest#runTestcaseRunner cause the 
> „field.setValue()“ on a PDComboBox in 
> [https://github.com/rototor/openhtmltopdf/blob/open-dev-v1-pdfbox-3.0.0/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfBoxForm.java#L363]
>  to throw this exception:
> java.lang.IllegalArgumentException: /DA is a required entry
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:78)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:115)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDComboBox.constructAppearances(PDComboBox.java:82)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:210)
>  at 
> org.apache.pdfbox.pdmodel.interactive.form.PDChoice.setValue(PDChoice.java:381)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxForm.processSelectControl(PdfBoxForm.java:363)
>  at com.openhtmltopdf.pdfboxout.PdfBoxForm.process(PdfBoxForm.java:807)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxPerDocumentFormState.processControls(PdfBoxPerDocumentFormState.java:179)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.processControls(PdfBoxFastOutputDevice.java:299)
>  at 
> com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.finish(PdfBoxFastOutputDevice.java:904)
> You can get the project here: 
> [https://github.com/rototor/openhtmltopdf/tree/open-dev-v1-pdfbox-3.0.0] 
> mvn test
> will show you the failing test cases.






[jira] [Created] (PDFBOX-5150) 3.0.0-RC1: PDComboBox.setValue() throws IllegalArgumentException: /DA is a required entry

2021-04-03 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5150:


 Summary: 3.0.0-RC1: PDComboBox.setValue() throws 
IllegalArgumentException: /DA is a required entry
 Key: PDFBOX-5150
 URL: https://issues.apache.org/jira/browse/PDFBOX-5150
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.0 PDFBox
Reporter: Emmeran Seehuber


While porting openhtmltopdf to PDFBox 3.0.0-RC1 I got exceptions in some tests:

The tests 
com.openhtmltopdf.nonvisualregressiontests.NonVisualRegressionTest#testInputWithoutNameAttribute,
 com.openhtmltopdf.testcases.CssPropertiesTest#testFormControls and 
com.openhtmltopdf.testcases.TestcaseRunnerTest#runTestcaseRunner cause the 
„field.setValue()“ on a PDComboBox in 
[https://github.com/rototor/openhtmltopdf/blob/open-dev-v1-pdfbox-3.0.0/openhtmltopdf-pdfbox/src/main/java/com/openhtmltopdf/pdfboxout/PdfBoxForm.java#L363]
 to throw this exception:

java.lang.IllegalArgumentException: /DA is a required entry
 at 
org.apache.pdfbox.pdmodel.interactive.form.PDDefaultAppearanceString.<init>(PDDefaultAppearanceString.java:78)
 at 
org.apache.pdfbox.pdmodel.interactive.form.PDVariableText.getDefaultAppearanceString(PDVariableText.java:93)
 at 
org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.<init>(AppearanceGeneratorHelper.java:115)
 at 
org.apache.pdfbox.pdmodel.interactive.form.PDComboBox.constructAppearances(PDComboBox.java:82)
 at 
org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField.applyChange(PDTerminalField.java:210)
 at 
org.apache.pdfbox.pdmodel.interactive.form.PDChoice.setValue(PDChoice.java:381)
 at 
com.openhtmltopdf.pdfboxout.PdfBoxForm.processSelectControl(PdfBoxForm.java:363)
 at com.openhtmltopdf.pdfboxout.PdfBoxForm.process(PdfBoxForm.java:807)
 at 
com.openhtmltopdf.pdfboxout.PdfBoxPerDocumentFormState.processControls(PdfBoxPerDocumentFormState.java:179)
 at 
com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.processControls(PdfBoxFastOutputDevice.java:299)
 at 
com.openhtmltopdf.pdfboxout.PdfBoxFastOutputDevice.finish(PdfBoxFastOutputDevice.java:904)

You can get the project here: 
[https://github.com/rototor/openhtmltopdf/tree/open-dev-v1-pdfbox-3.0.0] 

mvn test

will show you the failing test cases.






[jira] [Commented] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2021-04-03 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314275#comment-17314275
 ] 

Emmeran Seehuber commented on PDFBOX-5149:
--

I use PDFRenderer with my PdfBoxGraphics2D to "rerender" the PDFs and change 
certain aspects in the process. I.e. I do some kind of transformation on the 
PDF by rendering it.

For a simple example look 
[here|https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/test/java/de/rototor/pdfbox/graphics2d/PdfRerenderTest.java#L123].

Depending on what I want to do, I want to preserve as much of the original PDF 
as possible. So I also try to preserve the PDShadings (this is the code using 
the clone utility). This works for the most part; only transparency groups get 
"messed up", because they are rasterized as bitmaps by the PDFRenderer. I have 
already thought about what could be done to avoid that, but have not yet found 
time to do some prototyping on this. I would like to be able to use PDFRenderer -> 
PdfBoxGraphics2D to process a PDF without losing anything (i.e. also 
preserving all images in their full colorspace etc.). This could be used for 
all kinds of things, e.g. optimizing image resolution for web 
publishing or even implementing a full PrePress PDF transformation (i.e. changing 
all colorspaces to the same target CMYK, reducing the resolution of images if they 
are way above what the printing machine can do, and so on).

At the moment I use this to convert some separation color shapes into forms for 
sleeve foils. I.e. if you have a printed wedding card with some gold foil on 
it, the gold foil is applied in an extra step. In this step the shape which 
should get gold is printed with a CMYK(1,1,1,1) color and then put through a 
sleeve foil machine, which puts foil where the toner is on the paper. After 
this step the real content is printed on the card. I get one PDF which contains 
the pages including the separation color, and I have to preprocess it to get a 
page with only the shape for the gold and a page with the "normal" colors of 
the page, but without the special color.

So in the long term I would really like to be able to clone objects into new 
documents. Maybe add a PDCloneableObject interface and let everything that can 
really be safely cloned into a new document implement it? Or just 
rename PDFCloneUtility to something like COSInternalCloner, so that nobody 
finds it unless they really know what they want?

> 3.0.0-RC1: PDFCloneUtility is no longer accessible
> --
>
> Key: PDFBOX-5149
> URL: https://issues.apache.org/jira/browse/PDFBOX-5149
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.0 PDFBox
>Reporter: Emmeran Seehuber
>Priority: Major
>
> PDFCloneUtility is no longer accessible by default. This is low level 
> functionality, which is sometimes required and useful. PDFBox itself is low 
> level, so I don't see why you would restrict access to this. Also this API is 
> not really complicated nor unstable.
> For now I did a dirty workaround to access it (see 
> [here|https://github.com/rototor/pdfbox-graphics2d/commit/5986bc653f83b2c06e5218ac906b9a9bc75f724e#diff-2113e77a03390c0cf920587a642fe7693e5b3c8402de783223035a79e13c2209R1]).
>  I would rather like to get rid of this workaround soon.
> Other than that, PDFBox-Graphics2d seems to work fine with PDFBox 3.0.0-RC1. 
> I just released a 3.0.0-RC1 version of it.






[jira] [Created] (PDFBOX-5149) 3.0.0-RC1: PDFCloneUtility is no longer accessible

2021-04-02 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5149:


 Summary: 3.0.0-RC1: PDFCloneUtility is no longer accessible
 Key: PDFBOX-5149
 URL: https://issues.apache.org/jira/browse/PDFBOX-5149
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.0 PDFBox
Reporter: Emmeran Seehuber


PDFCloneUtility is no longer accessible by default. This is low level 
functionality, which is sometimes required and useful. PDFBox itself is low 
level, so I don't see why you would restrict access to this. Also this API is 
not really complicated nor unstable.

For now I did a dirty workaround to access it (see 
[here|https://github.com/rototor/pdfbox-graphics2d/commit/5986bc653f83b2c06e5218ac906b9a9bc75f724e#diff-2113e77a03390c0cf920587a642fe7693e5b3c8402de783223035a79e13c2209R1]).
 I would rather like to get rid of this workaround soon.

Other than that, PDFBox-Graphics2d seems to work fine with PDFBox 3.0.0-RC1. I 
just released a 3.0.0-RC1 version of it.






[jira] [Commented] (PDFBOX-4202) PDDocument is closed before calling close()

2021-03-08 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297518#comment-17297518
 ] 

Emmeran Seehuber commented on PDFBOX-4202:
--

As I see this problem in production too (but made a workaround long ago by just 
retrying to create the PDF later), I now know what's going on. As [~cgao] 
found out, the document must be pinned so that it is not garbage collected. In 
my case I use the LayerUtility to import pages as XForms to stamp them into a 
new PDDocument. If the "source" PDDocument goes away because of GC, the XForm 
becomes invalid. This problem is hard to reproduce, as it needs some GC cycles 
to get the finalizer running.

COSDocument has a finalizer, which is bad and has been officially deprecated as 
of Java 9 (see e.g. 
https://docs.oracle.com/javase/9/docs/api/java/lang/Object.html#finalize--). 
Well, it has always been bad because of the GC pressure and other problems it 
causes.  

The finalize() method in COSDocument is more or less only used to give a warning. But 
it closes all child resources/objects, even if there are still references to 
them. It should not do this, as all OS resources are freed by the finalizers 
on the individual OS resource objects (FileDescriptors, ...). As of Java 9 the 
JDK more or less got rid of finalizers. 

The clean fix would be to only log the warning message. See also 
[https://softwareengineering.stackexchange.com/questions/288715/is-overriding-object-finalize-really-bad]
 

With Java 9 the finalizer can be replaced with PhantomReferences and Cleaner 
usages, as Java 9 did internally. This is of course not yet possible in trunk 
on Java 8.

Also maybe PDDocument in trunk should implement AutoCloseable so that IDEs give 
warnings if one does not care to close it.
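
A minimal sketch of the Cleaner approach mentioned above, assuming Java 9 or later. TrackedDocument and its State class are hypothetical stand-ins and not PDFBox code. The sketch only logs a warning when an instance is collected without close() having been called, and it deliberately leaves child resources alone.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a document class; not PDFBox code. */
final class TrackedDocument implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /**
     * Cleanup state. It must not hold a reference to the TrackedDocument,
     * otherwise the document could never become phantom reachable.
     */
    private static final class State implements Runnable {
        private final AtomicBoolean closedExplicitly = new AtomicBoolean(false);
        private final AtomicBoolean warned = new AtomicBoolean(false);

        @Override
        public void run() {
            // Runs at most once: either from close() or after the owner was collected.
            if (!closedExplicitly.get()) {
                warned.set(true);
                System.err.println("Warning: document was never closed");
            }
        }
    }

    private final State state = new State();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

    @Override
    public void close() {
        state.closedExplicitly.set(true);
        // Deregisters the action; further calls are no-ops.
        cleanable.clean();
    }

    /** True only if the cleanup action ran without a prior close(). */
    boolean leakWarningIssued() {
        return state.warned.get();
    }

    public static void main(String[] args) {
        try (TrackedDocument doc = new TrackedDocument()) {
            System.out.println("working with " + doc);
        }
    }
}
```

Because the class implements AutoCloseable, it can be used in try-with-resources, and IDEs can flag instances that are never closed.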

> PDDocument is closed before calling close()
> ---
>
> Key: PDFBOX-4202
> URL: https://issues.apache.org/jira/browse/PDFBOX-4202
> Project: PDFBox
>  Issue Type: Bug
>  Components: PDModel
>Affects Versions: 2.0.9
> Environment: Windows 10 x64
>Reporter: Chenyue Gao
>Priority: Critical
>
>  
> The following code appends a PDDocument read from a file to the mainDocument, 
> which is passed as a parameter. When I save the mainDocument, it throws the 
> exception below. 
> {code:java}
> public static void addPDPage(PDDocument mainDocument, File file, int pagenum) 
> throws IOException {
> PDDocument pkDocument = PDDocument.load(file);
> PDPageTree pdpageTree = pkDocument.getPages();
> pdpageTree.forEach(page -> {
>  mainDocument.addPage(page);
> });
> }
> {code}
> It seems that the pkDocument inside this function automatically closed itself 
> due to Java garbage collection. As a result the mainDocument can't save the 
> page associated with the pkDocument.
> A workaround could be to return pkDocument to the caller and keep the 
> reference at the same level as mainDocument until mainDocument is saved.
>  
>  
> [2018-04-16 11:11:58] COSStream has been closed and cannot be read. Perhaps 
> its enclosing PDDocument has been closed?
>  [2018-04-16 11:11:58] java.io.IOException: COSStream has been closed and 
> cannot be read. Perhaps its enclosing PDDocument has been closed?
>  at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
>  at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
>  at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
>  at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
>  at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
>  at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
>  at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
>  at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
>  at 
> org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
>  at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
>  at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
>  at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
>  at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
>  at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
>  at 
> com.commands.PrintPackingSlipToPDFCommand.run(PrintPackingSlipToPDFCommand.java:116)
>  at com.commands.AbstractCommand.execute(AbstractCommand.java:74)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>  at java.lang.reflect.Method.invoke(Unknown Source)




[jira] [Created] (PDFBOX-5069) Mention PDFBox-Graphics2D on the 'External Links' page

2021-01-04 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-5069:


 Summary: Mention PDFBox-Graphics2D on the 'External Links' page
 Key: PDFBOX-5069
 URL: https://issues.apache.org/jira/browse/PDFBOX-5069
 Project: PDFBox
  Issue Type: Bug
  Components: Documentation
Reporter: Emmeran Seehuber


Please mention my PDFBox-Graphics2D project on the External Links page.

[https://github.com/rototor/pdfbox-graphics2d]
{quote}Using this library you can use any Graphics2D API based SVG / graph / 
chart library to embed those graphics as vector drawing in a PDF. In 
combination with PDFBox PDFRenderer/PageDrawer you can also "rerender" PDF 
pages and change certain aspects ( e.g. [change the color mapping and perform 
an 
overfill|https://github.com/rototor/pdfbox-graphics2d/blob/master/graphics2d/src/test/java/de/rototor/pdfbox/graphics2d/PdfRerenderTest.java]).
{quote}
 

If you have any suggestions to improve this wording, I would be happy to hear 
them.

Thanks






[jira] [Commented] (PDFBOX-4974) PDImageXObject creation based on WritableImage

2020-09-30 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204838#comment-17204838
 ] 

Emmeran Seehuber commented on PDFBOX-4974:
--

Official support for JavaFX on ARM was dropped in a later JDK8 update, see 
[https://www.oracle.com/java/technologies/javase/jdk-8u33-arm-relnotes.html].

Also, developing with JavaFX, at least with Maven, was a "mess", as you had to 
manually add JavaFX to your classpath, using something like
{code:java}
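<dependency>
    <groupId>com.oracle</groupId>
    <artifactId>javafx</artifactId>
    <version>2.2</version>
    <systemPath>${java.home}/lib/jfxrt.jar</systemPath>
    <scope>system</scope>
</dependency>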
{code}
The problem here was that ${java.home}/lib/jfxrt.jar was the path on Windows. 
On Linux and macOS the path was ${java.home}/lib/*ext*/jfxrt.jar. Also, only 
the Oracle JDK included JavaFX; OpenJDK builds usually didn't include it. This 
made for a rather horrible cross-platform development experience.
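
To make the path problem concrete, here is a minimal sketch of a helper that probes both locations named above. The class name JfxrtLocator and its method are hypothetical and exist only for illustration; this is a runtime check, not something Maven itself offers.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: probes both locations of jfxrt.jar mentioned above. */
final class JfxrtLocator {
    private JfxrtLocator() {
    }

    /** Returns the first existing jfxrt.jar below the given java.home, if any. */
    static Optional<Path> find(Path javaHome) {
        Path[] candidates = {
            // location used in the systemPath of the POM snippet
            javaHome.resolve("lib").resolve("jfxrt.jar"),
            // location with the additional "ext" directory
            javaHome.resolve("lib").resolve("ext").resolve("jfxrt.jar")
        };
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path javaHome = Paths.get(System.getProperty("java.home"));
        // On JDKs that do not bundle JavaFX this reports that nothing was found.
        System.out.println(find(javaHome)
                .map(Path::toString)
                .orElse("jfxrt.jar not found below " + javaHome));
    }
}
```

A system-scoped dependency cannot do this kind of probing, which is why a single systemPath only ever worked on one of the layouts.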

It got way better with later JavaFX releases. You can now get it using Maven, 
as it is deployed on Maven Central. As you can read 
[here|https://wiki.openjdk.java.net/display/OpenJFX/Building+OpenJFX], the 
source still has ARM support in it, but it is not tested at all for the current 
builds. The last official build with ARM support was JavaFX 11, see 
[here|https://gluonhq.com/products/javafx/].

I think your main reason for not wanting to use SwingFXUtils to convert your 
JavaFX image to a BufferedImage is performance. You should be able to work 
around it by directly bridging the JavaFX image to a DataBuffer. For a sample 
of how this works, look 
[here|https://github.com/haraldk/TwelveMonkeys/tree/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image]
 at the classes MappedFileBuffer and MappedImageFactory. You would just change 
the implementation of the DataBuffer so that it uses the image data from the 
JavaFX image instead of a memory-mapped file. I have no idea whether this would 
really be faster or whether it is worth the effort.
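
A rough sketch of that idea, using only java.awt classes: a custom DataBuffer reads ARGB pixels from a java.nio.IntBuffer, which stands in here for pixel data obtained from a JavaFX image. All class names below (IntBufferDataBuffer, SimpleWritableRaster, IntBufferImages) are hypothetical, and this is not how TwelveMonkeys or PDFBox implement it.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DirectColorModel;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;
import java.nio.IntBuffer;

/** Hypothetical DataBuffer that reads and writes ARGB pixels in an IntBuffer. */
final class IntBufferDataBuffer extends DataBuffer {
    private final IntBuffer pixels;

    IntBufferDataBuffer(IntBuffer pixels) {
        super(DataBuffer.TYPE_INT, pixels.remaining());
        this.pixels = pixels.slice(); // index 0 is the first remaining pixel
    }

    @Override
    public int getElem(int bank, int i) {
        return pixels.get(i);
    }

    @Override
    public void setElem(int bank, int i, int val) {
        pixels.put(i, val);
    }
}

/** The WritableRaster constructor is protected, so a small subclass is needed. */
final class SimpleWritableRaster extends WritableRaster {
    SimpleWritableRaster(SampleModel model, DataBuffer buffer) {
        super(model, buffer, new Point(0, 0));
    }
}

final class IntBufferImages {
    private IntBufferImages() {
    }

    /** Wraps non-premultiplied ARGB pixels as a BufferedImage without copying them. */
    static BufferedImage wrap(IntBuffer argbPixels, int width, int height) {
        if (argbPixels.remaining() < width * height) {
            throw new IllegalArgumentException("buffer too small for " + width + "x" + height);
        }
        ColorModel cm = new DirectColorModel(32, 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000);
        SampleModel sm = cm.createCompatibleSampleModel(width, height);
        WritableRaster raster = new SimpleWritableRaster(sm, new IntBufferDataBuffer(argbPixels));
        return new BufferedImage(cm, raster, cm.isAlphaPremultiplied(), null);
    }

    public static void main(String[] args) {
        int[] pixels = {0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFFFF};
        BufferedImage image = wrap(IntBuffer.wrap(pixels), 2, 2);
        System.out.println(Integer.toHexString(image.getRGB(1, 0)));
    }
}
```

The resulting image has a custom type, so AWT cannot use its fast paths for it; whether that outweighs the saved conversion would have to be measured.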

> PDImageXObject creation based on WritableImage
> --
>
> Key: PDFBOX-4974
> URL: https://issues.apache.org/jira/browse/PDFBOX-4974
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel
>Affects Versions: 2.0.0
>Reporter: Robert Fink
>Priority: Major
>  Labels: javafx
>
> The goal is to work with a WritableImage from JavaFX in addition to a 
> BufferedImage from Swing.
> My proposal for a new feature is to extend the factory classes.
> - CCITTFactory by the method:
>  createFromImage(PDDocument document, WritableImage image)
> - JPEGFactory by the methods:
>  createFromImage(PDDocument document, WritableImage image)
>  createFromImage(PDDocument document, WritableImage image, float quality)
>  createFromImage(PDDocument document, WritableImage image, float quality, int 
> dpi)
> - LosslessFactory by methods:
>  createFromImage(PDDocument document, WritableImage image)
> Until now there is a need to use the class SwingFXUtils to do the conversion 
> from WritableImage to BufferedImage or vice versa.
> This new feature should come in handy for all JavaFX developers.






[jira] [Commented] (PDFBOX-4974) PDImageXObject creation based on WritableImage

2020-09-30 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204591#comment-17204591
 ] 

Emmeran Seehuber commented on PDFBOX-4974:
--

[~Robertoni] I don't think this is in the scope of this project. PDFBox 2.0 is 
on JDK6, which features no JavaFX. Trunk is currently targeting JDK7. As JavaFX 
was never officially part of any JDK release (i.e. not reachable without 
special classpath etc. arguments), PDFBox will likely never be able to include 
it as a dependency, even if trunk is moved forward to a newer JDK. 

Also, JavaFX does not run on ARM and many other common Java server platforms 
(e.g. AIX (PowerPC), Solaris (SPARC)).  

But you can of course make your own project on GitHub which bridges PDFBox to 
JavaFX and release it on Maven Central. 







[jira] [Comment Edited] (PDFBOX-4962) CMYK support

2020-09-27 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202826#comment-17202826
 ] 

Emmeran Seehuber edited comment on PDFBOX-4962 at 9/27/20, 7:00 PM:


I didn't see that there was a patch attached. I would suggest a different 
solution here. See the attached [^pdfcolor.patch].

=> The target Graphics2D needs some way to get the original colors. I would 
suggest a small java.awt.Color derived class, which just carries this PDColor 
information. A Graphics2D that wants this information can use it. Otherwise 
nothing changes. 

Currently I always have to override getPaint() in PageDrawer to get this 
information. With this patch that would not be needed any more.
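
As a rough illustration of that idea (not the content of the attached patch), a carrier class could look like the sketch below. OriginalComponentsColor and PaintInspector are hypothetical names, and a plain float array plus a color space name stand in for PDFBox's PDColor so that the example only needs the JDK.

```java
import java.awt.Color;
import java.awt.Paint;
import java.util.Arrays;

/** Hypothetical carrier: an AWT Color that remembers the original PDF color components. */
final class OriginalComponentsColor extends Color {
    private static final long serialVersionUID = 1L;

    private final String colorSpaceName;
    private final float[] components;

    OriginalComponentsColor(int rgbFallback, String colorSpaceName, float... components) {
        super(rgbFallback); // unaware Graphics2D implementations just see this RGB value
        this.colorSpaceName = colorSpaceName;
        this.components = components.clone();
    }

    String getColorSpaceName() {
        return colorSpaceName;
    }

    float[] getOriginalComponents() {
        return components.clone();
    }
}

final class PaintInspector {
    private PaintInspector() {
    }

    /** What a color-aware Graphics2D could do when it receives a paint. */
    static String describe(Paint paint) {
        if (paint instanceof OriginalComponentsColor) {
            OriginalComponentsColor original = (OriginalComponentsColor) paint;
            return original.getColorSpaceName()
                    + Arrays.toString(original.getOriginalComponents());
        }
        if (paint instanceof Color) {
            return String.format("rgb #%06x", ((Color) paint).getRGB() & 0xFFFFFF);
        }
        return "other paint";
    }

    public static void main(String[] args) {
        // "0 0 0 1 k" from the ticket: pure black in DeviceCMYK, with black as RGB fallback
        Paint black = new OriginalComponentsColor(0x000000, "DeviceCMYK", 0f, 0f, 0f, 1f);
        System.out.println(describe(black));
        System.out.println(describe(Color.RED));
    }
}
```

Because the carrier is still a java.awt.Color, existing Graphics2D implementations keep working unchanged and only implementations that check for the subclass see the original values.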


was (Author: rototor):
I didn't see that there was a patch attached. I would suggest a different 
solution here. See the attached [^pdfcolor.patch].

=> The target Graphics2D needs some way to get the original colors. I would 
suggest a small java.awt.Color derived class, which just carries this PDColor 
information. So if the Graphics2D whats that it can use that. Otherwise nothing 
changes. 

Currently I always have to override getPaint() in PageDrawer to get this 
information. With this patch that would not be needed any more.

> CMYK support
> 
>
> Key: PDFBOX-4962
> URL: https://issues.apache.org/jira/browse/PDFBOX-4962
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: pdfbox2cmyk.patch, pdfcolor.patch
>
>
> If the content stream has a cmyk color:
> 0 0 0 1 k
> pdfbox will convert this to rgb causing loss of fidelity
> is it possible to pass the cmyk color when you call graphics2d eg:
> {color:#660e7a}graphics{color}.setPaint(cmyk value)






[jira] [Updated] (PDFBOX-4962) CMYK support

2020-09-27 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4962:
-
Attachment: pdfcolor.patch







[jira] [Commented] (PDFBOX-4962) CMYK support

2020-09-27 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202826#comment-17202826
 ] 

Emmeran Seehuber commented on PDFBOX-4962:
--

I didn't see that there was a patch attached. I would suggest a different 
solution here. See the attached [^pdfcolor.patch].

=> The target Graphics2D needs some way to get the original colors. I would 
suggest a small java.awt.Color derived class, which just carries this PDColor 
information. So if the Graphics2D wants that it can use that. Otherwise nothing 
changes. 

Currently I always have to override getPaint() in PageDrawer to get this 
information. With this patch that would not be needed any more.







[jira] [Commented] (PDFBOX-4962) CMYK support

2020-09-24 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201662#comment-17201662
 ] 

Emmeran Seehuber commented on PDFBOX-4962:
--

At least iText 2 had a CMYKColor class (which derives from java.awt.Color) and 
I also made my own
[PdfBoxGraphics2DCMYKColor|https://github.com/rototor/pdfbox-graphics2d/blob/master/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DCMYKColor.java].
 A Graphics2D like my PdfBoxGraphics2D would love to have the original color 
values. I would also be happy if I could replace my PdfBoxGraphics2DCMYKColor 
class with an official PDFBox one.

> CMYK support
> 
>
> Key: PDFBOX-4962
> URL: https://issues.apache.org/jira/browse/PDFBOX-4962
> Project: PDFBox
>  Issue Type: Bug
>Reporter: Simon Steiner
>Priority: Major
> Attachments: pdfbox2cmyk.patch
>
>
> If the content stream has a cmyk color:
> 0 0 0 1 k
> pdfbox will convert this to rgb causing loss of fidelity
> is it possible to pass the cmyk color when you call graphics2d eg:
> {color:#660e7a}graphics{color}.setPaint(cmyk value)






[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184689#comment-17184689
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

The patch [^2.0-raster-image_v2.patch] implements (optional) access to raw 
images as BufferedImage. This is of course only possible if we have a matching 
Java colorspace. The main use case here is to access images with an ICC color 
profile. This patch requires [^2.0-raw-raster-v2.patch] to be applied first.
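
To illustrate what "a matching Java colorspace" means, here is a minimal JDK-only sketch, not the code from the patch. It wraps an existing raster in a BufferedImage whose ColorModel uses the given ColorSpace, without converting any sample, and gives up if the raster does not fit. RawRasterImages is a hypothetical name, and the linear RGB space merely stands in for a color space built from an embedded ICC profile.

```java
import java.awt.Point;
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class RawRasterImages {
    private RawRasterImages() {
    }

    /**
     * Wraps the raster as an image in the given color space without touching the samples.
     * Returns null if the raster does not match the color space.
     */
    static BufferedImage wrap(WritableRaster raw, ColorSpace space) {
        if (raw.getNumBands() != space.getNumComponents()) {
            return null; // e.g. a 4-band raster cannot be shown in a 3-component space
        }
        ColorModel cm = new ComponentColorModel(space, false, false,
                Transparency.OPAQUE, raw.getTransferType());
        if (!cm.isCompatibleRaster(raw)) {
            return null;
        }
        return new BufferedImage(cm, raw, false, null);
    }

    public static void main(String[] args) {
        WritableRaster raw = Raster.createInterleavedRaster(
                DataBuffer.TYPE_BYTE, 2, 2, 3, new Point(0, 0));
        raw.setSample(0, 0, 0, 200); // raw sample value, no color interpretation
        ColorSpace space = ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB);
        BufferedImage image = wrap(raw, space);
        System.out.println(image.getRaster().getSample(0, 0, 0));
    }
}
```

The stored samples stay exactly as they were; any conversion to sRGB only happens later, when something calls getRGB() or draws the image.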

 

The patch [^2.0-extractimage-raw.patch] extends the ExtractImages utility with 
a new "-noColorConvert" option. As said in the ticket description, a TIFF 
encoder (like TwelveMonkeys) is required on the classpath for this to work with 
CMYK images. I see this patch as optional. Of course it requires the 
raster-image patch to be applied first.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 3.0.0 PDFBox, 2.0.22
>
> Attachments: 2.0-extractimage-raw.patch, 2.0-raster-image_v2.patch, 
> 2.0-raw-raster-v2.patch, 2.0-raw-raster.patch, color_difference.png, 
> pdfbox-image-compare.patch, pdfbox-rawimages.patch, 
> png-compress-icc-profile.patch
>
>
> This patch was primarily intended to add access to raw image data (i.e. 
> without any kind of color conversion/reduction). While implementing and 
> testing it I also found a bug with ICC profile embedding in the PNGConverter.
> This patch does the following:
>  - add a method getRawRaster() to PDImage. This allows reading the original 
> raster data in 8 or 16 bit without any kind of color interpretation. The user 
> must know what to do with this data (e.g. to access the raw data of DeviceN 
> images).
>  - add a method getRawImage(). It tries to return the raster obtained by 
> getRawRaster() as a BufferedImage. This only succeeds if there is a matching 
> Java ColorSpace for the colorspace of the image, i.e. only for ICCBased 
> images. In theory this should also work for PDIndexed sRGB images, but I 
> have to find a PDF with such an image first to test it.
>  - add a -noColorConversion switch to the ExtractImage utility to extract 
> images in their original colorspace. For CMYK images this only works when a 
> TIFF encoder (e.g. from TwelveMonkeys) is in the class path.
>  - add support to export PNGs with ICC profile data in ImageIOUtil.
>  - fix a bug in PNGConverter which does not correctly embed the ICC profile 
> from the PNG file.
>  - the PNGConverterTest tests the raw images. While reading PNG files for 
> comparison it also ensures that the embedded ICC profile is correctly 
> respected. The default PNG reader, at least up to JDK11, does *not* respect 
> the embedded ICC profile, i.e. the colors are wrong. But there is a 
> workaround for this in the PNGConverterTest (which I have had in production 
> for years now). See the screenshot for the correct color display of the 
> png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong 
> display (right side; Java; inside IDEA).
>  
> Besides finding bugs like the one in the PNGConverter, access to the raw 
> image also allows all kinds of fun color things. E.g. a future patch could 
> allow using the raw images to print PDFs. If the PDF you want to print has 
> images with a gamut > sRGB (e.g. from any modern camera) and the target 
> printer also has a gamut > sRGB (e.g. some ink photo printers), you will for 
> sure see a difference in the resulting print. Such a mode would be rather 
> slow, as the current sRGB image handling is optimized for speed and using the 
> original raw images would need on-demand color conversions in the printer 
> driver. But you get „high quality“ out of it (at least with respect to 
> colors).
> I don’t think this will be in time for the 2.0.20 release.






[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: 2.0-raster-image_v2.patch
2.0-extractimage-raw.patch







[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184675#comment-17184675
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

While preparing the patch I found a somewhat unrelated change in the 
PNGConverter lying around... [^png-compress-icc-profile.patch]

This patch will ensure that an embedded sRGB color profile is also compressed. 
(Without this patch, the sRGB color profile is embedded uncompressed.)
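
For readers unfamiliar with what compressing a profile amounts to, here is a generic JDK-only sketch of deflating ICC profile bytes and inflating them again. It is not the code from the patch; PDFBox applies the compression through its own stream filters, and ProfileCompression is a hypothetical name.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class ProfileCompression {
    private ProfileCompression() {
    }

    /** Compresses the bytes with zlib/deflate, the algorithm behind PDF's FlateDecode filter. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    /** Reverses deflate(). */
    static byte[] inflate(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The JDK's built-in sRGB profile serves as sample data here.
        byte[] profile = ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData();
        byte[] compressed = deflate(profile);
        System.out.println(profile.length + " bytes raw, " + compressed.length + " bytes deflated");
    }
}
```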

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 3.0.0 PDFBox, 2.0.22
>
> Attachments: 2.0-raw-raster-v2.patch, 2.0-raw-raster.patch, 
> color_difference.png, pdfbox-image-compare.patch, pdfbox-rawimages.patch, 
> png-compress-icc-profile.patch
>
>






[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: png-compress-icc-profile.patch







[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184651#comment-17184651
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

I've attached a rebased (against 2.0) version of the "raw-raster" only patch.  
[^2.0-raw-raster-v2.patch]

See also my comment in 
https://issues.apache.org/jira/browse/PDFBOX-4847?focusedCommentId=17134817&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17134817
 

 

The remaining patch, which implements getRawImage(), is rather big, as it needs 
to extend the PDColorSpace classes. I'll redo that patch against a branch with 
the raw raster patch applied.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 3.0.0 PDFBox, 2.0.22
>
> Attachments: 2.0-raw-raster-v2.patch, 2.0-raw-raster.patch, 
> color_difference.png, pdfbox-image-compare.patch, pdfbox-rawimages.patch
>
>






[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-08-25 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: 2.0-raw-raster-v2.patch

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 3.0.0 PDFBox, 2.0.22
>
> Attachments: 2.0-raw-raster-v2.patch, 2.0-raw-raster.patch, 
> color_difference.png, pdfbox-image-compare.patch, pdfbox-rawimages.patch
>
>
> This patch was primarily intended to add access to raw image data (i.e. without 
> any kind of color conversion/reduction). While implementing and testing it, I 
> also found a bug with ICC profile embedding in the PNGConverter.
> This patch does the following:
>  - add a method getRawRaster() to PDImage. This allows reading the original 
> raster data in 8 or 16 bit without any kind of color interpretation. The caller 
> must know what to do with this data (e.g. to access the raw data 
> of DeviceN images).
>  - add a method getRawImage(). It tries to return the raster obtained by 
> getRawRaster() as a BufferedImage. This only succeeds if there is a 
> matching Java ColorSpace for the colorspace of the image, i.e. only for 
> ICCBased images. In theory this should also work for PDIndexed sRGB images, 
> but I have to find a PDF with such an image first to test it.
>  - add a -noColorConversion switch to the ExtractImage utility to extract 
> images in their original colorspace. For CMYK images this only works when a 
> TIFF encoder (e.g. from TwelveMonkeys) is on the class path.
>  - add support for exporting PNGs with ICC profile data in ImageIOUtil.
>  - fix a bug in PNGConverter, which does not correctly embed the ICC profile 
> from the PNG file.
>  - the PNGConverterTest tests the raw images. While reading PNG files for 
> comparison, it also ensures that the embedded ICC profile is correctly respected. 
> The default PNG reader, at least up to JDK 11, does *not* respect the embedded 
> ICC profile, i.e. the colors are wrong. But there is a workaround for this in 
> the PNGConverterTest (which I have had in production for years now). See the 
> screenshot for the correct color display of the png_rgb_romm_16.png test file 
> (left side; macOS Preview app) and the wrong display (right side; Java; 
> inside IDEA).
>  
> Besides finding bugs like the one in the PNGConverter, access to the raw image 
> also allows all kinds of fun color things. E.g. a future patch could 
> allow using the raw images to print PDFs. If the PDF you want to print has 
> images with a gamut > sRGB (e.g. images from modern cameras) and the target printer 
> also has a gamut > sRGB (e.g. some ink photo printers), you will surely see a 
> difference in the resulting print. Such a mode would be rather slow, as the 
> current sRGB image handling is optimized for speed and using the original raw 
> images would need on-demand color conversions in the printer driver. But you 
> get "high quality" out of it (at least with respect to colors).
> I don't think this will be ready in time for the 2.0.20 release.






[jira] [Commented] (PDFBOX-4886) Regression: Images get blurry when rendering with 304 DPI (works fine with 2.0.19)

2020-06-15 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136113#comment-17136113
 ] 

Emmeran Seehuber commented on PDFBOX-4886:
--

I've tested the fix in trunk, and it seems to work fine. 

BTW: why is there an InputStreamRandomAccessRead when you need a 
RandomAccessReadBuffer to parse an InputStream with PDFParser? 
InputStreamRandomAccessRead won't work because .length() is not implemented. 
It's a little bit confusing.

> Regression: Images get blurry when rendering with 304 DPI (works fine with 
> 2.0.19)
> --
>
> Key: PDFBOX-4886
> URL: https://issues.apache.org/jira/browse/PDFBOX-4886
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.20
>Reporter: Emmeran Seehuber
>Priority: Major
> Attachments: FlowerBompA6.pdf, image-2020-06-15-15-29-32-611.png, 
> image-2020-06-15-15-29-33-208.png
>
>






[jira] [Created] (PDFBOX-4886) Regression: Images get blurry when rendering with 304 DPI (works fine with 2.0.19)

2020-06-15 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-4886:


 Summary: Regression: Images get blurry when rendering with 304 DPI 
(works fine with 2.0.19)
 Key: PDFBOX-4886
 URL: https://issues.apache.org/jira/browse/PDFBOX-4886
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 2.0.20
Reporter: Emmeran Seehuber
 Attachments: FlowerBompA6.pdf, image-2020-06-15-15-29-32-611.png, 
image-2020-06-15-15-29-33-208.png

There seems to be a regression when rendering a PDF to a high-DPI image with 
PDFBox 2.0.20. Everything worked fine with PDFBox 2.0.19.

See the attached [^FlowerBompA6.pdf] file. The rendering is done using

{{PDFRenderer pdfRenderer = new PDFRenderer(doc);}}
{{BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(0, 304, ImageType.RGB);}}

I've built a small project to reproduce this problem:

[https://github.com/rototor/pdfbox-render-blurry]

Check out this repository and run mvn clean test. You will find a blurry rendering of 
the PDF in the target/test folder.

!image-2020-06-15-15-29-32-611.png!

Then change the PDFBox version in the pom.xml to 2.0.19 and run mvn clean test 
again. The rendering of the flowers is now sharp.

!image-2020-06-15-15-29-33-208.png!

A test driver in a project of mine compares the resulting file sizes after 
exporting the images as JPEGs. Because of the blurring, the resulting file is way 
smaller than it used to be...
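
The file-size effect mentioned above can be reproduced with the JDK alone. The following is only a rough sketch, not the test driver from the report: the class name JpegSizeBlurCheck, the random-noise test image and the 3x3 box blur are assumptions made for illustration. It encodes a detailed image and a blurred copy as JPEG in memory and compares the byte counts.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of PDFBox or of the reporter's project.
final class JpegSizeBlurCheck {

    /** Encodes the image as JPEG in memory and returns the encoded size in bytes. */
    static int jpegSize(BufferedImage image) {
        ImageIO.setUseCache(false); // keep the encoder off the disk cache
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "jpg", out)) {
                throw new IllegalStateException("No JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    /** Applies a 3x3 box blur, standing in for the unwanted softening. */
    static BufferedImage boxBlur(BufferedImage src) {
        float[] weights = new float[9];
        Arrays.fill(weights, 1f / 9f);
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null)
                .filter(src, dst);
        return dst;
    }

    /** Builds a deterministic, highly detailed image out of random pixel noise. */
    static BufferedImage noiseImage(int width, int height, long seed) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(seed);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return img;
    }

    public static void main(String[] args) {
        BufferedImage sharp = noiseImage(256, 256, 42L);
        System.out.println("sharp:   " + jpegSize(sharp) + " bytes");
        System.out.println("blurred: " + jpegSize(boxBlur(sharp)) + " bytes");
    }
}
```

A size check like this is only a coarse indicator: it flags a loss of detail, but it cannot say where in the rendering pipeline the detail was lost.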

 






[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-06-13 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134817#comment-17134817
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

The [^2.0-raw-raster.patch] patch implements the getRawRaster() method. It is 
independent of the other changes. You can only meaningfully validate it once 
the getRawImage() method is also implemented, as you can then compare the image 
data. 

After you have applied the compare and the raw-raster patches, I will provide a rebased 
raw-image patch with the remaining bits.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 2.0.21, 3.0.0 PDFBox
>
> Attachments: 2.0-raw-raster.patch, color_difference.png, 
> pdfbox-image-compare.patch, pdfbox-rawimages.patch
>
>






[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-06-13 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: 2.0-raw-raster.patch

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 2.0.21, 3.0.0 PDFBox
>
> Attachments: 2.0-raw-raster.patch, color_difference.png, 
> pdfbox-image-compare.patch, pdfbox-rawimages.patch
>
>






[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-06-13 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134810#comment-17134810
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

The attached patch [^pdfbox-image-compare.patch] implements this check. For 
whatever reason, it fails when running with openjdk-8 on Travis (see 
[https://travis-ci.com/github/rototor/pdfbox/jobs/348736754]), but it works for 
me when running locally on the oracle-jdk8. So I have no idea whether it is some 
missing bugfix in the openjdk-8 used by Travis, or whether they use some other color 
management which is not correct. 

The special case for the 16-bit handling in ValidateXImage.convertToSRGB() 
needs some explanation. Without this block, a 16-bit image is converted to 8 
bit by the ColorConvertOp. This "should" normally be fine, but it is not in 
this case. The reason is that when going from 16 bits to 8 bits per channel you 
always have some information loss. "The rest of the world" usually just does 
something like ((channelValue >> 8) & 0xFF), i.e. it just shifts 8 bits out and is 
done. But PDFBox, especially SampledImageReader.fromAny(), uses floats and 
rounds the result (see SampledImageReader:555 in 2.0). I see the need to use 
floats here, as you must respect the domain entry for the value range, so you 
are more or less forced to use floats. But instead of truncating the value by 
just casting it to int, Math.round() is used. You can argue whether using 
Math.round() or just truncating the values is the "right" way here, but as you 
always lose information there is IMHO no "right" way. So this whole block is 
only needed because SampledImageReader.fromAny() uses Math.round().
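
The two reduction strategies can be compared in isolation. This is a simplified sketch, assuming samples that span the full 0..65535 range; the class and method names are made up for illustration, and the real SampledImageReader code additionally applies the image's value range.

```java
// Hypothetical illustration class, not PDFBox code.
final class SixteenToEightBit {

    /** Truncation: drop the low byte, as many image libraries do. */
    static int truncate(int sample16) {
        return (sample16 >> 8) & 0xFF;
    }

    /** Float scaling followed by rounding to the nearest 8-bit value. */
    static int scaleAndRound(int sample16) {
        return Math.round(sample16 / 65535f * 255f);
    }

    /** Largest absolute difference between both strategies over all inputs. */
    static int maxDifference() {
        int max = 0;
        for (int v = 0; v <= 0xFFFF; v++) {
            max = Math.max(max, Math.abs(truncate(v) - scaleAndRound(v)));
        }
        return max;
    }

    public static void main(String[] args) {
        int sample = 0x01FF;
        System.out.println("truncate:      " + truncate(sample));
        System.out.println("scaleAndRound: " + scaleAndRound(sample));
        System.out.println("max difference over all samples: " + maxDifference());
    }
}
```

For the sample 0x01FF truncation yields 1 while rounding yields 2, and over the whole input range the two results never differ by more than 1. That is exactly the kind of off-by-one mismatch an exact pixel comparison trips over.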

The PNGConverterTest.getImageWithProfileData() method is required because the 
ImageIO PNG reader does not respect the color profile of the PNG when reading 
it. So you must "tag" the BufferedImage with the right color profile, which 
this method does. This bug persists at least in JDK 11; I have no idea whether it 
has been fixed in newer JDKs.
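
"Tagging" here means keeping the decoded sample values and only swapping the color model. The sketch below is not the code from the patch: the class ProfileTagger is hypothetical, it takes the ICC profile as a parameter rather than reading it from the PNG, and it assumes a component-based raster (e.g. TYPE_3BYTE_BGR or a 16-bit RGB image), so packed-int or indexed images would need different handling.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;

// Hypothetical helper illustrating the idea; not taken from PNGConverterTest.
final class ProfileTagger {

    /**
     * Returns an image that shares the raster of {@code decoded} but interprets
     * the samples in the color space of {@code profile}. No pixel is converted.
     */
    static BufferedImage tagWithProfile(BufferedImage decoded, ICC_Profile profile) {
        ColorModel old = decoded.getColorModel();
        ColorSpace space = new ICC_ColorSpace(profile);
        if (space.getNumComponents() != old.getNumColorComponents()) {
            throw new IllegalArgumentException("Profile does not match the image's channel count");
        }
        ColorModel tagged = new ComponentColorModel(
                space,
                old.hasAlpha(),
                old.isAlphaPremultiplied(),
                old.getTransparency(),
                decoded.getRaster().getTransferType());
        // Throws IllegalArgumentException if the raster is not component-based.
        return new BufferedImage(tagged, decoded.getRaster(), tagged.isAlphaPremultiplied(), null);
    }

    public static void main(String[] args) {
        BufferedImage decoded = new BufferedImage(4, 4, BufferedImage.TYPE_3BYTE_BGR);
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_LINEAR_RGB);
        BufferedImage tagged = tagWithProfile(decoded, profile);
        System.out.println(tagged.getColorModel().getColorSpace().getClass().getSimpleName());
    }
}
```

Because the raster is shared, later calls such as getRGB() or a ColorConvertOp see the same samples but convert them from the tagged color space instead of assuming sRGB.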

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 2.0.21, 3.0.0 PDFBox
>
> Attachments: color_difference.png, pdfbox-image-compare.patch, 
> pdfbox-rawimages.patch
>
>

[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-06-13 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: pdfbox-image-compare.patch

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Fix For: 2.0.21, 3.0.0 PDFBox
>
> Attachments: color_difference.png, pdfbox-image-compare.patch, 
> pdfbox-rawimages.patch
>
>






[jira] [Comment Edited] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117050#comment-17117050
 ] 

Emmeran Seehuber edited comment on PDFBOX-4847 at 5/26/20, 9:31 PM:


The bug in the PNGConverter is that it did not correctly write the ICC 
profile. It had an off-by-one error, as it did not skip the 0-byte terminator after the 
profile name (first 0..79 bytes of the iCCP chunk + 0 byte). It also did not 
mark the stream as FLATE_DECODE.

PDFBox (and likely all other PDF readers) just ignored the ICC profile because 
of this (exception while decoding the profile). But this meant that the colors 
were not correct, as the wrong color profile was used (the alternate 
DeviceRGB).
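
The iCCP layout behind this off-by-one can be sketched with the JDK alone. According to the PNG specification the chunk holds a profile name of 1 to 79 bytes, a null separator, one compression-method byte (0 = deflate) and then the compressed profile. The class IccpChunkLayout and its methods are hypothetical and only illustrate where the compressed data starts; they are not the PNGConverter code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of the iCCP chunk layout; not PDFBox code.
final class IccpChunkLayout {

    /** Returns the index of the first byte of the compressed profile data. */
    static int compressedProfileStart(byte[] chunk) {
        int pos = 0;
        while (pos < chunk.length && chunk[pos] != 0) {
            pos++; // walk over the profile name
        }
        if (pos == 0 || pos > 79 || pos >= chunk.length) {
            throw new IllegalArgumentException("Invalid iCCP profile name");
        }
        pos++; // skip the null separator itself (the step that is easy to forget)
        if (pos >= chunk.length || chunk[pos] != 0) {
            throw new IllegalArgumentException("Missing or unknown compression method");
        }
        pos++; // skip the compression method byte
        return pos;
    }

    /** Inflates the embedded profile; only needed if the raw bytes are wanted. */
    static byte[] inflateProfile(byte[] chunk) {
        int start = compressedProfileStart(chunk);
        Inflater inflater = new Inflater();
        inflater.setInput(chunk, start, chunk.length - start);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated iCCP data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt iCCP data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Builds a chunk body for experiments: name, separator, method, deflated data. */
    static byte[] buildChunk(String name, byte[] profile) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.ISO_8859_1);
        out.write(nameBytes, 0, nameBytes.length);
        out.write(0); // null separator
        out.write(0); // compression method 0 = deflate
        Deflater deflater = new Deflater();
        deflater.setInput(profile);
        deflater.finish();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Since the data after the header is already a zlib stream, a PDF writer can copy it unchanged and declare the Flate filter on the stream, which avoids inflating and recompressing the profile.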

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
         if (state.iCCP != null || state.sRGB != null)
         {
             // We have got a color profile, which we must attach
             cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
             cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
                     == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
             if (state.iCCP != null)
             {
+                cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
                 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
                         break;
                     iccProfileDataStart++;
                 }
+                iccProfileDataStart++;
                 if (iccProfileDataStart >= state.iCCP.length)
                 {
                     LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in the PNGConverterTest. The image now has 
the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without 
the fix for this in PNGConverterTest, the colors will be "miles" off when reading the 
PNG for comparison using ImageIO.
 - comparing sRGB images does not work, even after applying the fix for the ICC 
profile, because there are some color rounding differences (off by 1 on the 
first pixel, for whatever reason, likely some different color conversion paths 
somewhere). There is a massive difference between converting single pixel 
values between colorspaces and converting a whole image at once (using 
ColorConvertOp). The latter may choose slightly different colors 
depending on the rendering intent and the colors in use in the image. The image 
from PDImage.getImage() would have been converted with ColorConvertOp, but in 
checkIdent() using getRGB() the image read with ImageIO would be color converted 
"pixel by pixel". One could fix this by first converting the expected image 
to sRGB using ColorConvertOp if it is not yet in sRGB.

If you want to apply this fix alone, you would need to temporarily disable the 
test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Or you extend checkIdent() to correctly convert 
non-sRGB BufferedImages to sRGB first. I can also provide a patch for that if 
you like. 
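
Converting the expected image to sRGB as a whole before comparing could look roughly like this. It is a sketch under assumptions, not the checkIdent() implementation: the class SrgbImageCompare and the idea of reporting a maximum per-channel difference (so a test can allow a tolerance of 1) are illustrative choices.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical comparison helper; not the PDFBox test code.
final class SrgbImageCompare {

    /** Converts the whole image to sRGB in one pass, unless it already is sRGB. */
    static BufferedImage toSrgb(BufferedImage image) {
        if (image.getColorModel().getColorSpace().isCS_sRGB()) {
            return image;
        }
        int type = image.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB
                : BufferedImage.TYPE_INT_RGB;
        BufferedImage srgb = new BufferedImage(image.getWidth(), image.getHeight(), type);
        // With null hints the source and destination images define the color spaces.
        new ColorConvertOp(null).filter(image, srgb);
        return srgb;
    }

    /** Returns the largest per-channel difference (0..255) between both images. */
    static int maxChannelDifference(BufferedImage expected, BufferedImage actual) {
        BufferedImage a = toSrgb(expected);
        BufferedImage b = toSrgb(actual);
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Image dimensions differ");
        }
        int max = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                for (int shift = 0; shift <= 24; shift += 8) {
                    int diff = Math.abs(((p >>> shift) & 0xFF) - ((q >>> shift) & 0xFF));
                    max = Math.max(max, diff);
                }
            }
        }
        return max;
    }
}
```

A test could then assert that maxChannelDifference(expected, actual) stays at or below 1 instead of demanding identical pixels, which sidesteps the rounding differences described above.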



[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117050#comment-17117050
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

The bug in the PNGConverter is that it did not correctly write the ICC 
profile. It had an off-by-one error, as it did not skip the 0-byte terminator after the 
profile name (first 0..79 bytes of the iCCP chunk + 0 byte). It also did not 
mark the stream as FLATE_DECODE.

PDFBox (and likely all other PDF readers) just ignored the ICC profile because 
of this (exception while decoding the profile). But this meant that the colors 
were not correct, as the wrong color profile was used (the alternate 
DeviceRGB).

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
         if (state.iCCP != null || state.sRGB != null)
         {
             // We have got a color profile, which we must attach
             cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
             cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
                     == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
             if (state.iCCP != null)
             {
+                cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
                 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
                         break;
                     iccProfileDataStart++;
                 }
+                iccProfileDataStart++;
                 if (iccProfileDataStart >= state.iCCP.length)
                 {
                     LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in the PNGConverterTest. The image now has 
the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without 
the fix for this in PNGConverterTest, the colors will be "miles" off when reading the 
PNG for comparison using ImageIO.
 - comparing sRGB images does not work, even after applying the fix for the ICC 
profile, because there are some color rounding differences (off by 1 on the 
first pixel, for whatever reason, likely some different color conversion paths 
somewhere). There is a massive difference between converting single pixel 
values between colorspaces and converting a whole image at once (using 
ColorConvertOp). The latter may choose slightly different colors 
depending on the rendering intent and the colors in use in the image. The image 
from PDImage.getImage() would have been converted with ColorConvertOp, but in 
checkIdent() using getRGB() the image read with ImageIO would be color converted 
"pixel by pixel". One could fix this by first converting the expected image 
to sRGB using ColorConvertOp if it is not yet in sRGB.

If you want to apply this fix alone, you would need to temporarily disable the 
test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Or you extend checkIdent() to correctly convert 
non-sRGB BufferedImages to sRGB first. I can also provide a patch for that if 
you like.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Attachments: color_difference.png, pdfbox-rawimages.patch
>
>

[jira] [Created] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-4847:


 Summary: [PATCH] Allow to access raw image data and fix ICC 
profile embedding in PNGConverter
 Key: PDFBOX-4847
 URL: https://issues.apache.org/jira/browse/PDFBOX-4847
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel, Writing
Affects Versions: 2.0.19
Reporter: Emmeran Seehuber
 Attachments: color_difference.png, pdfbox-rawimages.patch

This patch was primarily intended to add access to raw image data (i.e. without 
any kind of color conversion/reduction). While implementing and testing it I 
also found a bug with ICC profile embedding in the PNGConverter.

This patch does the following:
 - add a method getRawRaster() to PDImage. This allows reading the original 
raster data in 8 or 16 bit without any kind of color interpretation. Users 
must know themselves what they want to do with it (e.g. to access the raw data 
of DeviceN images).
 - add a method getRawImage(). It tries to return the raster obtained by 
getRawRaster() as a BufferedImage. This only succeeds if there is a 
matching Java ColorSpace for the colorspace of the image, i.e. only for 
ICCBased images. In theory this should also work for PDIndexed sRGB images, but 
I have to find a PDF with such an image first to test it.
 - add a -noColorConversion switch to the ExtractImage utility to extract 
images in their original colorspace. For CMYK images this only works when a 
TIFF encoder (e.g. from TwelveMonkeys) is on the class path.
 - add support for exporting PNGs with ICC profile data in ImageIOUtil.
 - fix a bug in PNGConverter, which does not correctly embed the ICC profile 
from the PNG file.
 - the PNGConverterTest tests the raw images. While reading PNG files for 
comparison it also ensures that the embedded ICC profile is correctly respected. 
The default PNG reader, at least up to JDK 11, does *not* respect the embedded 
ICC profile, i.e. the colors are wrong. But there is a workaround for this in 
the PNGConverterTest (which I have had in production for years now). See the 
screenshot for the correct color display of the png_rgb_romm_16.png test file 
(left side; macOS Preview app) and the wrong display (right side; Java; inside 
IDEA).
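
To illustrate what "raw" access buys you, here is a small sketch that uses 
only plain java.awt classes, not the PDFBox methods from the patch. The class 
and helper names (RawRasterSketch, rawSample, convertedRed) are invented for 
this example. It stores a 16 bit gray sample and reads it back once from the 
raster and once through getRGB().

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

final class RawRasterSketch {
    /** Reads the stored sample of band 0 without any color conversion. */
    static int rawSample(BufferedImage img, int x, int y) {
        return img.getRaster().getSample(x, y, 0);
    }

    /** Reads the red channel via getRGB(), i.e. after conversion to 8 bit sRGB. */
    static int convertedRed(BufferedImage img, int x, int y) {
        return (img.getRGB(x, y) >> 16) & 0xFF;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_USHORT_GRAY);
        WritableRaster raster = img.getRaster();
        raster.setSample(0, 0, 0, 0x1234); // a 16 bit sample value
        System.out.println("raw sample:   " + rawSample(img, 0, 0));
        System.out.println("getRGB() red: " + convertedRed(img, 0, 0));
    }
}
```

The raster keeps the full 16 bit value, while getRGB() can only ever return 8 
bits per channel in sRGB. That is the kind of loss the raw access is meant to 
avoid.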

 

Besides helping to find bugs like the one in the PNGConverter, access to the 
raw image also allows all kinds of fun color things. E.g. a future patch could 
allow using the raw images to print PDFs. If the PDF you want to print has 
images with a gamut > sRGB (e.g. from modern cameras) and the target printer 
also has a gamut > sRGB (e.g. some inkjet photo printers), you will for sure 
see a difference in the resulting print. Such a mode would be rather slow, as 
the current sRGB image handling is optimized for speed and using the original 
raw images would need on-demand color conversions in the printer driver. But 
you get „high quality“ out of it (at least with respect to colors).

I don’t think this will be in time for the 2.0.20 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: color_difference.png




[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: pdfbox-rawimages.patch




[jira] [Commented] (PDFBOX-4205) LosslessFactory alters image

2020-02-25 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044407#comment-17044407
 ] 

Emmeran Seehuber commented on PDFBOX-4205:
--

[~cdokolas] I see, LosslessFactory.createFromGrayImage() does not care about 
the color profile of the gray image at all. PDDeviceGray.INSTANCE more or less 
means "whatever the PDF reader likes as a grayscale colorspace", as 
PDDeviceGray for a printer means "use the color values as the exact values for 
the amount of color to apply on the paper without transforming them further". 
And this is 100% device dependent ...

As sRGB is/was the default for many monitors for a long time, this might not 
have been seen as a problem. But nowadays most (better) monitors do some kind 
of color management and have colorspaces bigger than sRGB, and then you will 
also get this distortion on a monitor.

So it is no surprise that the gamma curve is wrong... (i.e. the different gray 
tones).

So there are two problems:
 * getRGB() is used to get the color values in sRGB, which causes a color 
transformation to sRGB and some potential color loss.
 * PDDeviceGray is used as the profile. If getRGB() is used, the image should 
be exported as sRGB with an embedded sRGB profile - which also means that you 
need all three color channels... Using sRGB values but exporting as 
PDDeviceGray is more or less doomed to shift the gamma curve, i.e. to produce 
wrong gray tones.

So getting this right is not so easy. I already struggled with this while 
implementing PDFBOX-4341.
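
The first problem can be observed with plain Java, without PDFBox. The sketch 
below (class and method names are made up for this illustration) builds a gray 
ramp and counts how many entries come back from getRGB() with a value 
different from the stored sample. I'm assuming the default TYPE_BYTE_GRAY 
color model here; how many entries differ depends on the color model of the 
image, so no number is claimed.

```java
import java.awt.image.BufferedImage;

final class GrayGetRgbSketch {
    /** Builds a 256x1 gray image whose stored samples are 0..255. */
    static BufferedImage grayRamp() {
        BufferedImage img = new BufferedImage(256, 1, BufferedImage.TYPE_BYTE_GRAY);
        for (int x = 0; x < 256; x++) {
            img.getRaster().setSample(x, 0, 0, x);
        }
        return img;
    }

    /** Counts pixels in row 0 whose getRGB() red channel differs from the stored sample. */
    static int countDifferences(BufferedImage img) {
        int differences = 0;
        for (int x = 0; x < img.getWidth(); x++) {
            int stored = img.getRaster().getSample(x, 0, 0);
            int viaGetRgb = (img.getRGB(x, 0) >> 16) & 0xFF;
            if (stored != viaGetRgb) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        System.out.println("differing entries: " + countDifferences(grayRamp()));
    }
}
```

Every differing entry is a gray tone that would end up wrong if the getRGB() 
value were written out and then declared to be PDDeviceGray.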

> LosslessFactory alters image
> 
>
> Key: PDFBOX-4205
> URL: https://issues.apache.org/jira/browse/PDFBOX-4205
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.7, 2.0.8, 2.0.9
> Environment: Ubuntu 16.04
>Reporter: Harry Dent
>Priority: Minor
> Attachments: image-2020-02-24-17-59-08-300.png, lossy.png, 
> picture_of_text.png
>
>
> The attached grayscale png becomes lighter when run through the following 
> code snippet: 
> {code:java}
> BufferedImage image = ImageIO.read(new File("picture_of_text.png"));
> PDImageXObject xObject = LosslessFactory.createFromImage(new PDDocument(), 
> image);
> BufferedImage lossy = xObject.getImage();
> ImageIO.write(lossy, "png", new File("lossy.png"));
> {code}
> The difference is easiest to spot by looking at the "S" in "41st". The loss 
> in quality occurs in {{createFromImage()}} (rather than {{getImage()}}), 
> which can be shown by embedding the {{PDImageXObject}} into a {{PDDocument}} 
> and then saving this document to a file. 
>  
>  






[jira] [Commented] (PDFBOX-4205) LosslessFactory alters image

2020-02-24 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043553#comment-17043553
 ] 

Emmeran Seehuber commented on PDFBOX-4205:
--

Are you sure that you still have this issue with the current PDFBox version 
2.0.19? Some fixes regarding the image handling of images with colorspaces have 
been included since PDFBox 2.0.12. See also PDFBOX-4184.




[jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2019-10-13 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950482#comment-16950482
 ] 

Emmeran Seehuber commented on PDFBOX-4341:
--

[~tilman] No, those were the same files. But I don't see your commit in the svn.

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> ---
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.12
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: 001229.png, 001230.png, 008528.png, 014431.png, 
> 016289.png, 017012.png, 017030.png, 017063.png, 017084.png, 
> image-2018-10-25-09-29-47-251.png, optimized.zip, pngconvert_testimg.zip, 
> pngconvert_v1.patch, pngconvert_v2.patch, pngconvert_v3.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It 
> tries to create a PDImageXObject from the chunks of a PNG image, without 
> recompressing it. This allows using programs like pngcrush and friends to 
> embed optimally compressed images. It’s also way faster than recompressing 
> the image.
> The class PNGConverter does this in three steps:
>  - Parsing the PNG chunk structure from the byte array
>  - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
> are not needed (e.g. text chunks) are not validated.
>  - Constructing a PDImageXObject from the chunks
> When an error occurs at any of these steps or the converter detects that it is 
> not possible to map the image, it will bail out and return null. In this case 
> the image has to be embedded the „normal“ way by reading it using ImageIO and 
> compressing it again.
> Only these PNG image types can be converted (at least theoretically) without 
> recompressing the image data:
>  - Grayscale
>  - Truecolor (i.e. RGB 8-Bit/16-Bit)
>  - Indexed
> As soon as transparency is used it gets difficult:
>  - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
> the image data stream, as the pixels are stored as (Gray,Alpha) or 
> (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
> the SMASK image. At the moment you can just read and recompress it using the 
> LosslessFactory.
>  - Indexed with alpha: Alpha and color tables are separate in the PNG, so 
> it should be possible to build a grayscale SMASK from the image data (which 
> are just the table indices) and the alpha table. I tried that, but Acrobat 
> Reader does not like indexed SMASKs… One could just build a grayscale SMASK 
> using the alpha table and the decompressed image index data. This would at 
> least save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly. 
> The other tests for grayscale and indexed fail. (You must place the zipped 
> images in the resources folder where png.png resides to run the test drivers. 
> These images are „original“ work done by me using Gimp, Krita and ImageOptim 
> (on macOS) to build the different PNG image types.)
> Notes for the current patch:
>  - The grayscale images have the wrong gamma curve. I tried using the 
> ColorSpace.CS_GRAY ICC profile and the image now seems only „slightly“ off 
> (i.e. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given the 
> image is tagged with a CalGray profile, but then the colors are way more off.
>  - The cHRM (chroma) chunk is read and *should* work, as I used the formulas 
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and 
> matrix. I have not yet tested this, as I have no test image with cHRM at the 
> moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric 
> matrices. But these methods are wrong for any other kind of matrix (e.g. color 
> transform matrices), as they only store/restore 6 values of the 3x3 matrix. I 
> deprecated PDCalRGB.setMatrix(Matrix) because of this, as it never 
> worked and cannot work as long as the Matrix class is for geometric use 
> cases only. It should also be documented on the Matrix class that it is 
> not general purpose. I added a PDCalRGB.setMatrix(COSArray) method to allow 
> setting the matrix.
>  - The indexed image displays fine in Acrobat Reader, but the test driver 
> fails as PDImageXObject.getImage() returns a completely black (everything 0) 
> image. Strange, I suspect some error in the PDFBox image decoding.
>  - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is 
> attached. Theoretically you can use a CalRGB colorspace, but using an ICC 
> color profile is likely faster (at least in PDFBox) and more „standard“.
> You can also look at this patch on GitHub 
> [https://github
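
The two transparency cases from the quoted description can be sketched in 
plain Java as follows. This is only an illustration under stated assumptions: 
8 bit samples, scanlines already decompressed and unfiltered, one palette 
index per byte. The class and method names are invented here and are not 
taken from the patch.

```java
import java.util.Arrays;

final class AlphaSplitSketch {
    /**
     * Splits interleaved (color..., alpha) tuples into {color, alpha}.
     * colorChannels is 1 for (Gray,Alpha) and 3 for (Red,Green,Blue,Alpha).
     */
    static byte[][] splitAlpha(byte[] interleaved, int colorChannels) {
        int stride = colorChannels + 1;
        if (colorChannels < 1 || interleaved.length % stride != 0) {
            throw new IllegalArgumentException("length does not match the tuple size");
        }
        int pixels = interleaved.length / stride;
        byte[] color = new byte[pixels * colorChannels];
        byte[] alpha = new byte[pixels];
        for (int p = 0; p < pixels; p++) {
            System.arraycopy(interleaved, p * stride, color, p * colorChannels, colorChannels);
            alpha[p] = interleaved[p * stride + colorChannels];
        }
        return new byte[][] {color, alpha};
    }

    /**
     * Builds grayscale mask samples for an indexed image from the palette
     * indices and the alpha table. Indices beyond the end of the alpha table
     * are treated as fully opaque, as in PNG.
     */
    static byte[] alphaFromPalette(byte[] indices, byte[] alphaTable) {
        byte[] alpha = new byte[indices.length];
        for (int i = 0; i < indices.length; i++) {
            int index = indices[i] & 0xFF;
            alpha[i] = index < alphaTable.length ? alphaTable[index] : (byte) 0xFF;
        }
        return alpha;
    }

    public static void main(String[] args) {
        byte[][] parts = splitAlpha(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}, 3);
        System.out.println(Arrays.toString(parts[0]) + " / " + Arrays.toString(parts[1]));
    }
}
```

Because the alpha samples are interleaved with the color samples inside the 
compressed stream, the first case cannot reuse the PNG data as-is. The second 
case only needs the decompressed indices for the mask and can keep the 
original compressed data for the image itself.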

[jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2019-10-09 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947669#comment-16947669
 ] 

Emmeran Seehuber commented on PDFBOX-4341:
--

[~jdimeo] I totally forgot about this patch... Nice that it helps you. But 
please use this updated branch 
[https://github.com/rototor/pdfbox/tree/2.0-png-from-bytes-encoder-update] for 
your project, as the original branch on GitHub is based on 2.0.12. This new 
branch is based on PDFBox 2.0.17.

[~tilman] I've updated the patch against 2.0.17, see [^pngconvert_v3.patch]. 
I've included the test images inside the patch this time. 
What do I need to do to get this merged? I see that you don't like that the 
PDF file sizes may regress if the original PNG images are not compressed very 
well, as createFromByteArray() currently always recompresses PNG files. Would 
it be enough to document on createFromByteArray() that for PNG files you may 
get a smaller file size if you first load them and then use the 
LosslessFactory on them?
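
For readers who have not looked at the patch: the first two steps of the 
converter (parsing the chunk structure and checking the CRCs) can be sketched 
roughly like this. This is not the code from the patch, only a minimal 
illustration of the PNG chunk layout (4 byte length, 4 byte type, data, 4 byte 
CRC over type and data). The class name PngChunkSketch and its members are 
made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

final class PngChunkSketch {
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    static final class Chunk {
        final String type;
        final int dataOffset;
        final int length;
        final boolean crcOk;

        Chunk(String type, int dataOffset, int length, boolean crcOk) {
            this.type = type;
            this.dataOffset = dataOffset;
            this.length = length;
            this.crcOk = crcOk;
        }
    }

    /** Returns the chunk list, or null if the structure is broken. */
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length) {
            return null;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (png[i] != SIGNATURE[i]) {
                return null;
            }
        }
        List<Chunk> chunks = new ArrayList<>();
        int pos = SIGNATURE.length;
        while (pos + 12 <= png.length) {
            // ByteBuffer reads big-endian by default, which matches PNG.
            int length = ByteBuffer.wrap(png, pos, 4).getInt();
            if (length < 0 || (long) pos + 12 + length > png.length) {
                return null;
            }
            String type = new String(png, pos + 4, 4, StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(png, pos + 4, 4 + length); // CRC covers type and data
            long stored = ByteBuffer.wrap(png, pos + 8 + length, 4).getInt() & 0xFFFFFFFFL;
            chunks.add(new Chunk(type, pos + 8, length, crc.getValue() == stored));
            pos += 12 + length;
            if ("IEND".equals(type)) {
                return chunks;
            }
        }
        return null; // no IEND chunk found
    }
}
```

A caller would then fall back to ImageIO plus recompression whenever parse() 
returns null or a chunk it needs has crcOk == false.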


[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2019-10-09 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: pngconvert_v3.patch

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> ---
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.12
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: 001229.png, 001230.png, 008528.png, 014431.png, 
> 016289.png, 017012.png, 017030.png, 017063.png, 017084.png, 
> image-2018-10-25-09-29-47-251.png, optimized.zip, pngconvert_testimg.zip, 
> pngconvert_v1.patch, pngconvert_v2.patch, pngconvert_v3.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It 
> tries to create a PDImageXObject from the chunks of a PNG image, without 
> recompressing it. This allows using programs like pngcrush and friends to 
> embed optimally compressed images. It’s also way faster than recompressing 
> the image.
> The class PNGConverter does this in three steps:
>  - Parsing the PNG chunk structure from the byte array
>  - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
> are not needed (e.g. text chunks) are not validated.
>  - Constructing a PDImageXObject from the chunks
> When an error occurs at any of these steps or the converter detects that it is 
> not possible to map the image, it will bail out and return null. In this case 
> the image has to be embedded the „normal“ way by reading it using ImageIO and 
> compressing it again.
> Only these PNG image types can be converted (at least theoretically) without 
> recompressing the image data:
>  - Grayscale
>  - Truecolor (i.e. RGB 8-Bit/16-Bit)
>  - Indexed
> As soon as transparency is used it gets difficult:
>  - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
> the image data stream, as the pixels are stored as (Gray,Alpha) or 
> (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
> the SMASK image. At the moment you can just read and recompress it using the 
> LosslessFactory.
>  - Indexed with alpha: Alpha and color tables are separate in the PNG, so 
> it should be possible to build a grayscale SMASK from the image data (which 
> are just the table indices) and the alpha table. I tried that, but Acrobat 
> Reader does not like indexed SMASKs… One could just build a grayscale SMASK 
> using the alpha table and the decompressed image index data. This would at 
> least save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly. 
> The other tests for grayscale and indexed fail. (You must place the zipped 
> images in the resources folder where png.png resides to run the test drivers. 
> These images are „original“ work done by me using Gimp, Krita and ImageOptim 
> (on macOS) to build the different PNG image types.)
> Notes for the current patch:
>  - The grayscale images have the wrong gamma curve. I tried using the 
> ColorSpace.CS_GRAY ICC profile and the image now seems only „slightly“ off 
> (i.e. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given the 
> image is tagged with a CalGray profile, but then the colors are way more off.
>  - The cHRM (chroma) chunk is read and *should* work, as I used the formulas 
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and 
> matrix. I have not yet tested this, as I have no test image with cHRM at the 
> moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric 
> matrices. But these methods are wrong for any other kind of matrix (e.g. color 
> transform matrices), as they only store/restore 6 values of the 3x3 matrix. I 
> deprecated PDCalRGB.setMatrix(Matrix) because of this, as it never 
> worked and cannot work as long as the Matrix class is for geometric use 
> cases only. It should also be documented on the Matrix class that it is 
> not general purpose. I added a PDCalRGB.setMatrix(COSArray) method to allow 
> setting the matrix.
>  - The indexed image displays fine in Acrobat Reader, but the test driver 
> fails as PDImageXObject.getImage() returns a completely black (everything 0) 
> image. Strange, I suspect some error in the PDFBox image decoding.
>  - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is 
> attached. Theoretically you can use a CalRGB colorspace, but using an ICC 
> color profile is likely faster (at least in PDFBox) and more „standard“.
> You can also look at this patch on GitHub 
> [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
>  if you like.
> It woul
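
As a rough illustration of the cHRM note in the quoted description: the 
following sketch derives a nine element CalRGB style matrix (order XA YA ZA 
XB YB ZB XC YC ZC) from chromaticity coordinates, using the usual 
colorimetric derivation with the white point normalized to Y = 1. I'm not 
claiming this is the exact code or formula set used in the patch. The class 
and method names are made up, and the inputs are assumed to be already 
divided by 100000 (cHRM stores them as integers scaled by that factor).

```java
final class ChrmToCalRgbSketch {
    /** Converts an xy chromaticity to XYZ with Y = 1. */
    static double[] xyToXyz(double x, double y) {
        return new double[] {x / y, 1.0, (1.0 - x - y) / y};
    }

    /** Determinant of the 3x3 matrix whose columns are a, b and c. */
    static double det(double[] a, double[] b, double[] c) {
        return a[0] * (b[1] * c[2] - c[1] * b[2])
             - b[0] * (a[1] * c[2] - c[1] * a[2])
             + c[0] * (a[1] * b[2] - b[1] * a[2]);
    }

    /** Returns {XA, YA, ZA, XB, YB, ZB, XC, YC, ZC} for the given chromaticities. */
    static double[] matrix(double xr, double yr, double xg, double yg,
                           double xb, double yb, double xw, double yw) {
        double[] r = xyToXyz(xr, yr);
        double[] g = xyToXyz(xg, yg);
        double[] b = xyToXyz(xb, yb);
        double[] w = xyToXyz(xw, yw);
        // Solve [r g b] * s = w with Cramer's rule, so that R=G=B=1 maps to white.
        double d = det(r, g, b);
        double sr = det(w, g, b) / d;
        double sg = det(r, w, b) / d;
        double sb = det(r, g, w) / d;
        return new double[] {
            sr * r[0], sr * r[1], sr * r[2],
            sg * g[0], sg * g[1], sg * g[2],
            sb * b[0], sb * b[1], sb * b[2]
        };
    }

    public static void main(String[] args) {
        // sRGB primaries with a D65 white point
        double[] m = matrix(0.64, 0.33, 0.30, 0.60, 0.15, 0.06, 0.3127, 0.3290);
        System.out.println(java.util.Arrays.toString(m));
    }
}
```

All nine values are needed to describe such a matrix, which is why a class 
that stores only six values of a 3x3 matrix cannot carry it.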

[jira] [Commented] (PDFBOX-4607) Transparent 16 bit image doesn't display in Adobe Reader

2019-07-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892942#comment-16892942
 ] 

Emmeran Seehuber commented on PDFBOX-4607:
--

No, I just did an "extract expression" for {{srcCspaceType}}. Theoretically 
there can be grayscale images without an ICC profile. But at least everything 
read in with ImageIO should have an ICC profile, even if it's only the default 
built-in ICC profile. So feel free to extend this line to also handle the 
grayscale case, i.e.
{code:java}
PDColorSpace pdColorSpace = srcCspaceType == ColorSpace.TYPE_CMYK
  ? PDDeviceCMYK.INSTANCE
  : (srcCspaceType == ColorSpace.TYPE_GRAY ? PDDeviceGray.INSTANCE : PDDeviceRGB.INSTANCE);
{code}

I just would not know how to trigger/test this case.
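
The selection logic itself can at least be exercised in isolation. The sketch 
below is standalone and does not use PDFBox: the DeviceSpace enum is a 
hypothetical stand-in for PDDeviceCMYK, PDDeviceGray and PDDeviceRGB, and the 
class name is invented for this example.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;

final class DeviceSpaceSketch {
    /** Hypothetical stand-in for the three PDFBox device colorspaces. */
    enum DeviceSpace { CMYK, GRAY, RGB }

    /** Maps a java.awt.color.ColorSpace type constant to a device colorspace. */
    static DeviceSpace forType(int colorSpaceType) {
        switch (colorSpaceType) {
            case ColorSpace.TYPE_CMYK:
                return DeviceSpace.CMYK;
            case ColorSpace.TYPE_GRAY:
                return DeviceSpace.GRAY;
            default:
                return DeviceSpace.RGB;
        }
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(1, 1, BufferedImage.TYPE_BYTE_GRAY);
        int type = gray.getColorModel().getColorSpace().getType();
        System.out.println(forType(type));
    }
}
```

This only covers the mapping. It does not reproduce the situation of a 
grayscale image without an ICC profile inside the LosslessFactory.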

> Transparent 16 bit image doesn't display in Adobe Reader
> 
>
> Key: PDFBOX-4607
> URL: https://issues.apache.org/jira/browse/PDFBOX-4607
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.12, 2.0.16
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: regression
> Attachments: 16bit-transparent.pdf, 16bit-transparent.png, 
> Correctly_handle_grayscale_colorspacees_when_setting_the_alternate__colorspace_.patch
>
>
> Thomas on stackoverflow has very simple code that creates a PDF from a 16 bit 
> PNG image. The PDF displays fine on Adobe Reader in 2.0.8 but no longer since 
> 2.0.12. It displays on all other viewers I've tried: PDFBox, GS, PDF.js and 
> Chrome. It works on Adobe Reader when disabling the predictor logic in 
> LosslessFactory.java. Thomas and I narrowed the problem last evening and now 
> he has a non confidential file for us and also a workaround on his side.
> To reproduce the problem I used the ImageToPDF example.
> cc [~rototor]
> the "bad" file has imageType custom, colorspace with getNumComponents = 1, 8 
> bpc, data buffer TYPE_BYTE






[jira] [Commented] (PDFBOX-4607) Transparent 16 bit image doesn't display in Adobe Reader

2019-07-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892680#comment-16892680
 ] 

Emmeran Seehuber commented on PDFBOX-4607:
--

The alternate color space is not set correctly for grayscale images.

This patch 
[^Correctly_handle_grayscale_colorspacees_when_setting_the_alternate__colorspace_.patch]
 fixes it. 







[jira] [Updated] (PDFBOX-4607) Transparent 16 bit image doesn't display in Adobe Reader

2019-07-25 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4607:
-
Attachment: 
Correctly_handle_grayscale_colorspacees_when_setting_the_alternate__colorspace_.patch







[jira] [Commented] (PDFBOX-4607) Transparent 16 bit image doesn't display in Adobe Reader

2019-07-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892544#comment-16892544
 ] 

Emmeran Seehuber commented on PDFBOX-4607:
--

I'll try to look into this this afternoon.







[jira] [Commented] (PDFBOX-4300) Reduce im memory buffers when creating grayscale images

2018-11-20 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694351#comment-16694351
 ] 

Emmeran Seehuber commented on PDFBOX-4300:
--

[~sdanig] The color mismatch is caused by the different gamma curves of sRGB 
and whatever grayscale color profile the image had (color management fun ...). 
PDDeviceGray has a gamma curve that depends on the output device, i.e. 
it can really vary. PDFBox may just assume an sRGB gamma curve (but I didn't 
look into it).

To fix this you should just tag the image with the right profile, i.e. instead 
of PDDeviceGray you build a PDICCBased profile, see the code in 
PredictorEncoder#preparePredictorPDImage(), search for ICC_Profile.

But: The approach of directly casting the image raster's data buffer to 
DataBufferByte is wrong. It works for your special case, but it won't work 
in general because:
 * You don't respect any strides the image data may have, i.e. there may be 
trailing bytes at the end of every image line.
 * The buffer does not have to be a DataBufferByte. It can e.g. also be a 
DataBufferShort for 16 bit images. Or it may be a memory mapped byte buffer 
(see e.g. 
[this|https://github.com/haraldk/TwelveMonkeys/blob/master/sandbox/sandbox-common/src/main/java/com/twelvemonkeys/image/MappedFileBuffer.java]
 class, which is sadly not released on Maven Central, but I use a copy of it 
very successfully in production with huge images).

So the right solution would be to do it like the predictor encoder and use
{code:java}
image.getRaster().getDataElements(){code}
with the right array type. But you could also simply try to extend the 
PredictorEncoder so that it can handle grayscale images as well. 
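
To illustrate the row-by-row approach, here is a minimal sketch that uses only the JDK. The class and method names ({{GrayRasterReader}}, {{readGrayBytes}}) are made up for this example and are not part of PDFBox. It assumes an 8 bit, single band image and rejects everything else. Because it goes through the Raster API instead of casting the DataBuffer, strides and the concrete buffer implementation do not matter.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;

// Hypothetical helper for illustration only, not PDFBox code.
final class GrayRasterReader {

    /** Copies an 8 bit, single band image into a tightly packed byte array, one row at a time. */
    static byte[] readGrayBytes(BufferedImage image) {
        Raster raster = image.getRaster();
        // Assumption of this sketch: exactly one band with byte sized samples.
        if (raster.getNumBands() != 1 || raster.getTransferType() != DataBuffer.TYPE_BYTE) {
            throw new IllegalArgumentException("sketch only handles 8 bit single band images");
        }
        int width = raster.getWidth();
        int height = raster.getHeight();
        byte[] packed = new byte[width * height];
        byte[] row = new byte[width];
        for (int y = 0; y < height; y++) {
            // The raster resolves scanline stride and buffer type internally.
            raster.getDataElements(raster.getMinX(), raster.getMinY() + y, width, 1, row);
            System.arraycopy(row, 0, packed, y * width, width);
        }
        return packed;
    }
}
```

For 16 bit images the same loop would need a {{short[]}} row array, since the array type has to match the raster's transfer type.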

> Reduce im memory buffers when creating grayscale images
> ---
>
> Key: PDFBOX-4300
> URL: https://issues.apache.org/jira/browse/PDFBOX-4300
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 2.0.11
>Reporter: Jesse Long
>Priority: Minor
>  Labels: optimization
> Attachments: PDFBOX-4300-1.patch
>
>
> LosslessFactory uses ByteArrayOutputStreams when creating PDF image data. 
> First, it creates a BAOS in which to store the data, then a BAOS in which to 
> store the flate encoded data. Finally the flate encoded data is written to 
> the PDImageXObject's stream.
> We could instead create an empty PDStream, give it a filter, and write the 
> image data directly into the stream. We then instantiate a PDImageXObject 
> giving it the already created stream.
> This would dramatically reduce RAM requirement if a scratchfile is in play.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-10-30 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668453#comment-16668453
 ] 

Emmeran Seehuber commented on PDFBOX-4363:
--

Sorry, I was not aware that PD is reserved for official PDF structures. 

I thought about a base class at first too. This is possible without problems 
using generics. See [^shadingpaint_generic_baseclass.patch]

But it introduces a hidden class cast (as all generics do). I.e. the first 
access to the shading using the concrete type of the derived class will 
internally cause a cast/type check on the type of the shading member. As this 
code path should not be extremely performance critical it shouldn't matter. And 
the JVM is nowadays very good at optimizing this out.
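
A minimal sketch of what is meant by the hidden cast, using made-up stand-in types ({{Shading}}, {{AxialShading}}, {{ShadingPaint}}) instead of the real PDFBox classes, and {{AffineTransform}} in place of the PDFBox matrix type. It is not the content of the attached patch, only an illustration of the generic base class idea.

```java
import java.awt.geom.AffineTransform;

// Hypothetical stand-ins for PDShading and one concrete shading type.
class Shading { }

class AxialShading extends Shading {
    float[] coords() { return new float[] {0f, 0f, 1f, 1f}; }
}

// Generic base class: holds the shading and the matrix once for all paints.
abstract class ShadingPaint<T extends Shading> {
    protected final T shading;
    protected final AffineTransform matrix;

    ShadingPaint(T shading, AffineTransform matrix) {
        this.shading = shading;
        this.matrix = matrix;
    }

    T getShading() { return shading; }
    AffineTransform getMatrix() { return matrix; }
}

final class AxialShadingPaint extends ShadingPaint<AxialShading> {
    AxialShadingPaint(AxialShading shading, AffineTransform matrix) {
        super(shading, matrix);
    }

    float[] coords() {
        // After erasure the field has the type Shading, so the compiler
        // inserts a checkcast to AxialShading for this access.
        return shading.coords();
    }
}
```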

> [Patch] Add a common interface PDShadingPaint for all shading paints
> 
>
> Key: PDFBOX-4363
> URL: https://issues.apache.org/jira/browse/PDFBOX-4363
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.12
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: pdshadingpaint_base_interface.patch, 
> shadingpaint_generic_baseclass.patch
>
>
> The attached patch adds a common interface PDShadingPaint to all 
> PDShading-based Paints. It allows access to the underlying PDShading object 
> and the matrix of the paint.
> At the moment it is not possible to access these fields without using dirty 
> accessibility hacks, see also [this 
> commit|https://github.com/rototor/pdfbox-graphics2d/commit/8603f6f9c781604d4684b8cb46499703c34b4711].
> Why would you need that? I need it for my 
> [PdfBoxGraphics2D|https://github.com/rototor/pdfbox-graphics2d] adapter. I 
> draw PDF pages using PDFRenderer/PageDrawer back into a PDF. While doing so I 
> derive from both classes and change / filter certain aspects of the PDF. One 
> use case is to extract a specific separation color into its own PDF page, 
> remap it to another color and also draw some overfill (i.e. an additional 
> border of 0.5pt around all shapes drawn with this color). The page prepared 
> this way is then used with a machine which glues foil (gold, silver or copper) 
> on the places marked with that color.
> You can look at an [example 
> here|https://github.com/rototor/pdfbox-graphics2d/blob/master/src/test/java/de/rototor/pdfbox/graphics2d/PdfRerenderTest.java]; 
> it does not use a separation color, but you should get the idea. 
> My long-term goal is to be able to use PDFBox for all possible pre-press PDF 
> manipulations, e.g. changing/remapping color spaces, resampling images to the 
> target resolution, ...
> At the moment I know of these "special cases" which will need special 
> treatment, as they are normally handled by rendering them first into a 
> BufferedImage:
>  - Transparency groups
>  - Softmasks
> Are there other places which resort to rendering to a BufferedImage first?






[jira] [Updated] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-10-30 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4363:
-
Attachment: shadingpaint_generic_baseclass.patch







[jira] [Created] (PDFBOX-4365) PDFDebugger: JComboBox does not take generic parameters in Java 1.6

2018-10-30 Thread Emmeran Seehuber (JIRA)
Emmeran Seehuber created PDFBOX-4365:


 Summary: PDFDebugger: JComboBox does not take generic parameters 
in Java 1.6
 Key: PDFBOX-4365
 URL: https://issues.apache.org/jira/browse/PDFBOX-4365
 Project: PDFBox
  Issue Type: Bug
  Components: Swing GUI
Affects Versions: 2.0.13
Reporter: Emmeran Seehuber


In pdfbox-debugger/StreamPane.java:168 you are using generics with a 
JComboBox.

This causes a compile error when targeting JDK 1.6, as JComboBox only takes 
a generic parameter since 1.7, AFAIR. I assume that this code will not run on 
JDK 1.6; you may get it to compile with a JDK 1.7+, but at least JDK 10 
complains about this when compiling.






[jira] [Updated] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-10-28 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4363:
-
Component/s: Rendering







[jira] [Updated] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-10-28 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4363:
-
Affects Version/s: 2.0.12







[jira] [Created] (PDFBOX-4363) [Patch] Add a common interface PDShadingPaint for all shading paints

2018-10-28 Thread Emmeran Seehuber (JIRA)
Emmeran Seehuber created PDFBOX-4363:


 Summary: [Patch] Add a common interface PDShadingPaint for all 
shading paints
 Key: PDFBOX-4363
 URL: https://issues.apache.org/jira/browse/PDFBOX-4363
 Project: PDFBox
  Issue Type: Improvement
Reporter: Emmeran Seehuber
 Attachments: pdshadingpaint_base_interface.patch

The attached patch adds a common interface PDShadingPaint to all 
PDShading-based Paints. It allows access to the underlying PDShading object 
and the matrix of the paint.

At the moment it is not possible to access these fields without using dirty 
accessibility hacks, see also [this 
commit|https://github.com/rototor/pdfbox-graphics2d/commit/8603f6f9c781604d4684b8cb46499703c34b4711].
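
To make the difference concrete, here is a small self-contained sketch with made-up stand-in classes ({{DemoShading}}, {{LegacyPaint}}, {{AccessiblePaint}}, {{ShadingPaintAccess}}); none of these names exist in PDFBox, and the real interface in the patch may look different. It contrasts a reflection-based accessibility hack with a plain accessor exposed through a common interface.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for PDShading.
class DemoShading { }

// Hypothetical common interface exposing the shading.
interface ShadingPaintAccess {
    DemoShading getShading();
}

// A paint without any accessor: the field can only be reached via reflection.
final class LegacyPaint {
    private final DemoShading shading = new DemoShading();
}

// A paint implementing the common interface.
final class AccessiblePaint implements ShadingPaintAccess {
    private final DemoShading shading = new DemoShading();

    @Override
    public DemoShading getShading() { return shading; }
}

final class ShadingAccessDemo {
    /** The "dirty" way: breaks encapsulation and fails if the field is ever renamed. */
    static DemoShading viaReflection(Object paint) {
        try {
            Field field = paint.getClass().getDeclaredField("shading");
            field.setAccessible(true);
            return (DemoShading) field.get(paint);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("field 'shading' not accessible", e);
        }
    }

    /** The clean way: works for every paint that implements the interface. */
    static DemoShading viaInterface(ShadingPaintAccess paint) {
        return paint.getShading();
    }
}
```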

Why would you need that? I need it for my 
[PdfBoxGraphics2D|https://github.com/rototor/pdfbox-graphics2d] adapter. I draw 
PDF pages using PDFRenderer/PageDrawer back into a PDF. While doing so I derive 
from both classes and change / filter certain aspects of the PDF. One use case 
is to extract a specific separation color into its own PDF page, remap it to 
another color and also draw some overfill (i.e. an additional border of 0.5pt 
around all shapes drawn with this color). The page prepared this way is then 
used with a machine which glues foil (gold, silver or copper) on the places 
marked with that color.

You can look at an [example 
here|https://github.com/rototor/pdfbox-graphics2d/blob/master/src/test/java/de/rototor/pdfbox/graphics2d/PdfRerenderTest.java]; 
it does not use a separation color, but you should get the idea. 

My long-term goal is to be able to use PDFBox for all possible pre-press PDF 
manipulations, e.g. changing/remapping color spaces, resampling images to the 
target resolution, ...

At the moment I know of these "special cases" which will need special 
treatment, as they are normally handled by rendering them first into a 
BufferedImage:
 - Transparency groups
 - Softmasks

Are there other places which resort to rendering to a BufferedImage first?






[jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664735#comment-16664735
 ] 

Emmeran Seehuber commented on PDFBOX-4341:
--

OkHttp is a nice HTTP client to work with; it does HTTP/2 out of the box with 
Java 9+. And it features a good builder interface.

I'm not against making this configurable. But it should be done with a builder 
interface, not with static methods and tons of overloads. This would also make 
it possible to add future features, e.g.
{code:java}
PDImageXObject.Compressor compressor = PDImageXObject.newCompressor(document)
    /*
     * Zip level = 0, always try to embed the file as is, etc.
     */
    .bestSpeed()
    /*
     * Zip level = 9, try PNGConverter but also try and compare with
     * predictor compression and non-predictor compression. Take whichever is better.
     */
    .bestCompression()
    /*
     * Change the zip level when recompressing, as fine tuning.
     */
    .zipLevel(6)
    /*
     * Try to build an indexed image in the LosslessFactory. This may fail
     * if there are more than 256 colors; it will fall back to "normal" predictor
     * compression in that case, which just wastes some CPU cycles.
     */
    .tryIndexedImage()
    /*
     * Convert the image to sRGB if it is not already in that color space. May
     * lose gamut, but this is not a problem for display-only PDFs.
     */
    .flattenToSRGB()
    /*
     * Compress as JPEG using the given compression quality. Will be lossy, but
     * likely smaller.
     */
    .lossyCompression(0.9)
    .build();{code}
The user could then use the compressor whenever they want to embed an image:
{code:java}
PDImageXObject img = compressor.fromImage(bufferedImage);
PDImageXObject img2 = compressor.fromByteArray(bytes);{code}
What do you think? I can prepare a patch for such an interface; I just can't 
say when exactly I will have time.
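
To show that such a builder does not need many overloads, here is a rough self-contained sketch of just the options part. Everything in it ({{ImageCompressorOptions}} and its methods) is hypothetical and only mirrors the proposal above; it is not existing PDFBox API, and the zip level mapping (0 for speed, 9 for compression, 6 as default) is an assumption taken from the comments in the proposal.

```java
// Hypothetical options object for the proposed compressor, for illustration only.
final class ImageCompressorOptions {
    final int zipLevel;
    final boolean tryIndexed;
    final boolean flattenToSRGB;
    final Double jpegQuality; // null means lossless

    private ImageCompressorOptions(Builder b) {
        this.zipLevel = b.zipLevel;
        this.tryIndexed = b.tryIndexed;
        this.flattenToSRGB = b.flattenToSRGB;
        this.jpegQuality = b.jpegQuality;
    }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private int zipLevel = 6; // assumed default
        private boolean tryIndexed;
        private boolean flattenToSRGB;
        private Double jpegQuality;

        Builder bestSpeed() { zipLevel = 0; return this; }
        Builder bestCompression() { zipLevel = 9; return this; }

        Builder zipLevel(int level) {
            if (level < 0 || level > 9) {
                throw new IllegalArgumentException("zip level must be between 0 and 9");
            }
            zipLevel = level;
            return this;
        }

        Builder tryIndexedImage() { tryIndexed = true; return this; }
        Builder flattenToSRGB() { flattenToSRGB = true; return this; }

        Builder lossyCompression(double quality) {
            if (quality <= 0.0 || quality > 1.0) {
                throw new IllegalArgumentException("quality must be in (0, 1]");
            }
            jpegQuality = quality;
            return this;
        }

        ImageCompressorOptions build() { return new ImageCompressorOptions(this); }
    }
}
```

Later calls simply override earlier ones, so {{bestCompression().zipLevel(6)}} ends up with level 6, and new options can be added without touching existing callers.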

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> ---
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.12
>Reporter: Emmeran Seehuber
>Priority: Minor
> Attachments: 001229.png, 001230.png, 008528.png, 014431.png, 
> 016289.png, 017012.png, 017030.png, 017063.png, 017084.png, 
> image-2018-10-25-09-29-47-251.png, optimized.zip, pngconvert_testimg.zip, 
> pngconvert_v1.patch, pngconvert_v2.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It 
> tries to create a PDImageXObject from the chunks of a PNG image, without 
> recompressing it. This makes it possible to use programs like pngcrush and 
> friends to embed optimally compressed images. It’s also way faster than 
> recompressing the image.
> The class PNGConverter does this in three steps:
>  - Parsing the PNG chunk structure from the byte array
>  - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
> are not needed (e.g. text chunks) are not validated.
>  - Constructing a PDImageXObject from the chunks
> If an error occurs at any of these steps, or the converter detects that it is 
> not possible to map the image, it will bail out and return null. In this case 
> the image has to be embedded the „normal“ way by reading it using ImageIO and 
> compressing it again.
> Only these PNG image types can be converted (at least theoretically) without 
> recompressing the image data:
>  - Grayscale
>  - Truecolor (i.e. RGB 8-Bit/16-Bit)
>  - Indexed
> As soon as transparency is used it gets difficult:
>  - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
> the image data stream, as they are stored as (Gray,Alpha) or 
> (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
> the SMASK image. At the moment you can only read and recompress it using the 
> LosslessFactory.
>  - Indexed with alpha: Alpha and color tables are separate in the PNG, so 
> it should be possible to build a grayscale SMASK from the image data (which 
> are just the table indices) and the alpha table. Tried that, but Acrobat 
> Reader does not like indexed SMASKs… One could just build a grayscale SMASK 
> using the alpha table and the decompressed image index data. This would at 
> least save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly. 
> The other tests for grayscale and indexed fail. (You must place the zipped 
> images in the resources folder where png.png resides to run the test drivers. 
> These images are „original“ work done by me using Gimp, Krita and ImageOptim 
> (on macOS) to build the different PNG image types.)
> Notes for the current patch:
>  - The grayscale images have the wrong gamma curve. I tried using the 
> ColorSpace.CS_GRAY ICC profile and the image se

[jira] [Comment Edited] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663396#comment-16663396
 ] 

Emmeran Seehuber edited comment on PDFBOX-4341 at 10/25/18 7:46 AM:


Sorry, I won't have time in the next few days to look into this. But this 
result does not surprise me; the images are far from being perfectly compressed:

!image-2018-10-25-09-29-47-251.png|height=200!

Using special tools like [ImageOptim|https://github.com/ImageOptim/ImageOptim], 
which uses four different PNG crunchers under the hood, you can get the optimum 
compression. You just have to press the "Repeat" button multiple times, until 
you get a cross instead of the checkmark icon. And of course this takes time. 
I've attached [^optimized.zip] with the images optimized.

The main benefit of this patch is embedding speed, as it will (if possible) 
embed the image as is. If the image is perfectly optimized then you get the 
perfectly optimized image in the PDF, and if the image is poorly compressed you 
get exactly that.

The LosslessFactory-predictor based encoder does a good "baseline" job with 
compression. It won't be able to reach the compression possible with ImageOptim 
and such tools, but it will also not take ages.

The thing to test on the patch here is not that it compresses the image better 
than the LosslessFactory, because it does not compress the image at all. It's 
rather about embedding speed and correctness. If the PNGConverter converts a 
PNG image it still should stay the same and be correct.

I also think the idea of PDImageXObject.createFromByteArray() is to save 
loading and recompression, isn't it?

Regarding downloading govdocs: Instead of streaming them you should just 
download them first. I think you hit a server connection timeout. 

I'll try to put together a benchmark to better show the benefits, but as I 
already wrote I won't have time in the next few days.




[jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663396#comment-16663396
 ] 

Emmeran Seehuber commented on PDFBOX-4341:
--

Sorry, I won't have time in the next few days to look into this. But this result does 
not surprise me; the images are far from being perfectly compressed:

!image-2018-10-25-09-29-47-251.png|height=200!

Using special tools like [ImageOptim|https://github.com/ImageOptim/ImageOptim], 
which uses 4 different PNG crunchers under the hood, you can get the optimum 
compression. You just have to press the "Repeat" button multiple times, until 
you get a cross instead of the checkmark icon. And of course this takes time. 
I've attached [^optimized.zip] with the images optimized.

The main benefit of this patch is embedding speed, as it will (if possible) 
embed the image as is. If the image is perfectly optimized then you get the 
perfectly optimized image in the PDF, and if the image is poorly compressed you 
get exactly that.

The LosslessFactory-predictor based encoder does a good "baseline" job with 
compression. It won't be able to reach the compression possible with ImageOptim 
and such tools, but it will also not take ages.

The thing to test on the patch here is not that it compresses the image better 
than the LosslessFactory, because it does not compress the image at all. It's 
rather about embedding speed and correctness. If the PNGConverter converts a 
PNG image it still should stay the same and be correct.
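
A correctness check of this kind can be as simple as comparing the image decoded from the PDF with the image ImageIO reads from the original PNG, pixel by pixel. The following is only a minimal sketch, assuming both sides are available as BufferedImage; the class and method names are made up for illustration and are not part of the patch.

```java
import java.awt.image.BufferedImage;

/** Hypothetical test helper, not part of the patch. */
final class ImageCompareSketch {

    /**
     * Counts the pixels whose ARGB value differs between the two images.
     * Returns -1 if the dimensions do not match.
     */
    static long countDifferingPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        long differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                // getRGB() returns the pixel as a default (sRGB) ARGB value
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }
}
```

A result of 0 means the embedded image decodes to exactly the same pixels; small color space mismatches show up as a non-zero count.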

I also think the idea of PDImageXObject.createFromByteArray() is to save 
loading and recompression, isn't it?

Regarding downloading govdocs: Instead of streaming them you should just 
download them first. I think you hit a server connection timeout. 

I'll try to put together a benchmark to better show the benefits, but as I already 
wrote, I won't have time in the next few days.

[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-25 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: optimized.zip

[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-25 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: image-2018-10-25-09-29-47-251.png

[jira] [Comment Edited] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-21 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658346#comment-16658346
 ] 

Emmeran Seehuber edited comment on PDFBOX-4341 at 10/22/18 5:53 AM:


No problem, I know there is also always that strange thing called "real life" :)
 * PDCalRGB: setMatrix(): fixed it using getValues()
 * Intent: You are right, I missed that completely; the values in the PDF are 
of course different from the integers in the PNG... To be honest, the Intent is 
only relevant for PDF readers/printers that try to output the image on devices 
whose gamut does not cover the full sRGB gamut (nowadays mostly laser 
printers; ink printers can usually handle sRGB without a problem). And as soon 
as an ICC profile is specified, at least the Java color management uses the 
intent defined in the ICC profile. You have no other option than to patch the 
ICC profile bytes if you want a different intent from the one in the ICC profile 
in Java... But setting the rendering intent on the image should not hurt in any 
way. 
 * The "black" indexed image was a "stale PDImageXObject state" bug, because it 
has to reread the image/colorspace as soon as you call setColorSpace() on it. 
I.e. the bug was that PDImageXObject.setColorSpace() has to clear the 
colorSpace and cachedImage fields. I've added this to 
PDImageXObject.setColorSpace(). You can't just assign the colorspace parameter to 
the colorSpace field, as the PDColorSpace used for writing does not have all the 
state initialized that is needed to read an image...
 * I stripped down the patch and fixed some bugs on the way ... the usual "how 
could that work in the first place?!" stuff :)

 ** TrueColor/sRGB without transparency works.
 ** When a gamma value of 1/2.2 is specified it is just ignored, because that's 
the gamma of sRGB.
 ** Indexed images work with and without transparency. I've updated the test 
images and have test files for 1, 2, 4 and 8 bit indexed images with 
transparency. This means that if a user optimizes their PNG image and it has no more 
than 256 color/transparency combinations, it will usually get indexed while 
optimizing, and that will be kept at least for the image data. The SMASK has to 
be rebuilt as a grayscale image, as Acrobat Reader does not like indexed 
SMASKs. But this will usually still be smaller and faster than recompressing 
the image.
 ** So "only" grayscale and grayscale/TrueColor with transparency are not 
handled. For grayscale I have no idea how to map the colorspace, and when 
transparency is combined in the image data we simply cannot map that in the 
PDF without completely recoding it. Also, TrueColor with a chroma chunk but 
without an ICC profile is not handled, as I have no idea how to map that color 
space.
 ** I've updated the image zip, as I had to add some additional test images for 
the indexed image case.
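
The SMASK rebuild for indexed images described in the list above can be sketched roughly as follows. This is not the code from the patch: the class and method names are hypothetical, and it assumes the index data has already been inflated and unfiltered, so that every scanline starts on a byte boundary and carries no filter-type byte. The alpha table stands for the content of the PNG tRNS chunk; per the PNG specification, palette entries without a tRNS entry are fully opaque.

```java
/** Hypothetical helper, not the implementation from the patch. */
final class IndexedSmaskSketch {

    /**
     * Builds one 8-bit gray alpha sample per pixel from packed palette indices.
     *
     * @param packedRows unfiltered scanlines, each padded to a whole byte
     * @param bitDepth   1, 2, 4 or 8 bits per palette index
     * @param alphaTable alpha per palette index (tRNS content), may be shorter
     *                   than the palette
     */
    static byte[] buildMask(byte[] packedRows, int width, int height,
                            int bitDepth, byte[] alphaTable) {
        int bytesPerRow = (width * bitDepth + 7) / 8;
        int indexMask = (1 << bitDepth) - 1;
        byte[] mask = new byte[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int bitPos = x * bitDepth;
                int packed = packedRows[y * bytesPerRow + bitPos / 8] & 0xFF;
                // PNG stores the leftmost pixel in the high-order bits
                int shift = 8 - bitDepth - (bitPos % 8);
                int index = (packed >> shift) & indexMask;
                // indices beyond the alpha table are fully opaque
                mask[y * width + x] =
                        index < alphaTable.length ? alphaTable[index] : (byte) 0xFF;
            }
        }
        return mask;
    }
}
```

The resulting byte array would then have to be compressed and attached as a DeviceGray soft mask, while the original indexed image data can stay untouched.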

With digitalcorpora, do you mean the govdocs files? I can run a test over these, but 
I can't yet say when I will have time for it. 


was (Author: rototor):
No problem, I know there is also always that strange thing called "real life" :)
 * PDCalRGB: setMatrix(): fixed it using getValues()
 * Intent: You are right, I missed that complete, the values in the PDF are of 
course different than the integers in the PNG... To be honest, the Intent is 
only relevant for PDF readers/printers who try to output the image on devices 
whose gamut has not the full gamut of sRGB in it (nowadays mostly laser 
printers, ink printers can handle sRGB usually without a problem). And as soon 
as an ICC profile is specified at least the Java color management uses the 
intent defined in the ICC profile.You have no other chance than to patch the 
ICC profile bytes if you want an different intent as the one in the ICC profile 
in Java... But setting the rendering intent on the image should not hurt in any 
way. 
 * The "black" indexed image was a "stale PDImageXObject state" bug, because it 
has to reread the image/colorspace as soon as you call setColorSpace() on it. 
I.e. the bug was that PDImageXObject.setColorSpace() has to clear the 
colorSpace and cachedImage field. I've added this to 
PDImageXObject.setColorSpace(). You can't just set the  colorspace parameter to 
the colorSpace field, as the PDColorSpace used for writing has not all state 
initialized that is needed to read an image...
 * I stripped down the patch and fixed some bugs on the way ... the usual "how 
could that work in the first place?!" stuff :)

 ** TrueColor/sRGB without transparency works
 ** When a gamma value of 1/2.2 is specified it is just ignored, because thats 
the gamma of sRGB.
 ** Indexed images work with and without transparency. I've updated the test 
images and have test files for 1, 2, 4 and 8 bit indexed images with 
transparency. This means if a user optimizes his PNG image and it has not more 
then 256 color/transparent combinatio

[jira] [Commented] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-21 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658346#comment-16658346
 ] 

Emmeran Seehuber commented on PDFBOX-4341:
--

No problem, I know there is also always that strange thing called "real life" :)
 * PDCalRGB: setMatrix(): fixed it using getValues()
 * Intent: You are right, I missed that completely; the values in the PDF are 
of course different from the integers in the PNG... To be honest, the Intent is 
only relevant for PDF readers/printers that try to output the image on devices 
whose gamut does not cover the full sRGB gamut (nowadays mostly laser 
printers; ink printers can usually handle sRGB without a problem). And as soon 
as an ICC profile is specified, at least the Java color management uses the 
intent defined in the ICC profile. You have no other option than to patch the 
ICC profile bytes if you want a different intent from the one in the ICC profile 
in Java... But setting the rendering intent on the image should not hurt in any 
way. 
 * The "black" indexed image was a "stale PDImageXObject state" bug, because it 
has to reread the image/colorspace as soon as you call setColorSpace() on it. 
I.e. the bug was that PDImageXObject.setColorSpace() has to clear the 
colorSpace and cachedImage fields. I've added this to 
PDImageXObject.setColorSpace(). You can't just assign the colorspace parameter to 
the colorSpace field, as the PDColorSpace used for writing does not have all the 
state initialized that is needed to read an image...
 * I stripped down the patch and fixed some bugs on the way ... the usual "how 
could that work in the first place?!" stuff :)

 ** TrueColor/sRGB without transparency works.
 ** When a gamma value of 1/2.2 is specified it is just ignored, because that's 
the gamma of sRGB.
 ** Indexed images work with and without transparency. I've updated the test 
images and have test files for 1, 2, 4 and 8 bit indexed images with 
transparency. This means that if a user optimizes their PNG image and it has no more 
than 256 color/transparency combinations, it will usually get indexed while 
optimizing, and that will be kept at least for the image data. The SMASK has to 
be rebuilt as a grayscale image, as Acrobat Reader does not like indexed 
SMASKs. But this will usually still be smaller and faster than recompressing 
the image.
 ** So "only" grayscale and grayscale/TrueColor with transparency are not 
handled. For grayscale I have no idea how to map the colorspace, and when 
transparency is combined in the image data we simply cannot map that in the 
PDF without completely recoding it. Also, TrueColor with a chroma chunk but 
without an ICC profile is not handled, as I have no idea how to map that color 
space.
 ** I've updated the image zip, as I had to add some additional test images for 
the indexed image case.
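
The "stale state" problem from the list above follows a general pattern: a setter that changes the underlying data must also drop everything that was lazily derived from it. A minimal sketch of that pattern, with plain strings standing in for the real PDFBox objects (the class and field names here are illustrative assumptions, not the actual PDImageXObject code):

```java
/** Illustrative stand-in, not the real PDImageXObject. */
final class StaleCacheSketch {
    private String colorSpaceEntry;   // stands in for the /ColorSpace dictionary entry
    private String parsedColorSpace;  // lazily derived state needed for reading
    private String cachedImage;       // lazily decoded image

    StaleCacheSketch(String colorSpaceEntry) {
        this.colorSpaceEntry = colorSpaceEntry;
    }

    void setColorSpace(String newEntry) {
        colorSpaceEntry = newEntry;
        // Drop the derived state so the next read starts from the new entry.
        // Without these two lines getImage() keeps returning the old result.
        parsedColorSpace = null;
        cachedImage = null;
    }

    String getImage() {
        if (parsedColorSpace == null) {
            parsedColorSpace = "parsed:" + colorSpaceEntry;
        }
        if (cachedImage == null) {
            cachedImage = "decoded with " + parsedColorSpace;
        }
        return cachedImage;
    }
}
```

Clearing the fields instead of assigning the passed object matches the reasoning in the comment: the object used for writing may not carry the state a reader needs, so it is re-created on the next read.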

With digitalcorpora, do you mean the govdocs files? I can run a test over these, but 
I can't yet say when I will have time for it. 

[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-21 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: (was: pngconvert_testimg.zip)

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> ---
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.12
> Reporter: Emmeran Seehuber
> Priority: Minor
> Attachments: pngconvert_testimg.zip, pngconvert_v1.patch, 
> pngconvert_v2.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It 
> tries to create a PDImageXObject from the chunks of a PNG image, without 
> recompressing it. This allows using programs like pngcrush and friends to 
> embed optimally compressed images. It’s also way faster than recompressing 
> the image.
> The class PNGConverter does this in three steps:
>  - Parsing the PNG chunk structure from the byte array
>  - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
> are not needed (e.g. text chunks) are not validated.
>  - Constructing a PDImageXObject from the chunks
> When an error occurs at any of these steps or the converter detects that it is 
> not possible to map the image, it will bail out and return null. In this case 
> the image has to be embedded the „normal“ way by reading it using ImageIO and 
> compressing it again.
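
The first two steps listed above (parsing the chunk structure, then checking the CRC of the relevant chunks) can be sketched with the JDK alone. This is not the PNGConverter from the patch: the class name, the method names and the list of chunk types treated as relevant are assumptions made for illustration. Like the converter described above, the sketch returns null when the structure is broken, so a caller can fall back to ImageIO.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper, not the PNGConverter class from the patch. */
final class PngChunkSketch {
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    // Assumed set of chunks that matter for the conversion; text chunks are skipped.
    static final List<String> RELEVANT = Arrays.asList(
            "IHDR", "PLTE", "IDAT", "tRNS", "gAMA", "cHRM", "sRGB", "iCCP", "IEND");

    static final class Chunk {
        final String type;
        final byte[] data;
        final int storedCrc;

        Chunk(String type, byte[] data, int storedCrc) {
            this.type = type;
            this.data = data;
            this.storedCrc = storedCrc;
        }

        /** The PNG CRC covers the four type bytes followed by the payload. */
        boolean crcMatches() {
            CRC32 crc = new CRC32();
            crc.update(type.getBytes(StandardCharsets.US_ASCII));
            crc.update(data);
            return (int) crc.getValue() == storedCrc;
        }
    }

    /** Step 1: returns the chunk list, or null if the structure is broken. */
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length
                || !Arrays.equals(Arrays.copyOf(png, SIGNATURE.length), SIGNATURE)) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, like PNG
        buf.position(SIGNATURE.length);
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 12) { // length + type + CRC
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining() - 8) {
                return null;
            }
            byte[] type = new byte[4];
            buf.get(type);
            byte[] data = new byte[length];
            buf.get(data);
            chunks.add(new Chunk(new String(type, StandardCharsets.US_ASCII), data, buf.getInt()));
        }
        return buf.hasRemaining() ? null : chunks;
    }

    /** Step 2: only the chunks needed for the conversion are CRC-checked. */
    static boolean relevantChunksValid(List<Chunk> chunks) {
        for (Chunk chunk : chunks) {
            if (RELEVANT.contains(chunk.type) && !chunk.crcMatches()) {
                return false;
            }
        }
        return true;
    }

    /** Demo helper: serialises one chunk with a correct CRC. */
    static byte[] encodeChunk(String type, byte[] data) {
        byte[] typeBytes = type.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(typeBytes);
        crc.update(data);
        return ByteBuffer.allocate(12 + data.length)
                .putInt(data.length).put(typeBytes).put(data)
                .putInt((int) crc.getValue()).array();
    }
}
```

The third step, building the PDImageXObject from the validated chunks, depends on PDFBox and is left out here.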
> Only these PNG image types can be converted (at least theoretically) without 
> recompressing the image data:
>  - Grayscale
>  - Truecolor (i.e. RGB 8-Bit/16-Bit)
>  - Indexed
> As soon as transparency is used it gets difficult:
>  - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
> the image data stream, as they are stored as (Gray,Alpha) or 
> (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
> the SMASK-Image. At this moment you can just read and recompress it using the 
> LosslessFactory.
>  - Indexed with alpha: Alpha and color tables are separate in the PNG, so 
> it should be possible to build a grayscale SMASK from the image data (which 
> are just the table indices) and the alpha table. Tried that, but Acrobat 
> Reader does not like indexed SMASKs… One could just build a grayscale SMASK 
> using the alpha table and the decompressed image index data. This would at 
> least save some space, as the optimized indexed image data is still used.
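
To make the first case above concrete: with (Gray,Alpha) or (Red,Green,Blue,Alpha) tuples, the alpha samples are interleaved with the color samples, so they can only be pulled apart on decompressed, unfiltered data. A minimal sketch for 8-bit samples follows; the class and method names are hypothetical, and 16-bit samples would need two bytes per channel.

```java
/** Hypothetical helper for 8-bit samples, not part of the patch. */
final class AlphaSplitSketch {

    /**
     * Splits interleaved samples into color and alpha.
     *
     * @param interleaved   decompressed, unfiltered samples
     * @param colorChannels 1 for (Gray,Alpha), 3 for (Red,Green,Blue,Alpha)
     * @return two arrays: index 0 holds the color samples, index 1 the alpha samples
     */
    static byte[][] split(byte[] interleaved, int colorChannels) {
        int stride = colorChannels + 1;
        if (interleaved.length % stride != 0) {
            throw new IllegalArgumentException("sample count does not fit the tuple size");
        }
        int pixels = interleaved.length / stride;
        byte[] color = new byte[pixels * colorChannels];
        byte[] alpha = new byte[pixels];
        for (int p = 0; p < pixels; p++) {
            System.arraycopy(interleaved, p * stride, color, p * colorChannels, colorChannels);
            alpha[p] = interleaved[p * stride + colorChannels];
        }
        return new byte[][] {color, alpha};
    }
}
```

Both resulting arrays then have to be compressed again, which is why the original IDAT data cannot be reused for these image types.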
> With the current patch only truecolor without alpha images work correctly. 
> The other tests for grayscale and indexed fail. (You must place the zipped 
> images in the resources folder where png.png resides to run the test drivers. 
> These images are „original“ work done by me using Gimp, Krita and ImageOptim 
> (on macOS) to build the different png image types.)
> Notes for the current patch:
>  - The grayscale images have the wrong gamma curve. I tried using the 
> ColorSpace.CS_GRAY ICC profile and the image seems now only „slightly“ off 
> (i.e. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given the 
> image is tagged with a CalGray profile, but the colors are way more off then.
>  - The cHRM (chroma) chunk is read and *should* work, as I used the formulas 
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and 
> matrix. I have not yet tested this, as I have no test image with cHRM at the 
> moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric 
> matrices. But these methods are wrong for any other kind of matrix (i.e. color 
> transform matrices), as they only store/restore 6 values of the 3x3 matrix. I 
> deprecated PDCalRGB.setMatrix(Matrix) because of this, as this was never 
> working and can not work as long as the Matrix class is for geometric use 
> cases only. This should also be documented on the Matrix class, that it is 
> not general purpose. I added a PDCalRGB.setMatrix(COSArray) method to allow 
> to set the matrix.
>  - The indexed image displays fine in Acrobat Reader, but the test driver 
> fails as PDImageXObject.getImage() returns a completely black (everything 0) 
> image. Strange, I suspect some error in the PDFBox image decoding.
>  - If an image is tagged with sRGB, the builtin Java sRGB ICC profile is 
> attached. Theoretically you can use a CalRGB colorspace, but using an ICC 
> color profile is likely faster (at least in PDFBox) and more „standard“.
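
The Matrix remark in the notes above can be illustrated without PDFBox. A geometric transformation matrix in PDF is written as six numbers [a b c d e f], and the third column is implied as (0, 0, 1). A color transform matrix needs all nine entries, so a round trip through the six-value form silently replaces its third column. The sketch below uses plain float arrays; the class and method names are hypothetical and this is not the PDFBox Matrix class.

```java
/** Plain-array illustration, not the PDFBox Matrix class. */
final class AffineRoundTripSketch {

    /** Keeps only a, b, c, d, e, f of [[a b 0] [c d 0] [e f 1]]. */
    static float[] toSixValues(float[][] m) {
        return new float[] {m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]};
    }

    /** Rebuilds a 3x3 matrix; the third column is always (0, 0, 1). */
    static float[][] fromSixValues(float[] v) {
        return new float[][] {{v[0], v[1], 0f}, {v[2], v[3], 0f}, {v[4], v[5], 1f}};
    }

    /** Row-major flattening that keeps all nine entries. */
    static float[] toNineValues(float[][] m) {
        float[] out = new float[9];
        for (int row = 0; row < 3; row++) {
            System.arraycopy(m[row], 0, out, row * 3, 3);
        }
        return out;
    }
}
```

For a geometric matrix the six-value round trip is lossless; for any matrix whose third column is not (0, 0, 1), such as a CalRGB matrix, three of the nine values are lost, which is why a setter taking the full nine-element array is needed.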
> You can also look at this patch on GitHub 
> [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
>  if you like.
> It would be nice if someone could give me some hints with the colorspace 
> problems. I will try to reread the specs again, maybe I have missed 
> something. But it would be grea

[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-21 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: pngconvert_testimg.zip

> [Patch] PNGConverter: PNG bytes to PDImageXObject converter
> ---
>
> Key: PDFBOX-4341
> URL: https://issues.apache.org/jira/browse/PDFBOX-4341
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.12
> Reporter: Emmeran Seehuber
> Priority: Minor
> Attachments: pngconvert_testimg.zip, pngconvert_v1.patch, 
> pngconvert_v2.patch
>
>
> The attached patch implements a PNG bytes to PDImageXObject converter. It 
> tries to create a PDImageXObject from the chunks of a PNG image, without 
> recompressing it. This allows using programs like pngcrush and friends to 
> embed optimally compressed images. It’s also way faster than recompressing 
> the image.
> The class PNGConverter does this in three steps:
>  - Parsing the PNG chunk structure from the byte array
>  - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
> are not needed (e.g. text chunks) are not validated.
>  - Constructing a PDImageXObject from the chunks
> When an error occurs at any of these steps or the converter detects that it is 
> not possible to map the image, it will bail out and return null. In this case 
> the image has to be embedded the „normal“ way by reading it using ImageIO and 
> compressing it again.
> Only these PNG image types can be converted (at least theoretically) without 
> recompressing the image data:
>  - Grayscale
>  - Truecolor (i.e. RGB 8-Bit/16-Bit)
>  - Indexed
> As soon as transparency is used it gets difficult:
>  - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
> the image data stream, as the pixels are stored as (Gray,Alpha) or 
> (Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
> the SMASK image. At the moment you can only read and recompress it using the 
> LosslessFactory.
>  - Indexed with alpha: Alpha and color tables are separate in the PNG, so 
> it should be possible to build a grayscale SMASK from the image data (which 
> are just the table indices) and the alpha table. I tried that, but Acrobat 
> Reader does not like indexed SMASKs… One could just build a grayscale SMASK 
> using the alpha table and the decompressed image index data. This would at 
> least save some space, as the optimized indexed image data is still used.
> With the current patch only truecolor images without alpha work correctly. 
> The other tests for grayscale and indexed fail. (You must place the zipped 
> images in the resources folder where png.png resides to run the test drivers. 
> These images are "original" work done by me using Gimp, Krita and ImageOptim 
> (on macOS) to build the different PNG image types.)
> Notes for the current patch:
>  - The grayscale images have the wrong gamma curve. I tried using the 
> ColorSpace.CS_GRAY ICC profile and the image now seems only "slightly" off 
> (e.g. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given, the 
> image is tagged with a CalGray profile, but then the colors are way more off.
>  - The cHRM (chroma) chunk is read and *should* work, as I used the formulas 
> from the PDF spec to convert the cHRM values to the CalRGB whitepoint and 
> matrix. I have not yet tested this, as I have no test image with cHRM at the 
> moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric 
> matrices. But these methods are wrong for any other kind of matrix (e.g. color 
> transform matrices), as they only store/restore 6 values of the 3x3 matrix. I 
> deprecated PDCalRGB.setMatrix(Matrix) because of this, as it never worked and 
> cannot work as long as the Matrix class is for geometric use cases only. The 
> Matrix class should also document that it is not general purpose. I added a 
> PDCalRGB.setMatrix(COSArray) method to allow setting the matrix.
>  - The indexed image displays fine in Acrobat Reader, but the test driver 
> fails as PDImageXObject.getImage() returns a completely black (everything 0) 
> image. Strange, I suspect some error in the PDFBox image decoding.
>  - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is 
> attached. Theoretically you can use a CalRGB colorspace, but using an ICC 
> color profile is likely faster (at least in PDFBox) and more "standard".
> You can also look at this patch on GitHub 
> [https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
>  if you like.
> It would be nice if someone could give me some hints with the colorspace 
> problems. I will try to reread the specs again, maybe I have missed 
> something. But it would be great if someon

[jira] [Updated] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-21 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4341:
-
Attachment: pngconvert_v2.patch


[jira] [Created] (PDFBOX-4341) [Patch] PNGConverter: PNG bytes to PDImageXObject converter

2018-10-14 Thread Emmeran Seehuber (JIRA)
Emmeran Seehuber created PDFBOX-4341:


 Summary: [Patch] PNGConverter: PNG bytes to PDImageXObject 
converter
 Key: PDFBOX-4341
 URL: https://issues.apache.org/jira/browse/PDFBOX-4341
 Project: PDFBox
  Issue Type: Improvement
  Components: Writing
Affects Versions: 2.0.12
Reporter: Emmeran Seehuber
 Attachments: pngconvert_testimg.zip, pngconvert_v1.patch

The attached patch implements a PNG bytes to PDImageXObject converter. It tries 
to create a PDImageXObject from the chunks of a PNG image, without 
recompressing it. This allows using programs like pngcrush and friends to 
embed optimally compressed images. It’s also way faster than recompressing the 
image.

The class PNGConverter does this in three steps:
 - Parsing the PNG chunk structure from the byte array
 - Validating all relevant data chunks (i.e. checking the CRC). Chunks which 
are not needed (e.g. text chunks) are not validated.
 - Constructing a PDImageXObject from the chunks
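The first two steps above can be sketched with the JDK alone. This is only an illustration under my own assumptions, not the PNGConverter code from the patch: the class name PngChunkSketch and its methods are hypothetical. It relies on the PNG chunk layout (4 byte big-endian length, 4 byte type, data, 4 byte CRC computed over type and data) and, like the converter described above, returns null when the structure cannot be parsed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not the PNGConverter class from the patch.
final class PngChunkSketch {
    static final byte[] SIGNATURE = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    static final class Chunk {
        final String type;
        final byte[] data;
        final long storedCrc;

        Chunk(String type, byte[] data, long storedCrc) {
            this.type = type;
            this.data = data;
            this.storedCrc = storedCrc;
        }

        // The PNG CRC covers the chunk type and the chunk data, not the length field.
        boolean crcOk() {
            CRC32 crc = new CRC32();
            crc.update(type.getBytes(StandardCharsets.ISO_8859_1));
            crc.update(data);
            return crc.getValue() == storedCrc;
        }
    }

    // Returns the chunks in file order, or null if the bytes are not a well-formed PNG.
    static List<Chunk> parse(byte[] png) {
        if (png.length < SIGNATURE.length) {
            return null;
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (png[i] != SIGNATURE[i]) {
                return null;
            }
        }
        // ByteBuffer reads big-endian by default, which matches PNG.
        ByteBuffer buf = ByteBuffer.wrap(png, SIGNATURE.length, png.length - SIGNATURE.length);
        List<Chunk> chunks = new ArrayList<>();
        while (buf.remaining() >= 12) {
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining() - 8) {
                return null; // declared length runs past the end of the data
            }
            byte[] type = new byte[4];
            buf.get(type);
            byte[] data = new byte[length];
            buf.get(data);
            long storedCrc = buf.getInt() & 0xFFFFFFFFL;
            chunks.add(new Chunk(new String(type, StandardCharsets.ISO_8859_1), data, storedCrc));
        }
        return buf.hasRemaining() ? null : chunks;
    }
}
```

A caller would check crcOk() only on the chunks it actually needs (IHDR, PLTE, IDAT and so on) and skip it for text chunks, as described in the second step.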

When an error occurs in any of these steps, or the converter detects that it is 
not possible to map the image, it will bail out and return null. In this case 
the image has to be embedded the "normal" way by reading it using ImageIO and 
compressing it again.

Only these PNG image types can be converted (at least theoretically) without 
recompressing the image data:
 - Grayscale
 - Truecolor (i.e. RGB 8-Bit/16-Bit)
 - Indexed

As soon as transparency is used it gets difficult:
 - Grayscale with alpha / truecolor with alpha: The alpha channel is saved in 
the image data stream, as the pixels are stored as (Gray,Alpha) or 
(Red,Green,Blue,Alpha) tuples. You have to separate the alpha information for 
the SMASK image. At the moment you can only read and recompress it using the 
LosslessFactory.
 - Indexed with alpha: Alpha and color tables are separate in the PNG, so it 
should be possible to build a grayscale SMASK from the image data (which are 
just the table indices) and the alpha table. I tried that, but Acrobat Reader 
does not like indexed SMASKs… One could just build a grayscale SMASK using the 
alpha table and the decompressed image index data. This would at least save 
some space, as the optimized indexed image data is still used.
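To make the first point concrete, here is a small sketch of the separation step. It assumes the samples are already inflated and unfiltered 8 bit values, which is exactly why the compressed IDAT data cannot be reused in this case. The class name AlphaSplitSketch is made up for this example and is not part of the patch.

```java
// Hypothetical helper for illustration; works on decoded 8-bit samples only.
final class AlphaSplitSketch {
    // Splits interleaved (color..., alpha) samples into { colorSamples, alphaSamples }.
    // colorComponents is 1 for (Gray,Alpha) and 3 for (Red,Green,Blue,Alpha).
    static byte[][] split(byte[] interleaved, int colorComponents) {
        int stride = colorComponents + 1;
        if (interleaved.length % stride != 0) {
            throw new IllegalArgumentException("sample count is not a multiple of " + stride);
        }
        int pixels = interleaved.length / stride;
        byte[] color = new byte[pixels * colorComponents];
        byte[] alpha = new byte[pixels];
        for (int p = 0; p < pixels; p++) {
            System.arraycopy(interleaved, p * stride, color, p * colorComponents, colorComponents);
            alpha[p] = interleaved[p * stride + colorComponents];
        }
        return new byte[][] {color, alpha};
    }
}
```

The color array would then go into the image stream and the alpha array into a grayscale SMASK image, both of which have to be compressed again.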

With the current patch only truecolor images without alpha work correctly. The 
other tests for grayscale and indexed fail. (You must place the zipped images 
in the resources folder where png.png resides to run the test drivers. These 
images are "original" work done by me using Gimp, Krita and ImageOptim (on 
macOS) to build the different PNG image types.)

Notes for the current patch:
 - The grayscale images have the wrong gamma curve. I tried using the 
ColorSpace.CS_GRAY ICC profile and the image now seems only "slightly" off 
(e.g. pixel value FFD6D6D6 vs FFD7D7D7). As soon as a gAMA chunk is given, the 
image is tagged with a CalGray profile, but then the colors are way more off.
 - The cHRM (chroma) chunk is read and *should* work, as I used the formulas 
from the PDF spec to convert the cHRM values to the CalRGB whitepoint and 
matrix. I have not yet tested this, as I have no test image with cHRM at the 
moment. Note: Matrix(COSArray) and Matrix.toCOSArray() are fine for geometric 
matrices. But these methods are wrong for any other kind of matrix (e.g. color 
transform matrices), as they only store/restore 6 values of the 3x3 matrix. I 
deprecated PDCalRGB.setMatrix(Matrix) because of this, as it never worked and 
cannot work as long as the Matrix class is for geometric use cases only. The 
Matrix class should also document that it is not general purpose. I added a 
PDCalRGB.setMatrix(COSArray) method to allow setting the matrix.
 - The indexed image displays fine in Acrobat Reader, but the test driver fails 
as PDImageXObject.getImage() returns a completely black (everything 0) image. 
Strange, I suspect some error in the PDFBox image decoding.
 - If an image is tagged with sRGB, the built-in Java sRGB ICC profile is 
attached. Theoretically you can use a CalRGB colorspace, but using an ICC color 
profile is likely faster (at least in PDFBox) and more "standard".
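The Matrix note above is easier to see with numbers. The sketch below does not use the PDFBox Matrix class. It only mimics the PDF convention of writing the geometric matrix [a b 0; c d 0; e f 1] as the six values [a b c d e f], with hypothetical names (MatrixRoundTripSketch, toSix, fromSix). A geometric matrix survives the round trip, while a full 3x3 color matrix loses its third column.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not the PDFBox Matrix class.
final class MatrixRoundTripSketch {
    // Keeps only the six geometric entries a, b, c, d, e, f of a row-major 3x3 matrix.
    static float[] toSix(float[][] m) {
        return new float[] {m[0][0], m[0][1], m[1][0], m[1][1], m[2][0], m[2][1]};
    }

    // Rebuilds a 3x3 matrix, assuming the third column is the fixed (0, 0, 1).
    static float[][] fromSix(float[] s) {
        return new float[][] {
            {s[0], s[1], 0f},
            {s[2], s[3], 0f},
            {s[4], s[5], 1f}
        };
    }

    // True only if no information is lost when going through the six value form.
    static boolean survivesRoundTrip(float[][] m) {
        return Arrays.deepEquals(m, fromSix(toSix(m)));
    }
}
```

For a CalRGB matrix all nine values are needed, so the setter has to take the values as a nine element array instead.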

You can also look at this patch on GitHub 
[https://github.com/apache/pdfbox/compare/2.0...rototor:2.0-png-from-bytes-encoder?expand=1]
 if you like.

It would be nice if someone could give me some hints with the colorspace 
problems. I will try to reread the specs again, maybe I have missed something. 
But it would be great if someone else who has an idea about colorspaces could 
also take a look into this.

As I have no idea how long it will take to understand why the colors are off 
for grayscale and wrong for indexed, I could prepare a stripped-down version of 
this patch, which only contains the working parts (i.e. truecolor) and simply 
does nothing in the cases that do not work yet. What do you think?




[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-20 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621926#comment-16621926
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

[~tilman] Well, checkIdent() works perfectly fine for me on Mac OS X with 
JDK 10.0.2. But converting between color spaces can always be lossy if you are 
not converting into a bigger color space (e.g. into 16 bit ProPhoto etc.), as 
you may e.g. get clipping when not all source color values can be mapped 1:1 
into the destination color space. No idea if something was fixed in LCMS in 
JDK 10.0.2 so that it works better than in whatever JDK you used...

I want to implement a "getRawImage()" method similar to the "getImage()" method 
in the PDImageXObject. Directly comparing the "raw" pixel values would allow a 
test which would never fail. I started a branch with some changes for that 
some months ago, but have not had time to finish it yet... That would also be 
something for a new ticket.
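A sketch of what such a raw comparison could look like, using only java.awt.image. getRawImage() is just the proposal from this comment, so the example works on two plain BufferedImage instances. The class and method names (RawCompareSketch, sameSamples) are my own assumptions.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.util.Arrays;

// Hypothetical helper for illustration; compares stored samples, not converted colors.
final class RawCompareSketch {
    // True if both images have the same size, band count and identical raster samples.
    static boolean sameSamples(BufferedImage a, BufferedImage b) {
        Raster ra = a.getRaster();
        Raster rb = b.getRaster();
        if (ra.getWidth() != rb.getWidth()
                || ra.getHeight() != rb.getHeight()
                || ra.getNumBands() != rb.getNumBands()) {
            return false;
        }
        int[] rowA = null;
        int[] rowB = null;
        for (int y = 0; y < ra.getHeight(); y++) {
            // getPixels reads the samples as stored, without any color space conversion.
            rowA = ra.getPixels(0, y, ra.getWidth(), 1, rowA);
            rowB = rb.getPixels(0, y, rb.getWidth(), 1, rowB);
            if (!Arrays.equals(rowA, rowB)) {
                return false;
            }
        }
        return true;
    }
}
```

Because no color management is involved, the result does not depend on the LCMS version of the JDK.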

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch, 
> fix_profile_use3.patch, fix_profile_use4.patch, images.zip, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. For example, you could use 
> PNG predictors to get better compression (by adding a COSName.DECODE_PARMS 
> entry with COSName.PREDICTOR == 15 and encoding the image rows as PNG). But 
> this is something for a later patch. It would also need another API, as there 
> is a trade-off between speed and compression ratio. 
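As an illustration of the predictor idea mentioned in the description: with a PNG predictor every row in the Flate stream starts with a filter type byte, followed by the filtered samples. The sketch below implements only the "Sub" filter (type 1) and its inverse. It is not the encoder from the later patches, and the class name PngSubFilterSketch is hypothetical. For 16 bit RGB, bytesPerPixel would be 6.

```java
// Hypothetical helper for illustration; implements only PNG filter type 1 ("Sub").
final class PngSubFilterSketch {
    // Returns the filter type byte followed by each byte minus the byte one pixel to its left.
    static byte[] encodeRow(byte[] raw, int bytesPerPixel) {
        byte[] out = new byte[raw.length + 1];
        out[0] = 1; // filter type Sub
        for (int i = 0; i < raw.length; i++) {
            int left = i >= bytesPerPixel ? raw[i - bytesPerPixel] & 0xFF : 0;
            out[i + 1] = (byte) ((raw[i] & 0xFF) - left); // arithmetic is modulo 256
        }
        return out;
    }

    // Reverses encodeRow: adds the already reconstructed left neighbour back.
    static byte[] decodeRow(byte[] filtered, int bytesPerPixel) {
        if (filtered.length == 0 || filtered[0] != 1) {
            throw new IllegalArgumentException("only filter type 1 (Sub) is handled here");
        }
        byte[] raw = new byte[filtered.length - 1];
        for (int i = 0; i < raw.length; i++) {
            int left = i >= bytesPerPixel ? raw[i - bytesPerPixel] & 0xFF : 0;
            raw[i] = (byte) ((filtered[i + 1] & 0xFF) + left);
        }
        return raw;
    }
}
```

A smooth gradient such as 10, 20, 30, 40 becomes 10, 10, 10, 10 after filtering, which deflates much better. Choosing the best filter per row costs time, which is the speed vs compression trade-off mentioned in the description.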



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618743#comment-16618743
 ] 

Emmeran Seehuber edited comment on PDFBOX-4184 at 9/18/18 12:31 PM:


[~tilman] If you have an ICC profile on an image, which is not the built-in 
sRGB profile, you need the ICC profile, otherwise you will just have plain 
wrong colors. You should not look at (r,g,b) or (c,m,y,k) as concrete color 
values, but rather as vectors within the color space. Without a profile 
describing the vectorspace/colorspace you have no idea what real colors the 
vector values result in. DeviceRGB is (on screen) often interpreted as sRGB. 
But what DeviceCMYK means is really up to the concrete interpreting device, 
i.e. it will look different on every printer (brightness, color, ...). So 
DeviceCMYK as a colorspace for an image mostly means "random", if you are not 
explicitly targeting one specific printer. 

The ICC profile describes how to transform the color-vector-data into other 
colorspaces, e.g. into sRGB to view on the screen or the concrete ICC profile 
of the printing device. 

If you load images in Java using ImageIO you usually (especially when using 
TwelveMonkeys) get an sRGB image. So you would never hit this code path. If 
you want to load an image with the real color profile of the image you must 
pass a specially prepared (i.e. with the right profile) BufferedImage into 
ImageIO. So you won't get an image with a color space different from sRGB by 
accident.

If you have an image with an ICC profile, you always want the image to be 
written with the ICC profile because you explicitly care about it.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if 
you have more images. The correct solution would be an ICC_Profile <-> 
PDICCBased cache in the document, so that the same profile does not get encoded 
twice. Should I implement such a cache? In my application I manually 
deduplicate the ICC profiles at the moment.
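A minimal sketch of how such a deduplicating cache could look, keyed by the profile bytes. Nothing like this exists in the patch. The name ProfileCacheSketch and the generic value type V are assumptions. In real use the key bytes would come from ICC_Profile.getData() and V would be PDICCBased.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-document cache for illustration; V stands in for PDICCBased.
final class ProfileCacheSketch<V> {
    // ByteBuffer compares and hashes by content, unlike a plain byte[].
    private final Map<ByteBuffer, V> byContent = new HashMap<>();

    // Returns the cached object for these profile bytes, creating it only on first use.
    V getOrCreate(byte[] profileBytes, Function<byte[], V> factory) {
        // Clone so that later changes to the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(profileBytes.clone());
        return byContent.computeIfAbsent(key, k -> factory.apply(profileBytes));
    }

    int size() {
        return byContent.size();
    }
}
```

Keying by content rather than by ICC_Profile identity also catches the case where the same profile was loaded twice from different images.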

The attached patch [^fix_profile_use4.patch] fixes the test driver and also 
specifies an "Alternate" colorspace for the profile, for all those devices 
which cannot handle ICC profiles. With the correct ICC_Profile specified, the 
"roundtrip" sRGB->ISO Coated->sRGB now also works correctly, so the image can 
be compared with the original image.

 



 


[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618743#comment-16618743
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

[~tilman] If you have an ICC profile on an image, which is not the built-in 
sRGB profile, you need the ICC profile, otherwise you will just have plain 
wrong colors. You should not look at (r,g,b) or (c,m,y,k) as concrete color 
values, but rather as vectors within the color space. Without a profile 
describing the vectorspace/colorspace you have no idea what real colors the 
vector values result in. DeviceRGB is (on screen) often interpreted as sRGB. 
But what DeviceCMYK means is really up to the concrete interpreting device, 
i.e. it will look different on every printer (brightness, color, ...). So 
DeviceCMYK as a colorspace for an image mostly means "random", if you are not 
explicitly targeting one specific printer. 

The ICC profile describes how to transform the color-vector-data into other 
colorspaces, e.g. into sRGB to view on the screen or the concrete ICC profile 
of the printing device. 

If you load images in Java using ImageIO you usually (especially when using 
TwelveMonkeys) get an sRGB image. So you would never hit this path. If you 
want to load an image with the real color profile of the image you must pass a 
specially prepared (i.e. with the right profile) BufferedImage into ImageIO. So 
you won't get an image with a color space different from sRGB by accident.

If you have an image with an ICC profile, you always want the image in this 
colorspace with the attached profile, as it is already not so easy to get an 
image in anything other than sRGB.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if 
you have more images. The correct solution would be an ICC_Profile <-> 
PDICCBased cache in the document, so that the same profile does not get encoded 
twice. Should I implement such a cache? In my application I manually 
deduplicate the ICC profiles at the moment.

The attached patch [^fix_profile_use4.patch] fixes the test driver and also 
specifies an "Alternate" colorspace for the profile, for all those devices 
which cannot handle ICC profiles. With the correct ICC_Profile specified, the 
"roundtrip" sRGB->ISO Coated->sRGB now also works correctly, so the image can 
be compared with the original image.

 







[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: fix_profile_use4.patch







[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: fix_profile_use3.patch

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch, 
> fix_profile_use3.patch, images.zip, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>






[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: (was: fix_profile_use2.patch)

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch, 
> images.zip, lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16 bit TYPE_CUSTOM images with DataType == USHORT, but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. For example, you could use PNG 
> encodings to get better compression (by adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff between speed and compression ratio. 
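
As an illustration of the PNG predictor idea mentioned in the description (this is not code from any of the attached patches), here is a minimal sketch of two PNG row filters, assuming 8 bit samples. The class and method names are hypothetical. With a PNG predictor each encoded row is prefixed by a filter-type byte, so an encoder is free to pick a different filter per row.

```java
/** Hypothetical helper, not part of PDFBox: two of the PNG row filters. */
final class PngFilterSketch {
    /** "Sub" filter: store each byte as the difference to the byte one pixel to its left. */
    static byte[] subFilter(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            // Bytes left of the first pixel are treated as zero.
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i] = (byte) ((row[i] & 0xFF) - left); // wraps modulo 256
        }
        return out;
    }

    /** Paeth predictor from the PNG specification: a = left, b = above, c = upper left. */
    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) {
            return a;
        }
        return pb <= pc ? b : c;
    }
}
```

Smooth gradients turn into runs of small, repeating residuals after filtering, which is what lets the subsequent Flate compression do better than on the raw samples.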






[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: fix_profile_use2.patch







[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-17 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617610#comment-16617610
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

I'm in the process of migrating part of my application from iText to 
PDFBox. While doing so I found a bug with images that have an ICC profile: the 
LosslessFactory compresses the ICC profile of an image correctly, but then does 
not use it. This small patch fixes this:

[^fix_profile_use.patch]







[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-17 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: fix_profile_use.patch







[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-22 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551961#comment-16551961
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

The combination colorspace == sRGB && depth == 16 should nearly always be 
false. But I am fine with just adding the colorspace condition.







[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-22 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551951#comment-16551951
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

I would suggest changing the condition from
{code:java}
if (pdImageXObject.getBitsPerComponent() < 16 &&
    image.getWidth() * image.getHeight() <= 50 * 50)
{code}
to
{code:java}
if (pdImageXObject.getColorSpace() == PDDeviceRGB.INSTANCE &&
    image.getWidth() * image.getHeight() <= 50 * 50)
{code}
as otherwise the LosslessFactory may "randomly" destroy or reduce the color 
information of small images. If, for example, the user has a requirement to 
always encode images as CMYK, this would break it. On the other hand, reducing 
a 16 bit sRGB image to 8 bit does not really lose color information, as sRGB is 
a rather small color space. As most image decoders (e.g. TwelveMonkeys) convert 
every image they decode to sRGB by default, you can be sure that the user 
really wants the non-default color space when they give a non-sRGB image to the 
LosslessFactory.
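
The same guard can be expressed on the AWT side, before any PDF object exists. The following is only a rough sketch under the assumption that the color space of the BufferedImage is a reasonable stand-in for the color space of the resulting PDImageXObject; the class and method names are hypothetical and not part of PDFBox.

```java
import java.awt.image.BufferedImage;

/** Hypothetical guard, not PDFBox code: may a small image be re-encoded freely? */
final class SmallImageGuardSketch {
    static boolean isSmallSrgbImage(BufferedImage image) {
        // Only images that are already sRGB are candidates; anything else was
        // presumably given in a non-default color space on purpose.
        boolean srgb = image.getColorModel().getColorSpace().isCS_sRGB();
        // Use long arithmetic so that very large images cannot overflow.
        long pixels = (long) image.getWidth() * image.getHeight();
        return srgb && pixels <= 50 * 50;
    }
}
```

A 10x10 TYPE_INT_RGB image passes this check, while a 10x10 TYPE_BYTE_GRAY image or a 100x100 RGB image does not.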







[jira] [Commented] (PDFBOX-4013) Java 9/macOS: Debugger App does not start (NoSuchMethodException)

2018-07-17 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547450#comment-16547450
 ] 

Emmeran Seehuber commented on PDFBOX-4013:
--

[~tilman] I just tested the current 2.0 branch with JDK 1.7.0_67 and the 
"original" old Apple JDK1.6.0 (i.e. started the PDFDebugger from inside 
IntelliJ). Seems to work as before.

> Java 9/macOS: Debugger App does not start (NoSuchMethodException)
> -
>
> Key: PDFBOX-4013
> URL: https://issues.apache.org/jira/browse/PDFBOX-4013
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.8
>Reporter: Emmeran Seehuber
>Priority: Major
>  Labels: jdk9, mac-os-x
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: pdfdebugger-macos-fixes_v1.patch
>
>
> It seems the debugger app wants to integrate nicely into macOS and uses some 
> private API for this. This worked fine with all Java versions up to and 
> including 8, but no longer works with 9.
> Java 9 provides new APIs for this, but until PDFBox can depend on Java 9 (or 
> the next LTS, Java 11) it should at least catch this and not crash.
> The application does not start, and instead displays a dialog with a stack 
> trace.
> Console Output + StackTrace:
> {code}
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.pdfbox.debugger.ui.OSXAdapter (file:/Users/emmy/Downloads/debugger-app-2.0.7.jar) to constructor com.apple.eawt.Application()
> WARNING: Please consider reporting this to the maintainers of org.apache.pdfbox.debugger.ui.OSXAdapter
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Mac OS X Adapter could not talk to EAWT:
> java.lang.RuntimeException: java.lang.NoSuchMethodException: com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:171)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> Caused by: java.lang.NoSuchMethodException: com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> java.base/java.lang.Class.getDeclaredMethod(Class.java:2432)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:163)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> {code}
> To work around this problem I have to run the debugger app using JDK 8. This 
> is OK for now, but very annoying.






[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-10 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539054#comment-16539054
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

{quote}Most of the time people complain about time. There are almost never 
complaints about size.
{quote}
They will complain about size when they start to write high resolution prepress 
PDFs... There is a difference between a 60 MB PDF and an 80 MB PDF, especially 
if you work with many PDFs and have to upload them to the print shop. If you 
only care about the web with low resolution images, then of course the size is 
not that important. As always, it depends on the use case.
{quote}It is possible in PDF specification, but it hasn't been implemented for 
PDFBox.
{quote}
Do you have a pointer (e.g. a page number in the PDF 1.7 spec) to where the 
indexed image format in PDF is described? I did not find it.
{quote}Try {{PDImageXObject.createFromByteArray()}}. 
{quote}
Ah, I see. So there is already an API for this; it just does not really handle 
PNGs yet (i.e. it loads the PNG as a BufferedImage and does a lossless 
compression, as opposed to directly reusing the already optimized IDAT chunk, 
which of course would also be much faster). I'll try to find some time to 
implement IDAT reuse.
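
Reusing the IDAT data first requires locating the chunks inside the PNG file. A minimal sketch of walking the chunk list with plain JDK classes could look like this; the class name is hypothetical, CRCs are not verified, and malformed or truncated input is not handled gracefully.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not PDFBox code: lists the chunk types of a PNG file. */
final class PngChunkSketch {
    static List<String> chunkTypes(byte[] png) {
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, like PNG
        buf.position(8);                       // skip the 8 byte PNG signature
        List<String> types = new ArrayList<>();
        // Each chunk is: 4 byte length, 4 byte type, <length> data bytes, 4 byte CRC.
        while (buf.remaining() >= 12) {
            int length = buf.getInt();
            byte[] type = new byte[4];
            buf.get(type);
            types.add(new String(type, StandardCharsets.US_ASCII));
            buf.position(buf.position() + length + 4); // skip data and CRC
        }
        return types;
    }
}
```

For a typical file this yields IHDR first, one or more IDAT chunks, and IEND last. According to the PNG specification the concatenated IDAT payloads form a single zlib stream, which is why copying them into a Flate-encoded PDF stream is attractive; whether this is possible for a given file still depends on details such as interlacing and the color type.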







[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-10 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16538311#comment-16538311
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

I did a test with a subset of the govdoc images to get an idea of which estimate 
method might be better. I tested the govdoc ZIPs 0 up to 57. See the attached 
report [^size_compare.txt]

There were 5590 images found: 1719 did not change (i.e. no difference between a 
signed and a Math.abs based estimate), 77 files compressed better with a 
signed estimate, and 3794 files compressed better when using Math.abs in the 
estimate. So I would suggest changing estCompressSum() to

{code}
private static long estCompressSum(byte[] dataRawRowSub)
{
    long sum = 0;
    for (byte aDataRawRowSub : dataRawRowSub)
    {
        sum += Math.abs(aDataRawRowSub);
    }
    return sum;
}
{code}

as this clearly seems to be a win.
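
To see why the absolute sum is the more meaningful estimate, consider a filtered row whose residuals alternate in sign: with a signed sum they cancel out, although such a row is not cheap to compress. The following minimal sketch compares both estimates and picks the candidate row with the smallest absolute sum, which is the heuristic the PNG specification suggests. The class name and pickBest() are hypothetical and not part of the patch.

```java
/** Hypothetical illustration, not the patch itself: signed vs. absolute row estimate. */
final class FilterEstimateSketch {
    /** Signed estimate: positive and negative residuals cancel each other. */
    static long signedSum(byte[] row) {
        long sum = 0;
        for (byte b : row) {
            sum += b;
        }
        return sum;
    }

    /** Absolute estimate: every residual counts by its magnitude. */
    static long absSum(byte[] row) {
        long sum = 0;
        for (byte b : row) {
            sum += Math.abs(b); // b is promoted to int, so -128 becomes 128
        }
        return sum;
    }

    /** Returns the index of the candidate row with the smallest absolute sum. */
    static int pickBest(byte[][] candidates) {
        int best = 0;
        for (int i = 1; i < candidates.length; i++) {
            if (absSum(candidates[i]) < absSum(candidates[best])) {
                best = i;
            }
        }
        return best;
    }
}
```

For the rows {100, -100} and {1, 1} the signed sums are 0 and 2, so a signed estimate would prefer the first row; the absolute sums are 200 and 2, so pickBest() returns the second one.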







[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-10 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: size_compare.txt







[jira] [Commented] (PDFBOX-4013) Java 9/macOS: Debugger App does not start (NoSuchMethodException)

2018-07-08 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536342#comment-16536342
 ] 

Emmeran Seehuber commented on PDFBOX-4013:
--

For JDK 9+ it is possible (and the preferred way) to use 
Desktop.getDesktop().setXYHandler(). None of the gists were using this. In the 
attached patch [^pdfdebugger-macos-fixes_v1.patch] I used reflection to set the 
event handlers. Besides this I did some further things in the patch:
 * Allow printing on macOS. I see no reason not to allow it; at least it works 
for me. No, there is no global printing menu (at least not on macOS High 
Sierra; no idea why it was disabled in the first place).
 * Removed duplicate code in the exit handlers. There was one for the menu 
events and one for the window event.
 * Fixed opening files on macOS. FileDialog.getFile() only returns the file 
name without the path.
 * Made this class friendlier to embed into other applications. I'm 
already using the PDFDebugger in one of my applications, as my application 
handles many PDF files and the PDFDebugger is very nice for taking a quick look 
into a PDF. Currently I manually remove the window listener (which would quit 
the whole Java application when closing the window ...), and because my 
application is already on JDK 10 the global event handler is not set. I made 
two protected methods, performApplicationExit() and initGlobalEventHandlers(), 
which can be overridden to be no-ops when embedding the PDFDebugger in another 
application. I think this is better than an additional constructor parameter, 
but if you think an explicit parameter would be better (as it would document 
the fact that you can embed the PDFDebugger) I'll change that.
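
To illustrate the reflection approach for the JDK 9+ handlers (this is a simplified sketch, not the code from the attached patch), the following looks up java.awt.Desktop.setOpenFileHandler() by name, so the class itself never references the java.awt.desktop types directly. The class name and the callback parameter are hypothetical. The method simply reports false when the handler cannot be installed, e.g. on an older JDK, in a headless environment, or on a platform that does not support this action.

```java
import java.awt.Desktop;
import java.awt.GraphicsEnvironment;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: install an open-files handler via reflection (JDK 9+ API). */
final class DesktopHandlerSketch {
    static boolean installOpenFilesHandler(Consumer<List<?>> onOpen) {
        if (GraphicsEnvironment.isHeadless() || !Desktop.isDesktopSupported()) {
            return false;
        }
        try {
            // Both types exist only on JDK 9+, hence the lookup by name.
            Class<?> handlerType = Class.forName("java.awt.desktop.OpenFilesHandler");
            Method getFiles = Class.forName("java.awt.desktop.OpenFilesEvent").getMethod("getFiles");
            InvocationHandler dispatcher = (proxy, method, args) -> {
                if (method.getDeclaringClass() == Object.class) {
                    // equals/hashCode/toString are routed through the proxy as well.
                    switch (method.getName()) {
                        case "equals": return proxy == args[0];
                        case "hashCode": return System.identityHashCode(proxy);
                        default: return "OpenFilesHandler proxy";
                    }
                }
                // openFiles(OpenFilesEvent): pass the list of files to the callback.
                onOpen.accept((List<?>) getFiles.invoke(args[0]));
                return null;
            };
            Object handler = Proxy.newProxyInstance(
                    DesktopHandlerSketch.class.getClassLoader(),
                    new Class<?>[] {handlerType}, dispatcher);
            Desktop.class.getMethod("setOpenFileHandler", handlerType)
                    .invoke(Desktop.getDesktop(), handler);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Older JDK, or the platform rejects this action: leave things as they are.
            return false;
        }
    }
}
```

A caller would use it as installOpenFilesHandler(files -> ...) and fall back to the old mechanism when it returns false.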

As a side note: Theoretically osxOpenFiles() should work as it works on JDK <= 
8. But in practice it does nothing on any JDK. The file open event handlers 
are only called when an application is bundled. On macOS, bundled applications 
are specially named directories with some "magic" config files and all 
application files in them. You can look up details here 
[https://developer.apple.com/library/archive/documentation/CoreFoundation/Conceptual/CFBundles/BundleTypes/BundleTypes.html]
 if you are interested. To have the PDFDebugger receive those events it would be 
required to build such an application bundle. As I've done this already for an 
application of mine I could also build one for the PDFDebugger. But this would 
only make sense if the PDFDebugger application were downloadable on the 
download page in the form of a .dmg image. It's possible to create such .dmg 
images on other platforms (see 
[https://stackoverflow.com/questions/286419/how-to-build-a-dmg-mac-os-x-file-on-a-non-mac-platform])
 but I am not sure if this is worth the hassle. macOS applications can of 
course also be delivered as .zip files; it's just not the "native" way to 
deliver applications. When not delivering the PDFDebugger as an application 
bundle, the osxOpenFiles() method (and all related stuff in OSXAdapter) is just 
unreachable code and could be deleted.

Also, OSXAdapter has two unused methods (setAboutHandler(), 
setPreferencesHandler()). As long as the PDFDebugger does not get a preferences 
or about dialog, those two could just be deleted.

Tested on macOS 10.13.5 with JDK 8, 9 and 10.


[jira] [Updated] (PDFBOX-4013) Java 9/macOS: Debugger App does not start (NoSuchMethodException)

2018-07-08 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4013:
-
Attachment: pdfdebugger-macos-fixes_v1.patch

> Java 9/macOS: Debugger App does not start (NoSuchMethodException)
> -
>
> Key: PDFBOX-4013
> URL: https://issues.apache.org/jira/browse/PDFBOX-4013
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.8
>Reporter: Emmeran Seehuber
>Priority: Major
>  Labels: jdk9, mac-os-x
> Attachments: pdfdebugger-macos-fixes_v1.patch
>
>
> It seems the debugger app wants to integrate nicely into macOS and uses some 
> private API for this. This worked fine with all Java versions including 8, 
> but does no longer work with 9.
> Java 9 provides new APIs for this, but till PDFBox can depend on Java 9 (or 
> the next LTS Java 11) it should at least catch this and not crash
> The application does not start, and instead displays a dialog with a stack 
> trace.
> Console Output + StackTrace:
> {code}
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.pdfbox.debugger.ui.OSXAdapter 
> (file:/Users/emmy/Downloads/debugger-app-2.0.7.jar) to constructor 
> com.apple.eawt.Application()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.pdfbox.debugger.ui.OSXAdapter
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Mac OS X Adapter could not talk to EAWT:
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:171)
> 
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> 
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> Caused by: java.lang.NoSuchMethodException: 
> com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> java.base/java.lang.Class.getDeclaredMethod(Class.java:2432)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:163)
> 
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> 
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> {code}
> To workaround this problem I have to run the debugger app using JDK 8. This 
> is ok for now, but very annoying.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-02 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530179#comment-16530179
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

[~tilman] Regarding estCompressSum() and chooseDataRowToWrite(): This is an 
"oracle" (=heuristic) that tries to get the "best" row to write. The idea is to 
choose the row representation (i.e. each byte stored as the difference to the 
byte in the row above, as the difference to the byte on the left, and so on) 
that has the most values that are 0 or at least very small. This reduces the 
number of distinct values in the Huffman tree for the ZIP compression, which 
allows for a better compression and also makes it more likely that the same 
value is repeated, so that it can be run length encoded (RLE). Gradients are 
"perfectly" compressed with such a scheme. 

The default algorithm is not perfect and misses the best possible combination 
of rows. But getting the best combination would mean trying all different row 
encodings which means 5 *  combinations. Tools like 
[pngcrush|https://pmt.sourceforge.io/pngcrush/] are trying to do so using more 
or less brute force search. But that takes time... So this is not suitable for 
a generic image writer.
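
The per-row choice described above can be sketched roughly as follows. This is only an illustration and not the code from the patch: the class name RowFilterHeuristic and its methods are made up for this example, only three of the five PNG filter types are shown, and the cost function (sum of the residuals read as signed bytes) is an assumption standing in for whatever estCompressSum() really computes.

```java
/** Illustrative only: picks a PNG row filter by the smallest sum of absolute residuals. */
final class RowFilterHeuristic {
    static final int NONE = 0;
    static final int SUB = 1;
    static final int UP = 2;

    /**
     * Applies one filter to a row. bytesPerPixel is the distance to the "left"
     * byte, previousRow is the unfiltered row above (all zeros for the first row).
     */
    static byte[] filter(int type, byte[] row, byte[] previousRow, int bytesPerPixel) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            int up = previousRow[i] & 0xFF;
            int predicted = type == SUB ? left : type == UP ? up : 0;
            // The residual wraps around modulo 256, as in the PNG filters.
            out[i] = (byte) ((row[i] & 0xFF) - predicted);
        }
        return out;
    }

    /** Sum of the residuals read as signed bytes: small when most values are near 0. */
    static long cost(byte[] filtered) {
        long sum = 0;
        for (byte b : filtered) {
            sum += Math.abs(b);
        }
        return sum;
    }

    /** Returns the filter type with the lowest cost; ties go to the lower type number. */
    static int chooseFilter(byte[] row, byte[] previousRow, int bytesPerPixel) {
        int best = NONE;
        long bestCost = Long.MAX_VALUE;
        for (int type = NONE; type <= UP; type++) {
            long c = cost(filter(type, row, previousRow, bytesPerPixel));
            if (c < bestCost) {
                bestCost = c;
                best = type;
            }
        }
        return best;
    }
}
```

For a horizontal gradient the SUB residuals are all the same small value, which is why such rows compress so well. Each row is decided on its own, so the combination of rows that is best overall can still be missed.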

I'm fine with adding heuristics to decide when to use which encoder. But you 
can spend ages to get this right, and you will always find cases where the 
heuristic will be wrong... The overall idea of this encoder is to get a better 
compression for *most* cases, especially when using zip compression level 9. 

When modifying estCompressSum() we should run a test on the govdocs corpus and 
record the sizes (in e.g. a textfile) and then change the method and record the 
sizes after the change. Then we could compare if the change really changes the 
overall compression for the better... I won't have time to look into that this 
week.

If I understand the PDF spec correctly, it is not possible to write an indexed 
image - which would be very nice for small icon like images... Regarding the 
impact on openhtmltopdf it's difficult to say. I have a project where I use 
openhtmltopdf to generate reports which contain tons of photo images, so this 
change is going to improve the file size there. Also if you care about file 
size you should ensure that the compression level is always set to 9.

Maybe we should really add a heuristic like "if the image is smaller than e.g. 
50x50 pixels _and_ it is default sRGB, just encode it without predictor using 
the old sRGB path". For such small images we could also go the brute force way: 
encode it using both methods and then choose the smaller result.
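
A minimal sketch of the brute force variant, assuming the candidate encodings (e.g. with and without predictor) are already available as byte arrays. The class SmallestEncoding and its methods are hypothetical names for this example and not part of PDFBox; only java.util.zip.Deflater from the JDK is used.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Illustrative only: deflates every candidate and keeps the smallest result. */
final class SmallestEncoding {

    /** Compresses the data with zip compression level 9. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns the compressed form of whichever candidate deflates best. */
    static byte[] smallest(byte[]... candidates) {
        byte[] best = null;
        for (byte[] candidate : candidates) {
            byte[] compressed = deflate(candidate);
            if (best == null || compressed.length < best.length) {
                best = compressed;
            }
        }
        return best;
    }
}
```

This compresses every candidate completely, so it only makes sense for small images where the extra time does not matter.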

Regarding the testCreateLosslessFromImageCMYK(): If the image data is nearly 
identical but not exactly the same, this is likely caused by rounding errors in 
the color conversion. Of course, this should not happen, but it also depends on 
how the image color is converted to sRGB. It would be awesome (not only for 
this test, but also for other stuff) if PDImageXObject had a getRawImage() (or 
similarly named) method, which would return the BufferedImage with whatever 
colorspace it has, so that CMYK images would just be returned as CMYK images 
and not converted to sRGB.

I'm also thinking about adding a method 
LosslessFactory.createFromByteArray(PDDocument,byte[]) which would try to 
sniff the image type. It could use JPEGFactory.createFromByteArray() for 
JPEGs and could try to directly reuse the IDAT chunk of PNGs. If it could not 
encode the image because the PNG has e.g. an indexed color encoding, it would 
return null, so that the user knows that the image has to be loaded and then 
encoded from the BufferedImage. This would speed up the PDF write time if the 
user already has the image encoded, and it would allow precompressing images 
using external tools like pngcrush and benefiting from that compression. But 
that's a different issue.
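
The sniffing step could look roughly like the sketch below. It only checks the well-known file signatures of PNG and JPEG; the class ImageTypeSniffer and the Kind enum are names invented for this example, and nothing here exists in LosslessFactory.

```java
/** Illustrative only: guesses the image type from the leading signature bytes. */
final class ImageTypeSniffer {
    enum Kind { PNG, JPEG, UNKNOWN }

    // The 8 byte PNG file signature.
    private static final int[] PNG_SIGNATURE = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    // JPEG files start with the SOI marker followed by the first byte of the next marker.
    private static final int[] JPEG_SIGNATURE = {0xFF, 0xD8, 0xFF};

    static Kind sniff(byte[] data) {
        if (startsWith(data, PNG_SIGNATURE)) {
            return Kind.PNG;
        }
        if (startsWith(data, JPEG_SIGNATURE)) {
            return Kind.JPEG;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A real implementation would still have to parse the PNG header to find out whether the IDAT data can be reused as it is. The signature check only decides which path to try.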

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch add support to write 16 bit per component images 
> correctly. I've i

[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-01 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529182#comment-16529182
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
--

Yes, 16 bit alpha channels were ignored ... I've updated the patch and included 
your unit test. See [^lossless_predictor_based_imageencoding_v6.patch]

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch add support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 






[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-01 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: lossless_predictor_based_imageencoding_v6.patch







[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-01 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: (was: 16bit.png)

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch add support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 






[jira] [Updated] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-01 Thread Emmeran Seehuber (JIRA)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4184:
-
Attachment: 16bit.png

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch add support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 






[jira] [Commented] (PDFBOX-3353) Create appearance streams for annotations

2018-06-28 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526121#comment-16526121
 ] 

Emmeran Seehuber commented on PDFBOX-3353:
--

[~tilman] Do you mean something like 
[https://fontawesome.com/v4.7.0/icon/comment-o] ?

You can use my [https://github.com/rototor/pdfbox-graphics2d] to convert a font 
shape to a PDF command stream. For a quick test I would suggest checking out 
the sources and patching the fonts you want to use into the 
PdfBoxGraphics2dTest.testDifferentFonts method and altering the text printed. 
You should then be able to extract the resulting curves from the PDF file using 
the PDF Debugger.
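
To illustrate what "converting a shape to a PDF command stream" means: a glyph outline is just a java.awt.Shape, and its path segments map almost one to one to the PDF path operators m, l, c and h. The sketch below is not how pdfbox-graphics2d does it; the class ShapeToPdfPath is a made-up name, it ignores the flipped Y axis between Java2D and PDF, and it only emits the path construction operators.

```java
import java.awt.Shape;
import java.awt.geom.PathIterator;
import java.util.Locale;

/** Illustrative only: writes the outline of a Shape as PDF path construction operators. */
final class ShapeToPdfPath {

    static String toPdfOperators(Shape shape) {
        StringBuilder sb = new StringBuilder();
        double[] c = new double[6];
        double curX = 0, curY = 0, startX = 0, startY = 0;
        for (PathIterator it = shape.getPathIterator(null); !it.isDone(); it.next()) {
            switch (it.currentSegment(c)) {
                case PathIterator.SEG_MOVETO:
                    sb.append(point(c[0], c[1])).append(" m\n");
                    curX = startX = c[0];
                    curY = startY = c[1];
                    break;
                case PathIterator.SEG_LINETO:
                    sb.append(point(c[0], c[1])).append(" l\n");
                    curX = c[0];
                    curY = c[1];
                    break;
                case PathIterator.SEG_QUADTO: {
                    // PDF has no quadratic curve operator, so raise the degree to cubic.
                    double x1 = curX + 2.0 / 3.0 * (c[0] - curX);
                    double y1 = curY + 2.0 / 3.0 * (c[1] - curY);
                    double x2 = c[2] + 2.0 / 3.0 * (c[0] - c[2]);
                    double y2 = c[3] + 2.0 / 3.0 * (c[1] - c[3]);
                    sb.append(point(x1, y1)).append(' ').append(point(x2, y2)).append(' ')
                      .append(point(c[2], c[3])).append(" c\n");
                    curX = c[2];
                    curY = c[3];
                    break;
                }
                case PathIterator.SEG_CUBICTO:
                    sb.append(point(c[0], c[1])).append(' ').append(point(c[2], c[3])).append(' ')
                      .append(point(c[4], c[5])).append(" c\n");
                    curX = c[4];
                    curY = c[5];
                    break;
                case PathIterator.SEG_CLOSE:
                    sb.append("h\n");
                    curX = startX;
                    curY = startY;
                    break;
                default:
                    throw new IllegalStateException("unknown segment type");
            }
        }
        return sb.toString();
    }

    private static String point(double x, double y) {
        return String.format(Locale.ROOT, "%.2f %.2f", x, y);
    }
}
```

The Shape of a glyph can be obtained with Font.createGlyphVector(...).getOutline() and then passed to toPdfOperators().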

> Create appearance streams for annotations
> -
>
> Key: PDFBOX-3353
> URL: https://issues.apache.org/jira/browse/PDFBOX-3353
> Project: PDFBox
>  Issue Type: Task
>  Components: PDModel, Rendering
>Affects Versions: 1.8.12, 2.0.0, 2.0.1, 2.0.2, 3.0.0 PDFBox
>Reporter: Tilman Hausherr
>Priority: Major
>  Labels: Annotations
> Attachments: AnnotationSample.Standard.pdf, 
> AnnotationSample.Standard.pdf, CTAN-example-Annotations-rot270.pdf, 
> CTAN-example-Annotations.pdf, CloudyBorder.zip, 
> Line-Annotation-OpenArrow-w10-AP.pdf, PDFBOX-2019-Annotations.pdf, 
> PDFBOX-2898-Annotations.pdf, 
> PDFBOX-3353-007071-p1-Annotation-Big-Rectangle.pdf, 
> PDFBOX-3353-084374-p48-Annotation-Big-Rectangle-NoZoom.pdf, 
> PDFBOX-3353-Annotations-AP.pdf, PDFBOX-3353-Annotations-noAP.pdf, 
> PDFBOX-3353-highlight-noAP-001796-p1.pdf, PDFBOX-3353-highlight-noAP.pdf, 
> PDFBOX-4199-StrikeOut-Empty-Surface.pdf, 
> PDFJS-1973-Annotations-pdfcomment.pdf, PDFJS-7115-indirect-rect.pdf, 
> ShowAnnotation-4.java, ShowAnnotation-5.java, ShowAnnotation-6.java, 
> SquareAnnotations.pdf, annots.pdf, gs-bugzilla-693664-AnnotationTest.pdf, 
> line_dimension_appearance_stream-noAP.pdf, 
> line_dimension_appearance_stream.pdf, pdf_commenting_new.pdf, 
> showAnnotation.java, text_markup_ap_test.pdf
>
>
> Create appearance streams for annotations when missing.
> I'll start by replacing current code for Ink and Link annotations.
> Good example PDFs:
> http://www.pdfill.com/example/pdf_commenting_new.pdf
> https://github.com/mozilla/pdf.js/issues/6810






[jira] [Commented] (PDFBOX-4253) Optimize PDFunctionType3.eval()

2018-06-26 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524254#comment-16524254
 ] 

Emmeran Seehuber commented on PDFBOX-4253:
--

[~tilman] Maybe a stupid question, but how does the cache get pruned if the 
underlying COSArray changes? Yes, this should normally not happen that often, 
but at the moment the function would then use stale values.

> Optimize PDFunctionType3.eval()
> ---
>
> Key: PDFBOX-4253
> URL: https://issues.apache.org/jira/browse/PDFBOX-4253
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 2.0.10, 2.0.11
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: optimization
> Fix For: 3.0.0 PDFBox, 2.0.12
>
> Attachments: PDFJS-9770-slow.pdf
>
>
> I ran the profiler on PDFJS-9770-slow.pdf and it turned out that a few 
> seconds were lost in {{COSArray.toFloatArray()}}. Caching it saves about 30%.






[jira] [Commented] (PDFBOX-4242) Fontbox does not close file descriptor when loading fonts.

2018-06-25 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522518#comment-16522518
 ] 

Emmeran Seehuber commented on PDFBOX-4242:
--

[~tilman] PhantomReferences of course do not guarantee that the cleanup will 
run (which you have to do yourself in your own thread or whenever you like). If 
you terminate the JVM before the GC had a chance to run, the cleanup will of 
course not run.

But there is a big difference to finalization: They can not resurrect the 
object. This means that the JVM does not need to wait some full GC cycles 
before it can start to call the finalizers, do some dances around the 
references etc. Instead the PhantomReference of an object is queued as soon as 
the object is GCed - which can also happen with minor collections. So the time 
between "object is no longer reachable" and "PhantomReference is queued" is 
usually very short.

Registering the font directly in the PDDocument and also ensuring that the font 
resources are freed as soon as close() is called on the document seems the sane 
thing to do here. I would not make getFontsToSubset() public, but rather add a 
registerFont() method - which must be of course public but should be marked as 
"internal use only".

> Fontbox does not close file descriptor when loading fonts.
> --
>
> Key: PDFBOX-4242
> URL: https://issues.apache.org/jira/browse/PDFBOX-4242
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.9
>Reporter: Glen Peterson
>Priority: Minor
>  Labels: file_leak
>
> My app has been getting "java.io.FileNotFoundException (No file descriptors 
> available)" and I've confirmed that it's because fontbox isn't closing it's 
> file descriptors.
> In org.apache.fontbox.ttf.TTFParser there's this method:
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}
>  {{    // close only on error (file is still being accessed later)}}
>  {{    raf.close();}}
>  {{    throw var4;}}
>  {{}}}
>  {{}}}
> I would have expected to see the close() in a finally block so that the file 
> is always closed, not just on exceptions. Presumably, you can keep it in 
> memory without leaving the file descriptor open?
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}{{    raf.close();}}
>  {{    throw var4;}}
>  {{  } finally {}}
>  {{    raf.close();}}
>  {{}}}
>  {{}}}
> I tried performing this in a lazy initialization, but it blew up:
> java.lang.RuntimeException: java.io.IOException: The TrueType font null does 
> not contain a 'cmap' tableCaused by: java.io.IOException: The TrueType font 
> null does not contain a 'cmap' table
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapImpl(TrueTypeFont.java:548)
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapLookup(TrueTypeFont.java:528)
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapLookup(TrueTypeFont.java:514)
>   at org.apache.fontbox.ttf.TTFSubsetter.<init>(TTFSubsetter.java:91)
>   at 
> org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:321)
>   at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:239)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1271)
> Thoughts?
> Thanks for PDFBox - it's been really helpful!






[jira] [Commented] (PDFBOX-4242) Fontbox does not close file descriptor when loading fonts.

2018-06-24 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16521927#comment-16521927
 ] 

Emmeran Seehuber commented on PDFBOX-4242:
--

[~tilman] Yes, using a finalizer is a no go, as it will likely never run. The 
way to go here is to use a PhantomReference and a ReferenceQueue to close the 
file handle. E.g.

 
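{code:java}
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.ConcurrentLinkedDeque;
// TrueTypeFont and TTFDataStream are the classes from org.apache.fontbox.ttf

class TTFPhantomReference extends PhantomReference<TrueTypeFont> {
  // Only the resource to close is kept here, never the font itself.
  final TTFDataStream dataToClose;

  // The data stream is passed in explicitly instead of reading font.data.
  TTFPhantomReference(TrueTypeFont font, TTFDataStream data) {
    super(font, TTFReferenceQueue.QUEUE);
    dataToClose = data;
    // Pin this reference, otherwise it will be GCed before it can do its magic...
    TTFReferenceQueue.REFERENCE_PIN.add(this);
  }
}

class TTFReferenceQueue {
  static final ReferenceQueue<TrueTypeFont> QUEUE = new ReferenceQueue<>();
  /* Pin list, to not have the references GCed before they could clean up the objects */
  static final ConcurrentLinkedDeque<TTFPhantomReference> REFERENCE_PIN =
      new ConcurrentLinkedDeque<>();

  /* The ugly part, we need a thread to poll the queue */
  static final Thread TTF_POLL_THREAD = new Thread() {
    @Override
    public void run() {
      while (true) {
        try {
          // Block till we get a reference.
          TTFPhantomReference ref = (TTFPhantomReference) QUEUE.remove();
          REFERENCE_PIN.remove(ref);
          ref.dataToClose.close();
        } catch (InterruptedException e) {
          return;
        } catch (IOException e) {
          // Nothing sensible to do if closing fails.
        }
      }
    }
  };

  static {
    TTF_POLL_THREAD.setDaemon(true);
    TTF_POLL_THREAD.start();
  }
}
{code}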
No idea if the TrueTypeFont should be the object that is referenced in the 
PhantomReference, or if it should be a parent object (e.g. PDFont). The 
PhantomReference just needs a reference to the resource that should be closed, 
but of course not a reference to the owner object. This avoids the finalizer() 
resurrection problem, as the referenced object will already be gone.

The problem with this code is: It creates a static thread, which in turn may 
lead to a class loader leak. 

Guava does some special magic to avoid this problem and make it work correctly 
with container environments. If you could use Guava you would just make this 
code look like:
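{code:java}
import com.google.common.base.FinalizablePhantomReference;
import com.google.common.base.FinalizableReferenceQueue;
import java.io.IOException;
import java.util.concurrent.ConcurrentLinkedDeque;
// TrueTypeFont and TTFDataStream are the classes from org.apache.fontbox.ttf

public class TTFFileCloser {
  private static final FinalizableReferenceQueue queue = new FinalizableReferenceQueue();
  private final ConcurrentLinkedDeque<FinalizablePhantomReference<TrueTypeFont>>
      phantomReferencePinner = new ConcurrentLinkedDeque<>();

  public void closeFileIfNotReachable(TrueTypeFont owner, final TTFDataStream data) {
    FinalizablePhantomReference<TrueTypeFont> finalizablePhantomReference =
        new FinalizablePhantomReference<TrueTypeFont>(owner, queue) {
          @Override
          public void finalizeReferent() {
            try {
              data.close();
            } catch (IOException e) {
              // Nothing sensible to do if closing fails.
            } finally {
              phantomReferencePinner.remove(this);
            }
          }
        };
    // Pin the reference to avoid it being GCed
    phantomReferencePinner.add(finalizablePhantomReference);
  }
}
{code}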
PhantomReferences are queued very fast after the owner object is no longer 
reachable. I use this (with the Guava base classes) very successfully to clean 
up all my temp files. As you can't use Guava I don't know if you want to go this 
route, as getting this right in a servlet context is not that easy... Note: You 
don't need to create a background thread to do the cleanup: Guava has a 
fallback where it just cleans up all references that are queued whenever a new 
reference is added to the list. But it only uses this if it can't create a 
background thread.

> Fontbox does not close file descriptor when loading fonts.
> --
>
> Key: PDFBOX-4242
> URL: https://issues.apache.org/jira/browse/PDFBOX-4242
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.9
>Reporter: Glen Peterson
>Priority: Minor
>  Labels: file_leak
>
> My app has been getting "java.io.FileNotFoundException (No file descriptors 
> available)" and I've confirmed that it's because fontbox isn't closing it's 
> file descriptors.
> In org.apache.fontbox.ttf.TTFParser there's this method:
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}
>  {{    // close only on error (file is still being accessed later)}}
>  {{    raf.close();}}
>  {{    throw var4;}}
>  {{}}}
>  {{}}}
> I would have expected to see the close() in a finally block so that the file 
> is always closed, not just on exceptions. Presumably, you can keep it in 
> memory without leaving the file descriptor open?
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}{{    raf.close();}}
>  {{    throw var4;}}
>  {{  } finally {}}
>  {{    raf.close();}}
>  {{}}}
>  {{}}}
> I tried performing this in a lazy initializatio

[jira] [Commented] (PDFBOX-4242) Fontbox does not close file descriptor when loading fonts.

2018-06-18 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515894#comment-16515894
 ] 

Emmeran Seehuber commented on PDFBOX-4242:
--

[~tilman] subset() will only be called when fonts are used in the document. If 
for whatever reason you are loading but not using a font, you will leak file 
handles... which can / will bring your (web-)server down when the file handle 
limit is exhausted...

This can happen if you load all possibly needed fonts upfront, while whether 
they are used depends on the data you put in the PDF (e.g. a Chinese font is 
only used when there really are Chinese characters etc.). I had this in 
production with OpenHTMLToPDF, see also 
[https://github.com/danfickle/openhtmltopdf/pull/215]. 
The workaround was to subset() all loaded fonts manually. As we had a handle on 
the TrueTypeFont I tried to close() it directly. But this causes an NPE as 
RAFDataStream.close() violates the close() contract, namely that calling 
close() twice should have no effect. Instead, when called the second time, 
RAFDataStream.close() will just throw an NPE. 

It would be nice if RAFDataStream.close() could be fixed (i.e. putting an 
if(raf!=null) before the raf.close()).
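
A minimal sketch of such an idempotent close(), assuming the stream keeps its RandomAccessFile in a field that is set to null once it is closed (which is what the NPE suggests). The class IdempotentRafStream is a stand-in written for this example and not the real RAFDataStream.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Illustrative only: a file-backed stream whose close() may be called more than once. */
final class IdempotentRafStream implements Closeable {
    private RandomAccessFile raf;

    IdempotentRafStream(File file) throws IOException {
        raf = new RandomAccessFile(file, "r");
    }

    boolean isOpen() {
        return raf != null;
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            // Clear the field first, so a failing close() is not retried either.
            RandomAccessFile toClose = raf;
            raf = null;
            toClose.close();
        }
    }
}
```

With this guard a second close() is simply a no-op, which is what the Closeable contract asks for.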

> Fontbox does not close file descriptor when loading fonts.
> --
>
> Key: PDFBOX-4242
> URL: https://issues.apache.org/jira/browse/PDFBOX-4242
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.9
>Reporter: Glen Peterson
>Priority: Minor
>  Labels: file_leak
>
> My app has been getting "java.io.FileNotFoundException (No file descriptors 
> available)" and I've confirmed that it's because fontbox isn't closing it's 
> file descriptors.
> In org.apache.fontbox.ttf.TTFParser there's this method:
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}
>  {{    // close only on error (file is still being accessed later)}}
>  {{    raf.close();}}
>  {{    throw var4;}}
>  {{}}}
>  {{}}}
> I would have expected to see the close() in a finally block so that the file 
> is always closed, not just on exceptions. Presumably, you can keep it in 
> memory without leaving the file descriptor open?
> {{public TrueTypeFont parse(File ttfFile) throws IOException {}}
>  {{  RAFDataStream raf = new RAFDataStream(ttfFile, "r");}}
> {{  try {}}
>  {{    return this.parse((TTFDataStream)raf);}}
>  {{  } catch (IOException var4) {}}{{    raf.close();}}
>  {{    throw var4;}}
>  {{  } finally {}}
>  {{    raf.close();}}
>  {{}}}
>  {{}}}
> I tried performing this in a lazy initialization, but it blew up:
> java.lang.RuntimeException: java.io.IOException: The TrueType font null does 
> not contain a 'cmap' tableCaused by: java.io.IOException: The TrueType font 
> null does not contain a 'cmap' table
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapImpl(TrueTypeFont.java:548)
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapLookup(TrueTypeFont.java:528)
>   at 
> org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmapLookup(TrueTypeFont.java:514)
>   at org.apache.fontbox.ttf.TTFSubsetter.<init>(TTFSubsetter.java:91)
>   at 
> org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:321)
>   at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:239)
>   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1271)
> Thoughts?
> Thanks for PDFBox - it's been really helpful!
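
A minimal sketch of the "keep it in memory without leaving the file descriptor open" idea from the description above. It uses only plain java.io classes, not FontBox's RAFDataStream or TTFParser, and the class and method names (EagerFontBytes, readFully) are hypothetical. It is not how FontBox handles this; it only shows the general pattern of buffering the whole file and closing the descriptor before returning.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of FontBox.
final class EagerFontBytes {

    // Reads the complete file into a byte array. try-with-resources closes
    // the RandomAccessFile on both the normal and the exceptional path, so
    // no descriptor stays open after this method returns.
    static byte[] readFully(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            if (length > Integer.MAX_VALUE - 8) {
                throw new IOException("File too large to buffer: " + file);
            }
            byte[] data = new byte[(int) length];
            raf.readFully(data);
            return data;
        }
    }
}
```

The trade-off is memory: every font is held completely in memory, whereas leaving the stream open lets tables be read on demand. That on-demand reading is a plausible reason why closing the stream in a finally block fails later with the missing 'cmap' table error quoted above, though the description does not confirm it.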



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-4013) Java 9/macOS: Debugger App does not start (NoSuchMethodException)

2018-06-18 Thread Emmeran Seehuber (JIRA)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515876#comment-16515876
 ] 

Emmeran Seehuber commented on PDFBOX-4013:
--

I should look into this, as I work on Macs mostly. But I can't currently say 
when I will have time ... 

> Java 9/macOS: Debugger App does not start (NoSuchMethodException)
> -
>
> Key: PDFBOX-4013
> URL: https://issues.apache.org/jira/browse/PDFBOX-4013
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.8
>Reporter: Emmeran Seehuber
>Priority: Major
>  Labels: jdk9, mac-os-x
>
> It seems the debugger app wants to integrate nicely into macOS and uses some 
> private API for this. This worked fine with all Java versions up to and 
> including 8, but no longer works with 9.
> Java 9 provides new APIs for this, but until PDFBox can depend on Java 9 (or 
> the next LTS, Java 11) it should at least catch this and not crash.
> The application does not start, and instead displays a dialog with a stack 
> trace.
> Console Output + StackTrace:
> {code}
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by 
> org.apache.pdfbox.debugger.ui.OSXAdapter 
> (file:/Users/emmy/Downloads/debugger-app-2.0.7.jar) to constructor 
> com.apple.eawt.Application()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.pdfbox.debugger.ui.OSXAdapter
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Mac OS X Adapter could not talk to EAWT:
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:171)
> 
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> 
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> Caused by: java.lang.NoSuchMethodException: 
> com.apple.eawt.Application.addApplicationListener(com.apple.eawt.ApplicationListener)
> java.base/java.lang.Class.getDeclaredMethod(Class.java:2432)
> org.apache.pdfbox.debugger.ui.OSXAdapter.setHandler(OSXAdapter.java:163)
> 
> org.apache.pdfbox.debugger.ui.OSXAdapter.setFileHandler(OSXAdapter.java:137)
> 
> org.apache.pdfbox.debugger.PDFDebugger.initComponents(PDFDebugger.java:301)
> org.apache.pdfbox.debugger.PDFDebugger.<init>(PDFDebugger.java:182)
> org.apache.pdfbox.debugger.PDFDebugger.main(PDFDebugger.java:1201)
> {code}
> To work around this problem I have to run the debugger app using JDK 8. This 
> is OK for now, but very annoying.
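
A minimal sketch of the "catch this and not crash" suggestion from the description above. It is not the actual OSXAdapter code; the class and method names (OptionalPlatformHook, tryInvokeStatic) are hypothetical. The assumption is that the platform integration is optional, so a missing class or method should be logged and skipped rather than stop the application from starting.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of PDFBox.
final class OptionalPlatformHook {

    // Tries to invoke a public static no-argument method by reflection.
    // Returns true on success and false if the class or method is missing,
    // inaccessible, or the call itself throws an exception.
    static boolean tryInvokeStatic(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Method method = type.getMethod(methodName);
            method.invoke(null);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // ReflectiveOperationException covers ClassNotFoundException,
            // NoSuchMethodException and InvocationTargetException.
            // RuntimeException covers wrapped failures such as the one in
            // the trace above.
            System.err.println("Optional platform integration unavailable: " + e);
            return false;
        }
    }
}
```

A caller would then continue with the normal start-up when the method returns false, for example without the macOS-specific file-open handler.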





