[jira] [Comment Edited] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117050#comment-17117050
 ] 

Emmeran Seehuber edited comment on PDFBOX-4847 at 5/26/20, 9:31 PM:


The bug in the PNGConverter is that it did not write the ICC profile 
correctly. It had an off-by-one error: it did not skip the 0-byte terminator 
that follows the profile name (the name occupies up to the first 79 bytes of 
the iCCP chunk and is followed by a 0 byte). It also did not mark the stream 
as FLATE_DECODE.

Because of this, PDFBox (and likely all other PDF readers) just ignored the 
ICC profile (an exception occurred while decoding the profile). As a result 
the colors were not correct, because the alternate color space DeviceRGB was 
used instead of the embedded profile.
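
To illustrate the chunk layout described above, here is a minimal sketch that follows the iCCP layout from the PNG specification (profile name, 0-byte terminator, one compression-method byte, then zlib-compressed profile data). It is not the PDFBox code: the class IccpChunkSketch and its methods are hypothetical names, and the "profile" used in main is just placeholder bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not part of PDFBox.
public class IccpChunkSketch
{
    // Returns the offset of the zlib-compressed profile in an iCCP payload, or -1 if malformed.
    static int compressedProfileStart(byte[] iccp)
    {
        int pos = 0;
        // Skip the profile name (at most 79 bytes) up to its 0-byte terminator.
        while (pos < 79 && pos < iccp.length && iccp[pos] != 0)
        {
            pos++;
        }
        if (pos >= iccp.length || iccp[pos] != 0)
        {
            return -1; // no terminator found
        }
        pos++; // skip the 0-byte terminator itself
        if (pos >= iccp.length || iccp[pos] != 0)
        {
            return -1; // missing or unknown compression method (only 0 is defined)
        }
        pos++; // skip the compression-method byte
        return pos < iccp.length ? pos : -1;
    }

    // Inflates the zlib data starting at the given offset.
    static byte[] inflate(byte[] data, int offset)
    {
        Inflater inflater = new Inflater();
        inflater.setInput(data, offset, data.length - offset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try
        {
            while (!inflater.finished())
            {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                {
                    throw new IllegalArgumentException("Truncated or unsupported zlib stream");
                }
                out.write(buffer, 0, n);
            }
        }
        catch (DataFormatException e)
        {
            throw new IllegalArgumentException("Not valid zlib data", e);
        }
        finally
        {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Builds a demo iCCP payload: name, terminator, compression method 0, deflated data.
    static byte[] buildChunk(String name, byte[] profile)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.ISO_8859_1);
        out.write(nameBytes, 0, nameBytes.length);
        out.write(0); // name terminator
        out.write(0); // compression method
        Deflater deflater = new Deflater();
        deflater.setInput(profile);
        deflater.finish();
        byte[] buffer = new byte[4096];
        while (!deflater.finished())
        {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args)
    {
        byte[] profile = "not a real ICC profile".getBytes(StandardCharsets.US_ASCII);
        byte[] chunk = buildChunk("sRGB", profile);
        int start = compressedProfileStart(chunk);
        System.out.println("compressed data starts at offset " + start);
        System.out.println("round trip ok: " + Arrays.equals(profile, inflate(chunk, start)));
    }
}
```

Because the bytes after the compression-method byte are already zlib-compressed, they can be copied into the PDF stream unchanged, provided the stream is declared as Flate-encoded.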

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
 if (state.iCCP != null || state.sRGB != null)
 {
 // We have got a color profile, which we must attach
 cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
 cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
 == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
 if (state.iCCP != null)
 {
+cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
 break;
 iccProfileDataStart++;
 }
+iccProfileDataStart++;
 if (iccProfileDataStart >= state.iCCP.length)
 {
 LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in the PNGConverterTest, because the image 
now has the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without 
the fix for this in PNGConverterTest, the colors of the PNG read with ImageIO 
for comparison will be "miles" off.
 - comparing sRGB images does not work even after applying the fix for the ICC 
profile, because there are some color rounding differences (off by 1 on the 
first pixel, for whatever reason, likely some different color conversion paths 
somewhere). There is a massive difference between converting single pixel 
values between colorspaces and converting a whole image at once (using 
ColorConvertOp). The latter may choose slightly different colors depending on 
the rendering intent and the colors in use in the image. The image from 
PDImage.getImage() would have been converted with ColorConvertOp, but in 
checkIdent(), which uses getRGB(), the image read with ImageIO would be color 
converted "pixel by pixel". One could fix this by first converting the 
expected image to sRGB using ColorConvertOp if it is not yet in sRGB.
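
The suggested normalization could look roughly like the following sketch, which uses the JDK's java.awt.image.ColorConvertOp for the whole-image conversion. The class SrgbCompareSketch and both method names are hypothetical, and the tolerance-style comparison is only an assumption about how an off-by-1 rounding difference might be handled; it is not the actual checkIdent() code.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

// Hypothetical helper for illustration only; not part of PDFBox or its tests.
public class SrgbCompareSketch
{
    // Converts the whole image to sRGB in one pass, unless it already is sRGB.
    static BufferedImage toSRGB(BufferedImage image)
    {
        ColorSpace colorSpace = image.getColorModel().getColorSpace();
        if (colorSpace.isCS_sRGB())
        {
            return image;
        }
        int type = image.getColorModel().hasAlpha()
                ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
        BufferedImage target = new BufferedImage(image.getWidth(), image.getHeight(), type);
        // With a null hint set, the op converts from the source image's
        // color space to the destination image's color space (sRGB here).
        new ColorConvertOp(null).filter(image, target);
        return target;
    }

    // Largest per-channel difference (alpha, red, green, blue) between two same-sized images.
    static int maxChannelDifference(BufferedImage a, BufferedImage b)
    {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight())
        {
            throw new IllegalArgumentException("Image sizes differ");
        }
        int max = 0;
        for (int y = 0; y < a.getHeight(); y++)
        {
            for (int x = 0; x < a.getWidth(); x++)
            {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                for (int shift = 0; shift <= 24; shift += 8)
                {
                    int diff = Math.abs(((p >>> shift) & 0xFF) - ((q >>> shift) & 0xFF));
                    max = Math.max(max, diff);
                }
            }
        }
        return max;
    }

    public static void main(String[] args)
    {
        BufferedImage gray = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage converted = toSRGB(gray);
        System.out.println("is sRGB: " + converted.getColorModel().getColorSpace().isCS_sRGB());
    }
}
```

Converting both the expected and the actual image through the same whole-image path before calling getRGB() keeps the two sides on the same conversion route, which is the point made in the second bullet.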

If you want to apply this fix alone, you would need to temporarily disable the 
test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Or you extend checkIdent() to correctly convert 
non-sRGB BufferedImages to sRGB first. I can also provide a patch for that if 
you like.



[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117050#comment-17117050
 ] 

Emmeran Seehuber commented on PDFBOX-4847:
--

The bug in the PNGConverter is that it did not write the ICC profile 
correctly. It had an off-by-one error: it did not skip the 0-byte terminator 
that follows the profile name (the name occupies up to the first 79 bytes of 
the iCCP chunk and is followed by a 0 byte). It also did not mark the stream 
as FLATE_DECODE.

Because of this, PDFBox (and likely all other PDF readers) just ignored the 
ICC profile (an exception occurred while decoding the profile). As a result 
the colors were not correct, because the alternate color space DeviceRGB was 
used instead of the embedded profile.

The minimal patch would be:
{code:java}
diff --git a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
index f17cdd7cd..866cfbfba 100644
--- a/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
+++ b/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/image/PNGConverter.java
@@ -400,11 +400,15 @@ final class PNGConverter
 if (state.iCCP != null || state.sRGB != null)
 {
 // We have got a color profile, which we must attach
 cosStream.setInt(COSName.N, colorSpace.getNumberOfComponents());
 cosStream.setItem(COSName.ALTERNATE, colorSpace.getNumberOfComponents()
 == 1 ? COSName.DEVICEGRAY : COSName.DEVICERGB);
 if (state.iCCP != null)
 {
+cosStream.setItem(COSName.FILTER, COSName.FLATE_DECODE);
 // We need to skip over the name
@@ -415,6 +419,7 @@ final class PNGConverter
 break;
 iccProfileDataStart++;
 }
+iccProfileDataStart++;
 if (iccProfileDataStart >= state.iCCP.length)
 {
 LOG.error("Invalid iCCP chunk, to few bytes");
{code}
But this will cause test failures in the PNGConverterTest, because the image 
now has the right colors, but
 - the JDK does not respect the embedded color profile in PNG images. Without 
the fix for this in PNGConverterTest, the colors of the PNG read with ImageIO 
for comparison will be "miles" off.
 - comparing sRGB images does not work even after applying the fix for the ICC 
profile, because there are some color rounding differences (off by 1 on the 
first pixel, for whatever reason, likely some different color conversion paths 
somewhere). There is a massive difference between converting single pixel 
values between colorspaces and converting a whole image at once (using 
ColorConvertOp). The latter may choose slightly different colors depending on 
the rendering intent and the colors in use in the image. The image from 
PDImage.getImage() would have been converted with ColorConvertOp, but in 
checkIdent(), which uses getRGB(), the image read with ImageIO would be color 
converted "pixel by pixel". One could fix this by first converting the 
expected image to sRGB using ColorConvertOp if it is not yet in sRGB.

If you want to apply this fix alone, you would need to temporarily disable the 
test
{code:java}
PNGConverterTest.testImageConversionRGB16BitICC(){code}
The others should still work. Or you extend checkIdent() to correctly convert 
non-sRGB BufferedImages to sRGB first. I can also provide a patch for that if 
you like.


[jira] [Commented] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116907#comment-17116907
 ] 

Tilman Hausherr commented on PDFBOX-4847:
-

What is the bug in PNGConverter about? Is it related only to the improvement 
change, or would it also have an effect on existing PDFs?

Yes, I would not commit the improvements now; this is too close to the 
release. But the bugfix yes, depending on the answer.

> [PATCH] Allow to access raw image data and fix ICC profile embedding in 
> PNGConverter
> 
>
> Key: PDFBOX-4847
> URL: https://issues.apache.org/jira/browse/PDFBOX-4847
> Project: PDFBox
>  Issue Type: New Feature
>  Components: PDModel, Writing
>Affects Versions: 2.0.19
>Reporter: Emmeran Seehuber
>Priority: Minor
>  Labels: feature, patch
> Attachments: color_difference.png, pdfbox-rawimages.patch
>
>
> This patch was primarily intended to add access to raw image data (i.e. 
> without any kind of color conversion/reduction). While implementing and 
> testing it I also found a bug with ICC profile embedding in the PNGConverter.
> This patch does the following things:
>  - add a method getRawRaster() to PDImage. This allows reading the original 
> raster data in 8 or 16 bit without any kind of color interpretation. Users 
> must know themselves what they want to do with it (e.g. to access the raw 
> data of DeviceN images).
>  - add a method getRawImage(). It tries to return the raster obtained by 
> getRawRaster() as a BufferedImage. This is only successful if there is a 
> matching Java ColorSpace for the colorspace of the image, i.e. only for 
> ICCBased images. In theory this should also work for PDIndexed sRGB images, 
> but I have to find a PDF with such an image first to test it.
>  - add a -noColorConversion switch to the ExtractImage utility to extract 
> images in their original colorspace. For CMYK images this only works when a 
> TIFF encoder (e.g. from TwelveMonkeys) is in the class path.
>  - add support to export PNGs with ICC profile data in ImageIOUtil.
>  - fix a bug in PNGConverter, which does not correctly embed the ICC profile 
> from the PNG file.
>  - the PNGConverterTest tests the raw images. While reading PNG files for 
> comparison, it also ensures that the embedded ICC profile is correctly 
> respected. The default PNG reader, at least up to JDK 11, does *not* respect 
> the embedded ICC profile, i.e. the colors are wrong. But there is a 
> workaround for this in the PNGConverterTest (which I have had in production 
> for years now). See the screenshot for the correct color display of the 
> png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong 
> display (right side; Java; inside IDEA).
>  
> Besides finding bugs like the one in the PNGConverter, access to the raw 
> image also allows doing all kinds of fun color things. E.g. a future patch 
> could allow using the raw images to print PDFs. If the PDF you want to print 
> has images with a gamut > sRGB (e.g. from any modern camera) and the target 
> printer also has a gamut > sRGB (e.g. some ink photo printers), you will for 
> sure see a difference in the resulting print. Such a mode would be rather 
> slow, as the current sRGB image handling is optimized for speed and using 
> the original raw images would need on-demand color conversions in the 
> printer driver. But you get „high quality“ out of it (at least with respect 
> to colors).
> I don’t think this is in time for the 2.0.20 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: Automatic website generation

2020-05-26 Thread Maruan Sahyoun
 
>  
> > On 25.05.20 at 08:27, Maruan Sahyoun wrote:
> > > OK - I've reviewed the docs and also looked at some projects using 
> > > automated website building (there aren't many yet).
> > Thanks for looking into this. It's not used that often yet as it is quite 
> > a new feature.
> > 
> > > The best example I've found is the one from Apache Struts, as it not only 
> > > builds the website (including staging for PR-based changes) but also 
> > > includes the JavaDocs when a new release is done. Unfortunately this is 
> > > set up in a way that the JavaDoc source ends up in the master branch. 
> > > Currently we have older versions of the JavaDoc available on our site 
> > > too, which imho would clutter the master branch.
> > > 
> > > Please take a look at https://struts.apache.org/updating-website.html and 
> > > let me know what you think.
> > > 
> > > Which tasks of our current website building would you like to see 
> > > simplified? Maybe we can start from there.
> > I'd like to use the automatic website build including the staging area, so 
> > that it's easy to apply changes without installing Jekyll locally.
> 
> Would you see a similar approach as done by Struts
> 
> /quote
> If you are a contributor, and the change is small you can push it directly to 
> the master branch. In any other case please open a Pull Request. The Pull 
> Request will be automatically build and deployed to the staging site.
> 
> You can then review your changes before applying them to the master branch.
> /quote
> 
> or a different workflow, e.g. by switching to a staging branch (commit, 
> automated build, review) and, if valid, applying the changes to master? 
> Please note that there will be some time between committing and being able 
> to review using the staging area, so for many changes doing them locally 
> with Jekyll's hot building feature might give you a quicker turnaround.
> 
> Another option would be to change from Jekyll to a Maven-based build, either 
> using a plugin or by downloading the proper tooling, so one doesn't need to 
> have a Jekyll install. As an example, this is used by Apache Camel: 
> https://github.com/apache/camel-website/blob/master/pom.xml

I've created https://issues.apache.org/jira/browse/PDFBOX-4848 so let's move 
the input/discussion there.

> 
> BR
> Maruan
> 
> > Andreas
> > > BR
> > > Maruan
> > >   
> > > > Hi,
> > > > 
> > > > infra implemented some new features for github repos including some 
> > > > automatic
> > > > website building [1]. Should we use that for convenience?
> > > > 
> > > > WDYT?
> > > > 
> > > > Andreas
> > > > 
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
> > > > 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827





[jira] [Created] (PDFBOX-4848) Automate building website without local install

2020-05-26 Thread Maruan Sahyoun (Jira)
Maruan Sahyoun created PDFBOX-4848:
--

 Summary: Automate building website without local install
 Key: PDFBOX-4848
 URL: https://issues.apache.org/jira/browse/PDFBOX-4848
 Project: PDFBox
  Issue Type: Improvement
  Components: Documentation
Reporter: Maruan Sahyoun
Assignee: Maruan Sahyoun


As discussed on the dev mailing list we are looking to utilize the 
[git - .asf.yaml features|https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features] 
and/or other capabilities to simplify building the website without the need to 
install the site generation locally.






[jira] [Created] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)
Emmeran Seehuber created PDFBOX-4847:


 Summary: [PATCH] Allow to access raw image data and fix ICC 
profile embedding in PNGConverter
 Key: PDFBOX-4847
 URL: https://issues.apache.org/jira/browse/PDFBOX-4847
 Project: PDFBox
  Issue Type: New Feature
  Components: PDModel, Writing
Affects Versions: 2.0.19
Reporter: Emmeran Seehuber
 Attachments: color_difference.png, pdfbox-rawimages.patch

This patch was primarily intended to add access to raw image data (i.e. 
without any kind of color conversion/reduction). While implementing and 
testing it I also found a bug with ICC profile embedding in the PNGConverter.

This patch does the following things:
 - add a method getRawRaster() to PDImage. This allows reading the original 
raster data in 8 or 16 bit without any kind of color interpretation. Users 
must know themselves what they want to do with it (e.g. to access the raw data 
of DeviceN images).
 - add a method getRawImage(). It tries to return the raster obtained by 
getRawRaster() as a BufferedImage. This is only successful if there is a 
matching Java ColorSpace for the colorspace of the image, i.e. only for 
ICCBased images. In theory this should also work for PDIndexed sRGB images, 
but I have to find a PDF with such an image first to test it.
 - add a -noColorConversion switch to the ExtractImage utility to extract 
images in their original colorspace. For CMYK images this only works when a 
TIFF encoder (e.g. from TwelveMonkeys) is in the class path.
 - add support to export PNGs with ICC profile data in ImageIOUtil.
 - fix a bug in PNGConverter, which does not correctly embed the ICC profile 
from the PNG file.
 - the PNGConverterTest tests the raw images. While reading PNG files for 
comparison, it also ensures that the embedded ICC profile is correctly 
respected. The default PNG reader, at least up to JDK 11, does *not* respect 
the embedded ICC profile, i.e. the colors are wrong. But there is a workaround 
for this in the PNGConverterTest (which I have had in production for years 
now). See the screenshot for the correct color display of the 
png_rgb_romm_16.png test file (left side; macOS Preview app) and the wrong 
display (right side; Java; inside IDEA).
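
As a rough illustration of the idea behind getRawImage() in the list above, the following sketch wraps an uninterpreted raster in a BufferedImage when a matching Java ColorSpace is available. It uses only JDK classes and is not the code from the attached patch: the class RawRasterImageSketch and the method wrapRaster are hypothetical names, and CS_LINEAR_RGB merely stands in for an image's non-sRGB color space.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical helper for illustration only; not the code from the patch.
public class RawRasterImageSketch
{
    // Wraps the raster without converting any sample values.
    // Returns null when the band count does not fit the color space.
    static BufferedImage wrapRaster(WritableRaster raster, ColorSpace colorSpace)
    {
        if (raster.getNumBands() != colorSpace.getNumComponents())
        {
            return null;
        }
        // Opaque color model whose transfer type follows the raster (8 or 16 bit).
        ColorModel colorModel = new ComponentColorModel(colorSpace, false, false,
                Transparency.OPAQUE, raster.getDataBuffer().getDataType());
        return new BufferedImage(colorModel, raster, false, null);
    }

    public static void main(String[] args)
    {
        // A 16-bit, 3-band raster standing in for raw image data.
        WritableRaster raster = Raster.createInterleavedRaster(
                DataBuffer.TYPE_USHORT, 2, 2, 3, null);
        raster.setSample(0, 0, 0, 65535);
        ColorSpace linearRgb = ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB);
        BufferedImage image = wrapRaster(raster, linearRgb);
        System.out.println("wrapped: " + (image != null));
        System.out.println("first sample: " + image.getRaster().getSample(0, 0, 0));
    }
}
```

The image shares the raster's data, so the sample values stay exactly as stored; any color interpretation happens later, when the image is drawn or converted.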

 

Besides finding bugs like the one in the PNGConverter, access to the raw image 
also allows doing all kinds of fun color things. E.g. a future patch could 
allow using the raw images to print PDFs. If the PDF you want to print has 
images with a gamut > sRGB (e.g. from any modern camera) and the target 
printer also has a gamut > sRGB (e.g. some ink photo printers), you will for 
sure see a difference in the resulting print. Such a mode would be rather 
slow, as the current sRGB image handling is optimized for speed and using the 
original raw images would need on-demand color conversions in the printer 
driver. But you get „high quality“ out of it (at least with respect to 
colors).

I don’t think this is in time for the 2.0.20 release.






[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: color_difference.png







[jira] [Updated] (PDFBOX-4847) [PATCH] Allow to access raw image data and fix ICC profile embedding in PNGConverter

2020-05-26 Thread Emmeran Seehuber (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmeran Seehuber updated PDFBOX-4847:
-
Attachment: pdfbox-rawimages.patch







Re: Automatic website generation

2020-05-26 Thread Maruan Sahyoun
 
> On 25.05.20 at 08:27, Maruan Sahyoun wrote:
> > OK - I've reviewed the docs and also looked at some projects using 
> > automated website building (there aren't many yet).
> Thanks for looking into this. It's not used that often yet as it is quite a 
> new feature.
> 
> > The best example I've found is the one from Apache Struts, as it not only 
> > builds the website (including staging for PR-based changes) but also 
> > includes the JavaDocs when a new release is done. Unfortunately this is 
> > set up in a way that the JavaDoc source ends up in the master branch. 
> > Currently we have older versions of the JavaDoc available on our site too, 
> > which imho would clutter the master branch.
> > 
> > Please take a look at https://struts.apache.org/updating-website.html and 
> > let me know what you think.
> > 
> > Which tasks of our current website building would you like to see 
> > simplified? Maybe we can start from there.
> I'd like to use the automatic website build including the staging area, so 
> that it's easy to apply changes without installing Jekyll locally.

Would you see a similar approach as done by Struts

/quote
If you are a contributor, and the change is small you can push it directly to 
the master branch. In any other case please open a Pull Request. The Pull 
Request will be automatically build and deployed to the staging site.

You can then review your changes before applying them to the master branch.
/quote

or a different workflow, e.g. by switching to a staging branch (commit, 
automated build, review) and, if valid, applying the changes to master? Please 
note that there will be some time between committing and being able to review 
using the staging area, so for many changes doing them locally with Jekyll's 
hot building feature might give you a quicker turnaround.

Another option would be to change from Jekyll to a Maven-based build, either 
using a plugin or by downloading the proper tooling, so one doesn't need to 
have a Jekyll install. As an example, this is used by Apache Camel: 
https://github.com/apache/camel-website/blob/master/pom.xml

BR
Maruan

> 
> Andreas
> > BR
> > Maruan
> >   
> > > Hi,
> > > 
> > > infra implemented some new features for github repos including some 
> > > automatic
> > > website building [1]. Should we use that for convenience?
> > > 
> > > WDYT?
> > > 
> > > Andreas
> > > 
> > > [1]
> > > https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories
> > > 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: AG Düsseldorf, HRB 53837
UST.-ID: DE248275827





Re: Automatic website generation

2020-05-26 Thread Andreas Lehmkuehler

On 25.05.20 at 08:27, Maruan Sahyoun wrote:
> OK - I've reviewed the docs and also looked at some projects using automated 
> website building (there aren't many yet).

Thanks for looking into this. It's not used that often yet as it is quite a 
new feature.

> The best example I've found is the one from Apache Struts, as it not only 
> builds the website (including staging for PR-based changes) but also 
> includes the JavaDocs when a new release is done. Unfortunately this is set 
> up in a way that the JavaDoc source ends up in the master branch. Currently 
> we have older versions of the JavaDoc available on our site too, which imho 
> would clutter the master branch.
> 
> Please take a look at https://struts.apache.org/updating-website.html and 
> let me know what you think.
> 
> Which tasks of our current website building would you like to see 
> simplified? Maybe we can start from there.

I'd like to use the automatic website build including the staging area, so 
that it's easy to apply changes without installing Jekyll locally.

Andreas

> BR
> Maruan
> 
> > Hi,
> > 
> > infra implemented some new features for github repos including some 
> > automatic website building [1]. Should we use that for convenience?
> > 
> > WDYT?
> > 
> > Andreas
> > 
> > [1]
> > https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories




Re: Release 2.0.20 ?

2020-05-26 Thread Andreas Lehmkuehler

I'm planning to cut the release next Monday, 1st of June.

Andreas

On 20.05.20 at 08:15, Andreas Lehmkuehler wrote:

> Hi,
> 
> how about cutting a 2.0.20 release in 2 or 3 weeks from now?
> 
> Andreas
