[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700762#comment-16700762 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1847570 from [~talli...@apache.org] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1847570 ]

PDFBOX-4184 -- pull test file from JIRA rather than internet archive

> [PATCH]: Support simple lossless compression of 16 bit RGB images
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Assignee: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.12, 3.0.0 PDFBox
>
>         Attachments: 032163.jpg, 16bit.png, LoadGovdocs.java,
> fix_profile_use.patch, fix_profile_use3.patch, fix_profile_use4.patch,
> images.zip, lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> lossless_predictor_based_imageencoding_v4.patch,
> lossless_predictor_based_imageencoding_v5.patch,
> lossless_predictor_based_imageencoding_v6.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf,
> size_compare.txt
>
>
> The attached patch adds support to write 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as
> the images are currently not efficiently encoded. I.e. you could use PNG
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff speed vs compression ratio.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
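The PNG predictor idea in the issue description can be illustrated with a small sketch. Predictor 15 lets the encoder choose a PNG filter per row; the code below shows only one of those filters ("Sub", filter type 1) applied to a single row, and it is not taken from any of the attached patches. The class and method names are hypothetical. For 16 bit RGB, the bytes-per-pixel value is assumed to be 6.

```java
/** Sketch of the PNG "Sub" row filter (filter type 1); hypothetical helper, not PDFBox code. */
final class PngSubFilterSketch {

    /**
     * Filters one image row. The result is one byte longer than the input:
     * the leading byte is the PNG filter type (1 = Sub), followed by the
     * difference of each byte to the byte one pixel to its left.
     */
    static byte[] filterRow(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length + 1];
        out[0] = 1; // filter type byte: Sub
        for (int i = 0; i < row.length; i++) {
            // The first pixel has no left neighbour, so 0 is used instead.
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i + 1] = (byte) ((row[i] & 0xFF) - left); // arithmetic is modulo 256
        }
        return out;
    }

    /** Reverses filterRow, which is what a PDF reader does when decoding. */
    static byte[] unfilterRow(byte[] filtered, int bytesPerPixel) {
        byte[] row = new byte[filtered.length - 1];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            row[i] = (byte) ((filtered[i + 1] & 0xFF) + left);
        }
        return row;
    }
}
```

When two neighbouring pixels are identical, the filtered bytes of the second pixel are all zero, which tends to help the Flate compression that follows. Trying several filters per row and keeping the best one costs time, which is the speed vs compression tradeoff mentioned above.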
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700760#comment-16700760 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1847569 from [~talli...@apache.org] in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1847569 ]

PDFBOX-4184 -- switch to download test file from jira rather than the internet archive.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621926#comment-16621926 ]

Emmeran Seehuber commented on PDFBOX-4184:

[~tilman] Well, the checkIdent() works perfectly fine for me on Mac OS X with JDK 10.0.2. But converting between color spaces can always be lossy if you are not converting into a bigger color space (e.g. into 16 bit ProPhoto etc.), as you may e.g. get clipping when not all source color values can be mapped 1:1 into the destination color space. No idea if there is something fixed in LCMS in JDK 10.0.2 to work better than in whatever JDK you used...

I want to implement a "getRawImage()" method similar to the "getImage()" method in the PDImageXObject. Directly comparing the "raw" pixel values would allow a test which would never fail. I started a branch with some changes for that some months ago, but had no time yet to finish it... That would also be something for a new ticket.
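The raw comparison idea above can be sketched with plain JDK classes. This is only an illustration of comparing stored sample values without any color conversion; it is not the proposed getRawImage() method, and the class and method names are hypothetical. Returning the largest difference instead of a boolean is an assumption made here so that a test could also accept a small tolerance, such as the 1-differences reported for the CMYK test in this thread.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;

/** Hypothetical helper: compares stored raster samples, bypassing color conversion. */
final class RasterCompareSketch {

    /**
     * Returns the largest absolute difference between corresponding samples,
     * 0 if the rasters are identical, or -1 if size or band count differ.
     */
    static int maxSampleDifference(BufferedImage a, BufferedImage b) {
        Raster ra = a.getRaster();
        Raster rb = b.getRaster();
        if (ra.getWidth() != rb.getWidth() || ra.getHeight() != rb.getHeight()
                || ra.getNumBands() != rb.getNumBands()) {
            return -1;
        }
        int max = 0;
        for (int y = 0; y < ra.getHeight(); y++) {
            for (int x = 0; x < ra.getWidth(); x++) {
                for (int band = 0; band < ra.getNumBands(); band++) {
                    // getSample returns the stored value, e.g. 0..65535 for USHORT data
                    int diff = Math.abs(ra.getSample(x, y, band) - rb.getSample(x, y, band));
                    max = Math.max(max, diff);
                }
            }
        }
        return max;
    }
}
```

A strict test would require a result of 0, while a test that has to survive small conversion differences could accept a result of at most 1.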
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620878#comment-16620878 ]

Tilman Hausherr commented on PDFBOX-4184:

I committed it anyway, still without the equality test which we didn't have before either. A possible explanation is that converting from RGB to CMYK and back is not always perfect. Maybe use the code that you had commented?
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620862#comment-16620862 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1841354 from til...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1841354 ]

PDFBOX-4184: keep ICC colorspace + set alternate colorspace, by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620863#comment-16620863 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1841355 from til...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1841355 ]

PDFBOX-4184: keep ICC colorspace + set alternate colorspace, by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620827#comment-16620827 ]

Tilman Hausherr commented on PDFBOX-4184:

The cmyk test fails, there are many 1-differences like this:

expected: but was: ;
expected: but was: ;
expected: but was: ;
expected: but was: ;
expected: but was: ;

This is not much but I wonder why it works for you. What OS and what Java are you using? I tested this on W10 with jdk8 latest.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620155#comment-16620155 ]

Tilman Hausherr commented on PDFBOX-4184:

If you try something with the cache, please create a new issue, release is expected next week (build on WE).

Contrary to what I wrote, only the CMYK test needs an ICC space, so most users won't have a size difference, only users "like you" who create BufferedImages in an advanced way.

If somebody hits the problem in the new release, the workaround is easy - just replace the colorspace with a common object.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619634#comment-16619634 ]

Tilman Hausherr commented on PDFBOX-4184:

Thanks, I'll commit that soon hopefully. I had only a quick look and I liked it.

About a cache - tricky. You shouldn't cache any PD or COS structures because these point back to a document (scratch file). Or it should be in a way that the cache dies with the PDDocument.
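One way to read "the cache dies with the PDDocument" is a cache instance that is created per document and dropped together with it, rather than a static map. The sketch below is only an assumption about how that could look and uses no PDFBox classes: the value type V stands in for whatever per-document object would be reused (for example a PDICCBased), and the class name and factory parameter are hypothetical. The key is the raw profile data, which in the JDK can be obtained with ICC_Profile.getData().

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-document cache sketch. One instance is meant to be created
 * per document and discarded with it, so cached values never outlive the
 * document they point back to.
 */
final class IccProfileCacheSketch<V> {

    // ByteBuffer compares by content, so equal profile bytes map to the same key.
    private final Map<ByteBuffer, V> entries = new HashMap<>();

    /** Returns the cached value for this profile data, creating it on first use. */
    V getOrCreate(byte[] profileData, Function<byte[], V> factory) {
        // Copy the bytes so later changes to the caller's array cannot corrupt the key.
        ByteBuffer key = ByteBuffer.wrap(profileData.clone());
        return entries.computeIfAbsent(key, k -> factory.apply(profileData));
    }

    /** Number of distinct profiles seen so far. */
    int size() {
        return entries.size();
    }
}
```

A WeakHashMap keyed by the document was deliberately not used here: if a cached value holds a strong reference back to its document, the key would stay reachable and the entry would never be cleared.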
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618743#comment-16618743 ]

Emmeran Seehuber commented on PDFBOX-4184:

[~tilman] If you have an ICC profile on an image, which is not the builtin sRGB profile, you need the ICC profile, otherwise you will just have plain wrong colors. You should not look at (r,g,b) or (c,m,y,k) as concrete color values, but rather as vectors within the color space. Without a profile describing the vectorspace/colorspace you have no idea what real colors the vector values result in. DeviceRGB is (on screen) often interpreted as sRGB. But what DeviceCMYK means is really up to the concrete interpreting device. I.e. this will look different on every printer (brightness, color, ...). So DeviceCMYK as a colorspace for an image mostly means "random", if you are not explicitly targeting one specific printer. The ICC profile describes how to transform the color-vector-data into other colorspaces, e.g. into sRGB to view on the screen or the concrete ICC profile of the printing device.

If you load images in Java using ImageIO you usually (especially when using twelve monkeys) get an sRGB image. So you would never hit this path. If you want to load an image with the real color profile of the image you must pass a specially prepared (i.e. with the right profile) BufferedImage into ImageIO. So you won't get an image with a color space different from sRGB by accident. If you have an image with an ICC profile, you always want the image in this colorspace with the attached profile, since it is already not so easy to get the image in anything other than sRGB.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if you have more images. The correct solution would be an ICC_Profile <-> PDICCBased cache in the document, so that the same profile does not get encoded twice. Should I implement such a cache? In my application I manually deduplicate the ICC profiles at the moment.

The attached patch [^fix_profile_use4.patch] fixes the test driver and also specifies an "Alternate" colorspace for the profile, for all those devices which cannot handle ICC_Profiles. With the correct ICC_Profile specified, the "roundtrip" sRGB->ISO Coated->sRGB now also works correctly, so the image can be compared with the original image.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617995#comment-16617995 ]

Tilman Hausherr commented on PDFBOX-4184:

Thanks, the change makes sense, but I'd like to have a "no longer failing" test for this, i.e. where the generated PDF looks different than the image due to the missing ICC profile.

Another problem is that {{testCreateLosslessFromImageCMYK}} now fails. I wonder if the ICC profile is needed for CMYK?

I also see the danger that PDFs get bigger, if each image now has a (different) ICC profile. And what about b/w images?
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617610#comment-16617610 ]

Emmeran Seehuber commented on PDFBOX-4184:

I'm in the process of migrating part of my application from iText to PDFBox. While doing so I found a bug with images that have an ICC_Profile. The LosslessFactory compresses the ICC profile of an image correctly - but does not use it. This small patch fixes this: [^fix_profile_use.patch]

> [PATCH]: Support simple lossless compression of 16 bit RGB images
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.9
> Reporter: Emmeran Seehuber
> Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch,
> images.zip, lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> lossless_predictor_based_imageencoding_v4.patch,
> lossless_predictor_based_imageencoding_v5.patch,
> lossless_predictor_based_imageencoding_v6.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf,
> size_compare.txt
>
> The attached patch adds support to write 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as
> the images are currently not efficiently encoded. I.e. you could use PNG
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff speed vs compression ratio.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
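As a rough illustration of the predictor idea mentioned in the issue description, the sketch below applies the PNG "Sub" filter to a single image row and then reverses it. It is not taken from any of the attached patches: the class and method names (`PngSubFilterSketch`, `subFilter`, `subUnfilter`) are hypothetical, and a real encoder would additionally prefix every row with a filter-type byte and deflate the result.

```java
import java.util.Arrays;

// Hypothetical sketch, not PDFBox code: the PNG "Sub" filter on one row.
final class PngSubFilterSketch {

    // Replace each byte by its difference (modulo 256) to the byte that is
    // bytesPerPixel positions to the left; the first pixel is kept as-is.
    public static byte[] subFilter(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i] = (byte) ((row[i] & 0xFF) - left);
        }
        return out;
    }

    // Inverse operation: add the already reconstructed left neighbour back.
    public static byte[] subUnfilter(byte[] filtered, int bytesPerPixel) {
        byte[] out = new byte[filtered.length];
        for (int i = 0; i < filtered.length; i++) {
            int left = i >= bytesPerPixel ? out[i - bytesPerPixel] & 0xFF : 0;
            out[i] = (byte) ((filtered[i] & 0xFF) + left);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three 8-bit RGB pixels that change slowly from left to right.
        byte[] row = {10, 20, 30, 12, 22, 32, 14, 24, 34};
        byte[] filtered = subFilter(row, 3);
        System.out.println(Arrays.toString(filtered));
        System.out.println(Arrays.equals(row, subUnfilter(filtered, 3)));
    }
}
```

The filtered row consists mostly of small, repeating values, which is what makes the following deflate step more effective on smooth images. For 16 bit RGB data the same loop would be called with a `bytesPerPixel` of 6.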
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551964#comment-16551964 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1836431 from til...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1836431 ]

PDFBOX-4184: size comparison only for DeviceRGB images
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551963#comment-16551963 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1836430 from til...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1836430 ]

PDFBOX-4184: size comparison only for DeviceRGB images
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551961#comment-16551961 ]

Emmeran Seehuber commented on PDFBOX-4184:

Colorspace == sRGB && depth == 16 should nearly always be false. But I am fine with just adding the colorspace condition.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551958#comment-16551958 ]

Tilman Hausherr commented on PDFBOX-4184:

I could add PDDeviceRGB to the "if", but I'd like to keep the 16 bit condition, because I added that one after the test failed.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551951#comment-16551951 ]

Emmeran Seehuber commented on PDFBOX-4184:

I would suggest changing the condition from

{code:java}
if (pdImageXObject.getBitsPerComponent() < 16 &&
        image.getWidth() * image.getHeight() <= 50 * 50)
{code}

to

{code:java}
if (pdImageXObject.getColorSpace() == PDDeviceRGB.INSTANCE &&
        image.getWidth() * image.getHeight() <= 50 * 50)
{code}

as otherwise the LosslessFactory may "randomly" destroy or reduce the color information of small images. If e.g. the user has a requirement to always encode images as CMYK, this would break it. On the other hand, reducing a 16 bit sRGB image to 8 bit is not really losing color information, as sRGB is a rather small color space. As most image decoders (e.g. TwelveMonkeys) by default convert every image they decode to sRGB, you can be sure that the user really wants the non-default color space to be used when they give a non-sRGB image to the LosslessFactory.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551933#comment-16551933 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1836425 from til...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1836425 ]

PDFBOX-4184: compare sizes of different strategies for small non 16 bit images
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551932#comment-16551932 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1836424 from til...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1836424 ]

PDFBOX-4184: compare sizes of different strategies for small non 16 bit images
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551917#comment-16551917 ]

Tilman Hausherr commented on PDFBOX-4184:

I did a size comparison. It went over the zip files from 0 to 18. The attachment has the files where the size of the predictor compression was at least 5% over the size of the "old" compression. Almost all of the files are jpeg files and of the kind that shouldn't have been jpeg compressed in the first place. Jpeg is for photographs and not for charts, or anything with sharp edges.
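A size comparison like the one described in this thread can be sketched with the JDK's `java.util.zip.Deflater`. This is only a minimal illustration, assuming both candidate encodings of an image are already available as byte arrays; `SmallerEncodingSketch`, `deflatedSize` and `pickSmaller` are hypothetical names, and this is not how LosslessFactory is implemented.

```java
import java.util.zip.Deflater;

// Hypothetical sketch, not PDFBox code: deflate two candidate encodings of
// the same image and keep whichever one ends up smaller.
final class SmallerEncodingSketch {

    // Number of bytes the data occupies after deflate compression.
    public static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Returns the predictor-encoded bytes only if they really compress
    // better; otherwise falls back to the plain encoding.
    public static byte[] pickSmaller(byte[] plain, byte[] predicted) {
        return deflatedSize(predicted) < deflatedSize(plain) ? predicted : plain;
    }
}
```

Compressing both variants costs extra time, which is one reason to restrict such a comparison to small images.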
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539089#comment-16539089 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1835595 from til...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1835595 ]

PDFBOX-4184: fix heuristics
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539091#comment-16539091 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1835596 from til...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1835596 ]

PDFBOX-4184: fix heuristics
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539073#comment-16539073 ]

Tilman Hausherr commented on PDFBOX-4184:

8.6.6.3 Indexed Colour Spaces, page 156 in the PDF 32000 specification. Our implementation to decode them is in PDIndexed.java.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539054#comment-16539054 ]

Emmeran Seehuber commented on PDFBOX-4184:

{quote}Most of the time people complain about time. There are almost never complaints about size.{quote}

They will complain about size when they start to write high resolution prepress PDFs... There is a difference between a 60 MB PDF and an 80 MB PDF, especially if you work with many PDFs and have to upload them to the print shop. If you only care about the web with low resolution images, then of course the size is not that important. As always, it depends on the use case.

{quote}It is possible in PDF specification, but it hasn't been implemented for PDFBox.{quote}

Do you have a pointer (e.g. a page number in the PDF 1.7 spec) to where the indexed image format in PDF is described? I did not find this.

{quote}Try {{PDImageXObject.createFromByteArray()}}.{quote}

Ah, I see. So there is already an API for this, it just does not really handle PNGs yet (i.e. it will load the PNG as a BufferedImage and do a lossless compression, as opposed to directly reusing the already optimized IDAT chunk - which of course would also be much faster). I'll try to find some time to implement IDAT reusing.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538905#comment-16538905 ]

Tilman Hausherr commented on PDFBOX-4184:

Yeah, I'll make that change later. I was also planning to run some tests to find out when it is better to use the predictor and when not, but then I did nothing because I was too busy with other things. Maybe I will... maybe the predictor thing will be an option.

{quote} it is not possible to write an indexed image {quote}

The PDF specification allows it, but it hasn't been implemented in PDFBox. Most of the time people complain about speed. There are almost never complaints about size.

{quote} LosslessFactory.createFromByteArray(PDDocument,byte[]) which would try and sniffer the image type. {quote}

Try {{PDImageXObject.createFromByteArray()}}.
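For illustration, sniffing an image type from a byte array usually comes down to checking the leading signature bytes. The following minimal Java sketch is not PDFBox code; the class {{ImageSniffSketch}} and its {{Kind}} enum are hypothetical names. It only assumes the well-known PNG file signature (89 50 4E 47 0D 0A 1A 0A) and the JPEG start-of-image marker (FF D8 FF).

```java
// Hypothetical helper, not part of PDFBox: guesses the image type from magic bytes.
final class ImageSniffSketch {
    enum Kind { JPEG, PNG, UNKNOWN }

    // PNG file signature: 89 'P' 'N' 'G' CR LF SUB LF
    private static final byte[] PNG_SIG = {
        (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };
    // JPEG streams start with the SOI marker FF D8 followed by another marker (FF ..)
    private static final byte[] JPEG_SIG = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF };

    static Kind sniff(byte[] data) {
        if (startsWith(data, JPEG_SIG)) {
            return Kind.JPEG;
        }
        if (startsWith(data, PNG_SIG)) {
            return Kind.PNG;
        }
        return Kind.UNKNOWN;
    }

    // True if data begins with exactly the bytes in prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pngHeader = { (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
        System.out.println(sniff(pngHeader));
    }
}
```

A caller could dispatch on the result and fall back to decoding into a BufferedImage for anything reported as UNKNOWN.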
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538311#comment-16538311 ]

Emmeran Seehuber commented on PDFBOX-4184:

I ran a test with a subset of the govdoc images to get an idea of which estimate method might be better. I tested the govdoc ZIPs 0 up to 57. See the attached report [^size_compare.txt]

There were 5590 images found: 1719 did not change (i.e. no difference between a signed and a Math.abs based estimate), 77 files compressed better with a signed estimate, and 3794 files compressed better when using Math.abs in the estimate.

So I would suggest changing estCompressSum() to

{code}
private static long estCompressSum(byte[] dataRawRowSub)
{
    long sum = 0;
    for (byte aDataRawRowSub : dataRawRowSub)
    {
        // treat each filtered byte as a signed difference and add its magnitude
        sum += Math.abs(aDataRawRowSub);
    }
    return sum;
}
{code}

as this clearly seems to be a win.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530179#comment-16530179 ]

Emmeran Seehuber commented on PDFBOX-4184:

[~tilman] Regarding estCompressSum() and chooseDataRowToWrite(): This is an "oracle" (=heuristic) that tries to pick the "best" row to write. The idea is to choose the row representation (i.e. each byte subtracted from the byte in the row above, subtracted from the byte on the left, and so on) that has the most values that are 0 or at least very small. This reduces the number of distinct values in the Huffman tree for the ZIP compression, which allows for better compression and also makes it more likely that the same value is repeated so that it can be run length encoded (RLE). Gradients are "perfectly" compressed with such a scheme.

The default algorithm is not perfect and misses the best possible combination of rows. But getting the best combination would mean trying all different row encodings which means 5 * combinations. Tools like [pngcrush|https://pmt.sourceforge.io/pngcrush/] try to do so using a more or less brute force search. But that takes time... So this is not suitable for a generic image writer.

I'm fine with adding heuristics to decide when to use which encoder. But you can spend ages getting this right, and you will always find cases where the heuristic is wrong... The overall idea of this encoder is to get better compression in *most* cases, especially when using zip compression level 9.

When modifying estCompressSum() we should run a test on the govdocs corpus and record the sizes (e.g. in a text file), then change the method and record the sizes after the change. Then we could check whether the change really improves the overall compression... I won't have time to look into that this week.

If I understand the PDF spec correctly, it is not possible to write an indexed image - which would be very nice for small icon-like images...

Regarding the impact on openhtmltopdf, it's difficult to say. I have a project where I use openhtmltopdf to generate reports which contain tons of photo images, so this change is going to improve the file size there. Also, if you care about file size you should ensure that the compression level is always set to 9. Maybe we should really add a heuristic like "if the image is smaller than e.g. 50x50 pixels _and_ it is default sRGB, just encode it without predictor using the old sRGB path". For such small images we could also go the brute force way: encode the image using both methods and then choose the smaller result.

Regarding testCreateLosslessFromImageCMYK(): If the image data is nearly identical but not exactly the same, it is likely due to rounding errors in the color conversion. Of course, this should not happen, but it also depends on how the image color is converted to sRGB. It would be awesome (not only for this test, but also for other stuff) if PDImageXObject had a getRawImage() (or similarly named) method, which would return the BufferedImage with whatever colorspace it has, so that CMYK images would just be returned as CMYK images and not converted to sRGB.

I'm also thinking about adding a method LosslessFactory.createFromByteArray(PDDocument,byte[]) which would try to sniff the image type. It could use JPEGFactory.createFromByteArray() for JPEGs and could try to directly reuse the IDAT chunk of PNGs. If it could not encode the image, e.g. because the PNG has an indexed color encoding, it would return null, so that the user knows they have to load the image and encode it from the BufferedImage. This would speed up the PDF write time if the user already has the image encoded, and it would allow precompressing images using external tools like pngcrush and benefiting from that compression. But that's a different issue.
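The row-selection heuristic described in this comment can be sketched in a few lines of Java. This is a simplified illustration, not the code from the patch: the class and method names are hypothetical, and only three of the five PNG filter types (None, Sub, Up) are implemented to keep it short. Each candidate row is scored with the sum of absolute values of its bytes interpreted as signed differences, and the lowest score wins.

```java
// Hypothetical sketch of per-row PNG filter selection; not the PDFBox implementation.
final class RowFilterSketch {
    static final int NONE = 0;
    static final int SUB = 1;
    static final int UP = 2;

    // Score of a candidate row: bytes are read as signed differences.
    static long absSum(byte[] row) {
        long sum = 0;
        for (byte b : row) {
            sum += Math.abs(b);
        }
        return sum;
    }

    // Sub filter: each byte minus the corresponding byte of the pixel to the left.
    static byte[] filterSub(byte[] cur, int bytesPerPixel) {
        byte[] out = new byte[cur.length];
        for (int i = 0; i < cur.length; i++) {
            int left = (i >= bytesPerPixel) ? (cur[i - bytesPerPixel] & 0xFF) : 0;
            out[i] = (byte) ((cur[i] & 0xFF) - left);
        }
        return out;
    }

    // Up filter: each byte minus the byte at the same position in the previous row.
    static byte[] filterUp(byte[] cur, byte[] prev) {
        byte[] out = new byte[cur.length];
        for (int i = 0; i < cur.length; i++) {
            out[i] = (byte) ((cur[i] & 0xFF) - (prev[i] & 0xFF));
        }
        return out;
    }

    // Returns the filter type whose output has the smallest score.
    static int chooseFilter(byte[] cur, byte[] prev, int bytesPerPixel) {
        long[] scores = {
            absSum(cur),
            absSum(filterSub(cur, bytesPerPixel)),
            absSum(filterUp(cur, prev))
        };
        int best = NONE;
        for (int f = 1; f < scores.length; f++) {
            if (scores[f] < scores[best]) {
                best = f;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] gradient = { 10, 11, 12, 13, 14, 15 };
        byte[] black = new byte[gradient.length];
        // A horizontal gradient turns into a run of small values under Sub.
        System.out.println(chooseFilter(gradient, black, 1));
    }
}
```

Because each row is scored on its own, the choice is greedy: it does not look at how the chosen rows combine in the Deflate stream, which is why an exhaustive search can still beat it.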
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530102#comment-16530102 ]

Tilman Hausherr commented on PDFBOX-4184:

I looked at the sizes of the PDF test result files. Have a look at bitmask4babgr.pdf and intargb.pdf. This isn't just space needed for the extra dictionary. In bitmask4babgr.pdf, the first image had a compressed size of 214 and now it has a size of 701. OTOH the file PDFBOX-4184-032163.pdf had a size of 36240 and now has 31607, and only 27007 after modifying estCompressSum() to sum += Math.abs(aDataRawRowSub);

I'm wondering about the logic of chooseDataRowToWrite(). You're choosing the compression method based on the result of estCompressSum(), which is the sum of the byte values. How would this have any influence on compression? Why would a sequence of 00 have a different compression length than a sequence of FF?

Your comment mentions "This is just the recommend algorithm in the spec" and surprisingly, this is true: [https://medium.com/@duhroach/how-png-works-f1174e3cc7b7] recommends using the abs of the signed values (which I tried above). It doesn't make things better for the non-photo files, though. The same recommendation with more details: [https://www.w3.org/TR/PNG-Encoders.html#E.Filter-selection]

I think we should count colors and/or consider the bit depth. Or the geometric size of the image, i.e. something below 25x25 is probably an icon rather than a photograph. The current situation might have a negative impact on the openhtmltopdf project, because many web pages have small icons.
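To make the difference between the two estimates concrete, here is a small Java sketch. It assumes that the unmodified estimate adds the bytes as signed values (the names {{signedSum}} and {{absSum}} are hypothetical, not taken from the patch). With a signed sum, positive and negative differences cancel, so a noisy row can get the same score as an all-zero row; the Math.abs variant keeps them apart.

```java
// Hypothetical comparison of a signed-sum and an absolute-sum row estimate.
final class EstimateSketch {
    // Signed sum: +100 and -100 cancel each other out.
    static long signedSum(byte[] row) {
        long sum = 0;
        for (byte b : row) {
            sum += b;
        }
        return sum;
    }

    // Absolute sum: every non-zero difference increases the score.
    static long absSum(byte[] row) {
        long sum = 0;
        for (byte b : row) {
            sum += Math.abs(b);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] flat = { 0, 0, 0, 0 };
        byte[] noisy = { 100, -100, 100, -100 };
        System.out.println("flat:  signed=" + signedSum(flat) + " abs=" + absSum(flat));
        System.out.println("noisy: signed=" + signedSum(noisy) + " abs=" + absSum(noisy));
    }
}
```

Under the signed estimate both rows score 0 and are indistinguishable, although the flat row is clearly the easier one to compress.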
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530100#comment-16530100 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834861 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834861 ] PDFBOX-4184: write test image into PDF
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530099#comment-16530099 ]

Tilman Hausherr commented on PDFBOX-4184:

What about the commented-out code in testCreateLosslessFromImageCMYK? I tried it and there is a very slight difference.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530101#comment-16530101 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834862 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834862 ] PDFBOX-4184: write test image into PDF
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529444#comment-16529444 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834822 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834822 ] PDFBOX-4184: sonar fix; remove println from test
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529443#comment-16529443 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834821 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834821 ] PDFBOX-4184: sonar fix; remove println from test
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529227#comment-16529227 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834805 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834805 ] PDFBOX-4184: support predictor compression and 16 bit RGB images, by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529226#comment-16529226 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834804 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834804 ] PDFBOX-4184: support predictor compression and 16 bit RGB images, by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529218#comment-16529218 ]

Tilman Hausherr commented on PDFBOX-4184:

For copyright reasons we can't include some of the files in the repository. File 032163.jpg comes from a government site but I couldn't find any details. And of course we don't know the copyright status of the arrow picture from https://github.com/danfickle/openhtmltopdf/issues/173.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529214#comment-16529214 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834802 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834802 ] PDFBOX-4184: add remote loading of 16 bit ARGB test file > [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.12, 3.0.0 PDFBox > > Attachments: 16bit.png, LoadGovdocs.java, > lossless_predictor_based_imageencoding.patch, > lossless_predictor_based_imageencoding_v2.patch, > lossless_predictor_based_imageencoding_v3.patch, > lossless_predictor_based_imageencoding_v4.patch, > lossless_predictor_based_imageencoding_v5.patch, > lossless_predictor_based_imageencoding_v6.patch, > pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, > png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf > > > The attached patch adds support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. > There is still some room for improvement when writing lossless images, as > the images are currently not efficiently encoded. I.e. you could use PNG > encodings to get better compression. (By adding a COSName.DECODE_PARMS with > a COSName.PREDICTOR == 15 and encoding the images as PNG). 
But this is > something for a later patch. It would also need another API, as there is a > tradeoff between speed and compression ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
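The description above suggests PNG-style predictor encoding (a `/DecodeParms` entry with `/Predictor` 15) as a way to improve lossless compression. The following is a minimal sketch of that idea using only the Java standard library, not the code from the attached patches: it applies the PNG "Sub" filter to one row of pixel bytes and deflates the result. The class `PngSubPredictorSketch` and its methods are hypothetical names chosen for illustration, and the `level` parameter only hints at the speed versus compression tradeoff mentioned in the issue.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical illustration class; not part of PDFBox.
final class PngSubPredictorSketch {

    /** Applies the PNG "Sub" filter (type 1) to one row; the result starts with the filter-type byte. */
    static byte[] filterSub(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length + 1];
        out[0] = 1; // PNG filter type "Sub"
        for (int i = 0; i < row.length; i++) {
            // Byte at the same position in the pixel to the left, or 0 for the first pixel.
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i + 1] = (byte) ((row[i] & 0xFF) - left);
        }
        return out;
    }

    /** Reverses filterSub and returns the original row bytes. */
    static byte[] unfilterSub(byte[] filtered, int bytesPerPixel) {
        byte[] row = new byte[filtered.length - 1];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            row[i] = (byte) ((filtered[i + 1] & 0xFF) + left);
        }
        return row;
    }

    /** Deflates data at the given level (0 to 9); higher levels are slower but usually smaller. */
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            bos.write(buf, 0, n);
        }
        deflater.end();
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // One row of an 8-bit RGB gray gradient: every channel equals the x position.
        int bytesPerPixel = 3;
        byte[] row = new byte[256 * bytesPerPixel];
        for (int i = 0; i < row.length; i++) {
            row[i] = (byte) (i / bytesPerPixel);
        }
        System.out.println("raw deflated:      " + deflate(row, 9).length + " bytes");
        System.out.println("filtered deflated: " + deflate(filterSub(row, bytesPerPixel), 9).length + " bytes");
    }
}
```

With predictor 15 each row starts with its own filter-type byte, so an encoder is free to pick a different PNG filter per row; this sketch always uses Sub for brevity.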
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529215#comment-16529215 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834803 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834803 ] PDFBOX-4184: add remote loading of 16 bit ARGB test file
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529182#comment-16529182 ] Emmeran Seehuber commented on PDFBOX-4184: -- Yes, 16 bit alpha channels were ignored ... I've updated the patch and included your unit test. See [^lossless_predictor_based_imageencoding_v6.patch]
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528738#comment-16528738 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834741 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834741 ] PDFBOX-4184: make flate compression level public to allow access in future image compression code, as suggested by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528737#comment-16528737 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834740 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834740 ] PDFBOX-4184: make flate compression level public to allow access in future image compression code, as suggested by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528733#comment-16528733 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834739 from til...@apache.org in branch 'pdfbox/trunk' [ https://svn.apache.org/r1834739 ] PDFBOX-4184, PDFBOX-4071: exclude idea, as suggested by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528731#comment-16528731 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1834738 from til...@apache.org in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1834738 ] PDFBOX-4184, PDFBOX-4071: exclude idea, as suggested by Emmeran Seehuber
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528728#comment-16528728 ] Tilman Hausherr commented on PDFBOX-4184: - There's a new problem and I don't know why this didn't come up before. See this code:
{code:java}
public void testCreateLosslessFrom16BitPNG() throws IOException
{
    PDDocument document = new PDDocument();
    BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

    assertEquals(64, image.getColorModel().getPixelSize());
    assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
    assertEquals(4, image.getRaster().getNumDataElements());
    assertEquals(java.awt.image.DataBuffer.TYPE_USHORT, image.getRaster().getDataBuffer().getDataType());

    PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

    int w = image.getWidth();
    int h = image.getHeight();
    validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
    System.out.println(ximage.getImage());
    checkIdent(image, ximage.getImage());
    checkIdentRGB(image, ximage.getOpaqueImage());

    assertNotNull(ximage.getSoftMask());
    validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
    assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

    doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
}
{code}
The test fails because the softmask is all 0. For some reason, {{alphaImageData}} is not filled when {{prepareImageXObject}} is called by {{preparePredictorPDImage}}.
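The failing test above expects an 8-bit soft mask for a 16-bit image with alpha. As a rough illustration of what that involves (not the code from the patch), the sketch below builds a 16 bit per component ARGB image with the standard `java.awt.image` classes and reduces its alpha band to 8-bit samples by keeping the high byte. `Alpha16Sketch`, `create16BitArgb` and `extractAlpha8` are hypothetical names, and treating the last raster band as alpha is an assumption that holds for images built with this `ComponentColorModel`.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical illustration class; not part of PDFBox.
final class Alpha16Sketch {

    /** Creates an empty TYPE_CUSTOM image with four 16-bit components (R, G, B, alpha). */
    static BufferedImage create16BitArgb(int width, int height) {
        ColorModel cm = new ComponentColorModel(
                ColorSpace.getInstance(ColorSpace.CS_sRGB),
                new int[] {16, 16, 16, 16},
                true,   // has alpha
                false,  // alpha is not premultiplied
                Transparency.TRANSLUCENT,
                DataBuffer.TYPE_USHORT);
        WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
        return new BufferedImage(cm, raster, false, null);
    }

    /** Returns one 8-bit alpha sample per pixel, row by row, taken from the high byte of the 16-bit alpha. */
    static byte[] extractAlpha8(BufferedImage image) {
        if (!image.getColorModel().hasAlpha()) {
            throw new IllegalArgumentException("image has no alpha channel");
        }
        Raster raster = image.getRaster();
        int alphaBand = raster.getNumBands() - 1; // assumption: alpha is the last band
        int w = image.getWidth();
        int h = image.getHeight();
        byte[] alpha = new byte[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int a16 = raster.getSample(x, y, alphaBand); // 0..65535 for TYPE_USHORT
                alpha[y * w + x] = (byte) (a16 >>> 8);
            }
        }
        return alpha;
    }

    public static void main(String[] args) {
        BufferedImage image = create16BitArgb(2, 1);
        image.getRaster().setSample(0, 0, 3, 0xFFFF); // fully opaque
        image.getRaster().setSample(1, 0, 3, 0x8000); // about half transparent
        byte[] alpha = extractAlpha8(image);
        System.out.println((alpha[0] & 0xFF) + " " + (alpha[1] & 0xFF));
    }
}
```

An all-zero result from a routine like this would correspond to the fully transparent soft mask described in the comment, which is why a test image with several distinct alpha values is useful.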
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521073#comment-16521073 ] Tilman Hausherr commented on PDFBOX-4184: - I'll commit the change later, after the release of 2.0.11 (yes, 2.0.11) which is planned for next week.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16521070#comment-16521070 ] Tilman Hausherr commented on PDFBOX-4184: - I'll commit the change later, after the release of 2.0.11 (yes, 2.0.11) which is planned for next week.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491995#comment-16491995 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1832331 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1832331 ] PDFBOX-4184: add encoding test with file that failed in development test
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491994#comment-16491994 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1832330 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1832330 ] PDFBOX-4184: add encoding test with file that failed in development test
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491992#comment-16491992 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1832328 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1832328 ] PDFBOX-4184: remote loading of test file 032163.jpg / http://www.crh.noaa.gov/Image/gjt/images/ImageGallery/Uncompahgre_small.jpg
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491993#comment-16491993 ] ASF subversion and git services commented on PDFBOX-4184: - Commit 1832329 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1832329 ] PDFBOX-4184: remote loading of test file 032163.jpg / http://www.crh.noaa.gov/Image/gjt/images/ImageGallery/Uncompahgre_small.jpg
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475875#comment-16475875 ] Tilman Hausherr commented on PDFBOX-4184: - I found the source of the image: http://www.crh.noaa.gov/Image/gjt/images/ImageGallery/Uncompahgre_small.jpg
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475681#comment-16475681 ] Emmeran Seehuber commented on PDFBOX-4184: --
I did a batch test using the govdocs zips 001 - 057 (I downloaded them overnight) and found another bug in the PNG Average filter... how hard can it be to type such a simple formula?! Now everything seems to be fine. [^lossless_predictor_based_imageencoding_v5.patch]
I've added the test image (032.zip/163.jpg) and the matching test. I also included the tool you built in the patch and extended it so that it can use a directory of already downloaded files for the test.
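For reference, the PNG specification defines the Average filter (type 3) as Average(x) = Raw(x) - floor((Raw(x-bpp) + Prior(x)) / 2), computed modulo 256, where Raw(x-bpp) is the byte one pixel to the left and Prior(x) is the byte directly above. The comment does not say which part of the formula was wrong in the patch, so the following is only a minimal sketch of the standard filter. The class and method names are made up for illustration and are not taken from the patch.

```java
final class PngAverageSketch {
    /**
     * Applies the PNG Average filter (type 3) to one row.
     * cur and prev are raw rows of equal length (prev is all zeros for the first row),
     * bpp is the number of bytes per complete pixel (at least 1).
     */
    static byte[] filterAverage(byte[] cur, byte[] prev, int bpp) {
        byte[] out = new byte[cur.length];
        for (int i = 0; i < cur.length; i++) {
            // Neighbours are read as unsigned values; the sum needs 9 bits,
            // so it is computed in int before halving.
            int left = i >= bpp ? cur[i - bpp] & 0xFF : 0;
            int up = prev[i] & 0xFF;
            out[i] = (byte) ((cur[i] & 0xFF) - ((left + up) >>> 1));
        }
        return out;
    }

    /** Inverse of filterAverage; reconstructs the raw row from the filtered one. */
    static byte[] unfilterAverage(byte[] filtered, byte[] prev, int bpp) {
        byte[] out = new byte[filtered.length];
        for (int i = 0; i < filtered.length; i++) {
            // The left neighbour is the already reconstructed byte, not the filtered one.
            int left = i >= bpp ? out[i - bpp] & 0xFF : 0;
            int up = prev[i] & 0xFF;
            out[i] = (byte) ((filtered[i] & 0xFF) + ((left + up) >>> 1));
        }
        return out;
    }
}
```

A round trip through filterAverage and unfilterAverage should give back the original row, which is an easy property to check in a unit test.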
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475470#comment-16475470 ] Emmeran Seehuber commented on PDFBOX-4184: --
Just got an idea in the shower ...
{code:java}
Benchmark                                   (zipLevel)   Mode  Cnt    Score    Error  Units
LosslessFactoryBenchmark.predictor                   3  thrpt    5  168.186 ±  1.884  ops/s
LosslessFactoryBenchmark.predictor                   6  thrpt    5  109.865 ±  2.022  ops/s
LosslessFactoryBenchmark.predictor                   9  thrpt    5   20.382 ±  0.432  ops/s
LosslessFactoryBenchmark.predictorBig                3  thrpt    5    2.617 ±  0.047  ops/s
LosslessFactoryBenchmark.predictorBig                6  thrpt    5    2.211 ±  0.029  ops/s
LosslessFactoryBenchmark.predictorBig                9  thrpt    5    1.627 ±  0.039  ops/s
LosslessFactoryBenchmark.predictorBigBytes           3  thrpt    5    2.219 ±  0.055  ops/s
LosslessFactoryBenchmark.predictorBigBytes           6  thrpt    5    1.880 ±  0.057  ops/s
LosslessFactoryBenchmark.predictorBigBytes           9  thrpt    5    1.454 ±  0.025  ops/s
LosslessFactoryBenchmark.rgbOnly                     3  thrpt    5  247.996 ±  7.758  ops/s
LosslessFactoryBenchmark.rgbOnly                     6  thrpt    5  128.242 ±  3.246  ops/s
LosslessFactoryBenchmark.rgbOnly                     9  thrpt    5   14.259 ±  0.339  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  3  thrpt    5    8.113 ±  0.290  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  6  thrpt    5    3.317 ±  0.059  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  9  thrpt    5    1.308 ±  0.025  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             3  thrpt    5    3.506 ±  0.066  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             6  thrpt    5    2.149 ±  0.070  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             9  thrpt    5    1.081 ±  0.019  ops/s
{code}
Now the predictor is always faster at zip level 9. It is still slower at the other zip levels, but not that much. [^lossless_predictor_based_imageencoding_v4.patch]
I would be fine with this, so no API change would be needed.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475382#comment-16475382 ] Emmeran Seehuber commented on PDFBOX-4184: --
I found the bug in my code by changing the sum method (while trying to optimize the code). The PNG Average function was simply implemented wrong; I just got the formula wrong... The bug was triggered by your sample image after I changed estCompressSum().
I've also implemented a benchmark. But the benchmark is likely only for trunk, as JMH now needs Java 1.7. So to even compile the benchmark you need JDK 1.7+ (I also tested with JDK 10).
I thought that the predictor would be faster than the "simple" way. But no, it is not, and at the moment I don't have any further idea how I could optimize it further ...
{code:java}
Benchmark                                   (zipLevel)   Mode  Cnt    Score    Error  Units
LosslessFactoryBenchmark.predictor                   3  thrpt    5  114.055 ± 10.120  ops/s
LosslessFactoryBenchmark.predictor                   6  thrpt    5   79.463 ± 15.921  ops/s
LosslessFactoryBenchmark.predictor                   9  thrpt    5   16.542 ±  7.951  ops/s
LosslessFactoryBenchmark.predictorBig                3  thrpt    5    1.355 ±  0.585  ops/s
LosslessFactoryBenchmark.predictorBig                6  thrpt    5    1.360 ±  0.045  ops/s
LosslessFactoryBenchmark.predictorBig                9  thrpt    5    1.135 ±  0.021  ops/s
LosslessFactoryBenchmark.predictorBigBytes           3  thrpt    5    1.420 ±  0.028  ops/s
LosslessFactoryBenchmark.predictorBigBytes           6  thrpt    5    1.286 ±  0.052  ops/s
LosslessFactoryBenchmark.predictorBigBytes           9  thrpt    5    1.073 ±  0.014  ops/s
LosslessFactoryBenchmark.rgbOnly                     3  thrpt    5  248.467 ±  8.199  ops/s
LosslessFactoryBenchmark.rgbOnly                     6  thrpt    5  126.354 ±  9.548  ops/s
LosslessFactoryBenchmark.rgbOnly                     9  thrpt    5   13.954 ±  1.092  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  3  thrpt    5    7.939 ±  0.395  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  6  thrpt    5    3.278 ±  0.038  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  9  thrpt    5    1.248 ±  0.080  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             3  thrpt    5    3.380 ±  0.229  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             6  thrpt    5    2.108 ±  0.064  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             9  thrpt    5    1.064 ±  0.023  ops/s
{code}
I've tested both your "old" rgbOnly code and the predictor using zip levels 3, 6 and 9. The images used are your sample image, and that image scaled up 10x to an INT bitmap (Big) and to a 3BYTE bitmap (BigBytes). Only at the maximum zip level is the predictor on par with rgbOnly; in all other cases it is slower. But for the big image there is a huge difference in compressed size at zip level 9: 58077 (Predictor) vs. 167808 (RGB Only). So I'm not sure whether it would be better to let the user choose between simple encoding and predictor encoding, as there is a tradeoff between speed and size. What do you think about the API? [^lossless_predictor_based_imageencoding_v3.patch]
I've not yet tested against the govdocs; I'll try to let this test run in the background today. For me this patch is still WIP, not ready to be committed.
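The size side of the speed vs. size tradeoff discussed above can be measured without a benchmark harness. The following is a minimal sketch that only uses java.util.zip.Deflater; it is not the JMH benchmark from the patch, and the class and method names are hypothetical.

```java
import java.util.zip.Deflater;

final class DeflateLevelSketch {
    /** Returns the number of bytes produced when deflating data at the given zip level (0-9). */
    static int compressedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            // Drain the deflater; only the output length is of interest here.
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            // Release the native zlib resources.
            deflater.end();
        }
    }
}
```

Calling compressedSize with the same encoded image bytes at levels 3, 6 and 9 gives the size column that the throughput table does not show.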
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473037#comment-16473037 ] Tilman Hausherr commented on PDFBOX-4184: - If you find a bug in your code, please create a failing test. Ideally this would include an image that fails. Usually these images are from the US government but we need to be sure, e.g. by doing a reverse search on google images.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473026#comment-16473026 ] Tilman Hausherr commented on PDFBOX-4184: - CPU power shouldn't be a problem unless you're lucky and have a very fast connection. Most files are skipped because they're not images. Btw let it run with jai_imageio.jar so that it will also process .tif files. Some files are broken so they will be skipped.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473018#comment-16473018 ] Emmeran Seehuber commented on PDFBOX-4184: --
The Govdocs corpus is a little bit big ... I'll let those tests run in the office on Monday, as my iMac there is faster at processing that many documents...
Regarding directly using the DeflaterOutputStream: I do this to be able to *stream* compress the image data, so that the image data is compressed row by row. This leads to less memory being used while compressing and to better CPU cache usage, as the data of one row is still in the cache when it is fed to zip. The alternative is to first encode the image into one big byte buffer (which doubles the memory needed for the image) and then compress it at the end.
Of course, when constructing a DeflaterOutputStream it should use the Filter.SYSPROP_DEFLATELEVEL setting. I've refactored the code for this into its own method, Filter.getCompressionLevel(). See the updated patch. [^lossless_predictor_based_imageencoding_v2.patch] - This is still work in progress, not to be committed yet (I need to analyze those image mismatches in the govdocs first).
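The row-by-row streaming described above can be sketched with plain JDK classes. This is an illustration of the idea only, not the code from the patch: the class and method names are hypothetical, the zip level is passed in explicitly instead of being read from Filter.getCompressionLevel(), and the pixels are fetched through BufferedImage.getRGB for simplicity rather than from the raster.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class RowStreamingSketch {
    /** Deflates the image as 8 bit RGB samples, feeding one row at a time to the zip stream. */
    static byte[] compressRgbRows(BufferedImage image, int level) {
        int width = image.getWidth();
        int height = image.getHeight();
        int[] argbRow = new int[width];
        // Only one row is buffered; the whole uncompressed image is never held in memory.
        byte[] rgbRow = new byte[width * 3];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level);
        try (DeflaterOutputStream zip = new DeflaterOutputStream(sink, deflater)) {
            for (int y = 0; y < height; y++) {
                image.getRGB(0, y, width, 1, argbRow, 0, width);
                for (int x = 0, o = 0; x < width; x++) {
                    int argb = argbRow[x];
                    rgbRow[o++] = (byte) (argb >> 16);
                    rgbRow[o++] = (byte) (argb >> 8);
                    rgbRow[o++] = (byte) argb;
                }
                // The row is compressed right after it was produced.
                zip.write(rgbRow);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // A caller-supplied Deflater is not released by the stream, so end it here.
            deflater.end();
        }
        return sink.toByteArray();
    }
}
```

The buffer-everything alternative would need width * height * 3 bytes for the encoded samples before compression even starts, which is the doubling of memory the comment refers to.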
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472619#comment-16472619 ] Tilman Hausherr commented on PDFBOX-4184: - The tool is attached here, it's the file [^LoadGovdocs.java]. Test dependencies are fine regardless of the license.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472615#comment-16472615 ] Emmeran Seehuber commented on PDFBOX-4184: --
No hurry, I won't have much time before Sunday to look into it. I also want to do some benchmarking/performance tuning first, to get exact numbers on what this change does in terms of performance/output size. Is jmh OK as a test dependency for pdfbox?
I'm using the Github mirror and git to track my changes. I suspect that the mirror takes a little while to catch up with Subversion? At least at the moment your tool has not yet landed on Github.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472558#comment-16472558 ] Tilman Hausherr commented on PDFBOX-4184: - I forgot to mention, we're planning a release soon, I prefer to wait until after the release.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472478#comment-16472478 ] Tilman Hausherr commented on PDFBOX-4184: -
Please run the tool I just uploaded... I get a few "hits":
001/001229.png: images not equal
001/001230.png: images not equal
and also some jpg images. Without the change, this doesn't happen. I suspect that the differences are minor, but IMHO there shouldn't be any at all...
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471034#comment-16471034 ] Tilman Hausherr commented on PDFBOX-4184: -
Re ICC profile, we have "ISOcoated_v2_300_bas.icc", see usage in PDDeviceCMYK.
Re CLA, probably yes…
You're using DeflaterOutputStream directly, would it be longer if you use {{FilterFactory.INSTANCE.getFilter(COSName.FLATE_DECODE);}} as done elsewhere?
Re comparing CMYK - why not get the RGB values, and check that they differ by at most 2?
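The tolerance comparison suggested for CMYK images (convert to RGB, then allow a difference of at most 2 per component) could look roughly like this. It is a sketch under the assumption that both images can be read through BufferedImage.getRGB; the class and method names are invented for illustration and do not come from the PDFBox test code.

```java
import java.awt.image.BufferedImage;

final class ImageToleranceSketch {
    /** Returns true if both images have the same size and every R, G and B value differs by at most tolerance. */
    static boolean rgbWithinTolerance(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                // Compare blue (shift 0), green (shift 8) and red (shift 16); alpha is ignored.
                for (int shift = 0; shift <= 16; shift += 8) {
                    int diff = ((p >> shift) & 0xFF) - ((q >> shift) & 0xFF);
                    if (Math.abs(diff) > tolerance) {
                        return false;
                    }
                }
            }
        }
        return true;
    }
}
```

With tolerance 0 this degenerates into an exact comparison, so the same helper could serve both the lossless RGB tests and the CMYK case.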
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16470976#comment-16470976 ] Emmeran Seehuber commented on PDFBOX-4184: -- As the topic image encoding comes up again in OpenHTMLToPDF (see [https://github.com/danfickle/openhtmltopdf/issues/212)] I reworked my 16 bit predictor based encoding I had laying around and extended it to support most BufferedImage formats and CMYK images. I originally did this for using with iText some time ago. See [^lossless_predictor_based_imageencoding.patch] It implements image encoding with a PNG predictor. Depending on the image to encoding this results in massive space savings compared to simple image encoding without a predictor. Also image with extended color profiles work. To test the CMYK support I need a CMYK profile. Any one would do. For a quick test I used a profile from here: [http://download.adobe.com/pub/adobe/iccprofiles/win/AdobeICCProfiles.zip] I have no idea if we are allowed to include this profile in the test resources. It's missing in the patch, you must copy it from the download archive. I think we might also be allowed to use a profile from [http://www.eci.org/en/downloads]. But they did not publish any license information :( I did not do any performance tests yet, but the predictor encoding should be faster then the existing encoding, as it tries to be more friendly to the cache (e.g. writing a row directly into a zip stream). Please review this patch. Do I need to sign a CLA? 
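To illustrate why a PNG predictor can save space, here is a minimal sketch of one PNG filter type ("Sub", tag byte 1) applied to a single row and then deflated. It is not the code from the attached patch; the class and method names are hypothetical, and a real encoder would choose a filter per row and also write /DecodeParms with /Predictor 15 into the image stream dictionary.

```java
import java.util.zip.Deflater;

public class PngSubPredictorSketch {

    /**
     * PNG filter type 1 ("Sub"): each byte is replaced by its difference
     * to the byte bytesPerPixel positions to the left. The returned row
     * is one byte longer because it starts with the filter type tag.
     */
    public static byte[] subFilterRow(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length + 1];
        out[0] = 1; // filter type tag for "Sub"
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i + 1] = (byte) ((row[i] & 0xFF) - left);
        }
        return out;
    }

    /** Inverse of subFilterRow, as a decoder would apply it. */
    public static byte[] unfilterSubRow(byte[] filtered, int bytesPerPixel) {
        byte[] row = new byte[filtered.length - 1];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            row[i] = (byte) ((filtered[i + 1] & 0xFF) + left);
        }
        return row;
    }

    /** Number of bytes the data occupies after zlib/deflate compression. */
    public static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // One row of a horizontal 8-bit RGB gradient: pixel x is (x, x, x).
        byte[] row = new byte[256 * 3];
        for (int x = 0; x < 256; x++) {
            row[3 * x] = row[3 * x + 1] = row[3 * x + 2] = (byte) x;
        }
        byte[] filtered = subFilterRow(row, 3);
        System.out.println("raw deflated:      " + deflatedSize(row));
        System.out.println("filtered deflated: " + deflatedSize(filtered));
    }
}
```

For a smooth gradient the filtered row is almost entirely a run of identical small values, which deflate handles far better than the unfiltered samples. For 16 bit RGB, `bytesPerPixel` would be 6.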
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434485#comment-16434485 ]

Tilman Hausherr commented on PDFBOX-4184:

If you'd like, I'd take an improved patch against the current version of LosslessFactory... Something that takes a new path if the image is 16 bit and has the raster type that your code supports (interleaved). I.e. a combination of your existing patch and the code in your comment.
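A minimal sketch of the kind of guard that could select such a new path, assuming the criteria named in this thread (TYPE_CUSTOM, USHORT data, interleaved raster). The class and method names are hypothetical; the actual check in LosslessFactory may differ.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.PixelInterleavedSampleModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

public class SixteenBitCheck {

    /** True for custom-typed images backed by an interleaved 16 bit (USHORT) raster. */
    public static boolean is16BitInterleaved(BufferedImage image) {
        return image.getType() == BufferedImage.TYPE_CUSTOM
                && image.getRaster().getDataBuffer().getDataType() == DataBuffer.TYPE_USHORT
                && image.getSampleModel() instanceof PixelInterleavedSampleModel;
    }

    /** Builds a 16 bit per component RGBA test image with an interleaved raster. */
    public static BufferedImage newUShortRgba(int width, int height) {
        ColorModel colorModel = new ComponentColorModel(
                ColorSpace.getInstance(ColorSpace.CS_sRGB),
                true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
        WritableRaster raster = Raster.createInterleavedRaster(
                DataBuffer.TYPE_USHORT, width, height, 4, null);
        return new BufferedImage(colorModel, raster, false, null);
    }

    public static void main(String[] args) {
        System.out.println(is16BitInterleaved(newUShortRgba(4, 4)));
        System.out.println(is16BitInterleaved(
                new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB)));
    }
}
```

Images that fail the check would keep using the existing getRGB() based path.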
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430593#comment-16430593 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1828725 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1828725 ]

PDFBOX-4184: add comment
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430592#comment-16430592 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1828724 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1828724 ]

PDFBOX-4184: complete comment
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430535#comment-16430535 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1828715 from [~tilman] in branch 'pdfbox/trunk' [ https://svn.apache.org/r1828715 ]

PDFBOX-4184: divide 16 bit alpha values by 256
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430536#comment-16430536 ]

ASF subversion and git services commented on PDFBOX-4184:

Commit 1828716 from [~tilman] in branch 'pdfbox/branches/2.0' [ https://svn.apache.org/r1828716 ]

PDFBOX-4184: divide 16 bit alpha values by 256
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430483#comment-16430483 ]

Tilman Hausherr commented on PDFBOX-4184:

Yes, this could be a way to pass many options... but I wonder if we should change the image creation API again. For now I'd prefer to just add features to the existing API.

I think I understand why I wasn't able to reproduce the problem with self-generated files. Maybe my files had similar low-order and high-order bytes, while in your file they were very different, so one would notice if only one byte was used.

There's no way we'd use any code from old itext versions, due to the GPL license.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429822#comment-16429822 ]

Emmeran Seehuber commented on PDFBOX-4184:

Oh yes, you are right. And I totally overlooked that the getRGB() call used always converts to sRGB ...

I already do colorspace tagging in [https://github.com/rototor/pdfbox-graphics2d/blob/master/src/main/java/de/rototor/pdfbox/graphics2d/PdfBoxGraphics2DLosslessImageEncoder.java]

{code:java}
/*
 * Do we have a color profile we need to embed?
 */
if (bi.getColorModel().getColorSpace() instanceof ICC_ColorSpace) {
    ICC_Profile profile = ((ICC_ColorSpace) bi.getColorModel().getColorSpace()).getProfile();
    /*
     * Only tag a profile if it is not the default sRGB profile.
     */
    if (profile != ICC_Profile.getInstance(ColorSpace.CS_sRGB)) {
        SoftReference<PDICCBased> pdProfileRef = profileMap.get(new ProfileSoftReference(profile));
        PDICCBased pdProfile = pdProfileRef == null ? null : pdProfileRef.get();
        if (pdProfile == null) {
            pdProfile = new PDICCBased(document);
            OutputStream outputStream = pdProfile.getPDStream()
                    .createOutputStream(COSName.FLATE_DECODE);
            outputStream.write(profile.getData());
            outputStream.close();
            pdProfile.getPDStream().getCOSObject().setInt(COSName.N, profile.getNumComponents());
            profileMap.put(new ProfileSoftReference(profile), new SoftReference<PDICCBased>(pdProfile));
        }
        imageXObject.setColorSpace(pdProfile);
    }
}
{code}

which is of course stupid if the colors always get converted to sRGB. It's not only stupid but also wrong, because it causes color shifts ... argh

So at the moment PDFBox is not usable for any "real" prepress work, as the sRGB colorspace is way too small. (At the moment I still use iText 2.1 for my prepress work, but I want to get rid of it in the long term.) sRGB as currently used in the LosslessFactory is fine for web / display-only PDFs, but not so much for prepress.

Hmm, I should really try to find some time to implement an "ImageEncoderFactory" and implement all the different encodings correctly (mostly 8-bit and 16-bit images; everything with a lower bit depth is likely fine with getRGB() as it is now; and of course encode not only RGB but also CMYK...). (No, I won't use any iText code. They have tons of special hacks, e.g. to reuse already encoded PNG data, which I think are not worth the effort and way too complex / too much code.)

I have a factory with an API like this in mind (everything with method chaining):

{code:java}
ImageEncoder myEncoder = ImageEncoderFactory.newBuilder(pdDocument)
    // Lossy / JPEG quality 0.9
    .jpeg(0.9)
    // or lossless
    .lossless()
    // Lossless compression the fast way, with a not so great compression ratio, as at the moment
    .fastCompression()
    // Lossless compression the slow way, with the maximum possible compression ratio (using predictors etc.)
    .slowCompression()
    // Set conversion to sRGB 8-bit. The default would be to always use the color space / ICC profile of the image.
    .toSRGB()
    // and finally
    .build();
PDImage pdImg = myEncoder.encode(img);
PDImage pdImg2 = myEncoder.encode(img2);
// ... reuse myEncoder as much as possible, but not multithreaded
{code}

What do you think?
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429719#comment-16429719 ]

Tilman Hausherr commented on PDFBOX-4184:

I wonder if the patch code is correct - it takes the raster values directly without doing any conversions for ICC colorspaces.
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429707#comment-16429707 ]

Tilman Hausherr commented on PDFBOX-4184:

I found the cause of the bug from the github issue: it is in {{createAlphaFromARGBImage}}, in the line {{bos.write(pixel)}}. For 16 bit images it should be changed to {{bos.write(pixel / 256)}}. So the existing code should be changed to

{code}
else
{
    bpc = 8;
    int dataType = alphaRaster.getDataBuffer().getDataType();
    if (dataType == DataBuffer.TYPE_USHORT)
    {
        // 16 bit alpha: keep only the high-order byte
        for (int pixel : pixels)
        {
            bos.write(pixel / 256);
        }
    }
    else
    {
        for (int pixel : pixels)
        {
            bos.write(pixel);
        }
    }
}
{code}

Sadly this doesn't explain why I can't produce a test that fails... I did try with various alpha values and nothing weird happened.
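A small standalone illustration of the effect behind this fix: `OutputStream.write(int)` stores only the low-order 8 bits of its argument, so writing a 16 bit alpha sample unchanged keeps the wrong byte. The class and method names are hypothetical; only the `write(pixel)` versus `write(pixel / 256)` difference is taken from the comment above.

```java
import java.io.ByteArrayOutputStream;

public class AlphaDownscaleSketch {

    /** Writes each sample as-is; write(int) keeps only the low-order byte. */
    public static byte[] writeRaw(int[] pixels) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (int pixel : pixels) {
            bos.write(pixel);
        }
        return bos.toByteArray();
    }

    /** Writes each 16 bit sample scaled down to 8 bit (its high-order byte). */
    public static byte[] writeScaled(int[] pixels) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (int pixel : pixels) {
            bos.write(pixel / 256);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        int[] alpha16 = {0xFF00, 0x8001, 0xFFFF};
        for (byte b : writeRaw(alpha16)) {
            System.out.printf("raw    0x%02X%n", b & 0xFF);
        }
        for (byte b : writeScaled(alpha16)) {
            System.out.printf("scaled 0x%02X%n", b & 0xFF);
        }
    }
}
```

A sample such as 0xFFFF looks the same either way, which is consistent with test images whose low and high bytes match not exposing the bug.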
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429702#comment-16429702 ]

Tilman Hausherr commented on PDFBOX-4184:

The two last files (no smask) show that the bug is in the smask creation. The RGB images are identical visually (but different in bit size).
[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420 ]

Tilman Hausherr commented on PDFBOX-4184:

Thanks... I'll commit this within the next few days... I managed to create such an image so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with the image from your issue:

{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image),
            0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}