[DISCUSS] GSoC Participation
Hi, shall we try to participate in GSoC? We would need a mentor, though.

BR Maruan Sahyoun
Re: [DISCUSS] GSoC Participation
Hi,

Good idea. Let's try.

On Wed, Jan 29, 2014 at 3:14 PM, Maruan Sahyoun sahy...@fileaffairs.de wrote:
> Hi shall we try to participate at GSoC? Needs a mentor though.

-- W.A.Pasan Buddhika
Re: [DISCUSS] GSoC Participation
Hi,

Maruan Sahyoun sahy...@fileaffairs.de wrote on 29 January 2014 at 10:44:
> Hi shall we try to participate at GSoC? Needs a mentor though.

That idea has come up from time to time, and it didn't work out for various reasons. So, to participate we need a mentor and of course at least one good idea to be proposed. I won't act as mentor, for various reasons, but I'll try to help in the usual manner.

IMO an appropriate idea should not deal with PDF-specific low-level features, like linearization support, as I doubt that any prospective student is familiar with the PDF spec. So possible ideas could be:
- an idea that came up some years ago: implement a GUI to bundle some/all/future tools/features of PDFBox, like printing, rendering, preflight, split, merge etc.
- a high-level API to create PDFs
- an advanced text extractor with table/column support

BR Andreas Lehmkühler
[jira] [Commented] (PDFBOX-1669) Update the dependency on Bouncy Castle to 1.49
[ https://issues.apache.org/jira/browse/PDFBOX-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885602#comment-13885602 ]

Andrew Phillips commented on PDFBOX-1669:

Ah, OK...like that. Thanks for clarifying, [~janstey]!

Update the dependency on Bouncy Castle to 1.49
Key: PDFBOX-1669
URL: https://issues.apache.org/jira/browse/PDFBOX-1669
Project: PDFBox
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Emmanuel Bourg
Priority: Minor
Attachments: pdfbox-bouncycastle-1.48-to-1.49-update.patch

Bouncy Castle 1.49 has been released and again breaks compatibility with the previous releases. The PublicKeySecurityHandler class is affected and needs a minor update.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (PDFBOX-1869) Implementation for ShadingType 1
Tilman Hausherr created PDFBOX-1869:

Summary: Implementation for ShadingType 1
Key: PDFBOX-1869
URL: https://issues.apache.org/jira/browse/PDFBOX-1869
Project: PDFBox
Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor

Here's an implementation for function-based shading and 4 sample files. The implementation is rather simple. The real work is done in the function, see subtask. I'm using a reverse transformation of the two matrices so that getRaster() gets the pure values.

The implementation works on two test images and fails on two. I believe the cause of the two failures is the problem I had with Type 4 and 5. I also noticed (when debugging) that for FUNSH01.pdf, my implementation always gets the same matrices even though there are 4 different areas on that page.

(Don't get confused by the name asy-latticeshading.pdf; it's not type 5 lattice shading.)
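The "reverse transformation of the two matrices" can be pictured with a small sketch. It is not the code from the attached Type1ShadingContext.java; the class and method names are hypothetical, and it assumes the two matrices are the current transformation matrix and the shading's own matrix, applied in that order on the way from the function domain to device space.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

// Hypothetical sketch: map a device pixel back into the shading function's domain.
final class ShadingInverseSketch {

    /** Combines both matrices and inverts the result (device space -> function domain). */
    static AffineTransform deviceToDomain(AffineTransform ctm, AffineTransform shadingMatrix) {
        // Forward direction: a domain point goes through the shading matrix first, then the CTM.
        AffineTransform forward = new AffineTransform(ctm);
        forward.concatenate(shadingMatrix);
        try {
            return forward.createInverse();
        } catch (NoninvertibleTransformException e) {
            throw new IllegalArgumentException("matrices are not invertible", e);
        }
    }

    public static void main(String[] args) {
        AffineTransform ctm = AffineTransform.getScaleInstance(2, 2);
        AffineTransform shadingMatrix = AffineTransform.getTranslateInstance(10, 5);
        AffineTransform inverse = deviceToDomain(ctm, shadingMatrix);
        // The pixel (24, 14) corresponds to the domain point that the function should be evaluated at.
        Point2D domainPoint = inverse.transform(new Point2D.Double(24, 14), null);
        System.out.println(domainPoint);
    }
}
```

With an inverse like this, a getRaster() implementation could evaluate the function once per pixel on untransformed coordinates.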
[jira] [Updated] (PDFBOX-1869) Implementation for ShadingType 1
[ https://issues.apache.org/jira/browse/PDFBOX-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1869:
Attachment: RGBCUBE.pdf, FUNSH01.pdf, asy-strokeshade.pdf, asy-latticeshading.pdf

Implementation for ShadingType 1
Key: PDFBOX-1869
URL: https://issues.apache.org/jira/browse/PDFBOX-1869
Project: PDFBox
Issue Type: Sub-task
Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor
Attachments: FUNSH01.pdf, RGBCUBE.pdf, Type1ShadingContext.java, Type1ShadingPaint.java, asy-latticeshading.pdf, asy-strokeshade.pdf
[jira] [Created] (PDFBOX-1870) PDFunctionType0 incorrect
Tilman Hausherr created PDFBOX-1870:

Summary: PDFunctionType0 incorrect
Key: PDFBOX-1870
URL: https://issues.apache.org/jira/browse/PDFBOX-1870
Project: PDFBox
Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor

Type 0 (Sampled) Functions are described in 3.9.1 of the PDF spec and basically, it's cheating: there's an n-dimensional grid of values (samples) and the function shall return these values or something in-between.

PDFunctionType0 has two bugs:

1) It does not do any interpolation. The function interpolate() is called several times, but it only adjusts values between ranges etc.; it does not calculate the color between 2^n samples. That part is outputValues[i] = (outputValuesPrevious[i] + outputValuesNext[i]) / 2. The spec does not tell much, only that "Interpolation is used to determine output values from the nearest surrounding values in the sample table." I have done a linear/bilinear interpolation implementation for 1D/2D inputs. I did not do an interpolation implementation for 3D and higher, because it's unclear whether this is actually used. Instead, I return random (!) values.

2) The sample bits are not collected correctly; the current code ignores the leftover bits when a row is done. The spec tells us "Successive values are adjacent in the bit stream; there is no padding at byte boundaries." Luckily, that one is easy to correct: three lines must be moved up. Alternatively, one might use the bit-io lib I mention in PDFBOX-615.
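To make the difference between averaging two samples and real interpolation concrete, here is a minimal bilinear sketch for a 2D sample grid. It is not the attached patch; the class name, the samples[y][x] layout and the use of fractional grid coordinates are assumptions made for illustration only.

```java
// Hypothetical sketch of linear/bilinear interpolation in a 2D sample table.
final class SampleInterpolationSketch {

    /** Linear interpolation between a and b, with t in [0, 1]. */
    static double lerp(double a, double b, double t) {
        return a + (b - a) * t;
    }

    /**
     * Bilinear interpolation. samples[y][x] holds the grid; x and y are fractional
     * grid coordinates, assumed to be already clipped to the grid bounds.
     */
    static double bilinear(double[][] samples, double x, double y) {
        int x0 = (int) Math.floor(x);
        int y0 = (int) Math.floor(y);
        // Clamp the upper neighbours so that a point on the last row or column stays in range.
        int x1 = Math.min(x0 + 1, samples[0].length - 1);
        int y1 = Math.min(y0 + 1, samples.length - 1);
        double tx = x - x0;
        double ty = y - y0;
        double top = lerp(samples[y0][x0], samples[y0][x1], tx);
        double bottom = lerp(samples[y1][x0], samples[y1][x1], tx);
        return lerp(top, bottom, ty);
    }

    public static void main(String[] args) {
        double[][] grid = { { 0, 10 }, { 20, 30 } };
        System.out.println(bilinear(grid, 0.5, 0.5));
        System.out.println(bilinear(grid, 0.25, 0.0));
    }
}
```

Unlike the fixed (previous + next) / 2 average, the weights here depend on where the input falls between the surrounding samples.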
[jira] [Updated] (PDFBOX-1870) PDFunctionType0 incorrect
[ https://issues.apache.org/jira/browse/PDFBOX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1870:
Attachment: PDFunctionType0.patch

PDFunctionType0 incorrect
Key: PDFBOX-1870
URL: https://issues.apache.org/jira/browse/PDFBOX-1870
Project: PDFBox
Issue Type: Sub-task
Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor
Attachments: PDFunctionType0.patch
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: PDShadingPatternResources.patch

shfill operator needs implementation
Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
Project: PDFBox
Issue Type: New Feature
Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson
Attachments: Centerplan.pdf, DECAHED.pdf, GouraudShadingContext.java, GouraudTriangle.java, LATTICE1.pdf, LATTICE2.pdf, PDShadingPatternResources.patch, Type4ShadingContext.java, Type4ShadingPaint.java, Type5ShadingContext.java, Type5ShadingPaint.java, Vertex.java, axial-input-after.png, axial-input-before.png, axial-input.pdf, bugzilla843488.pdf, bugzilla843488.pdf-1.png, color_gradient.pdf, color_gradient.pdf-1.png, decahed.pdf-1.png, input.pdf, input1.png, lattice1.pdf-1.png, lattice2.pdf-1.png, parent-pom.patch, pdfbox-1.8.patch, pdfbox.patch, pslib-shading.pdf, radial-input-after.png, radial-input-before.png, radial-input.pdf, shading_pattern.pdf, shading_pattern.pdf-2.png, trityp4.pdf-1.png

I have a PDF file (for which I do not yet have release permission) that uses the sh operator, equivalent to PostScript's shfill (per PDF spec 1.7 page 987). Adobe provides implementation guidance in a 78-page document at http://www.adobe.com/devnet/postscript/pdfs/TN5600.SmoothShading.pdf#17

I will be trying to add this functionality this week, but if anyone has hints, suggestions, etc. they are most certainly welcome!
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: (was: PDShadingPatternResources.patch)

shfill operator needs implementation
Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
[jira] [Updated] (PDFBOX-1869) Implementation for ShadingType 1
[ https://issues.apache.org/jira/browse/PDFBOX-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1869:
Attachment: asy-strokeshade.pdf-1.png, asy-latticeshading.pdf-1.png, rgbcube.pdf-1.png, funsh01.pdf-1.png

Rendered images; two of four are OK.

Implementation for ShadingType 1
Key: PDFBOX-1869
URL: https://issues.apache.org/jira/browse/PDFBOX-1869
Project: PDFBox
Issue Type: Sub-task
Components: PDModel
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Priority: Minor
Attachments: FUNSH01.pdf, RGBCUBE.pdf, Type1ShadingContext.java, Type1ShadingPaint.java, asy-latticeshading.pdf, asy-latticeshading.pdf-1.png, asy-strokeshade.pdf, asy-strokeshade.pdf-1.png, funsh01.pdf-1.png, rgbcube.pdf-1.png
[jira] [Created] (PDFBOX-1871) Content appears a few px higher when rasterizing PDF
Jon Wu created PDFBOX-1871:

Summary: Content appears a few px higher when rasterizing PDF
Key: PDFBOX-1871
URL: https://issues.apache.org/jira/browse/PDFBOX-1871
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.8.3
Reporter: Jon Wu
Attachments: test-text-alignment-textbox-PDF2IMG-300dpi.png, test-text-alignment-textbox-PDFBox-1.8.3-300dpi.png, test-text-alignment-textbox.pdf

PDFBox seems to be off by a little bit vertically when you rasterize a PDF. This is in comparison to both Adobe's PDF library and many other PDF viewers such as Chrome's and the one in OS X.

I've attached an example PDF where there's some text with a green rectangle around it. The rectangle has about 2x as much space above it compared to below it, but in PDFBox's raster, the rectangle is closer to the top than it is to the bottom. This is obvious at 300 dpi, but at 96 dpi it's hard to tell for sure. Anecdotally, I've noticed that when rendering text at about 74 dpi, PDFBox seems to be off by about 1px.

I've attached both a PDFBox raster and one made with Adobe's PDF library for comparison.

java -jar pdfbox-app-1.8.3.jar PDFToImage -imageType PNG -resolution 300 -color rgba test-text-alignment-textbox.pdf
[jira] [Updated] (PDFBOX-1871) Content appears a few px higher when rasterizing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Wu updated PDFBOX-1871:
Attachment: test-text-alignment-textbox.pdf, test-text-alignment-textbox-PDFBox-1.8.3-300dpi.png, test-text-alignment-textbox-PDF2IMG-300dpi.png

Content appears a few px higher when rasterizing PDF
Key: PDFBOX-1871
URL: https://issues.apache.org/jira/browse/PDFBOX-1871
[jira] [Commented] (PDFBOX-1871) Content appears a few px higher when rasterizing PDF
[ https://issues.apache.org/jira/browse/PDFBOX-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885961#comment-13885961 ]

Jon Wu commented on PDFBOX-1871:

I'm actually most concerned about the text, as the box is really just for debugging. The text appears to be 3-4 px too high in the 300 dpi raster and about 1px too high at around 70-90 dpi.

Content appears a few px higher when rasterizing PDF
Key: PDFBOX-1871
URL: https://issues.apache.org/jira/browse/PDFBOX-1871
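One way to compare the two observations is to convert the pixel offsets to PDF points (1 pt = 1/72 inch). The sketch below only does that arithmetic; the class name is hypothetical and nothing here is taken from PDFBox. If both measurements land near the same value in points (roughly 0.7 to 1 pt for the numbers reported above), that would suggest a constant offset in page units rather than a rounding effect, but this is an interpretation, not a confirmed diagnosis.

```java
// Hypothetical helper: express a raster offset in page units.
final class RasterOffsetSketch {

    /** Converts a distance in pixels at the given resolution to PDF points (1 pt = 1/72 inch). */
    static double pixelsToPoints(double pixels, double dpi) {
        return pixels * 72.0 / dpi;
    }

    public static void main(String[] args) {
        // Offsets reported in the issue: 3-4 px at 300 dpi, about 1 px at around 74 dpi.
        System.out.printf("3 px at 300 dpi = %.2f pt%n", pixelsToPoints(3, 300));
        System.out.printf("4 px at 300 dpi = %.2f pt%n", pixelsToPoints(4, 300));
        System.out.printf("1 px at 74 dpi  = %.2f pt%n", pixelsToPoints(1, 74));
    }
}
```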
Re: [DISCUSS] GSoC Participation
On 29 Jan 2014, at 03:20, Andreas Lehmkühler andr...@lehmi.de wrote:
> - an idea which came up some years ago, was to implement a gui-interface to bundle some/all/future tools/features of pdfbox, like printing, rendering, preflight, split, merge etc.

The AWT/Swing PDF viewer could do with rewriting. But does anyone want that? Maybe support for JavaFX?

> - a high-level api to create pdfs

I've been thinking about this recently and have come to the conclusion that it's really hard to do well.

> - an advanced text extractor with table/column support

The table stuff sounds a lot like Tabula? Do we really not have column support? We need that!

I'll throw in some ideas too:
- an interface for OCR engines to plug into the text extraction API. It could provide access to extracted images or allow badly encoded fonts to be passed to OCR one character or text run at a time.

-- John
[jira] [Commented] (PDFBOX-1870) PDFunctionType0 incorrect
[ https://issues.apache.org/jira/browse/PDFBOX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886050#comment-13886050 ]

John Hewson commented on PDFBOX-1870:

Excellent. I've been having some strange behaviour from tint-transform functions in color spaces and was beginning to get suspicious about Type0 (and possibly Type2) functions.

PDFunctionType0 incorrect
Key: PDFBOX-1870
URL: https://issues.apache.org/jira/browse/PDFBOX-1870
[jira] [Commented] (PDFBOX-1870) PDFunctionType0 incorrect
[ https://issues.apache.org/jira/browse/PDFBOX-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886054#comment-13886054 ]

John Hewson commented on PDFBOX-1870:

I should probably add that color spaces are using 3D and 4D functions, so we should take a look at implementing them...

PDFunctionType0 incorrect
Key: PDFBOX-1870
URL: https://issues.apache.org/jira/browse/PDFBOX-1870
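For 3D and 4D inputs, the 1D/2D approach generalizes to multilinear interpolation over the 2^n samples surrounding the input point. The following is a rough sketch under stated assumptions, not PDFBox code: the class name is hypothetical, there is a single output component, the sample table is a flat array in which the first input dimension varies fastest, and the inputs have already been mapped to fractional grid coordinates.

```java
// Hypothetical sketch of n-dimensional multilinear interpolation in a sample table.
final class MultilinearSketch {

    /**
     * samples: flat table, first dimension varies fastest (an assumption of this sketch).
     * sizes[i]: number of samples in dimension i.
     * coords[i]: fractional grid coordinate in dimension i, clipped to [0, sizes[i] - 1].
     */
    static double interpolate(double[] samples, int[] sizes, double[] coords) {
        int n = sizes.length;
        int[] lower = new int[n];
        double[] frac = new double[n];
        for (int i = 0; i < n; i++) {
            double c = Math.max(0, Math.min(coords[i], sizes[i] - 1));
            lower[i] = (int) Math.floor(c);
            frac[i] = c - lower[i];
        }
        double result = 0;
        // Each bit of 'corner' selects the lower (0) or upper (1) neighbour in one dimension.
        for (int corner = 0; corner < (1 << n); corner++) {
            double weight = 1;
            int index = 0;
            int stride = 1;
            for (int i = 0; i < n; i++) {
                int bit = (corner >> i) & 1;
                weight *= (bit == 1) ? frac[i] : 1 - frac[i];
                index += Math.min(lower[i] + bit, sizes[i] - 1) * stride;
                stride *= sizes[i];
            }
            result += weight * samples[index];
        }
        return result;
    }

    public static void main(String[] args) {
        // 2x2x2 table holding f(x, y, z) = x + 2y + 4z at the grid points.
        double[] table = { 0, 1, 2, 3, 4, 5, 6, 7 };
        int[] sizes = { 2, 2, 2 };
        System.out.println(interpolate(table, sizes, new double[] { 0.5, 0.5, 0.5 }));
    }
}
```

The cost grows as 2^n per output component, which stays small for the 3D and 4D cases mentioned here.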
Re: [DISCUSS] GSoC Participation
IMHO a task for GSoC should be non-critical, localized, and not a user interface.

A non-critical task is one where PDFBOX development can continue without relying on the project result.

A localized project is one that can be incorporated into the code base with few changes to the base. This will limit the effort required to learn about the system into which the effort will fit.

A user interface implements an interactive window or an API. I have low expectations of the capabilities of students for doing good designs in these areas.

So I looked through JIRA for open projects meeting the above. Since I am not all that familiar with PDFBOX, some of my suggestions may be laughable and surely I have missed some. Nonetheless, here's what I found:

PDFBOX-553 writing pdf file in Japanese, garbled
PDFBOX-570 Windings font recognition + spacing issue
PDFBOX-605 Better support for Type0 fonts
PDFBOX-678 Support missing Text Rendering Modes when rendering a PDF
PDFBOX-870 PDF-To-IMAGE output is not anti-aliased
PDFBOX-1094 Pattern colorspace support
PDFBOX-1594 Add support for AES256 Encryption (see also PDFBOX-1450 document how to encrypt with AES 256)
PDFBOX-1734 ImageIoUtil.WriteImage doesn't work with tiff images
PDFBOX-1843 Find a way to test PDFToImage
[jira] [Commented] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886205#comment-13886205 ]

John Hewson commented on PDFBOX-615:

I've recently discovered {{ImageInputStream}} in the package {{javax.imageio.stream}}. Even though it's part of ImageIO, it's actually a general-purpose bit stream. You can probably replace your bit-io dependency with it if you want?

shfill operator needs implementation
Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
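As an illustration of that suggestion, ImageInputStream.readBits() can pull fixed-width samples out of a byte array with no padding at byte boundaries. This is only a sketch: the class and method names are hypothetical, and it is not how the attached patches read their data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical sketch: read bit-packed samples with the JDK's ImageInputStream.
final class PackedSampleReaderSketch {

    /** Reads 'count' consecutive samples of 'bitsPerSample' bits each, most significant bit first. */
    static long[] readSamples(byte[] data, int bitsPerSample, int count) {
        long[] samples = new long[count];
        try (ImageInputStream in = new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                // readBits continues from the current bit offset, so samples may straddle bytes.
                samples[i] = in.readBits(bitsPerSample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xAB, (byte) 0xCD, (byte) 0xEF };
        for (long sample : readSamples(data, 12, 2)) {
            System.out.println(Long.toHexString(sample));
        }
    }
}
```

Because the stream tracks its own bit offset, leftover bits at the end of a row are carried into the next sample instead of being dropped.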
Re: [DISCUSS] GSoC Participation
I think the idea needs to be considerably more exciting to attract students - nobody wants to fix the bugs that even we don’t want to fix!

There are some interesting users of PDFBox, see http://pdfliberation.wordpress.com/ for some possible ideas… lots of people using OCR there too.

On 29 Jan 2014, at 17:28, Fred Hansen zweibie...@yahoo.com wrote:
> PDFBOX-1594 Add support for AES256 Encryption

Seems like a reasonable project.

-- John
[jira] [Updated] (PDFBOX-1847) TSA Time Signature
[ https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vakhtang koroghlishvili updated PDFBOX-1847:
Affects Version/s: 2.0.0

TSA Time Signature
Key: PDFBOX-1847
URL: https://issues.apache.org/jira/browse/PDFBOX-1847
Project: PDFBox
Issue Type: Improvement
Components: Signing
Affects Versions: 1.8.4, 2.0.0
Reporter: vakhtang koroghlishvili
Attachments: CreateSignature-updated.java.patch, TSATimeSignature.patch, resultOfSigning.jpg

Until now, when signing a document, we used the time from our own system. For more security we can use a Time Stamp server. Trusted timestamping is the process of securely keeping track of the creation and modification time of a document. Security here means that no one, not even the owner of the document, should be able to change it once it has been recorded, provided that the timestamper's integrity is never compromised. (wiki)

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
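As background to the trusted-timestamping description above: in an RFC 3161 exchange the client does not send the data itself to the Time Stamp server but a hash of it (the "message imprint"), which the server then binds to its own clock in a signed token. The sketch below shows only that hashing step with the JDK's `MessageDigest`. It is not taken from the attached patches; the class name `MessageImprintSketch` and its helper methods are hypothetical, and building the actual request and contacting a TSA are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: computes the hash a client would submit to a TSA. */
final class MessageImprintSketch {

    /** Returns the SHA-256 digest of the given bytes. */
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Formats bytes as lowercase hex for logging or comparison. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes to be timestamped (an assumption for the demo).
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(sha256(data)));
    }
}
```

Because only the digest leaves the machine, the Time Stamp server never sees the document content, yet any later change to the data would no longer match the timestamped hash.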
[jira] [Updated] (PDFBOX-1848) Time Stamp Document Level Signature
[ https://issues.apache.org/jira/browse/PDFBOX-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vakhtang koroghlishvili updated PDFBOX-1848:
Affects Version/s: 2.0.0

Time Stamp Document Level Signature
Key: PDFBOX-1848
URL: https://issues.apache.org/jira/browse/PDFBOX-1848
Project: PDFBox
Issue Type: Improvement
Components: Signing
Affects Versions: 1.8.4, 2.0.0
Reporter: vakhtang koroghlishvili
Attachments: CreateTSASignature.java.patch, TSA-SIG-LOOKS-LIKE-THIS.png

We need a TSA document-level signature module too! At the moment we sign documents with our certificate, but sometimes we need to sign a document with a TSA too. This is an important part of signing. Sometimes it is very important: for instance, when we implement the PAdES 4 profile this module will be essential. Without it, the Document Secure Store will not work :)

I'm working on this improvement and will finish it soon. It's almost done. I only need to add some Javadoc, and I might change the architecture design. So please assign this to me :) I will upload the patch as soon as possible :)
[jira] [Commented] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886332#comment-13886332 ]

Tilman Hausherr commented on PDFBOX-615:

Great idea! The readBits() function seems to do all I need. I'll test this today or tomorrow.

shfill operator needs implementation
Key: PDFBOX-615
URL: https://issues.apache.org/jira/browse/PDFBOX-615
Project: PDFBox
Issue Type: New Feature
Components: PDModel
Reporter: Daniel Wilson
Assignee: Daniel Wilson
Attachments: Centerplan.pdf, DECAHED.pdf, GouraudShadingContext.java, GouraudTriangle.java, LATTICE1.pdf, LATTICE2.pdf, PDShadingPatternResources.patch, Type4ShadingContext.java, Type4ShadingPaint.java, Type5ShadingContext.java, Type5ShadingPaint.java, Vertex.java, axial-input-after.png, axial-input-before.png, axial-input.pdf, bugzilla843488.pdf, bugzilla843488.pdf-1.png, color_gradient.pdf, color_gradient.pdf-1.png, decahed.pdf-1.png, input.pdf, input1.png, lattice1.pdf-1.png, lattice2.pdf-1.png, parent-pom.patch, pdfbox-1.8.patch, pdfbox.patch, pslib-shading.pdf, radial-input-after.png, radial-input-before.png, radial-input.pdf, shading_pattern.pdf, shading_pattern.pdf-2.png, trityp4.pdf-1.png

I have a PDF file (for which I do not yet have release permission) that uses the sh operator, equivalent to PostScript's shfill (per PDF spec 1.7 page 987). Adobe provides implementation guidance in a 78-page document at http://www.adobe.com/devnet/postscript/pdfs/TN5600.SmoothShading.pdf#17

I will be trying to add this functionality this week, but if anyone has hints, suggestions, etc. they are most certainly welcome!
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: (was: Type4ShadingContext.java)
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: (was: GouraudShadingContext.java)
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: (was: parent-pom.patch)
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: (was: Type5ShadingContext.java)
[jira] [Updated] (PDFBOX-615) shfill operator needs implementation
[ https://issues.apache.org/jira/browse/PDFBOX-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-615:
Attachment: Type5ShadingContext.java
            Type4ShadingContext.java
            GouraudShadingContext.java

Great idea! The readBits() function did all I need. Funny thing: while cleaning up, I found there's already a bit reading class in the code - NBitInputStream.
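For readers wondering why a readBits() helper matters here: the vertex data of the Gouraud-shaded triangle meshes (shading types 4 and 5) is stored as a packed bit stream whose field widths need not fall on byte boundaries, so the decoder has to pull an arbitrary number of bits at a time. The following is a minimal sketch of such a reader, assuming most-significant-bit-first packing. It is neither the readBits() from the attached files nor PDFBox's NBitInputStream; the class name `BitReaderSketch` and its API are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical MSB-first bit reader over an InputStream. */
final class BitReaderSketch {
    private final InputStream in;
    private int current;   // byte currently being consumed
    private int bitsLeft;  // number of unread bits remaining in 'current'

    BitReaderSketch(InputStream in) {
        this.in = in;
    }

    /** Reads 'count' bits (1..32), most significant bit first, as an unsigned value. */
    long readBits(int count) throws IOException {
        if (count < 1 || count > 32) {
            throw new IllegalArgumentException("count must be in 1..32: " + count);
        }
        long value = 0;
        for (int i = 0; i < count; i++) {
            if (bitsLeft == 0) {
                current = in.read();
                if (current < 0) {
                    throw new EOFException("stream ended in the middle of a field");
                }
                bitsLeft = 8;
            }
            bitsLeft--;
            // Shift the accumulated value left and append the next bit.
            value = (value << 1) | ((current >> bitsLeft) & 1);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Bit pattern: 1010 0101 1111 0000
        byte[] data = {(byte) 0xA5, (byte) 0xF0};
        BitReaderSketch reader = new BitReaderSketch(new ByteArrayInputStream(data));
        System.out.println(reader.readBits(4)); // 1010 -> 10
        System.out.println(reader.readBits(8)); // 0101 1111 -> 95
        System.out.println(reader.readBits(4)); // 0000 -> 0
    }
}
```

Reading bit by bit keeps the sketch short and obviously correct; a production reader would more likely consume whole bytes where possible.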