[jira] [Commented] (PDFBOX-1962) Refactor the packages in the core pdfbox module
[ https://issues.apache.org/jira/browse/PDFBOX-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925476#comment-13925476 ]

John Hewson commented on PDFBOX-1962:

Revision 1575836 moves various text handling classes from the util package into a new text package.

Refactor the packages in the core pdfbox module
Key: PDFBOX-1962
URL: https://issues.apache.org/jira/browse/PDFBOX-1962
Project: PDFBox
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: John Hewson

We want to refactor the core pdfbox module packages so that there is no longer a dependency on AWT. Any packages which are moved outside of the org.apache.pdfbox module need to be re-packaged appropriately (e.g. org.apache.pdfbox.rendering). AWT code could live in pdfbox-rendering, but we need to think carefully about how to do this because, e.g., some of the Filters use AWT, as does FontBox.

What are the use cases for modularisation? Currently we have:
- Android
- Google App Engine

Android seems to have some support for AWT and ImageIO; can somebody in the know provide more information? Google App Engine seems to blacklist ImageIO and AWT classes. Is there a strong desire to support it?

Also, as Fred discussed on the mailing list, the util package functionality is shared across numerous parts of the code, but most classes are either used only from one package or can be replaced with new Java 1.6 constructs. By the end of this refactoring the pdfbox.util package should be mostly empty, containing only a handful of true utility classes.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-1971) PDFBox
janak created PDFBOX-1971:

Summary: PDFBox
Key: PDFBOX-1971
URL: https://issues.apache.org/jira/browse/PDFBOX-1971
Project: PDFBox
Issue Type: Wish
Components: FontBox, Utilities
Affects Versions: 1.8.4
Environment: Linux and Windows
Reporter: janak

I tried adding my custom font to the PDFBox resources, but it does not create a proper PDF. The PDF is created successfully, but a "Font contains bad /bbox" dialog is shown when opening it. What is the solution for this?

Testing with font: LiberationSans, Regular and Bold
[jira] [Commented] (PDFBOX-1971) PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925478#comment-13925478 ]

John Hewson commented on PDFBOX-1971:

Can you provide some sample code which shows this problem?
[jira] [Updated] (PDFBOX-1971) PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-1971:

Issue Type: Bug (was: Wish)
[jira] [Updated] (PDFBOX-1971) PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-1971:

Component/s: (was: Utilities) (was: FontBox) Writing
[jira] [Updated] (PDFBOX-1971) PDFBox
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

janak updated PDFBOX-1971:

Attachment:
Main.java
LiberationSans-Regular.ttf
LiberationSans-Regular.pfb
LiberationSans-Regular.afm
LiberationSans-Bold.ttf
LiberationSans-Bold.pfb
LiberationSans-Bold.afm
Hello World.pdf

I have attached the files herewith.
[jira] [Updated] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1971:

Summary: Bad /bbox dialog after creating PDF with custom font (was: PDFBox)
Re: advanced signatures - the feature plans
Hi Vakhtang,

> I think, it's time to create another project named sign-box or something like that.

Should the classes in org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible be moved into this new project also?

My understanding is that the PDF spec defines a basic signature container which is extensible and can embed signature formats defined by others, e.g. the PAdES standard defined by ETSI. This seems like a good candidate for a new sub-project, e.g. “pdfbox-signing”.

> 1. create basic digital signature with the time of CPU. *done*
> 2. create digital signature with visible signature. *done*
>
> This is very poor functionality and is not easy to use. It's just in the project named examples. It must have very easy API, as we said before.

It would be nice to have a command-line program, e.g. “SignPDF”, in pdfbox-tools.

> So, at the moment I have that functionality:
> 1. signing document with PADES-BES or PADES-BASIC profile, with CPU signing time. *done*

Just checking: is this already in PDFBox?

> 2. signing document with PADES-BES or PADES-BASIC profile, with timeStamp server time. Already *implemented* - I have uploaded a patch in our jira, some classes are in the pdfbox project and some classes are in the example project.

Great, what is the JIRA issue number?

> 3. signing document with timestamp server. Already *implemented* and patch is uploaded in a jira …

Same question: JIRA issue number?

> 4. created document secure store and PADES LTV profile implementation (advanced signatures!). I have already *implemented* this. I can create patch in the example project or create patch for sign-box too :) Tell me and I will create patch for one of them :)

Creating your patches in the example project is fine, we can move them to a different sub-project for you.

> 5. certificate chain verification while signing process, against OCSP, CRL protocols (with advanced ocsp, crl certificate verifications too!) - I have already *implemented* this. I can create patch in the example project or create patch for sign-box and etc.. :-) :-) :-)

Once again, the example project is fine, we can change the packages.

> Finally, I want tell you that I like that project and I want to help you as I can. I'm very well with digital signatures and I have very good experience with this. So, if you need, please tell me what should I do for this apache project? :) I am with you :)

Perhaps the org.apache.pdfbox.pdmodel.interactive.digitalsignature package is in need of simplification, what do you think?

Thanks for your efforts!

-- John

On 9 Mar 2014, at 10:21, Vakhtang koroghlishvili vakhtang.koroghlishv...@gmail.com wrote:

> Hello, how are you? :)
>
> You know that I have already fixed and implemented some issues and new features which were about digital signatures. I have already created other new features too, but I don't know if I should create these patches in the pdfbox example project.
>
> I think, it's time to create another project named sign-box or something like that. At the moment I have time and I can create that project with a very good design architecture and show you a patch, or committers can create that project with the existing features and then we will add new features step by step.
>
> I will write here what we have at the moment, and what we can add too. At the moment, if we want to use pdfbox for document signing, we can only do this:
>
> 1. create basic digital signature with the time of CPU. *done*
> 2. create digital signature with visible signature. *done* - that was my first contribution :-)
>
> This is very poor functionality and is not easy to use. It's just in the project named examples. It must have very easy API, as we said before.
>
> I have implemented and added other functionality and created patches for some of it. Some patches of new features are not uploaded in the jira, because I don't know whether they must be in the example project or not.
>
> So, at the moment I have that functionality:
>
> 1. signing document with PADES-BES or PADES-BASIC profile, with CPU signing time. *done*
> 2. signing document with PADES-BES or PADES-BASIC profile, with timeStamp server time. Already *implemented* - I have uploaded a patch in our jira, some classes are in the pdfbox project and some classes are in the example project.
> 3. signing document with timestamp server. Already *implemented* and patch is uploaded in a jira ...
> 4. created document secure store and PADES LTV profile implementation (advanced signatures!). I have already *implemented* this. I can create patch in the example project or create patch for sign-box too :) Tell me and I will create patch for one of them :)
> 5. certificate chain verification while signing process, against OCSP, CRL protocols (with advanced ocsp, crl certificate verifications too!) - I have already *implemented* this. I can create patch in the example project or create patch for sign-box and etc.. :-) :-)
Re: advanced signatures - the feature plans
Hello Vakhtang,

I remember you from some time ago, though not the details, and that one of the committers was very satisfied with your code. Maybe that person will come back later; the weekend seems to be very quiet here. I'm answering so that you don't get frustrated.

A few thoughts:

- did Andreas already make you sign a CLA ( https://www.apache.org/licenses/ )? This is required when you submit real code, i.e. not just a small bugfix.
- what JIRA issues are still open that you wish to have committed? Please name the ones with the smallest code first.
- if not already there and if possible / applicable, attach a test PDF (that isn't copyrighted except by you) or a unit test or whatever. This is just a general thought, e.g. you wrote code to create a new type of signature for a PDF. Then (if possible) you should also create a unit test that creates a signature for that PDF (if possible without user interaction), verifies that signature, modifies the PDF, and verifies the signature again, which of course must then fail.
- make sure that your code works with the current trunk; a lot was changed, although not related to signatures.

Tilman
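
The round trip suggested in this reply (sign, verify, modify, verify again and expect failure) can be sketched without any PDFBox classes. The example below is only an illustration under stated assumptions: it uses the JDK's `java.security.Signature` on a plain byte array that stands in for the PDF, and the class and method names (`SignatureRoundTripSketch`, `newKeyPair`, `sign`, `verify`) are hypothetical. A real test would sign and verify through PDFBox's signature support and a CMS/PAdES container instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Hypothetical stand-in for a signing unit test; no PDFBox classes are used.
class SignatureRoundTripSketch {

    // Generates a throwaway RSA key pair so the check needs no user interaction.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signs the given bytes with the private key of the pair.
    static byte[] sign(byte[] content, KeyPair keys) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(content);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true only if the signature matches the given bytes.
    static boolean verify(byte[] content, byte[] signature, KeyPair keys) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(content);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        // Placeholder bytes standing in for the signed byte range of a PDF.
        byte[] document = "%PDF-1.4 placeholder bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] signature = sign(document, keys);

        byte[] modified = document.clone();
        modified[modified.length - 1] ^= 1; // flip one bit to simulate editing the file

        System.out.println("original verifies: " + verify(document, signature, keys));
        System.out.println("modified verifies: " + verify(modified, signature, keys));
    }
}
```

The same four steps carry over to a PDF-level test: only the signing and verification calls change, while the "modify, then expect failure" assertion stays the same.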
[jira] [Commented] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925497#comment-13925497 ]

John Hewson commented on PDFBOX-1971:

Your example does not compile; specifically, the constant used on this line does not exist in PDFBox:

{code}
PDFont font = PDType1Font.LIBERATIONSANS_BOLD;
{code}

Bad /bbox dialog after creating PDF with custom font
Key: PDFBOX-1971
URL: https://issues.apache.org/jira/browse/PDFBOX-1971
Project: PDFBox
Issue Type: Bug
Components: Writing
Affects Versions: 1.8.4
Environment: Linux and Windows
Reporter: janak
Attachments: Hello World.pdf, LiberationSans-Bold.afm, LiberationSans-Bold.pfb, LiberationSans-Bold.ttf, LiberationSans-Regular.afm, LiberationSans-Regular.pfb, LiberationSans-Regular.ttf, Main.java

I tried adding my custom font to the PDFBox resources, but it does not create a proper PDF. The PDF is created successfully, but a "Font contains bad /bbox" dialog is shown when opening it. What is the solution for this?

Testing with font: LiberationSans, Regular and Bold

I also tried loadTTF and the PDType1AfmPfbFont class, but the PDF was 150 times larger than the attached one.
[jira] [Commented] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925498#comment-13925498 ]

janak commented on PDFBOX-1971:

I have modified the source of the latest version of PDFBox and added the LiberationSans fonts. Basically, the idea is to generate a PDF using PDFBox with the LiberationSans fonts.
[jira] [Updated] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

janak updated PDFBOX-1971:

Attachment: pdfbox.zip
[jira] [Commented] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925501#comment-13925501 ]

John Hewson commented on PDFBOX-1971:

There's no need to modify the source of PDFBox in order to embed fonts. The font constants defined in PDType1Font, such as HELVETICA, are used for fonts which are pre-defined by the PDF format, i.e. they come built in to readers and so don't get embedded.

To embed a custom Type 1 font you just need to do the following:

{code}
PDFont font = new PDType1AfmPfbFont(document, "LiberationSans-Bold.afm");
{code}

This will embed the .pfb and .afm data into the PDF, and the file size will be larger.
[jira] [Resolved] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson resolved PDFBOX-1971.

Resolution: Invalid
[jira] [Commented] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925507#comment-13925507 ]

janak commented on PDFBOX-1971:

Is there any way to shrink the file size?
[DISCUSS] PDFBox and support for PDF versions, PDF standards
Hi,

as I’m currently looking at the parsing part of PDFBox, one question came to my mind: more formal support for PDF versions and PDF standards such as PDF/A, PDF/UA …

As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ...

The PDFBox PDF/A validation does a good job for PDF/A-1b but it cannot be easily extended to other standards.

Do you think that there is a need for more formal support of such standards and versions? This would influence some of the design decisions for the parser and affect the base objects.

BR
Maruan Sahyoun
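
To make the question concrete, the most basic form of a version check is reading the version a file declares in its `%PDF-M.m` header comment. The sketch below is an illustration only, not PDFBox code: the class and method names (`PdfHeaderVersionSketch`, `declaredVersion`, `isAtMost`) are hypothetical, and searching the first 1024 bytes is a leniency assumption rather than a rule taken from this thread. It also ignores the document catalog's /Version entry, which can override the header in newer files, and it says nothing about which features the file actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads only the version declared in the file header.
class PdfHeaderVersionSketch {

    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d+)\\.(\\d+)");

    // Returns the declared version, e.g. "1.4", or null if no header is found.
    static String declaredVersion(byte[] leadingBytes) {
        int length = Math.min(leadingBytes.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String head = new String(leadingBytes, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = HEADER.matcher(head);
        return matcher.find() ? matcher.group(1) + "." + matcher.group(2) : null;
    }

    // True when 'declared' is less than or equal to 'maximum', compared numerically.
    static boolean isAtMost(String declared, String maximum) {
        String[] d = declared.split("\\.");
        String[] m = maximum.split("\\.");
        int major = Integer.compare(Integer.parseInt(d[0]), Integer.parseInt(m[0]));
        if (major != 0) {
            return major < 0;
        }
        return Integer.parseInt(d[1]) <= Integer.parseInt(m[1]);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.6\n1 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        String version = declaredVersion(sample);
        System.out.println("declared version: " + version);
        System.out.println("acceptable for a 1.4-only consumer: " + isAtMost(version, "1.4"));
    }
}
```

Enforcing a version in the sense discussed here would go well beyond this: the parser would also have to reject constructs introduced after the target version, which is where the design decisions for the base objects come in.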
Re: PDFBox Documentation - Rendering
Hi Maruan,

Right now rendering is in a state of flux; it’s changing a lot and we often have to refactor in order to gain new functionality in a clean manner. Once things stabilise a bit more we should certainly create some more documentation, but right now it would be out of date within a week.

The quickest, most effective summary I can give at the moment is to look at the newly added PDFRenderer class. This is the point from where all rendering is coordinated, most of which is done by PageDrawer, though the recent Pattern drawing changes mean that PageDrawer no longer draws a page, it draws a “content stream”, so it will probably be changed soon too. You get the idea… :)

If you have any specific questions regarding rendering, feel free to ask.

-- John

On 10 Mar 2014, at 00:08, Maruan Sahyoun sahy...@fileaffairs.de wrote:

> Hi,
>
> I’m currently enhancing the documentation for PDFBox with some more samples, code snippets etc. For the developer section, would it be possible that someone - maybe John or Tilman, as they are most familiar with the rendering code - writes up a small introductory article about how rendering works in PDFBox? Only a quick overview?
>
> BR
> Maruan Sahyoun
[jira] [Commented] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925522#comment-13925522 ]

John Hewson commented on PDFBOX-1971:

Each font you use will result in the .pfb (and some of the .afm information) being embedded. It's possible to subset a Type 1 font but currently PDFBox can't do that.

Given that LiberationSans is a clone of Arial, and Arial is a clone of Helvetica, you could just use {{PDType1Font.HELVETICA}} instead and avoid the need to embed any fonts.
[jira] [Closed] (PDFBOX-1971) Bad /bbox dialog after creating PDF with custom font
[ https://issues.apache.org/jira/browse/PDFBOX-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson closed PDFBOX-1971.
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
Hi Maruan

> As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ...

Perhaps that is because there is not much demand for this? Nowadays everyone has instant access to the latest version of Adobe Reader, so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid, a format such as PDF/A or PDF/X must be used instead, as “vanilla” PDF is ambiguous.

> The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not be easily extended to other standards.

Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files.

> Do you think that there is a need for a more formal support of such standards and versions? This would influence some of the design decisions for the parser and affect the base objects.

I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions?

-- John
Re: [GSoC 2014]Optical Character Recognition project - Introduction
Dimuthu, That’s looking really good. You just need to use the following Apache header on your Java source files: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the License); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an AS IS BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- John On 7 Mar 2014, at 07:56, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John I refactored Tesseract JNI code to support maven build. To create the JNI library I added pre-built static libraries of Tesseract and Leptonica to resources folder[2]. For now it includes librararies supported for mac. But we can easily add both windows and linux libraries. After mvn clean install, the jar is created under target folder. Now all setting up is done. What remains is implementing those native methods in tessbaseapi.cpp [3]. Hope to finish it asap. Please let me know if there is any concern about project structure. 
[1] https://github.com/DImuthuUpe/Tesseract-API.git [2] https://github.com/DImuthuUpe/Tesseract-API/tree/master/src/main/resources [3] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/main/native/src/tessbaseapi.cpp Thanks Dimuthu On Thu, Mar 6, 2014 at 1:15 AM, John Hewson j...@jahewson.com wrote: Dimuthu There is a lot of code fractions in current android jni wrapper which use (jint)somePointer casting which will create terrible memory leaks in 64 bit environments because ponters are 64 bit. So I believe writing it from the beginning is much better. That's a classic 64-bit pitfall, well spotted. We definitely need to support 64-bit JVMs. we can use the static library of Leptonica (I did and it worked nicely). I think it is not a issue to use it's static library because both Tesseract and Leptonica is under apache licence. Sounds good, I found the following in the README: Leptonica is required. (www.leptonica.com). Tesseract no longer compiles without Leptonica. Which makes sense. -- John On 5 Mar 2014, at 09:45, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John, +1 for you suggestion about converting image = byte array at java side. It reduces lot of complexities. I don't know whether you have noticed or not, jint data type in jni is a 32bit integer type. I noticed it in my Mac but don't know about other operating systems. Leptonica is the image processing library for Tesseract [1]. What tesseract do is using image processing algorithms in Leptonica to implement its OCR algorithms. This [2] is the responsible .cpp file to create Tesseract API. You can see it includes allheaders.h header file which is the main header file of Leptonoca. So I think it is a must to build Leptonica first and link it when we build Tesseract. This is not a big problem if we can use the static library of Leptonica (I did and it worked nicely). I think it is not a issue to use it's static library because both Tesseract and Leptonica is under apache licence. 
I'm working on the maven implementation you have mentioned and will get back to you soon. Thanks Dimuthu [1] https://code.google.com/p/tesseract-ocr/wiki/Compiling [2] https://github.com/DImuthuUpe/Tesseract-API/blob/master/jni/tesseract/src/api/tesseractmain.cpp On Wed, Mar 5, 2014 at 1:15 AM, John Hewson j...@jahewson.com wrote: Hi Dimuthu, 1,2,3: Feel free to write your own Tesseract binding or port the existing code as you see fit. The JNI binding should be minimal, only the methods you require need to be wrapped. Also, don't forget that some of the interop can be done in Java, for example if it is easier to convert a BufferedImage to a byte array in Java then do it there and pass the result to JNI rather than writing lots of JNI C++ to achieve the same result. Your GitHub repo looks like a good start, I can make comments there as things progress. Is it possible to build Tesseract without leptonica? I was under the impression that it was used for image i/o only, but I may be misinformed. 4: The native platform library should be built as part of the Maven build for the Tesseract wrapper which can be a separate project. The output can be a jar file which contains the native binaries. It should be possible for the jar to contain prebuilt binaries for all platforms but this is something we can
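The 64-bit pitfall mentioned in this thread can be illustrated without any native code. The following is only a minimal Java sketch, not code from the Tesseract wrapper: it narrows a made-up 64-bit address to 32 bits, as a (jint)somePointer cast would, and shows that the value no longer round-trips, whereas a 64-bit handle (a Java long, jlong in JNI) does. The class name, method names and the address value are hypothetical.

```java
// Sketch only: mimics in Java what a C++ (jint)somePointer cast does to a 64-bit address.
class PointerHandleDemo {
    // Narrow a 64-bit native address to 32 bits, as a jint-typed handle would.
    static int toIntHandle(long nativeAddress) {
        return (int) nativeAddress; // the upper 32 bits are silently discarded
    }

    // Widen the 32-bit handle back to an address (zero-extended).
    static long fromIntHandle(int handle) {
        return handle & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long address = 0x00007F3A12345678L; // hypothetical 64-bit pointer value
        long viaInt = fromIntHandle(toIntHandle(address));
        System.out.println("original : 0x" + Long.toHexString(address));
        System.out.println("via int  : 0x" + Long.toHexString(viaInt));
        System.out.println("int handle round-trips : " + (viaInt == address));

        long viaLong = address; // a long (jlong) handle keeps all 64 bits
        System.out.println("long handle round-trips: " + (viaLong == address));
    }
}
```

Under this assumption, a wrapper that declares its native handles as long on the Java side avoids the truncation regardless of the platform's pointer width.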
Re: [GSoC 2014]Optical Character Recognition project - Introduction
You just need to use the following Apache header on your Java source files: Actually, no, forget that. I don’t think you can use that header yet as you haven’t signed a CLA. Leave the files as they are without headers for now. We’ll deal with the licensing later because your code isn’t in the official Apache repository yet. -- John On 10 Mar 2014, at 01:30, John Hewson j...@jahewson.com wrote: Dimuthu, That’s looking really good. You just need to use the following Apache header on your Java source files: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ -- John On 7 Mar 2014, at 07:56, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John, I refactored the Tesseract JNI code to support a Maven build. To create the JNI library I added pre-built static libraries of Tesseract and Leptonica to the resources folder [2]. For now it only includes the libraries for Mac, but we can easily add both Windows and Linux libraries. After mvn clean install, the jar is created under the target folder. The setup is now complete. What remains is implementing the native methods in tessbaseapi.cpp [3]. I hope to finish it asap. Please let me know if there is any concern about the project structure. 
[1] https://github.com/DImuthuUpe/Tesseract-API.git [2] https://github.com/DImuthuUpe/Tesseract-API/tree/master/src/main/resources [3] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/main/native/src/tessbaseapi.cpp Thanks Dimuthu On Thu, Mar 6, 2014 at 1:15 AM, John Hewson j...@jahewson.com wrote: Dimuthu There is a lot of code fractions in current android jni wrapper which use (jint)somePointer casting which will create terrible memory leaks in 64 bit environments because ponters are 64 bit. So I believe writing it from the beginning is much better. That's a classic 64-bit pitfall, well spotted. We definitely need to support 64-bit JVMs. we can use the static library of Leptonica (I did and it worked nicely). I think it is not a issue to use it's static library because both Tesseract and Leptonica is under apache licence. Sounds good, I found the following in the README: Leptonica is required. (www.leptonica.com). Tesseract no longer compiles without Leptonica. Which makes sense. -- John On 5 Mar 2014, at 09:45, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John, +1 for you suggestion about converting image = byte array at java side. It reduces lot of complexities. I don't know whether you have noticed or not, jint data type in jni is a 32bit integer type. I noticed it in my Mac but don't know about other operating systems. Leptonica is the image processing library for Tesseract [1]. What tesseract do is using image processing algorithms in Leptonica to implement its OCR algorithms. This [2] is the responsible .cpp file to create Tesseract API. You can see it includes allheaders.h header file which is the main header file of Leptonoca. So I think it is a must to build Leptonica first and link it when we build Tesseract. This is not a big problem if we can use the static library of Leptonica (I did and it worked nicely). I think it is not a issue to use it's static library because both Tesseract and Leptonica is under apache licence. 
I'm working on the maven implementation you have mentioned and will get back to you soon. Thanks Dimuthu [1] https://code.google.com/p/tesseract-ocr/wiki/Compiling [2] https://github.com/DImuthuUpe/Tesseract-API/blob/master/jni/tesseract/src/api/tesseractmain.cpp On Wed, Mar 5, 2014 at 1:15 AM, John Hewson j...@jahewson.com wrote: Hi Dimuthu, 1,2,3: Feel free to write your own Tesseract binding or port the existing code as you see fit. The JNI binding should be minimal, only the methods you require need to be wrapped. Also, don't forget that some of the interop can be done in Java, for example if it is easier to convert a BufferedImage to a byte array in Java then do it there and pass the result to JNI rather than writing lots of JNI C++ to achieve the same result. Your GitHub repo looks like a good start, I can make comments there as things progress. Is it possible to build Tesseract
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
Hi John, it’s not about PDF versions but PDF versions and standards. The base syntax has not changed. But the elements described by the base have. BR Maruan Sahyoun On 10.03.2014 at 09:20, John Hewson j...@jahewson.com wrote: Hi Maruan As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ... Perhaps that is because there is not much demand for this? Nowadays everyone has instant access to the latest version of Adobe Reader, so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid, a format such as PDF/A or PDF/X must be used instead, as “vanilla” PDF is ambiguous. The PDFBox PDF/A validation does a good job for PDF/A 1b but it cannot be easily extended to other standards. Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files. Do you think that there is a need for a more formal support of such standards and versions? This would influence some of the design decisions for the parser and affect the base objects. I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions? -- John
Re: advanced signatures - the feature plans
Should the classes in org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible be moved into this new project also? Because these classes are for creating visible signature fields (not for signing), we cannot move them. sign-box will be for signing only. is this already in PDFBox? As I remember, Thomas Chojecki has implemented this in the example project of PDFBox as the BASIC profile. We can make it BES with some changes. I have implemented PADES LTV on my computer (this profile is based on the issues PDFBOX-1847 and PDFBOX-1848) and we will add this too. JIRA issue number? PDFBOX-1847 and PDFBOX-1848 Creating your patches in the example project is fine, we can move them to a different sub-project for you. Yes, but the sub-project architecture must not be the same, because the sub-project API must be very easy to use. So we might change some structures. In the end the architecture will look like this: you just create a signature object and then call the signing method with your signature profile and parameters, and that is all; it is very simple. So we must create a different architecture for that. As I remember, Thomas Chojecki was doing a code review of those patches. :) So we should wait :) Perhaps the org.apache.pdfbox.pdmodel.interactive.digitalsignature is in need of simplification, what do you think? As I see it, those interfaces and classes are written very well. We will add other classes and interfaces for other signature functionality, but most of them will be in the new sub-project. We will move some classes from the example project to the new sub-project, with a different architecture. On Mon, Mar 10, 2014 at 10:54 AM, John Hewson j...@jahewson.com wrote: Hi Vakhtang, I think, it's time to create another project named sign-box or something like that. Should the classes in org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible be moved into this new project also? 
My understanding is that the PDF spec defines a basic signature container which is extensible and can embed signature formats defined by others, e.g. the PAdES standard defined by ETSI. This seems like a good candidate for a new sub-project e.g. pdfbox-signing. 1. create basic digital signature with the time of CPU. *done* 2. create digital signature with visible signature. *done* This is very poor functionality and is not easy to use. It's just in the project named examples. It must have very easy API, as we said before. It would be nice to have a command-line program e.g. SignPDF in pdfbox-tools. So, at the moment I have that functionality: 1. signing document with PADES-BES or PADES-BASIC profile, with CPU signing time. *done* Just checking: is this already in PDFBox? 2. signing document with PADES-BES or PADES-BASIC profile, with timeStamp server time. Already *implemented* - I have uploaded a patch in our jira, some classes are in the pdfbox project and some classes are in the example project. Great, what is the JIRA issue number? 3. signing document with timestamp server. Already *implemented* and patch is uploaded in a jira ... Same question: JIRA issue number? 4. creted document secure store and PADES LTV profile implementation (advanced signatures!). I have already *implemented* this. I can create patch in the example project or create patch for sign-box too :) Tell me and I will create patch for one of them :) Creating your patches in the example project is fine, we can move them to a different sub-project for you. 5. certificate chain verification while signing process, against OCSP, CRL protocols (with advanced ocsp, crl certificate verifications too!) - I have already *implemented* this. I can create patch in the example project or create patch for sign-box and etc.. :-) :-) :-) Once again, the example project is fine, we can change the packages. Finally, I want tell you that I like that project and I want to help you as I can. 
I'm very well with digital signatures and I have very good experience with this. So, if you need, please tell me what should I do for this apache project? :) I am with you :) Perhaps the org.apache.pdfbox.pdmodel.interactive.digitalsignature is in need of simplification, what do you think? Thanks for your efforts! -- John On 9 Mar 2014, at 10:21, Vakhtang koroghlishvili vakhtang.koroghlishv...@gmail.com wrote: Hello, how are you? :) You know , that I have already fix and implement some issues and new features which was about digital signature. I have already created another new features too but I don't know if I should create this patches in the pdfbox example project. I think, it's time to create another project named sign-box or something like that. At the moment I have time and I can create that project with very good design architect and show you a patch or comitters can create that project with
Re: [GSoC 2014]Optical Character Recognition project - Introduction
Dimuthu, I finished basic implementation of JNI wrapper for Tesseract. Now it can be built using Maven. Some useful methods that are needed to do basic OCR were implemented. Great, it’s looking good, nice and clean. 1. What is the task of processStream method in PDFTextStripper class line 456 : processStream( page.findResources(), content, page.findCropBox(), page.findRotation() ); A PDF file is made up of pages, each of which contains a “content stream”. This content stream contains a list of drawing commands such as “move to 10,15” or “write the word `foo`”; these are called operators. The processStream function reads the stream for the current page and executes each of the operators. The operators themselves are each implemented in their own class, which is a subclass of PDFOperator. The constructor of PDFStreamEngine creates the operator classes using reflection, which is rather odd and I’m not sure why this design was chosen. The operators used by PDFTextStripper can be found in org/apache/pdfbox/resources/PDFTextStripper.properties 2. Say I need to extract images and its metadata from a pdf. What is the better approach to do it? You could subclass PDFTextStripper and override the startDocument method and use it to create a PDFRenderer and store it in a field. Then override the processPage method and use the previously created PDFRenderer to render the current page to a buffered image and perform OCR on the image. Once you have the OCR text + positions, instead of calling processStream you can call processTextPosition once for each character + position. The PDFRenderer class was just added to the trunk, so make sure you do an “svn update”. Let me know if you need me to change PDFTextStripper to make it easier to subclass. Cheers -- John On 9 Mar 2014, at 09:08, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John, I finished basic implementation of JNI wrapper for Tesseract. Now it can be built using Maven. 
Some useful methods that are needed to do basic OCR were implemented. I went through PDFBox code several times and got couple of issues that are needed to be clarified 1. What is the task of processStream method in PDFTextStripper class line 456 : processStream( page.findResources(), content, page.findCropBox(), page.findRotation() ); 2. Say I need to extract images and it's metadata from a pdf. What is the better approach to do it? Thanks Dimuthu On Fri, Mar 7, 2014 at 9:26 PM, DImuthu Upeksha dimuthu.upeks...@gmail.comwrote: Hi John I refactored Tesseract JNI code to support maven build. To create the JNI library I added pre-built static libraries of Tesseract and Leptonica to resources folder[2]. For now it includes librararies supported for mac. But we can easily add both windows and linux libraries. After mvn clean install, the jar is created under target folder. Now all setting up is done. What remains is implementing those native methods in tessbaseapi.cpp [3]. Hope to finish it asap. Please let me know if there is any concern about project structure. [1] https://github.com/DImuthuUpe/Tesseract-API.git [2] https://github.com/DImuthuUpe/Tesseract-API/tree/master/src/main/resources [3] https://github.com/DImuthuUpe/Tesseract-API/blob/master/src/main/native/src/tessbaseapi.cpp Thanks Dimuthu On Thu, Mar 6, 2014 at 1:15 AM, John Hewson j...@jahewson.com wrote: Dimuthu There is a lot of code fractions in current android jni wrapper which use (jint)somePointer casting which will create terrible memory leaks in 64 bit environments because ponters are 64 bit. So I believe writing it from the beginning is much better. That's a classic 64-bit pitfall, well spotted. We definitely need to support 64-bit JVMs. we can use the static library of Leptonica (I did and it worked nicely). I think it is not a issue to use it's static library because both Tesseract and Leptonica is under apache licence. Sounds good, I found the following in the README: Leptonica is required. 
(www.leptonica.com). Tesseract no longer compiles without Leptonica. Which makes sense. -- John On 5 Mar 2014, at 09:45, DImuthu Upeksha dimuthu.upeks...@gmail.com wrote: Hi John, +1 for you suggestion about converting image = byte array at java side. It reduces lot of complexities. I don't know whether you have noticed or not, jint data type in jni is a 32bit integer type. I noticed it in my Mac but don't know about other operating systems. Leptonica is the image processing library for Tesseract [1]. What tesseract do is using image processing algorithms in Leptonica to implement its OCR algorithms. This [2] is the responsible .cpp file to create Tesseract API. You can see it includes allheaders.h header file which is the main header file of Leptonoca. So I think it is a must to build Leptonica first and link it when we build Tesseract. This is not a big problem if we can use the static
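The description of processStream in this message (read the content stream, collect operands, execute each operator through its own handler) can be sketched as a small dispatch loop. This is a hedged illustration only and not PDFBox code: MiniStreamEngine and OperatorHandler are hypothetical names, the input is assumed to be already tokenized, and handlers are registered in a plain map rather than through reflection. Td and Tj are real PDF operators (move the text position, show a string).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini engine; a real content-stream engine is considerably richer.
class MiniStreamEngine {
    // One handler per operator, analogous to one class per operator.
    interface OperatorHandler {
        void process(List<String> operands, List<String> out);
    }

    private final Map<String, OperatorHandler> operators = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    MiniStreamEngine() {
        operators.put("Td", (ops, out) -> out.add("move to " + ops.get(0) + "," + ops.get(1)));
        operators.put("Tj", (ops, out) -> out.add("write " + ops.get(0)));
    }

    // In a content stream the operands come first and the operator name follows them.
    List<String> processStream(List<String> tokens) {
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            OperatorHandler handler = operators.get(token);
            if (handler == null) {
                operands.add(token); // not a registered operator: collect as operand
            } else {
                handler.process(operands, log);
                operands = new ArrayList<>();
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Pre-tokenized equivalent of the content stream: 10 15 Td (foo) Tj
        List<String> tokens = Arrays.asList("10", "15", "Td", "(foo)", "Tj");
        for (String line : new MiniStreamEngine().processStream(tokens)) {
            System.out.println(line);
        }
    }
}
```

In this sketch an OCR-based subclass would correspond to skipping the loop entirely and feeding recognised characters and positions to the text-handling callback instead, as suggested above.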
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
The base syntax has not changed. But the elements described by the base have. If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. -- John On 10 Mar 2014, at 01:43, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi John, it’s not about PDF versions but PDF versions and standards. The base syntax has not changed. But the elements described by the base have. BR Maruan Sahyoun On 10.03.2014 at 09:20, John Hewson j...@jahewson.com wrote: Hi Maruan As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ... Perhaps that is because there is not much demand for this? Nowadays everyone has instant access to the latest version of Adobe Reader, so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid, a format such as PDF/A or PDF/X must be used instead, as “vanilla” PDF is ambiguous. The PDFBox PDF/A validation does a good job for PDF/A 1b but it cannot be easily extended to other standards. Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files. Do you think that there is a need for a more formal support of such standards and versions? This would influence some of the design decisions for the parser and affect the base objects. I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions? -- John
Re: advanced signatures - the feature plans
Hello, I remember you from some time ago without details, and that one of the committers was very satisfied with your code. Yes, it might have been Thomas Chojecki. The architecture of the new features was written using very nice design patterns and it was working very well :) did Andreas already make you sign a CLA ( https://www.apache.org/licenses/ ) ? My code has a CLA license; Thomas Chojecki added it. At the moment some of it is committed and some of it is not (Thomas was doing a code review and adding the Individual CLA [2] license to it, but he might have been busy, so some code is not committed yet). What do you mean by signing? Is there another procedure too? :) This is required when you submit real code, i.e. not just a small bugfix. As far as I know, I have written and uploaded patches of between 5,000 and 10,000 lines of code. For instance, the new feature I implemented for PDFBOX-1766 was about 3500-3800 lines of code. :) what JIRA issues are still open that you wish to have committed? PDFBOX-1848 and PDFBOX-1847 if not already there and if possible / applicable, attach a test PDF That is already done in my open issues. :) Make sure that your code works with the current trunk, a lot was changed, although not related to signatures Of course, I will check again. As I remember, Thomas Chojecki was doing the code review :) We might wait for Thomas Chojecki :) Vakhtang, On Mon, Mar 10, 2014 at 10:58 AM, Tilman Hausherr thaush...@t-online.de wrote: Hello Vakhtang, I remember you from some time ago without details, and that one of the committers was very satisfied with your code. Maybe that person will come back later, the weekend seems to be very quiet here. I'm answering you so that you don't get frustrated. A few thoughts: - did Andreas already make you sign a CLA ( https://www.apache.org/licenses/ ) ? This is required when you submit real code, i.e. not just a small bugfix. - what JIRA issues are still open that you wish to have committed? 
Please name the ones with the smallest code first - if not already there and if possible / applicable, attach a test PDF (that isn't copyrighted except by you) or a unit test or whatever . This is just a general thought, e.g. you wrote code to create a new type of signature for a PDF. Then (if possible) you should also create a unit test that creates a signature for that PDF (if possible without user interaction), verifies that signature, modifies the PDF, and verifies the signature again and of course fails. - make sure that your code works with the current trunk, a lot was changed, although not related to signatures Tilman Am 09.03.2014 18:21, schrieb Vakhtang koroghlishvili: Hello, how are you? :) You know , that I have already fix and implement some issues and new features which was about digital signature. I have already created another new features too but I don't know if I should create this patches in the pdfbox example project. I think, it's time to create another project named sign-box or something like that. At the moment I have time and I can create that project with very good design architect and show you a patch or comitters can create that project with existence features and then we will add new features step by step. I will write here, what we have at the moment, and what can we add too: At the moment, if we want to use pdfbox for the document signing , we can only do that thing: 1. create basic digital signature with the time of CPU. *done* 2. create digital signature with visible signature. *done* -that was my first contribution :-) This is very poor functionality and is not easy to use. It's just in the project named examples. It must have very easy API, as we sad before. I have implement and add another functionality nd created patches some of them. some patches of new features is not updated in the jira, because I don't now whether this must be in the example project or not. So, at the moment I have that functionality: 1. 
signing document with PADES-BES or PADES-BASIC profile, with CPU signing time. *done* 2. signing document with PADES-BES or PADES-BASIC profile, with timeStamp server time. Already *implemented* - I have uploaded a patch in our jira, some classes are in the pdfbox project and some classes are in the example project. 3. signing document with timestamp server. Already *implemented* and patch is uploaded in a jira ... 4. creted document secure store and PADES LTV profile implementation (advanced signatures!). I have already *implemented* this. I can create patch in the example project or create patch for sign-box too :) Tell me and I will create patch for one of them :) 5. certificate chain verification while signing process, against OCSP, CRL protocols (with advanced ocsp, crl certificate verifications too!) - I have already *implemented* this. I can create patch in the example project or
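The unit-test idea suggested earlier in this thread (create a signature, verify it, modify the document, verify again and expect a failure) can be sketched with the JDK's own java.security API. This is only an illustration of the test shape under stated assumptions: it signs a plain byte array standing in for the PDF, it does not build a PDF signature container, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Sketch of the "sign, verify, tamper, verify again" scenario on raw bytes.
class SignatureRoundTripSketch {
    static byte[] sign(byte[] data, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    static boolean verify(byte[] data, byte[] signature, PublicKey key) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(data);
        return verifier.verify(signature);
    }

    // True only if the untouched data verifies and the modified data does not.
    static boolean runScenario() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair(); // throwaway test key, no user interaction

            byte[] document = "stand-in for the signed PDF bytes".getBytes(StandardCharsets.UTF_8);
            byte[] signature = sign(document, pair.getPrivate());
            boolean validBefore = verify(document, signature, pair.getPublic());

            byte[] tampered = document.clone();
            tampered[0] ^= 0x01; // flip one bit to simulate a modification
            boolean validAfter = verify(tampered, signature, pair.getPublic());

            return validBefore && !validAfter;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("scenario behaves as expected: " + runScenario());
    }
}
```

A real test for the patches discussed here would presumably replace the byte array with a signed PDF produced by the signing code and keep the same four steps.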
PageDrawer
Hi, In 2.0 PageDrawer now takes a parameter. Is this change needed, since it breaks API compatibility? Thanks
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
I think we are talking about two different things here: the parsing process to get the tokens, and the parsing process to follow the PDF file layout and to form and follow the higher-level structures such as Xref. Tokens didn’t change. File layout and higher-level structures did, e.g. linearization or xref streams. Depending on the PDF standard, some are permitted and some are not. BR Maruan On 10.03.2014 at 10:06, John Hewson j...@jahewson.com wrote: The base syntax has not changed. But the elements described by the base have. If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. -- John On 10 Mar 2014, at 01:43, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi John, it’s not about PDF versions but PDF versions and standards. The base syntax has not changed. But the elements described by the base have. BR Maruan Sahyoun On 10.03.2014 at 09:20, John Hewson j...@jahewson.com wrote: Hi Maruan As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ... Perhaps that is because there is not much demand for this? Nowadays everyone has instant access to the latest version of Adobe Reader, so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid, a format such as PDF/A or PDF/X must be used instead, as “vanilla” PDF is ambiguous. The PDFBox PDF/A validation does a good job for PDF/A 1b but it cannot be easily extended to other standards. Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files. Do you think that there is a need for a more formal support of such standards and versions? This would influence some of the design decisions for the parser and affect the base objects. 
I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions? — John
Re: PageDrawer
Simon, Yes, it is. The PageDrawer class is currently going through some big changes, with more to come (e.g. it now renders content streams, not pages). It is not going to serve the same role it did in 1.8 and should no longer be used directly. Instead, the new PDFRenderer should be used for rendering pages. -- John On 10 Mar 2014, at 02:18, Simon Steiner simonsteiner1...@gmail.com wrote: Hi, In 2.0 PageDrawer now takes a parameter. Is this change needed, since it breaks API compatibility? Thanks
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. I think we are talking about two different things here: the parsing process to get the tokens and the parsing process to follow the PDF file layout and to form and follow the higher-level structures such as Xref. Yes, there are two phases, tokenizing and parsing; sometimes both are called parsing. Tokens didn’t change. File layout and higher-level structures did, e.g. linearization or xref streams. Depending on the PDF standard, some are permitted and some are not. That’s not right. The tokens have changed: “xref” is a keyword and therefore a token. Also, as I said originally, the syntax has changed, because what you call “higher-level structures” is actually the syntax. -- John On 10 Mar 2014, at 02:32, Maruan Sahyoun sahy...@fileaffairs.de wrote: I think we are talking about two different things here: the parsing process to get the tokens, and the parsing process to follow the PDF file layout and to form and follow the higher-level structures such as Xref. Tokens didn’t change. File layout and higher-level structures did, e.g. linearization or xref streams. Depending on the PDF standard, some are permitted and some are not. BR Maruan On 10.03.2014 at 10:06, John Hewson j...@jahewson.com wrote: The base syntax has not changed. But the elements described by the base have. If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. -- John On 10 Mar 2014, at 01:43, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi John, it’s not about PDF versions but PDF versions and standards. The base syntax has not changed. But the elements described by the base have. BR Maruan Sahyoun On 10.03.2014 at 09:20, John Hewson j...@jahewson.com wrote: Hi Maruan As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ... Perhaps that is because there is not much demand for this? 
Nowadays everyone has instant access to the latest version of Adobe Reader so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid then a format such as PDF/A or PDF/X must be used instead as “vanilla PDF is ambiguous. The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not be easily extended to other standards. Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files. Do you think that there is a need for a more formal support of such standards and versions? The would influence some of the design decisions for the parser and affect the base objects. I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions? — John
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
OK, I wasn’t precise enough: token types didn’t change, but newer tokens were introduced. As the syntax has changed, do we need version and standards support in the parsing phase then? The other way would be to parse what’s in there and do validation etc. purely on the parsing result (COS model, PD model). We need to do that anyway. What about writing? BR Maruan Sahyoun On 10.03.2014 at 11:43, John Hewson j...@jahewson.com wrote: If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. I think we are talking about two different things here: the parsing process to get the tokens and the parsing process to follow the PDF file layout and to form and follow the higher-level structures such as Xref. Yes, there are two phases, tokenizing and parsing; sometimes both are called parsing. Tokens didn’t change. File layout and higher-level structures did, e.g. linearization or xref streams. Depending on the PDF standard, some are permitted and some are not. That’s not right. The tokens have changed: “xref” is a keyword and therefore a token. Also, as I said originally, the syntax has changed, because what you call “higher-level structures” is actually the syntax. -- John On 10 Mar 2014, at 02:32, Maruan Sahyoun sahy...@fileaffairs.de wrote: I think we are talking about two different things here: the parsing process to get the tokens, and the parsing process to follow the PDF file layout and to form and follow the higher-level structures such as Xref. Tokens didn’t change. File layout and higher-level structures did, e.g. linearization or xref streams. Depending on the PDF standard, some are permitted and some are not. BR Maruan On 10.03.2014 at 10:06, John Hewson j...@jahewson.com wrote: The base syntax has not changed. But the elements described by the base have. If the syntax hasn’t changed then there can’t be anything in the parser which is version-specific. 
-- John On 10 Mar 2014, at 01:43, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi John, it’s not about PDF versions but PDF versions and standards. The base syntax has not changed. But the elements described by the base have. BR Maruan Sahyoun Am 10.03.2014 um 09:20 schrieb John Hewson j...@jahewson.com: Hi Maruan As of today PDFBox has no formal support for specific PDF versions in a way that a specific version can be enforced, validated ... Perhaps that is because there is not much demand for this? Nowadays everyone has instant access to the latest version of Adobe Reader so checking that a PDF can be opened with a specific version of Adobe Reader is not that useful anymore. There might be some niche cases, but I can’t think what they would be. For cases where it’s important that a PDF file is valid then a format such as PDF/A or PDF/X must be used instead as “vanilla PDF is ambiguous. The PDFBox PDF/A validation does a good job for PDF/A 1b but it can not be easily extended to other standards. Yes, PDF/A is carefully validated because it is for archival purposes, unlike regular PDF files. Do you think that there is a need for a more formal support of such standards and versions? The would influence some of the design decisions for the parser and affect the base objects. I can’t think of a reason why someone would want to parse a specific PDF version, so my answer is no, I don’t think there is such a need. Has the syntax of PDF even changed that much over the different versions? — John
[jira] [Comment Edited] (PDFBOX-1164) Inline image parsing error causes RuntimeException + FIX
[ https://issues.apache.org/jira/browse/PDFBOX-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925689#comment-13925689 ] Timo Boehme edited comment on PDFBOX-1164 at 3/10/14 12:06 PM: --- I have to beg pardon that I haven't commit this fix when getting a committer. Thanks for doing it. was (Author: tboehme): I have to bag pardon that I haven't commit this fix when getting a committer. Thanks for doing it. Inline image parsing error causes RuntimeException + FIX Key: PDFBOX-1164 URL: https://issues.apache.org/jira/browse/PDFBOX-1164 Project: PDFBox Issue Type: Bug Components: Parsing Affects Versions: 1.7.0 Reporter: Timo Boehme Fix For: 1.8.5, 2.0.0 Attachments: PDFStreamParser.diff Inline images start with BI operator, followed by some parameters and ID operator. Then the binary image data with a trailing EI operator follows. The problem is the detection of the EI operator. The current code in PDFStreamParser requires the operator to be surrounded by whitespaces. However I have a document where the sequence EI with preceding 0x09 and following 0x20 occurs in the image data. Thus PDFBOX wrongly assumes the end of image data and the parsing later fails with a RuntimeException (from PDFStreamParser#getTokenIterator - this should be changed to throw IOException; will file another issue) because the following binary data is interpreted as operator. In earlier versions a heuristic was used to test the expected byte count of the image to circumvent this problem, however it was disabled because the data could also be compressed. To fix the problem I have added a test involving the following X (with X=5) bytes after the 'WS EI WS'. In order to treat the EI as operator all of the bytes must be printable ASCII characters because it can only be followed by PDF operators. If 5 bytes are too many because a comment with non ASCII character could follow this could be reduced to 3 bytes which in most cases should be enough. 
Diff of fix is added to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
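The EI check described in PDFBOX-1164 can be sketched roughly as follows. This is only an illustration of the idea, not the attached PDFStreamParser.diff: the class and method names are hypothetical, and treating tab, LF, FF, CR and space as acceptable look-ahead bytes next to printable ASCII is an assumption made here so that operators separated by newlines still pass.

```java
/** Hypothetical helper illustrating the look-ahead heuristic; not PDFBox code. */
final class InlineImageEndDetector {
    /** Number of bytes inspected after "WS EI WS" (X=5 in the issue text). */
    private static final int LOOKAHEAD = 5;

    /** PDF whitespace without NUL, which is more likely to be binary data. */
    static boolean isWhitespace(int b) {
        return b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /** True if the bytes at pos are "EI" surrounded by whitespace and followed by text-like bytes. */
    static boolean isEndOfInlineImage(byte[] data, int pos) {
        if (pos < 1 || pos + 2 >= data.length) {
            return false;
        }
        if (!isWhitespace(data[pos - 1] & 0xFF) || data[pos] != 'E' || data[pos + 1] != 'I'
                || !isWhitespace(data[pos + 2] & 0xFF)) {
            return false;
        }
        // Inspect up to LOOKAHEAD bytes; fewer if the stream ends earlier.
        int end = Math.min(data.length, pos + 3 + LOOKAHEAD);
        for (int i = pos + 3; i < end; i++) {
            int b = data[i] & 0xFF;
            boolean printable = b >= 0x20 && b <= 0x7E;
            if (!printable && !isWhitespace(b)) {
                return false; // looks like binary image data, so this EI is not the operator
            }
        }
        return true;
    }

    /** Returns the index of the 'E' of the first plausible EI operator at or after from, or -1. */
    static int findEndOfInlineImage(byte[] data, int from) {
        for (int i = Math.max(from, 1); i < data.length; i++) {
            if (isEndOfInlineImage(data, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "\tEI " inside binary data, followed later by the real "\nEI\n".
        byte[] data = {9, 'E', 'I', ' ', 1, 2, 3, ' ', 'Q', '\n', 'E', 'I', '\n', 'Q', '\n'};
        System.out.println(findEndOfInlineImage(data, 0));
    }
}
```

Reducing LOOKAHEAD to 3, as the issue suggests, makes the check more tolerant of a comment with non-ASCII characters directly after the operator, at the cost of more false positives inside image data.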
Re: advanced signatures - the feature plans
Ok. If these classes are not for signing then why are they in the digitalsignature package? Should they be moved elsewhere? These classes are essential for signatures, but they do not sign the document themselves (the cryptographic signing is done in the example project). The same goes for the *org.apache.pdfbox.pdmodel.interactive.digitalsignature* classes. For instance, PDSignature and SignatureInterface are auxiliary classes for creating a digital signature, but the cryptographic signature itself is created in the example project. The same applies to the .visible package, *BUT*: In conclusion: I think that, because there was no sub-project when the digital signature profile was written, all the signature interfaces and classes were created in the *org.apache.pdfbox.pdmodel.interactive.digitalsignature* package. If we move *org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible* then we should move the *org.apache.pdfbox.pdmodel.interactive.digitalsignature* classes too. In my opinion, it would be better to move all the signature classes and interfaces into the new sub-project, not only the *visible* package. I thought that you were asking me only about the *visible* package.

On Mon, Mar 10, 2014 at 1:26 PM, John Hewson j...@jahewson.com wrote: Because these classes are for creating visible signature fields (not for signing), we cannot move them. sign-box will be for signing only. Ok. If these classes are not for signing then why are they in the digitalsignature package? Should they be moved elsewhere? JIRA issue number? PDFBOX-1847 and PDFBOX-1848. Great, I'll take a look. Yes, but the sub-project architecture must not be the same, because the sub-project API must be very easy to use. So we might change some structures. That's fine, we can iterate and make changes. Getting a nice, easy-to-use API always takes a number of refactorings; you can always submit more patches later on. As far as I can see, those interfaces and classes are written very well.
We will add other classes and interfaces for other signature functionality. But most of them will be in the new sub-project. We will move some classes from the example project to the new sub-project, with a different architecture. Well, two of the interfaces are only implemented by one class, so they are redundant - unless your new sub-project has some new classes which implement them (I guess it probably does?). -- John

On 10 Mar 2014, at 01:49, Vakhtang koroghlishvili vakhtang.koroghlishv...@gmail.com wrote: Should the classes in org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible be moved into this new project also? Because these classes are for creating visible signature fields (not for signing), we cannot move them. sign-box will be for signing only. Is this already in PDFBox? As I remember, Thomas Chojecki implemented this in the example project of PDFBox as a BASIC profile. We can make it BES with some changes. I have implemented PAdES LTV on my computer (this profile is based on the issues PDFBOX-1847 and PDFBOX-1848) and we will add this too. JIRA issue number? PDFBOX-1847 and PDFBOX-1848. Creating your patches in the example project is fine, we can move them to a different sub-project for you. Yes, but the sub-project architecture must not be the same, because the sub-project API must be very easy to use. So we might change some structures. In the end the architecture will look like this: you just create a signature object and then call the signing method with your signature profile and parameters, and that's all; it is very simple. So we must create a different architecture for that. As I remember, Thomas Chojecki was doing a code review of those patches. :) So we should wait :) Perhaps the org.apache.pdfbox.pdmodel.interactive.digitalsignature is in need of simplification, what do you think? As far as I can see, those interfaces and classes are written very well. We will add other classes and interfaces for other signature functionality.
But most of them will be in the new sub-project. We will move some classes from the example project to the new sub-project, with a different architecture.

On Mon, Mar 10, 2014 at 10:54 AM, John Hewson j...@jahewson.com wrote: Hi Vakhtang, I think, it's time to create another project named sign-box or something like that. Should the classes in org.apache.pdfbox.pdmodel.interactive.digitalsignature.visible be moved into this new project also? My understanding is that the PDF spec defines a basic signature container which is extensible and can embed signature formats defined by others, e.g. the PAdES standard defined by ETSI. This seems like a good candidate for a new sub-project e.g. pdfbox-signing. 1. create basic digital signature with the time of CPU. *done* 2. create digital signature with visible signature. *done* This is very poor functionality and is not
Re: advanced signatures - the feature plans
On 2014-03-10 10:13, Vakhtang koroghlishvili wrote: Hello, Hi Vakhtang, I'm quite busy right now at work and planning my vacation. But at least I got my internet back this month *wohoo*. My code has a CLA license; Thomas Chojecki added it. At the moment some of it is committed and some is not (Thomas was doing the code review and adding the Individual CLA [2] license to it, but he may have been busy, so some code is not committed yet). What do you mean by signing? Is there another procedure too? :) The CLA is a document that you need to sign. With this document you assure that the code that you provide belongs to you and can be used by Apache. We need this document first, before we can commit larger parts of code. Patches are OK without the CLA, but for new features we need this document. So please look at the site [1] and fill in the Individual Contributor License Agreement [2]. The other thing is the license header :-) These are two different things. Each class needs to have this header. If such a header is missing, we committers will add it. Best regards Thomas [1] https://www.apache.org/licenses/ [2] https://www.apache.org/licenses/icla.txt
Re: advanced signatures - the feature plans
Vakhtang, Ok. If these classes are not for signing then why are they in the digitalsignature package? Should they be moved elsewhere? These classes are essential for signatures, but they do not sign the document themselves (the cryptographic signing is done in the example project). Ok, so the classes are used for signing - good - just checking! In my opinion, it would be better to move all the signature classes and interfaces into the new sub-project, not only the *visible* package. I thought that you were asking me only about the *visible* package. Yes, that’s what I was trying to figure out, I agree. -- John
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925931#comment-13925931 ]

Tilman Hausherr commented on PDFBOX-1963:
-----------------------------------------

Looking back, I realize that the test I had wasn't worth the name, because it didn't test whether the file written had the correct compression. I wrote a test this weekend; however, that test needs JAI. We can't distribute JAI, but we could use it for test scope only. My proposed change would alter the code in three places:

1) in TestImageIOUtils
{code}
// testing TIFF
imageType = "tif";
writeImage(document, imageType, outDir + file.getName() + "-bw-", ImageType.BINARY, dpi);
String tiffCompression = getTiffCompression(outDir + file.getName() + "-bw-1.tif");
assertEquals("CCITT T.6", tiffCompression);
writeImage(document, imageType, outDir + file.getName() + "-co-", ImageType.RGB, dpi);
tiffCompression = getTiffCompression(outDir + file.getName() + "-co-1.tif");
assertEquals("LZW", tiffCompression);
{code}

2) I'm not including the code for getTiffCompression(); it simply returns the compression by reading the metadata XML object, e.g. LZW or CCITT T.6.

3) in the pdfbox pom.xml:
{code}
<dependency>
    <groupId>javax.media</groupId>
    <artifactId>jai_imageio</artifactId>
    <version>1.1</version>
    <scope>test</scope>
</dependency>
...
<repository>
    <id>osgeo</id>
    <name>Open Source Geospatial Foundation Repository</name>
    <url>http://download.osgeo.org/webdav/geotools/</url>
</repository>
{code}

PDFImageWriter doesn't make use of PDFStreamEngine
--------------------------------------------------

Key: PDFBOX-1963
URL: https://issues.apache.org/jira/browse/PDFBOX-1963
Project: PDFBox
Issue Type: Improvement
Components: Utilities
Affects Versions: 2.0.0
Reporter: John Hewson
Assignee: John Hewson
Fix For: 2.0.0

PDFImageWriter is a subclass of PDFStreamEngine; however, it never uses any of its functionality. The writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored.

--
This message was sent by Atlassian JIRA (v6.2#6252)
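Since the code for getTiffCompression() was not included in the comment, here is a rough guess at what reading the compression from the image metadata could look like. It is a sketch, not the code from the proposed test: the class and method names are hypothetical, it uses the plugin-neutral "javax_imageio_1.0" metadata tree rather than whatever format the original helper read, and it takes a stream instead of a file name. Reading a TIFF this way assumes a TIFF reader plugin (such as JAI Image I/O) is installed.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper; not part of PDFBox or its test suite. */
final class CompressionProbe {
    /** Returns the CompressionTypeName of the first image, or null if it cannot be determined. */
    static String compressionTypeName(InputStream in) {
        try (ImageInputStream iis = new MemoryCacheImageInputStream(in)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null; // no installed plugin understands this format
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                IIOMetadata metadata = reader.getImageMetadata(0);
                Node root = metadata.getAsTree("javax_imageio_1.0");
                Node compression = child(root, "Compression");
                Node typeName = compression == null ? null : child(compression, "CompressionTypeName");
                return typeName == null ? null : ((Element) typeName).getAttribute("value");
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Finds the first direct child element with the given name. */
    private static Node child(Node parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (name.equals(n.getNodeName())) {
                return n;
            }
        }
        return null;
    }

    /** Encodes an image in memory, used here only to produce sample input. */
    static byte[] encode(BufferedImage image, String format) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, format, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // PNG is used for the demo because every JDK ships a PNG plugin.
        byte[] png = encode(new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB), "png");
        System.out.println(compressionTypeName(new ByteArrayInputStream(png)));
    }
}
```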
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925942#comment-13925942 ] Tilman Hausherr commented on PDFBOX-1963: - Re: the commit, I didn't test it yet (although the previous comment does use an excerpt), but obviously, it is good :-) Thanks. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925987#comment-13925987 ] John Hewson commented on PDFBOX-1963: - Yes, more unit tests! 1) and 2) look good to me. I'm not sure about 3) because Sun's Binary Code License is explicitly listed as being [prohibited for Apache projects|https://www.apache.org/legal/resolved.html#prohibited], but it's not clear exactly what is and isn't permitted, so I've opened LEGAL-195 to ask this question. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: advanced signatures - the feature plans
Hi, Ok, I will do the license things too. :) Thanks for the explanations :) And who will create the new sub-project? Should I create it, or will you create it when you have spare time? :) I'm not a committer, I can only create patches and show you the result. :) Vakhtang

On Mon, Mar 10, 2014 at 6:19 PM, Thomas Chojecki i...@rayman2200.de wrote: On 2014-03-10 10:13, Vakhtang koroghlishvili wrote: Hello, Hi Vakhtang, I'm quite busy right now at work and planning my vacation. But at least I got my internet back this month *wohoo*. My code has a CLA license; Thomas Chojecki added it. At the moment some of it is committed and some is not (Thomas was doing the code review and adding the Individual CLA [2] license to it, but he may have been busy, so some code is not committed yet). What do you mean by signing? Is there another procedure too? :) The CLA is a document that you need to sign. With this document you assure that the code that you provide belongs to you and can be used by Apache. We need this document first, before we can commit larger parts of code. Patches are OK without the CLA, but for new features we need this document. So please look at the site [1] and fill in the Individual Contributor License Agreement [2]. The other thing is the license header :-) These are two different things. Each class needs to have this header. If such a header is missing, we committers will add it. Best regards Thomas [1] https://www.apache.org/licenses/ [2] https://www.apache.org/licenses/icla.txt
Re: advanced signatures - the feature plans
And who will create the new sub-project? Should I create it, or will you create it when you have spare time? :) I'm not a committer, I can only create patches and show you the result. :) I’m happy to do this for you. -- John
Re: [DISCUSS] PDFBox and support for PDF versions, PDF standards
OK - I wasn’t precise enough - token types didn’t change, but newer tokens have been introduced. Yes. As the syntax has changed, do we need version and standards support in the parsing phase then? I don’t think so, no. I don’t know what the use-case would be. You’d have to go back and read all seven versions of the PDF Reference and make sure that the parser implements the correct handling for each version; that’s an awful lot of work. The other way would be to parse what’s in there and do validation etc. purely on the parsing result (COS model, PD model). We need to do that anyway. Yes, I prefer this approach; you can always write a tool which inspects a PDDocument and determines whether or not it uses features available in a given PDF version. It seems better to do this as a separate feature than to try and build it into the parser or the PD model directly. What about writing? Yes, we want versions for writing, because a user may want to generate e.g. a PDF 1.6 file. This is going to be even more important in the near future because the PDF 2.0 standard is supposed to be introduced in 2014. -- John
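To make the "validate after parsing" idea a little more concrete, here is a very small sketch of such a checking tool. It is only an illustration under simplifying assumptions: it scans raw bytes instead of inspecting a PDDocument, the class and method names are hypothetical, and it knows about a single feature (cross-reference streams, which were introduced in PDF 1.5). It also ignores the /Version entry in the document catalog, which can override the header version.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-parse check; not a PDFBox API. */
final class PdfVersionProbe {
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");
    private static final Pattern XREF_STREAM = Pattern.compile("/Type\\s*/XRef\\b");

    /** Returns the header version as major*10+minor (e.g. 14 for 1.4), or -1 if no header is found. */
    static int headerVersion(byte[] pdf) {
        String head = new String(pdf, 0, Math.min(pdf.length, 1024), StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Integer.parseInt(m.group(1)) * 10 + Integer.parseInt(m.group(2)) : -1;
    }

    /** Crude textual check for a cross-reference stream dictionary. */
    static boolean usesXrefStream(byte[] pdf) {
        return XREF_STREAM.matcher(new String(pdf, StandardCharsets.ISO_8859_1)).find();
    }

    /** True if the declared version permits every feature this sketch knows how to detect. */
    static boolean isConsistent(byte[] pdf) {
        int version = headerVersion(pdf);
        return version >= 0 && (!usesXrefStream(pdf) || version >= 15);
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n1 0 obj\n<< /Type /XRef >>\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(headerVersion(sample) + " consistent=" + isConsistent(sample));
    }
}
```

A real tool would work on the COS or PD model as suggested in the thread and would need a table of features per PDF version; the point here is only that the check can live outside the parser.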
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926109#comment-13926109 ]

Sergey Smal commented on PDFBOX-1915:
-------------------------------------

Thanks a lot for your reply! How can I get this issue as a project for GSoC 2014?

Implement shading with Coons and tensor-product patch meshes
------------------------------------------------------------

Key: PDFBOX-1915
URL: https://issues.apache.org/jira/browse/PDFBOX-1915
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Labels: graphical, gsoc2014, java, math, shading
Attachments: CONICAL.pdf, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, TENSOR.pdf, XYZsweep.pdf, asy-coons-but-really-tensor.pdf, asy-tensor.pdf, lamp_cairo.pdf

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- java, although you don't have to be a java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615; these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        // render each page at 300 dpi and write it as blah.pdf1.png, blah.pdf2.png, ...
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general. The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html

The tricky parts are:
- decide whether a point(x,y) is inside or outside a patch
- decide the color of a point within the patch

To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm

If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf

Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for /ShadingType (without the quotes). If your images are rendering like the example PDFs, then you were successful.

Optional: Review and optimize the complete shading package for speed; implement cubic spline interpolation for type 0 (sampled) functions (that one is really low-low priority, see details by looking up cubic spline interpolation in the PDF spec, which says that it is disregarded in printing, and I don't have a test PDF).

Mentor: Tilman Hausherr (European timezone, languages: German, English, French)

--
This message was sent by Atlassian JIRA (v6.2#6252)
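For readers unsure about the math prerequisites, two of the listed building blocks can be sketched in a few lines: evaluating a cubic Bézier curve with Bernstein polynomials (the patch boundaries are such curves) and bilinear interpolation of the four corner colors. This is only a sketch under assumptions: the class and method names are hypothetical, the corner naming c00/c10/c01/c11 is chosen here for illustration, and it does not solve either of the tricky parts (mapping a device point back into the patch, or finding the patch it belongs to).

```java
/** Hypothetical math helper for patch meshes; not part of the PDFBox shading package. */
final class PatchMathSketch {
    /** The four cubic Bernstein polynomials evaluated at t in [0,1]. */
    static double[] bernstein3(double t) {
        double u = 1 - t;
        return new double[] {u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t};
    }

    /** Point on a cubic Bézier curve; p holds four control points as {x, y}. */
    static double[] cubicBezier(double[][] p, double t) {
        double[] b = bernstein3(t);
        double x = 0;
        double y = 0;
        for (int i = 0; i < 4; i++) {
            x += b[i] * p[i][0];
            y += b[i] * p[i][1];
        }
        return new double[] {x, y};
    }

    /** Bilinear interpolation of corner colors at patch coordinates (u, v) in the unit square. */
    static double[] bilinear(double[] c00, double[] c10, double[] c01, double[] c11,
                             double u, double v) {
        double[] out = new double[c00.length];
        for (int k = 0; k < out.length; k++) {
            out[k] = (1 - u) * (1 - v) * c00[k] + u * (1 - v) * c10[k]
                   + (1 - u) * v * c01[k] + u * v * c11[k];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] curve = {{0, 0}, {0, 1}, {1, 1}, {1, 0}};
        double[] mid = cubicBezier(curve, 0.5);
        System.out.println(mid[0] + ", " + mid[1]);
    }
}
```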
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926131#comment-13926131 ] Tilman Hausherr commented on PDFBOX-1915: - Hello Sergey, it is a project for GSoC 2014, that is how you found it :-) If your question is how you can apply, this is done at https://www.google-melange.com/ . Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Tilman Hausherr Labels: graphical, gsoc2014, java, math, shading Attachments: CONICAL.pdf, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, TENSOR.pdf, XYZsweep.pdf, asy-coons-but-really-tensor.pdf, asy-tensor.pdf, lamp_cairo.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1847) TSA Time Signature
[ https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926138#comment-13926138 ] John Hewson commented on PDFBOX-1847: - We need to wait until Vakhtang has a signed CLA is on file for this patch can be applied. TSA Time Signature -- Key: PDFBOX-1847 URL: https://issues.apache.org/jira/browse/PDFBOX-1847 Project: PDFBox Issue Type: Improvement Components: Signing Affects Versions: 1.8.4, 2.0.0 Reporter: vakhtang koroghlishvili Attachments: CreateSignature-updated.java.patch, TSATimeSignature.patch, resultOfSigning.jpg Until now, when signing a document, we were using the time from our own machine. For more security we can use a Time Stamp server. Trusted timestamping is the process of securely keeping track of the creation and modification time of a document. Security here means that no one, not even the owner of the document, should be able to change it once it has been recorded, provided that the timestamper's integrity is never compromised. (wiki) -- This message was sent by Atlassian JIRA (v6.2#6252)
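As background to the Time Stamp server idea: in the RFC 3161 time-stamp protocol the client does not send the document itself to the TSA, only a hash of the data (the "message imprint"), and the TSA returns a signed token binding that hash to a time. The sketch below shows just that first step. It is not taken from the attached patches; the class and method names are hypothetical and SHA-256 is chosen here as an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper showing how a message imprint could be computed; not PDFBox code. */
final class MessageImprintSketch {
    /** Hashes the data that is to be time-stamped (for a PDF signature, typically the signature value). */
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Lower-case hex encoding, only for display. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(sha256(new byte[0])));
    }
}
```

Building the actual time-stamp request and validating the TSA's response needs an ASN.1/CMS library and is outside the scope of this sketch.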
[jira] [Commented] (PDFBOX-1848) Time Stamp Document Level Sigature
[ https://issues.apache.org/jira/browse/PDFBOX-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926139#comment-13926139 ] John Hewson commented on PDFBOX-1848: - We need to wait until Vakhtang has a signed CLA is on file for this patch can be applied. Time Stamp Document Level Sigature -- Key: PDFBOX-1848 URL: https://issues.apache.org/jira/browse/PDFBOX-1848 Project: PDFBox Issue Type: Improvement Components: Signing Affects Versions: 1.8.4, 2.0.0 Reporter: vakhtang koroghlishvili Attachments: CreateTSASignature.java.patch, TSA-SIG-LOOKS-LIKE-THIS.png We need a TSA document-level signature module too! At the moment we sign the document with our certificate. But sometimes we need to sign the document with a TSA too. This is an important part of signing. Sometimes it is very, very important: for instance, when we implement the PAdES 4 profile this module will be essential; without it the Document Security Store will not work :) I'm working on this improvement and will finish it soon. It's almost done. I only need to add some Javadoc, and I might change the architecture design etc. So, please assign it to me :) I will upload a patch as soon as possible :) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1848) Time Stamp Document Level Sigature
[ https://issues.apache.org/jira/browse/PDFBOX-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926139#comment-13926139 ] John Hewson edited comment on PDFBOX-1848 at 3/10/14 8:07 PM: -- We need to wait until Vakhtang has a signed CLA on file before this patch can be applied. was (Author: jahewson): We need to wait until Vakhtang has a signed CLA on file for this patch can be applied. Time Stamp Document Level Sigature -- Key: PDFBOX-1848 URL: https://issues.apache.org/jira/browse/PDFBOX-1848 Project: PDFBox Issue Type: Improvement Components: Signing Affects Versions: 1.8.4, 2.0.0 Reporter: vakhtang koroghlishvili Attachments: CreateTSASignature.java.patch, TSA-SIG-LOOKS-LIKE-THIS.png -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1848) Time Stamp Document Level Sigature
[ https://issues.apache.org/jira/browse/PDFBOX-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926139#comment-13926139 ] John Hewson edited comment on PDFBOX-1848 at 3/10/14 8:07 PM: -- We need to wait until Vakhtang has a signed CLA on file for this patch can be applied. was (Author: jahewson): We need to wait until Vakhtang has a signed CLA is on file for this patch can be applied. Time Stamp Document Level Sigature -- Key: PDFBOX-1848 URL: https://issues.apache.org/jira/browse/PDFBOX-1848 Project: PDFBox Issue Type: Improvement Components: Signing Affects Versions: 1.8.4, 2.0.0 Reporter: vakhtang koroghlishvili Attachments: CreateTSASignature.java.patch, TSA-SIG-LOOKS-LIKE-THIS.png We need a TSA document-level signature module too! At the moment we sign documents with our certificate, but sometimes we need to sign a document with a TSA too. This is an important part of signing. Sometimes it is very important: for instance, when we implement the PAdES 4 profile this module will be essential; without it the Document Secure Store will not work :) I'm working on this improvement and will finish it soon. It's almost done. I only need to add some Javadoc, and I might change the architectural design, etc. So please assign this to me :) I will upload the patch as soon as possible :) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1847) TSA Time Signature
[ https://issues.apache.org/jira/browse/PDFBOX-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926138#comment-13926138 ] John Hewson edited comment on PDFBOX-1847 at 3/10/14 8:07 PM: -- We need to wait until Vakhtang has a signed CLA on file before this patch can be applied. was (Author: jahewson): We need to wait until Vakhtang has a signed CLA on file for this patch can be applied. TSA Time Signature -- Key: PDFBOX-1847 URL: https://issues.apache.org/jira/browse/PDFBOX-1847 Project: PDFBox Issue Type: Improvement Components: Signing Affects Versions: 1.8.4, 2.0.0 Reporter: vakhtang koroghlishvili Attachments: CreateSignature-updated.java.patch, TSATimeSignature.patch, resultOfSigning.jpg When we signed a document, we used the time from our own system. For more security we can use a Time Stamp server. Trusted timestamping is the process of securely keeping track of the creation and modification time of a document. Security here means that no one, not even the owner of the document, should be able to change it once it has been recorded, provided that the timestamper's integrity is never compromised. (wiki) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926155#comment-13926155 ] Tilman Hausherr commented on PDFBOX-1963: - It crashes when trying to write a bitonal TIFF. One cause is that setCompressionMode() was not at the correct place; I fixed this in rev 1576067. (More to come) PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926178#comment-13926178 ] Tilman Hausherr commented on PDFBOX-1963: - I couldn't write TIFF files, the apparent cause was the branches of an if statement that were switched. I fixed this in rev 1576071. Now I'm missing the resolution in PNG files. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926180#comment-13926180 ] Sergey Smal commented on PDFBOX-1915: - Sorry, I visited that site before registration opened, which is why I asked the question. I found everything I need, thanks again :) Implement shading with Coons and tensor-product patch meshes Key: PDFBOX-1915 URL: https://issues.apache.org/jira/browse/PDFBOX-1915 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: Tilman Hausherr Labels: graphical, gsoc2014, java, math, shading Attachments: CONICAL.pdf, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, TENSOR.pdf, XYZsweep.pdf, asy-coons-but-really-tensor.pdf, asy-tensor.pdf, lamp_cairo.pdf

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- Java, although you don't have to be a Java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615, these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        // render each page at 300 dpi and write it next to the source file
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general. The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html

The tricky parts are:
- decide whether a point (x,y) is inside or outside a patch
- decide the color of a point within the patch

To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository: https://pdfbox.apache.org/downloads.html#scm

If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf

Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for "/ShadingType" (without the quotes). If your images are rendering like the example PDFs, then you were successful.

Optional: Review and optimize the complete shading package for speed; implement cubic spline interpolation for type 0 (sampled) functions (that one is really low-low priority, see details by looking up cubic spline interpolation in the PDF spec, which says that it is disregarded in printing, and I don't have a test PDF).
Mentor: Tilman Hausherr (European timezone, languages: german, english, french) -- This message was sent by Atlassian JIRA (v6.2#6252)
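For readers who want a feel for the math prerequisites listed in PDFBOX-1915, here is a minimal sketch of two of the named building blocks: evaluating a cubic Bézier curve through the Bernstein polynomials, and bilinear interpolation of four corner values. The class and method names (BezierSketch, bernstein3, cubicBezier, bilinear) are invented for illustration and are not part of PDFBox; this is not the type 6/7 shading implementation, only the kind of arithmetic it would probably build on.

```java
public class BezierSketch {

    /** Cubic Bernstein basis polynomials B0..B3 evaluated at t in [0, 1]. */
    static double[] bernstein3(double t) {
        double s = 1.0 - t;
        return new double[] { s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t };
    }

    /** Point on a cubic Bezier curve; ctrl holds four {x, y} control points. */
    static double[] cubicBezier(double[][] ctrl, double t) {
        double[] b = bernstein3(t);
        double x = 0;
        double y = 0;
        for (int i = 0; i < 4; i++) {
            x += b[i] * ctrl[i][0];
            y += b[i] * ctrl[i][1];
        }
        return new double[] { x, y };
    }

    /**
     * Bilinear interpolation of four corner values over the unit square:
     * c00 at (0,0), c10 at (1,0), c01 at (0,1), c11 at (1,1).
     */
    static double bilinear(double c00, double c10, double c01, double c11,
                           double u, double v) {
        return (1 - u) * (1 - v) * c00
             + u * (1 - v) * c10
             + (1 - u) * v * c01
             + u * v * c11;
    }
}
```

A curve evaluated at t = 0 returns its first control point and at t = 1 its last one, which is a quick sanity check when experimenting with patch boundaries.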
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926190#comment-13926190 ] John Hewson commented on PDFBOX-1963: - {code} Now I'm missing the resolution in PNG files. {code} That's strange, we could do with a unit test which tests this too, i.e. it reads back in the written PNG and checks that the resolution is correct. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926201#comment-13926201 ] John Hewson commented on PDFBOX-1963: - I've found the problem with PNG resolution, HorizontalPixelSize is repeated twice:
{code}
Element h = new IIOMetadataNode("HorizontalPixelSize");
h.setAttribute("value", Double.toString(dpi / 25.4));
dimension.appendChild(h);
Element v = new IIOMetadataNode("HorizontalPixelSize");
v.setAttribute("value", Double.toString(dpi / 25.4));
dimension.appendChild(v);
{code}
But it should be VerticalPixelSize the second time. Fixed in revision 1576078. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
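As an illustration of the fix described in the comment above, the following sketch builds a Dimension node with one HorizontalPixelSize child and one VerticalPixelSize child, using a single helper so the element name cannot be copy-pasted wrongly. The class and method names (DimensionNodeSketch, pixelSizeNode, buildDimension) are assumptions made for this example; the value expression dpi / 25.4 is copied from the quoted snippet and not independently checked against the ImageIO metadata specification. This is not the code committed in revision 1576078.

```java
import javax.imageio.metadata.IIOMetadataNode;

public class DimensionNodeSketch {

    private static final double MM_PER_INCH = 25.4;

    /** Creates one pixel-size element with the value used in the quoted snippet. */
    static IIOMetadataNode pixelSizeNode(String name, int dpi) {
        IIOMetadataNode node = new IIOMetadataNode(name);
        node.setAttribute("value", Double.toString(dpi / MM_PER_INCH));
        return node;
    }

    /** Builds a Dimension node holding distinct horizontal and vertical entries. */
    static IIOMetadataNode buildDimension(int dpi) {
        IIOMetadataNode dimension = new IIOMetadataNode("Dimension");
        dimension.appendChild(pixelSizeNode("HorizontalPixelSize", dpi));
        dimension.appendChild(pixelSizeNode("VerticalPixelSize", dpi));
        return dimension;
    }
}
```

A unit test along the lines suggested earlier in the thread could start by asserting that the two child names differ before going on to write and re-read an actual PNG.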
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926210#comment-13926210 ] John Hewson commented on PDFBOX-1963: - Made a fix to PDFToImage because it was not using the ImageType enum correctly, revision 1576084. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926211#comment-13926211 ] Tilman Hausherr commented on PDFBOX-1963: - Yeah, we really need better unit tests. I'll handle that resolution thing in the next few days. I fixed one more thing in rev 1576083, the dimension wasn't attached. PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1963) PDFImageWriter doesn't make use of PDFStreamEngine
[ https://issues.apache.org/jira/browse/PDFBOX-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926215#comment-13926215 ] John Hewson commented on PDFBOX-1963: - Great, ImageIOUtil is looking nice and maintainable now! PDFImageWriter doesn't make use of PDFStreamEngine -- Key: PDFBOX-1963 URL: https://issues.apache.org/jira/browse/PDFBOX-1963 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Fix For: 2.0.0 PDFImageWriter is a subclass of PDFStreamEngine, however it never uses any of its functionality, the writeImage methods could be marked as static and behave in the same manner. The relationship between PDFImageWriter, RenderUtil, and ImageIOUtil no longer matches its historical origins and needs to be refactored. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1094) Pattern colorspace support
[ https://issues.apache.org/jira/browse/PDFBOX-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926331#comment-13926331 ] John Hewson commented on PDFBOX-1094: - I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream and have it implement a method such as getDefaultMatrix() which is then overridden in the various subclasses, e.g. PageContentStream, PatternContentStream. Because we never needed this previously there is actually no class in PDFBox which represents a content stream. There is a class named PDPageContentStream but it is a helper class for writing content streams and does not even represent a PD object! We need to introduce a new PDContentStream class, and rename the confusingly named helper to something like ContentStreamBuilder and move it to pdfbox.pdfwriter along with the other writing classes. Then we can pass an instance of the new PDContentStream class to PageDrawer#drawTilingPattern so that we have access to the stream's default CTM. Phew! 
Pattern colorspace support -- Key: PDFBOX-1094 URL: https://issues.apache.org/jira/browse/PDFBOX-1094 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.6.0 Reporter: Andreas Lehmkühler Assignee: John Hewson Priority: Minor Attachments: ColoredTilingPaint.patch, PATTYP1.pdf, PATTYP2.pdf, PDF32000_2008_pg737.pdf, PDFStreamEngine.patch, PageDrawer.patch, _pdfbox-1094-tiling_pattern.pdf-1-blurry.png, jagpdf_doc_patterns.pdf, jagpdf_doc_patterns.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1861-tracemonkey.pdf-13.png, pdfbox-1861-tracemonkey.pdf-13.png, tiling_pattern.pdf PDFBox doesn't support PDPattern colorspaces -- This message was sent by Atlassian JIRA (v6.2#6252)
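The design idea floated in the comment above (a content stream type exposing getDefaultMatrix(), overridden per kind of stream) could look roughly like the sketch below. Everything here is hypothetical: the ContentStream interface and both implementations are illustrative stand-ins, not existing PDFBox classes, and the assumption that a pattern's default matrix is the parent stream's matrix concatenated with the pattern matrix is my reading of the comment, not something the thread confirms.

```java
import java.awt.geom.AffineTransform;

public class ContentStreamSketch {

    /** Hypothetical abstraction: any content stream can report its default matrix. */
    interface ContentStream {
        AffineTransform getDefaultMatrix();
    }

    /** A page stream; in this sketch its default matrix is simply the identity. */
    static class PageContentStream implements ContentStream {
        @Override
        public AffineTransform getDefaultMatrix() {
            return new AffineTransform();
        }
    }

    /** A pattern stream whose default matrix depends on its parent stream. */
    static class PatternContentStream implements ContentStream {
        private final ContentStream parent;
        private final AffineTransform patternMatrix;

        PatternContentStream(ContentStream parent, AffineTransform patternMatrix) {
            this.parent = parent;
            this.patternMatrix = new AffineTransform(patternMatrix);
        }

        @Override
        public AffineTransform getDefaultMatrix() {
            // Apply the pattern matrix first, then the parent's default matrix.
            AffineTransform m = new AffineTransform(parent.getDefaultMatrix());
            m.concatenate(patternMatrix);
            return m;
        }
    }
}
```

With such a type, a drawing method would only need a reference to the current stream to obtain the matrix, which is the access problem the comment describes.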
[jira] [Comment Edited] (PDFBOX-1094) Pattern colorspace support
[ https://issues.apache.org/jira/browse/PDFBOX-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926331#comment-13926331 ] John Hewson edited comment on PDFBOX-1094 at 3/10/14 10:18 PM: --- I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream and have it implement a method such as getDefaultMatrix() which is then overridden in the various subclasses, e.g. PageContentStream, PatternContentStream. Because we never needed this previously there is actually no class in PDFBox which represents a content stream. There is a class named PDPageContentStream but it is a helper class for writing content streams and does not even represent a PD object! We need to introduce a new PDContentStream class, and rename the confusingly named helper to something like ContentStreamBuilder and move it to pdfbox.pdfwriter along with the other writing classes. Then we can pass an instance of the new PDContentStream class to PageDrawer#drawTilingPattern so that we have access to the stream's default CTM. Then I can begin to get tiling paint working... was (Author: jahewson): I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream and have it implement a method such as getDefaultMatrix() which is then overridden in the various subclasses, e.g. PageContentStream, PatternContentStream. 
Because we never needed this previously there is actually no class in PDFBox which represents a content stream. There is a class named PDPageContentStream but it is a helper class for writing content streams and does not even represent a PD object! We need to introduce a new PDContentStream class, and rename the confusingly named helper to something like ContentStreamBuilder and move it to pdfbox.pdfwriter along with the other writing classes. Then we can pass an instance of the new PDContentStream class to PageDrawer#drawTilingPattern so that we have access to the stream's default CTM. Phew! Pattern colorspace support -- Key: PDFBOX-1094 URL: https://issues.apache.org/jira/browse/PDFBOX-1094 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.6.0 Reporter: Andreas Lehmkühler Assignee: John Hewson Priority: Minor Attachments: ColoredTilingPaint.patch, PATTYP1.pdf, PATTYP2.pdf, PDF32000_2008_pg737.pdf, PDFStreamEngine.patch, PageDrawer.patch, _pdfbox-1094-tiling_pattern.pdf-1-blurry.png, jagpdf_doc_patterns.pdf, jagpdf_doc_patterns.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1861-tracemonkey.pdf-13.png, pdfbox-1861-tracemonkey.pdf-13.png, tiling_pattern.pdf PDFBox doesn't support PDPattern colorspaces -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (PDFBOX-1946) Running within an Applet has many AccessControlException 's
[ https://issues.apache.org/jira/browse/PDFBOX-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fred Andrews updated PDFBOX-1946: - Attachment: patch.zip Took me a while to get back to this... Attached is a zip file with 5 patch files that correct all the applet issues and security violations I found. I had also needed to change org\apache\pdfbox\pdmodel\graphics\color\PDColorState but that class seemed to have been deleted. And to address your previous question, this was on a signed Applet. The latest versions of Java seem to be more restrictive on what even signed Applets are allowed to do. Running within an Applet has many AccessControlException 's --- Key: PDFBOX-1946 URL: https://issues.apache.org/jira/browse/PDFBOX-1946 Project: PDFBox Issue Type: Wish Affects Versions: 1.8.4 Environment: Running within an Applet Reporter: Fred Andrews Labels: Security Attachments: patch.zip

I've identified 6 modules that should be modified to avoid AccessControlException's while running within an Applet. My solution would be to catch each AccessControlException and then use a default or continue on. For most of these, that is probably the best solution, for a few especially PDFStreamEngine someone may have a better idea.

The modules that have issues:
- pdfbox\pdfparser\BaseParser -- line 131 call to Boolean.getBoolean, line 170 call to Integer.getInteger
- pdfbox\util\PDFTextStripper -- line 79 call to System.getProperty()
- pdfbox\util\ResourceLoader -- line 67 call to getSystemClassLoader()
- pdfbox\pdmodel\graphics\color\PDColorState -- line 50, call to Color.getColor
- pdfbox/encoding/Encoding -- line 78, call to System.getProperty
- pdfbox\util\PDFStreamEngine -- Line 351 364 check for font == null (will be null if had resource loading problems)

Not sure what the best way is to proceed. Please advise. Thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
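The "catch the exception and use a default" approach proposed in PDFBOX-1946 might be sketched as follows. The helper class SafeProperties and its method names are hypothetical and are not taken from the attached patch.zip; the sketch catches SecurityException, which is the superclass of AccessControlException, so it covers the failures described without naming the more specific type.

```java
public class SafeProperties {

    /** Reads a system property, falling back to a default if access is denied or it is unset. */
    static String getPropertyOrDefault(String key, String defaultValue) {
        try {
            String value = System.getProperty(key);
            return value != null ? value : defaultValue;
        } catch (SecurityException e) {
            // AccessControlException is a SecurityException; a sandbox refusal ends up here.
            return defaultValue;
        }
    }

    /** Boolean variant, for call sites that previously used Boolean.getBoolean(key). */
    static boolean getBooleanOrDefault(String key, boolean defaultValue) {
        try {
            String value = System.getProperty(key);
            return value != null ? Boolean.parseBoolean(value) : defaultValue;
        } catch (SecurityException e) {
            return defaultValue;
        }
    }
}
```

Centralising the try/catch in one helper keeps each call site to a single line, at the cost of silently hiding the denial; logging inside the catch block would be a reasonable variation.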
[jira] [Comment Edited] (PDFBOX-1094) Pattern colorspace support
[ https://issues.apache.org/jira/browse/PDFBOX-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926331#comment-13926331 ] John Hewson edited comment on PDFBOX-1094 at 3/11/14 2:49 AM: -- I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream in PDFStreamEngine. Update: It looks like we need to pass the parent stream to PDFStreamEngine processStream (or processSubStream, processTilingPattern). was (Author: jahewson): I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream in PDFStreamEngine. Update: It looks like we need to pass the parent stream to PDFStreamEngine processStream (or processSubStream, processTilingPattern). 
Pattern colorspace support -- Key: PDFBOX-1094 URL: https://issues.apache.org/jira/browse/PDFBOX-1094 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.6.0 Reporter: Andreas Lehmkühler Assignee: John Hewson Priority: Minor Attachments: ColoredTilingPaint.patch, PATTYP1.pdf, PATTYP2.pdf, PDF32000_2008_pg737.pdf, PDFStreamEngine.patch, PageDrawer.patch, _pdfbox-1094-tiling_pattern.pdf-1-blurry.png, jagpdf_doc_patterns.pdf, jagpdf_doc_patterns.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1861-tracemonkey.pdf-13.png, pdfbox-1861-tracemonkey.pdf-13.png, tiling_pattern.pdf PDFBox doesn't support PDPattern colorspaces -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1094) Pattern colorspace support
[ https://issues.apache.org/jira/browse/PDFBOX-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926331#comment-13926331 ] John Hewson edited comment on PDFBOX-1094 at 3/11/14 2:49 AM: -- I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream in PDFStreamEngine. Update: It looks like we need to pass the parent stream to PDFStreamEngine processStream (or processSubStream, processTilingPattern). was (Author: jahewson): I'm extending TilingPaint so that it accepts the parent stream's initial CTM as a parameter. However, this information is not available in PageDrawer, or its superclass PDFStreamEngine. In fact, there is no class which represents a content stream. In order for the information we need to be available to PageDrawer#drawTilingPattern we need to hold a reference to the current content stream and have it implement a method such as getDefaultMatrix() which is then overridden in the various subclasses, e.g. PageContentStream, PatternContentStream. Because we never needed this previously there is actually no class in PDFBox which represents a content stream. There is a class named PDPageContentStream but it is a helper class for writing content streams and does not even represent a PD object! We need to introduce a new PDContentStream class, and rename the confusingly named helper to something like ContentStreamBuilder and move it to pdfbox.pdfwriter along with the other writing classes. Then we can pass an instance of the new PDContentStream class to PageDrawer#drawTilingPattern so that we have access to the stream's default CTM. Then I can begin to get tiling paint working... 
Pattern colorspace support -- Key: PDFBOX-1094 URL: https://issues.apache.org/jira/browse/PDFBOX-1094 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 1.6.0 Reporter: Andreas Lehmkühler Assignee: John Hewson Priority: Minor Attachments: ColoredTilingPaint.patch, PATTYP1.pdf, PATTYP2.pdf, PDF32000_2008_pg737.pdf, PDFStreamEngine.patch, PageDrawer.patch, _pdfbox-1094-tiling_pattern.pdf-1-blurry.png, jagpdf_doc_patterns.pdf, jagpdf_doc_patterns.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-pdf32000_2008_pg737.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1094-tiling_pattern.pdf-1.png, pdfbox-1861-tracemonkey.pdf-13.png, pdfbox-1861-tracemonkey.pdf-13.png, tiling_pattern.pdf PDFBox doesn't support PDPattern colorspaces -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-1972) WrappedIOException no longer needed in Java 1.6
John Hewson created PDFBOX-1972: --- Summary: WrappedIOException no longer needed in Java 1.6 Key: PDFBOX-1972 URL: https://issues.apache.org/jira/browse/PDFBOX-1972 Project: PDFBox Issue Type: Bug Components: PDModel Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor Java 1.6 added the IOException(Throwable cause) constructor which means that WrappedIOException is no longer needed. WrappedIOException is never caught anywhere, only its superclass IOException is, so this is an easy change to make. -- This message was sent by Atlassian JIRA (v6.2#6252)
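To illustrate the point made in PDFBOX-1972, this small sketch wraps a lower-level failure using the IOException(Throwable cause) constructor available since Java 1.6, with no custom wrapper class involved. The class CauseSketch and its parse method are invented for the example and have nothing to do with the PDFBox code base.

```java
import java.io.IOException;

public class CauseSketch {

    /** Parses a number, reporting bad input as an IOException that keeps the original cause. */
    static int parse(String text) throws IOException {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // Java 1.6+: the cause is passed straight to IOException.
            throw new IOException(e);
        }
    }
}
```

Callers that already catch IOException keep working unchanged, and getCause() still exposes the underlying exception for diagnostics.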
[jira] [Updated] (PDFBOX-1972) WrappedIOException no longer needed in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-1972: Issue Type: Improvement (was: Bug) WrappedIOException no longer needed in Java 1.6 --- Key: PDFBOX-1972 URL: https://issues.apache.org/jira/browse/PDFBOX-1972 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor Java 1.6 added the IOException(Throwable cause) constructor which means that WrappedIOException is no longer needed. WrappedIOException is never caught anywhere, only its superclass IOException is, so this is an easy change to make. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1972) WrappedIOException no longer needed in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929925#comment-13929925 ] John Hewson commented on PDFBOX-1972: - Fixed in revision 1576185. WrappedIOException no longer needed in Java 1.6 --- Key: PDFBOX-1972 URL: https://issues.apache.org/jira/browse/PDFBOX-1972 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor Fix For: 2.0.0 Java 1.6 added the IOException(Throwable cause) constructor which means that WrappedIOException is no longer needed. WrappedIOException is never caught anywhere, only its superclass IOException is, so this is an easy change to make. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PDFBOX-1972) WrappedIOException no longer needed in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson resolved PDFBOX-1972. - Resolution: Fixed Fix Version/s: 2.0.0 WrappedIOException no longer needed in Java 1.6 --- Key: PDFBOX-1972 URL: https://issues.apache.org/jira/browse/PDFBOX-1972 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor Fix For: 2.0.0 Java 1.6 added the IOException(Throwable cause) constructor which means that WrappedIOException is no longer needed. WrappedIOException is never caught anywhere, only its superclass IOException is, so this is an easy change to make. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1972) WrappedIOException no longer needed in Java 1.6
[ https://issues.apache.org/jira/browse/PDFBOX-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929930#comment-13929930 ] John Hewson commented on PDFBOX-1972: - WrappedException is also no longer needed for the same reason, so revision 1576186 removes it also. WrappedIOException no longer needed in Java 1.6 --- Key: PDFBOX-1972 URL: https://issues.apache.org/jira/browse/PDFBOX-1972 Project: PDFBox Issue Type: Improvement Components: PDModel Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor Fix For: 2.0.0 Java 1.6 added the IOException(Throwable cause) constructor which means that WrappedIOException is no longer needed. WrappedIOException is never caught anywhere, only its superclass IOException is, so this is an easy change to make. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (PDFBOX-1973) COSVisitorException should be removed
John Hewson created PDFBOX-1973: --- Summary: COSVisitorException should be removed Key: PDFBOX-1973 URL: https://issues.apache.org/jira/browse/PDFBOX-1973 Project: PDFBox Issue Type: Improvement Reporter: John Hewson Priority: Minor COSVisitorException is redundant, it is a simple wrapper for SignatureException, CryptographyException and NoSuchAlgorithmException and should be replaced by those exceptions directly. For example, we can replace: public void write(PDDocument doc) throws COSVisitorException With: public void write(PDDocument doc) throws IOException, CryptographyException and so on... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (PDFBOX-1973) COSVisitorException should be removed
[ https://issues.apache.org/jira/browse/PDFBOX-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson reassigned PDFBOX-1973:
Assignee: John Hewson
[jira] [Created] (PDFBOX-1974) ICOSVisitor is redundant
John Hewson created PDFBOX-1974:

Summary: ICOSVisitor is redundant
Key: PDFBOX-1974
URL: https://issues.apache.org/jira/browse/PDFBOX-1974
Project: PDFBox
Issue Type: Improvement
Components: PDModel
Affects Versions: 2.0.0
Reporter: John Hewson
Priority: Minor

ICOSVisitor is only implemented by one class, COSWriter, so is redundant and should be removed.
[jira] [Updated] (PDFBOX-1974) ICOSVisitor is redundant
[ https://issues.apache.org/jira/browse/PDFBOX-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-1974:
Description: ICOSVisitor is only implemented by one class, COSWriter, so it can be removed. (was: ICOSVisitor is only implemented by one class, COSWriter, so is redundant and should be removed.)