[jira] [Updated] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2126:
    Attachment: PDFBOX-1772.pdf-1-bad.png
                pdfbox-1772.pdf-1-good.png
                PDFBOX-1772.pdf

Something is missing in the rendering of PDFBOX-1772, see attached files.

Optimize clipping
    Key: PDFBOX-2126
    URL: https://issues.apache.org/jira/browse/PDFBOX-2126
    Project: PDFBox
    Issue Type: Improvement
    Components: Rendering
    Affects Versions: 2.0.0
    Reporter: Petr Slaby
    Attachments: ClipPath.1.patch, ClipPath.patch, PDFBOX-1772.pdf, PDFBOX-1772.pdf-1-bad.png, example_010.pdf, pdfbox-1772.pdf-1-good.png

As already stated in a TODO comment in PageDrawer, calling Graphics2D#setClip() is time and memory consuming. The attached patch optimizes clipping by calling Graphics2D#setClip() only if the clipping path has changed. The effect depends on the document; e.g. the attached one renders in 10.5 s without the optimization and in 5.5 s with it.

The clipping has to be re-applied whenever the transform in Graphics2D changes. This is not checked explicitly; instead, the implementation depends on the cached value being reset manually. Currently this is needed in only one place, when processing annotations (AcroForms).

Also, the implementation relies on the clipping path object stored in PDGraphicsState never changing, so that a comparison using == can be used. This works fine, but requires some awareness in future changes. To make the design cleaner, the clipping path could be made private to PDGraphicsState and thus really immutable from outside.

-- This message was sent by Atlassian JIRA (v6.2#6252)
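For readers who have not opened the patch, the caching idea described in the issue can be sketched roughly as follows. This is not the code from ClipPath.patch; the class name ClipCacheSketch and the methods applyClip(), invalidateClip() and getSetClipCalls() are made up for illustration. Only Graphics2D#setClip() and the identity comparison with == come from the description above.

```java
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch, not the attached patch: skip Graphics2D#setClip()
// when the clipping path object is the same one that was applied last.
final class ClipCacheSketch {
    private final Graphics2D graphics;
    private Shape lastClip;   // clipping path most recently passed to setClip()
    private int setClipCalls; // counts real setClip() calls, for demonstration only

    ClipCacheSketch(Graphics2D graphics) {
        this.graphics = graphics;
    }

    // Identity comparison (==) is only safe if clip objects are never mutated.
    void applyClip(Shape clip) {
        if (clip != lastClip) {
            graphics.setClip(clip);
            lastClip = clip;
            setClipCalls++;
        }
    }

    // Must be called by hand whenever the Graphics2D transform changes.
    void invalidateClip() {
        lastClip = null;
    }

    int getSetClipCalls() {
        return setClipCalls;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(100, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        ClipCacheSketch cache = new ClipCacheSketch(g);
        Shape clip = new Rectangle2D.Float(0, 0, 50, 50);

        cache.applyClip(clip);  // applied
        cache.applyClip(clip);  // skipped, same object
        g.translate(10, 10);    // transform changed ...
        cache.invalidateClip(); // ... so the cached value is reset manually
        cache.applyClip(clip);  // applied again

        System.out.println("setClip() calls: " + cache.getSetClipCalls());
        g.dispose();
    }
}
```

The sketch also shows the weak spot mentioned in the issue: nothing forces a caller to invoke invalidateClip() after changing the transform, so a forgotten reset would leave a stale clip in place.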
[jira] [Comment Edited] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048565#comment-14048565 ]

Tilman Hausherr edited comment on PDFBOX-2169 at 7/1/14 6:22 AM:

-There's a difference now in the rendering of PDFBOX-2149. The part on the top above the line, previously it was done with a serif font, now it is done with a fallback font.-

uh - ignore this for now. Maybe this is because of changes I did locally.

was (Author: tilman):
There's a difference now in the rendering of PDFBOX-2149. The part on the top above the line, previously it was done with a serif font, now it is done with a fallback font.

NPE in PDTrueTypeFont.makeFontDescriptor
    Key: PDFBOX-2169
    URL: https://issues.apache.org/jira/browse/PDFBOX-2169
    Project: PDFBox
    Issue Type: Bug
    Affects Versions: 2.0.0
    Reporter: Tilman Hausherr
    Assignee: John Hewson
    Attachments: 000153.pdf

The attached file brings this exception when rendering or when extracting text

{code}
java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.makeFontDescriptor(PDTrueTypeFont.java:161)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontDescriptor(PDTrueTypeFont.java:150)
    at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:814)
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:382)
    at org.apache.pdfbox.pdmodel.font.PDFont.getFontWidth(PDFont.java:312)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:377)
    at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:44)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:508)
{code}
Re: Apache PDFBox July 2014 board report due
+1 - thx for taking care of this.

Maruan

On 28.06.2014 at 12:15, Andreas Lehmkuehler andr...@lehmi.de wrote:

Hi,

find attached a quick draft of the board report we're expected to submit this month.

@John, @Tilman: Please add something about the GSoC status.

Any further comments, objections or additions?

draft

The Apache PDFBox library is an open source Java tool for working with PDF documents.

General Comments

There are no issues that require Board attention.

Community

- There is a steady stream of contributions and bug reports from the community.
- 451 (452 last report) subscribers on the user@ list
- 153 (157 last report) subscribers on the dev@ list
- Maruan gave a presentation about PDFBox at the PDF Days Europe 2014 in Cologne. We got some positive feedback, and a couple of people showed some interest in our project/community.

Releases

- Version 1.8.5 was released on 2nd of May 2014
- Version 1.8.6 was released on 22nd of June 2014

Both are incremental bugfix releases based on PDFBox 1.8.x.

GSoC

TODO John Tilman

Development

The work on our next major release is an ongoing effort. The main topics are:

- switch to Java 1.6
- modularization
- replace/enhance the parser
- code cleanup
- enhance rendering

We are targeting the late summer as a rough release date for the next major release.

/draft

BR
Andreas Lehmkühler
[jira] [Comment Edited] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048565#comment-14048565 ]

Tilman Hausherr edited comment on PDFBOX-2169 at 7/1/14 6:47 AM:

-There's a difference now in the rendering of PDFBOX-2149. The part on the top above the line, previously it was done with a serif font, now it is done with a fallback font.-

uh - ignore this for now. Maybe this is because of changes I did locally and that were in PageDrawer, see PDFBOX-1701. I see you're handling this in getTrueTypeFallbackFont() now.

was (Author: tilman):
-There's a difference now in the rendering of PDFBOX-2149. The part on the top above the line, previously it was done with a serif font, now it is done with a fallback font.-

uh - ignore this for now. Maybe this is because of changes I did locally.
Re: Apache RAT
Hi,

On 1 July 2014 at 03:10, John Hewson j...@jahewson.com wrote:

Hi All,

The Apache RAT plugin runs on the Jenkins server but not as part of local builds, which is causing build failures when I commit code. How can we enable it for local builds? I tried running "mvn org.apache.rat:apache-rat-plugin:0.10:check" but I get many errors which don't occur on Jenkins.

-Ppedantic as a command-line option should do the trick.

-- John

BR
Andreas Lehmkühler
[jira] [Commented] (PDFBOX-2162) annotation that highlights a text is not visible in image (converted from the pdf)
[ https://issues.apache.org/jira/browse/PDFBOX-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048881#comment-14048881 ]

Julien Savoyet commented on PDFBOX-2162:

Hi Maruan,

thanks a lot for your comment; it really helps me understand how annotations work in PDFBox. I've tried to add an appearance (COSName.AP) to the dictionary, but without success so far: the annotations are still not taken into account in the PNG images generated at the end of the script.

Below is the initial code, refined (without the appearance part, since it doesn't work). If you have an idea how I could add the appearance part so that the annotation appears in the output images, I'd be happy to receive your suggestions:

    package fr.annotation.images;

    import java.io.IOException;
    import java.util.List;

    import org.apache.pdfbox.exceptions.COSVisitorException;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.common.PDRectangle;
    import org.apache.pdfbox.pdmodel.graphics.color.PDGamma;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationTextMarkup;
    import org.apache.pdfbox.util.PDFImageWriter;

    public class createAnnotation {
        public static void main(String[] args) throws COSVisitorException, IOException {
            PDDocument doc = null;
            try {
                doc = PDDocument.load("myfile1.pdf"); // Input PDF file name
                List<?> pages = doc.getDocumentCatalog().getAllPages();
                for (int i = 0; i < pages.size(); i++) {
                    PDPage page = (PDPage) pages.get(i);
                    List<PDAnnotation> annotations = page.getAnnotations();
                    PDGamma colourBlue = new PDGamma();
                    colourBlue.setB(1);
                    float pw = page.getMediaBox().getUpperRightX();
                    float ph = page.getMediaBox().getUpperRightY();

                    // Now add the markup annotation, a highlight to PDFBox text
                    PDAnnotationTextMarkup txtMark = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
                    txtMark.setColour(colourBlue);
                    txtMark.setConstantOpacity((float) 0.5); // Make the highlight 50% transparent

                    // PART I - Set the rectangle containing the markup
                    PDRectangle position = new PDRectangle();
                    float lowerLeftX = 94;
                    float upperRightX = 94 + 100;
                    float lowerLeftY = ph - 89;
                    float upperRightY = ph - 89 + 20;
                    System.out.println("lowerLeftX = " + lowerLeftX + " upperRightX = " + upperRightX
                            + " lowerLeftY = " + lowerLeftY + " upperRightY = " + upperRightY);
                    position.setLowerLeftX(lowerLeftX);
                    position.setLowerLeftY(lowerLeftY);
                    position.setUpperRightX(upperRightX);
                    position.setUpperRightY(upperRightY);
                    txtMark.setRectangle(position);

                    // PART II - Set the quad
                    float[] quads = new float[8];
                    quads[0] = position.getLowerLeftX();      // x1
                    quads[1] = position.getUpperRightY() - 2; // y1
                    quads[2] = position.getUpperRightX();     // x2
                    quads[3] = quads[1];                      // y2
                    quads[4] = quads[0];                      // x3
                    quads[5] = position.getLowerLeftY() - 2;  // y3
                    quads[6] = quads[2];                      // x4
                    quads[7] = quads[5];                      // y4
                    txtMark.setQuadPoints(quads);
                }
                doc.save("tmpfile.pdf"); // Output file name
            } finally {
                if (doc != null) {
                    doc.close();
                }
            }
            String pdfPath = "tmpfile.pdf";
            PDFImageWriter imageWriter = new PDFImageWriter();
            PDDocument pddoc =
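As a side note on the "PART II - Set the quad" step in the code above, the corner ordering can be isolated in a few lines of plain Java with no PDFBox dependency. This is only a sketch: the class QuadPointsSketch and the method quadPoints() are hypothetical names, the corner order (top-left, top-right, bottom-left, bottom-right) simply mirrors the posted code, and the "- 2" vertical offset used there is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: derive the eight quad-point
// values from a rectangle given by its lower-left and upper-right corners.
final class QuadPointsSketch {

    static float[] quadPoints(float lowerLeftX, float lowerLeftY,
                              float upperRightX, float upperRightY) {
        return new float[] {
            lowerLeftX,  upperRightY, // x1, y1: top-left
            upperRightX, upperRightY, // x2, y2: top-right
            lowerLeftX,  lowerLeftY,  // x3, y3: bottom-left
            upperRightX, lowerLeftY   // x4, y4: bottom-right
        };
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumed US Letter height in points
        float[] quads = quadPoints(94f, pageHeight - 89f, 94f + 100f, pageHeight - 89f + 20f);
        System.out.println(Arrays.toString(quads));
    }
}
```

The resulting array is what would be handed to setQuadPoints() in the posted code; whether the missing appearance stream is the actual cause of the invisible highlight is not settled in this thread.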
[jira] [Updated] (PDFBOX-2173) Nullpointer when validating empty file
[ https://issues.apache.org/jira/browse/PDFBOX-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Per-Olof Widström updated PDFBOX-2173:

Description:

I am validating a PDF and get a NullPointerException when the file size is 0 bytes. I looked at the code and saw a small mistake, so I am reporting it. I use version 2.8.6

{code:java|title=PreflightParser.java}
protected void checkPdfHeader()
{
    //[snip]
    String secondLine = reader.readLine();
    byte[] secondLineAsBytes = secondLine.getBytes(encoding.name());
    if (secondLine != null && secondLineAsBytes.length >= 5)
    //[snip]
}
{code}

As you can see, {{secondLine}} is checked for null, but only after it has already been dereferenced.

{code}
java.lang.NullPointerException
    at org.apache.pdfbox.preflight.parser.PreflightParser.checkPdfHeader(PreflightParser.java:297)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:195)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:180)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:168)
{code}

h4. Workaround
Before validating, check that the file size is larger than 0.

h4. How to reproduce
Try to validate an empty file.

Priority: Minor (was: Major)

Nullpointer when validating empty file
    Key: PDFBOX-2173
    URL: https://issues.apache.org/jira/browse/PDFBOX-2173
    Project: PDFBox
    Issue Type: Bug
    Components: Preflight
    Reporter: Per-Olof Widström
    Priority: Minor
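To make the reported ordering problem concrete, here is a minimal stand-alone sketch of a null-safe version of the check. It is not the PreflightParser code and not a proposed patch: the class HeaderCheckSketch and the method hasLongSecondLine() are hypothetical, ISO-8859-1 is assumed as the encoding, and only the "second line has at least 5 bytes" condition from the snippet above is reproduced.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: test the second line for null BEFORE calling
// getBytes() on it, so an empty input cannot trigger an NPE.
final class HeaderCheckSketch {

    static boolean hasLongSecondLine(BufferedReader reader) throws IOException {
        reader.readLine();                     // first line (the %PDF header), not examined here
        String secondLine = reader.readLine(); // null at end of input
        if (secondLine == null) {
            return false;                      // empty or one-line file: nothing to dereference
        }
        byte[] secondLineAsBytes = secondLine.getBytes(StandardCharsets.ISO_8859_1);
        return secondLineAsBytes.length >= 5;
    }

    public static void main(String[] args) throws IOException {
        String empty = "";
        String typical = "%PDF-1.4\n%\u00e2\u00e3\u00cf\u00d3\n";
        System.out.println(hasLongSecondLine(new BufferedReader(new StringReader(empty))));
        System.out.println(hasLongSecondLine(new BufferedReader(new StringReader(typical))));
    }
}
```

How the real parser should report a missing second line (validation error versus exception) is a separate decision that this sketch does not address.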
[jira] [Commented] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049015#comment-14049015 ]

John Hewson commented on PDFBOX-2169:

Yep, the licensing situation around the LPPL is more complex than I'd hoped. I think I may have found a better solution, the [Liberation fonts|https://fedorahosted.org/liberation-fonts/] are now under the [SIL Open Font License, Version 1.1|http://scripts.sil.org/cms/scripts/page.php?item_id=OFL_web]. The only questionable part is as follows:

{quote}
The OFL allows the licensed fonts to be used, studied, modified and redistributed freely as long as they are not sold by themselves.
{quote}

Any thoughts?
[jira] [Updated] (PDFBOX-2173) Nullpointer when validating empty file
[ https://issues.apache.org/jira/browse/PDFBOX-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-2173:
    Affects Version/s: 1.8.6

Nullpointer when validating empty file
    Key: PDFBOX-2173
    URL: https://issues.apache.org/jira/browse/PDFBOX-2173
    Project: PDFBox
    Issue Type: Bug
    Components: Preflight
    Affects Versions: 1.8.6
    Reporter: Per-Olof Widström
    Priority: Minor

I am validating a PDF and get a NullPointerException when the file size is 0 bytes. I looked at the code and saw a small mistake, so I am reporting it. I use version 1.8.6

{code:java|title=PreflightParser.java}
protected void checkPdfHeader()
{
    //[snip]
    String secondLine = reader.readLine();
    byte[] secondLineAsBytes = secondLine.getBytes(encoding.name());
    if (secondLine != null && secondLineAsBytes.length >= 5)
    //[snip]
}
{code}

As you can see, {{secondLine}} is checked for null, but only after it has already been dereferenced.

{code}
java.lang.NullPointerException
    at org.apache.pdfbox.preflight.parser.PreflightParser.checkPdfHeader(PreflightParser.java:297)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:195)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:180)
    at org.apache.pdfbox.preflight.parser.PreflightParser.parse(PreflightParser.java:168)
{code}

h4. Workaround
Before validating, check that the file size is larger than 0.

h4. How to reproduce
Try to validate an empty file.
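Until a fix is available, the workaround from the report could look roughly like this on the caller's side. The class EmptyFileGuardSketch and the method isWorthValidating() are hypothetical names; the actual preflight call is deliberately left out, so the sketch only shows the size check that would precede it.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical caller-side guard: skip validation for missing or empty files.
final class EmptyFileGuardSketch {

    static boolean isWorthValidating(File file) {
        // File#length() returns 0 for empty files and for files that do not exist.
        return file.isFile() && file.length() > 0;
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("empty", ".pdf");
        empty.deleteOnExit();
        if (isWorthValidating(empty)) {
            System.out.println("would run the preflight validation here");
        } else {
            System.out.println("skipping empty file: " + empty.getName());
        }
    }
}
```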
[jira] [Commented] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049022#comment-14049022 ]

Andreas Lehmkühler commented on PDFBOX-2169:

The SIL Open Font License shouldn't be a problem, it's a [category-b|http://www.apache.org/legal/resolved.html#category-b] license. Please, don't forget to mention it in the NOTICE file.
Re: Apache RAT
Thanks for the tip, Andreas. However, running:

mvn clean install -Ppedantic

with no local modifications gives me the error:

[ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.10:check (default) on project pdfbox: Too many files with unapproved license: 2 See RAT report in: /Users/john/apache/pdfbox-trunk/pdfbox/target/rat.txt -> [Help 1]

The rat.txt contains the following:

Unapproved licenses:
pdfbox/src/test/resources/org/apache/pdfbox/encryption/test1.der
pdfbox/src/test/resources/org/apache/pdfbox/encryption/test2.der

-- John

On 1 Jul 2014, at 02:58, Andreas Lehmkühler andr...@lehmi.de wrote:

Hi,

On 1 July 2014 at 03:10, John Hewson j...@jahewson.com wrote:

Hi All,

The Apache RAT plugin runs on the Jenkins server but not as part of local builds, which is causing build failures when I commit code. How can we enable it for local builds? I tried running "mvn org.apache.rat:apache-rat-plugin:0.10:check" but I get many errors which don't occur on Jenkins.

-Ppedantic as a command-line option should do the trick.

-- John

BR
Andreas Lehmkühler
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049035#comment-14049035 ]

Tilman Hausherr commented on PDFBOX-2126:

When disabling the clipping caching by deleting the if line in setClip() it works.
[jira] [Commented] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049045#comment-14049045 ]

John Hewson commented on PDFBOX-2169:

What do I need to put in the NOTICE file? I don't see any attribution requirements in the SIL Open Font License.
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049050#comment-14049050 ]

John Hewson commented on PDFBOX-2126:

Hmm, I'll take a look at that now...
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049063#comment-14049063 ] John Hewson commented on PDFBOX-2126: - What page are you looking at?
[jira] [Updated] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-2126: Attachment: screenshot.png Screenshot of the difference
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049070#comment-14049070 ] John Hewson commented on PDFBOX-2126: - Yep, now I see it.
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049078#comment-14049078 ] John Hewson commented on PDFBOX-2126: - Should be fixed in [r1607146|http://svn.apache.org/r1607146], I've switched to storing the Graphics2D transformed clipping path in lastClip instead of the one from PDGraphicsState. It's not _quite_ as fast but the results so far look good.
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049072#comment-14049072 ] Tilman Hausherr commented on PDFBOX-2126: - PDFBOX-1772.pdf, attached this morning.
[jira] [Commented] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049080#comment-14049080 ] Andreas Lehmkühler commented on PDFBOX-2169: Ups, my bad. I'm talking about the README and not the NOTICE file.
Re: Apache RAT
Hi, On 01.07.2014 18:27, John Hewson wrote: Thanks for the tip Andreas, however running: mvn clean install -Ppedantic With no local modifications, gives me the error: [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.10:check (default) on project pdfbox: Too many files with unapproved license: 2 See RAT report in: /Users/john/apache/pdfbox-trunk/pdfbox/target/rat.txt - [Help 1] The rat.txt contains the following: Unapproved licenses: pdfbox/src/test/resources/org/apache/pdfbox/encryption/test1.der pdfbox/src/test/resources/org/apache/pdfbox/encryption/test2.der Hmmm, works fine for me. RAT marks both files as binary: B /home/lehmi/workspace/source/pdfbox-trunk-2.0.0/pdfbox/src/test/resources/org/apache/pdfbox/encryption/test2.der I'm using mvn 3.0.5 and JDK1.7.0_45 on Linux -- John BR Andreas Lehmkühler On 1 Jul 2014, at 02:58, Andreas Lehmkühler andr...@lehmi.de wrote: Hi, John Hewson j...@jahewson.com wrote on 1 July 2014 at 03:10: Hi All, The Apache RAT plugin runs on the Jenkins server but not as part of local builds, which is causing build failures when I commit code. How can we enable it for local builds? I tried running “mvn org.apache.rat:apache-rat-plugin:0.10:check” but I get many errors which don’t occur on Jenkins. -Ppedantic as command line option should do the trick. -- John BR Andreas Lehmkühler
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049136#comment-14049136 ] Petr Slaby commented on PDFBOX-2126: [~jahewson]: In my original code, I was resetting the clipping path (lastClip = null;) just before processSubStream in drawPage, because that's exactly where the G2D transform changes. Your commits did not have that, maybe that was the reason of the regression? I must say I am not able to understand how your last commit works. It seems just to check whether the clip has changed in G2D, but not whether a new clip has been set in PDGraphicsState?
[jira] [Commented] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049141#comment-14049141 ] Tilman Hausherr commented on PDFBOX-2126: - I'm getting really weird results: my tests are messing up a lot of files (although one, bugzilla886049.pdf is suddenly correct for the first time ever), but rendering them from the command line with PDFReader produces a correct result.
[jira] [Comment Edited] (PDFBOX-2126) Optimize clipping
[ https://issues.apache.org/jira/browse/PDFBOX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049141#comment-14049141 ] Tilman Hausherr edited comment on PDFBOX-2126 at 7/1/14 6:17 PM: - I'm getting really weird results: my tests are messing up a lot of files, e.g. ARCHIVERGB.ai and the HOTRODCMYK.ai that are in the pdfbox\src\test\resources\input\rendering directory. But rendering them from the command line with PDFReader produces a correct result. Another file, bugzilla886049.pdf, is suddenly correct for the first time ever. But only with PDFToImage, not with PDFReader. was (Author: tilman): I'm getting really weird results: my tests are messing up a lot of files (although one, bugzilla886049.pdf is suddenly correct for the first time ever), but rendering them from the command line with PDFReader produces a correct result.
[jira] [Updated] (PDFBOX-1935) round edge with wrong color
[ https://issues.apache.org/jira/browse/PDFBOX-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated PDFBOX-1935: Attachment: _bugzilla886049.pdf-1.png The file is rendered correctly (_bugzilla886049.pdf-1.png) in rev 1607146 of PDFBOX-2126, but only with PDFToImage, not with the PDFReader command line option. round edge with wrong color --- Key: PDFBOX-1935 URL: https://issues.apache.org/jira/browse/PDFBOX-1935 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 2.0.0 Reporter: Tilman Hausherr Priority: Minor Attachments: _bugzilla886049.pdf-1.png, bugzilla886049.pdf, bugzilla886049.pdf-1.png file found at: https://bugzilla.mozilla.org/show_bug.cgi?id=886049 http://www.miloticky.unas.cz/jidelnicek.pdf the pdf.js viewer has the same bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049080#comment-14049080 ] Andreas Lehmkühler edited comment on PDFBOX-2169 at 7/1/14 7:45 PM: Ups, my bad. I'm talking about the README and the LICENSE files and not the NOTICE file. BTW: I've updated the README and the LICENSE file and removed some information about Adobe's glyph list and ICU4J. Both aren't used anymore. was (Author: lehmi): Ups, my bad. I'm talking about the README and not the NOTICE file.
Re: Apache PDFBox July 2014 board report due
Hi, thanks for your input. BR Andreas Lehmkühler On 28.06.2014 12:15, Andreas Lehmkuehler wrote: Hi, find attached a quick draft of the board report we're expected to submit this month. @John, @Tilman Please add something about the GSoC status. Any further comments, objections or additions? draft The Apache PDFBox library is an open source Java tool for working with PDF documents. General Comments There are no issues that require Board attention. Community - There is a steady stream of contributions and bug reports from the community. 451 (452 last report) subscribers on the user@ list 153 (157 last report) subscribers on the dev@ list Maruan gave a presentation about PDFBox at the PDF Days Europe 2014 in Cologne. We got some positive feedback and a couple of people showed some interest in our project/community. Releases Version 1.8.5 was released on 2nd of May 2014 Version 1.8.6 was released on 22nd of June 2014 Both are incremental bugfix releases based on PDFBox 1.8.x. GSoC TODO John Tilman Development: The work on our next major release is an ongoing effort. The main topics are: - switch to Java 1.6 - modularization - replace/enhance the parser - code cleanup - enhance rendering We are targeting the late summer as a rough release date for the next major release. /draft BR Andreas Lehmkühler
[jira] [Created] (PDFBOX-2174) Suppress the Dock icon on OS X
John Hewson created PDFBOX-2174: --- Summary: Suppress the Dock icon on OS X Key: PDFBOX-2174 URL: https://issues.apache.org/jira/browse/PDFBOX-2174 Project: PDFBox Issue Type: Improvement Components: Utilities Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor When using the command line utilities on OS X the Java icon appears in the Dock. This can be disabled by making the following call from main(): {code} System.setProperty("apple.awt.UIElement", "true"); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (PDFBOX-2174) Suppress the Dock icon on OS X
[ https://issues.apache.org/jira/browse/PDFBOX-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson resolved PDFBOX-2174. - Resolution: Fixed Added for all command-line tools in [r1607218|http://svn.apache.org/r1607218].
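For reference, the call proposed in PDFBOX-2174 might look like this in a command-line entry point. This is only a sketch: DockIconDemo and suppressDockIcon are hypothetical names, not part of the PDFBox tools, and the note about ordering is an assumption rather than something stated in the issue.

```java
// Hypothetical command-line entry point, not one of the PDFBox tools.
class DockIconDemo {
    // Sets the system property named in the issue. Both arguments are strings,
    // since System.setProperty takes (String key, String value).
    static void suppressDockIcon() {
        System.setProperty("apple.awt.UIElement", "true");
    }

    public static void main(String[] args) {
        // Assumption: this should run before any AWT class is initialized,
        // otherwise the Dock icon may already have been created.
        suppressDockIcon();
        System.out.println("apple.awt.UIElement=" + System.getProperty("apple.awt.UIElement"));
    }
}
```

On JVMs that do not know the property, setting it should simply have no visible effect.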
[jira] [Commented] (PDFBOX-2169) NPE in PDTrueTypeFont.makeFontDescriptor
[ https://issues.apache.org/jira/browse/PDFBOX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049509#comment-14049509 ] John Hewson commented on PDFBOX-2169: - I've replaced Nimbus Sans L with Liberation Sans in [r1607223|http://svn.apache.org/r1607223] and updated the LICENSE file.
[jira] [Commented] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049554#comment-14049554 ] John Hewson commented on PDFBOX-2149: - I've moved the rendering of TrueType glyphs into a new GlyphRenderer class in FontBox instead of inside PDFBox's TTFGlyph2D. I've also moved the various Glyph2D subclasses to a new pdfbox.rendering.font package in [r1607236|http://svn.apache.org/1607236]. This shouldn't affect rendering of any files. Font Refactoring Key: PDFBOX-2149 URL: https://issues.apache.org/jira/browse/PDFBOX-2149 Project: PDFBox Issue Type: Improvement Components: FontBox, PDModel Affects Versions: 2.0.0 Reporter: John Hewson Assignee: John Hewson Attachments: 39.pdf, 000467.pdf To fix bugs such as PDFBOX-2140 and to enable Unicode TTF embedding we need to sort out long-standing font/text encoding issues. The main issue is that encoding is done in an ad-hoc manner, sometimes in the PDFont subclasses, sometimes elsewhere. For example TTFGlyph2D does its own decoding, and this code is copy-pasted into PDTrueTypeFont. Likewise, PDFont handles CMaps and Encodings despite the fact that these two encoding methods are mutually exclusive. The end result is that the process of reading Encodings/CMaps is often following rules which are completely invalid for that font type but mostly work by luck. Phase 1 - Refactor PDFont subclasses to remove setXXX methods which allow the object to be corrupted. Proper use of inheritance can remove all cases where public setXXX methods are used during font loading. - Clean up TTF loading and the loadTTF in anticipation of Unicode TTF embedding, FontBox's TrueTypeFont class is externally mutable via setXXX methods used only by TTFParser: these can be made package-private. - the Encoding class and EncodingManager could do with some cleaning up prior to further refactoring. 
- PDSimpleFont does not do anything, its functionality should be moved into its superclass, PDFont. - PDFont#determineEncoding() loads CMaps when only Encodings are applicable, and vice versa. Loading needs to be pushed down into the appropriate subclasses, as a starting point the relevant code should at least be copied into the relevant subclasses ready for further refactoring. - TTFGlyph2D does its own decoding of char codes, rather than using the font's #encode method (fair enough because #encode is broken) and there's a copy-and-pasted version of the same code in PDTrueTypeFont - we need to consolidate this code into PDTrueTypeFont where it belongs. Phase 2 - Refactor loading of CMaps and Encodings from font dictionaries, this will involve changes to PDFont and its subclasses to delegate loading to subclasses where it can be properly encapsulated - May need to alter the class hierarchy w.r.t CIDFont to facilitate this, as CIDFont isn't really a PDFont - its parent Type0 font is responsible for its CMap. We'll see. Phase 3 - Refactor the decoding of character codes by PDFont and its subclasses, this will involve replacing the #getCodeFromArray, #encode and #encodeToCID methods. - Fix decoding of content stream character codes in PDFStreamEngine, using the newly refactored PDFont and using the current font's CMap to determine the code width. Phase 4 - Add support for generating embedded TTFs with Unicode -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2149) Font Refactoring
[ https://issues.apache.org/jira/browse/PDFBOX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049554#comment-14049554 ] John Hewson edited comment on PDFBOX-2149 at 7/2/14 2:32 AM: - I've moved the rendering of TrueType glyphs into a new GlyphRenderer class in FontBox instead of inside PDFBox's TTFGlyph2D. I've also moved the various Glyph2D subclasses to a new pdfbox.rendering.font package in [r1607236|http://svn.apache.org/r1607236]. This shouldn't affect rendering of any files. was (Author: jahewson): I've moved the rendering of TrueType glyphs into a new GlyphRenderer class in FontBox instead of inside PDFBox's TTFGlyph2D. I've also moved the various Glyph2D subclasses to a new pdfbox.rendering.font package in [r1607236|http://svn.apache.org/1607236]. This shouldn't affect rendering of any files.
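The Phase 1 point in PDFBOX-2149 about setXXX methods that are "used only by TTFParser" and "can be made package-private" could be sketched as below. SketchFont and SketchParser are hypothetical stand-ins, not the FontBox classes; in a real code base both would be public classes in the same package, each in its own file, so that only the parser can reach the setters.

```java
// Hypothetical stand-in for a font class whose state is filled in by a parser.
class SketchFont {
    private float version;
    private int numberOfGlyphs;

    // Package-private setters: callable by the parser in the same package,
    // invisible to client code in other packages.
    void setVersion(float version) {
        this.version = version;
    }

    void setNumberOfGlyphs(int numberOfGlyphs) {
        this.numberOfGlyphs = numberOfGlyphs;
    }

    // Public read-only view for clients.
    public float getVersion() {
        return version;
    }

    public int getNumberOfGlyphs() {
        return numberOfGlyphs;
    }
}

// Hypothetical stand-in for the parser, the only intended caller of the setters.
class SketchParser {
    SketchFont parse(float version, int numberOfGlyphs) {
        SketchFont font = new SketchFont();
        font.setVersion(version);
        font.setNumberOfGlyphs(numberOfGlyphs);
        return font;
    }

    public static void main(String[] args) {
        SketchFont font = new SketchParser().parse(1.0f, 42);
        System.out.println("glyphs: " + font.getNumberOfGlyphs());
    }
}
```

With this arrangement the object can no longer be corrupted from outside the package after loading, which is the property the issue asks for; whether package-private setters or constructor arguments fit FontBox better is left open here.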
Jenkins build became unstable: PDFBox-trunk » Apache PDFBox #1107
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox/1107/changes
Jenkins build became unstable: PDFBox-trunk #1107
See https://builds.apache.org/job/PDFBox-trunk/1107/changes
Jenkins build is back to stable : PDFBox-trunk » Apache PDFBox #1108
See https://builds.apache.org/job/PDFBox-trunk/org.apache.pdfbox$pdfbox/1108/
Jenkins build is back to stable : PDFBox-trunk #1108
See https://builds.apache.org/job/PDFBox-trunk/1108/