[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063218#comment-14063218 ]

Tilman Hausherr commented on PDFBOX-1915:

I committed your radial shading optimizations in rev 1610917 in the trunk. I didn't measure the exact times, but PDFBOX-1764 and PDFBOX-1416 are much, much faster.

Implement shading with Coons and tensor-product patch meshes

Key: PDFBOX-1915
URL: https://issues.apache.org/jira/browse/PDFBOX-1915
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
Labels: graphical, gsoc2014, java, math, shading
Fix For: 2.0.0
Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, ch14.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, updateshading6ContourTest.rar

Of the seven shading methods described in the PDF specification, type 6 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been implemented. I have done type 1, 4 and 5, but I don't know the math for type 6 and 7. My math days are decades away.

Knowledge prerequisites:
- java, although you don't have to be a java ace, just feel comfortable
- math: you should know what cubic Bézier curves, degenerate Bézier curves, bilinear interpolation, tensor-product, affine transform matrix and Bernstein polynomials are, or be able to learn it
- maven (basic)
- svn (basic)
- an IDE like Netbeans or Eclipse or IntelliJ (basic)
- ideally, you are either a math student who likes to program, or a computer science student who is specializing in graphics.

A first look at PDFBOX: try the command utility here: https://pdfbox.apache.org/commandline/#pdfToImage and use your favorite PDF, or the PDFs mentioned in PDFBOX-615; these have the shading types that are already implemented.

Some simple source code to convert to images:

    String filename = "blah.pdf";
    PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
    List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
    int page = 0;
    for (PDPage pdPage : pdPages)
    {
        ++page;
        BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_BYTE_BINARY, 300);
        ImageIO.write(bim, "png", new File(filename + page + ".png"));
    }
    document.close();

You are not starting from scratch. The implementation of type 4 and 5 shows you how to read parameters from the PDF and set the graphics. You don't have to learn the complete PDF spec, only 15 pages related to the two shading types, and 6 pages about shading in general.

The PDF specification is here: http://www.adobe.com/devnet/pdf/pdf_reference.html

The tricky parts are:
- decide whether a point(x,y) is inside or outside a patch
- decide the color of a point within the patch

To get an idea about the code, look at the classes GouraudTriangle, GouraudShadingContext, Type4ShadingContext and Vertex here https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ or download the whole project from the repository. https://pdfbox.apache.org/downloads.html#scm

If you want to see the existing code in the debugger with a Gouraud shading, try this file: http://asymptote.sourceforge.net/gallery/Gouraud.pdf

Testing: I have attached several example PDFs. To see which one has which shading, open them with an editor like NOTEPAD++, and search for "/ShadingType" (without the quotes). If your images are rendering like the example PDFs, then you were successful.

Optional: Review and optimize the complete shading package for speed; implement cubic
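
For readers unfamiliar with the math prerequisites named in the issue description, here is a minimal sketch of evaluating a cubic Bézier curve through its Bernstein polynomials. It is only an illustration of the textbook formula; the class name `CubicBezier` and its methods are hypothetical and are not taken from the PDFBox sources.

```java
// Hypothetical helper, not part of PDFBox: evaluates a cubic Bezier curve
// B(t) = sum_{i=0..3} b_i(t) * P_i using the degree-3 Bernstein polynomials.
final class CubicBezier {

    // Degree-3 Bernstein basis polynomial b_i(t) for i in 0..3.
    static double bernstein3(int i, double t) {
        double u = 1.0 - t;
        switch (i) {
            case 0: return u * u * u;
            case 1: return 3.0 * t * u * u;
            case 2: return 3.0 * t * t * u;
            case 3: return t * t * t;
            default: throw new IllegalArgumentException("i must be in 0..3");
        }
    }

    // controlPoints holds four {x, y} pairs; t is expected in [0, 1].
    static double[] point(double[][] controlPoints, double t) {
        if (controlPoints.length != 4) {
            throw new IllegalArgumentException("a cubic curve needs 4 control points");
        }
        double x = 0.0;
        double y = 0.0;
        for (int i = 0; i < 4; i++) {
            double b = bernstein3(i, t);
            x += b * controlPoints[i][0];
            y += b * controlPoints[i][1];
        }
        return new double[] { x, y };
    }
}
```

The curve starts at the first control point (t = 0) and ends at the last one (t = 1); the two inner control points only pull the curve towards them. A patch boundary in a type 6 or type 7 shading is made of such curves.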
Build failed in Jenkins: PDFBox-ant #1441
See https://builds.apache.org/job/PDFBox-ant/1441/changes

Changes:

[tilman] PDFBOX-1915: Optimization of radial shading by Shaola Ren as part of GSoC2014

--
Started by an SCM change
Building remotely on ubuntu-6 (Ubuntu ubuntu) in workspace https://builds.apache.org/job/PDFBox-ant/ws/
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision '2014-07-16T06:49:16.119 +'
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/RadialShadingContext.java
At revision 1610917
FATAL: Cannot find executable from the chosen Ant installation Ant 1.7.0
Build step 'Invoke Ant' marked build as failure
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063235#comment-14063235 ]

Tilman Hausherr commented on PDFBOX-1915:

About
{quote}
Technically, I can use a 2D array to store the pixels' color instead of a hashmap in type 6&7 shading
{quote}
IMHO we can leave it as it is now, i.e. with the hash map.

I'll review & commit the changes on 6 & 7 later today.
[jira] [Commented] (PDFBOX-2209) [PATCH] Restore shading API
[ https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063260#comment-14063260 ]

simon steiner commented on PDFBOX-2209:

It's not open source

[PATCH] Restore shading API

Key: PDFBOX-2209
URL: https://issues.apache.org/jira/browse/PDFBOX-2209
Project: PDFBox
Issue Type: Wish
Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
Fix For: 2.0.0
Attachments: shading.patch

Some of the shading API is gone in 2.0. Can we have it back so we can convert PDF to PostScript in FOP?

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063263#comment-14063263 ]

simon steiner commented on PDFBOX-2210:

If we get the glyph path without the transform, we can store it once inside a PostScript function and reuse it each time it is drawn in a document.

[PATCH] Allow caching of glyphs

Key: PDFBOX-2210
URL: https://issues.apache.org/jira/browse/PDFBOX-2210
Project: PDFBox
Issue Type: Improvement
Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: John Hewson
Attachments: drawglyphs.patch

If you separate the transform from the glyph, we can reuse glyphs in FOP PostScript output and get smaller output files.
[jira] [Created] (PDFBOX-2212) OutOfMemoryError in GlyfCompositeDescrip
Valdis Andersons created PDFBOX-2212:

Summary: OutOfMemoryError in GlyfCompositeDescrip
Key: PDFBOX-2212
URL: https://issues.apache.org/jira/browse/PDFBOX-2212
Project: PDFBox
Issue Type: Bug
Components: FontBox, Preflight
Affects Versions: 1.8.6
Environment: Windows 7, JDK6
Reporter: Valdis Andersons

Hi All,

The application I’m working on is a web service that accepts PDF documents and combines them in a single larger PDF. Client submits a bunch of PDFs and we create a single PDF out of them. In some rare cases one of the PDF documents submitted has a glitch in it that causes Adobe Reader to throw errors when viewing the final document (attached).

When I tried to check the buggy PDF with the approach outlined here: https://pdfbox.apache.org/cookbook/pdfavalidation.html I was getting an OutOfMemoryError in the GlyfCompositeDescript class; here is the full stack trace:

java.lang.OutOfMemoryError: Java heap space
    at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
    at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
    at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
    at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
    at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
    at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)

While I can’t send on the PDF in question due to the sensitivity of the contents in it, I did a bit of digging and debugging to find out why this is happening. In the GlyfCompositeDescript class's constructor there is a do … while loop that is constructing GlyfCompositeComp objects and adding them to the components list of
[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063320#comment-14063320 ]

Petr Slaby commented on PDFBOX-2210:

We have a similar problem. In our application, we produce (among others) PCL and AFP output. For PDFBox, we have a PCL and AFP specific implementation of Graphics2D which produces the commands in the respective printer language.

In the old solution, fillGlyphVector or drawGlyphVector was called for printing characters using AWT fonts. From the glyph vector, we were able to get at the AWT font and the character(s) being printed. From that, we were able to pick an existing PCL or AFP font if an equivalent for the AWT font was configured, or produce an on-the-fly font and embed it into the output.

With the current solution, we just get a shape and do not even know that it is coming from rendering of text. I did not try to solve this yet, but I think I will probably need PageDrawer.drawGlyph2D() to become part of the API (make it protected instead of private) so that I can intercept it and call something else on our special G2D implementation.

When producing on-the-fly fonts, we need some font metrics information, like the ascent, descent and width of each character. For that, I would need to put some more information into Glyph2D, e.g. have a reference to the underlying PDFont.
Unmapped Glyphs in Font files
Hi Team,

I want to check the unmapped glyphs in the TTF font file and map those glyphs based on user input. Can you please help me with this?

Thanks in advance
--
Shivam Gupta
[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow
[ https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063436#comment-14063436 ]

Shaola Ren commented on PDFBOX-2117:

Thanks. I noticed you mentioned diagonal shading in a previous comment, but I'm not sure what is special about it. For this axial shading, the axial line can be in any direction and the method doesn't depend on a specific direction; the diagonal direction is only one case of it.

AxialShadingContext is slow

Key: PDFBOX-2117
URL: https://issues.apache.org/jira/browse/PDFBOX-2117
Project: PDFBox
Issue Type: Sub-task
Components: Rendering
Reporter: Petr Slaby
Assignee: Shaola Ren
Fix For: 2.0.0
Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, AxialShading1.patch, AxialShadingContext.java.getrgbimage, GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, color_gradient.pdf, shading_pattern.pdf

AxialShadingContext#getRaster() is on top of profiler hot spots in documents that use an axial shading. Inside it, the slowest part is calling PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
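
The point about direction independence can be illustrated with the usual projection of a point onto the shading axis. This is a minimal sketch of that projection only, assuming the axis runs from (x0, y0) to (x1, y1); the class `AxialParameter` and its method are hypothetical names and this is not the code in AxialShadingContext.

```java
// Hypothetical helper, not the PDFBox implementation: computes where a point
// falls along an axial-shading axis by orthogonal projection.
final class AxialParameter {

    // Returns 0 at the axis start, 1 at the axis end, and values outside
    // [0, 1] for points that project beyond either end of the axis.
    static double project(double x0, double y0, double x1, double y1,
                          double x, double y) {
        double dx = x1 - x0;
        double dy = y1 - y0;
        double lengthSquared = dx * dx + dy * dy;
        if (lengthSquared == 0.0) {
            throw new IllegalArgumentException("axis start and end coincide");
        }
        // Dot product of (point - start) with the axis vector, normalized by
        // the squared axis length. Nothing here depends on the axis direction.
        return ((x - x0) * dx + (y - y0) * dy) / lengthSquared;
    }
}
```

A horizontal axis from (0, 0) to (10, 0) and a diagonal axis from (0, 0) to (10, 10) go through the same code path: the point (5, 3) projects to 0.5 on the first, and the point (5, 5) projects to 0.5 on the second.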
[jira] [Commented] (PDFBOX-2212) OutOfMemoryError in GlyfCompositeDescrip
[ https://issues.apache.org/jira/browse/PDFBOX-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063640#comment-14063640 ]

Tilman Hausherr commented on PDFBOX-2212:

This code in MemoryTTFDataStream looks suspicious to me:
{code}
public int read() throws IOException
{
    int retval = -1;
    if( currentPosition < data.length )
    {
        retval = data[currentPosition];
    }
    currentPosition++;
    return (retval+256)%256;
}
{code}
it will return 255 and not -1 on EOF. Because of that, this method:
{code}
public int readUnsignedShort() throws IOException
{
    int ch1 = this.read();
    int ch2 = this.read();
    if ((ch1 | ch2) < 0)
    {
        throw new EOFException();
    }
    return (ch1 << 8) + (ch2 << 0);
}
{code}
won't throw an EOF.

Try building from the sources and make this change in MemoryTTFDataStream:
{code}
public int read() throws IOException
{
    if (currentPosition >= data.length)
    {
        return -1;
    }
    int retval = data[currentPosition];
    currentPosition++;
    return (retval+256)%256;
}
{code}
This is just a theory, I can't test it myself, I might be wrong, so you should test it yourself by changing the code on your system and then testing that file. Obviously I'd need the file to be sure.

And no, we didn't have this effect yet. We did have a similar effect a year ago that had the same cause (EOF) but that was fixed (PDFBOX-1668).

Attachments: adobe_error1.jpg, adobe_error2.jpg
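
The EOF theory in the comment above can be checked in isolation. The following is a small stand-alone sketch, not the real MemoryTTFDataStream: the class `EofReadDemo` and its `fixed` flag are hypothetical, and only the two `read()` variants and `readUnsignedShort()` mirror the code quoted in the comment.

```java
import java.io.EOFException;

// Hypothetical stand-alone reproduction; not the FontBox class itself.
final class EofReadDemo {
    private final byte[] data;
    private final boolean fixed;
    private int currentPosition;

    EofReadDemo(byte[] data, boolean fixed) {
        this.data = data;
        this.fixed = fixed;
    }

    int read() {
        if (fixed) {
            // Proposed variant: report end of data before any arithmetic.
            if (currentPosition >= data.length) {
                return -1;
            }
            int retval = data[currentPosition];
            currentPosition++;
            return (retval + 256) % 256;
        }
        // Original variant: the -1 sentinel is folded into (retval+256)%256,
        // so end of data comes back as 255 instead of -1.
        int retval = -1;
        if (currentPosition < data.length) {
            retval = data[currentPosition];
        }
        currentPosition++;
        return (retval + 256) % 256;
    }

    int readUnsignedShort() throws EOFException {
        int ch1 = read();
        int ch2 = read();
        // Only a negative value from read() can make this check fire.
        if ((ch1 | ch2) < 0) {
            throw new EOFException();
        }
        return (ch1 << 8) + (ch2 << 0);
    }
}
```

With the original variant, reading past the end of an empty array yields 255 per byte, so `readUnsignedShort()` silently returns 65535; with the proposed variant it throws `EOFException`. Whether this is what drives the loop in the reported file remains, as the comment says, a theory until the file is tested.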
[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage
[ https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk Haines updated PDFBOX-1511:

Attachment: PDFMergerUtility.java

This version of PDFMergerUtility.java (based on 1.7.1 iirc) removes the shared resources section and instead applies resources on the page level. The cloner will create references for resources used on multiple pages, so there is not excessive resource duplication. The previous method assumed resources with the same name were identical, which is not valid (see prior comment about Font resource CMaps).

pdfMerger App produces Garbage

Key: PDFBOX-1511
URL: https://issues.apache.org/jira/browse/PDFBOX-1511
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 1.7.1
Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21,
Reporter: Michael Huber
Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PdfRenderer.java, targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf

The pdfbox utility pdfMerger produces a merged document containing garbage. All merged pdf files are contained but strings are destroyed. The source pdf files are created with graphviz and are readable without error or disturbance both with Acrobat X and the pdfbox pdfDebug utility.

Another astounding thing is that a handcoded merger using the pdfMergerUtility class works fine when run within Eclipse Juno and creates the same garbage when run from the command line (please see attached source).

I checked everything that comes to mind to find the differences, e.g. Java version, encoding/codepage issues, memory settings, and found nothing.
[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes
[ https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064016#comment-14064016 ]

Tilman Hausherr commented on PDFBOX-1915:

I took your code of PDFunctionType2 a step further (initialize all in constructor) and committed this in rev 1611163. However, I don't expect much speed improvement: on the day I ran the profiler, the share of PDFunctionType2 went to 0% after the change in getN().
Build failed in Jenkins: PDFBox-ant #1442
See https://builds.apache.org/job/PDFBox-ant/1442/changes

Changes:

[tilman] PDFBOX-1915: add missing element in toString

[tilman] PDFBOX-1915: Optimization based on suggestion by Shaola Ren

[tilman] PDFBOX-1915: Optimization of type 6 and 7 shading by Shaola Ren as part of GSoC2014

--
Started by an SCM change
Building remotely on ubuntu-1 (Ubuntu ubuntu) in workspace https://builds.apache.org/job/PDFBox-ant/ws/
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision '2014-07-16T21:00:10.534 +'
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/function/PDFunctionType2.java
D pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ColorRGB.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/RadialShadingContext.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/AxialShadingContext.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/PatchMeshesShadingContext.java
At revision 1611188
FATAL: Cannot find executable from the chosen Ant installation Ant 1.7.0
Build step 'Invoke Ant' marked build as failure
[jira] [Commented] (PDFBOX-2206) Cannot save a document which has been closed
[ https://issues.apache.org/jira/browse/PDFBOX-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064130#comment-14064130 ]

John Hewson commented on PDFBOX-2206:

This is why RAII is a good thing, so we don't end up with a constructor that leaves an object unusable. I guess we need ConformingPDFParser to pass the trailer to the constructor.

Cannot save a document which has been closed

Key: PDFBOX-2206
URL: https://issues.apache.org/jira/browse/PDFBOX-2206
Project: PDFBox
Issue Type: Bug
Components: Utilities
Affects Versions: 2.0.0
Reporter: simon steiner
Fix For: 2.0.0

Any pdf gives:

java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc x.pdf
Exception in thread "main" java.io.IOException: Cannot save a document which has been closed
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1230)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1216)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
    at org.apache.pdfbox.tools.WriteDecodedDoc.doIt(WriteDecodedDoc.java:125)
    at org.apache.pdfbox.tools.WriteDecodedDoc.main(WriteDecodedDoc.java:191)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:97)
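
The design point in the comment, that a constructor should never leave an object half-initialized, can be sketched as follows. This is only an illustration under assumed names: `Trailer` and `ParsedDocument` are hypothetical stand-ins and do not reflect the actual PDFBox or ConformingPDFParser classes.

```java
import java.util.Objects;

// Hypothetical stand-in for a parsed trailer; not a PDFBox type.
final class Trailer {
    final String rootReference;

    Trailer(String rootReference) {
        this.rootReference = rootReference;
    }
}

// Hypothetical document type: everything it needs is supplied to the
// constructor, so no instance can exist without a trailer.
final class ParsedDocument {
    private final Trailer trailer;

    ParsedDocument(Trailer trailer) {
        // Fail at construction time rather than later, e.g. when saving.
        this.trailer = Objects.requireNonNull(trailer, "trailer");
    }

    String rootReference() {
        return trailer.rootReference;
    }
}
```

With this shape a caller that forgets the trailer fails immediately with a clear message, instead of obtaining an object that only breaks later in an unrelated call.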
[jira] [Commented] (PDFBOX-2209) [PATCH] Restore shading API
[ https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064140#comment-14064140 ] John Hewson commented on PDFBOX-2209: - We need a really clear and compelling use case for this patch and *exactly* which methods are needed, otherwise it will be reverted. [PATCH] Restore shading API --- Key: PDFBOX-2209 URL: https://issues.apache.org/jira/browse/PDFBOX-2209 Project: PDFBox Issue Type: Wish Components: Rendering Affects Versions: 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: shading.patch Some of the shading API is gone in 2.0. Can we have it back so that we can convert PDF to PostScript in FOP? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-2209) [PATCH] Restore shading API
[ https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064140#comment-14064140 ] John Hewson edited comment on PDFBOX-2209 at 7/16/14 9:36 PM: -- We need a really clear and compelling use case for this patch and *exactly* which methods are needed, otherwise it will be reverted. I'm strongly against exposing implementation details which are so tightly coupled to PageDrawer. was (Author: jahewson): We need a really clear and compelling use case for this patch and *exactly* which methods are needed, otherwise it will be reverted. [PATCH] Restore shading API --- Key: PDFBOX-2209 URL: https://issues.apache.org/jira/browse/PDFBOX-2209 Project: PDFBox Issue Type: Wish Components: Rendering Affects Versions: 2.0.0 Reporter: simon steiner Assignee: Tilman Hausherr Fix For: 2.0.0 Attachments: shading.patch Some of the shading API is gone in 2.0. Can we have it back so that we can convert PDF to PostScript in FOP? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs
[ https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064159#comment-14064159 ] John Hewson commented on PDFBOX-2210: - {quote} PageDrawer.drawGlyph2D() to become part of the API (make it protected instead of private) so that I can intercept it and call something else on our special G2D implementation. {quote} I've recently added some new glyph APIs to a new class PDFGraphicsStreamEngine, which PageDrawer extends. You can probably get what you need by overriding those methods. Use of drawGlyphVector will be entirely removed from 2.0 soon. Really though, PageDrawer isn't meant to be subclassed; ideally for 2.0 it will be marked final, and PDFGraphicsStreamEngine can be used by anyone interested in intercepting the graphics commands (I do this myself, somewhat like your PCL output). [PATCH] Allow caching of glyphs --- Key: PDFBOX-2210 URL: https://issues.apache.org/jira/browse/PDFBOX-2210 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: simon steiner Assignee: John Hewson Attachments: drawglyphs.patch If you separate the transform from the glyph, we can reuse glyphs in FOP PostScript output and get smaller output files -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Unmapped Glyphs in Font files
What do you mean by “check the unmapped glyphs”, can you elaborate? -- John On 16 Jul 2014, at 04:35, Shivam Gupta shivam26.gu...@gmail.com wrote: Hi Team, I want to check the unmapped glyphs in the TTF font file and map those glyphs based on user input. Can you please help me out with this? Thanks in advance -- Shivam Gupta
[jira] [Commented] (PDFBOX-1000) Conforming parser
[ https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064174#comment-14064174 ] Tilman Hausherr commented on PDFBOX-1000: - In PDFBOX-2206 we wanted to create the document catalog within the constructor of PDDocument to clean things up. But this isn't possible because ConformingPDFParser.parse() is creating a ConformingPDDocument, thus a PDDocument before the trailer is known to the document. So it would be nice if you'd have a look whether the code can be changed so that the document that is passed to the ConformingPDDocument has the trailer. Conforming parser - Key: PDFBOX-1000 URL: https://issues.apache.org/jira/browse/PDFBOX-1000 Project: PDFBox Issue Type: New Feature Components: Parsing Reporter: Adam Nichols Assignee: Adam Nichols Attachments: COSUnread.java, ConformingPDDocument.java, ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf A conforming parser will start at the end of the file and read backward until it has read the EOF marker, the xref location, and trailer[1]. Once this is read, it will read in the xref table so it can locate other objects and revisions. This also allows skipping objects which have been rendered obsolete (per the xref table)[2]. It also allows the minimum amount of information to be read when the file is loaded, and then subsequent information will be loaded if and when it is requested. This is all laid out in the official PDF specification, ISO 32000-1:2008. Existing code will be re-used where possible, but this will require new classes in order to accommodate the lazy reading which is a very different paradigm from the existing parser. Using separate classes will also eliminate the possibility of regression bugs from making their way into the PDDocument or BaseParser classes. 
Changes to existing classes will be kept to a minimum in order to prevent regression bugs. [1] Section 7.5.5 Conforming readers should read a PDF file from its end [2] Section 7.5.4 the entire file need not be read to locate any particular object -- This message was sent by Atlassian JIRA (v6.2#6252)
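The "read from the end" step described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from conforming-parser.patch: the class name StartXrefSketch, the method findStartXref and the 1024-byte tail window are all assumptions made for the example, and it ignores incremental updates and malformed files.

```java
import java.nio.charset.StandardCharsets;

public class StartXrefSketch {

    /**
     * Looks only at the tail of the file and returns the byte offset written
     * after the last "startxref" keyword, or -1 if it cannot be found.
     */
    public static long findStartXref(byte[] pdf) {
        // The window size is an arbitrary choice for this sketch.
        int window = Math.min(pdf.length, 1024);
        String tail = new String(pdf, pdf.length - window, window,
                StandardCharsets.ISO_8859_1);

        // Search backward: first the EOF marker, then the keyword before it.
        int eof = tail.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1;
        }
        int keyword = tail.lastIndexOf("startxref", eof);
        if (keyword < 0) {
            return -1;
        }

        // Whatever sits between the keyword and the marker should be the offset.
        String number = tail.substring(keyword + "startxref".length(), eof).trim();
        try {
            return Long.parseLong(number);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Tiny made-up file: the xref keyword starts at byte 23.
        byte[] sample = ("%PDF-1.4\n...objects...\nxref\n0 1\n0000000000 65535 f \n"
                + "trailer\n<< /Size 1 >>\nstartxref\n23\n%%EOF\n")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(sample));
    }
}
```

With that offset in hand, a lazy parser can seek straight to the xref table and load other objects only when they are requested, which is the point of the proposal.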
[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage
[ https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064170#comment-14064170 ] John Hewson commented on PDFBOX-1511: - Hi Kirk, can you attach a patch using svn diff please. pdfMerger App produces Garbage -- Key: PDFBOX-1511 URL: https://issues.apache.org/jira/browse/PDFBOX-1511 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.7.1 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, Reporter: Michael Huber Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PdfRenderer.java, targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf The pdfbox utility pdfMerger produces a merged document containing garbage. All merged pdf files are contained, but strings are destroyed. The source pdf files are created with graphviz and are readable without error or disturbance both with Acrobat X and the pdfbox pdfDebug utility. Another astounding thing is that a hand-coded merger using the PDFMergerUtility class works fine when run within Eclipse Juno and creates the same garbage when run from the command line (pls. see attached source). I checked everything that comes to mind to find the differences, e.g. Java version, encoding/codepage issues, memory settings, and found nothing. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Getting Line of Text from TextPos
Hi Aaron, You could store a map of TextPosition => Color which you populate in processTextPosition. Lines are not known until the end of PDFTextStripper’s processing (they have to be inferred) so you could override a method from one of the phases at the end and I think you should have access to either lines or TextPositions which are merged into continuous runs (I can’t remember which). Alternatively you can always grab all TextPositions with the same y position. -- John On 16 Jul 2014, at 13:27, Aaron Hartman aa...@hrtmn.net wrote: Hi everyone; I am currently scanning PDFs for errors that have red text in them. I accomplished this by extending the PDFTextStripper class and overriding the processTextPosition method to examine the PDGraphicsState for the appropriate color values. Once this position is found, is it possible to extract only the line where that red text resides? For the user it would be beneficial to see the line in which the error occurs. Since processTextPosition has the actual position, I was hoping there may be a way to extract the line with the error from within this method, or by storing the position and accessing it elsewhere. If there is a way to accomplish this, please let me know! Thank you for your time. -Aaron
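For the archive, here is a rough sketch of the "grab everything with the same y position" idea. To keep it runnable without PDFBox on the classpath it uses a hypothetical stand-in class, Glyph, in place of TextPosition; in a real PDFTextStripper subclass you would presumably collect TextPosition objects in processTextPosition and read getX()/getY() from them. The class and method names, the red flag and the rounding of y to whole units are all assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineGroupingSketch {

    /** Hypothetical stand-in for TextPosition: text, position and a colour flag. */
    public static final class Glyph {
        final String text;
        final float x;
        final float y;
        final boolean red;

        public Glyph(String text, float x, float y, boolean red) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.red = red;
        }
    }

    /** Returns the text of every line (same rounded y) that contains a red glyph. */
    public static List<String> linesWithRed(List<Glyph> glyphs) {
        // Bucket glyphs by rounded y so small baseline jitter lands on one line.
        Map<Integer, List<Glyph>> byY = new TreeMap<>();
        for (Glyph g : glyphs) {
            byY.computeIfAbsent(Math.round(g.y), k -> new ArrayList<>()).add(g);
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : byY.values()) {
            boolean hasRed = false;
            for (Glyph g : line) {
                hasRed |= g.red;
            }
            if (!hasRed) {
                continue;
            }
            // Restore reading order within the line before joining the text.
            line.sort((a, b) -> Float.compare(a.x, b.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : line) {
                sb.append(g.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("All fine here", 10f, 700f, false));
        glyphs.add(new Glyph("bad value", 60f, 650.2f, true));
        glyphs.add(new Glyph("Error: ", 10f, 649.8f, false));
        System.out.println(linesWithRed(glyphs));
    }
}
```

Rounding y is a crude tolerance; text with superscripts or rotated pages would need something smarter, which is why the stripper's own inferred lines are probably the better source when they are accessible.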
[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage
[ https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk Haines updated PDFBOX-1511: Attachment: PDFMergerUtility.java.diff Diff version of changes. pdfMerger App produces Garbage -- Key: PDFBOX-1511 URL: https://issues.apache.org/jira/browse/PDFBOX-1511 Project: PDFBox Issue Type: Bug Components: Utilities Affects Versions: 1.7.1 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, Reporter: Michael Huber Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf The pdfbox utility pdfMerger produces a merged document containing garbage. All merged pdf files are contained, but strings are destroyed. The source pdf files are created with graphviz and are readable without error or disturbance both with Acrobat X and the pdfbox pdfDebug utility. Another astounding thing is that a hand-coded merger using the PDFMergerUtility class works fine when run within Eclipse Juno and creates the same garbage when run from the command line (pls. see attached source). I checked everything that comes to mind to find the differences, e.g. Java version, encoding/codepage issues, memory settings, and found nothing. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Getting Line of Text from TextPos
John, Excellent answer yet again, much appreciated! -Aaron On Wednesday, July 16, 2014, John Hewson j...@jahewson.com wrote: Hi Aaron, You could store a map of TextPosition => Color which you populate in processTextPosition. Lines are not known until the end of PDFTextStripper’s processing (they have to be inferred) so you could override a method from one of the phases at the end and I think you should have access to either lines or TextPositions which are merged into continuous runs (I can’t remember which). Alternatively you can always grab all TextPositions with the same y position. -- John On 16 Jul 2014, at 13:27, Aaron Hartman aa...@hrtmn.net wrote: Hi everyone; I am currently scanning PDFs for errors that have red text in them. I accomplished this by extending the PDFTextStripper class and overriding the processTextPosition method to examine the PDGraphicsState for the appropriate color values. Once this position is found, is it possible to extract only the line where that red text resides? For the user it would be beneficial to see the line in which the error occurs. Since processTextPosition has the actual position, I was hoping there may be a way to extract the line with the error from within this method, or by storing the position and accessing it elsewhere. If there is a way to accomplish this, please let me know! Thank you for your time. -Aaron