[jira] [Reopened] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson reopened PDFBOX-1988:

Reopening because we leave issues open until the version they were fixed in is released.

> PDFBox ExtractText issue of PDF with no embedded fonts
>
> Key: PDFBOX-1988
> URL: https://issues.apache.org/jira/browse/PDFBOX-1988
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering, Text extraction
> Affects Versions: 1.8.4
> Environment: Windows 7. Also, PASE on IBM i
> Reporter: Craig Strong
> Labels: patch
> Fix For: 1.8.5, 2.0.0
> Attachments: Test1.pdf
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> I have been using PDFBox 1.8.4 to extract text from several different PDF files fine. I use the latest PDFBox app with ExtractText command line. There is one PDF that PDFBox (and iText) fails to extract any text even though I can extract the text with Adobe Reader and also pdftotext.exe part of XPdf.
>
>     java -jar pdfbox-app-1.8.4.jar ExtractText Test1.pdf Out.txt
>
> I don't want to have to rely on using pdftotext.exe from a PC since this is part of an automated application. I think the error relates to an unknown font type and having to use the few fonts installed in the jar file. I tried running the API classes and trying to force a font from a certain location but I still got errors. I thought I loaded the font with the loadTTF method but I don't know if that did anything with the font. I would really like to have this working straight from the ExtractText class anyway.
>
> Here are the errors I am getting. I tried this from both a Windows 7 PC and our IBM i in the PASE environment but I get the same errors. The section starting processEncodedText and on repeats a few times so I just included the first entries.
>
>     Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
>     WARNING: Substituting TrueType for unknown font subtype=
>     Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
>     WARNING: java.lang.NullPointerException
>     Throwable occurred: java.lang.NullPointerException
>         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
>         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
>         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.init(PDTrueTypeFont.java:119)
>         at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
>         at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
>         at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
>         at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>         at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
>         at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
>         at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
>         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
>         at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
>         at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
>     Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
>     WARNING: java.lang.NullPointerException
>     Throwable occurred: java.lang.NullPointerException
>         at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
>         at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
>         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>         at
[jira] [Resolved] (PDFBOX-1988) PDFBox ExtractText issue of PDF with no embedded fonts
[ https://issues.apache.org/jira/browse/PDFBOX-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson resolved PDFBOX-1988.

Resolution: Fixed
Re: [GSoC 2014]Implement shading with Coons and tensor-product patch meshes
Hi Tilman,

I'll look into the parts of the PDF spec related to FunctionType, thanks for that. Thanks also for the tips on the proposal. I uploaded my proposal to Melange; here is the URL:
https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/thimal/5649050225344512

I have suggested a new, simple method to find the patch that contains a given point. According to the PDF spec, type 6 can be treated as a special case of type 7, so given the 12 points we can calculate the other 4 values and use the same implementation for type 6. I would be glad if you could give feedback on my proposal.

On Wed, Mar 12, 2014 at 11:42 PM, Tilman Hausherr thaush...@t-online.de wrote:

Hello,

The function is something used mostly by shading types 1, 2 and 3. It uses as input either the coordinates, or the result of a formula based on them. Search for FunctionType in the PDF spec.

Re: the proposal, no, I don't have a sample. I don't even know what the Google format looks like. What I'd expect to see is your background, what you are studying, what you are mostly focused on in these studies, what your skills / experiences are, and why you think you're the one for this project. And maybe a few lines on how you're going to crack the two core problems (1. point inside/outside, 2. color). If you don't know, then maybe a few lines explaining what you will want to learn to know it.

Tilman

On 12.03.2014 14:39, Thimal Kempitiya wrote:

Hi Tilman,

Thanks for the feedback. By "function calculations", do you mean the function evaluation method? Can you please give more information on it? About the proposal, what advice can you give? Is there a specific format that PDFBox expects apart from the GSoC format, and is there any sample proposal that we can use to get an idea about writing one?

On Sun, Mar 9, 2014 at 8:42 PM, Tilman Hausherr thaush...@t-online.de wrote:

Hello,

Yes, this is an interesting idea. It would save the recalculation of y1y0 * (y + j - coords[1]) every time. (Unless the Java compiler detects this already.) But don't expect too much from it - I believe more time is lost in function calculation (at least for types 1, 2 and 3, where functions are mandatory).

Tilman

On 09.03.2014 15:12, Thimal Kempitiya wrote:

Thanks Tilman. For optimization of speed I think we need to focus on methods which are called again and again, like getRaster. For the axial shading part, the current implementation of getRaster calculates the x' value for the raster inside the nested for loop:

    for (int j = 0; j < h; j++)
    {
        for (int i = 0; i < w; i++)
        {
            useBackground = false;
            double inputValue = x1x0 * (x + i - coords[0]);
            inputValue += y1y0 * (y + j - coords[1]);

But the only things that change are the i and j values, which vary over 0 <= j < h and 0 <= i < w. So the contributions from the i and j values can be calculated in two separate for loops, which run from 0 to h and from 0 to w, and stored in two arrays. When we need to evaluate, we can add them to get the input value. This moves the calculations out of the nested for loop into two simple for loops, which may speed up the axial shading. What do you think about it?

On Fri, Mar 7, 2014 at 11:44 PM, Tilman Hausherr thaush...@t-online.de wrote:

On 07.03.2014 15:03, Thimal Kempitiya wrote:

> Thanks Tilman for the feedback. http://www.particleincell.com/blog/2012/quad-interpolation/ seems like the opposite of what we are going to need. I need to check whether it works by implementing it (but it can easily be implemented if we use a library with matrix manipulations).

This is really up to you :-) Re: the pure math parts, it's rather me who is learning something. Re: library, you can use the Java standard library, or any library with the Apache license or a compatible license.

> Can I know more about the optional part in the issue: "Optional: Review and optimize the complete shading package; implement cubic spline interpolation for type 0 (sampled) functions." Where can I get more information about the cubic spline interpolation for type 0 (sampled) functions, and in what aspects do you expect the optimization?

Optimization for speed. Especially the axial shading. It gets slow when the shaded area is very large. The cubic spline interpolation is mentioned in the PDF spec at the type 0 (sampled) functions; it is the part where order = 3. In the PDF spec, search for it, or for "Additional entries specific to a type 0 function dictionary". It's really just a nice-to-have and of low priority. There's a note from Adobe saying that it is not done for printing.

Tilman

On Tue, Mar 4, 2014 at 10:13 PM, Tilman Hausherr thaush...@t-online.de wrote:

On 04.03.2014 15:19, Thimal Kempitiya wrote:

> Hi, I checked the code related to the shading and studied the PDF spec related to type 6. As I see it is
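The loop optimization proposed in the thread above can be sketched roughly as follows. This is only an illustration under assumptions, not the PDFBox code: the class `AxialInputSketch` and its two methods are hypothetical, and only the variable names (`x1x0`, `y1y0`, `coords`, `x`, `y`, `w`, `h`) are taken from the quoted snippet. The `naive` method repeats both multiplications for every pixel; the `hoisted` method computes the x contributions once per column and the y contribution once per row.

```java
import java.util.Arrays;

final class AxialInputSketch {

    // Recomputes both products for every pixel, like the quoted nested loop.
    static double[] naive(int x, int y, int w, int h,
                          double x1x0, double y1y0, double[] coords) {
        double[] out = new double[w * h];
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                double inputValue = x1x0 * (x + i - coords[0]);
                inputValue += y1y0 * (y + j - coords[1]);
                out[j * w + i] = inputValue;
            }
        }
        return out;
    }

    // Precomputes the part that depends only on i, and hoists the part
    // that depends only on j out of the inner loop.
    static double[] hoisted(int x, int y, int w, int h,
                            double x1x0, double y1y0, double[] coords) {
        double[] xPart = new double[w];
        for (int i = 0; i < w; i++) {
            xPart[i] = x1x0 * (x + i - coords[0]);
        }
        double[] out = new double[w * h];
        for (int j = 0; j < h; j++) {
            double yPart = y1y0 * (y + j - coords[1]);
            for (int i = 0; i < w; i++) {
                out[j * w + i] = xPart[i] + yPart;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] coords = {1.5, -2.0, 10.0, 7.0};
        double[] a = naive(3, 4, 5, 6, 0.25, -0.75, coords);
        double[] b = hoisted(3, 4, 5, 6, 0.25, -0.75, coords);
        System.out.println("same values: " + Arrays.equals(a, b));
    }
}
```

As noted in the reply, a JIT compiler may already hoist the row term, and the function evaluation is likely the larger cost, so any gain would need to be measured.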
[jira] [Commented] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing
[ https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939008#comment-13939008 ]

Maruan Sahyoun commented on PDFBOX-1987:

PDFBOX-276 describes such a file. PDF.js has some files with invalid hex strings. There are some files which have missing CR and/or LF at the end of a stream ...

> Provide a PDF Lexer as a base for PDF parsing
>
> Key: PDFBOX-1987
> URL: https://issues.apache.org/jira/browse/PDFBOX-1987
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Reporter: Maruan Sahyoun
> Priority: Minor
> Fix For: 2.0.0
> Attachments: src.zip
>
> In order to enhance the parsing process and as a foundation for a combination of the different parsers a PDF lexer should be provided.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939027#comment-13939027 ]

Hannes Erven commented on PDFBOX-1512:

The issue is not related to a specific sorting algorithm. At the moment, the desired order of the elements is not sufficiently defined. Some algorithms don't care, which may result in inconsistent ordering across multiple calls; some algorithms (such as those Java 7 defaults to) detect this and throw an exception.
I did try hacking the Comparator, but so far it doesn't pass the TextExtract test cases :-(

> TextPositionComparator is not compatible with Java 7
>
> Key: PDFBOX-1512
> URL: https://issues.apache.org/jira/browse/PDFBOX-1512
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.1
> Environment: Java 7
> Reporter: Benjamin Papez
> Assignee: Andreas Lehmkühler
> Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf
>
> The TextPositionComparator causes the following exception running on Java 7:
>
>     Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2
>     Original cause: Comparison method violates its general contract!
>
> I think the problem is with this check:
>
>     if ( yDifference < .1 ||
>         (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) ||
>         (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom))
>
> as it violates the contract requirement:
>
>     The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0.
>     Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z.
>
> Java 7 is now strict and throws exceptions when the contract is violated.
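To illustrate why a tolerance-based check like the one quoted in the issue can break the Comparator contract, here is a minimal sketch. It is not the PDFBox comparator: `ToleranceComparatorSketch`, `FUZZY`, `BANDED` and `band` are hypothetical names, and the banded variant is just one possible transitive alternative, not the fix that was applied. With the values 0.0, 0.08 and 0.16, `FUZZY` reports the first two as equal and the last two as equal, yet orders the first before the last, which violates the "compare(x, y)==0 implies sgn(compare(x, z))==sgn(compare(y, z))" requirement.

```java
import java.util.Comparator;

final class ToleranceComparatorSketch {

    // Hypothetical comparator: values closer than 0.1 are reported as equal,
    // otherwise they are ordered numerically. "Equal" is not transitive here.
    static final Comparator<Double> FUZZY = new Comparator<Double>() {
        @Override
        public int compare(Double a, Double b) {
            if (Math.abs(a - b) < 0.1) {
                return 0;
            }
            return a < b ? -1 : 1;
        }
    };

    // One possible alternative (an assumption, not the PDFBox fix): map each
    // value to a fixed band of height 0.1 and compare the band indices.
    static final Comparator<Double> BANDED = new Comparator<Double>() {
        @Override
        public int compare(Double a, Double b) {
            return Long.compare(band(a), band(b));
        }
    };

    static long band(double v) {
        return (long) Math.floor(v / 0.1);
    }

    public static void main(String[] args) {
        double x = 0.0, y = 0.08, z = 0.16;
        System.out.println("fuzzy  x~y: " + FUZZY.compare(x, y)
                + ", y~z: " + FUZZY.compare(y, z)
                + ", x~z: " + FUZZY.compare(x, z));
        System.out.println("banded x~y: " + BANDED.compare(x, y)
                + ", y~z: " + BANDED.compare(y, z)
                + ", x~z: " + BANDED.compare(x, z));
    }
}
```

Banding is consistent but has its own trade-off: two values that lie very close together on either side of a band boundary are no longer treated as being on the same line.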
[jira] [Resolved] (PDFBOX-1969) JPEGFactory bug
[ https://issues.apache.org/jira/browse/PDFBOX-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr resolved PDFBOX-1969.

Resolution: Fixed

> JPEGFactory bug
>
> Key: PDFBOX-1969
> URL: https://issues.apache.org/jira/browse/PDFBOX-1969
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Steven Burg
> Fix For: 2.0.0
>
> Attempted to run the RubberStampWithImage sample and received the following errors:
>
>     Exception in thread "main" java.lang.NullPointerException
>         at org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory.createFromStream(JPEGFactory.java:72)
>         at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.doIt(RubberStampWithImage.java:93)
>         at org.apache.pdfbox.examples.pdmodel.RubberStampWithImage.main(RubberStampWithImage.java:185)
>
> This happens with any jpg I tested with.
Re: [GSoC 2014]Implement shading with Coons and tensor-product patch meshes
On 18.03.2014 10:41, Thimal Kempitiya wrote:

> I would be glad if you can give feedback on my proposal.

Hello,

The URL doesn't work except for you. The text appeared in the mentors list and I can also see it in the dashboard. I will give feedback there.

Tilman
[jira] [Updated] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-1466:

Attachment: pdfbox-1466-01-img10.png
            pdfbox-1466-01-img9.png

Here are two blurry images I found within the file. They appear in the rendering, but not in Adobe Reader. Very mysterious.

> Rendering of pattern colorspace fails
>
> Key: PDFBOX-1466
> URL: https://issues.apache.org/jira/browse/PDFBOX-1466
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 1.7.1, 1.8.4, 2.0.0
> Environment: Windows 7, JDK 1.6 / 1.7
> Reporter: Maurice Koch
> Labels: tilingpattern
> Fix For: 2.0.0
> Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png
>
> I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF.
> When trying to print with e.g. PDDocument.silentPrint I get the following info message:
>
>     [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead!
[jira] [Comment Edited] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934711#comment-13934711 ]

Tilman Hausherr edited comment on PDFBOX-1466 at 3/18/14 1:11 PM:

Here's a current rendering, it is almost perfect now. The only problem left is a weird shadow. I'm not sure whether the shadow is related to the pattern colorspace.

was (Author: tilman):
Here's a current rendering, it is almost perfect now. The only problem left is a weird shadow. I'm not sure whether the shadow is related to the pattern colorspace. It might be a similar problem as in PDFBOX-1830 and PDFBOX-1954 (line width).
[jira] [Updated] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maruan Sahyoun updated PDFBOX-1466:

Attachment: report_Seite_1_Bild_0004.png
            report_Seite_1_Bild_0003.png
            report_Seite_1_Bild_0002.png
            report_Seite_1_Bild_0001.png

These are the images in use within the PDF as exported by Adobe Acrobat.
[jira] [Commented] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939235#comment-13939235 ]

Maruan Sahyoun commented on PDFBOX-1466:

I added the images in use as exported by Adobe Acrobat. report_Seite_1_Bild_0004.png looks like it’s the same as pdfbox-1466-01-img10.png. For pdfbox-1466-01-img9.png this could be a mask as these are used within the PDF when inspecting how that was generated.
[jira] [Commented] (PDFBOX-1990) Support creating PDF from lossless encoded images
[ https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939325#comment-13939325 ]

Tilman Hausherr commented on PDFBOX-1990:

NullOutputStream optimized for speed in rev 1578940.

> Support creating PDF from lossless encoded images
>
> Key: PDFBOX-1990
> URL: https://issues.apache.org/jira/browse/PDFBOX-1990
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
>
> Currently we support the insertion of TIFF and JPEG into a PDF, but not PNG. We can pass a BufferedImage, but this one will be JPEG compressed which is not a good thing for graphics with sharp edges. I suggest that we support PNG as well. It is possible because the Flate Filter supports both directions. My implementation (coming in a few minutes) is just an RGB based start that begs for improvement.
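The remark that Flate works in both directions, and is therefore suitable for lossless image data, can be illustrated with the JDK's own `Deflater` and `Inflater`. The following is only a sketch under assumptions and does not use any PDFBox API: `FlateRgbSketch` and its methods are hypothetical names, and the 8-bit RGB byte layout is an assumption chosen to mirror the "RGB based start" mentioned in the issue. It extracts the RGB bytes of a BufferedImage, compresses them, decompresses them again, and checks that the bytes are unchanged.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateRgbSketch {

    // Packs the image as 3 bytes per pixel (R, G, B), row by row.
    static byte[] toRgbBytes(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        byte[] rgb = new byte[w * h * 3];
        int k = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int p = image.getRGB(x, y);
                rgb[k++] = (byte) (p >> 16);
                rgb[k++] = (byte) (p >> 8);
                rgb[k++] = (byte) p;
            }
        }
        return rgb;
    }

    // Compresses the bytes with the zlib/deflate format.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses into a buffer of the known original length.
    static byte[] inflate(byte[] data, int expectedLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] result = new byte[expectedLength];
        int offset = 0;
        try {
            while (offset < expectedLength && !inflater.finished()) {
                int n = inflater.inflate(result, offset, expectedLength - offset);
                if (n == 0) {
                    break; // no progress: input exhausted or dictionary needed
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        image.setRGB(1, 1, 0xFF8000);
        byte[] raw = toRgbBytes(image);
        byte[] restored = inflate(deflate(raw), raw.length);
        System.out.println("lossless round trip: " + Arrays.equals(raw, restored));
    }
}
```

Unlike a JPEG encode/decode cycle, the restored bytes are identical to the input, which is what matters for graphics with sharp edges.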
[jira] [Commented] (PDFBOX-1990) Support creating PDF from lossless encoded images
[ https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939626#comment-13939626 ]

John Hewson commented on PDFBOX-1990:

This looks good, a couple of random thoughts:

1) The static factory method doesn't need the word Lossless in it, because it's already in the factory name:

{code}
LosslessFactory.createLosslessFromImage(...)
{code}

vs.

{code}
LosslessFactory.createFromImage(...)
{code}

2) When naming variables for parameters in the public API, prefer short words over abbreviations, e.g.

{code}
BufferedImage bim ---> BufferedImage image
{code}

Also, if a local variable name is already a short word, prefer not abbreviating it:

{code}
Color co ---> Color color
{code}

:)
[jira] [Comment Edited] (PDFBOX-1990) Support creating PDF from lossless encoded images
[ https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939626#comment-13939626 ]

John Hewson edited comment on PDFBOX-1990 at 3/18/14 6:42 PM:

This looks good, a couple of random thoughts:

1) The static factory method doesn't need the word Lossless in it, because it's already in the factory name:

{code}
LosslessFactory.createLosslessFromImage(...)
{code}

vs.

{code}
LosslessFactory.createFromImage(...)
{code}

2) When naming variables for parameters in the public API, prefer short words over abbreviations, e.g.

{code}
BufferedImage bim ---> BufferedImage image
{code}

3) Also, if a local variable name is already a short word, prefer not abbreviating it:

{code}
Color co ---> Color color
{code}

:)
[jira] [Commented] (PDFBOX-1987) Provide a PDF Lexer as a base for PDF parsing
[ https://issues.apache.org/jira/browse/PDFBOX-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939630#comment-13939630 ]

John Hewson commented on PDFBOX-1987:

Thanks, it's a tricky problem to solve.
[jira] [Commented] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939639#comment-13939639 ]

John Hewson commented on PDFBOX-1512:

{quote}
Some algorithms don't care, which may result in inconsistent ordering across multiple calls
{quote}

I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one sort which is allowed.

Perhaps we need some more sophisticated rule for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance?
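For readers unfamiliar with the topological sorting mentioned in the comment above, here is a minimal sketch of Kahn's algorithm. It is purely illustrative and says nothing about how PDFBox orders text: `ReadingOrderSketch`, the string labels and the idea of expressing reading order as "must come before" pairs are all assumptions made for the example. Given a set of items and pairwise constraints, it returns one order that respects every constraint, and it reports a cycle when the constraints contradict each other.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReadingOrderSketch {

    // Kahn's algorithm: each edge {before, after} means "before" must
    // appear earlier than "after" in the result.
    static List<String> topologicalOrder(List<String> nodes, List<String[]> edges) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String n : nodes) {
            successors.put(n, new ArrayList<String>());
            inDegree.put(n, 0);
        }
        for (String[] e : edges) {
            successors.get(e[0]).add(e[1]);
            inDegree.put(e[1], inDegree.get(e[1]) + 1);
        }
        // Start with every node that has no predecessor.
        Deque<String> ready = new ArrayDeque<>();
        for (String n : nodes) {
            if (inDegree.get(n) == 0) {
                ready.add(n);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.remove();
            order.add(n);
            for (String s : successors.get(n)) {
                int remaining = inDegree.get(s) - 1;
                inDegree.put(s, remaining);
                if (remaining == 0) {
                    ready.add(s);
                }
            }
        }
        if (order.size() != nodes.size()) {
            throw new IllegalArgumentException("constraints contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("d", "b", "a", "c");
        List<String[]> edges = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "c"},
                new String[] {"b", "d"}, new String[] {"c", "d"});
        System.out.println(topologicalOrder(nodes, edges));
    }
}
```

A partial order like this generally admits several valid results, which matches the observation in the comment that the ordering is not sufficiently defined.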
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939639#comment-13939639 ] John Hewson edited comment on PDFBOX-1512 at 3/18/14 6:55 PM: -- {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rule for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance? was (Author: jahewson): {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one sort which allowed. Perhaps we need some more sophisticated rule for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance? TextPositionComparator is not compatible with Java 7 Key: PDFBOX-1512 URL: https://issues.apache.org/jira/browse/PDFBOX-1512 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.7.1 Environment: Java 7 Reporter: Benjamin Papez Assignee: Andreas Lehmkühler Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf The TextPostionCompartor causes the following exception running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract! 
I think the problem is with this check: if ( yDifference < .1 || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) as it violates the contract requirement: The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
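A minimal sketch of why an overlap-based "same line" test like the one quoted above may break the Comparator contract. The class `OverlapComparatorSketch`, the `Box` type and the sample coordinates are hypothetical; the real comparator works on TextPosition objects and goes on to compare x positions when two items share a line, which is simplified here to returning 0.

```java
public class OverlapComparatorSketch {
    /** Hypothetical stand-in for a text position; only its vertical extent is modelled. */
    public static final class Box {
        final float top;
        final float bottom;

        public Box(float top, float bottom) {
            this.top = top;
            this.bottom = bottom;
        }
    }

    /** Mirrors the quoted check: boxes that are close or overlap vertically compare as equal. */
    public static int compareByLine(Box pos1, Box pos2) {
        float yDifference = Math.abs(pos1.bottom - pos2.bottom);
        if (yDifference < .1
                || (pos2.bottom >= pos1.top && pos2.bottom <= pos1.bottom)
                || (pos1.bottom >= pos2.top && pos1.bottom <= pos2.bottom)) {
            return 0; // simplification: the real comparator would compare x here
        }
        return pos1.bottom < pos2.bottom ? -1 : 1;
    }

    /** True when compare(x, y) == 0 but x and y disagree about how they relate to z. */
    public static boolean breaksContract(Box x, Box y, Box z) {
        return compareByLine(x, y) == 0
                && Integer.signum(compareByLine(x, z)) != Integer.signum(compareByLine(y, z));
    }

    public static void main(String[] args) {
        Box a = new Box(0f, 10f);
        Box b = new Box(8f, 18f);  // overlaps a
        Box c = new Box(16f, 26f); // overlaps b, but not a
        System.out.println("a vs b: " + compareByLine(a, b));
        System.out.println("b vs c: " + compareByLine(b, c));
        System.out.println("a vs c: " + compareByLine(a, c));
        System.out.println("contract broken: " + breaksContract(a, b, c));
    }
}
```

With these values a and b compare as equal, b and c compare as equal, yet a sorts before c, which is the situation the second contract clause rules out.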
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939639#comment-13939639 ] John Hewson edited comment on PDFBOX-1512 at 3/18/14 6:55 PM: -- {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance? was (Author: jahewson): {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rule for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance? TextPositionComparator is not compatible with Java 7 Key: PDFBOX-1512 URL: https://issues.apache.org/jira/browse/PDFBOX-1512 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.7.1 Environment: Java 7 Reporter: Benjamin Papez Assignee: Andreas Lehmkühler Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf The TextPositionComparator causes the following exception running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
I think the problem is with this check: if ( yDifference < .1 || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) as it violates the contract requirement: The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939639#comment-13939639 ] John Hewson edited comment on PDFBOX-1512 at 3/18/14 6:55 PM: -- {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? Could [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] be of relevance? was (Author: jahewson): {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? Perhaps [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance? TextPositionComparator is not compatible with Java 7 Key: PDFBOX-1512 URL: https://issues.apache.org/jira/browse/PDFBOX-1512 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.7.1 Environment: Java 7 Reporter: Benjamin Papez Assignee: Andreas Lehmkühler Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf The TextPositionComparator causes the following exception running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
I think the problem is with this check: if ( yDifference < .1 || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) as it violates the contract requirement: The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939639#comment-13939639 ] John Hewson edited comment on PDFBOX-1512 at 3/18/14 6:56 PM: -- {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? Off the top of my head, it seems like [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance. was (Author: jahewson): {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance. TextPositionComparator is not compatible with Java 7 Key: PDFBOX-1512 URL: https://issues.apache.org/jira/browse/PDFBOX-1512 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.7.1 Environment: Java 7 Reporter: Benjamin Papez Assignee: Andreas Lehmkühler Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf The TextPositionComparator causes the following exception running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
I think the problem is with this check: if ( yDifference < .1 || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) as it violates the contract requirement: The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1512) TextPositionComparator is not compatible with Java 7
[ https://issues.apache.org/jira/browse/PDFBOX-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939639#comment-13939639 ] John Hewson edited comment on PDFBOX-1512 at 3/18/14 6:55 PM: -- {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] may be of relevance. was (Author: jahewson): {quote} Some algorithms don't care, which may result in inconsistent ordering across multiple calls {quote} I don't see how this would happen unless the algorithm was randomised. For a given input the output should always be the same, regardless. But as you say the ordering is not sufficiently defined, so there may be more than one solution. Perhaps we need some more sophisticated rules for determining reading order? Could [Topological sorting|http://en.wikipedia.org/wiki/Topological_sorting] be of relevance? TextPositionComparator is not compatible with Java 7 Key: PDFBOX-1512 URL: https://issues.apache.org/jira/browse/PDFBOX-1512 Project: PDFBox Issue Type: Bug Components: Text extraction Affects Versions: 1.7.1 Environment: Java 7 Reporter: Benjamin Papez Assignee: Andreas Lehmkühler Attachments: FOP-2252.pdf, TextPositionComparator.java, WFI_PDFParser_TextPostionComparator.txt, immo-kurier_arsenal_93x62.pdf The TextPositionComparator causes the following exception running on Java 7: Unexpected RuntimeException from org.apache.tika.parser.ParserDecorator$1@9007fa2 Original cause: Comparison method violates its general contract!
I think the problem is with this check: if ( yDifference < .1 || (pos2YBottom >= pos1YTop && pos2YBottom <= pos1YBottom) || (pos1YBottom >= pos2YTop && pos1YBottom <= pos2YBottom)) as it violates the contract requirement: The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. Java 7 now is strict and throws exceptions when the contract is violated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939651#comment-13939651 ] John Hewson commented on PDFBOX-1466: - The two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be able to see them though. Rendering of pattern colorspace fails - Key: PDFBOX-1466 URL: https://issues.apache.org/jira/browse/PDFBOX-1466 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.7.1, 1.8.4, 2.0.0 Environment: Windows 7, JDK 1.6 / 1.7 Reporter: Maurice Koch Labels: tilingpattern Fix For: 2.0.0 Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png, report_Seite_1_Bild_0001.png, report_Seite_1_Bild_0002.png, report_Seite_1_Bild_0003.png, report_Seite_1_Bild_0004.png I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF. When trying to print with e.g. PDDocument.silentPrint I get the following info message: [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939651#comment-13939651 ] John Hewson edited comment on PDFBOX-1466 at 3/18/14 7:00 PM: -- Tilman, the two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be order to see them though. was (Author: jahewson): The two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be order to see them though. Rendering of pattern colorspace fails - Key: PDFBOX-1466 URL: https://issues.apache.org/jira/browse/PDFBOX-1466 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.7.1, 1.8.4, 2.0.0 Environment: Windows 7, JDK 1.6 / 1.7 Reporter: Maurice Koch Labels: tilingpattern Fix For: 2.0.0 Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png, report_Seite_1_Bild_0001.png, report_Seite_1_Bild_0002.png, report_Seite_1_Bild_0003.png, report_Seite_1_Bild_0004.png I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF. When trying to print with e.g. PDDocument.silentPrint I get the following info message: [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939651#comment-13939651 ] John Hewson edited comment on PDFBOX-1466 at 3/18/14 7:01 PM: -- Tilman, the two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be able to see them. was (Author: jahewson): Tilman, the two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be able to see them though. Rendering of pattern colorspace fails - Key: PDFBOX-1466 URL: https://issues.apache.org/jira/browse/PDFBOX-1466 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.7.1, 1.8.4, 2.0.0 Environment: Windows 7, JDK 1.6 / 1.7 Reporter: Maurice Koch Labels: tilingpattern Fix For: 2.0.0 Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png, report_Seite_1_Bild_0001.png, report_Seite_1_Bild_0002.png, report_Seite_1_Bild_0003.png, report_Seite_1_Bild_0004.png I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF. When trying to print with e.g. PDDocument.silentPrint I get the following info message: [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939651#comment-13939651 ] John Hewson edited comment on PDFBOX-1466 at 3/18/14 7:00 PM: -- Tilman, the two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be able to see them though. was (Author: jahewson): Tilman, the two blurry images do appear in Adobe Reader, over the green border around the star. You have to zoom in to around 2000% to be order to see them though. Rendering of pattern colorspace fails - Key: PDFBOX-1466 URL: https://issues.apache.org/jira/browse/PDFBOX-1466 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.7.1, 1.8.4, 2.0.0 Environment: Windows 7, JDK 1.6 / 1.7 Reporter: Maurice Koch Labels: tilingpattern Fix For: 2.0.0 Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png, report_Seite_1_Bild_0001.png, report_Seite_1_Bild_0002.png, report_Seite_1_Bild_0003.png, report_Seite_1_Bild_0004.png I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF. When trying to print with e.g. PDDocument.silentPrint I get the following info message: [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead! -- This message was sent by Atlassian JIRA (v6.2#6252)
Removing processStream and processSubStream
Hi All,

I’m still working on getting Tiling Patterns to render correctly, and need to make some changes to core PDFBox functionality in order to proceed. My problem is that tiling patterns are defined in their parent stream’s initial coordinate space, rather than the coordinate space defined by the CTM. However, in PDFBox there is no way to access the parent stream, so I can’t find out what its initial matrix is. The manner in which the initial coordinate space is determined is different for pages, forms, and patterns.

What this means is that the parent stream’s initial coordinate space needs to be passed to processStream and processSubStream in PDFStreamEngine. This will necessarily be a breaking change, and it will affect all downstream subclasses of PDFStreamEngine.

Because this has to be a breaking change, I propose that we go all the way and make the new API bulletproof, 1) so that we won’t have to introduce breaking changes in the future if we encounter similar issues, and 2) so that the caller of the method can’t pass the wrong data in the parameters.

We would remove the two generic methods:

public void processStream(PDResources resources, COSStream cosStream, PDRectangle drawingSize, int rotation)
public void processSubStream(PDResources resources, COSStream cosStream)

and replace them with four specific methods:

public void processPage(PDPage page)
public void processForm(PDFormXObject form)
public void processTilingPattern(PDTilingPattern pattern)
public void processType3Font(PDType3Font font)

This would mean that the various “process” methods have access to their parent stream, and can read any of its public fields in the future without introducing breaking changes by altering the method’s parameters.

What do you think?

-- John
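To make the idea concrete, here is a rough sketch of how typed entry points could let a pattern find its parent's initial matrix. Everything in it is an assumption for illustration: `Page`, `Pattern`, the stack of parent matrices and `lastPatternSpace` are hypothetical stand-ins, not PDFBox classes or the design that was eventually committed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TypedEntryPointsSketch {
    /** Hypothetical stand-in for a tiling pattern. */
    public static final class Pattern {
        final String name;

        public Pattern(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a page: it owns its initial matrix and may use one pattern. */
    public static final class Page {
        final double[] initialMatrix; // six-element affine matrix
        final Pattern pattern;        // may be null

        public Page(double[] initialMatrix, Pattern pattern) {
            this.initialMatrix = initialMatrix;
            this.pattern = pattern;
        }
    }

    // Initial matrices of the streams currently being processed, innermost first.
    private final Deque<double[]> parents = new ArrayDeque<>();
    private double[] lastPatternSpace;

    /** Typed entry point: the engine reads what it needs from the page itself. */
    public void processPage(Page page) {
        parents.push(page.initialMatrix);
        try {
            if (page.pattern != null) {
                processTilingPattern(page.pattern);
            }
        } finally {
            parents.pop();
        }
    }

    /** The pattern is placed in its parent's initial space, not in the current CTM. */
    public void processTilingPattern(Pattern pattern) {
        if (parents.isEmpty()) {
            throw new IllegalStateException("pattern " + pattern.name + " has no parent stream");
        }
        lastPatternSpace = parents.peek();
    }

    public double[] lastPatternSpace() {
        return lastPatternSpace;
    }

    public static void main(String[] args) {
        double[] pageMatrix = {1, 0, 0, -1, 0, 792}; // hypothetical values
        TypedEntryPointsSketch engine = new TypedEntryPointsSketch();
        engine.processPage(new Page(pageMatrix, new Pattern("P1")));
        System.out.println(engine.lastPatternSpace() == pageMatrix);
    }
}
```

Because callers hand over the page rather than a loose set of resources, stream, size and rotation, there is no parameter list to get wrong, and the engine can start reading further fields later without another signature change.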
[jira] [Commented] (PDFBOX-1466) Rendering of pattern colorspace fails
[ https://issues.apache.org/jira/browse/PDFBOX-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939856#comment-13939856 ] Tilman Hausherr commented on PDFBOX-1466: - Yeah, looking there at 2000%, a blurry effect appears for a short time on the green star before the red star is painted, and after that a trace of it is still visible. Rendering of pattern colorspace fails - Key: PDFBOX-1466 URL: https://issues.apache.org/jira/browse/PDFBOX-1466 Project: PDFBox Issue Type: Bug Components: Rendering Affects Versions: 1.7.1, 1.8.4, 2.0.0 Environment: Windows 7, JDK 1.6 / 1.7 Reporter: Maurice Koch Labels: tilingpattern Fix For: 2.0.0 Attachments: pdfbox-1466-01-img10.png, pdfbox-1466-01-img9.png, pdfbox-1466.pdf-1.png, report.pdf, report.png, report_Seite_1_Bild_0001.png, report_Seite_1_Bild_0002.png, report_Seite_1_Bild_0003.png, report_Seite_1_Bild_0004.png I was trying to print a PDF which was generated by iText v2.1.5. Unfortunately parts of it were printed in white – the filling color was missing. I could reduce the problem to the attached PDF. When trying to print with e.g. PDDocument.silentPrint I get the following info message: [INFO] [org.apache.pdfbox.pdfviewer.PageDrawer] ColorSpace Pattern doesn't provide a non-stroking color, using white instead! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (PDFBOX-1990) Support creating PDF from lossless encoded images
[ https://issues.apache.org/jira/browse/PDFBOX-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939884#comment-13939884 ] Tilman Hausherr commented on PDFBOX-1990: - Done as suggested in rev 1579073 and 1579074. Support creating PDF from lossless encoded images - Key: PDFBOX-1990 URL: https://issues.apache.org/jira/browse/PDFBOX-1990 Project: PDFBox Issue Type: Improvement Reporter: Tilman Hausherr Priority: Minor Currently we support the insertion of TIFF and JPEG into a PDF, but not PNG. We can pass a BufferedImage, but this one will be JPEG compressed which is not a good thing for graphics with sharp edges. I suggest that we support PNG as well. It is possible because the Flate Filter supports both directions. My implementation (coming in a few minutes) is just an RGB based start that begs for improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)
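For illustration only, a small sketch of the lossless idea described in the issue above: pack a BufferedImage as 8-bit RGB samples and run them through Deflate and back. It uses java.util.zip directly rather than PDFBox's Flate filter, and `FlateRgbSketch` and its methods are hypothetical names, not the code committed in the revisions mentioned.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateRgbSketch {
    /** Packs the image as 8-bit R, G, B triples, row by row (alpha is dropped). */
    public static byte[] toRgbBytes(BufferedImage img) {
        int w = img.getWidth();
        int h = img.getHeight();
        byte[] out = new byte[w * h * 3];
        int i = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                out[i++] = (byte) (rgb >> 16);
                out[i++] = (byte) (rgb >> 8);
                out[i++] = (byte) rgb;
            }
        }
        return out;
    }

    /** Compresses the samples with Deflate, the algorithm behind the Flate filter. */
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            bos.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return bos.toByteArray();
    }

    /** Restores exactly rawLength bytes; corrupt input surfaces as an unchecked exception. */
    public static byte[] inflate(byte[] packed, int rawLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        byte[] out = new byte[rawLength];
        int n = 0;
        try {
            while (n < rawLength && !inflater.finished()) {
                int read = inflater.inflate(out, n, rawLength - n);
                if (read == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of spinning
                }
                n += read;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid Deflate data", e);
        } finally {
            inflater.end();
        }
        return out;
    }
}
```

Unlike the JPEG path, the bytes that come back from `inflate` are identical to the ones that went into `deflate`, which is why sharp edges would survive.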
[jira] [Created] (PDFBOX-1991) Shading PaintContexts should not depend on the page height
John Hewson created PDFBOX-1991: --- Summary: Shading PaintContexts should not depend on the page height Key: PDFBOX-1991 URL: https://issues.apache.org/jira/browse/PDFBOX-1991 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor I'd like to remove the page height parameter from PDPattern as soon as possible because of doubts over its safety (i.e. the current stream being processed may be a pattern or a form, not a page). Before I do that we need to remove its only use, which is... The page height is passed to all shading PaintContext subclasses but it is only used in GouraudShadingContext. However, all other drawing in PDFBox is done using the native PDF y-axis, which is flipped via a call to Graphics2D#scale(1, -1), whereas the following code in GouraudShadingContext flips the y-axis itself: {code} v.point = new Point.Double(v.point.getX(), pageHeight + xform.getTranslateY() - v.point.getY()); {code} So it seems like this could be removed and the y-axis inversion done elsewhere with either a Matrix, AffineTransform or Graphics2D#scale. -- This message was sent by Atlassian JIRA (v6.2#6252)
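As a rough sketch of the last suggestion, the per-point arithmetic can be expressed as a single AffineTransform built from a translate and a scale(1, -1). The class and method names are hypothetical, and the xform.getTranslateY() term from the quoted line is left out to keep the comparison simple.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class YFlipSketch {
    /** Maps y-up coordinates onto a y-down surface of the given height. */
    public static AffineTransform flipY(double height) {
        AffineTransform at = new AffineTransform();
        at.translate(0, height); // applied to points last: shift the flipped space back into view
        at.scale(1, -1);         // applied to points first: invert y, leave x alone
        return at;
    }

    /** Per-point arithmetic equivalent to the transform above. */
    public static Point2D flipManually(Point2D p, double height) {
        return new Point2D.Double(p.getX(), height - p.getY());
    }

    public static void main(String[] args) {
        double height = 792; // hypothetical page height in points
        Point2D p = new Point2D.Double(100, 200);
        Point2D viaTransform = flipY(height).transform(p, null);
        System.out.println(viaTransform + " vs " + flipManually(p, height));
    }
}
```

Doing the inversion once in a transform would mean the shading context no longer needs to know the page height at all.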
[jira] [Updated] (PDFBOX-1991) Shading PaintContexts should not depend on the page height
[ https://issues.apache.org/jira/browse/PDFBOX-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Hewson updated PDFBOX-1991: Description: I'd like to remove the page height parameter from PDPattern as soon as possible because of doubts over its safety (i.e. the current stream being processed may be a pattern or a form, not a page). Before I do that we need to remove its only use, which is... The page height is passed to all shading PaintContext subclasses but it is only used in GouraudShadingContext. However, all other drawing in PDFBox is done using the native PDF y-axis, which is flipped via a call to Graphics2D#scale(1, -1), whereas the following code in GouraudShadingContext flips the y-axis itself: v.point = new Point.Double(v.point.getX(), pageHeight + xform.getTranslateY() - v.point.getY()); So it seems like this could be removed and the y-axis inversion done elsewhere with either a Matrix, AffineTransform or Graphics2D#scale. was: I'd like to remove the page height parameter from PDPattern as soon as possible because of doubts over its safety (i.e. the current stream being processed may be a pattern or a form, not a page). Before I do that we need to remove its only use, which is... The page height is passed to all shading PaintContext subclasses but it is only used in GouraudShadingContext. However, all other drawing in PDFBox is done using the native PDF y-axis, which is flipped via a call to Graphics2D#scale(1, -1), whereas the following code in GouraudShadingContext flips the y-axis itself: {code} v.point = new Point.Double(v.point.getX(), pageHeight + xform.getTranslateY() - v.point.getY()); {code} So it seems like this could be removed and the y-axis inversion done elsewhere with either a Matrix, AffineTransform or Graphics2D#scale.
Shading PaintContexts should not depend on the page height -- Key: PDFBOX-1991 URL: https://issues.apache.org/jira/browse/PDFBOX-1991 Project: PDFBox Issue Type: Improvement Components: Rendering Affects Versions: 2.0.0 Reporter: John Hewson Priority: Minor I'd like to remove the page height parameter from PDPattern as soon as possible because of doubts over its safety (i.e. the current stream being processed may be a pattern or a form, not a page). Before I do that we need to remove its only use, which is... The page height is passed to all shading PaintContext subclasses but it is only used in GouraudShadingContext. However, all other drawing in PDFBox is done using the native PDF y-axis, which is flipped via a call to Graphics2D#scale(1, -1), whereas the following code in GouraudShadingContext flips the y-axis itself: v.point = new Point.Double(v.point.getX(), pageHeight + xform.getTranslateY() - v.point.getY()); So it seems like this could be removed and the y-axis inversion done elsewhere with either a Matrix, AffineTransform or Graphics2D#scale. -- This message was sent by Atlassian JIRA (v6.2#6252)