[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063218#comment-14063218
 ] 

Tilman Hausherr commented on PDFBOX-1915:
-

I committed your radial shading optimizations in rev 1610917 in the trunk. I 
didn't measure the exact times, but PDFBOX-1764 and PDFBOX-1416 are much much 
faster.

 Implement shading with Coons and tensor-product patch meshes
 

 Key: PDFBOX-1915
 URL: https://issues.apache.org/jira/browse/PDFBOX-1915
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 1.8.5, 1.8.6, 2.0.0
Reporter: Tilman Hausherr
Assignee: Shaola Ren
  Labels: graphical, gsoc2014, java, math, shading
 Fix For: 2.0.0

 Attachments: CIB-coons-vs-tensormesh.pdf, CIB-coonsmesh.pdf, 
 CONICAL.pdf, GWG060_Shading_x1a.pdf, GWG060_Shading_x1a_1.png, HSBWHEEL.pdf, 
 McAfee-ShadingType7.pdf, Shadingtype6week1.pdf, TENSOR.pdf, XYZsweep.pdf, 
 _gwg060_shading_x1a.pdf-1.png, _mcafee-shadingtype7.pdf-1.png, 
 asy-coons-but-really-tensor.pdf, asy-tensor-rainbow.pdf, asy-tensor.pdf, 
 ch14.pdf, coons-function.pdf, coons-function.ps, coons-nofunction-CMYK.pdf, 
 coons-nofunction-CMYK.ps, coons-nofunction-Duotone.pdf, 
 coons-nofunction-Duotone.ps, coons-nofunction-Gray.pdf, 
 coons-nofunction-Gray.ps, coons-nofunction-RGB.pdf, coons-nofunction-RGB.ps, 
 coons2-function.pdf, coons2-function.ps, coons4-function.ps, crestron-p9.pdf, 
 eci_altona-test-suite-v2_technical_H.pdf, example_030.pdf, failedTest.rar, 
 lamp_cairo.pdf, lamp_cairo7_0.png, lamp_cairo7_1.png, lamp_cairo7_1.png, 
 lineRasterization.jpg, mcafeeU5.pdf, mcafeeU5_1.png, mcafeeu5.pdf-1.png, 
 pass4FlagTest.rar, patchCases.jpg, patchMap.jpg, shading6ContourTest.rar, 
 shading6Done.rar, shading7.rar, tensor-nofunction-RGB.pdf, 
 tensor-nofunction-RGB.ps, tensor-nofunction-RGB_1.png, 
 tensor4-nofunction.pdf, tensor4-nofunction.ps, tensor4-nofunction_1.png, 
 updateshading6ContourTest.rar


 Of the seven shading methods described in the PDF specification, type 6 
 (Coons patch meshes) and type 7 (Tensor-product patch meshes) haven't been 
 implemented. I have done types 1, 4 and 5, but I don't know the math for types 
 6 and 7. My math days are decades behind me.
 Knowledge prerequisites: 
 - Java: you don't have to be a Java ace, just feel comfortable with it
 - math: you should know what cubic Bézier curves, degenerate Bézier 
 curves, bilinear interpolation, tensor products, affine transformation 
 matrices and Bernstein polynomials are, or be able to learn about them
 - maven (basic)
 - svn (basic)
 - an IDE like Netbeans or Eclipse or IntelliJ (basic)
 - ideally, you are either a math student who likes to program, or a computer 
 science student who is specializing in graphics.
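As a refresher on the first and last math items above, here is a minimal Java sketch. It is not PDFBox code, and `CubicBezier` and its methods are made-up names. It evaluates a cubic Bézier curve as a weighted sum of its four control points, using the cubic Bernstein polynomials as the weights:

```java
/** Minimal sketch: evaluate a cubic Bézier curve with the Bernstein basis. Not PDFBox code. */
final class CubicBezier
{
    /** Cubic Bernstein polynomial B(i,3) at t, for i = 0..3. */
    static double bernstein3(int i, double t)
    {
        double s = 1 - t;
        switch (i)
        {
            case 0: return s * s * s;
            case 1: return 3 * t * s * s;
            case 2: return 3 * t * t * s;
            case 3: return t * t * t;
            default: throw new IllegalArgumentException("i must be 0..3");
        }
    }

    /** Point on the curve at t in [0,1]; px/py hold the x/y of the 4 control points. */
    static double[] point(double[] px, double[] py, double t)
    {
        double x = 0, y = 0;
        for (int i = 0; i < 4; i++)
        {
            double b = bernstein3(i, t);
            x += b * px[i];
            y += b * py[i];
        }
        return new double[] { x, y };
    }

    public static void main(String[] args)
    {
        double[] mid = point(new double[] { 0, 0, 1, 1 }, new double[] { 0, 1, 1, 0 }, 0.5);
        System.out.println(mid[0] + ", " + mid[1]); // prints "0.5, 0.75"
    }
}
```

The four weights always sum to 1, so the curve stays inside the convex hull of its control points. Type 6 and 7 patches build on the same property.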
 A first look at PDFBox: try the PDFToImage command-line utility here:
 https://pdfbox.apache.org/commandline/#pdfToImage
 with your favorite PDF, or with the PDFs mentioned in PDFBOX-615, which use 
 the shading types that are already implemented.
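 Some simple source code to convert each page to an image (the import for 
 RenderUtil is left out because its package is not given here):
 import java.awt.image.BufferedImage;
 import java.io.File;
 import java.util.List;
 import javax.imageio.ImageIO;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.pdmodel.PDPage;
 String filename = "blah.pdf";
 PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
 try
 {
     List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
     int page = 0;
     for (PDPage pdPage : pdPages)
     {
         ++page;
         // TYPE_INT_RGB keeps color gradients; TYPE_BYTE_BINARY would reduce shadings to black and white
         BufferedImage bim = RenderUtil.convertToImage(pdPage, BufferedImage.TYPE_INT_RGB, 300);
         ImageIO.write(bim, "png", new File(filename + page + ".png"));
     }
 }
 finally
 {
     document.close();
 }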
 You are not starting from scratch. The implementation of type 4 and 5 shows 
 you how to read parameters from the PDF and set the graphics. You don't have 
 to learn the complete PDF spec, only 15 pages related to the two shading 
 types, and 6 pages about shading in general. The PDF specification is here:
 http://www.adobe.com/devnet/pdf/pdf_reference.html
 The tricky parts are:
 - deciding whether a point (x, y) is inside or outside a patch
 - deciding the color of a point within the patch
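One possible way around both tricky parts is to avoid inverting the patch entirely. This is offered only as a sketch, not necessarily what PDFBox ended up doing. The idea is to step through the patch parameters (u, v), map each sample to a point with the Coons surface formula from the PDF specification, and give that point the bilinearly interpolated corner color. The class and method names below are hypothetical. The four boundary curves are cubic Béziers and must share their corner points.

```java
/**
 * Sketch of the two formulas behind a type 6 (Coons) patch, following the
 * PDF specification. Class and method names are hypothetical, not PDFBox code.
 */
final class CoonsPatchSketch
{
    /** Point on a cubic Bézier curve at t; p holds four control points {x, y}. */
    static double[] bezier(double[][] p, double t)
    {
        double s = 1 - t;
        double b0 = s * s * s, b1 = 3 * t * s * s, b2 = 3 * t * t * s, b3 = t * t * t;
        return new double[] {
            b0 * p[0][0] + b1 * p[1][0] + b2 * p[2][0] + b3 * p[3][0],
            b0 * p[0][1] + b1 * p[1][1] + b2 * p[2][1] + b3 * p[3][1] };
    }

    /**
     * Coons surface S(u, v). c1/c2 are the bottom/top curves (parameter u),
     * d1/d2 the left/right curves (parameter v). Corners must be shared:
     * c1(0) = d1(0), c1(1) = d2(0), c2(0) = d1(1), c2(1) = d2(1).
     */
    static double[] surface(double[][] c1, double[][] c2, double[][] d1, double[][] d2,
                            double u, double v)
    {
        double[] bottom = bezier(c1, u), top = bezier(c2, u);
        double[] left = bezier(d1, v), right = bezier(d2, v);
        double[] s = new double[2];
        for (int k = 0; k < 2; k++)
        {
            // bilinear blend of the corners, subtracted because both curve pairs include them
            double corners = (1 - v) * ((1 - u) * c1[0][k] + u * c1[3][k])
                           + v * ((1 - u) * c2[0][k] + u * c2[3][k]);
            s[k] = (1 - v) * bottom[k] + v * top[k] + (1 - u) * left[k] + u * right[k] - corners;
        }
        return s;
    }

    /** Bilinear interpolation of the colors at (0,0), (1,0), (0,1) and (1,1) in (u, v) space. */
    static float[] color(float[] c00, float[] c10, float[] c01, float[] c11, double u, double v)
    {
        float[] out = new float[c00.length];
        for (int i = 0; i < out.length; i++)
        {
            out[i] = (float) ((1 - u) * (1 - v) * c00[i] + u * (1 - v) * c10[i]
                    + (1 - u) * v * c01[i] + u * v * c11[i]);
        }
        return out;
    }
}
```

The (u, v) steps need to be small enough that neighboring samples land on adjacent device pixels; otherwise gaps appear. Choosing that step size is not covered by this sketch.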
 To get an idea about the code, look at the classes GouraudTriangle, 
 GouraudShadingContext, Type4ShadingContext and Vertex here
 https://svn.apache.org/viewvc/pdfbox/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/
 or download the whole project from the repository.
 https://pdfbox.apache.org/downloads.html#scm
 If you want to see the existing code in the debugger with a Gouraud shading, 
 try this file:
 http://asymptote.sourceforge.net/gallery/Gouraud.pdf
 Testing:
 I have attached several example PDFs. To see which one has which shading, 
 open them with an editor like NOTEPAD++ and search for "/ShadingType" 
 (without the quotes). If your images render like the example PDFs, 
 then you were successful.
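The Notepad++ check can be roughly automated. The hypothetical `ShadingTypeScan` class below finds only `/ShadingType` entries stored as plain text. Like a text editor, it therefore misses shadings inside compressed object streams.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough sketch: list the /ShadingType values that appear literally in a PDF's bytes. */
final class ShadingTypeScan
{
    private static final Pattern SHADING_TYPE = Pattern.compile("/ShadingType\\s+(\\d{1,9})");

    static Set<Integer> scan(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to one char, so binary stream data cannot break decoding
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Set<Integer> types = new TreeSet<>();
        Matcher m = SHADING_TYPE.matcher(text);
        while (m.find())
        {
            types.add(Integer.parseInt(m.group(1)));
        }
        return types;
    }

    public static void main(String[] args) throws IOException
    {
        for (String name : args)
        {
            System.out.println(name + ": " + scan(Files.readAllBytes(Paths.get(name))));
        }
    }
}
```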
 Optional:
 Review and optimize the complete shading package for speed; implement cubic 
 

Build failed in Jenkins: PDFBox-ant #1441

2014-07-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-ant/1441/changes

Changes:

[tilman] PDFBOX-1915: Optimization of radial shading by Shaola Ren as part of 
GSoC2014

--
Started by an SCM change
Building remotely on ubuntu-6 (Ubuntu ubuntu) in workspace 
https://builds.apache.org/job/PDFBox-ant/ws/
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2014-07-16T06:49:16.119 +'
U 
pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/RadialShadingContext.java
At revision 1610917
FATAL: Cannot find executable from the chosen Ant installation Ant 1.7.0
Build step 'Invoke Ant' marked build as failure


[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063235#comment-14063235
 ] 

Tilman Hausherr commented on PDFBOX-1915:
-

About 
{quote}
Technically, I can use a 2D array to store the pixels' color instead of a 
hashmap in type 6/7 shading
{quote}
IMHO we can leave it as it is now, i.e. with the hash map. I'll review and commit 
the changes to types 6 and 7 later today.
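For illustration only, the two options from the quoted question might look like the classes below. These are hypothetical classes, not the PatchMeshesShadingContext code. A hash map keyed by pixel coordinates holds only the pixels that patches actually cover. A dense `int[][]` over the bounding box trades memory for plain indexed access.

```java
import java.awt.Point;
import java.util.HashMap;
import java.util.Map;

/** Two hypothetical ways to remember the ARGB color computed for each device pixel. */
final class PixelColorStores
{
    /** Sparse: memory grows with the number of pixels actually colored. */
    static final class MapStore
    {
        private final Map<Point, Integer> colors = new HashMap<>();

        void put(int x, int y, int argb)
        {
            colors.put(new Point(x, y), argb);
        }

        /** Returns null when the pixel was never colored (outside every patch). */
        Integer get(int x, int y)
        {
            return colors.get(new Point(x, y));
        }
    }

    /** Dense: one int per pixel of the bounding box; 0 means "not colored" (so transparent black is ambiguous). */
    static final class ArrayStore
    {
        private final int minX, minY;
        private final int[][] colors;

        ArrayStore(int minX, int minY, int width, int height)
        {
            this.minX = minX;
            this.minY = minY;
            this.colors = new int[height][width];
        }

        void put(int x, int y, int argb)
        {
            colors[y - minY][x - minX] = argb;
        }

        int get(int x, int y)
        {
            return colors[y - minY][x - minX];
        }
    }
}
```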


[jira] [Commented] (PDFBOX-2209) [PATCH] Restore shading API

2014-07-16 Thread simon steiner (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063260#comment-14063260
 ] 

simon steiner commented on PDFBOX-2209:
---

It's not open source.

 [PATCH] Restore shading API
 ---

 Key: PDFBOX-2209
 URL: https://issues.apache.org/jira/browse/PDFBOX-2209
 Project: PDFBox
  Issue Type: Wish
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: shading.patch


 Some of the shading API is gone in 2.0; can we have it back so we can convert 
 PDF to PostScript in FOP?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs

2014-07-16 Thread simon steiner (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063263#comment-14063263
 ] 

simon steiner commented on PDFBOX-2210:
---

If we get the glyph path without the transform, we can store it once inside a 
PostScript function and reuse it each time the glyph is drawn in a document.
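Here is a sketch of the idea with `java.awt.geom`. It assumes a glyph outline can be obtained in glyph space, without the text transform. Each outline is parsed once and cached, and the per-occurrence transform is applied only when drawing. `GlyphCache` and `OutlineLoader` are hypothetical names. In FOP's PostScript output, the cached outline would correspond to the function defined once.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache of untransformed glyph outlines, keyed by glyph code. */
final class GlyphCache
{
    /** Hypothetical source of glyph outlines in glyph space (e.g. a font parser). */
    interface OutlineLoader
    {
        GeneralPath load(int glyphCode);
    }

    private final Map<Integer, GeneralPath> outlines = new HashMap<>();
    private final OutlineLoader loader;
    private int loads = 0;

    GlyphCache(OutlineLoader loader)
    {
        this.loader = loader;
    }

    /** Outline in glyph space; loaded at most once per glyph code. */
    GeneralPath outline(int glyphCode)
    {
        GeneralPath path = outlines.get(glyphCode);
        if (path == null)
        {
            path = loader.load(glyphCode);
            outlines.put(glyphCode, path);
            loads++;
        }
        return path;
    }

    /** Applies the per-occurrence transform; returns a new shape and leaves the cached one untouched. */
    Shape shapeAt(int glyphCode, AffineTransform textMatrix)
    {
        return textMatrix.createTransformedShape(outline(glyphCode));
    }

    int loadCount()
    {
        return loads;
    }
}
```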

 [PATCH] Allow caching of glyphs
 ---

 Key: PDFBOX-2210
 URL: https://issues.apache.org/jira/browse/PDFBOX-2210
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: John Hewson
 Attachments: drawglyphs.patch


 If you separate the transform from the glyph, we can reuse glyphs in FOP's 
 PostScript output and get smaller output files





[jira] [Created] (PDFBOX-2212) OutOfMemoryError in GlyfCompositeDescrip

2014-07-16 Thread Valdis Andersons (JIRA)
Valdis Andersons created PDFBOX-2212:


 Summary: OutOfMemoryError in GlyfCompositeDescrip
 Key: PDFBOX-2212
 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox, Preflight
Affects Versions: 1.8.6
 Environment: Windows 7, JDK6
Reporter: Valdis Andersons


Hi All,
 
The application I’m working on is a web service that accepts PDF documents and 
combines them into a single larger PDF: a client submits a bunch of PDFs and we 
create a single PDF out of them. In some rare cases one of the submitted PDF 
documents has a glitch that causes Adobe Reader to throw errors when 
viewing the final document (attached).
When I tried to check the buggy PDF with the approach outlined here:
 
https://pdfbox.apache.org/cookbook/pdfavalidation.html
 
I got an OutOfMemoryError in the GlyfCompositeDescript class; here is 
the full stack trace:
 
java.lang.OutOfMemoryError: Java heap space
    at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:58)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:62)
    at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:69)
    at org.apache.fontbox.ttf.TrueTypeFont.initializeTable(TrueTypeFont.java:280)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:128)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:80)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:109)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:84)
    at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:25)
    at org.apache.pdfbox.preflight.font.descriptor.TrueTypeDescriptorHelper.processFontFile(TrueTypeDescriptorHelper.java:84)
    at org.apache.pdfbox.preflight.font.descriptor.FontDescriptorHelper.validate(FontDescriptorHelper.java:97)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.processFontDescriptorValidation(SimpleFontValidator.java:82)
    at org.apache.pdfbox.preflight.font.SimpleFontValidator.validate(SimpleFontValidator.java:55)
    at org.apache.pdfbox.preflight.process.reflect.FontValidationProcess.validate(FontValidationProcess.java:69)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateFonts(ResourcesValidationProcess.java:96)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:74)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
    at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:77)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:191)
    at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:78)
    at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:73)
    at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:52)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validateXObjectResources(XObjFormValidator.java:178)
    at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:75)
 
While I can’t send the PDF in question because its contents are sensitive, 
I did some digging and debugging to find out why this is 
happening.
In the GlyfCompositeDescript class's constructor there is a do … while loop that 
constructs GlyfCompositeComp objects and adds them to the components 
list of 

[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs

2014-07-16 Thread Petr Slaby (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063320#comment-14063320
 ] 

Petr Slaby commented on PDFBOX-2210:


We have a similar problem. In our application, we produce (among others) PCL 
and AFP output. For PDFBox, we have a PCL- and AFP-specific implementation of 
Graphics2D which produces the commands in the respective printer language. In 
the old solution, fillGlyphVector or drawGlyphVector was called for printing 
characters using AWT fonts. From the glyph vector, we were able to get at the 
AWT font and the character(s) being printed. From that, we were able to pick an 
existing PCL or AFP font if an equivalent for the AWT font was configured, or 
produce an on-the-fly font and embed it into the output. With the current 
solution, we just get a shape and do not even know that it comes from 
rendering text. I have not tried to solve this yet, but I think I will probably 
need PageDrawer.drawGlyph2D() to become part of the API (make it protected 
instead of private) so that I can intercept it and call something else on our 
special G2D implementation. When producing on-the-fly fonts, we need some font 
metrics information, like the ascent, descent and width of each character. 
For that, I would need to put some more information into Glyph2D, e.g. a 
reference to the underlying PDFont.
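The extension point described above can be sketched as a protected hook. The comment names PageDrawer.drawGlyph2D(), but the classes below (`Renderer`, `PrinterRenderer`, `GlyphInfo`) are hypothetical stand-ins and do not reflect PDFBox's actual signatures. The subclass receives the font metadata along with the outline, so a printer back end could pick or build a native font instead of filling a shape.

```java
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical metadata a printer back end might need for an on-the-fly font. */
final class GlyphInfo
{
    final String fontName;
    final int code;
    final float width;
    final float ascent;
    final float descent;

    GlyphInfo(String fontName, int code, float width, float ascent, float descent)
    {
        this.fontName = fontName;
        this.code = code;
        this.width = width;
        this.ascent = ascent;
        this.descent = descent;
    }
}

/** Hypothetical renderer: text drawing goes through an overridable hook. */
class Renderer
{
    final List<Shape> filledShapes = new ArrayList<>();

    /** Default behaviour: only the outline survives, as described in the comment above. */
    protected void drawGlyph(Shape outline, GlyphInfo info)
    {
        filledShapes.add(outline);
    }

    void showText(GlyphInfo info)
    {
        // stand-in outline; a real renderer would use the glyph's path
        drawGlyph(new Rectangle2D.Float(0, 0, info.width, info.ascent), info);
    }
}

/** A PCL/AFP-style subclass intercepts glyphs and keeps the font metadata. */
class PrinterRenderer extends Renderer
{
    final List<String> printerCommands = new ArrayList<>();

    @Override
    protected void drawGlyph(Shape outline, GlyphInfo info)
    {
        printerCommands.add(info.fontName + ":" + info.code);
    }
}
```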



Unmapped Glyphs in Font files

2014-07-16 Thread Shivam Gupta
Hi Team,

I want to check for unmapped glyphs in a TTF font file and map those
glyphs based on user input.

Can you please help me out with this?

Thanks in advance

-- 
Shivam Gupta


[jira] [Commented] (PDFBOX-2117) AxialShadingContext is slow

2014-07-16 Thread Shaola Ren (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063436#comment-14063436
 ] 

Shaola Ren commented on PDFBOX-2117:


Thanks. I noticed you mentioned diagonal shading in a previous comment, but I'm 
not sure what is special about it. For this axial shading, the axial line 
can point in any direction; the method doesn't depend on a specific direction, 
and the diagonal direction is only one case of it.
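This example shows why the direction does not matter. The PDF specification defines the axial (type 2) shading parameter by projecting each point onto the axis from (x0, y0) to (x1, y1). That projection works the same way for horizontal, vertical and diagonal axes. Below is a minimal sketch. The class name is hypothetical, and the Function lookup that follows is omitted.

```java
/**
 * Sketch of the axial (type 2) shading parameter from the PDF specification:
 * project (x, y) onto the axis from (x0, y0) to (x1, y1). Any axis direction works.
 */
final class AxialParameter
{
    /**
     * Returns t for point (x, y), or NaN when the point lies beyond an end of the
     * axis whose Extend flag is false (such a point is not painted).
     */
    static double t(double x0, double y0, double x1, double y1,
                    double t0, double t1, boolean extendStart, boolean extendEnd,
                    double x, double y)
    {
        double dx = x1 - x0, dy = y1 - y0;
        // normalized position along the axis: 0 at (x0, y0), 1 at (x1, y1)
        double s = (dx * (x - x0) + dy * (y - y0)) / (dx * dx + dy * dy);
        if (s < 0)
        {
            return extendStart ? t0 : Double.NaN;
        }
        if (s > 1)
        {
            return extendEnd ? t1 : Double.NaN;
        }
        return t0 + (t1 - t0) * s;
    }
}
```

The denominator depends only on the axis, so an implementation can compute it once per shading rather than once per pixel.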

 AxialShadingContext is slow
 ---

 Key: PDFBOX-2117
 URL: https://issues.apache.org/jira/browse/PDFBOX-2117
 Project: PDFBox
  Issue Type: Sub-task
  Components: Rendering
Reporter: Petr Slaby
Assignee: Shaola Ren
 Fix For: 2.0.0

 Attachments: 01_MTEXT_CS6.pdf, AxialShading.patch, 
 AxialShading1.patch, AxialShadingContext.java.getrgbimage, 
 GWG061_Shading_x1a.pdf, GWG061_Shading_x1a.pdf-1.png, 
 GWG061_Shading_x1a.pdf-1.png-diff.png, Shading2Function2.pdf, 
 Shading2Function2.ps, Shading2Function2text.pdf, asy-shade.pdf, 
 color_gradient.pdf, shading_pattern.pdf


 AxialShadingContext#getRaster() is on top of profiler hot spots in documents 
 that use an axial shading. Inside it, the slowest part is calling 
 PDColorSpaceRGB#toRGB() and PDFunctionType3#eval() (in this order).
   





[jira] [Commented] (PDFBOX-2212) OutOfMemoryError in GlyfCompositeDescrip

2014-07-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063640#comment-14063640
 ] 

Tilman Hausherr commented on PDFBOX-2212:
-

This code in MemoryTTFDataStream looks suspicious to me:
{code}
public int read() throws IOException
{
int retval = -1;
if( currentPosition < data.length )
{
retval = data[currentPosition];
}
currentPosition++;
return (retval+256)%256;
}
{code}
it will return 255 and not -1 on EOF. Because of that, this method:
{code}
public int readUnsignedShort() throws IOException
{
int ch1 = this.read();
int ch2 = this.read();
if ((ch1 | ch2) < 0)
{
throw new EOFException();
}
return (ch1 << 8) + (ch2 << 0);
}
{code}
won't throw an EOF. Try building from the sources and make this change in 
MemoryTTFDataStream:
{code}
public int read() throws IOException
{
if (currentPosition >= data.length)
{
return -1;
}
int retval = data[currentPosition];
currentPosition++;
return (retval+256)%256;
}
{code}
This is just a theory: I can't test it myself and I might be wrong, so please 
test it yourself by changing the code on your system and then processing that 
file. Obviously I'd need the file to be sure. And no, we haven't seen this 
effect before. We did have a similar effect a year ago with the same cause 
(EOF), but that was fixed (PDFBOX-1668).
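Part of the theory can be checked without the problem file by using a stand-alone copy of the two read() variants. `EofDemo` is a hypothetical class, not the FontBox code, but it reproduces the logic quoted above. With the original read(), readUnsignedShort() on an exhausted stream returns 0xFFFF instead of throwing. Suppose the composite-glyph loop's continuation flag were read past the end of the data this way. The flag would then stay set, which would fit the OutOfMemoryError. That last step is still an assumption.

```java
import java.io.EOFException;
import java.io.IOException;

/** Stand-alone reproduction of the EOF handling discussed above; not the FontBox class. */
final class EofDemo
{
    private final byte[] data;
    private final boolean fixed;
    private int currentPosition = 0;

    EofDemo(byte[] data, boolean fixed)
    {
        this.data = data;
        this.fixed = fixed;
    }

    int read()
    {
        if (fixed)
        {
            if (currentPosition >= data.length)
            {
                return -1; // signal EOF so callers can detect it
            }
            int retval = data[currentPosition];
            currentPosition++;
            return (retval + 256) % 256;
        }
        // original variant: the -1 sentinel is turned into 255 by the modulo below
        int retval = -1;
        if (currentPosition < data.length)
        {
            retval = data[currentPosition];
        }
        currentPosition++;
        return (retval + 256) % 256;
    }

    int readUnsignedShort() throws IOException
    {
        int ch1 = read();
        int ch2 = read();
        if ((ch1 | ch2) < 0)
        {
            throw new EOFException();
        }
        return (ch1 << 8) + (ch2 << 0);
    }
}
```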

 OutOfMemoryError in GlyfCompositeDescrip
 

 Key: PDFBOX-2212
 URL: https://issues.apache.org/jira/browse/PDFBOX-2212
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox, Preflight
Affects Versions: 1.8.6
 Environment: Windows 7, JDK6
Reporter: Valdis Andersons
 Attachments: adobe_error1.jpg, adobe_error2.jpg



[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-16 Thread Kirk Haines (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Haines updated PDFBOX-1511:


Attachment: PDFMergerUtility.java

This version of PDFMergerUtility.java (based on 1.7.1 iirc) removes the shared 
resources section and instead applies resources on the page level.  The cloner 
will create references for resources used on multiple pages, so there is not 
excessive resource duplication.  The previous method assumed resources with the 
same name were identical, which is not valid (see prior comment about Font 
resource CMaps).
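The pitfall of treating equal names as equal resources can be illustrated with plain maps. This hypothetical sketch does not follow the approach in the attached PDFMergerUtility.java, which moves resources to the page level. Instead, it reuses an entry only when both the name and the content match. Otherwise it gives the incoming resource a fresh name, which the caller would then have to apply to that page's content stream.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: merge resource dictionaries without assuming same name = same resource. */
final class ResourceMergeSketch
{
    /**
     * Merges source into target and returns, for each source name, the name it has
     * in the merged dictionary. Values stand in for resource objects (fonts, XObjects, ...).
     */
    static Map<String, String> merge(Map<String, Object> target, Map<String, Object> source)
    {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet())
        {
            String name = e.getKey();
            Object existing = target.get(name);
            if (existing == null)
            {
                target.put(name, e.getValue());
            }
            else if (!existing.equals(e.getValue()))
            {
                // same name, different resource: pick an unused name instead of overwriting
                String fresh = name;
                int n = 1;
                while (target.containsKey(fresh))
                {
                    fresh = name + "_" + n++;
                }
                target.put(fresh, e.getValue());
                name = fresh;
            }
            renames.put(e.getKey(), name);
        }
        return renames;
    }
}
```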

 pdfMerger App produces Garbage
 --

 Key: PDFBOX-1511
 URL: https://issues.apache.org/jira/browse/PDFBOX-1511
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.7.1
 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, 
Reporter: Michael Huber
 Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PdfRenderer.java, 
 targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf


 The PDFBox utility pdfMerger produces a merged document containing garbage. All 
 merged PDF files are contained, but strings are destroyed.
 The source PDF files are created with Graphviz and are readable without error 
 or disturbance with both Acrobat X and the PDFBox pdfDebug utility.
 Another astounding thing is that a hand-coded merger using the PDFMergerUtility 
 class works fine when run within Eclipse Juno, but creates the same garbage when 
 run from the command line (please see the attached source).
 I checked everything that came to mind to find the differences, e.g. Java 
 version, encoding/codepage issues and memory settings, but found nothing.





[jira] [Commented] (PDFBOX-1915) Implement shading with Coons and tensor-product patch meshes

2014-07-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064016#comment-14064016
 ] 

Tilman Hausherr commented on PDFBOX-1915:
-

I took your PDFunctionType2 code a step further (initializing everything in the 
constructor) and committed it in rev 1611163. However, I don't expect much of a 
speed improvement: when I ran the profiler, the share of PDFunctionType2 
had already dropped to 0% after the change in getN().
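For reference, a type 2 (exponential interpolation) function computes f(x) = C0 + x^N * (C1 - C0) for each output component. Below is a sketch of the "initialize everything in the constructor" idea, with C1 - C0 precomputed once. `ExponentialFunction` is a hypothetical name, not the PDFBox class, and Domain/Range clipping is omitted.

```java
/**
 * Sketch of a PDF type 2 (exponential interpolation) function,
 * f(x) = C0 + x^N * (C1 - C0), with the per-dictionary work done once
 * in the constructor. Not the PDFBox class; Domain/Range clipping omitted.
 */
final class ExponentialFunction
{
    private final float[] c0;
    private final float[] diff; // C1 - C0, precomputed
    private final double n;

    ExponentialFunction(float[] c0, float[] c1, double n)
    {
        if (c0.length != c1.length)
        {
            throw new IllegalArgumentException("C0 and C1 must have the same length");
        }
        this.c0 = c0.clone();
        this.diff = new float[c0.length];
        for (int i = 0; i < c0.length; i++)
        {
            diff[i] = c1[i] - c0[i];
        }
        this.n = n;
    }

    /** Per call: one pow() plus one multiply-add per output component. */
    float[] eval(double x)
    {
        double xn = Math.pow(x, n);
        float[] out = new float[c0.length];
        for (int i = 0; i < out.length; i++)
        {
            out[i] = (float) (c0[i] + xn * diff[i]);
        }
        return out;
    }
}
```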


Build failed in Jenkins: PDFBox-ant #1442

2014-07-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/PDFBox-ant/1442/changes

Changes:

[tilman] PDFBOX-1915: add missing element in toString

[tilman] PDFBOX-1915: Optimization based on suggestion by Shaola Ren

[tilman] PDFBOX-1915: Optimization of type 6 and 7 shading by Shaola Ren as 
part of GSoC2014

--
Started by an SCM change
Building remotely on ubuntu-1 (Ubuntu ubuntu) in workspace 
https://builds.apache.org/job/PDFBox-ant/ws/
Updating http://svn.apache.org/repos/asf/pdfbox/trunk at revision 
'2014-07-16T21:00:10.534 +'
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/common/function/PDFunctionType2.java
D pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/ColorRGB.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/RadialShadingContext.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/AxialShadingContext.java
U pdfbox/src/main/java/org/apache/pdfbox/pdmodel/graphics/shading/PatchMeshesShadingContext.java
At revision 1611188
FATAL: Cannot find executable from the chosen Ant installation Ant 1.7.0
Build step 'Invoke Ant' marked build as failure


[jira] [Commented] (PDFBOX-2206) Cannot save a document which has been closed

2014-07-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064130#comment-14064130
 ] 

John Hewson commented on PDFBOX-2206:
-

This is why RAII is a good thing, so we don't end up with a constructor that 
leaves an object unusable. I guess we need ConformingPDFParser to pass the 
trailer to the constructor.

 Cannot save a document which has been closed
 

 Key: PDFBOX-2206
 URL: https://issues.apache.org/jira/browse/PDFBOX-2206
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 2.0.0
Reporter: simon steiner
 Fix For: 2.0.0


 Any pdf gives:
 java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar 
 WriteDecodedDoc x.pdf 
 Exception in thread "main" java.io.IOException: Cannot save a document which 
 has been closed
   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1230)
   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1216)
   at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
   at 
 org.apache.pdfbox.tools.WriteDecodedDoc.doIt(WriteDecodedDoc.java:125)
   at 
 org.apache.pdfbox.tools.WriteDecodedDoc.main(WriteDecodedDoc.java:191)
   at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:97)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PDFBOX-2209) [PATCH] Restore shading API

2014-07-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064140#comment-14064140
 ] 

John Hewson commented on PDFBOX-2209:
-

We need a really clear and compelling use case for this patch and *exactly* 
which methods are needed, otherwise it will be reverted.

 [PATCH] Restore shading API
 ---

 Key: PDFBOX-2209
 URL: https://issues.apache.org/jira/browse/PDFBOX-2209
 Project: PDFBox
  Issue Type: Wish
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: Tilman Hausherr
 Fix For: 2.0.0

 Attachments: shading.patch


 Some of the shading API is gone in 2.0. Can we have it back so that we can 
 convert PDF to PostScript in FOP?





[jira] [Comment Edited] (PDFBOX-2209) [PATCH] Restore shading API

2014-07-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064140#comment-14064140
 ] 

John Hewson edited comment on PDFBOX-2209 at 7/16/14 9:36 PM:
--

We need a really clear and compelling use case for this patch and *exactly* 
which methods are needed, otherwise it will be reverted. I'm strongly against 
exposing implementation details which are so tightly coupled to PageDrawer.


was (Author: jahewson):
We need a really clear and compelling use case for this patch and *exactly* 
which methods are needed, otherwise it will be reverted.






[jira] [Commented] (PDFBOX-2210) [PATCH] Allow caching of glyphs

2014-07-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064159#comment-14064159
 ] 

John Hewson commented on PDFBOX-2210:
-

{quote}
PageDrawer.drawGlyph2D() to become part of the API (make it protected instead 
of private) so that I can intercept it and call something else on our special 
G2D implementation.
{quote}

I've recently added some new glyph APIs to a new class PDFGraphicsStreamEngine, 
which PageDrawer extends. You can probably get what you need by overriding 
those methods. Use of drawGlyphVector will be entirely removed from 2.0 soon.

Really, though, PageDrawer isn't meant to be subclassed. Ideally, for 2.0 it will 
be marked final, and PDFGraphicsStreamEngine can be used by anyone interested 
in intercepting the graphics commands (I do this myself, somewhat like your PCL 
output).



 [PATCH] Allow caching of glyphs
 ---

 Key: PDFBOX-2210
 URL: https://issues.apache.org/jira/browse/PDFBOX-2210
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 2.0.0
Reporter: simon steiner
Assignee: John Hewson
 Attachments: drawglyphs.patch


 If you separate the transform from the glyph, we can reuse glyphs in FOP's 
 PostScript output and get smaller output files.
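The idea behind this request is to keep each glyph outline in glyph space and apply the per-occurrence transform separately, so an outline is built once and reused. A minimal Java2D sketch of that separation, assuming outlines can be loaded by glyph code (GlyphCache and its loader are hypothetical, not PDFBox API; a PostScript writer would analogously define each glyph once and invoke it under a transform):

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical cache: outlines are stored untransformed, transforms are applied per use. */
final class GlyphCache
{
    private final Map<Integer, GeneralPath> outlines = new HashMap<>();
    private final IntFunction<GeneralPath> loader;
    private int loads;

    GlyphCache(IntFunction<GeneralPath> loader)
    {
        this.loader = loader;
    }

    /** Returns the glyph-space outline, calling the loader only the first time a code is seen. */
    GeneralPath outline(int code)
    {
        return outlines.computeIfAbsent(code, c -> {
            loads++;
            return loader.apply(c);
        });
    }

    /** Places one occurrence of the glyph; the cached outline itself is never modified. */
    Shape place(int code, AffineTransform transform)
    {
        return transform.createTransformedShape(outline(code));
    }

    /** Number of times the loader was actually called. */
    int loadCount()
    {
        return loads;
    }
}
```

Callers must treat the returned outline as read-only, since mutating it would corrupt every later placement of that glyph.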





Re: Unmapped Glyphs in Font files

2014-07-16 Thread John Hewson
What do you mean by “check the unmapped glyphs”? Can you elaborate?

-- John

On 16 Jul 2014, at 04:35, Shivam Gupta shivam26.gu...@gmail.com wrote:

 Hi Team,
 
 I want to check the unmapped glyphs in the TTF Font file and map those
 glyphs based on user input.
 
 Can you please help me out in this.
 
 Thanks in advance
 
 -- 
 Shivam Gupta



[jira] [Commented] (PDFBOX-1000) Conforming parser

2014-07-16 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064174#comment-14064174
 ] 

Tilman Hausherr commented on PDFBOX-1000:
-

In PDFBOX-2206 we wanted to create the document catalog within the constructor 
of PDDocument to clean things up. But this isn't possible because 
ConformingPDFParser.parse() creates a ConformingPDDocument, and thus a 
PDDocument, before the trailer is known to the document. So it would be nice if 
you could check whether the code can be changed so that the document that is 
passed to the ConformingPDDocument has the trailer.

 Conforming parser
 -

 Key: PDFBOX-1000
 URL: https://issues.apache.org/jira/browse/PDFBOX-1000
 Project: PDFBox
  Issue Type: New Feature
  Components: Parsing
Reporter: Adam Nichols
Assignee: Adam Nichols
 Attachments: COSUnread.java, ConformingPDDocument.java, 
 ConformingPDFParser.java, ConformingPDFParserTest.java, PDFLexer.java, 
 PDFLexer.java, PDFStreamConstants.java, PDFStreamConstants.java, 
 XrefEntry.java, conforming-parser.patch, gdb-refcard.pdf


 A conforming parser will start at the end of the file and read backward until 
 it has read the EOF marker, the xref location, and trailer[1].  Once this is 
 read, it will read in the xref table so it can locate other objects and 
 revisions.  This also allows skipping objects which have been rendered 
 obsolete (per the xref table)[2].  It also allows the minimum amount of 
 information to be read when the file is loaded, and then subsequent 
 information will be loaded if and when it is requested.  This is all laid out 
 in the official PDF specification, ISO 32000-1:2008.
 Existing code will be re-used where possible, but this will require new 
 classes in order to accommodate the lazy reading which is a very different 
 paradigm from the existing parser.  Using separate classes will also 
 eliminate the possibility of regression bugs from making their way into the 
 PDDocument or BaseParser classes.  Changes to existing classes will be kept 
 to a minimum in order to prevent regression bugs.
 [1] Section 7.5.5 Conforming readers should read a PDF file from its end
 [2] Section 7.5.4 the entire file need not be read to locate any particular 
 object
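The first step described above, locating the cross-reference data from the end of the file, can be sketched as follows. StartXrefReader is hypothetical and not the attached ConformingPDFParser code: it reads a tail of the file (its size is the caller's choice; 1024 bytes is an assumed typical value), finds the last startxref keyword and parses the byte offset that follows it. Parsing the trailer and the xref table itself is not shown.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: finds the startxref offset by reading backward from the end of a PDF. */
final class StartXrefReader
{
    /** Returns the offset after the last "startxref" keyword in the given tail bytes. */
    static long findStartXref(byte[] tail)
    {
        String s = new String(tail, StandardCharsets.ISO_8859_1);
        int idx = s.lastIndexOf("startxref");
        if (idx < 0)
        {
            throw new IllegalArgumentException("startxref not found in file tail");
        }
        int i = idx + "startxref".length();
        // skip the whitespace (CR, LF, spaces) between the keyword and the number
        while (i < s.length() && Character.isWhitespace(s.charAt(i)))
        {
            i++;
        }
        int start = i;
        while (i < s.length() && Character.isDigit(s.charAt(i)))
        {
            i++;
        }
        if (start == i)
        {
            throw new IllegalArgumentException("no offset after startxref");
        }
        return Long.parseLong(s.substring(start, i));
    }

    /** Reads at most tailSize bytes from the end of the file and locates startxref there. */
    static long findStartXref(RandomAccessFile file, int tailSize) throws IOException
    {
        long len = file.length();
        int n = (int) Math.min(len, tailSize);
        byte[] tail = new byte[n];
        file.seek(len - n);
        file.readFully(tail);
        return findStartXref(tail);
    }
}
```

Using lastIndexOf matters for incrementally updated files, where each revision appends its own startxref and only the last one points at the newest xref section.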





[jira] [Commented] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-16 Thread John Hewson (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064170#comment-14064170
 ] 

John Hewson commented on PDFBOX-1511:
-

Hi Kirk, can you attach a patch using svn diff please.

 pdfMerger App produces Garbage
 --

 Key: PDFBOX-1511
 URL: https://issues.apache.org/jira/browse/PDFBOX-1511
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.7.1
 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, 
Reporter: Michael Huber
 Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, PdfRenderer.java, 
 targetPdfMergeJava.pdf, targetPdfMergeUtilityApp.pdf


 The pdfbox utility pdfMerger produces a merged document containing garbage. All 
 merged PDF files are contained, but strings are destroyed.
 The source PDF files are created with graphviz and are readable without error 
 or disturbance both with Acrobat X and the pdfbox pdfDebug utility.
 Another astounding thing is that a hand-coded merger using the PDFMergerUtility 
 class works fine when run within Eclipse Juno and creates the same garbage when 
 run from the command line (please see attached source).
 I checked everything that comes to mind to find the differences, e.g. Java 
 version, encoding/codepage issues, memory settings, but found nothing.





Re: Getting Line of Text from TextPos

2014-07-16 Thread John Hewson
Hi Aaron,

You could store a map from TextPosition to Color, which you populate in 
processTextPosition.
Lines are not known until the end of TextStripper’s processing (they have to be 
inferred), so you could override a method from one of the phases at the end; 
I think you should have access to either lines or TextPositions which are 
merged into continuous runs (I can’t remember which). Alternatively, you can 
always grab all TextPositions with the same y position.

-- John

On 16 Jul 2014, at 13:27, Aaron Hartman aa...@hrtmn.net wrote:

 Hi everyone;
 I am currently scanning PDFs for errors that have red text in them. I 
 accomplished this by extending the PDFTextStripper class and overriding the 
 processTextPosition method to examine the PDGraphicsState for the appropriate 
 color values.
 
 Once this position is found, is it possible to extract only the line where 
 that red text resides? For the user it would be beneficial to see the line in 
 which the error occurs. Since the processTextPosition has the actual position 
 I was hoping there may be a way to extract the line with the error from 
 within this method, or by storing the position and accessing it elsewhere. 
 
 If there is a way to accomplish this, please let me know!
 
 Thank you for your time.
 
 -Aaron
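John's last suggestion, grouping all positions that share a y coordinate, can be sketched as below. This is not PDFBox code: Glyph is a hypothetical stand-in for a TextPosition (whose coordinates come from getX()/getY()) plus the "is red" flag recorded in processTextPosition, and the tolerance is an assumption, since glyphs on one line can have slightly different y values (e.g. with mixed fonts or sizes). Text is simply concatenated; inferring word spaces is not handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: groups positioned glyphs into lines by y and reports lines containing red text. */
final class LineGrouper
{
    /** Hypothetical stand-in for a TextPosition plus the color flag recorded for it. */
    static final class Glyph
    {
        final String text;
        final float x;
        final float y;
        final boolean red;

        Glyph(String text, float x, float y, boolean red)
        {
            this.text = text;
            this.x = x;
            this.y = y;
            this.red = red;
        }
    }

    /** Glyphs within tolerance of a line's first y value are treated as the same line. */
    static List<String> linesWithRedText(List<Glyph> glyphs, float tolerance)
    {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<List<Glyph>> lines = new ArrayList<>();
        List<Glyph> current = null;
        float lineY = 0;
        for (Glyph g : sorted)
        {
            if (current == null || Math.abs(g.y - lineY) > tolerance)
            {
                current = new ArrayList<>();
                lines.add(current);
                lineY = g.y;
            }
            current.add(g);
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines)
        {
            // restore left-to-right reading order within the line
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            boolean hasRed = false;
            for (Glyph g : line)
            {
                sb.append(g.text);
                hasRed |= g.red;
            }
            if (hasRed)
            {
                result.add(sb.toString());
            }
        }
        return result;
    }
}
```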



[jira] [Updated] (PDFBOX-1511) pdfMerger App produces Garbage

2014-07-16 Thread Kirk Haines (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk Haines updated PDFBOX-1511:


Attachment: PDFMergerUtility.java.diff

Diff version of changes.

 pdfMerger App produces Garbage
 --

 Key: PDFBOX-1511
 URL: https://issues.apache.org/jira/browse/PDFBOX-1511
 Project: PDFBox
  Issue Type: Bug
  Components: Utilities
Affects Versions: 1.7.1
 Environment: Win XP; Windows Server 2008 R2; java version 1.6.0_21, 
Reporter: Michael Huber
 Attachments: 1.pdf, 2.pdf, PDFMergerUtility.java, 
 PDFMergerUtility.java.diff, PdfRenderer.java, targetPdfMergeJava.pdf, 
 targetPdfMergeUtilityApp.pdf


 The pdfbox utility pdfMerger produces a merged document containing garbage. All 
 merged PDF files are contained, but strings are destroyed.
 The source PDF files are created with graphviz and are readable without error 
 or disturbance both with Acrobat X and the pdfbox pdfDebug utility.
 Another astounding thing is that a hand-coded merger using the PDFMergerUtility 
 class works fine when run within Eclipse Juno and creates the same garbage when 
 run from the command line (please see attached source).
 I checked everything that comes to mind to find the differences, e.g. Java 
 version, encoding/codepage issues, memory settings, but found nothing.





Re: Getting Line of Text from TextPos

2014-07-16 Thread -A
John,
Excellent answer yet again, much appreciated!

-Aaron
