Re: PageDrawer bug?

Tilman Hausherr Mon, 29 Sep 2014 23:01:04 -0700

Hi,

It's best to download the source code from the official site and not from some secondary website:


https://pdfbox.apache.org/download.cgi#recent

I still can't tell why it doesn't work for you because you didn't post your code :-(


Tilman



On 30.09.2014 at 05:56, Frank van der Hulst wrote:

Thanks for the replies... I'm working with 1.8.7, but the same applied to
1.8.6 and I think 1.8.5.

convertToImage() works properly, which was a bit surprising when I looked
into it and found that it created a PageDrawer object. So I tried copying
the source code for convertToImage into my code. That worked fine too.

Then I tried copying the source from
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.8.6/org/apache/pdfbox/pdfviewer/PageDrawer.java?av=f
(couldn't find 1.8.7) into my own PageDrawer class. That *doesn't* work
properly...  lines aren't drawn at all (probably off the page?). I don't
understand this at all... surely identical code will do the same thing???
Or is something else in the pdfbox library directly accessing
org.apache.pdfbox.pdfviewer.PageDrawer via one of its public methods?

This may be the case because when I changed my PageDrawer to extend
org.apache.pdfbox.pdfviewer.PageDrawer instead of PDFStreamEngine, it
worked perfectly. Which is all the more confusing because my original class
extended PageDrawer and didn't work.

Frank


On Tue, Sep 30, 2014 at 5:04 AM, Tilman Hausherr <[email protected]>
wrote:

Hi,

It's best to upload the code and the PDFs to a public location.

PDF is not easy... coordinates that you see in the stream are always
relative to the current transformation matrix.
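
To illustrate the point about the current transformation matrix (CTM), here is a minimal sketch using only java.awt.geom, not PDFBox itself. The class name CtmExample, the helper toDeviceSpace and the sample matrix are assumptions made up for this example; the matrix stands in for whatever a "cm" operator in the content stream would have set up.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class CtmExample {
    // Maps a point given in content-stream (user space) coordinates
    // through the supplied CTM. Hypothetical helper, not a PDFBox API.
    static Point2D toDeviceSpace(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Assumed CTM, as if the stream contained "1 0 0 1 0 -200 cm":
        // a pure translation by -200 units along the y axis.
        // AffineTransform takes its six values in the same a b c d e f order.
        AffineTransform ctm = new AffineTransform(1, 0, 0, 1, 0, -200);

        // The stream says "50 700", but the point actually lands at y = 500.
        Point2D p = toDeviceSpace(ctm, 50, 700);
        System.out.println(p.getX() + ", " + p.getY()); // prints "50.0, 500.0"
    }
}
```

A line whose raw operands are read without applying the matrix in effect at that moment would appear shifted by exactly that translation, which is one possible (unconfirmed) explanation for lines showing up offset on the page.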

Tilman

On 29.09.2014 at 10:56, Frank van der Hulst wrote:

Hi all,

I'm new to the list... I beg your indulgence if I'm out of line here, but
here goes...

I'm working on a PDF table extractor. This is my second attempt at it, and
this one is based on extending PageDrawer.

In particular, I'm looking for table cells delineated by vertical &
horizontal lines, and then grabbing whatever text is inside the rectangle.
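
The cell-finding idea described above could be sketched roughly as follows. This is not the code from this thread (which was never posted); CellGrid, cells and cellAt are hypothetical names, and the sketch assumes the line positions have already been collected, sorted, and expressed in the same coordinate space as the text positions.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class CellGrid {
    // Builds one rectangle per table cell from the sorted x positions of the
    // vertical lines and the sorted y positions of the horizontal lines.
    static List<Rectangle2D> cells(double[] xs, double[] ys) {
        List<Rectangle2D> out = new ArrayList<>();
        for (int r = 0; r + 1 < ys.length; r++) {
            for (int c = 0; c + 1 < xs.length; c++) {
                out.add(new Rectangle2D.Double(
                        xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r]));
            }
        }
        return out;
    }

    // Returns the index of the cell containing the point, or -1 if none does.
    static int cellAt(List<Rectangle2D> cells, double x, double y) {
        for (int i = 0; i < cells.size(); i++) {
            if (cells.get(i).contains(x, y)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample grid: three vertical and three horizontal lines,
        // giving a 2 x 2 table.
        List<Rectangle2D> grid = cells(new double[] {0, 100, 200},
                                       new double[] {0, 50, 100});
        System.out.println(cellAt(grid, 150, 75)); // prints 3
        System.out.println(cellAt(grid, 250, 10)); // prints -1
    }
}
```

The lookup only works if lines and text share one coordinate space, so any offset in where the lines are drawn would put text into the wrong cell or into none.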

This works well for most PDFs I've tried (admittedly all from the same
source), but there's a large subset that it doesn't work on. I've debugged
my way through one, and it appears that when
processStream(page, page.findResources(), page.getContents().getStream());
calls fillPath() or strokePath() to draw the lines, they aren't drawn in the
correct place. They seem to be offset some distance down the page.

I've looked at a couple of my troublesome PDFs, and one thing they have in
common is that they are v1.4, whereas the ones that work are v1.7.

So... has anyone encountered this before? Is there a known bug with
PageDrawer.processStream() or perhaps with the PDFStreamEngine and drawing
of v1.4 PDFs?

I'm happy to share my source code and example PDFs with anyone if it would
help.

Thanks

Frank
