extracting vector graphics

Ray Weidner Mon, 22 Aug 2011 14:13:33 -0700

Hi,

I'm currently using PDFBox to provide me with text/location information in
order to heuristically detect table structures in a document.  One way I'd
like to enhance this is by making use of actual grid lines, when they are
present.  To do this, I believe I need to extract the vector graphics
commands from the document.
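
To make the goal concrete, here is a rough sketch of what I would do with the line segments once I can get them out of the page. It assumes the segments have already been extracted somehow; the GridLines and Segment names and the tolerance parameter are my own placeholders, not anything from PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: filters already-extracted line
// segments down to the axis-aligned ones that could be table grid lines.
class GridLines {

    // A straight line segment in page coordinates (field names are assumptions).
    static final class Segment {
        final double x1, y1, x2, y2;

        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        // Horizontal: y barely changes while x changes by more than the tolerance.
        boolean isHorizontal(double tol) {
            return Math.abs(y1 - y2) <= tol && Math.abs(x1 - x2) > tol;
        }

        // Vertical: x barely changes while y changes by more than the tolerance.
        boolean isVertical(double tol) {
            return Math.abs(x1 - x2) <= tol && Math.abs(y1 - y2) > tol;
        }
    }

    // Candidate row separators.
    static List<Segment> horizontals(List<Segment> segments, double tol) {
        List<Segment> out = new ArrayList<Segment>();
        for (Segment s : segments) {
            if (s.isHorizontal(tol)) {
                out.add(s);
            }
        }
        return out;
    }

    // Candidate column separators.
    static List<Segment> verticals(List<Segment> segments, double tol) {
        List<Segment> out = new ArrayList<Segment>();
        for (Segment s : segments) {
            if (s.isVertical(tol)) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Diagonal strokes and zero-length segments are dropped by both filters, which is what I would want for table detection. The missing piece is how to produce the Segment list in the first place.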


I found one helpful post on this matter in the mail archives (
http://mail-archives.apache.org/mod_mbox/pdfbox-users/200902.mbox/browser).
The recommendation was simply to override PageDrawer in order to intercept
graphics commands.  This sounds like a good idea, but I'm totally unsure of
how to interpret the calls that I should be intercepting.  Can anyone give
me some advice here, or point me to a document that should make things
clearer?

Please be aware that I am new to both PDFBox and the PDF document
standard, so don't assume too much about what I already know.
Thanks in advance.

Ray Weidner
