Re: Help identifying hair-lines in PDFs using PDFBox and tabula

Gilad Denneboom Tue, 23 May 2017 03:02:11 -0700

There doesn't seem to be one... I guess I can try StackOverflow.

On Tue, May 23, 2017 at 11:54 AM, Andreas Lehmkühler <[email protected]>
wrote:


> > Gilad Denneboom <[email protected]> wrote on 22 May 2017 at 22:07:
> >
> >
> > Hi all,
> >
> > So I'm trying to identify hair-lines in my PDFs. I came across tabula,
> > which seems to be able to do it, but I can't get it to quite work with my
> > files in the way I need it to, so any help is greatly appreciated!
> >
> > Here's what I've been doing so far: I used the Ruling object from tabula to
> > extract both the horizontal and vertical rules from a stripped version of
> > the PDF page (i.e., after removing all the text in it).
> > I'm getting results, but now I want to relate them back to the original PDF
> > page, and that's proving difficult. If I add a text field using the
> > coordinates of the Ruling objects, they are way off from where I would
> > expect them to be. I think it has to do with the DPI setting used to
> > convert the PDF page to an image, which is necessary for the rulings
> > extraction.
> > So my question is: How can I take these Ruling objects and convert them
> > back to the original coordinates of the PDF?
> > I would also like to be able to only identify lines of a certain width and
> > height, but if I get the rectangles to work correctly I think I can do that
> > in post-processing.
> Sounds like a question for the tabulapdf community ...
>
> Andreas
> >
> > Thanks in advance!
> > Gilad
>
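
For anyone landing on this thread with the same coordinate question, here is a
minimal sketch of the conversion asked about above. It rests on assumptions the
thread does not confirm: that the ruling coordinates are pixel positions in an
image rendered at a known DPI with the origin at the top-left, and that the
page is unrotated with its box starting at (0, 0). PDF user space defaults to
72 units per inch with the origin at the bottom-left, so the sketch rescales by
72/DPI and flips the y axis. The class and method names are hypothetical and
are not part of tabula or PDFBox.

```java
import java.util.Arrays;

// Hypothetical helper, not part of tabula or PDFBox.
class RulingCoordinates {
    // Default PDF user space: 72 units per inch.
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Converts a line given in image pixels (origin top-left, y pointing down)
     * into PDF user-space units (origin bottom-left, y pointing up).
     * Assumes an unrotated page whose box starts at (0, 0).
     * Returns {x1, y1, x2, y2}.
     */
    static double[] imageToPdf(double x1, double y1, double x2, double y2,
                               double dpi, double pageHeightPt) {
        double scale = POINTS_PER_INCH / dpi; // pixels -> points
        return new double[] {
            x1 * scale,
            pageHeightPt - y1 * scale, // flip the y axis
            x2 * scale,
            pageHeightPt - y2 * scale
        };
    }

    public static void main(String[] args) {
        // A horizontal line found in a 144 DPI rendering of a US Letter page
        // (612 x 792 points).
        double[] line = imageToPdf(300, 150, 900, 150, 144, 792);
        System.out.println(Arrays.toString(line)); // [150.0, 717.0, 450.0, 717.0]
    }
}
```

If the page has a crop box that does not start at the origin, or a rotation
entry, those would need to be accounted for as well. If the coordinates are
already in points and only the y axis is flipped, passing a DPI of 72 reduces
this to the flip alone.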
