Re: Help identifying hair-lines in PDFs using PDFBox and tabula

Andreas Lehmkühler Tue, 23 May 2017 02:54:34 -0700

> On 22 May 2017 at 22:07, Gilad Denneboom <[email protected]> wrote:
> 
> 
> Hi all,
> 
> So I'm trying to identify hair-lines in my PDFs. I came across tabula,
> which seems to be able to do it, but I can't get it to quite work with my
> files in the way I need it to, so any help is greatly appreciated!
> 
> Here's what I've been doing so far: I used the Ruling object from tabula to
> extract both the horizontal and vertical rules from a stripped version of
> the PDF page (i.e., after removing all the text in it).
> I'm getting results, but now I want to relate them back to the original PDF
> page, and that's proving difficult. If I add a text field using the
> coordinates of the Ruling objects, it ends up far from where I would
> expect it to be. I think it has to do with the DPI setting used to
> convert the PDF page to an image, which is necessary for the rulings
> extraction.
> So my question is: How can I take these Ruling objects and convert them
> back to the original coordinates of the PDF?
> I would also like to be able to only identify lines of a certain width and
> height, but if I get the rectangles to work correctly I think I can do that
> in post-processing.
Sounds like a question for the tabulapdf community ...
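
For what it's worth, if the coordinates really are pixel positions in an image
rendered at a known DPI, the conversion back to PDF units is plain arithmetic:
default PDF user space has 72 units per inch with the origin at the bottom-left,
while image coordinates usually start at the top-left. The following is only a
rough sketch under those assumptions. The class and method names are made up
for illustration and are not part of PDFBox or tabula, the hairline threshold
is a hypothetical parameter, and page rotation and a crop box with a non-zero
origin are ignored. Whether tabula's Ruling objects actually use pixel
coordinates is something I have not checked.

```java
/**
 * Illustrative helper (hypothetical, not a PDFBox or tabula API).
 * Assumes input rectangles are in image pixels with a top-left origin
 * and that the page is unrotated with its box origin at (0, 0).
 */
public final class RulingCoords {

    /** Default PDF user space: 72 units (points) per inch. */
    public static final double POINTS_PER_INCH = 72.0;

    private RulingCoords() {
    }

    /**
     * Converts a pixel rectangle (top-left origin, y growing downwards)
     * to PDF user space (bottom-left origin, y growing upwards).
     *
     * @return {x, y, width, height} in points, (x, y) being the lower-left corner
     */
    public static double[] toPdfRect(double px, double py, double pw, double ph,
                                     double dpi, double pageHeightPts) {
        double x = px * POINTS_PER_INCH / dpi;
        double w = pw * POINTS_PER_INCH / dpi;
        double h = ph * POINTS_PER_INCH / dpi;
        // Flip the y axis: the rectangle's bottom edge in image space
        // becomes its lower-left y in PDF space.
        double y = pageHeightPts - (py * POINTS_PER_INCH / dpi) - h;
        return new double[] { x, y, w, h };
    }

    /**
     * Returns true if the thinner side of a rectangle (in points) is at most
     * maxThicknessPts. The threshold is a caller-chosen, hypothetical value.
     */
    public static boolean isHairline(double widthPts, double heightPts,
                                     double maxThicknessPts) {
        return Math.min(widthPts, heightPts) <= maxThicknessPts;
    }
}
```

With PDFBox, the page height in points should be available from
page.getMediaBox().getHeight() (or the crop box, if the page has one), so a
call might look like RulingCoords.toPdfRect(x, y, w, h, 300, pageHeight) for an
image rendered at 300 DPI. Filtering by line thickness can then be done on the
converted rectangle, as you suggested.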


Andreas
> 
> Thanks in advance!
> Gilad

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
