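I've found that if I set the DPI to 72, the locations of the Rulings match the original PDF page. That is presumably because PDF user space defaults to 72 units per inch, so at 72 DPI one image pixel corresponds to one PDF point.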
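For anyone who needs to render at a higher DPI, here is a minimal sketch of the conversion back to page coordinates. It assumes the ruling rectangles are in image pixels with a top-left origin, that the page is unrotated, and that its media box starts at (0, 0). The class and method names (`RulingCoords`, `toPoints`, `toPdfSpace`) are hypothetical and not part of tabula or PDFBox, and whether the y-flip is needed depends on which API you use to place the field.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of tabula or PDFBox.
final class RulingCoords {
    // Default PDF user space: 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Convert a length in pixels at the given render DPI to PDF points.
    static double toPoints(double pixels, double dpi) {
        return pixels * POINTS_PER_INCH / dpi;
    }

    // Convert a top-left-origin pixel rectangle to a bottom-left-origin
    // rectangle in PDF points. pageHeightPts is the page height in points
    // (assumption: unrotated page whose media box starts at 0,0).
    static Rectangle2D.Double toPdfSpace(Rectangle2D pixelRect, double dpi, double pageHeightPts) {
        double x = toPoints(pixelRect.getX(), dpi);
        double w = toPoints(pixelRect.getWidth(), dpi);
        double h = toPoints(pixelRect.getHeight(), dpi);
        double topFromPageTop = toPoints(pixelRect.getY(), dpi);
        // Flip the y axis: PDF measures y upward from the bottom of the page.
        double y = pageHeightPts - topFromPageTop - h;
        return new Rectangle2D.Double(x, y, w, h);
    }
}
```

With dpi set to 72 the scaling step is a no-op, which matches what I'm seeing. The converted width and height are in points, so filtering for hairlines of a certain thickness can be done on the result.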

On Tue, May 23, 2017 at 12:02 PM, Gilad Denneboom <[email protected]> wrote:

> PS. I'm also happy to hear any ideas on how to achieve it using PDFBox on
> its own, without tabula...
>
> On Tue, May 23, 2017 at 12:01 PM, Gilad Denneboom <[email protected]> wrote:
>
>> There doesn't seem to be one... I guess I can try StackOverflow.
>>
>> On Tue, May 23, 2017 at 11:54 AM, Andreas Lehmkühler <[email protected]> wrote:
>>
>>> > Gilad Denneboom <[email protected]> wrote on May 22, 2017 at 22:07:
>>> >
>>> > Hi all,
>>> >
>>> > So I'm trying to identify hair-lines in my PDFs. I came across tabula,
>>> > which seems to be able to do it, but I can't get it to quite work with my
>>> > files in the way I need it to, so any help is greatly appreciated!
>>> >
>>> > Here's what I've been doing so far: I used the Ruling object from tabula to
>>> > extract both the horizontal and vertical rules from a stripped version of
>>> > the PDF page (i.e., after removing all the text in it).
>>> > I'm getting results, but now I want to relate them back to the original PDF
>>> > page, and that's proving difficult. If I add a text field using the
>>> > coordinates of the Ruling objects, they are way off from where I would
>>> > expect them to be. I think it has to do with the DPI setting used to
>>> > convert the PDF page to an image, which is necessary for the rulings
>>> > extraction.
>>> > So my question is: How can I take these Ruling objects and convert them
>>> > back to the original coordinates of the PDF?
>>> > I would also like to be able to only identify lines of a certain width and
>>> > height, but if I get the rectangles to work correctly I think I can do that
>>> > in post-processing.
>>> Sounds like a question for the tabulapdf community ...
>>>
>>> Andreas
>>> >
>>> > Thanks in advance!
>>> > Gilad

