I do a lot of this and there is no generic way. The rect might be drawn as a rect, or as 4 lines, or as a polyline of 3 or 4 (or 5 for overlaps). It might be drawn twice for emphasis. I have some heuristics for creating probable rects in http://github.com/petermr/ami3. If you are serious and doing a *lot* I can show you where to find them.
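
To give a rough idea of what such a heuristic can look like, here is a minimal sketch. It is not the ami3 code and not a PDFBox API: the class RectGuesser, its method names and the tolerance eps are all made up for illustration. It assumes you forward the moveTo/lineTo/closePath events from your own stream engine subclass, and it only handles axis-aligned boxes drawn as a single polyline (closed, left open on the last side, or retracing the start). Four separately drawn lines and curved corners are not covered.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not PDFBox or ami3 code): guesses axis-aligned rects from path points. */
class RectGuesser {
    private final List<Point2D> points = new ArrayList<>();    // current subpath
    private final List<Rectangle2D> rects = new ArrayList<>(); // probable rects found so far
    private final double eps; // tolerance in user-space units (assumed, tune per document)

    RectGuesser(double eps) {
        this.eps = eps;
    }

    // Feed these from the corresponding path callbacks of your stream engine subclass.
    void moveTo(double x, double y) {
        flush(); // a new subpath ends the previous one
        points.add(new Point2D.Double(x, y));
    }

    void lineTo(double x, double y) {
        points.add(new Point2D.Double(x, y));
    }

    void closePath() {
        flush();
    }

    /** Returns the rect for a 4-corner axis-aligned outline, or null if it does not look like one. */
    static Rectangle2D toRect(List<Point2D> path, double eps) {
        List<Point2D> p = new ArrayList<>(path);
        // Drop trailing points that retrace the outline (explicit return to start, overlaps).
        while (p.size() > 4 && p.get(p.size() - 1).distance(p.get(p.size() - 5)) <= eps) {
            p.remove(p.size() - 1);
        }
        if (p.size() != 4) {
            return null;
        }
        Boolean prevHorizontal = null;
        for (int i = 0; i < 4; i++) {
            Point2D a = p.get(i);
            Point2D b = p.get((i + 1) % 4); // the last edge is the implied closing edge
            boolean dx = Math.abs(a.getX() - b.getX()) > eps;
            boolean dy = Math.abs(a.getY() - b.getY()) > eps;
            if (dx == dy) {
                return null; // diagonal or zero-length edge
            }
            if (prevHorizontal != null && prevHorizontal == dx) {
                return null; // edges must alternate between horizontal and vertical
            }
            prevHorizontal = dx;
        }
        Rectangle2D r = new Rectangle2D.Double();
        r.setFrameFromDiagonal(p.get(0), p.get(2)); // opposite corners
        return r;
    }

    private void flush() {
        Rectangle2D r = toRect(points, eps);
        points.clear();
        if (r == null) {
            return;
        }
        for (Rectangle2D seen : rects) { // skip boxes drawn twice for emphasis
            if (Math.abs(seen.getX() - r.getX()) <= eps && Math.abs(seen.getY() - r.getY()) <= eps
                    && Math.abs(seen.getWidth() - r.getWidth()) <= eps
                    && Math.abs(seen.getHeight() - r.getHeight()) <= eps) {
                return;
            }
        }
        rects.add(r);
    }

    /** All probable rects seen so far, including a pending unclosed subpath. */
    List<Rectangle2D> rects() {
        flush();
        return rects;
    }

    /** First probable rect containing the point (e.g. the start of a text run), or null. */
    Rectangle2D containing(double x, double y) {
        for (Rectangle2D r : rects()) {
            if (r.contains(x, y)) {
                return r;
            }
        }
        return null;
    }
}
```

Once the page has been processed you would call containing(x, y) with the position of the text you located. Note that the text coordinates and the path coordinates have to be in the same coordinate system for that test to mean anything.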

On Fri, 20 Sep 2019, 15:41 PDF Developer, <pdf...@yahoo.com.invalid> wrote:

> Lukas.
> Quick answer:
> I looked at the page content stream using the PDFBox Debugger, and
> appendRectangle isn't triggering because there isn't a rectangle in the
> page content stream. What is rendered is made up from a move and lines. I
> also had to handle this in my project. One way would be to override other
> methods so that you catch a moveTo, lineTo, closePath, strokeAndFill etc. and
> store the points, then check when closePath is called whether they form a
> rectangle.
>
> If I have time (I am about to go on a business trip), I will see if I can
> cut down my code to illustrate this.
>
> PDFDev
>
> On Friday, September 20, 2019, 2:58:58 PM GMT+1, STAMPF Lukas <
> lukas.sta...@bat.at> wrote:
>
> Hello,
>
> Thanks for the input.
> https://filebox.batmen.at/index.php/s/R2PA4HB6eIXkc8c
>
> Seems like I can't use the appendRectangle method. It does not trigger.
>
> Regards,
> Lukas
>
> -----Original Message-----
> From: PDF Developer <pdf...@yahoo.com.INVALID>
> Sent: Friday, 20 September 2019 11:02
> To: users@pdfbox.apache.org
> Subject: Re: Finding a Box containing text
>
> Hello Lukas,
> This mailing list doesn't accept attachments; you probably want to use a
> hosting site instead.
>
> I am currently working on a project that needs to identify text on a page
> within a rectangle.
>
> This may or may not be appropriate, but to do this I subclass
> PDFGraphicsStreamEngine, which has a method appendRectangle; if your PDF
> creation application is well behaved you can just use that. That said, in
> the real world a rectangle can be made up of lines and moves, so you may
> have a bit more work to do. If you have the coordinates of the start of
> the string, then you could enumerate the rectangles to see if the point was
> in a rectangle.
> Or you could do things slightly in reverse: use the bounds of the
> rectangle with PDFTextStripperByArea to get the text in the rectangle and
> check whether the string is what you are looking for.
> Unfortunately I can't share my project code, but if you can find somewhere
> to host the PDF, I will see if I can use it as a test for my code and, if
> that is successful, provide something by way of a slimmed-down example.
> PDFDev
>
> On Friday, September 20, 2019, 9:07:20 AM GMT+1, STAMPF Lukas <
> lukas.sta...@bat.at> wrote:
>
> Hello,
>
> I am trying to find (x, y, width, height) of a box containing a text within
> a PDF document. Locating the text by inheriting the TextPosition was
> pretty straightforward, but I had to realize that I don't know PDF
> operators well enough to locate the box.
>
> Can somebody please have a look at the PDF I attached and tell me which
> "q"/"Q" block represents my "FIND ME" box? Can I subclass PDFRenderer to
> get the box position?
>
> Regards,
> Lukas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org