Thanks a lot for all the answers. I managed to get it working for the relevant cases.

Regards,
Lukas

-----Original Message-----
From: Peter Murray-Rust <peter.murray.r...@googlemail.com.INVALID> 
Sent: Friday, 20 September 2019 18:58
To: users@pdfbox.apache.org
Subject: Re: AW: Finding a Box containing text

I do a lot of this and there is no generic way. The rect might be a real rect,
4 separate lines, or a polyline of 3 or 4 segments (or 5 when the last one overlaps).
It might be drawn twice for emphasis.
I have some heuristics for creating probable rects
in http://github.com/petermr/ami3.
If you are serious and doing a *lot*, I can show you where to find them.
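A minimal sketch of the kind of heuristic described above, in Java since PDFBox is a Java library. It is not the code from ami3: the class RectHeuristics, its methods and the tolerance parameter are hypothetical. It takes the points of one path, collapses repeated corners (an explicit closing point or a fifth overlapping segment) and accepts the path if the four remaining corners are joined by alternating horizontal and vertical sides, including the side that closes the loop back to the first corner.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: guesses whether one path is a drawn box. */
final class RectHeuristics {
    private RectHeuristics() {}

    /** True if a and b coincide within tol (PDF units, 1/72 inch). */
    static boolean near(Point2D a, Point2D b, double tol) {
        return Math.abs(a.getX() - b.getX()) <= tol
                && Math.abs(a.getY() - b.getY()) <= tol;
    }

    /**
     * Returns the bounding box if the points look like an axis-aligned rectangle:
     * 4 points (3 segments, 4th side implied), 5 points (explicitly closed) or
     * 6 points (a 5th segment overlapping the first).
     */
    static Optional<Rectangle2D> probableRect(List<? extends Point2D> path, double tol) {
        // Keep the first occurrence of each corner so that a closing point or an
        // overlapping extra segment collapses onto the original four corners.
        List<Point2D> corners = new ArrayList<>();
        for (Point2D p : path) {
            if (corners.stream().noneMatch(c -> near(c, p, tol))) {
                corners.add(p);
            }
        }
        if (corners.size() != 4) {
            return Optional.empty();
        }
        Boolean prevHorizontal = null;
        for (int i = 0; i < 4; i++) {
            Point2D a = corners.get(i);
            Point2D b = corners.get((i + 1) % 4); // i == 3 closes the loop to the first corner
            boolean horizontal = Math.abs(a.getY() - b.getY()) <= tol;
            boolean vertical = Math.abs(a.getX() - b.getX()) <= tol;
            if (horizontal == vertical) {
                return Optional.empty(); // diagonal (or zero-length) side
            }
            if (prevHorizontal != null && prevHorizontal == horizontal) {
                return Optional.empty(); // sides must alternate horizontal/vertical
            }
            prevHorizontal = horizontal;
        }
        Rectangle2D box = new Rectangle2D.Double(corners.get(0).getX(), corners.get(0).getY(), 0, 0);
        for (Point2D c : corners) {
            box.add(c); // grow to the bounding box of the four corners
        }
        return Optional.of(box);
    }
}
```

Accepting an open three-segment polyline means a bracket-shaped path also passes, rotated boxes are rejected, and a box drawn twice yields two identical rectangles that still need de-duplicating; those are the cases where further heuristics come in.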


On Fri, 20 Sep 2019, 15:41 PDF Developer, <pdf...@yahoo.com.invalid> wrote:

>  Lukas,
> Quick answer:
> I looked at the page content stream using the PDFBox Debugger, and
> appendRectangle isn't being called because there is no rectangle (re)
> operator in the page content stream. What is rendered is made up of a
> move and lines. I also had to handle this in my project. One way would be
> to override other methods so that you catch moveTo, lineTo, closePath,
> fillAndStrokePath, etc. and store the points, then check when closePath
> is called whether they form a rectangle.
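The approach above can be sketched as a small recorder whose method names mirror path callbacks of PDFBox's PDFGraphicsStreamEngine (moveTo, lineTo, closePath, strokePath, appendRectangle). To keep it runnable without PDFBox on the classpath, BoxRecorder is a hypothetical standalone class; in a real PDFGraphicsStreamEngine subclass (which must also implement the engine's other abstract methods) the overrides would forward to it. Curves, fills and clipping are left out, and the box test used here (four distinct corners joined by purely horizontal or vertical sides) is an assumption, not PDFDev's code.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recorder; no PDFBox dependency. Curves, fills and clips are ignored. */
final class BoxRecorder {
    private final List<Point2D> subpath = new ArrayList<>();
    private final List<Rectangle2D> boxes = new ArrayList<>();
    private final double tol;

    BoxRecorder(double tol) {
        this.tol = tol;
    }

    /** "re" operator: the box is given directly by its four corners. */
    void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
        Rectangle2D r = new Rectangle2D.Double(p0.getX(), p0.getY(), 0, 0);
        r.add(p1);
        r.add(p2);
        r.add(p3);
        boxes.add(r);
    }

    /** "m" operator: starts a new subpath. */
    void moveTo(float x, float y) {
        subpath.clear();
        subpath.add(new Point2D.Float(x, y));
    }

    /** "l" operator: extends the current subpath. */
    void lineTo(float x, float y) {
        subpath.add(new Point2D.Float(x, y));
    }

    /** "h" operator: the subpath is closed, so test it now. */
    void closePath() {
        checkSubpath();
    }

    /** "S" operator: also catches boxes whose last lineTo returns to the start without "h". */
    void strokePath() {
        checkSubpath();
    }

    List<Rectangle2D> getBoxes() {
        return List.copyOf(boxes);
    }

    private void checkSubpath() {
        if (isBox(subpath)) {
            Rectangle2D r = new Rectangle2D.Double(subpath.get(0).getX(), subpath.get(0).getY(), 0, 0);
            for (Point2D p : subpath) {
                r.add(p);
            }
            boxes.add(r);
        }
        subpath.clear();
    }

    private boolean near(Point2D a, Point2D b) {
        return Math.abs(a.getX() - b.getX()) <= tol && Math.abs(a.getY() - b.getY()) <= tol;
    }

    /** Four distinct corners and every side purely horizontal or vertical (forces a rectangle). */
    private boolean isBox(List<Point2D> pts) {
        List<Point2D> p = new ArrayList<>(pts);
        if (p.size() == 5 && near(p.get(0), p.get(4))) {
            p.remove(4); // explicit closing point that repeats the start
        }
        if (p.size() != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            for (int j = i + 1; j < 4; j++) {
                if (near(p.get(i), p.get(j))) {
                    return false; // corners must be distinct
                }
            }
            Point2D a = p.get(i);
            Point2D b = p.get((i + 1) % 4);
            boolean movesX = Math.abs(a.getX() - b.getX()) > tol;
            boolean movesY = Math.abs(a.getY() - b.getY()) > tol;
            if (movesX == movesY) {
                return false; // diagonal side
            }
        }
        return true;
    }
}
```

For example, the sequence moveTo(10, 20), lineTo(110, 20), lineTo(110, 70), lineTo(10, 70), closePath() records one 100 x 50 box at (10, 20), while an open path that is only filled would additionally need a matching fillPath hook.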
>
> I am about to go on a business trip, but if I have time I will see if I
> can cut down my code to illustrate this.
>
> PDFDev
>
>     On Friday, September 20, 2019, 2:58:58 PM GMT+1, STAMPF Lukas < 
> lukas.sta...@bat.at> wrote:
>
>  Hello,
>
> Thanks for the input.
> https://filebox.batmen.at/index.php/s/R2PA4HB6eIXkc8c
>
> It seems I can't use the appendRectangle method; it is never called.
>
> Regards,
> Lukas
>
> -----Original Message-----
> From: PDF Developer <pdf...@yahoo.com.INVALID>
> Sent: Friday, 20 September 2019 11:02
> To: users@pdfbox.apache.org
> Subject: Re: Finding a Box containing text
>
>  Hello Lukas,
> This mailing list doesn't accept attachments; you probably want to use 
> a hosting site instead.
>
> I am currently working on a project that needs to identify text on a 
> page within a rectangle.
>
> This may or may not be appropriate, but to do this I subclass
> PDFGraphicsStreamEngine, which has an appendRectangle method; if your
> PDF creation application is well behaved, you can just use that. That
> said, in the real world a rectangle can be made up of moves and lines,
> so you may have a bit more work to do. If you have the coordinates of
> the start of the string, you could enumerate the rectangles to see
> whether that point lies inside one of them. Or you could do things the
> other way round: use the bounds of each rectangle with
> PDFTextStripperByArea to get the text inside it, and check whether that
> text is the string you are looking for.
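The first option (checking whether the start of the string lies inside a rectangle) needs both sets of coordinates in the same system. A minimal sketch, assuming the rectangles come from the path callbacks in PDF user space (origin bottom-left, y up) and the text point uses the top-left origin of TextPosition values such as getXDirAdj()/getYDirAdj(); it also assumes an unrotated page whose media box starts at (0, 0). BoxLookup and its methods are hypothetical names. When boxes are nested, for example a frame around the whole page, the smallest containing box is returned.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for matching a text position against found boxes. */
final class BoxLookup {
    private BoxLookup() {}

    /**
     * Converts a box from PDF user space (origin bottom-left, y up) to a top-left
     * origin with y growing downwards. Assumes an unrotated page whose media box
     * starts at (0, 0), so only a flip around the page height is needed.
     */
    static Rectangle2D toTopLeft(Rectangle2D pdfBox, double pageHeight) {
        return new Rectangle2D.Double(pdfBox.getX(), pageHeight - pdfBox.getMaxY(),
                pdfBox.getWidth(), pdfBox.getHeight());
    }

    /** Returns the smallest box (top-left coordinates) that contains the point (x, y). */
    static Optional<Rectangle2D> smallestContaining(List<? extends Rectangle2D> boxes,
                                                    double x, double y) {
        Rectangle2D best = null;
        for (Rectangle2D b : boxes) {
            boolean smaller = best == null || area(b) < area(best);
            if (b.contains(x, y) && smaller) {
                best = b; // prefer the innermost of nested boxes
            }
        }
        return Optional.ofNullable(best);
    }

    private static double area(Rectangle2D r) {
        return r.getWidth() * r.getHeight();
    }
}
```

The converted rectangle is also the form the second option would need, assuming regions passed to PDFTextStripperByArea.addRegion(String, Rectangle2D) use the same top-left origin as TextPosition.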
> Unfortunately I can't share my project code, but if you can find
> somewhere to host the PDF, I will see if I can use it as a test for my
> code and, if that is successful, provide a slimmed-down
> example.
> PDFDev
>
>     On Friday, September 20, 2019, 9:07:20 AM GMT+1, STAMPF Lukas < 
> lukas.sta...@bat.at> wrote:
>
> Hello,
>
> I am trying to find the (x, y, width, height) of a box containing some
> text in a PDF document. Locating the text using TextPosition was pretty
> straightforward, but I had to realize that I don't know the PDF operators
> well enough to locate the box.
>
> Can somebody please have a look at the PDF I attached and tell me
> which "q" ... "Q" block represents my "FIND ME" box? Can I subclass
> PDFRenderer to get the box position?
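For reference, a box can appear in a content stream either as a single re operator (which is what triggers appendRectangle in PDFBox) or as m, l and h operators (moveTo, lineTo, closePath). The q and Q operators only save and restore the graphics state, so they group operators but do not mark a box as such. The deliberately naive scanner below, a hypothetical QBlockScanner, numbers each q ... Q block and reports which block each path operator falls in. It splits on whitespace only, so it ignores strings, comments and inline images and is meant just for reading a short content stream copied from the PDFBox Debugger.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, deliberately naive content-stream scanner. */
final class QBlockScanner {
    private QBlockScanner() {}

    /** Returns entries like "block 2: re" naming the q...Q block of each path operator. */
    static List<String> scan(String content) {
        List<String> out = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // numbers of the q blocks currently open
        int counter = 0;
        for (String tok : content.trim().split("\\s+")) {
            switch (tok) {
                case "q":
                    open.push(++counter); // save graphics state: a new block starts
                    break;
                case "Q":
                    if (!open.isEmpty()) {
                        open.pop(); // restore graphics state: innermost block ends
                    }
                    break;
                case "re": case "m": case "l": case "h":
                    out.add((open.isEmpty() ? "top level" : "block " + open.peek()) + ": " + tok);
                    break;
                default:
                    break; // operands and all other operators are ignored
            }
        }
        return out;
    }
}
```

On "q 1 0 0 1 0 0 cm 50 700 m 250 700 l 250 650 l 50 650 l h S Q" the scanner reports the m, l, l, l, h sequence in block 1: a box drawn from lines, which is exactly the case where appendRectangle is never called.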
>
> Regards,
> Lukas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> For additional commands, e-mail: users-h...@pdfbox.apache.org
>
