I'm doing as you said: first I find the rectangular areas, converting their
coordinates to display units, and then I extract the text from them.
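
For the archive, here is a minimal sketch of the conversion step, not the actual PDFBox code. It assumes that display units have their origin at the top-left corner with y growing downward (which is how I understand the TextPosition coordinates), that the current transformation matrix is available as a java.awt.geom.AffineTransform, and that the page height is known in points. The class name ReToDisplay and the method toDisplay are made up for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

final class ReToDisplay {

    // Hypothetical helper: converts the four operands of an "re" operator
    // (x, y, width, height in PDF user space, origin bottom-left) into a
    // rectangle in display units (assumed origin top-left, y downward).
    static Rectangle2D toDisplay(double x, double y, double w, double h,
                                 AffineTransform ctm, double pageHeight) {
        // Normalize first, since "re" operands may have negative width/height.
        Rectangle2D user = new Rectangle2D.Double();
        user.setFrameFromDiagonal(x, y, x + w, y + h);

        // Apply the current transformation matrix and take the bounding box,
        // so a rotated or scaled rectangle still yields an axis-aligned area.
        Rectangle2D b = ctm.createTransformedShape(user).getBounds2D();

        // Flip the y axis: the top edge in PDF space (maxY) becomes the
        // y coordinate of the rectangle in display space.
        return new Rectangle2D.Double(
                b.getX(), pageHeight - b.getMaxY(), b.getWidth(), b.getHeight());
    }

    public static void main(String[] args) {
        // Example values only: a US Letter page (792 pt high), identity CTM.
        Rectangle2D r = toDisplay(100, 700, 200, 50, new AffineTransform(), 792);
        System.out.println(r.getX() + " " + r.getY() + " "
                + r.getWidth() + " " + r.getHeight());
    }
}
```

The resulting rectangle can then be compared against TextPosition.getX() and getY(), or passed on as a region for text extraction.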

Thank you

--- On Thu, 4/2/10, Villu Ruusmann <[email protected]> wrote:

> From: Villu Ruusmann <[email protected]>
> Subject: Re: Conversion to display units
> To: [email protected]
> Date: Thursday, 4 February 2010, 9:07
> Hello there,
> 
> > I'm using PDFTextStripper to get text from a PDF document, but I need
> > to get text only from some regions in the PDF. I know these regions
> > are being drawn using the "re" operator, which draws a rectangle using
> > x, y, width, height as arguments. How do I convert these four
> > arguments to display units so I can compare them with
> > TextPosition.getX()?
> 
> The PDF "re" operator is handled by the class
> org.apache.pdfbox.util.operator.pagedrawer.AppendRectangleToPath. As
> the package name indicates, this class is meant to be used from within
> the PageDrawer utility, not from within the PDFTextStripper utility.
> If you take a look at this class, you will see that the actual
> transformation is implemented in the method
> org.apache.pdfbox.pdfviewer.PageDrawer#transformedPoint(double, double).
> 
> If I were given a similar task, I would perform two runs on the PDF
> document. First I would use the PageDrawer utility to capture the
> rectangular areas (simply override #fillPath(int) and/or #strokePath,
> and grab #getLinePath there). Then I would use PDFTextStripper (or
> better yet, PDFTextStripperByArea) to extract text from the previously
> captured rectangular areas.
> 
> 
> VR
> 


      