I'm doing as you suggested: first I find the rectangular areas and convert their coordinates to display units, then I extract the text from them.
Thank you

--- On Thu, 4/2/10, Villu Ruusmann <[email protected]> wrote:

> From: Villu Ruusmann <[email protected]>
> Subject: Re: Conversion to display units
> To: [email protected]
> Date: Thursday, 4 February 2010, 9:07
>
> Hello there,
>
> > I'm using PDFTextStripper to get text from a PDF document, but I
> > need to get text only from some regions in the PDF. I know these
> > regions are being drawn using the "re" operator, which draws a
> > rectangle using x, y, width, height as arguments. How do I convert
> > these four arguments to display units so I can compare them with
> > TextPosition.getX()?
>
> The PDF "re" operator is handled by the class
> org.apache.pdfbox.util.operator.pagedrawer.AppendRectangleToPath.
> As the package name indicates, this class is meant to be used from
> within the PageDrawer utility, not from within the PDFTextStripper
> utility. If you take a look at this class you will see that the
> actual transformation is implemented in the method
> org.apache.pdfbox.pdfviewer.PageDrawer#transformedPoint(double, double).
>
> If I were given a similar task, I would perform two runs on the PDF
> document. First I would use the PageDrawer utility to capture the
> rectangular areas (simply override #fillPath(int) and/or #strokePath,
> and grab #getLinePath there). Then I would use PDFTextStripper (or
> better yet, PDFTextStripperByArea) to extract text from the
> previously captured rectangular areas.
>
> VR
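
For anyone finding this thread later, the coordinate conversion discussed above can be sketched in plain Java. This is only an illustration under stated assumptions, not PDFBox code: it assumes an unrotated page, an identity transformation matrix (no scaling or translation in effect when "re" is executed), and that the text coordinates being compared against are measured from the top-left corner of the page. The class name RectToDisplay, the method toDisplay and the pageHeight parameter (for example, the media box height) are hypothetical names made up for the example.

```java
import java.awt.geom.Rectangle2D;

public class RectToDisplay {

    /**
     * Converts the operands of a PDF "re" operator (origin at the bottom-left,
     * y growing upwards) into a rectangle with the origin at the top-left and
     * y growing downwards.
     *
     * Assumptions (not taken from PDFBox): identity CTM, unrotated page,
     * pageHeight given in the same units as the operands.
     */
    static Rectangle2D.Double toDisplay(double x, double y,
                                        double w, double h,
                                        double pageHeight) {
        // "re" accepts negative width/height, so normalise to a lower-left corner first.
        double left = Math.min(x, x + w);
        double bottom = Math.min(y, y + h);
        double width = Math.abs(w);
        double height = Math.abs(h);

        // Flip the y axis: the top edge in display space is the page height
        // minus the top edge in PDF user space.
        double top = pageHeight - (bottom + height);
        return new Rectangle2D.Double(left, top, width, height);
    }

    public static void main(String[] args) {
        // A rectangle drawn with "72 600 200 50 re" on a 792-unit-high page.
        Rectangle2D.Double region = toDisplay(72, 600, 200, 50, 792);
        System.out.println(region);

        // A text position (in top-left-origin coordinates) can then be tested
        // for membership in the region.
        System.out.println(region.contains(100, 160));
    }
}
```

If the page does use a non-identity transformation matrix or a rotation, the rectangle corners would have to go through that transformation first, which is what the PageDrawer approach described above takes care of.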

