Yes, it is. This is almost exactly what I am working on at the moment.
To save you some research time, have a look at PDFStreamEngine (more precisely, override its processTextPosition method). If you manage to extend PDFTextStripper instead, that may be better, since it handles the text flow even in multi-column layouts. I didn't manage to do that, and PDFStreamEngine suits my needs for now.

In a PDF, text is split into groups of words, and sometimes even single words are cut in half. You'll therefore have to keep a back-match memory while parsing the text flow, so that a match can span several chunks. You'll also need to deal with the graphics state (to get the text coordinates) and hack it a bit to get the approximate position of the words or sentences you are looking for (because of how the text flow is structured).
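
To make the back-match idea concrete, here is a minimal sketch in plain Java. It is not PDFBox code: Fragment, Box, KeywordLocator and locate are hypothetical names, and Fragment only stands in for whatever text and coordinates you collect in your processTextPosition override. It assumes the chunks arrive in reading order, and the match is case-sensitive.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordLocator {

    /** Hypothetical stand-in for one chunk of text and its page coordinates. */
    static final class Fragment {
        final String text;
        final float x, y, width, height;

        Fragment(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Approximate rectangle covering one occurrence of the keyword. */
    static final class Box {
        final float x, y, width, height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Finds every occurrence of keyword, even when it is split across fragments. */
    static List<Box> locate(List<Fragment> fragments, String keyword) {
        List<Box> hits = new ArrayList<>();
        if (keyword.isEmpty()) {
            return hits;
        }

        // Back-match memory: all text seen so far, plus the fragment that
        // contributed each character, so a match can be mapped back to coordinates.
        StringBuilder memory = new StringBuilder();
        List<Fragment> owners = new ArrayList<>();
        for (Fragment f : fragments) {
            memory.append(f.text);
            for (int i = 0; i < f.text.length(); i++) {
                owners.add(f);
            }
        }

        String text = memory.toString();
        int start = text.indexOf(keyword);
        while (start >= 0) {
            int end = start + keyword.length();
            float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
            // Union of the rectangles of every fragment touched by the match.
            for (int i = start; i < end; i++) {
                Fragment f = owners.get(i);
                minX = Math.min(minX, f.x);
                minY = Math.min(minY, f.y);
                maxX = Math.max(maxX, f.x + f.width);
                maxY = Math.max(maxY, f.y + f.height);
            }
            hits.add(new Box(minX, minY, maxX - minX, maxY - minY));
            start = text.indexOf(keyword, end);
        }
        return hits;
    }
}
```

The rectangle is only approximate: it covers whole fragments, not just the matched characters, and a match that wraps onto a second line would yield one oversized box. Each rectangle could then serve as the area of a link annotation (PDFBox has PDAnnotationLink for that), but that part is not shown here.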


Julien PLÉE


On 17 Sept. 2010 at 20:24, José Rodolfo Carrijo de Freitas wrote:

Hello,

Do you believe it is possible to read the text of a PDF and wrap a piece of
text with a link?

For example:

If it finds “pdfbox” in the text, it will link it to the PDFBox website.



Thanks,

José Rodolfo Carrijo de Freitas
Systems Analyst
Softplan - Research and Development Department

Quality System Certified to ISO 9001:2008
(48) 3027 8000 Ext. 8359
http://www.softplan.com.br



