Position of each individual word
--------------------------------

                 Key: PDFBOX-486
                 URL: https://issues.apache.org/jira/browse/PDFBOX-486
             Project: PDFBox
          Issue Type: Wish
          Components: Text extraction, Utilities
    Affects Versions: 0.8.0-incubator
            Reporter: matija kancijan

Is it possible to extract the position of each word from a PDF? This would be similar to the PDFHighlighter class, whose output is an XML file with the page and positions of each word. With this option you could mark a whole article and, in addition, produce your own XML file to select it in the PDF file. If this could also be combined with the PDFText2HTML class, you would have the structure of the original PDF file and the position of each word, so selecting articles would be much easier. This could be useful with bookmarks too.

(I am new to PDFBox, so if someone can point me in the right direction, I would gladly do this... ;) )
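
To make the request concrete, here is a minimal sketch of the kind of output meant above: characters with known positions are grouped into words, and each word is written out as an XML element with its page and bounding box. It is not PDFBox code. `Glyph`, `Word`, `groupIntoWords` and `toXml` are hypothetical names invented for this illustration, and the XML layout is an assumption rather than the PDFHighlighter format. In PDFBox the per-character data would presumably come from `TextPosition` objects seen by a text stripper; how those map onto `Glyph` is left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordPositions {

    /** Hypothetical stand-in for one extracted character and its box on a page. */
    static final class Glyph {
        final char ch; final int page; final float x, y, width, height;
        Glyph(char ch, int page, float x, float y, float width, float height) {
            this.ch = ch; this.page = page;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** A word with the bounding box that encloses all of its characters. */
    static final class Word {
        final String text; final int page; final float x, y, width, height;
        Word(String text, int page, float x, float y, float width, float height) {
            this.text = text; this.page = page;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** Splits on whitespace and page changes; assumes glyphs arrive in reading order. */
    static List<Word> groupIntoWords(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int page = 0;
        float minX = 0, minY = 0, maxX = 0, maxY = 0;
        for (Glyph g : glyphs) {
            boolean space = Character.isWhitespace(g.ch);
            boolean pageChanged = text.length() > 0 && g.page != page;
            if ((space || pageChanged) && text.length() > 0) {
                // Close the word collected so far.
                words.add(new Word(text.toString(), page, minX, minY, maxX - minX, maxY - minY));
                text.setLength(0);
            }
            if (space) {
                continue;
            }
            if (text.length() == 0) {
                // First character of a new word starts a fresh bounding box.
                page = g.page;
                minX = g.x; minY = g.y;
                maxX = g.x + g.width; maxY = g.y + g.height;
            } else {
                // Later characters only grow the box.
                minX = Math.min(minX, g.x); minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width); maxY = Math.max(maxY, g.y + g.height);
            }
            text.append(g.ch);
        }
        if (text.length() > 0) {
            words.add(new Word(text.toString(), page, minX, minY, maxX - minX, maxY - minY));
        }
        return words;
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Writes one element per word; the element and attribute names are made up. */
    static String toXml(List<Word> words) {
        StringBuilder sb = new StringBuilder("<words>\n");
        for (Word w : words) {
            sb.append(String.format(Locale.ROOT,
                    "  <word page=\"%d\" x=\"%.1f\" y=\"%.1f\" width=\"%.1f\" height=\"%.1f\">%s</word>\n",
                    w.page, w.x, w.y, w.width, w.height, escape(w.text)));
        }
        return sb.append("</words>\n").toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph('H', 1, 10f, 100f, 5f, 12f));
        glyphs.add(new Glyph('i', 1, 15f, 100f, 5f, 12f));
        glyphs.add(new Glyph(' ', 1, 20f, 100f, 5f, 12f));
        glyphs.add(new Glyph('y', 1, 25f, 100f, 5f, 12f));
        glyphs.add(new Glyph('o', 1, 30f, 100f, 5f, 12f));
        System.out.print(toXml(groupIntoWords(glyphs)));
    }
}
```

A file like this, one entry per word, could then be matched against the HTML structure or the bookmarks to select a whole article.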