Hi All,

We have extended the algorithm at the following link so that it highlights text with PDFBox 2.x.

Link: https://gist.github.com/joelkuiper/331a399961941989fec8

It was originally written for PDFBox 1.8.x.

For some documents, it fails to highlight the given text. While debugging, we found that it could not match the text on the page because of the characters "ffi" in the search string.

The complete search string is:

"efficiently. Fast trigger mechanisms are needed to curate events of interest online and\nsensitive statistical tools are needed to extract as much"

The string above is definitely present in the PDF file. If we remove the leading characters from the search string, the remaining substring is highlighted correctly.
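
One possible cause (an assumption, not verified against this PDF) is that the page text contains the single ligature character U+FB03 instead of the three letters "f", "f", "i", so a plain substring comparison fails. The sketch below illustrates that case using only java.text.Normalizer from the JDK. The class LigatureSearch and its methods are hypothetical names for illustration and are not part of PDFBox or the linked gist.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of PDFBox or the linked gist.
final class LigatureSearch {

    // NFKD compatibility decomposition expands U+FB03 (the "ffi" ligature)
    // into the three separate letters 'f', 'f', 'i'.
    static String expandLigatures(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Compares both strings after normalizing them the same way.
    static boolean containsNormalized(String pageText, String query) {
        return expandLigatures(pageText).contains(expandLigatures(query));
    }

    public static void main(String[] args) {
        // Assumed extraction result: the page text carries the ligature character.
        String pageText = "e\uFB03ciently. Fast trigger mechanisms are needed";
        String query = "efficiently. Fast trigger";

        System.out.println(pageText.contains(query));            // prints "false"
        System.out.println(containsNormalized(pageText, query)); // prints "true"
    }
}
```

Note that normalization changes the string length (one ligature character becomes three letters), so any character offsets used to locate the highlight rectangle would need to be mapped back to the original text positions.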

Thanks in advance.
- CM
