See the comments by me and mkl in the SO question.

Tilman

On 23.07.2018 at 19:08, CM Reddy wrote:

Hi All,

We are using PDFBox 1.8.14 to manage PDF documents in our application. We implemented the algorithm described at <https://stackoverflow.com/questions/33253757/java-apache-pdfbox-extract-highlighted-text/51446785#51446785> to read highlighted text from PDF documents. While testing the code, we noticed that the text read from multi-line highlights comes out jumbled. Please find attached a document with three highlights.

 1. First highlight is a single line highlight - It works fine
      * Extracted text : "Only a resident of Michigan may be issued a
        Michigan driver's license"

 2. Second and third are multi-line highlights - Text jumbled.
      * Extracted text for the 2nd highlight is:
          o You ask whether, in light of OAG, 1995-1996, No 6883, p
            120 (December 14, 1995) (OAG No 6883), the Michigan
            Secretary of State is
            No 68
            alien1
            required to issue a driver's license to an illegal
            living in Michigan

      * Extracted text for the 3rd highlight is:
          o iad circumstances, including cashing a check,
            At one time, the federal government assigned social
            closing on a loan, gaining employment, and securing access
            to a commercial airplane. At one
            security numbers for certain valid nonwork purposes,
            including for the purpose of obtaining

Please help us resolve the above issues.

- Thanks in advance.
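
For readers following along, here is a minimal sketch of one way to handle multi-line highlights. It is not taken from this thread or from the linked answer. It assumes the highlight annotation carries a QuadPoints array as defined in the PDF specification (8 numbers per quadrilateral, in user space where y grows upward), and it assumes the jumbling comes from reading the quads in stored order, which the thread does not confirm. The class name QuadPointRegions and the method toRegions are hypothetical, and the sketch uses only the JDK, not PDFBox.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of PDFBox: turns a flat QuadPoints array
// into one bounding rectangle per quad, ordered top to bottom.
final class QuadPointRegions {

    /**
     * @param quadPoints flat array x1 y1 x2 y2 x3 y3 x4 y4 per quad,
     *                   in PDF user space (y increases upward)
     * @return one rectangle per quad, topmost first, then left to right
     */
    public static List<Rectangle2D.Float> toRegions(float[] quadPoints) {
        if (quadPoints.length % 8 != 0) {
            throw new IllegalArgumentException(
                    "QuadPoints length must be a multiple of 8");
        }
        List<Rectangle2D.Float> regions = new ArrayList<>();
        for (int i = 0; i < quadPoints.length; i += 8) {
            float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
            // Do not rely on the corner order; take min/max over all 4 points.
            for (int p = 0; p < 8; p += 2) {
                minX = Math.min(minX, quadPoints[i + p]);
                maxX = Math.max(maxX, quadPoints[i + p]);
                minY = Math.min(minY, quadPoints[i + p + 1]);
                maxY = Math.max(maxY, quadPoints[i + p + 1]);
            }
            regions.add(new Rectangle2D.Float(
                    minX, minY, maxX - minX, maxY - minY));
        }
        // y grows upward in PDF user space, so a larger top edge means
        // the quad sits higher on the page and should be read first.
        regions.sort(Comparator
                .comparingDouble((Rectangle2D.Float r) -> -r.getMaxY())
                .thenComparingDouble(Rectangle2D.Float::getMinX));
        return regions;
    }
}
```

Each returned rectangle could then be used as a separate extraction region, with the per-region text concatenated in list order. Note that the rectangles stay in PDF user space; an area-based extractor such as PDFTextStripperByArea may expect a different coordinate origin, so a y-flip against the page height may be needed. That part is left out here because it depends on the page and the API version.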



