Hi,

There is no direct API for that. What you could do is collect the red rectangles and the cyan rectangles, use the bottom edge of the red rectangles to decide which glyphs share a line, and then combine the cyan shapes of each line into one rectangle.
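A minimal sketch of that grouping idea, assuming you have already collected each glyph's red and cyan rectangles (for example from a modified DrawPrintTextLocations) into the hypothetical `Glyph` record. It also assumes y grows downward, as in Java2D, so the bottom edge is `getMaxY()`, and that a fixed tolerance is enough to decide "same line". Rotated or multi-column text would need more care.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LineBounds {

    /** Hypothetical holder for one glyph's red and cyan rectangles. */
    record Glyph(Rectangle2D red, Rectangle2D cyan) {}

    /**
     * Groups glyphs into lines by the bottom edge of their red rectangle
     * (assumes y grows downward) and returns, per line, the union of the
     * cyan rectangles, ordered top to bottom.
     */
    static List<Rectangle2D> lineRectangles(List<Glyph> glyphs, double tolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble(g -> g.red().getMaxY()));

        List<Rectangle2D> lines = new ArrayList<>();
        Rectangle2D current = null;
        double currentBottom = Double.NaN;
        for (Glyph g : sorted) {
            double bottom = g.red().getMaxY();
            if (current == null || Math.abs(bottom - currentBottom) > tolerance) {
                // Bottom edge too far from the current line's: start a new line.
                current = new Rectangle2D.Double();
                current.setRect(g.cyan());
                currentBottom = bottom;
                lines.add(current);
            } else {
                // Same line: grow the line rectangle to cover this cyan box.
                current.add(g.cyan());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
                new Glyph(new Rectangle2D.Double(10, 95, 5, 5), new Rectangle2D.Double(10, 92, 5, 10)),
                new Glyph(new Rectangle2D.Double(16, 95.2, 5, 5), new Rectangle2D.Double(16, 93, 5, 9)),
                new Glyph(new Rectangle2D.Double(10, 115, 5, 5), new Rectangle2D.Double(10, 112, 6, 10)));
        for (Rectangle2D r : lineRectangles(glyphs, 1.0)) {
            System.out.println(r);
        }
    }
}
```

Comparing each glyph against the first bottom of its line (rather than the previous glyph) keeps a slowly drifting baseline from chaining two lines together; the tolerance value is a guess you would tune per document.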

Yes, that method is private. If you really think it would help, copy the source of PDFTextStripper from the source download, rename the class, and adjust that method.
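If you take the renamed-copy route, the adjusted writeLine could compute one bounding box from the text positions of all words on the line. The sketch uses hypothetical stand-in records (`Position`, `Word`) so it compiles without PDFBox; their fields mirror TextPosition's `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()` and `getHeightDir()` getters. Treating `yDirAdj` as a baseline with y growing downward, so that each box spans `[yDirAdj - heightDir, yDirAdj]`, is an assumption to verify against your PDFBox version.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

public class WriteLineBounds {

    // Hypothetical stand-ins for TextPosition and the private WordWithTextPositions.
    record Position(float xDirAdj, float yDirAdj, float widthDirAdj, float heightDir) {}
    record Word(String text, List<Position> positions) {}

    /**
     * Union of the boxes of every text position on the line, or null for an
     * empty line. Assumes yDirAdj is the baseline and y grows downward.
     */
    static Rectangle2D lineBounds(List<Word> line) {
        Rectangle2D bounds = null;
        for (Word word : line) {
            for (Position p : word.positions()) {
                Rectangle2D box = new Rectangle2D.Float(
                        p.xDirAdj(), p.yDirAdj() - p.heightDir(),
                        p.widthDirAdj(), p.heightDir());
                if (bounds == null) {
                    bounds = box;
                } else {
                    bounds.add(box);
                }
            }
        }
        return bounds;
    }

    public static void main(String[] args) {
        List<Word> line = List.of(
                new Word("Hi", List.of(new Position(10f, 100f, 6f, 8f), new Position(16f, 100f, 4f, 8f))),
                new Word("you", List.of(new Position(24f, 100f, 6f, 8f),
                        new Position(30f, 100f, 6f, 8f), new Position(36f, 100f, 6f, 8f))));
        System.out.println(lineBounds(line));
    }
}
```

In the real copy, the loop would run over `word.getTextPositions()` inside writeLine, and where the rectangle ends up (a list field, extra output next to the text) is up to you.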

Tilman

PS: Your image didn't get through. I assume it was produced by DrawPrintTextLocations.

On 05.08.2020 at 13:19, Ahmad Al-Mughrabi wrote:
Hi PDFBox team,

Thanks for the great framework. I'm looking for a way to get the coordinates (x, y, width, height) of each line on a given page. In this example <https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/DrawPrintTextLocations.java?view=markup&sortby=date>, the output draws a rectangle around each word. See the snapshot below:
[image: 03054985.2018.1409975-marked-1.png]
Looking at *org.apache.pdfbox.text.PDFTextStripper#writeLine*, I see that the method cannot be overridden: it is *private*, as is the class *org.apache.pdfbox.text.PDFTextStripper.WordWithTextPositions*.

    /**
     * Write a list of string containing a whole line of a document.
     *
     * @param line a list with the words of the given line
     * @throws IOException if something went wrong
     */
    private void writeLine(List<WordWithTextPositions> line)
            throws IOException
    {
        int numberOfStrings = line.size();
        for (int i = 0; i < numberOfStrings; i++)
        {
            WordWithTextPositions word = line.get(i);
            writeString(word.getText(), word.getTextPositions());
            if (i < numberOfStrings - 1)
            {
                writeWordSeparator();
            }
        }
    }


Could you please point me to how I can obtain the line coordinates for a given page?

Thanks a million,
--
Ahmad Al Mughrabi | Principal Software Engineer, Atypon Systems LLC <https://www.atypon.com/>

