Hi,

Your image didn't get through, but you might want to run the DrawPrintTextLocations.java example from the source code download. It shows the bounds of the characters.
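
Once character or fragment bounds are available, one possible way to approximate the "block" behaviour is to cluster fragments whose boxes lie close together. The sketch below is only a heuristic under stated assumptions, not how Chrome, Firefox or PDFBox actually do it. The names TextBlockGrouper, Fragment, adjacent, group and maxGapFactor are all made up for illustration. It assumes each fragment carries a box with y growing downward, such as the values that TextPosition reports through getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of PDFBox: groups text fragments into blocks by proximity.
class TextBlockGrouper {

    /** A text fragment with its bounding box; y grows downward, units are arbitrary. */
    public static final class Fragment {
        public final String text;
        public final double x, y, width, height;

        public Fragment(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Distance between two 1-D intervals; negative when they overlap.
    private static double gap(double aStart, double aEnd, double bStart, double bEnd) {
        return Math.max(aStart, bStart) - Math.min(aEnd, bEnd);
    }

    // Assumed rule: fragments are neighbours if they overlap on one axis and the gap
    // on the other axis is at most maxGapFactor times the smaller box height.
    public static boolean adjacent(Fragment a, Fragment b, double maxGapFactor) {
        double limit = maxGapFactor * Math.min(a.height, b.height);
        double gapX = gap(a.x, a.x + a.width, b.x, b.x + b.width);
        double gapY = gap(a.y, a.y + a.height, b.y, b.y + b.height);
        boolean sameLine = gapY < 0 && gapX <= limit;
        boolean stacked = gapX < 0 && gapY <= limit;
        return sameLine || stacked;
    }

    /** Returns the connected components of the adjacency relation, in input order. */
    public static List<List<Fragment>> group(List<Fragment> fragments, double maxGapFactor) {
        List<List<Fragment>> blocks = new ArrayList<>();
        boolean[] visited = new boolean[fragments.size()];
        for (int i = 0; i < fragments.size(); i++) {
            if (visited[i]) {
                continue;
            }
            List<Fragment> block = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(i);
            visited[i] = true;
            // Breadth-first search over neighbours of the current fragment.
            while (!queue.isEmpty()) {
                int current = queue.poll();
                block.add(fragments.get(current));
                for (int j = 0; j < fragments.size(); j++) {
                    if (!visited[j]
                            && adjacent(fragments.get(current), fragments.get(j), maxGapFactor)) {
                        visited[j] = true;
                        queue.add(j);
                    }
                }
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```

With a two-column layout where labels sit on the left and amounts far to the right, group(fragments, 1.0) would put the labels in one block and the amounts in another, because the horizontal gap between the columns is much larger than the line height. The threshold is a guess and would need tuning per document.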

Tilman

On 29.01.2025 18:45, NH Rao wrote:
Greetings,

This is not strictly a PDFBox question, but I'm hoping someone here knows the answer.

I've noticed that many PDF viewers, such as the built-in viewers in Chrome and Firefox, seem to have a concept of a block when you start selecting text. Given a visually tabular structure, a multi-row selection often stays within one column and only at a certain point spills over into the next column.

This makes me think that viewers are somehow able to detect the block. I've examined a few PDFs, and text blocks or rows consist of multiple text operations: usually one per row, but sometimes one per word or, in a few cases, one per character. Hopefully the attached picture shows what I mean. In this case only the left-hand entries are selected; the amounts are not, even though they are part of the same visual row.

How does the viewer sense that this is one text block? I think this is useful functionality, and I would like to grab interesting blocks from a PDF using PDFBox. I know about the area-based text stripper, but it feels like viewers are extending that concept further.

image.png

Regards,

Niranjan
