Hi,
Your image didn't get through, but you might want to run the
DrawPrintTextLocations.java example from the source code download; it
shows the bounds of each character.
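
Once you have such bounds, one possible way to group text runs into
blocks is sketched below. This is only a guess at the kind of heuristic
a viewer might use, not something taken from any viewer's source. The
Run record, the group method, and the maxLineGap threshold are
hypothetical names, and the coordinates are made up; in practice the
boxes would come from the TextPosition objects that PDFTextStripper
passes to writeString.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BlockGrouper {
    /** Hypothetical text run with its bounding box; y grows downward. */
    record Run(String text, double x, double y, double width, double height) {
        double right() { return x + width; }
        double bottom() { return y + height; }
    }

    /** True if the horizontal extents of the two runs overlap. */
    static boolean overlapsHorizontally(Run a, Run b) {
        return a.x() < b.right() && b.x() < a.right();
    }

    /**
     * Greedy grouping: runs are visited top to bottom, and a run joins the
     * first block whose last run overlaps it horizontally and ends no more
     * than maxLineGap above it. Otherwise it starts a new block.
     */
    static List<List<Run>> group(List<Run> runs, double maxLineGap) {
        List<Run> sorted = new ArrayList<>(runs);
        sorted.sort(Comparator.comparingDouble(Run::y).thenComparingDouble(Run::x));

        List<List<Run>> blocks = new ArrayList<>();
        for (Run run : sorted) {
            List<Run> target = null;
            for (List<Run> block : blocks) {
                Run last = block.get(block.size() - 1);
                double gap = run.y() - last.bottom();
                if (overlapsHorizontally(last, run) && gap <= maxLineGap) {
                    target = block;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                blocks.add(target);
            }
            target.add(run);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Two visual rows of a table: labels on the left, amounts on the right.
        List<Run> runs = List.of(
            new Run("Rent", 50, 100, 60, 10),
            new Run("1,200.00", 400, 100, 50, 10),
            new Run("Groceries", 50, 114, 80, 10),
            new Run("310.45", 410, 114, 40, 10));
        for (List<Run> block : group(runs, 6.0)) {
            System.out.println(block.stream().map(Run::text).toList());
        }
    }
}
```

With this sample data the labels end up in one block and the amounts in
another, which resembles the column-wise selection described below. Real
pages would likely need more care (rotated text, varying font sizes,
runs split per character).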
Tilman
On 29.01.2025 18:45, NH Rao wrote:
Greetings,
This is not a direct PDFBox question, but I'm hoping someone here knows
the answer.
I've noticed that many PDF viewers, such as the built-in viewers in
Chrome and Firefox, have a concept of a block when you start selecting
text. Given a visually tabular structure, a multi-row selection often
stays within one column and only at a certain point spills over to
the next column.
This makes me think that viewers are somehow able to detect the block.
I've examined a few PDFs, and text blocks or rows consist of multiple
text operations: usually one per row, but sometimes one per word or, in
a few cases, one per character. Hopefully, the attached picture shows
what I mean. In this case, only the left-hand entries are selected; the
amounts are not selected even though they are part of the same visual row.
How does the viewer sense that this is one text block? I think this is
useful functionality, and I would like to grab interesting blocks from
a PDF using PDFBox. I know about the area-based text stripper, but I feel
like viewers are extending that concept further.
image.png
Regards,
Niranjan