On 23.05.2025 17:01, Robert Rodini wrote:
> This question is informational. I use the PDFBox utilities to extract text
> from a large PDF file. The pages of the PDF always use a three-column format.
> The PDFBox CLI utility is wonderful since it processes the columns from top
> to bottom and left to right.
> Is there a way to use Apache PDFBox to recognize column breaks (start of a
> new column) and page breaks (start of a new page) as the text is being
> extracted?
No, but you could use ExtractTextByArea if you know the coordinates.
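
A minimal sketch of the coordinate side of that idea, assuming the three columns have equal width and share fixed margins and gutters. The ColumnRegions class, its columns method, and the 36 pt / 18 pt values are hypothetical and only for illustration; they are not part of PDFBox. The ExtractTextByArea example is built on PDFTextStripperByArea, so each rectangle computed here could be registered with addRegion(name, rect) and read back with getTextForRegion(name) after extractRegions(page). Because you would loop over the pages and the regions yourself, every page break and column break is known at the point where the text is read.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: splits a page into
// equal-width column rectangles separated by a fixed gutter.
class ColumnRegions {

    static List<Rectangle2D> columns(double pageWidth, double pageHeight,
                                     double margin, double gutter, int count) {
        // Width left for text once both side margins and the gutters are removed
        double usable = pageWidth - 2 * margin - (count - 1) * gutter;
        double colWidth = usable / count;
        List<Rectangle2D> regions = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            double x = margin + i * (colWidth + gutter);
            regions.add(new Rectangle2D.Double(
                    x, margin, colWidth, pageHeight - 2 * margin));
        }
        return regions;
    }

    public static void main(String[] args) {
        // Assumed layout: US Letter (612 x 792 pt), 36 pt margins, 18 pt gutters
        List<Rectangle2D> cols = columns(612, 792, 36, 18, 3);
        for (int i = 0; i < cols.size(); i++) {
            System.out.println("column" + (i + 1) + " = " + cols.get(i));
        }
        // With PDFBox on the classpath, each rectangle would be passed to
        // PDFTextStripperByArea.addRegion("column" + n, rect); after
        // extractRegions(page), getTextForRegion("column" + n) returns the
        // text of that column on that page.
    }
}
```

The real margins and gutter widths have to be measured from the PDF itself; if the columns are not equal in width, the rectangles would need to be listed by hand instead of computed.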
Tilman