On 23.05.2025 17:01, Robert Rodini wrote:
This question is informational. I use PDFBox utilities to extract text from a
large PDF file. The pages of the PDF always use a three-column layout. The
PDFBox CLI utility is wonderful since it processes the columns from top to
bottom and left to right.

Is there a way to use Apache PDFBox to recognize column breaks (the start of a
new column) and page breaks (the start of a new page) as the text is being extracted?


No, but you could use ExtractTextByArea if you know the coordinates of the columns.
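A minimal sketch of that approach follows. The ExtractTextByArea example is built on PDFTextStripperByArea, and if each page is processed one region (column) at a time, the page and column breaks are simply the loop boundaries. Assumptions: PDFBox 3.x (Loader.loadPDF; in 2.x use PDDocument.load), unrotated pages whose crop box starts at the origin, and three equal-width full-height columns. Real layouts usually have margins and gutters, so the rectangles should be replaced with measured coordinates. The class name ColumnExtractor, the helper columnRegions and the "=== page" / "--- column" markers are hypothetical.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class ColumnExtractor {

    // Hypothetical helper: splits a page into equal-width, full-height column
    // rectangles. Units are PDF points, origin top-left (y grows downward),
    // which is the coordinate system PDFTextStripperByArea regions use.
    static List<Rectangle2D> columnRegions(float pageWidth, float pageHeight, int columns) {
        List<Rectangle2D> regions = new ArrayList<>();
        float columnWidth = pageWidth / columns;
        for (int c = 0; c < columns; c++) {
            regions.add(new Rectangle2D.Float(c * columnWidth, 0, columnWidth, pageHeight));
        }
        return regions;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = Loader.loadPDF(new File(args[0]))) {
            int pageNumber = 0;
            for (PDPage page : document.getPages()) {
                pageNumber++;
                // Page break: every loop iteration starts a new page.
                System.out.println("=== page " + pageNumber + " ===");

                // Assumes an unrotated page whose crop box starts at (0, 0).
                PDRectangle box = page.getCropBox();
                List<Rectangle2D> regions = columnRegions(box.getWidth(), box.getHeight(), 3);

                // A fresh stripper per page keeps the region bookkeeping simple.
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                for (int c = 0; c < regions.size(); c++) {
                    stripper.addRegion("col" + c, regions.get(c));
                }
                stripper.extractRegions(page);

                for (int c = 0; c < regions.size(); c++) {
                    // Column break: every region is one column, left to right.
                    System.out.println("--- column " + (c + 1) + " ---");
                    System.out.print(stripper.getTextForRegion("col" + c));
                }
            }
        }
    }
}
```

Run it with the PDF path as the only argument. If the column positions differ between pages, compute the rectangles per page instead of using a fixed equal split.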

Tilman

