Re: detection of column breaks and page breaks in PDF document

Tilman Hausherr Fri, 23 May 2025 10:20:34 -0700

On 23.05.2025 17:01, Robert Rodini wrote:

This question is informational. I use the PDFBox utilities to extract text from 
a large PDF file. The pages of the PDF always use a three-column format. The 
PDFBox CLI utility is wonderful since it processes the columns from top to 
bottom and left to right.


Is there a way to use Apache PDFBox to recognize column breaks (start of a new 
column) and page breaks (start of a new page) as the text is being extracted?



No, but you could use ExtractTextByArea if you know the coordinates.

Tilman
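
As an illustration of the "if you know the coordinates" suggestion, here is a 
minimal sketch of how the three column rectangles could be computed. It is not 
taken from the thread: the class ColumnRegions and its split method are 
hypothetical helpers, and the sketch assumes equal-width columns and the same 
margin on every page edge, which may not match a real layout.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of PDFBox): splits a page into column regions. */
final class ColumnRegions {

    /**
     * Returns one rectangle per column, ordered left to right.
     * Assumption: columns have equal width and the page has a uniform margin.
     */
    static List<Rectangle2D.Double> split(double pageWidth, double pageHeight,
                                          int columns, double margin) {
        if (columns < 1) {
            throw new IllegalArgumentException("columns must be >= 1");
        }
        double usableWidth = pageWidth - 2 * margin;
        double usableHeight = pageHeight - 2 * margin;
        if (usableWidth <= 0 || usableHeight <= 0) {
            throw new IllegalArgumentException("margin too large for page");
        }
        double columnWidth = usableWidth / columns;

        List<Rectangle2D.Double> regions = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            // Each column starts where the previous one ends.
            regions.add(new Rectangle2D.Double(
                    margin + i * columnWidth, margin, columnWidth, usableHeight));
        }
        return regions;
    }
}
```

With the PDFBox library on the classpath, each rectangle could then be 
registered on a PDFTextStripperByArea via addRegion(name, rect), followed by 
extractRegions(page) and getTextForRegion(name) for every page. Because the 
caller loops over pages and regions explicitly, the start of each column and 
each page is known at the point where the text is retrieved.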

