Hi Dane,

As you might know, there's no such thing as a table in a PDF file. The only way to extract one is to try to reconstruct the tabular arrangement from the characters' positions, ruling lines, and so on. I'm one of the maintainers of Tabula [1], a tool based on PDFBox that implements a number of algorithms that attempt this. We have a GUI tool [2] and a Java library [3]. Both are open source (MIT license).
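To give a feel for what "reconstructing the tabular arrangement from positions" means, here is a minimal sketch. It is not Tabula's algorithm and it does not call PDFBox. It assumes you have already collected text chunks with the x of their left edge and a y that grows down the page. The names TableSketch, Chunk, bands, bandIndex and toGrid are made up for this illustration, and the tolerance value is a guess you would need to tune per document.

```java
import java.util.*;

class TableSketch {
    /** Hypothetical type: a piece of text with its left edge (x) and vertical position (y). */
    static final class Chunk {
        final double x, y;
        final String text;
        Chunk(double x, double y, String text) { this.x = x; this.y = y; this.text = text; }
    }

    /** Sorts the coordinates and starts a new band whenever one lies more than tol past the current band's start. */
    static List<Double> bands(List<Double> coords, double tol) {
        List<Double> out = new ArrayList<>();
        for (double c : new TreeSet<Double>(coords)) {
            if (out.isEmpty() || c - out.get(out.size() - 1) > tol) out.add(c);
        }
        return out;
    }

    /** Index of the last band whose start is less than or equal to c. */
    static int bandIndex(List<Double> bands, double c) {
        int i = 0;
        while (i + 1 < bands.size() && bands.get(i + 1) <= c) i++;
        return i;
    }

    /** Builds a rows-by-columns grid of strings; cells with no chunk stay empty. */
    static List<List<String>> toGrid(List<Chunk> chunks, double tol) {
        List<Double> xs = new ArrayList<>(), ys = new ArrayList<>();
        for (Chunk c : chunks) { xs.add(c.x); ys.add(c.y); }
        List<Double> cols = bands(xs, tol), rows = bands(ys, tol);
        List<List<String>> grid = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            grid.add(new ArrayList<>(Collections.nCopies(cols.size(), "")));
        }
        for (Chunk c : chunks) {
            grid.get(bandIndex(rows, c.y)).set(bandIndex(cols, c.x), c.text);
        }
        return grid;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
            new Chunk(10.0, 100.0, "Name"), new Chunk(80.0, 100.4, "Qty"),
            new Chunk(10.3, 115.0, "Apple"), new Chunk(80.0, 115.0, "3"),
            new Chunk(10.0, 130.0, "Pear"));
        // Prints [[Name, Qty], [Apple, 3], [Pear, ]]
        System.out.println(toGrid(chunks, 2.0));
    }
}
```

This only works for left-aligned columns with clean spacing. Real documents have centered or right-aligned text, cells that span columns, and wrapped lines, which is why ruling lines and other heuristics come into play.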
Best,

[1] http://tabula.technology
[2] https://github.com/tabulapdf/tabula
[3] https://github.com/tabulapdf/tabula-java

--
Manuel Aristarán
jazzido.com

On Tue, Jul 18, 2017 at 9:28 AM, Dane Bezuidenhout <[email protected]> wrote:

> The examples available are clear on constructing a table, but there is
> little info on reading a table. I've investigated a few solutions to this,
> but feel that they are "hacky" in that they rely on establishing column and
> row regions to read text from.
>
> Surely there is a canonical way to traverse the PDDocument table elements
> and access table cells with reference to rows and columns?
>
> Any advice would be appreciated.
>
> Dane Bezuidenhout
> SprintHive <https://sprinthive.com/>
>
> M: +27 82 562 7850
>
> vCard <http://www.sprinthive.com/files/dane.vcf>

