I am heavily involved in trying to extract meaning from PDFs and would add some
comments. Everything Eliot says is correct. The problem is hard and very
dependent on the source. An airline boarding pass has a very different
layout from a thesis. But I am optimistic that a limited solution can be
found in th
Dear Eliot,
> So in short, it's not unreasonable but it's also not something that can be
> easily generalized. For a general solution you have to have some way to
> configure the details about the pages you're extracting text from: the
> header and footer boundaries, the number of columns, the wri