Re: Text for Ebook Readers

2013-01-27 Thread Peter Murray-Rust
I am heavily involved in trying to extract meaning from PDFs, so I will add some comments. Everything Eliot says is correct. The problem is hard and very dependent on the source. An airline boarding pass has a very different layout from a thesis. But I am optimistic that a limited solution can be found in th

Re: Text for Ebook Readers

2013-01-27 Thread Thomas Fischer
Dear Eliot,

> So in short, it's not unreasonable but it's also not something that can be
> easily generalized. For a general solution you have to have some way to
> configure the details about the pages you're extracting text from: the
> header and footer boundaries, the number of columns, the wri