Hi,

----- Original Message -----
Subject: Re: Illegible decoding in some pdf documents
Sent: Sun, 16 May 2010
From: Thomas Fischer<[email protected]>

> Hello Andreas,
>
> I added some comments and files to
> https://issues.apache.org/jira/browse/PDFBOX-534
> and created three new issues
> https://issues.apache.org/jira/browse/PDFBOX-727 to -729
> which I suppose are different from the one described in PDFBOX-534:
> TeX remnants, hex-decoding and unreadable text of a different kind, all
> TeX-related.
>
> There are different methods to create PDF documents from TeX (actually,
> usually LaTeX these days):
>
> SNIP .....
>
> Either way, these TeX-created documents seem to present specific challenges
> for PDFBox. Since we need to make these files available for full-text
> search, we would be very happy if their text extraction could be improved.
> I'm ready to help with tests and examples; I am afraid my lack of experience
> in Java limits my direct help in the development of the code.

I guess I've fixed most of those issues. There are only a few mappings
missing, but I'm sure we will find and add them by and by.

BR
Andreas Lehmkühler

