Re: Re: Illegible decoding in some pdf documents

Andreas Lehmkühler Mon, 17 May 2010 00:00:49 -0700

Hi,

----- Original message --------


Subject: Re: Illegible decoding in some pdf documents
Sent: Sun, 16 May 2010
From: Thomas Fischer<[email protected]>

> Hello Andreas,
> 
> I added some comments and files to 
> https://issues.apache.org/jira/browse/PDFBOX-534
> and created three new issues
> https://issues.apache.org/jira/browse/PDFBOX-727 to -729
> which I suppose are different from the one described in PDFBOX-534:
> TeX remnants, hex-decoding and unreadable text of a different kind, all
> TeX-related.
> 
> There are different methods to create PDF documents from TeX (actually,
> usually LaTeX these days):
> 
>  SNIP .....
> 
> Either way, these TeX-created documents seem to present specific challenges
> for PDFBox. Since we need to make these files available for full-text
> search, we would be very happy if their text extraction could be improved.
> I'm ready to help with tests and examples; I am afraid my lack of experience
> in Java limits my direct help in the development of the code.
I believe I've fixed most of those issues. Only a few mappings are still
missing, but I'm sure we will find and add them over time.

BR
Andreas Lehmkühler
