Hi,

On Thu, Jan 7, 2010 at 7:38 AM, Godmar Back <[email protected]> wrote:
> when parsing a PDF file with 0.8.0incubator using the 'ExtractText' driver,
> I'm seeing these errors:
>
> Jan 7, 2010 12:32:27 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: rg
> [...]
> According to table 74 in the PDF spec [1], 'rg' is a perfectly legal color
> operator; I haven't looked up the others.

The text extractor in PDFBox has explicitly been instructed to ignore
all color-related operators as they don't affect text extraction. So
in this case the operation is just "disabled", not "unsupported".
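
To illustrate the distinction, here is a rough sketch in Java. It is not
PDFBox's actual code: the class name, the method name and the operator
sets are made up for this example, and the "handled" set is only a small
subset of the text operators. The idea is simply that a valid operator the
extractor chooses to skip is reported differently from one it does not
know at all.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from PDFBox itself.
class OperatorDispatchSketch {
    // Text operators the extractor acts on (illustrative subset only).
    static final Set<String> HANDLED = Set.of("BT", "ET", "Tf", "Tj", "TJ");

    // Valid color operators that are deliberately skipped, because
    // color has no effect on the extracted text.
    static final Set<String> DISABLED =
            Set.of("rg", "RG", "g", "G", "k", "K", "cs", "CS", "sc", "SC");

    // Returns "handled", "disabled" or "unsupported" for an operator name.
    static String classify(String operator) {
        if (HANDLED.contains(operator)) {
            return "handled";
        }
        if (DISABLED.contains(operator)) {
            return "disabled";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        for (String op : new String[] {"Tj", "rg", "xyz"}) {
            System.out.println(op + " -> " + classify(op));
        }
    }
}
```

With a split like this, 'rg' would fall into the "disabled" group, so a
log line that says "unsupported/disabled" for it is more alarming than it
needs to be.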

Since these log messages are a bit misleading, we recently got rid of
them for text extraction. See
https://issues.apache.org/jira/browse/PDFBOX-581 for the details.

> The resulting .txt file, btw, contains:
>
> 9slashtwothreeslashtwozerozero8
>
> where 'pdftotext' produces:
>
> 9/23/2008

Hmm, that's interesting. Would you mind filing an issue in
https://issues.apache.org/jira/browse/PDFBOX about this?

BR,

Jukka Zitting
