Re: Help with Charset

Omar Chiyean Thu, 22 Oct 2009 18:54:16 -0700

Thanks Andreas...
It really worked. I've updated to 0.8.0 and checked ExtractText; using
the same syntax, it is working OK.


Now I get this logging output. What is it?

22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: cs
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: CS
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: sc
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: SC


Can I disable it, or how do I disable logging?

Thanks in advance...
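
One possible way to quiet these messages, sketched under two assumptions: that the output above comes from java.util.logging (its two-line layout looks like the default JDK formatter), and that the logger is named after the class shown in the output. Neither is confirmed in this thread. The class name QuietPdfBoxLogging is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class; not part of PDFBox.
final class QuietPdfBoxLogging {

    // Assumption: the logger name equals the class name printed in the log output.
    // A strong reference is kept so the configured logger is not garbage collected.
    private static final Logger STREAM_ENGINE_LOGGER =
            Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");

    private QuietPdfBoxLogging() {
    }

    // Raises the threshold so INFO records are dropped; warnings and errors still pass.
    static Logger silenceInfoMessages() {
        STREAM_ENGINE_LOGGER.setLevel(Level.WARNING);
        return STREAM_ENGINE_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = silenceInfoMessages();
        logger.info("unsupported/disabled operation: cs"); // filtered out
        logger.warning("warnings are still reported");     // still printed
    }
}
```

Calling silenceInfoMessages() once before the extraction starts should be enough if the assumptions hold. Using Level.OFF instead of Level.WARNING would silence that logger completely.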

2009/10/22 Andreas Lehmkühler <andr...@lehmi.de>

> Hi,
>
> Omar Chiyean wrote:
> > Hi there...
> > I'm new to PDFBox. I'm extracting text from some PDFs and storing it
> > in a String variable. My problem is that Latin characters such as
> > accented letters do not come out as they should.
> >
> > How can I set the charset, or how can I see the charset returned by the
> > TextStripper from PDFBox?
> >
> > I read it was UTF-16BE, but when I get the bytes with this charset and
> > translate them to ISO-8859-1 I get letters separated by a space and no
> > luck with accented letters...
> >
> > So what's wrong, or can you help me correct this? I'm using PDFBox
> > 0.7.3.
>
> First of all, I suggest updating to PDFBox 0.8. It includes a lot of
> improvements and bug fixes. Back to your question: you are able to
> choose the needed charset before extraction. Have a look at ExtractText
> as an example of how to use the text extraction.
>
> BR
> Andreas Lehmkühler
>
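
The "letters separated by a space" symptom in the quoted question is consistent with reading UTF-16BE bytes as if they were ISO-8859-1. In UTF-16BE each Latin character takes two bytes, the first being 0x00, while ISO-8859-1 turns every byte into one character. A small sketch using only the Java standard library follows. The class name CharsetRoundTrip and the sample text are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use PDFBox at all.
final class CharsetRoundTrip {

    private CharsetRoundTrip() {
    }

    // Encodes as UTF-16BE, then decodes with the wrong charset.
    // Every 0x00 high byte becomes a NUL character between the letters.
    static String decodeWithWrongCharset(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    // Decodes the bytes with the same charset that produced them.
    static String decodeWithSameCharset(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, StandardCharsets.UTF_16BE);
    }

    // To obtain ISO-8859-1 bytes, encode the String itself.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "Jos\u00e9"; // "José", escaped to avoid source-encoding issues
        System.out.println(decodeWithWrongCharset(sample).length()); // twice the original length
        System.out.println(decodeWithSameCharset(sample).equals(sample));
        System.out.println(toLatin1(sample).length);
    }
}
```

A Java String has no charset of its own. The charset only matters when the String is converted to bytes or back, so the encode and decode steps have to agree.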
