Hi

Omar Chiyean wrote:
> Thanks Andreas...
> It really worked. I've updated to 0.8.0 and checked that ExtractText,
> using the same syntax, works fine.
>
> Now I get this logging output. What is it?
> 
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine
> processOperator
> INFO: unsupported/disabled operation: cs
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine
> processOperator
> INFO: unsupported/disabled operation: CS
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine
> processOperator
> INFO: unsupported/disabled operation: sc
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine
> processOperator
> INFO: unsupported/disabled operation: SC
> 
> 
> Can I disable it, or how do I disable logging?
Yes. Have a look at [1]. The logging.properties file mentioned there is
part of the PDFBox source distribution.


[1] http://markmail.org/message/3wpukybujqsbfna5
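
If you prefer to do this in code rather than through a properties file,
the following is a minimal sketch. It assumes the messages go through
java.util.logging, which the format of the output above suggests but which
I have not verified for every setup. The class name QuietPdfBoxLogging and
the method silenceInfoMessages are made up for this example; only the
logger name is taken from the log output.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietPdfBoxLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // configured logger could otherwise be garbage collected and lose its level.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    // Assumption: PDFBox logs via java.util.logging under "org.apache.pdfbox".
    // Raising the level to WARNING drops INFO messages such as
    // "unsupported/disabled operation: cs" for all child loggers.
    public static void silenceInfoMessages() {
        PDFBOX_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfoMessages();
        // Child loggers without their own level inherit the parent's level.
        Logger engine = Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");
        System.out.println("INFO enabled:    " + engine.isLoggable(Level.INFO));
        System.out.println("WARNING enabled: " + engine.isLoggable(Level.WARNING));
    }
}
```

Call silenceInfoMessages() once before starting the extraction. Warnings
and errors are still reported.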

> 
> Thanks in advance...
> 
> 2009/10/22 Andreas Lehmkühler <andr...@lehmi.de>
> 
>> Hi,
>>
>> Omar Chiyean wrote:
>>> Hi there...
>>> I'm new to PDFBox. I'm extracting text from some PDFs and storing it
>>> in a String variable. My problem is that Latin characters such as
>>> accented letters do not come out as they should.
>>>
>>> How can I set the charset, or how can I see the charset returned by the
>>> TextStripper from PDFBox?
>>>
>>> I read it was UTF-16BE, but when I get the bytes with this charset and
>>> convert them to ISO-8859-1, I get letters separated by spaces and no luck
>>> with accented letters...
>>>
>>> So what's wrong, or can you help me correct this? I'm using PDFBox 0.7.3.
>> First of all, I suggest updating to PDFBox 0.8. It includes a lot of
>> improvements and bug fixes. Back to your question: you can choose the
>> charset you need before extraction. Have a look at ExtractText for an
>> example of how to use the text extraction.
>>
>> BR
>> Andreas Lehmkühler
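
On the charset question quoted above: a Java String is always Unicode
internally, so there is no charset to read back from it. A charset only
comes into play when the String is turned into bytes. The sketch below is
plain Java and does not touch PDFBox. The class name CharsetDemo and both
method names are made up for illustration. It shows why decoding UTF-16BE
bytes as ISO-8859-1 produces letters with a gap between them, and what a
direct conversion looks like instead.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Mistaken approach: encode as UTF-16BE, then decode those bytes as
    // ISO-8859-1. Each Latin-1 character becomes two bytes (0x00 plus the
    // character), so the result has a NUL character before every letter.
    static String wrongRoundTrip(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    // Direct approach: ask the String for its bytes in the target charset.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape
        System.out.println("wrong round trip length: " + wrongRoundTrip(text).length());
        System.out.println("Latin-1 byte count:      " + toLatin1(text).length);
    }
}
```

For the four-character input, the mistaken round trip yields eight
characters and the direct conversion yields four bytes.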

BR
Andreas Lehmkühler
