Hello Jo,

So you are extracting strings from the content stream? How those bytes map to characters depends on the encoding of the font in use, so PdfString alone does not know how to convert them to Unicode. You might want to look at the podofotxtextract example to see how to do that, but please note that PoDoFo does not support all possible encodings yet, so you will need to add the missing encodings yourself.
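
To illustrate the idea, here is a minimal sketch in plain C++ that does not use PoDoFo at all. It assumes the font uses two-byte, big-endian character codes and that a code-to-Unicode table for that font is available, for example one built from the font's ToUnicode CMap. CodeMap, AppendUtf8, DecodeTwoByteCodes and MakeSampleMap are hypothetical names made up for this example, and the "+ 0x1d" table is fitted only to the sample bytes quoted below, so it is not expected to hold for any other font.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a font's code -> Unicode table
// (in a real PDF this would come from the font's ToUnicode CMap).
using CodeMap = std::map<std::uint16_t, char32_t>;

// Append one Unicode code point to a UTF-8 encoded string.
void AppendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Read the raw string bytes as big-endian 16-bit codes and translate each
// code through the table. Unmapped codes become U+FFFD.
std::string DecodeTwoByteCodes(const std::vector<unsigned char>& raw,
                               const CodeMap& map)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < raw.size(); i += 2) {
        const std::uint16_t code =
            static_cast<std::uint16_t>((raw[i] << 8) | raw[i + 1]);
        const auto it = map.find(code);
        AppendUtf8(out, it != map.end() ? it->second : U'\uFFFD');
    }
    return out;
}

// ASSUMPTION: a table fitted to the one sample in this thread, where
// code + 0x1d happens to give the ASCII character. Other fonts will differ.
CodeMap MakeSampleMap()
{
    CodeMap map;
    for (std::uint16_t code = 0x03; code <= 0x5F; ++code) {
        map[code] = static_cast<char32_t>(code + 0x1d);
    }
    return map;
}
```

With this table, the bytes 00 4c 00 51 00 57 00 55 00 52 decode to "intro". The point of keeping the table separate from the decoding loop is that only the table has to change per font.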
Best regards,
Dom
On Monday, 02 August 2010, Jo Van der Snickt wrote:
> Hello,
>
> I'm trying to parse a PDF document to extract all the text. For the array
> type I check each element for its type and I only consider the array
> elements that contain either a string or a hexstring.
>
> For the strings I retrieve the value with
> textArray.GetString().GetStringUtf8() which works just fine. But, for the
> hexstring I get weird results in the buffer. To investigate the content of
> the buffer I used the following piece of code:
>
> else if ( textArray[i].IsHexString() )
> {
>     char * ptrHexString = static_cast<char *>(
>         malloc( sizeof(char) * ( textArray[i].GetString().GetLength() + 2 ) ) );
>     memcpy( ptrHexString, textArray[i].GetString().GetString(),
>             textArray[i].GetString().GetLength() );
>
>     for ( int strIndex = 0;
>           strIndex < static_cast<int>(textArray[i].GetString().GetLength());
>           strIndex++ )
>     {
>         cout << setw(2) << setfill('0') << dec << strIndex << ": "
>              << hex << setw(2) << setfill('0')
>              << static_cast<int>(ptrHexString[strIndex]) << " "
>              << static_cast<char>(ptrHexString[strIndex] + 0x1d) << endl;
>     }
>     free( ptrHexString );
> }
>
> This displays something like:
>
> 00: 00
> 01: 4c i
> 02: 00
> 03: 51 n
> 04: 00
> 05: 57 t
> 06: 00
> 07: 55 r
> 08: 00
> 09: 52 o
> 10: 00
> 11: 47 d
> 12: 00
> 13: 58 u
> 14: 00
> 15: 46 c
> 16: 00
> 17: 46 c
> 18: 00
> 19: 4c i
>
> It looks like a two-byte encoding (first byte 0x00), but note that I had to
> add 0x1d to the second byte to get the character I'm expecting (here the
> text "introduci").
>
> Any idea what I could have done wrong?
> The document that I use to test displays correctly in Acrobat Reader.
>
> - Jo
>
> Dilys bvba
> Nieuwe Stationsstraat 23
> 9160 Lokeren
>
> tel +32 9 356 97 13
> fax +32 9 353 90 11
>
> mailto:[email protected]
> http://www.dilys.be
>
>
>
> ---------------------------------------------------------------------------
> --- The Palm PDK Hot Apps Program offers developers who use the
> Plug-In Development Kit to bring their C/C++ apps to Palm for a share
> of $1 Million in cash or HP Products. Visit us here for more details:
> http://p.sf.net/sfu/dev2dev-palm
> _______________________________________________
> Podofo-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>
--
**********************************************************************
Dominik Seichter - [email protected]
KRename - http://www.krename.net - Powerful batch renamer for KDE
KBarcode - http://www.kbarcode.net - Barcode and label printing
PoDoFo - http://podofo.sf.net - PDF generation and parsing library
SchafKopf - http://schafkopf.berlios.de - Schafkopf, a card game, for KDE
Alan - http://alan.sf.net - A Turing Machine in Java
**********************************************************************
