Hello,

I'm trying to parse a PDF document to extract all the text. For the array type 
I check each element for its type and I only consider the array elements that 
contain either a string or a hexstring.

For the strings I retrieve the value with 
textArray[i].GetString().GetStringUtf8(), which works just fine. For the 
hexstrings, however, I get weird results in the buffer. To investigate the 
content of the buffer I used the following piece of code:

  else if ( textArray[i].IsHexString() )
  {
    // Copy the raw bytes of the hex string into a scratch buffer.
    const size_t length = textArray[i].GetString().GetLength();
    char * ptrHexString = static_cast<char *>( malloc( sizeof(char) * ( length + 2 ) ) );
    memcpy( ptrHexString, textArray[i].GetString().GetString(), length );

    // Print the index, the byte in hex, and the byte shifted by 0x1d as a character.
    for ( size_t strIndex = 0; strIndex < length; strIndex++ )
    {
       // Go through unsigned char so bytes >= 0x80 are not sign-extended.
       const unsigned char byte = static_cast<unsigned char>( ptrHexString[strIndex] );
       cout << setw(2) << setfill('0') << dec << strIndex << ": "
            << hex << setw(2) << setfill('0') << static_cast<int>( byte )
            << " " << static_cast<char>( byte + 0x1d ) << endl;
    }
    free( ptrHexString );
  }

This displays something like:

00: 00
01: 4c i
02: 00
03: 51 n
04: 00
05: 57 t
06: 00
07: 55 r
08: 00
09: 52 o
10: 00
11: 47 d
12: 00
13: 58 u
14: 00
15: 46 c
16: 00
17: 46 c
18: 00
19: 4c i

It looks like a two-byte encoding (first byte 0x00), but note that I had to add 
0x1d to the second byte of each pair to get the character I'm expecting (here 
the text "introduci").

Any idea what I could have done wrong?
The document that I use to test displays correctly in Acrobat Reader.
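
To make the observation above concrete, here is a minimal sketch of the manual 
decoding I am doing: read the buffer as two-byte big-endian codes and add a 
fixed offset to each code. The function name decodeTwoByteCodes and its 
parameters are made up for illustration and are not part of PoDoFo, and the 
offset 0x1d is only what happened to work for this one document.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a PoDoFo API): treats `bytes` as a sequence of
// two-byte big-endian codes and adds `offset` to each code.
// Assumption: the result is expected to be printable ASCII; anything else
// is replaced by '?'. A trailing odd byte is ignored.
std::string decodeTwoByteCodes( const std::vector<unsigned char> & bytes,
                                unsigned int offset )
{
    std::string result;
    for ( std::size_t i = 0; i + 1 < bytes.size(); i += 2 )
    {
        // Combine high and low byte into one code, then apply the offset.
        const unsigned int code =
            ( static_cast<unsigned int>( bytes[i] ) << 8 ) | bytes[i + 1];
        const unsigned int shifted = code + offset;

        result += ( shifted >= 0x20 && shifted < 0x7f )
                      ? static_cast<char>( shifted )
                      : '?';
    }
    return result;
}
```

Called with the twenty bytes from the dump above and an offset of 0x1d, this 
returns the characters shown in the right-hand column of that dump. I do not 
expect a fixed offset to hold for other fonts or documents, which is why I 
suspect I am missing a step.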

- Jo




Dilys bvba
Nieuwe Stationsstraat 23
9160 Lokeren

tel +32 9 356 97 13
fax +32 9 353 90 11

mailto:[email protected]
http://www.dilys.be


