I parsed a PDF file containing the Chinese text "我我我" with the
ContentParser example from podofo-0.9.1.
The result is:
=============================================
<</Type/XRef/DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<63EE8B4DF319CC4C9DCB31874AAAFE26><076E0418816AAB40A6CA39C41BCE2178>]/Index[ 15 21]/Info 14 0 R/Length 67/Prev 46213/Root 16 0 R/Size 36/W[ 1 2 1]>>
Processing page      1...           1 Keyword: BT
           2 Variant: /P
           3 Variant: <<
/MCID 0
>>
           4 Keyword: BDC
           5 Variant: /CS0
           6 Keyword: cs
           7 Variant: 0
           8 Keyword: scn
           9 Variant: /C2_0
          10 Variant: 1
          11 Keyword: Tf
          12 Variant: 10.560000
          13 Variant: 0
          14 Variant: 0
          15 Variant: 10.560000
          16 Variant: 90
          17 Variant: 758.280000
          18 Keyword: Tm
          19 Variant: <184118411841>
          20 Keyword: Tj
          21 Variant: /TT0
          22 Variant: 1
          23 Keyword: Tf
          24 Variant: 2.989000
          25 Variant: 0
          26 Keyword: Td
          27 Variant: ( )
          28 Keyword: Tj
          29 Keyword: EMC
          30 Keyword: ET
           12 keywords,           18 variants - page ok
=============================================
I will call "我我我" strToExtract.

The UTF-16BE encoding of strToExtract (with BOM) is
"FE FF 62 11 62 11 62 11", but the operand that corresponds to
strToExtract in the output above is "19 Variant: <184118411841>".
I don't know how "6211" and "1841" are related.
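
To make the expected bytes concrete, here is a minimal sketch (standard
C++ only, no PoDoFo) of how the UTF-16BE hex form with a leading BOM can
be derived from Unicode code points. The names toUtf16BeHex and
appendHex16 are hypothetical helpers introduced for illustration; they
are not part of PoDoFo.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Append one 16-bit unit as four uppercase hex digits.
static void appendHex16(std::string& out, std::uint16_t unit)
{
    char buf[5];
    std::snprintf(buf, sizeof(buf), "%04X", static_cast<unsigned>(unit));
    out += buf;
}

// Encode Unicode code points as a UTF-16BE hex string with a leading BOM.
// (Hypothetical helper for illustration only.)
std::string toUtf16BeHex(const std::u32string& text)
{
    std::string out;
    appendHex16(out, 0xFEFF); // byte order mark
    for (char32_t cp : text) {
        if (cp >= 0x10000) {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            appendHex16(out, static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            appendHex16(out, static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            appendHex16(out, static_cast<std::uint16_t>(cp));
        }
    }
    return out;
}
```

Since "我" is U+6211, toUtf16BeHex(U"\u6211\u6211\u6211") yields
"FEFF621162116211", which matches the bytes listed above.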


Using the following code, I get the correct characters when szTest =
"<FEFF621162116211>".
But "<184118411841>" is not what I want.

#include <cstring>
#include <string>
#include <podofo/podofo.h>

using namespace PoDoFo;

// Decode a hex string such as "FEFF621162116211" into a wide string.
std::wstring func(const char* szTest)
{
        // Use strlen, not sizeof: sizeof(szTest) is only the size of the
        // pointer, so the original loop copied a fixed number of bytes.
        const size_t len = strlen(szTest);

        PdfString string;
        string.SetHexData(szTest, len, NULL);
        return string.GetStringW();
}
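
As a cross-check that does not depend on PoDoFo, here is a rough sketch
that splits a hex string operand into 16-bit big-endian codes. The
function name hexToCodes is hypothetical. It only shows that
<184118411841> consists of three identical codes 0x1841, one per
character of strToExtract. What those codes mean is not confirmed by the
output above; one possible reading (an assumption) is that they are
codes in the encoding of font /C2_0 rather than Unicode values.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Split a PDF-style hex string into 16-bit big-endian codes.
// Any non-hex character (angle brackets, whitespace) is skipped;
// a trailing group of fewer than four digits is dropped.
// (Hypothetical helper for illustration only.)
std::vector<std::uint16_t> hexToCodes(const std::string& hex)
{
    std::vector<std::uint16_t> codes;
    unsigned value = 0;
    int digits = 0;
    for (char ch : hex) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isxdigit(c))
            continue;
        unsigned nibble = std::isdigit(c) ? c - '0'
                                          : std::toupper(c) - 'A' + 10;
        value = (value << 4) | nibble;
        if (++digits == 4) {
            codes.push_back(static_cast<std::uint16_t>(value));
            value = 0;
            digits = 0;
        }
    }
    return codes;
}
```

For the operand above, hexToCodes("<184118411841>") returns three
elements, each equal to 0x1841.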

Sorry for my poor English!
Any suggestions?

Best Wishes!