You are under the mistaken impression that the PDF objects here are the same as
the graphic objects on a page. They are NOT.

Grab a copy of ISO 32000-1, the PDF standard, and read up…

Leonard

From: Dara Javaherian <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [Podofo-users] Extract text/lines + coords from a pdf

I have been trying for a while to use this library to extract text and lines
(with their respective coordinates), but I have not found a way to do this.
Attached is the PDF I am testing it on.

This is what I have so far:

    PdfVecObjects *x = new PdfVecObjects();
    PdfParser parser(x, filename);
    parser.ParseFile("hello.pdf");

    for (TIVecObjects obj = x->begin(); obj != x->end(); obj++){
        PdfObject * a = x->RemoveObject(obj);

        // THIS IS MY PROBLEM VVVVVVVVVV
        cout << a->Reference().ToString() << endl;
    }

However, this only gives me incredibly basic information (it seems to be the
object number):

1 0 R
2 0 R
3 0 R
4 0 R
5 0 R
6 0 R
7 0 R
8 0 R
9 0 R
10 0 R
11 0 R

I want to print out the coordinates of an object, and whether it's a line or
text. If it's text, I would also like to be able to print out the text. Does
anyone who knows this library better than I do know what I could do to fix this?

--
Dara Javaherian

_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
