You are under the mistaken impression that the PDF objects here are the same as
the graphic objects on a page. They are NOT.
Grab a copy of ISO 32000-1, the PDF standard, and read up…
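To make the distinction concrete: text and lines are not PdfObjects of their own. They are operators inside a page's content stream, written in postfix order (operands first, then the operator). Below is a minimal sketch of reading such a stream, not PoDoFo code. ContentOp and ParseContentStream are hypothetical names made up for this illustration, and the sketch assumes an already-decompressed stream with no arrays, dictionaries, hex strings, inline images, or escaped parentheses. The operator names (BT, Tf, Td, Tj, ET, m, l, S) are the ones defined in ISO 32000-1.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One instruction from a page content stream: the operator name plus the
// operands that preceded it (content streams use postfix notation).
struct ContentOp {
    std::string name;
    std::vector<std::string> operands;
};

// Simplified tokenizer (hypothetical helper, not part of PoDoFo).
// Splits on whitespace and keeps "(...)" literal strings whole.
std::vector<ContentOp> ParseContentStream(const std::string& stream) {
    std::vector<ContentOp> ops;
    std::vector<std::string> pending;  // operands waiting for their operator
    std::size_t i = 0;
    while (i < stream.size()) {
        unsigned char c = static_cast<unsigned char>(stream[i]);
        if (std::isspace(c)) { ++i; continue; }

        std::string token;
        if (c == '(') {
            // Literal string: read up to the matching ')', tracking nesting.
            int depth = 0;
            do {
                char ch = stream[i++];
                if (ch == '(') ++depth;
                else if (ch == ')') --depth;
                token += ch;
            } while (depth > 0 && i < stream.size());
            pending.push_back(token);
            continue;
        }

        // Any other token runs until whitespace or the start of a string.
        while (i < stream.size() &&
               !std::isspace(static_cast<unsigned char>(stream[i])) &&
               stream[i] != '(') {
            token += stream[i++];
        }

        // Numbers and /Names are operands; everything else is an operator.
        bool isOperand = std::isdigit(c) || c == '-' || c == '+' ||
                         c == '.' || c == '/';
        if (isOperand) {
            pending.push_back(token);
        } else {
            ops.push_back(ContentOp{token, pending});
            pending.clear();
        }
    }
    return ops;
}
```

Feeding it "BT /F1 12 Tf 72 720 Td (Hello) Tj ET 100 100 m 200 200 l S" gives a Td with operands 72 and 720, a Tj with the string (Hello), and an m/l pair that describes a line segment. Keep in mind that these numbers are in text space or user space, so real page coordinates also depend on the text matrix and the current transformation matrix. As far as I know, PoDoFo has a PdfContentsTokenizer class for this job, and the podofotxtextract tool in its source tree uses it; check the headers of your version before relying on that.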
Leonard
From: Dara Javaherian <[email protected]>
To: [email protected]
Subject: [Podofo-users] Extract text/lines + coords from a pdf
I have been trying for a while to use this library to extract text and lines
(with their respective coordinates). But I have no way to do this. Attached is
the pdf I am testing it on.
This is what I have so far:
    PdfVecObjects *x = new PdfVecObjects();
    PdfParser parser(x, filename);
    parser.ParseFile("hello.pdf");
    for (TIVecObjects obj = x->begin(); obj != x->end(); obj++) {
        PdfObject * a = x->RemoveObject(obj);
        // THIS IS MY PROBLEM VVVVVVVVVV
        cout << a->Reference().ToString() << endl;
    }
However, this only gives me incredibly basic information (seems to be object
number)
1 0 R
2 0 R
3 0 R
4 0 R
5 0 R
6 0 R
7 0 R
8 0 R
9 0 R
10 0 R
11 0 R
I want to print out the coordinates of an object, and if it's a line or text.
If it's text, I would also like to be able to print out the text. Does anyone
that knows this library better than I do know what I could do to fix this?
--
Dara Javaherian