PDF DOES support rich semantic structure, including all of the things listed
below (ISO 32000-1:2008, sections 14.7 Logical Structure, 14.8 Tagged PDF and
14.9 Accessibility Support). HOWEVER, it is optional, and therefore many PDF
documents do not contain the necessary elements. And, as pointed out, without
such elements already present in the PDF, the best you can do is GUESS.
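
To illustrate what guessing looks like in practice, here is a minimal
sketch of one common heuristic: treat text set in a font larger than the
body text as a heading candidate. It is not poppler code. The function name
guess_headings and the (text, font_size) input format are assumptions made
for this example; the spans would have to come from whatever text extractor
you use.

```python
from collections import Counter


def guess_headings(spans):
    """Guess headings from (text, font_size) spans given in reading order.

    Hypothetical helper: returns a list of (level, text) pairs, where
    level 1 is the largest font size found above the body size.
    """
    if not spans:
        return []

    # Weight each font size by how many characters are set in it; the size
    # that carries the most text is assumed to be the body text size.
    weight = Counter()
    for text, size in spans:
        weight[size] += len(text)
    body_size = weight.most_common(1)[0][0]

    # Every distinct size larger than the body size becomes a heading
    # level, with the largest size mapped to level 1.
    larger = sorted({size for _, size in spans if size > body_size},
                    reverse=True)
    level_of = {size: i + 1 for i, size in enumerate(larger)}

    return [(level_of[size], text) for text, size in spans
            if size in level_of]


if __name__ == "__main__":
    sample = [
        ("1 Introduction", 18.0),
        ("Body text that goes on for a while and is long.", 10.0),
        ("1.1 Scope", 14.0),
        ("More body text, also fairly long for weighting.", 10.0),
        ("Figure 1: Overview", 9.0),
    ]
    for level, text in guess_headings(sample):
        print(level, text)
```

Note that the figure caption in the sample is missed because it is set
smaller than the body text, which is exactly why this is only a guess.
Captions, bold headings at body size, and large pull quotes all need
further heuristics.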

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of 
[email protected]
Sent: Thursday, January 28, 2010 7:04 AM
To: amit aggarwal
Cc: [email protected]
Subject: Re: [poppler] Extract pdf

Hi,

I think PDF is a page description language and defines
nothing for semantic structure, such as how to store the titles
of sections, subsections, figures and tables. Therefore, I
guess poppler cannot extract them, because PDF does not have them.

Is there any reliable framework that defines such structure,
and do your target documents follow it?

Regards,
mpsuzuki

On Thu, 28 Jan 2010 17:23:17 +0530
amit aggarwal <[email protected]> wrote:

>Hi All,
>
>I want to extract the following information from a PDF:
>1) All chapter, section and subsection titles,
>2) Names of the figures and tables
>
>Can anyone please help me with this?
>
>-- 
>Thanks
>Amit Aggarwal
>
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler