I think you are correct. I checked many PDFs, and none of them contain watermarks from which titles could be extracted.
Thanks

On Tue, Aug 21, 2012 at 6:43 AM, Duane Nickull <[email protected]> wrote:

> A title is not an item that can be deterministically accessed with
> accuracy IMO. A best guess based on font size and positioning may be as
> good as is possible.
>
> We are running into the same issue with form captions. It all depends on
> how the author marks up the original documents. We (technoracle) have
> done some good work in this area with predictive analysis.
>
> Duane Nickull
> ***********************************
> Technoracle Advanced Systems Inc.
> Consulting and Contracting; Proven Results!
> i. Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
> b. http://technoracle.blogspot.com
> t. @duanechaos
> "Don't fear the Graph! Embrace Neo4J"
>
> On 2012-08-20 10:32 AM, "Jagadeesh N. Malakannavar"
> <[email protected]> wrote:
>
> >Hi,
> >
> >I am looking for a techniques to extract page titles. For example, if PDF
> >has chapter1, chapter2 .... I want to list chapter1, chapter2.
> >I may convert to few pages text and few others to html format
> >conditionally.
> >
> >--
> >
> >Thanks,
> >Jagadeesh N.Malakannavar

--
Thanks,
Jagadeesh N.Malakannavar
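P.S. For anyone finding this thread later, the "best guess based on font size and positioning" idea from the quoted reply could be sketched roughly as below. This is only an illustration under stated assumptions, not the method Duane's team uses: the `Span` record, the `guess_titles` function, and the thresholds (`size_ratio`, `top_fraction`, `page_height`) are all hypothetical names and values, and the spans are assumed to have been extracted already by whatever PDF library is in use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """One run of text with layout info (hypothetical record, not a library type)."""
    page: int
    text: str
    font_size: float  # in points
    y_top: float      # distance from the top of the page, in points


def guess_titles(spans, size_ratio=1.2, top_fraction=0.5, page_height=792.0):
    """Return (page, text) pairs that look like titles.

    Heuristic only: a span is a title candidate if its font is noticeably
    larger than the body font and it sits in the upper part of the page.
    """
    if not spans:
        return []

    # Estimate the body font size as the size that covers the most characters.
    chars_per_size = Counter()
    for s in spans:
        chars_per_size[round(s.font_size, 1)] += len(s.text)
    body_size = chars_per_size.most_common(1)[0][0]

    titles = []
    for s in spans:
        is_large = s.font_size >= body_size * size_ratio
        is_near_top = s.y_top <= page_height * top_fraction
        if is_large and is_near_top and s.text.strip():
            titles.append((s.page, s.text.strip()))
    return titles


if __name__ == "__main__":
    sample = [
        Span(1, "Chapter 1", 18.0, 72.0),
        Span(1, "Body text that runs on for quite a while on page one.", 10.0, 120.0),
        Span(2, "Chapter 2", 18.0, 72.0),
        Span(2, "More body text that fills most of the second page here.", 10.0, 120.0),
    ]
    for page, title in guess_titles(sample):
        print(page, title)
```

As the quoted reply says, this can only ever be a guess: a document that sets headings in the body font, or places large decorative text near the top of a page, will defeat it, so the thresholds would need tuning per document set.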

