Re: how to extract page titles

Jagadeesh N. Malakannavar Sun, 26 Aug 2012 20:27:04 -0700

I think you are correct. I checked many PDFs, and none of them contain markers
that could be used to extract titles.
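
Because there is no explicit marker, a best guess along the lines Duane describes (font size plus position) is one option. Below is a minimal Java sketch of that heuristic. It assumes each line of text has already been extracted together with its font size and a top-down y coordinate. With a library such as Apache PDFBox, one way to collect these is to override `PDFTextStripper.writeString(String, List<TextPosition>)` and read `TextPosition.getFontSizeInPt()` and `getYDirAdj()`; that wiring is not shown here. The names `TitleGuesser`, `TextLine`, `guessTitle` and the `topFraction` cutoff are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

public class TitleGuesser {

    /** Hypothetical holder: one extracted line, its font size, and y measured from the page top. */
    record TextLine(String text, double fontSize, double y) {}

    /**
     * Best-guess title: among non-blank lines in the top fraction of the page,
     * pick the one with the largest font size; on a tie, prefer the line nearest the top.
     * Returns null if no line qualifies.
     */
    static String guessTitle(List<TextLine> lines, double pageHeight, double topFraction) {
        double cutoff = pageHeight * topFraction;
        return lines.stream()
                .filter(l -> !l.text().isBlank())
                .filter(l -> l.y() <= cutoff)
                .max(Comparator.comparingDouble(TextLine::fontSize)
                        // reversed y: a smaller y (closer to the top) wins ties
                        .thenComparing(Comparator.comparingDouble(TextLine::y).reversed()))
                .map(TextLine::text)
                .orElse(null);
    }

    public static void main(String[] args) {
        // Made-up sample lines for a US Letter page (792 pt tall).
        List<TextLine> page = List.of(
                new TextLine("Chapter 1", 24.0, 72.0),
                new TextLine("Introduction", 18.0, 110.0),
                new TextLine("Body text starts here...", 11.0, 150.0),
                new TextLine("Page 1", 9.0, 770.0));
        System.out.println(guessTitle(page, 792.0, 0.33));
    }
}
```

Running `main` prints `Chapter 1`. The one-third cutoff and the tie-break are arbitrary choices. As Duane notes, how well any such heuristic works depends on how the author laid out the original document.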


Thanks

On Tue, Aug 21, 2012 at 6:43 AM, Duane Nickull <
[email protected]> wrote:

> A title is not an item that can be deterministically accessed with
> accuracy IMO.  A best guess based on font size and positioning may be as
> good as is possible.
>
> We are running into the same issue with form captions.  It all depends on
> how the author marks up the original documents.  We (technoracle) have
> done some good work in this area with predictive analysis.
>
> Duane Nickull
> ***********************************
> Technoracle Advanced Systems Inc.
> Consulting and Contracting; Proven Results!
> i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
> b. http://technoracle.blogspot.com
> t.  @duanechaos
> "Don't fear the Graph!  Embrace Neo4J"
>
> On 2012-08-20 10:32 AM, "Jagadeesh N. Malakannavar"
> <[email protected]> wrote:
>
> >Hi,
> >
> >I am looking for a technique to extract page titles. For example, if a PDF
> >has chapter1, chapter2, ..., I want to list chapter1, chapter2.
> >Depending on the result, I may convert some pages to text and others to
> >HTML.
> >
> >--
> >
> >Thanks,
> >Jagadeesh N.Malakannavar
>
>
>
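
For the case in the original question (listing chapter1, chapter2, ...), where headings follow predictable wording, a simple pattern match over each page's extracted text may be enough, with no layout analysis at all. The sketch below assumes the text has already been extracted one page at a time (with PDFBox, for example, by calling `setStartPage` and `setEndPage` on a `PDFTextStripper` before `getText`). The regex and the names `ChapterLister` and `listChapters` are hypothetical and would need adjusting to the actual documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChapterLister {

    // Hypothetical pattern: a line that starts with "chapter" (any case), optional
    // whitespace, a number, and an optional title. Matches "chapter1" and "Chapter 1 Intro".
    private static final Pattern CHAPTER_LINE = Pattern.compile(
            "^\\s*(chapter\\s*\\d+.*)$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns "p<N>: <heading>" for each chapter-like line, in page order (pages are 1-based). */
    static List<String> listChapters(List<String> pageTexts) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            Matcher m = CHAPTER_LINE.matcher(pageTexts.get(i));
            while (m.find()) {
                found.add("p" + (i + 1) + ": " + m.group(1).trim());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up page texts standing in for per-page extraction output.
        List<String> pages = List.of(
                "Chapter 1 Getting Started\nSome body text.",
                "More body text.\nAs noted in chapter 1, ...",
                "CHAPTER 2: Installation\nSteps follow.");
        listChapters(pages).forEach(System.out::println);
    }
}
```

Running `main` prints `p1: Chapter 1 Getting Started` and `p3: CHAPTER 2: Installation`. The mid-sentence mention of chapter 1 on page 2 is skipped because the pattern only matches at the start of a line; the page numbers could then drive the per-page choice between text and HTML output.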


-- 

Thanks,
Jagadeesh N.Malakannavar
