Re: Detecting the footnotes in a PDF

Jeremias Maerki Wed, 26 Oct 2011 11:09:27 -0700

Not reliably, no, because the PDF is not tagged. Combined with text
extraction, you might be able to come up with some heuristics to identify
footnotes, for example by looking for a pattern "(<number>) " at the
beginning of a paragraph. HTH
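
A minimal sketch of that heuristic, assuming the page text has already been
extracted into a plain string (for instance with the ExtractText tool
mentioned further down in this thread). The class and method names are
hypothetical, and the start of a line is used as a stand-in for the start of
a paragraph, since extracted text does not always preserve paragraph breaks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: flags lines that look like footnotes.
final class FootnoteHeuristic {
    // Matches text that opens with "(<number>) ", e.g. "(1) See ...".
    private static final Pattern FOOTNOTE_START = Pattern.compile("^\\(\\d+\\) ");

    /** Returns the lines of the extracted page text that start like a footnote. */
    static List<String> findFootnoteCandidates(String pageText) {
        List<String> candidates = new ArrayList<>();
        // Assumption: each footnote begins on its own line in the extracted text.
        for (String line : pageText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (FOOTNOTE_START.matcher(trimmed).find()) {
                candidates.add(trimmed);
            }
        }
        return candidates;
    }
}
```

This only yields candidates: a numbered list in the body text such as
"(1) first item" would match as well, so it would probably need to be combined
with other hints, such as the position of the text on the page.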



On 26.10.2011 19:52:46 Hesham G. wrote:
> Maybe my question was not clear enough... I meant: is there a way to know
> that the part currently extracted from the PDF page is the footnote section?
> 
> 
> Best regards ,
> Hesham
> 
> 
> ---------------------------------------------
> Included message :
> 
> > I see PDFBox (current trunk) extracting the footnote text correctly
> > from this PDF. (I just ran the org.apache.pdfbox.ExtractText tool.)
> > 
> > Mike McCandless
> > 
> > http://blog.mikemccandless.com
> > 
> > On Wed, Oct 26, 2011 at 8:25 AM, Hesham G. <[email protected]> wrote:
> >> Hello,
> >>
> >> Is there a way to detect the footnotes section in a PDF file?
> >> Here is a sample 2-page PDF with footnotes:
> >> http://www.4shared.com/document/Q03u9SMc/pdf_with_footnotes.html
> >>
> >>
> >> Best regards ,
> >> Hesham
> >>
> >




Jeremias Maerki
