Jeremias,

Thanks a lot. That might be helpful, especially since I also want to detect the number of each footnote. But how can I match the pattern "(<number>) " using PDFBox?

Best regards,
Hesham

---------------------------------------------
Included message:

> Not reliably, no, because the PDF is not tagged. Together with text
> extraction you might be able to come up with some heuristics to identify
> footnotes. Like looking for a pattern "(<number>) " at the beginning of
> a paragraph, for example. HTH
>
>
> On 26.10.2011 19:52:46 Hesham G. wrote:
>> Maybe my question was not clear enough ... I meant: is there a way to know
>> that the current extracted part from the PDF page is the footnote section?
>>
>>
>> Best regards,
>> Hesham
>>
>>
>> ---------------------------------------------
>> Included message:
>>
>> > I see PDFBox (current trunk) extracting the footnote text correctly
>> > from this PDF. (I just ran the org.apache.pdfbox.ExtractText tool).
>> >
>> > Mike McCandless
>> >
>> > http://blog.mikemccandless.com
>> >
>> > On Wed, Oct 26, 2011 at 8:25 AM, Hesham G. <[email protected]> wrote:
>> >> Hello,
>> >>
>> >> Is there a way to detect the footnotes section in a PDF file?
>> >> Here is a sample 2-page PDF with footnotes:
>> >> http://www.4shared.com/document/Q03u9SMc/pdf_with_footnotes.html
>> >>
>> >>
>> >> Best regards,
>> >> Hesham
>> >>
>> >
>
>
> Jeremias Maerki
>
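
On the pattern question at the top of this message: as far as I know, PDFBox has no pattern language of its own. Text extraction (for example PDFTextStripper.getText(PDDocument)) returns a plain String, and the "(<number>) " heuristic can then be applied with java.util.regex. Below is a minimal sketch of that idea, not a definitive implementation. The class name FootnoteFinder, the nested Footnote type, and the limit of one to three digits are assumptions made for illustration, and the code takes already-extracted page text as input, so it does not call PDFBox itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): scans extracted page text
// for lines that begin with "(<number>) ".
final class FootnoteFinder {

    // Optional leading spaces/tabs, "(", 1-3 digits, ")", at least one
    // space or tab, then the rest of the line. MULTILINE lets ^ and $
    // match at every line boundary. The 1-3 digit limit is an assumption
    // to avoid treating things like "(2011) ..." as a footnote.
    private static final Pattern FOOTNOTE_START =
            Pattern.compile("^[ \\t]*\\((\\d{1,3})\\)[ \\t]+(.*)$", Pattern.MULTILINE);

    // Simple value holder for one candidate footnote.
    static final class Footnote {
        final int number;
        final String text;

        Footnote(int number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // pageText is assumed to be the String produced by text extraction
    // for a single page.
    static List<Footnote> find(String pageText) {
        List<Footnote> result = new ArrayList<>();
        Matcher m = FOOTNOTE_START.matcher(pageText);
        while (m.find()) {
            int number = Integer.parseInt(m.group(1));
            result.add(new Footnote(number, m.group(2).trim()));
        }
        return result;
    }
}
```

Only the first line of each candidate footnote is captured, and a numbered list in the body text would match as well, so this remains a heuristic. Restricting the search to the last lines of each page would probably reduce false positives.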

