Re: Detecting the footnotes in a PDF

Hesham G. Wed, 26 Oct 2011 12:48:55 -0700

Jeremias,

Thanks a lot. That might be helpful, especially since I also want to detect
the number of the footnote. But how can I express the pattern "(<number>) " in
PDFBox terms?



Best regards,
Hesham
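
Note on the question above: as far as I know, PDFBox has no pattern language
of its own, so the heuristic from the included message below would be applied
to the text after extraction (for example, the String returned by
PDFTextStripper.getText(PDDocument)). What follows is only a minimal sketch
under that assumption, using plain java.util.regex. The class FootnoteFinder,
its Footnote type, and the method find are hypothetical names made up for
illustration and are not part of PDFBox. It also treats every extracted line
as a possible paragraph start, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of PDFBox. It scans text that has already
// been extracted from a PDF page and collects lines that begin with
// "(<number>) ", which are treated as footnote candidates.
final class FootnoteFinder {

    // Optional leading whitespace, "(", 1-4 digits, ")", at least one
    // whitespace character, then the rest of the line.
    private static final Pattern FOOTNOTE_START =
            Pattern.compile("^\\s*\\((\\d{1,4})\\)\\s+(.*)$");

    // One footnote candidate: its number and the remainder of the line.
    static final class Footnote {
        final int number;
        final String text;

        Footnote(int number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // Assumption: extractedText is the plain text of one page, with one
    // line of the page per line of text.
    static List<Footnote> find(String extractedText) {
        List<Footnote> result = new ArrayList<>();
        for (String line : extractedText.split("\\R")) {
            Matcher m = FOOTNOTE_START.matcher(line);
            if (m.matches()) {
                int number = Integer.parseInt(m.group(1));
                result.add(new Footnote(number, m.group(2).trim()));
            }
        }
        return result;
    }
}
```

With this sketch, a line such as "(1) First note" yields one candidate with
number 1 and text "First note", while "(2)" in the middle of a line is
ignored. Being a heuristic, it will also match numbered list items written
the same way, so the results would still need checking against the page
layout.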

---------------------------------------------
Included message :


> Not reliably, no, because the PDF is not tagged. Together with text
> extraction you might be able to come up with some heuristics to identify
> footnotes. Like looking for a pattern "(<number>) " at the beginning of
> a paragraph, for example. HTH
> 
> 
> On 26.10.2011 19:52:46 Hesham G. wrote:
>> Maybe my question was not clear enough. I meant: is there a way to know
>> that the part currently extracted from the PDF page is the footnote section?
>> 
>> 
>> Best regards,
>> Hesham
>> 
>> 
>> ---------------------------------------------
>> Included message :
>> 
>> > I see PDFBox (current trunk) extracting the footnote text correctly
>> > from this PDF.  (I just ran the org.apache.pdfbox.ExtractText tool).
>> > 
>> > Mike McCandless
>> > 
>> > http://blog.mikemccandless.com
>> > 
>> > On Wed, Oct 26, 2011 at 8:25 AM, Hesham G. <[email protected]> wrote:
>> >> Hello,
>> >>
>> >> Is there a way to detect the footnotes section in a PDF file?
>> >> Here is a sample two-page PDF with footnotes:
>> >> http://www.4shared.com/document/Q03u9SMc/pdf_with_footnotes.html
>> >>
>> >>
>> >> Best regards,
>> >> Hesham
>> >>
>> >
> 
> 
> 
> 
> Jeremias Maerki
> 
>
