Re: Is there a way to extract text on a page basis from odt ?

Devin Han Sun, 25 Sep 2011 22:46:32 -0700

Hi Ram,

I assume you only want to extract the text (header, footer, comments, end
notes, etc.) and don't care about page breaks.
Please see the sample code below.


       TextDocument textdoc = (TextDocument) TextDocument.loadDocument("textExtractor.odt");
       EditableTextExtractor extractorD = EditableTextExtractor.newOdfEditableTextExtractor(textdoc);
       String output = extractorD.getText();
       System.out.println(output);

This code fragment returns all of the content except the header and
footer. For the header and footer content, use:
       Header header = textdoc.getHeader();
       output = TextExtractor.getText(header.getOdfElement());
       System.out.println(output);

       Footer footer = textdoc.getFooter();
       output = TextExtractor.getText(footer.getOdfElement());
       System.out.println(output);
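
If adding the Simple API as a dependency is not an option, roughly the same
text can be read with the JDK alone, because an ODT file is a zip package:
the body text lives in content.xml, and header and footer text lives in
styles.xml. The following is only a minimal sketch of that alternative, not
how the Simple API extractors work internally. The class name OdtRawText and
its method names are made up for illustration, and the sketch ignores
whitespace elements such as text:tab and text:line-break.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the ODF Toolkit.
public class OdtRawText {
    // Namespace of the text:p and text:h elements in OpenDocument XML.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Parses one XML stream and returns its paragraphs and headings, one per line.
    static String paragraphs(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid external entity attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        StringBuilder out = new StringBuilder();
        collect(factory.newDocumentBuilder().parse(xml).getDocumentElement(), out);
        return out.toString();
    }

    // Depth-first walk that stops descending once a paragraph or heading is found.
    static void collect(Element element, StringBuilder out) {
        String name = element.getLocalName();
        if (TEXT_NS.equals(element.getNamespaceURI())
                && ("p".equals(name) || "h".equals(name))) {
            out.append(element.getTextContent()).append('\n');
            return;
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                collect((Element) child, out);
            }
        }
    }

    // Reads one named part (for example "content.xml") out of the ODT zip package.
    static String part(String odtPath, String entryName) throws Exception {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return "";
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return paragraphs(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "textExtractor.odt";
        System.out.print(part(path, "content.xml")); // body text
        System.out.print(part(path, "styles.xml"));  // header and footer text
    }
}
```

Like the Simple API code above, this gives text per document part, not per
page.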

For more about TextExtractor, please see:
http://incubator.apache.org/odftoolkit/simple/document/cookbook/TextExtractor.html#Get%20Text
There is also a demo about extracting text:
http://incubator.apache.org/odftoolkit/simple/demo/demo2.html
If you have never used the Simple API before, please see this guide:
http://incubator.apache.org/odftoolkit/simple/gettingstartguide.html

2011/9/24 Ram Kane <[email protected]>

> Hi,
>
> I need to extract all text (header, footer, comments, endnote, etc) from an
> ODT document. I need to do it on a page by page basis. I'm aware that ODTs
> are basically structured by paragraphs and headings, but i'd like to know
> if there's a way to achieve what i need.
>
> Thanks a lot.
>



-- 
-Devin
