Hi Ram,
I assume you only want to extract the text (header, footer, comments,
endnotes, etc.) and don't care about page breaks.
Please see the sample code below.
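import org.odftoolkit.simple.TextDocument;
import org.odftoolkit.simple.common.EditableTextExtractor;

// Load the document and extract the text of its body content.
TextDocument textdoc = (TextDocument) TextDocument.loadDocument("textExtractor.odt");
EditableTextExtractor extractorD = EditableTextExtractor.newOdfEditableTextExtractor(textdoc);
String output = extractorD.getText();
System.out.println(output);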
This code fragment returns all of the content except the header and
footer. For the header and footer content, please use the following:
// Header text
Header header = textdoc.getHeader();
output = TextExtractor.getText(header.getOdfElement());
System.out.println(output);
// Footer text
Footer footer = textdoc.getFooter();
output = TextExtractor.getText(footer.getOdfElement());
System.out.println(output);
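In case it helps to see why the header and footer need a separate call, here is a rough sketch that does not use the Simple API at all. It relies only on the JDK and on the fact that an ODT file is a ZIP package in which the body text is stored in content.xml, while header and footer text is stored in styles.xml. The class name OdtRawText and its methods are made up for this illustration. The sketch ignores special elements such as text:s, text:tab and text:line-break, so please treat it as a way to inspect the raw text, not as a replacement for TextExtractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical helper for illustration only; not part of the Simple API.
class OdtRawText {
    // Namespace of the ODF text elements (text:p, text:h, ...).
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Append every text node below 'node'; end each text:p and text:h with a newline.
    static void collect(Node node, StringBuilder out) {
        if (node instanceof Text) {
            out.append(node.getNodeValue());
            return;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
        boolean isParagraphOrHeading = TEXT_NS.equals(node.getNamespaceURI())
                && ("p".equals(node.getLocalName()) || "h".equals(node.getLocalName()));
        if (isParagraphOrHeading) {
            out.append('\n');
        }
    }

    // Parse one XML stream (content.xml or styles.xml) and return its text.
    static String textOf(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()/getNamespaceURI()
            Document doc = factory.newDocumentBuilder().parse(xml);
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Read one entry of the ODT package; returns "" if the entry is missing.
    static String textOfEntry(String odtPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return "";
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return textOf(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "textExtractor.odt";
        System.out.println(textOfEntry(path, "content.xml")); // body text
        System.out.println(textOfEntry(path, "styles.xml"));  // header and footer text
    }
}
```

Running it with the path of an ODT file prints the body text first and then whatever text is found in styles.xml, which in a typical document is the header and footer of each page style.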
For more about TextExtractor, please see:
http://incubator.apache.org/odftoolkit/simple/document/cookbook/TextExtractor.html#Get%20Text
There is also a demo about extracting text:
http://incubator.apache.org/odftoolkit/simple/demo/demo2.html
If you have never used the Simple API before, please see this guide:
http://incubator.apache.org/odftoolkit/simple/gettingstartguide.html
2011/9/24 Ram Kane <[email protected]>
> Hi,
>
> I need to extract all text (header, footer, comments, endnote, etc) from an
> ODT document. I need to do it on a page by page basis. I'm aware that ODTs
> are basically structured by paragraphs and headings, but i'd like to know if
> there's a way to achieve what i need.
>
> Thanks a lot.
>
--
-Devin