On Tue, Sep 27, 2011 at 10:38 PM, Devin Han <[email protected]> wrote:
>
> 2011/9/26 Ram Kane <[email protected]>
>>
>> I've tried that. The problem is that it works on a document level
>>
>> I need to be able to extract content for a given page.
>
> Does it make sense to extract content by paragraph?

Only if I could associate those paragraphs with their corresponding page numbers. But I think that is the whole problem (parsing the document as a series of pages) :/

>>
>> Thx a lot for the code though.
>>
>>
>> On Mon, Sep 26, 2011 at 2:46 AM, Devin Han <[email protected]> wrote:
>> > Hi Ram,
>> >
>> > I suppose you only want to extract the text (header, footer, comments,
>> > end note, etc.) and don't care about page breaks.
>> > Please see the sample code.
>> >
>> > TextDocument textdoc = (TextDocument) TextDocument.loadDocument("textExtractor.odt");
>> > EditableTextExtractor extractorD = EditableTextExtractor.newOdfEditableTextExtractor(textdoc);
>> > String output = extractorD.getText();
>> > System.out.println(output);
>> >
>> > This code fragment will return all of the content except the header and
>> > footer. For the content in the footer and header, please refer to:
>> > Header header = textdoc.getHeader();
>> > output = TextExtractor.getText(header.getOdfElement());
>> > System.out.println(output);
>> >
>> > Footer footer = textdoc.getFooter();
>> > output = TextExtractor.getText(footer.getOdfElement());
>> > System.out.println(output);
>> >
>
> --
> -Devin
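
P.S. One possible way to approximate "content per page", sketched here under assumptions rather than tested against real documents: the ODF format defines a <text:soft-page-break/> element that a producing application may write into content.xml where its layout put a page break. Not every document contains these markers, so this only helps when they are present. The sketch below uses only the JDK's XML parser (not the Simple API), takes the content.xml text as a string, and splits paragraph and heading text at each soft page break. The class and method names (SoftPageBreakSketch, splitByPage) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the ODF Toolkit.
class SoftPageBreakSketch {
    // Namespace URI of the ODF "text:" prefix.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Returns one string per "page", assuming the document contains
    // text:soft-page-break markers; without them a single page is returned.
    static List<String> splitByPage(String contentXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(contentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("content.xml could not be parsed", e);
        }
        List<StringBuilder> pages = new ArrayList<>();
        pages.add(new StringBuilder());
        walk(doc.getDocumentElement(), pages, false);

        List<String> result = new ArrayList<>();
        for (StringBuilder page : pages) {
            result.add(page.toString().trim());
        }
        return result;
    }

    private static void walk(Node node, List<StringBuilder> pages, boolean insideParagraph) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Only keep text that sits inside a paragraph or heading.
            if (insideParagraph) {
                pages.get(pages.size() - 1).append(node.getNodeValue());
            }
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        boolean inTextNs = TEXT_NS.equals(node.getNamespaceURI());
        String name = node.getLocalName();
        if (inTextNs && "soft-page-break".equals(name)) {
            pages.add(new StringBuilder()); // start collecting the next page
            return;
        }
        boolean isBlock = inTextNs && ("p".equals(name) || "h".equals(name));
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, pages, insideParagraph || isBlock);
        }
        if (isBlock) {
            pages.get(pages.size() - 1).append('\n'); // paragraph separator
        }
    }

    public static void main(String[] args) {
        String xml = "<office:text"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
                + "<text:p>Page one text.</text:p><text:soft-page-break/>"
                + "<text:p>Page two text.</text:p></office:text>";
        List<String> pages = splitByPage(xml);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("--- page " + (i + 1) + " ---");
            System.out.println(pages.get(i));
        }
    }
}
```

To use this on a real file, content.xml would first have to be read out of the .odt archive (it is a zip file). Manual page breaks set through paragraph styles are not detected by this sketch, so the page numbers it yields are only as good as the markers the producing application wrote.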
