On Thu, 20 Jan 2011, Ricardo Quintas wrote:
> I'm trying to read the table of contents from doc and docx files. I can extract all text from the documents, but can't find a way to read the table of contents of the document, or at least find the paragraphs with 'headings' style. Is there any way to achieve this?
For HWPF, I'd expect the ToC text to just be in regular text runs, but marked with some flags that HWPF might not currently expose. Does the text you'd expect show up as part of the regular text extraction?
For XWPF, I'd suggest you unzip a sample .docx file and see where the ToC ends up in it. You should be able to locate the appropriate objects from that. Do please send in patches if you end up coding up some suitable support for it!
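As a rough starting point for that kind of poking around, here is a minimal sketch that uses only the JDK (no POI classes) to open a .docx as a zip and list the paragraphs whose style id starts with "Heading". The class name DocxHeadingScan and its methods are made up for illustration. It assumes the main part lives at word/document.xml and that the heading style ids are "Heading1", "Heading2" and so on, which is what English-language Word normally writes; localised or custom templates may use other ids, so check your own sample file first. The same scan with a different prefix (for instance "TOC", assuming your file marks its ToC entries that way) would pick up the generated ToC lines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: scans a .docx for heading-styled paragraphs.
public class DocxHeadingScan {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns "styleId: text" for every paragraph whose style id starts with "Heading".
    static List<String> headingsFromXml(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(xml);

        List<String> out = new ArrayList<>();
        NodeList paras = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element p = (Element) paras.item(i);
            NodeList styles = p.getElementsByTagNameNS(W_NS, "pStyle");
            if (styles.getLength() == 0) {
                continue; // paragraph has no explicit style
            }
            String styleId = ((Element) styles.item(0)).getAttributeNS(W_NS, "val");
            if (!styleId.startsWith("Heading")) {
                continue; // assumption: heading style ids look like "Heading1"
            }
            // Concatenate the text of all runs in this paragraph.
            StringBuilder text = new StringBuilder();
            NodeList runs = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            out.add(styleId + ": " + text);
        }
        return out;
    }

    // Opens the .docx as a zip and scans its main document part.
    static List<String> headingsFromDocx(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml"); // assumed location
            if (entry == null) {
                throw new IOException("no word/document.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return headingsFromXml(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java DocxHeadingScan file.docx");
            return;
        }
        for (String heading : headingsFromDocx(args[0])) {
            System.out.println(heading);
        }
    }
}
```

Once you know which elements and style ids your sample uses, it should be easier to find the matching objects on the XWPF side.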
Nick
