On Wed, 25 Mar 2015, Allison, Timothy B. wrote:
> Do we have examples of extracting autonumbered paragraph numbers in both doc and docx?

I'm not sure that we do. I haven't looked recently, but I have a hunch that there's a special control setting to enable it. docx will probably cache the last number; I'm not sure whether .doc does or not.

> We have a request over on Tika-1440 to add this. I'm pretty sure I remember working through this in ppt or pptx. Any pointers would be great.

I'd suggest creating some test files with a mixture of normal and numbered paragraphs. Have some numbered sets with just a single level, some with multiple levels (eg 1, 1.1, 1.1.1), and some with different styles (eg 1, I, A). Save them as both .doc and .docx. Unzip the docx, see how it's done, and add support for that in XWPF. Then look at the .doc file format spec: now that you know what the control elements are, look at adding matching support to HWPF, plus interfaces for that.
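
As a rough illustration of the "unzip the docx and see how it's done" step, here is a minimal sketch that uses only the JDK (Java 9+), not POI. It assumes the usual WordprocessingML layout, where a numbered paragraph carries a w:numPr element with w:numId and w:ilvl children inside word/document.xml. The class name DocxNumberingProbe and its methods are hypothetical, made up for this example. It only reports which paragraphs reference a numbering definition; the actual formats (1, I, A) and level text would still need to be looked up in word/numbering.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for poking at test files; not part of POI.
final class DocxNumberingProbe {
    // Main WordprocessingML namespace (the "w:" prefix in document.xml).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one entry per paragraph: "numId=<id> ilvl=<level>" or "plain". */
    static List<String> describeParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes() stops at the end of the current zip entry.
                    return describe(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found");
    }

    private static List<String> describe(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // Only direct numbering is detected here; numbering inherited
            // from a paragraph style would need a lookup in styles.xml.
            NodeList numPr = p.getElementsByTagNameNS(W_NS, "numPr");
            if (numPr.getLength() == 0) {
                out.add("plain");
                continue;
            }
            Element pr = (Element) numPr.item(0);
            out.add("numId=" + val(pr, "numId") + " ilvl=" + val(pr, "ilvl"));
        }
        return out;
    }

    // Reads the w:val attribute of the first matching child, or "?" if absent.
    private static String val(Element parent, String child) {
        NodeList nodes = parent.getElementsByTagNameNS(W_NS, child);
        return nodes.getLength() == 0
                ? "?"
                : ((Element) nodes.item(0)).getAttributeNS(W_NS, "val");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: DocxNumberingProbe <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            describeParagraphs(in).forEach(System.out::println);
        }
    }
}
```

Running it over each of the test .docx files should show which numId and ilvl values Word assigns to single-level, multi-level, and differently styled lists, which is a starting point for deciding what XWPF needs to expose.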

Best case is probably half a day, but more likely it's 2+ days' work. Budget in some time to tidy up both the HWPF and XWPF code in the areas you work on too...

Nick
