Thank you, Nick.  If I'm able to do it at the Tika level, I'll try to fold 
the changes back into the Extractor classes in POI.  If I need to modify POI... 
well, I'll start there and then wait for the next dev release.

Initial googling suggests that I _should_ be able to do this at the Tika level.
I also found POI's NumberFormatter in o.a.p.hwpf.converter, which will be quite 
useful in converting numbers to alpha/roman.
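
As a rough illustration of the number-to-roman/alpha conversion mentioned 
above, here is a minimal standalone sketch in plain Java. It is not POI's 
NumberFormatter and says nothing about that class's API. The names 
(ListNumberSketch, toRoman, toLetters) are hypothetical, and the 
repeated-letter scheme after Z (AA, BB, ...) is an assumption about how Word 
labels long lettered lists that should be checked against real test files.

```java
// Standalone illustration only; NOT POI's NumberFormatter.
final class ListNumberSketch {
    private static final int[] ROMAN_VALUES =
            {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] ROMAN_SYMBOLS =
            {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    /** Converts 1..3999 to upper-case roman numerals (e.g. 4 becomes IV). */
    static String toRoman(int n) {
        if (n < 1 || n > 3999) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        StringBuilder sb = new StringBuilder();
        // Greedily subtract the largest value that still fits.
        for (int i = 0; i < ROMAN_VALUES.length; i++) {
            while (n >= ROMAN_VALUES[i]) {
                sb.append(ROMAN_SYMBOLS[i]);
                n -= ROMAN_VALUES[i];
            }
        }
        return sb.toString();
    }

    /**
     * Converts 1, 2, ... to A, B, ..., Z, AA, BB, ...
     * Assumption: the letter is repeated after Z rather than
     * counting spreadsheet-style (AA, AB, ...).
     */
    static String toLetters(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("out of range: " + n);
        }
        char letter = (char) ('A' + (n - 1) % 26);
        int repeats = (n - 1) / 26 + 1;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append(letter);
        }
        return sb.toString();
    }
}
```

Lower-case variants would just be toRoman(n).toLowerCase() and 
toLetters(n).toLowerCase().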

Cheers,

              Tim

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: Thursday, March 26, 2015 7:42 AM
To: POI Users List
Subject: Re: autonumbering paragraphs in doc and docx?

On Wed, 25 Mar 2015, Allison, Timothy B. wrote:
> Do we have examples of extracting autonumbered paragraph numbers in both 
> doc and docx?

Not sure if we do. I haven't looked recently, but I've a hunch that there 
will be a special control setting thingy to enable it. docx will probably 
cache the last number; I'm not sure whether .doc will or not.

> We have a request over on TIKA-1440 to add this.  I'm pretty sure I 
> remember working through this in ppt or pptx.  Any pointers would be 
> great.

I'd suggest creating some test files, with a mixture of normal and 
numbered paragraphs. Have some numbered sets just be a single level, some 
multiple levels (eg 1, 1.1, 1.1.1), and some with different styles (eg 1, 
I, A). Save as both .doc and .docx. Unzip the docx, and see how it's done. 
Add in support for that. Then, look at the doc file format spec; now that 
you know what the control elements are, look at adding matching support 
to HWPF + interfaces for that.
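
The "unzip the docx and see how it's done" step can also be done from Java 
with nothing but java.util.zip. The sketch below is only that, a sketch: the 
class and method names (DocxPartSketch, listWordParts, readPart) are 
hypothetical, and it assumes the usual OOXML layout in which list definitions 
live in word/numbering.xml and paragraphs in word/document.xml. Check that 
against your own test files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for poking around inside a .docx (which is a zip file).
final class DocxPartSketch {

    /** Lists the XML parts under word/, e.g. word/document.xml. */
    static List<String> listWordParts(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("word/") && name.endsWith(".xml")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /**
     * Returns the named part as text, or null if it is absent.
     * Assumes the part is UTF-8 encoded.
     */
    static String readPart(InputStream docx, String partName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(partName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

Calling readPart(stream, "word/numbering.xml") on each test file and 
comparing the output across the single-level, multi-level and mixed-style 
documents should show which elements change with each list setting.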

Best case is probably half a day, likely 2+ days' work. Budget in some time 
to tidy up both HWPF and XWPF code in the areas you work on too...

Nick

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
