Are you just looking to extract text from Word documents? If so, HWPF will probably do the trick. I am not familiar with the Clean Content SDK, so I can't comment on that. Why don't you give HWPF a try? Some of the JUnit test cases already extract text, so maybe you can have a look at them.
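
Once the text is out of the document, the keyword step is plain string handling. Below is a minimal sketch of that step only, not a definitive implementation. It assumes the document text has already been obtained as a String, for example from HWPF's WordExtractor.getText(). The class name KeywordCounter, the method countKeywords, and the sample sentence are made up for illustration, and matching is assumed to be whole-word and case-insensitive.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class KeywordCounter {

    /**
     * Counts whole-word, case-insensitive occurrences of each keyword.
     *
     * Assumption: 'text' was extracted beforehand, e.g. with HWPF:
     *   new WordExtractor(new FileInputStream("report.doc")).getText()
     * ("report.doc" is a placeholder file name).
     */
    static Map<String, Integer> countKeywords(String text, List<String> keywords) {
        // Start every keyword at zero so missing keywords still show up in the result.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword.toLowerCase(Locale.ROOT), 0);
        }
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (counts.containsKey(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a .doc file.
        String sample = "POI reads Word files. HWPF is the part of POI for .doc files.";
        System.out.println(countKeywords(sample, Arrays.asList("poi", "hwpf", "excel")));
    }
}
```

Keeping the counting separate from the extraction means the same method works whichever library ends up producing the text.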
-Raghu

On Fri, Mar 14, 2008 at 9:15 PM, Ylva Degerfeldt <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> Maybe I shouldn't ask this on this mailing list but I'm about to start
> on a project where I'm going to extract different keywords from Word
> files in the most common formats (like 97 - 2003) and I'd like to know
> before I start if using POI-HWPF really is the best way to do that.
>
> The thing is.. I think I have found another way to do it: Oracle's
> Clean Content SDK. Has anyone tried this? I was just wondering if it's
> worth the time and effort to dig deeper into that or if I should
> simply decide that POI-HWPF is the best solution and forget about the
> other one. (I have a bit of a tight schedule so that's why I'm
> asking.)
>
> Thanks in advance,
>
> Ylva
