Hi,

On Feb 17, 2008 2:31 PM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> I looked at enhancing the structured parsing abilities of the MS
> Office parsers, but except for Excel I don't think it makes sense to
> add much new stuff there until the relevant POI libraries are more
> feature-rich. I've just contacted the POI team about getting some of
> their scratchpad code released so we could leverage it in Tika.

It turned out that they're already releasing the scratchpad code as a
separate Maven artifact, so for now I've simply added that as another
normal dependency and replaced our custom Word and PowerPoint parsing
code with text extractors from POI. I'll be looking at adding more
fine-grained parsing based on existing POI features.

BR,

Jukka Zitting
