On 9/22/06, Trym B. Asserson <[EMAIL PROTECTED]> wrote:

> Any other suggestions? Tomi, you said you'd had difficulties too with
> certain MS documents, did you manage to find a work-around or did you
> just have to ignore these documents? So far we've only concentrated on
> using the plugins in Nutch 0.8 as they're provided, so we have no
> experience with OO/UNO. Given that POI seems to deliver reasonably good
> parsing features for MS formats, we're a bit reluctant to throw it out
> just yet.

No, I haven't found a work-around yet: it seemed too much work at the
moment. Right now I'm thinking it may not be necessary to dump POI in
favour of UNO (although I believe UNO would be better in the long
term): maybe it would be possible to work around the exceptions and
still get most (if not all) of the text content.

I'll probably have a look at it one of these days, although I'm a bit
sceptical: wouldn't the original plugin authors have already fixed it
if they could help it?

t.n.a.
