Re: Correct use of WordExtractor

Nick Burch Thu, 21 Feb 2008 08:27:20 -0800

On Thu, 21 Feb 2008, João Ferreira wrote:

Hello, I'm using WordExtractor to get text from a .doc file, but I get a lot of trash from indexes and pictures inside the documents. Is there a way to get only the text from the file?

This is probably related to bug #44431. It seems Word stores a lot of its metadata in structures that look almost like text.

For now, I'd suggest just excluding the few things you can easily spot as not being text.
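As a rough illustration of that kind of filtering, here is a minimal sketch that post-processes paragraph strings after extraction. It assumes (unverified against any particular POI version) that index and hyperlink fields show up in the extracted text using the Word binary field markers 0x13 (field begin), 0x14 (separator) and 0x15 (field end), and that pictures leave behind other control characters such as 0x01. The class and method names `FieldCodeFilter`, `cleanParagraph` and `cleanParagraphs` are hypothetical, not part of POI.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: heuristic cleanup of text extracted from a .doc file. */
public class FieldCodeFilter {
    // Assumed Word binary field markers (field begin, separator, end).
    private static final char FIELD_BEGIN = '\u0013';
    private static final char FIELD_SEPARATOR = '\u0014';
    private static final char FIELD_END = '\u0015';

    /** Drops field instructions (e.g. " PAGEREF _Toc123 "), keeps field results and plain text. */
    public static String cleanParagraph(String text) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: TRUE while still inside its instruction part.
        Deque<Boolean> fields = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == FIELD_BEGIN) {
                fields.push(Boolean.TRUE);
            } else if (c == FIELD_SEPARATOR) {
                // Switch the innermost field from instruction to result.
                if (!fields.isEmpty()) {
                    fields.pop();
                    fields.push(Boolean.FALSE);
                }
            } else if (c == FIELD_END) {
                if (!fields.isEmpty()) {
                    fields.pop();
                }
            } else {
                // Skip anything inside an instruction at any nesting level, plus
                // remaining control characters (assumed picture/object placeholders).
                boolean inInstruction = fields.contains(Boolean.TRUE);
                boolean isControl = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
                if (!inInstruction && !isControl) {
                    out.append(c);
                }
            }
        }
        return out.toString().trim();
    }

    /** Cleans each paragraph and discards the ones left empty. */
    public static List<String> cleanParagraphs(String[] paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String p : paragraphs) {
            String cleaned = cleanParagraph(p);
            if (!cleaned.isEmpty()) {
                kept.add(cleaned);
            }
        }
        return kept;
    }
}
```

In practice you would feed it the array from `new WordExtractor(inputStream).getParagraphText()`, then add further exclusions for whatever other junk you spot in your own documents.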


Nick
