On Thu, 21 Feb 2008, João Ferreira wrote:
Hello, I'm using WordExtractor to get text from a .doc file, but I get a lot of trash from indexes and pictures inside the documents. Is there a way to get only the text from the file?
This is probably related to bug #44431. It seems Word stores a lot of its metadata in things that look almost like text.
For now, I'd suggest just excluding a few things you can easily spot as not text.
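As a rough illustration of that kind of filtering, here is a minimal sketch in plain Java. It is not part of POI: the class name, the method names and the keyword list are all made up for this example. It assumes the junk shows up either as paragraphs containing Word's field marker characters (0x13, 0x14, 0x15) or as paragraphs that start with a field instruction such as TOC, PAGEREF or EMBED. Whether that matches what you see in your documents is something you would need to check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a POI class: drops paragraphs that look like
// Word field instructions rather than real text.
class ParagraphFilter {

    // Assumed list of field instructions to drop; adjust to what you see.
    private static final String[] FIELD_KEYWORDS = {
        "TOC ", "PAGEREF ", "HYPERLINK ", "EMBED ", "INCLUDEPICTURE "
    };

    // Returns true if the paragraph does not look like field metadata.
    static boolean looksLikeText(String paragraph) {
        // Check the raw string first, since trim() would strip these
        // control characters from the ends.
        for (int i = 0; i < paragraph.length(); i++) {
            char c = paragraph.charAt(i);
            // 0x13 = field begin, 0x14 = field separator, 0x15 = field end
            if (c == 0x13 || c == 0x14 || c == 0x15) {
                return false;
            }
        }
        String trimmed = paragraph.trim();
        if (trimmed.length() == 0) {
            return false;
        }
        for (String keyword : FIELD_KEYWORDS) {
            if (trimmed.startsWith(keyword)) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the paragraphs that pass looksLikeText, trimmed.
    static List<String> keepText(String[] paragraphs) {
        List<String> kept = new ArrayList<String>();
        for (String paragraph : paragraphs) {
            if (looksLikeText(paragraph)) {
                kept.add(paragraph.trim());
            }
        }
        return kept;
    }
}
```

With POI on the classpath, the input could be the array returned by WordExtractor.getParagraphText(). Note that this drops a whole paragraph as soon as it contains any field marker, so a paragraph with real text around an inline field (a hyperlink, say) would be lost too, and a genuine paragraph that happens to start with one of the keywords would also go.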
Nick
