On Thu, 21 Feb 2008, João Ferreira wrote:
Hello, I'm using WordExtractor to get text from a .doc file, but I get a lot of trash from indexes and pictures inside the documents. Is there a way to get only the text from the file?

This is probably related to bug #44431. It seems Word stores lots of its metadata in things that look almost like text.

For now, I'd suggest just excluding a few things that you can easily spot as not being text.
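
As a rough sketch of that kind of filtering, something like the following could be run over the paragraphs returned by WordExtractor's getParagraphText(). The class name, method names and keyword list are hypothetical, not part of POI, and the rules are only guesses at what the "trash" looks like: it assumes that Word's field marker characters (0x13, 0x14, 0x15) and leaked field instructions such as TOC or PAGEREF are the main offenders. It drops whole paragraphs, so it may discard real text that shares a paragraph with a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of POI: drops paragraphs that look like
// Word field data rather than document text.
class ParagraphFilter {
    // Word marks fields with control characters:
    // 0x13 begins a field, 0x14 separates code from result, 0x15 ends it.
    private static final char FIELD_BEGIN = (char) 0x13;
    private static final char FIELD_SEPARATOR = (char) 0x14;
    private static final char FIELD_END = (char) 0x15;

    // Assumed list of field instructions that tend to leak into the output.
    private static final String[] FIELD_KEYWORDS = {
        "TOC", "PAGEREF", "HYPERLINK", "EMBED"
    };

    // Returns true if the paragraph is probably ordinary text.
    static boolean looksLikeText(String paragraph) {
        String trimmed = paragraph.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Any field marker means the paragraph carries field data.
        if (trimmed.indexOf(FIELD_BEGIN) >= 0
                || trimmed.indexOf(FIELD_SEPARATOR) >= 0
                || trimmed.indexOf(FIELD_END) >= 0) {
            return false;
        }
        // A paragraph that starts with a field instruction is not text either.
        for (String keyword : FIELD_KEYWORDS) {
            if (trimmed.startsWith(keyword + " ")) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the paragraphs that pass looksLikeText, trimmed.
    static List<String> keepText(String[] paragraphs) {
        List<String> kept = new ArrayList<>();
        for (String paragraph : paragraphs) {
            if (looksLikeText(paragraph)) {
                kept.add(paragraph.trim());
            }
        }
        return kept;
    }
}
```

You would call ParagraphFilter.keepText(extractor.getParagraphText()) and extend the keyword list with whatever junk shows up in your own documents.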

Nick