I have successfully gotten Nutch to index MS Word documents. In MS Word, under
File > Properties, the "Custom" tab lets you add your own properties to a file,
somewhat like HTML meta tags.

I have the msword parser, the index-more and query-more plugins, and a custom
meta tag indexer/filter installed. My question is: can Nutch read document
properties like the ones I described? Can it go far enough into the document
to extract custom, user-defined properties?

If so, has anybody successfully implemented this? If not, I imagine we would
need to modify the index-more/query-more plugins to do it. Can someone confirm
this?

Does anyone know of a good place to start looking? Any help would be appreciated.

Cheers.

-- 
View this message in context: 
http://www.nabble.com/Indexing-msword-document-properties-tp21715700p21715700.html
Sent from the Nutch - User mailing list archive at Nabble.com.