> types to be searchable if at all possible. The wiki
> says most file types are disabled by default, but they
> can be turned on by changing conf/nutch-site.xml.
> Unfortunately there is no documentation that I can
> find for this file... any ideas how to do it, or
> sample xml that somebody could send over?
Simply add the plugin name to the plugin.includes property. For instance, to activate Word, PowerPoint and Excel parsing, just add this to the property:

... |parse-msexcel|parse-mspowerpoint|parse-msword| ...

or, in a shorter syntax:

... |parse-ms(excel|powerpoint|word)| ...

This is described on the Wiki page http://wiki.apache.org/nutch/WritingPluginExample, in the section "Getting Nutch to Use Your Plugin".

Regards
Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/
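Since sample XML was asked for, here is a minimal conf/nutch-site.xml sketch. It assumes a Nutch 0.8-style file with a <configuration> root element; check the root element used by the nutch-default.xml shipped with your release and use the same one. Properties set in nutch-site.xml override those in conf/nutch-default.xml, and the new value replaces the default instead of extending it. So start from the plugin.includes value in your own nutch-default.xml and append the MS parsers to it. The other plugin names in the value below are illustrative only; keep whatever your release actually lists.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of conf/nutch-default.xml.
     Assumption: <configuration> root element (Nutch 0.8 style);
     use the root element from your own nutch-default.xml. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Illustrative value: copy the default list from nutch-default.xml,
         then append the Microsoft Office parsers as shown at the end. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|parse-ms(excel|powerpoint|word)</value>
  </property>
</configuration>
```

The value is a regular expression matched against plugin ids, which is why the grouped form parse-ms(excel|powerpoint|word) selects the same three plugins as listing parse-msexcel, parse-mspowerpoint and parse-msword separately.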
