Hi,

Enter the following in nutch-site.xml:
<nutch-conf>
  <property>
    <name>plugin.includes</name>
    <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)|index-basic|query-(basic|site|url)</value>
    <description>Regular expression naming plugin directory names to
    include. Any plugin not matching this expression is excluded.
    In any case you need at least include the nutch-extensionpoints plugin.
    By default Nutch includes crawling just HTML and plain text via HTTP,
    and basic indexing and search plugins.
    </description>
  </property>
</nutch-conf>

Note that the value must be on a single line with no spaces inside it.

Also, in nutch-conf.xml, enter the following:

  <property>
    <name>file.content.limit</name>
    <value>-1</value>
    <description>The length limit for downloaded content, in bytes.
    If this value is larger than zero, content longer than it will be
    truncated; otherwise (zero or negative), no truncation at all.
    </description>
  </property>

  <property>
    <name>plugin.folders</name>
    <value>your plugin folder location</value>
    <description>Directories where nutch plugins are located. Each
    element may be a relative or absolute path. If absolute, it is used
    as is. If relative, it is searched for on the classpath.</description>
  </property>

-Cherian Thomas

-----Original Message-----
From: bob knob [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 11, 2006 9:27 PM
To: [email protected]
Subject: Enabling different file types

Hi, it's me again,

If I'm going to use Nutch, I need xls, ppt, & doc file types to be
searchable if at all possible. The wiki says most file types are disabled
by default, but they can be turned on by changing conf/nutch-site.xml.
Unfortunately there is no documentation that I can find for this file...
any ideas how to do it, or sample xml that somebody could send over?

Thanks,
Bob

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
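
P.S. To check which plugin directories a plugin.includes value would switch on, something like the following Python sketch may help. It is not Nutch code: it assumes that Nutch requires the whole plugin directory name to match the expression, and the candidate names other than those appearing in the value itself (parse-rtf, protocol-file) are only illustrative. The value is written here without any whitespace inside it.

```python
import re

# The plugin.includes value from the reply, with no whitespace inside it.
PLUGIN_INCLUDES = (
    r"nutch-extensionpoints|protocol-http|urlfilter-regex|"
    r"parse-(text|html|js|pdf|msword|zip|mspowerpoint|msexcel)|"
    r"index-basic|query-(basic|site|url)"
)


def is_included(plugin_dir, pattern=PLUGIN_INCLUDES):
    """Return True if the directory name matches the expression.

    Assumption: the entire name has to match, not just a prefix.
    """
    return re.fullmatch(pattern, plugin_dir) is not None


# Illustrative plugin directory names; the last two are not in the value.
candidates = [
    "parse-msword",
    "parse-mspowerpoint",
    "parse-msexcel",
    "parse-html",
    "parse-rtf",
    "protocol-file",
]

enabled = [name for name in candidates if is_included(name)]
print(enabled)
```

Under that assumption, the doc, ppt and xls parsers are selected, while a name that is not listed in the value is left out. A stray space inside the value (for example from mail line wrapping) would stop the affected name from matching, so it is worth checking the value after pasting it.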
