Hi,
I have installed Nutch for the first time and I want to index Word and Excel
documents.
I even changed the plugin.includes property in nutch-default.xml:
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|swf|msword|mspowerpoint|rss)|index-(basic|more)|query-(basic|site|url|more)|subcollection|clustering-carrot2|summary-basic|scoring-opic</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins. In order to use HTTPS please enable
protocol-httpclient, but be aware of possible intermittent problems with the
underlying commons-httpclient library.
</description>
</property>
Even with this modification I still get the following message:
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
Could someone please help me? It is urgent.
--
View this message in context:
http://www.nabble.com/indexing-word-file-tf4819567.html#a13788425
Sent from the Nutch - User mailing list archive at Nabble.com.