Hi,
I have installed Nutch for the first time and I want to index Word and Excel
documents.
I changed the plugin.includes property in nutch-default.xml as follows:
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|swf|msword|mspowerpoint|rss)|index-(basic|more)|query-(basic|site|url|more)|subcollection|clustering-carrot2|summary-basic|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
Even with this modification, I still get the following messages:
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
Could someone please help me? It's urgent.
-- 
View this message in context: 
http://www.nabble.com/indexing-word-file-tf4819567.html#a13788425
Sent from the Nutch - User mailing list archive at Nabble.com.
