If you run

bin/nutch

without any arguments or options, it will show you:

Usage: nutch [-core] COMMAND
where COMMAND is one of:
...
  parse             parse a segment's pages
  invertlinks       create a linkdb from parsed segments
  index             run the indexer on parsed segments and linkdb

There is no need to redo the whole crawl.
You only need to reparse and reindex, using the parse, invertlinks and
index commands listed above.
Maybe you have to delete some of the directories in the segments you want
to reparse first (I guess parse_data and parse_text).

reinh...@thord:>ls crawl/segments/20091021095928/
content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text
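
Put together, the reparse and reindex steps could look roughly like the sketch below. It is only a sketch under assumptions that are not confirmed in this thread: the crawl/crawldb, crawl/linkdb and crawl/indexes paths, the argument order of each command, and the removal of crawl_parse in addition to the two directories guessed above. The NUTCH, CRAWL_DIR, SEGMENT and DRY_RUN variables and the two function names are made up for illustration. Run each command without arguments first to see the usage your Nutch version expects.

```shell
#!/bin/sh
# Sketch: reparse and reindex one already-fetched segment without refetching.
# ASSUMPTIONS (not from the original message): the crawl/ directory layout
# and the argument order of parse, invertlinks and index.

NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"
SEGMENT="${SEGMENT:-$CRAWL_DIR/segments/20091021095928}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reparse_and_reindex() {
    # Remove the old parse output so the segment can be parsed again.
    # parse_data and parse_text are the guess from the message; crawl_parse
    # is an extra assumption, since it is also written by the parse step.
    run rm -r "$SEGMENT/crawl_parse" "$SEGMENT/parse_data" "$SEGMENT/parse_text"

    # Reparse the fetched content with the updated plug-ins.
    run "$NUTCH" parse "$SEGMENT"

    # Rebuild the linkdb from the reparsed segment.
    run "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" "$SEGMENT"

    # Reindex; an existing index output directory may have to be moved
    # or removed first (assumption).
    run "$NUTCH" index "$CRAWL_DIR/indexes" "$CRAWL_DIR/crawldb" \
        "$CRAWL_DIR/linkdb" "$SEGMENT"
}

reparse_and_reindex
```

With the default DRY_RUN=1 the script only prints the four commands, so you can check them before setting DRY_RUN=0 to really run them. The content and crawl_fetch directories are left alone, which is why nothing has to be fetched again.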

regards
reinhard

sprabhu_PN wrote:
> We have added a few plug-ins, such as a date-parsing plug-in, that get
> exercised during a Nutch crawl and update a field in each index record. Now
> we find that we need to improve the plug-in and re-run it. Is the only option
> to crawl the whole index once again? Is there any way we can do a recrawl
> that will just exercise the newer versions of the plug-ins and take less time
> to do it?
>
> Thanks in advance.
>
> Regards
> Shreekanth Prabhu
>   
