Nutch crawl nutch commands

A Laxmi Mon, 28 Oct 2013 06:12:19 -0700

Hi,

For Nutch 2.2.1, I am aware of two crawl commands/scripts that come out of
the box with Nutch:


(1) bin/nutch (step by step),
(2) bin/crawl (all in one)

I know how to specify a crawl ID for the `bin/crawl` command. How do I
specify a crawl ID for the `bin/nutch` command?

The reason I am asking is that I ran a large crawl job using the all-in-one
`bin/crawl` command with a crawl ID, and it broke while indexing into Solr
during the 9th crawl iteration. Now I just want to run the single step
`bin/nutch solrindex` for that interrupted 9th iteration to complete the
Solr indexing. How should I specify the crawl ID in the `bin/nutch solrindex`
command? What is the syntax?

I have all the crawl data stored in an HBase table "webpage_test".
