Re: Nutch crawl nutch commands

A Laxmi Mon, 28 Oct 2013 08:01:27 -0700

Hi feng -

I tried that, but it is ignoring the batch ID and the crawl ID for some reason.
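
Going by the usage line quoted below, the invocation would look roughly like the following sketch. The Solr URL and the crawl ID here are placeholder assumptions, not values from this thread, and the script only prints the command instead of running it, so it says nothing about whether -crawlId is actually honored.

```shell
#!/bin/sh
# Sketch only: builds and prints a solrindex command line.
# SOLR_URL and CRAWL_ID are hypothetical placeholders; substitute your own.
SOLR_URL="http://localhost:8983/solr/"
CRAWL_ID="test"
# Per the usage text, this slot takes a batch ID, -all, or -reindex.
BATCH_ARG="-all"

# Helper (hypothetical name) that assembles the argument order from the
# usage line: <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
build_solrindex_cmd() {
  printf 'bin/nutch solrindex %s %s -crawlId %s\n' "$1" "$2" "$3"
}

CMD=$(build_solrindex_cmd "$SOLR_URL" "$BATCH_ARG" "$CRAWL_ID")
echo "$CMD"
```

To index a single iteration instead of everything, BATCH_ARG would be replaced with the batch ID of that iteration.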

On Mon, Oct 28, 2013 at 10:00 AM, feng lu <[email protected]> wrote:

> Hi
>
> Please check the usage of the solrindex command:
>
> $ bin/nutch solrindex
> Usage: SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
>
>
>
> On Mon, Oct 28, 2013 at 9:10 PM, A Laxmi <[email protected]> wrote:
>
> > Hi,
> >
> > For Nutch 2.2.1, I am aware of two crawl commands/scripts that come out of
> > the box with Nutch:
> >
> > (1) bin/nutch (step by step),
> > (2) bin/crawl (all in one)
> >
> > I know how to specify a crawl ID for the `bin/crawl` command. Similarly, how
> > do I specify a crawl ID for the `bin/nutch` command?
> >
> > The reason I am asking is that I ran a large crawl job using the all-in-one
> > crawl command `bin/crawl`, specifying a crawl ID, and it broke while indexing
> > into Solr during the 9th crawl iteration. Now I just want to run the single
> > step `bin/nutch solrindex` for that interrupted 9th iteration to complete the
> > Solr indexing. How should I specify the crawl ID in the `bin/nutch solrindex`
> > command? What is the syntax?
> >
> > I have all the crawl data stored in an HBase table "webpage_test".
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
