Re: Nutch crawl nutch commands

feng lu Mon, 28 Oct 2013 08:30:44 -0700

Hi Laxmi

I checked the code in the bin/crawl script:


echo "Indexing $CRAWL_ID on SOLR index -> $SOLRURL"
  $bin/nutch solrindex $commonOptions $SOLRURL -all -crawlId $CRAWL_ID

If what you say is correct, then that script would also ignore the batchId
and crawlId.

You can try a small test DB and run the bin/nutch script step by step.
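
For the single indexing step, here is a minimal sketch that follows the usage line quoted below (<solr url>, then -all, then -crawlId <id>). The Solr URL and crawl ID values are placeholders I made up for illustration, and build_solrindex_cmd is a hypothetical helper, not part of Nutch. It only prints the command unless RUN=1 is set, so you can check the argument order first. Whether -crawlId is actually honored is the open question in this thread, so treat it as something to verify on the small test DB.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions, not taken from this thread); override via environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CRAWL_ID="${CRAWL_ID:-my_crawl}"
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

# Hypothetical helper: assembles the command in the order given by the usage line
#   SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
# The first argument is the batch selector; it defaults to -all.
build_solrindex_cmd() {
  local target="${1:--all}"
  printf '%s solrindex %s %s -crawlId %s\n' \
    "$NUTCH_BIN" "$SOLR_URL" "$target" "$CRAWL_ID"
}

# Dry run by default: print the command. Set RUN=1 to execute it.
cmd="$(build_solrindex_cmd -all)"
echo "$cmd"
if [ "${RUN:-0}" = "1" ]; then
  $cmd
fi
```

To index only one batch instead of everything, pass that batch ID in place of -all, for example build_solrindex_cmd "$BATCH_ID", where BATCH_ID is whatever ID your generate step reported.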


On Mon, Oct 28, 2013 at 10:57 PM, A Laxmi <[email protected]> wrote:

> Hi feng -
>
> I tried, but it's ignoring the batch ID and crawl ID for some reason.
>
>
>
>
> On Mon, Oct 28, 2013 at 10:00 AM, feng lu <[email protected]> wrote:
>
> > Hi
> >
> > Please check the usage of the solrindex command:
> >
> > $ bin/nutch solrindex
> > Usage: SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
> >
> >
> >
> > On Mon, Oct 28, 2013 at 9:10 PM, A Laxmi <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > For Nutch 2.2.1, I am aware of two crawl commands/scripts that came out
> > > of the box with nutch -
> > >
> > > (1) bin/nutch (step by step),
> > > (2) bin/crawl (all in one)
> > >
> > > I know how to specify a crawl ID for the `bin/crawl` command. Similarly,
> > > how do I specify a crawl ID for the `bin/nutch` command?
> > >
> > > The reason I am asking is that I ran a large crawl job using the
> > > all-in-one crawl command `bin/crawl`, specifying a crawl ID, and it broke
> > > while indexing in Solr for the 9th crawl iteration. Now I just want to run
> > > the single step `bin/nutch solrindex` for that interrupted 9th iteration
> > > to complete the Solr indexing. How should I specify the crawl ID in the
> > > `bin/nutch solrindex` command? What is the syntax?
> > >
> > > I have all the crawl data stored in an HBase table "webpage_test".
> > >
> >
> >
> >
> > --
> > Don't Grow Old, Grow Up... :-)
> >
>



-- 
Don't Grow Old, Grow Up... :-)
