Re: Nutch crawl nutch commands

A Laxmi Mon, 28 Oct 2013 09:13:59 -0700

Hey Talat!!

Is there any way I can specify the batch ID as well in the following command?


bin/nutch solrindex <solr url> -all -crawlId <crawl id>
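
One reading of the usage string quoted further down in this thread, `SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]`, is that `<batchId>`, `-all` and `-reindex` are alternatives, so a batch ID would go in the position that `-all` occupies above. The sketch below only assembles and prints such a command line; it does not call Nutch. The helper name `build_solrindex_cmd`, the Solr URL, the batch ID and the crawl ID are all made-up placeholders, and this has not been verified against a running Nutch 2.2.1 install.

```shell
# Assemble a `bin/nutch solrindex` command line following the usage string
#   SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
# build_solrindex_cmd is a hypothetical helper; every value passed to it
# below is a placeholder, not something taken from the thread.
build_solrindex_cmd() {
  solr_url="$1"   # Solr endpoint (placeholder)
  selector="$2"   # a batch ID, or -all, or -reindex
  crawl_id="$3"   # optional crawl ID; may be empty
  cmd="bin/nutch solrindex $solr_url $selector"
  if [ -n "$crawl_id" ]; then
    cmd="$cmd -crawlId $crawl_id"
  fi
  printf '%s\n' "$cmd"
}

# Batch ID in place of -all (assumed form of the call):
build_solrindex_cmd "http://localhost:8983/solr" "1382973239-12345" "test"
```

The printed line can be inspected first and then run by hand from the Nutch runtime directory once the real values are filled in.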


On Mon, Oct 28, 2013 at 11:51 AM, Talat UYARER <[email protected]> wrote:

> That is right, Laxmi. There is no standalone SolrIndexerJob command :)
> You can run SolrIndexerJob through the nutch shell script. Maybe you can use
> it like this:
>
> bin/nutch solrindex <solr url> -all -crawlId <crawl id>
>
> Talat
>
> On 28-10-2013 17:46, A Laxmi wrote:
>
>  It says SolrIndexerJob: command not found
>>
>> when I followed this syntax
>>
>> SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
>>
>>
>>
>>
>>
>> On Mon, Oct 28, 2013 at 11:29 AM, feng lu <[email protected]> wrote:
>>
>>  Hi Laxmi
>>>
>>> I checked the code in the bin/crawl script
>>>
>>> echo "Indexing $CRAWL_ID on SOLR index -> $SOLRURL"
>>>    $bin/nutch solrindex $commonOptions $SOLRURL -all -crawlId $CRAWL_ID
>>>
>>> If what you say is correct, then that script will also ignore the batchId
>>> and crawlId.
>>>
>>> You can try a small test db and run the bin/nutch script step by step.
>>>
>>>
>>> On Mon, Oct 28, 2013 at 10:57 PM, A Laxmi <[email protected]>
>>> wrote:
>>>
>>>  Hi feng -
>>>>
>>>> I tried, but it's ignoring the batch ID and crawl ID for some reason.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 28, 2013 at 10:00 AM, feng lu <[email protected]> wrote:
>>>>
>>>>  Hi
>>>>>
>>>>> Please check the usage of the solrindex command
>>>>>
>>>>> $ bin/nutch solrindex
>>>>> Usage: SolrIndexerJob <solr url> (<batchId> | -all | -reindex) [-crawlId <id>]
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 28, 2013 at 9:10 PM, A Laxmi <[email protected]> wrote:
>>>>>
>>>>>  Hi,
>>>>>>
>>>>>> For Nutch 2.2.1, I am aware of two crawl commands/scripts that came out
>>>>>> of the box with nutch -
>>>>>>
>>>>>> (1) bin/nutch (step by step),
>>>>>> (2) bin/crawl (all in one)
>>>>>>
>>>>>> I know how to specify a crawl ID for `bin/crawl` command. Similarly, how
>>>>>> to specify a crawl ID for `bin/nutch` command?
>>>>>>
>>>>>> The reason I am asking is, I ran a large crawl job using `all-in-one crawl
>>>>>> command "bin/crawl"` specifying a crawl ID, it broke while indexing in Solr
>>>>>> for 9th crawl iteration. Now, I just want to run one step `"bin/nutch
>>>>>> solrindex"` command for just that interrupted 9th iteration to complete the
>>>>>> solr indexing. How should I specify crawlID in "`bin/nutch solrindex`"
>>>>>> command? What is the syntax?
>>>>>>
>>>>>> I have all the crawl data stored in a HBase table "webpage_test"
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Don't Grow Old, Grow Up... :-)
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Don't Grow Old, Grow Up... :-)
>>>
>>>
>>
>
