Re: Crawling localhost Webapps - regex- urfilter query

Tejas Patil Mon, 17 Dec 2012 21:19:48 -0800

Hi Rajani,

*"status 1 (db_unfetched): 1"* means that URL [1] has NOT been fetched yet.
(FYI: it should not be read as "db_unfetched - status is 1". The 1 after
the colon is a count: there is 1 URL in the crawldb with the status
db_unfetched.)
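
To make that reading concrete, here is a minimal shell sketch that splits the quoted line into its three parts. It only assumes the line has the shape "status <code> (<name>): <count>" as quoted in this thread; the variable names are made up for illustration and are not part of Nutch.

```shell
# Line copied verbatim from the output quoted in this thread.
line='status 1 (db_unfetched): 1'

# Assumed shape of the line: "status <code> (<name>): <count>"
pattern='^status ([0-9]+) \(([a-z_]+)\):[[:space:]]*([0-9]+)$'

# Pull out each captured group with sed (extended regex syntax).
status_code=$(printf '%s\n' "$line" | sed -E "s/$pattern/\1/")
status_name=$(printf '%s\n' "$line" | sed -E "s/$pattern/\2/")
url_count=$(printf '%s\n' "$line" | sed -E "s/$pattern/\3/")

echo "$url_count URL(s) in the crawldb have status $status_name (code $status_code)"
```

The first number is the internal code for the status name, and only the number after the colon says how many URLs are in that state.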


You said that there are no exceptions in the log file. Which log file did
you check?
If you are running in distributed mode, then you need to look at the Hadoop
logs (on the jobtracker) for the Nutch jobs.

Also, can you send the entry for URL [1] from the crawldb? The command
is:
*bin/nutch readdb <path to the crawldb> -url <url>*

If you do not get any output from the above command, then dump the whole
crawldb using this command:
*bin/nutch readdb <path to the crawldb> -dump <output directory>*
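
Once you have the dump, something like the sketch below may help to locate the entry. It assumes the dump was written as plain text files under the output directory; the function name find_in_dump and the 10-line context window are my own choices for illustration, not anything Nutch provides.

```shell
# find_in_dump DUMP_DIR URL   (hypothetical helper, not a Nutch command)
# Prints every line under DUMP_DIR that contains URL, plus the 10 lines
# that follow it, which is assumed to be enough to show the whole entry.
find_in_dump() {
  dump_dir=$1
  url=$2
  # -r: search all files under the directory
  # -F: treat the URL as a fixed string, not a regex
  # -A 10: also print 10 lines of trailing context
  grep -r -F -A 10 -e "$url" "$dump_dir"
}
```

Usage would be: find_in_dump <output directory> <url>. If it prints nothing, the URL is probably not in the crawldb at all (or is stored in a slightly different form, e.g. with a trailing slash).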

thanks,
Tejas Patil

On Mon, Dec 17, 2012 at 8:51 PM, Rajani Maski <[email protected]> wrote:

> status 1 (db_unfetched): 1
