Did you inject the urls in the first command?
Copy your file containing the URLs into a directory, e.g. urls,
and then inject the URLs using the inject command

ex.
bin/nutch inject db urls

where urls is the name of the directory containing the URL list file
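
A minimal sketch of that setup, assuming the crawldb lives at db and the seed file is called seed.txt (both names are just placeholders, adjust them to your installation):

```shell
# Put your seed file in a directory (here called "urls"), one URL per line.
mkdir -p urls
echo "http://nutch.apache.org/" > urls/seed.txt

# Inject the seed URLs into the crawl database. "db" is the crawldb path;
# guarded so the sketch still runs where bin/nutch is not present.
if [ -x bin/nutch ]; then
  bin/nutch inject db urls
fi
```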

Hope this helps.
Thanks,
Bhawna

On Sun, Mar 6, 2011 at 7:18 PM, chidu r <[email protected]> wrote:

> Hi all
>
> I am trying to setup nutch 1.2 on Hadoop and used the instructions at
> http://wiki.apache.org/nutch/NutchHadoopTutorial, it has been very useful.
>
> However, I find that when I execute the command:
>
> $bin/nutch crawl urls -dir crawl -depth 4 -topN 50
>
> The crawler stops at the generator stage with the message:
> 2011-03-06 17:23:49,538 WARN  crawl.Generator - Generator: 0 records
> selected for fetching, exiting ...
>
> I have configured the following plugins in nutch-site.xml
>
>  
> protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)
>
> I am not using crawl-urlfilter.txt or regex-urlfilter.txt to filter URLs. I
> initiated the crawl with 10 seed URLs from popular sites on the internet.
>
> Any pointers to what I am missing here?
>
>
> regards
> Chidu
>