Re: Help: Crawl returns no URLs

Anurag Mon, 07 Mar 2011 08:48:21 -0800

How do you know that you are not using a urlfilter file? Your plugin list
still includes urlfilter-regex. "0 records selected for fetching" means that
no URL was selected, or that the URLs you supplied are wrong. Look for the
error in the files that mention a domain name, for example:

nutch-1.0/conf/regex-urlfilter.txt
nutch-1.0/conf/prefix-urlfilter.txt
nutch-1.0/conf/crawl-urlfilter.txt

Try these.
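
If it helps, here is a rough Python sketch (not Nutch code) of how I
understand these +/- rule files to be read: rules are tried top to bottom,
the first regex that matches decides, and a URL that matches nothing is
dropped. The names load_rules, accepts and SAMPLE_RULES are made up for this
example, and the sample rules are only modelled on what I recall of a stock
crawl-urlfilter.txt, so check your own copy. Nutch uses Java regexes, which
can differ slightly from Python's.

```python
import re

# Hypothetical sample rules; replace with the contents of your own filter file.
SAMPLE_RULES = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.
"""


def load_rules(text):
    """Parse filter-file text into a list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first matching rule is a + rule, else False."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False  # no rule matched, so the URL is dropped


if __name__ == "__main__":
    rules = load_rules(SAMPLE_RULES)
    for seed in ("http://www.example.com/", "http://www.MY.DOMAIN.NAME/"):
        print(seed, "accepted" if accepts(rules, seed) else "rejected")
```

With rules like the sample above, every seed URL outside the placeholder
domain is rejected, which would leave the generator with nothing to select.
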

On Mon, Mar 7, 2011 at 8:49 AM, chidu r [via Lucene] wrote:

> Hi all
>
> I am trying to set up Nutch 1.2 on Hadoop and used the instructions at
> http://wiki.apache.org/nutch/NutchHadoopTutorial, which have been very useful.
>
>
> However, I find that when I execute the command:
>
> $ bin/nutch crawl urls -dir crawl -depth 4 -topN 50
>
> The crawler stops at the generator stage with the message:
> 2011-03-06 17:23:49,538 WARN  crawl.Generator - Generator: 0 records
> selected for fetching, exiting ...
>
> I have configured the following plugins in nutch-site.xml:
>
> protocol-http|parse-(text|html|js)|urlnormalizer-(pass|regex|basic)|urlfilter-regex|index-(basic|anchor)
>
>
> I am not using crawl-urlfilter.txt or regex-urlfilter.txt to filter URLs. I
> initiated the crawl with 10 seed URLs from popular sites on the internet.
>
> Any pointers to what I am missing here?
>
>
> regards
> Chidu
>
>



-- 
Kumar Anurag

