Hi,
I was going through past threads and found that the problem I'm facing has
come up for many others, but in most cases it was either ignored or left
unresolved.

I use Nutch 1.1. My crawls have mostly been working fine (though I'm still
getting the hang of how all the pieces fit together).

I have a particular URL that I need to crawl more often than others (it's a
sitemap). To start over, I cleaned that domain out of my Solr index, i.e.
deleted all the documents belonging to the domain of the URL I need to fetch.
(My index had a lot of 404 URLs that were never getting cleaned up.)
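
For reference, I did the cleanup with a delete-by-query against Solr's XML
update handler, roughly like the sketch below. The host/port, and the use of
the "host" field from the stock Nutch schema.xml, are just my setup; adjust
the field and value for yours. "example.com" stands in for the real domain.

    # remove every doc for the domain, then commit so the deletes take effect
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>host:example.com</query></delete>'
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'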

I also deleted everything from the crawl folder, so everything was fresh.

I started a crawl with depth = 1, topN = 1000, and 10 fetcher threads.
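
Concretely, what I run looks like this (a sketch; the "urls" seed directory
and "crawl" output directory are my layout, following the Nutch tutorial):

    # start clean: remove the previous crawldb, linkdb, and segments
    rm -rf crawl
    # single-round crawl: depth 1, at most 1000 URLs per round, 10 threads
    bin/nutch crawl urls -dir crawl -depth 1 -topN 1000 -threads 10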

It fetched a lot of the site into the index (though not everything), so I
repeated the same crawl command another 7-8 times, and the number of docs in
the index kept increasing.

But the last time I tried running the crawl, it failed at depth 0 with the
message:

    Stopping at depth=0 - no more URLs to fetch.
    No URLs to fetch - check your seed list and URL filters.
    crawl finished:
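
In case it helps diagnose this, here is how I've been inspecting the crawl
state afterwards (a sketch using the stock tools; the sitemap URL below is a
placeholder, and I'm assuming the URLFilterChecker class is present in 1.1):

    # summarize crawldb status counts (db_unfetched, db_fetched, db_gone, ...)
    bin/nutch readdb crawl/crawldb -stats
    # check whether the seed URL passes the configured URL filters:
    # prints the URL prefixed with + if accepted, - if rejected
    echo 'http://www.example.com/sitemap.xml' | \
        bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined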


I cleaned everything up again and started from scratch; crawling started off
fine again, only to fail the same way after a few initial successful crawls.

Awaiting your reply.
