Dear Markus,
Thanks for your help. I hope it will solve my problem. Thanks a lot.


On Wednesday 24 February 2016 06:12 PM, Markus Jelsma wrote:
Ah, forget about it; I read in the next message that you are on 2.x. But I think it
also has a freegen tool.
Markus

-----Original message-----
From: Markus Jelsma <[email protected]>
Sent: Wednesday 24th February 2016 13:41
To: [email protected]
Subject: RE: recrawling of specific URLS

Hi - the easiest method is to use the freegen tool. But if you really want
homepages, not just domain roots, you can use the hostdb together with freegen.

# Update the hostdb
bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb crawl/crawldb/

# Get list of homepages for each host
bin/nutch readhostdb crawl/hostdb/ output -dumpHomepages

Then use freegen.
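As a sketch, the full cycle might look like the following (assuming a standard Nutch 1.x layout; the `homepages` output directory name and the `crawl/` paths are illustrative, and the fetch/parse/updatedb steps are the usual cycle, not something the hostdb tools do for you):

```shell
# Sketch of a homepage-only recrawl cycle (Nutch 1.x assumed; paths illustrative)

# 1. Refresh the hostdb from the current crawldb
bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb crawl/crawldb/

# 2. Dump one homepage URL per host into a plain-text listing
bin/nutch readhostdb crawl/hostdb/ homepages -dumpHomepages

# 3. Generate a segment directly from that listing,
#    bypassing the crawldb's fetch schedule
bin/nutch freegen homepages crawl/segments

# 4. Fetch, parse, and update the newest segment as in a normal cycle
SEGMENT=$(ls -d crawl/segments/* | tail -1)
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawl/crawldb "$SEGMENT"
```

Because freegen builds the segment straight from the URL list, the homepages are fetched regardless of when the crawldb would next have scheduled them.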

Markus
-----Original message-----
From: harsh <[email protected]>
Sent: Wednesday 24th February 2016 12:49
To: [email protected]
Subject: recrawling of specific URLS

Hi All

Nutch is made to update ALL the URLs after a certain period of time.
But I want to recrawl only the home page of each seed URL, so that I can get
new links from the home page to crawl.
Currently I am relying on the bug "Inject command re-inject seed URLS." to
recrawl my seed URLs, but this is not the standard way.
Please give a suggestion. I have read articles/discussions on
re-crawling but could not find a solution.
Lewis, Tejas, please help!

Thanks

