Hi - the easiest method is to use the freegen tool. But if you really want homepages, not just domain roots, you can use the hostdb together with freegen.
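
The freegen step, which applies in both cases, can be sketched roughly as follows. The directory names (recrawl_seeds, crawl/segments) and the example.com/example.org URLs are placeholders, not taken from the original setup; freegen is assumed to be run from the Nutch runtime directory and to read plain text files with one URL per line.

```shell
# Hypothetical paths: adjust to your own crawl layout.
SEED_DIR=recrawl_seeds
SEGMENTS_DIR=crawl/segments

# Write the URLs to recrawl into a plain text file, one URL per line.
mkdir -p "$SEED_DIR"
printf '%s\n' 'http://example.com/' 'http://example.org/' > "$SEED_DIR/homepages.txt"

# Generate a fetch segment straight from that list, independent of the
# fetch schedule stored in the crawldb.
if [ -x bin/nutch ]; then
  bin/nutch freegen "$SEED_DIR" "$SEGMENTS_DIR" -filter -normalize
else
  echo "bin/nutch not found; run this from the Nutch runtime directory" >&2
fi
```

The generated segment would then go through the usual fetch, parse and updatedb steps, so that new links found on those pages end up in the crawldb.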

# Update the hostdb
bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb crawl/crawldb/

# Get list of homepages for each host
bin/nutch readhostdb crawl/hostdb/ output -dumpHomepages

Then use freegen.

Markus

-----Original message-----
> From:harsh <[email protected]>
> Sent: Wednesday 24th February 2016 12:49
> To: [email protected]
> Subject: recrawling of specific URLS
>
> Hi All
>
> Nutch is made to update ALL the URLs after a certain point of time.
> But I want to recrawl only the home page of seed URL so that i could get
> new link from the home page to crawl.
> Currently I am using the bug "Inject command re-inject seed URLS." for
> recrawling my seed URLs.But this is not the standard way.
> Please give a suggestion.I have read articles/discussions on
> re-crawling.But could not find the solution.
> Lewis,Tejas Please help!!!!!
>
> Thanks
>

