Hi Lewis,

Actually, I am crawling an intranet site. We need to migrate the intranet site from one CMS to another. The crawling has completed successfully; now I have to decide the migration order of the crawled web pages. To do that, I was thinking of a way to find the list of URLs at each depth. Suppose at crawl time I specified a depth of 5; I then want to fetch the list of URLs at depth 1, 2, 3, and so on. Is this possible using the Nutch API?
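One idea I had (not sure if the crawldb stores depth directly): since each generate/fetch/updatedb round produces one segment, URLs first fetched in the N-th segment should correspond to depth N. Alternatively, if I dump the link structure (e.g. with `bin/nutch readlinkdb`), I could reconstruct depths myself with a breadth-first search from the seed URLs. A minimal sketch of that BFS idea, using a hypothetical in-memory outlink map rather than real Nutch API calls:

```java
import java.util.*;

// Sketch: reconstruct the "depth" of crawled pages from an outlink graph,
// e.g. one dumped from the linkdb. Depth 1 = seed URLs, depth 2 = pages
// reachable in one hop from a seed, and so on. This is NOT a Nutch API;
// the data structures here stand in for whatever the dump gives you.
public class DepthLister {

    // Breadth-first search that records the round in which each URL is
    // first reached; a URL's depth is its shortest hop count plus one.
    static Map<Integer, List<String>> urlsByDepth(
            Map<String, List<String>> outlinks, Collection<String> seeds) {
        Map<Integer, List<String>> byDepth = new TreeMap<>();
        Set<String> seen = new HashSet<>(seeds);
        List<String> frontier = new ArrayList<>(seeds);
        int depth = 1;
        while (!frontier.isEmpty()) {
            byDepth.put(depth, new ArrayList<>(frontier));
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                for (String out : outlinks.getOrDefault(url, List.of())) {
                    if (seen.add(out)) {   // only the first sighting counts
                        next.add(out);
                    }
                }
            }
            frontier = next;
            depth++;
        }
        return byDepth;
    }

    public static void main(String[] args) {
        // Hypothetical intranet link structure, for illustration only.
        Map<String, List<String>> links = new HashMap<>();
        links.put("http://intranet/", List.of("http://intranet/a", "http://intranet/b"));
        links.put("http://intranet/a", List.of("http://intranet/a/1"));

        Map<Integer, List<String>> result =
                urlsByDepth(links, List.of("http://intranet/"));
        result.forEach((d, urls) -> System.out.println("depth " + d + ": " + urls));
    }
}
```

The same grouping could also be produced segment by segment, since the segments are created in crawl-round order.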
Thanks,
Meenakshi

On Mon, May 23, 2011 at 4:02 PM, McGibbney, Lewis John <[email protected]> wrote:

> Hi Meenakshi,
>
> Can you expand any on this? This is very vague.
>
> Lewis
>
> ________________________________________
> From: Meenakshi Kanaujia [[email protected]]
> Sent: 23 May 2011 05:30
> To: [email protected]
> Subject: Fetch list of urls
>
> Hi,
>
> I have crawled a site using Nutch.
> Is it possible to fetch the list of URLs depth-wise from the crawldb?
>
> Thanks,
> Meenakshi

