Hi Lewis,

Actually, I am crawling an intranet site. We need to migrate the intranet site from one CMS to another. The crawling has completed successfully; now I have to decide the migration order of the crawled web pages. To do that, I was thinking of a way to find the list of URLs at each depth. Suppose at crawl time I specified a depth of 5; I then want to fetch the list of URLs at depth 1, 2, 3, and so on. Is this possible using the Nutch API?
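One idea I had (not sure if the crawldb stores depth directly): since each generate/fetch/updatedb round produces one segment, URLs first fetched in the N-th segment should correspond to depth N. Alternatively, if I dump the link structure (e.g. with `bin/nutch readlinkdb`), I could reconstruct depths myself with a breadth-first search from the seed URLs. A minimal sketch of that BFS idea, using a hypothetical in-memory outlink map rather than real Nutch API calls:

```java
import java.util.*;

// Sketch: reconstruct the "depth" of crawled pages from an outlink graph,
// e.g. one dumped from the linkdb. Depth 1 = seed URLs, depth 2 = pages
// reachable in one hop from a seed, and so on. This is NOT a Nutch API;
// the data structures here stand in for whatever the dump gives you.
public class DepthLister {

    // Breadth-first search that records the round in which each URL is
    // first reached; a URL's depth is its shortest hop count plus one.
    static Map<Integer, List<String>> urlsByDepth(
            Map<String, List<String>> outlinks, Collection<String> seeds) {
        Map<Integer, List<String>> byDepth = new TreeMap<>();
        Set<String> seen = new HashSet<>(seeds);
        List<String> frontier = new ArrayList<>(seeds);
        int depth = 1;
        while (!frontier.isEmpty()) {
            byDepth.put(depth, new ArrayList<>(frontier));
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                for (String out : outlinks.getOrDefault(url, List.of())) {
                    if (seen.add(out)) {   // only the first sighting counts
                        next.add(out);
                    }
                }
            }
            frontier = next;
            depth++;
        }
        return byDepth;
    }

    public static void main(String[] args) {
        // Hypothetical intranet link structure, for illustration only.
        Map<String, List<String>> links = new HashMap<>();
        links.put("http://intranet/", List.of("http://intranet/a", "http://intranet/b"));
        links.put("http://intranet/a", List.of("http://intranet/a/1"));

        Map<Integer, List<String>> result =
                urlsByDepth(links, List.of("http://intranet/"));
        result.forEach((d, urls) -> System.out.println("depth " + d + ": " + urls));
    }
}
```

The same grouping could also be produced segment by segment, since the segments are created in crawl-round order.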
Thanks,
Meenakshi

On Mon, May 23, 2011 at 4:02 PM, McGibbney, Lewis John <[email protected]> wrote:

> Hi Meenakshi,
>
> Can you expand any on this? This is very vague.
>
> Lewis
>
> ________________________________________
> From: Meenakshi Kanaujia [[email protected]]
> Sent: 23 May 2011 05:30
> To: [email protected]
> Subject: Fetch list of urls
>
> Hi,
>
> I have crawled a site using Nutch.
> Is it possible to fetch the list of URLs depth-wise from the crawldb?
>
> Thanks,
> Meenakshi

