Hello Lewis,

Pardon me for the non-verbose description. I have a set of URLs, namely product URLs, numbering in the millions.

I want to write these URLs to a flat file and have Nutch crawl them to depth = 1. However, I might remove URLs from this list or add new ones, and I would also like Nutch to revisit each site every day. Removed URLs should be deleted, and new ones re-injected, each time Nutch starts.
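Concretely, I was imagining something like the following daily cron job. This is a rough, untested sketch; the paths /home/urls and /home/crawl and the script name are placeholders of mine, and I am assuming the stock Nutch 1.x crawl command:

    #!/bin/sh
    # recrawl.sh -- wipe and re-crawl the current seed list once a day.
    # Assumes a Nutch 1.x install in $NUTCH_HOME, seed files under
    # /home/urls, and crawl output under /home/crawl (all placeholders).

    cd "$NUTCH_HOME" || exit 1

    # Starting from a clean crawl dir each run is the bluntest way I can
    # see to get "removed URLs deleted, new ones re-injected": whatever
    # is in /home/urls right now is exactly what gets crawled.
    rm -rf /home/crawl

    # The crawl command injects the seed list itself, then generates,
    # fetches, parses and updates the crawldb; -depth 1 stops at the seeds.
    bin/nutch crawl /home/urls -dir /home/crawl -depth 1

with a crontab entry such as "0 3 * * * /path/to/recrawl.sh" to cover the daily revisit. Alternatively, if wiping the crawldb every day is too wasteful at this scale, I suppose I could keep it and instead lower db.fetch.interval.default in conf/nutch-site.xml from its 30-day default to 86400 seconds, so that generate/fetch cycles pick every page up again after a day; but then I would still need some way to purge the removed URLs. Does either approach sound sane?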
Best Regards,
-C.B.

On Thu, Jul 7, 2011 at 6:21 PM, lewis john mcgibbney <[email protected]> wrote:
> Hi C.B.,
>
> This is way too vague. We really require more information regarding
> roughly what kind of results you wish to get. It would be a near
> impossible task for anyone to try and specify a solution to this
> open-ended question.
>
> Please elaborate.
>
> Thank you
>
> On Thu, Jul 7, 2011 at 12:56 PM, Cam Bazz <[email protected]> wrote:
>
>> Hello,
>>
>> I have a case where I need to crawl a list of exact URLs, somewhere
>> in the range of 1 to 1.5M.
>>
>> I have written those URLs in numerous files under /home/urls, i.e.
>> /home/urls/1, /home/urls/2
>>
>> Then, using the crawl command, I am crawling to depth=1.
>>
>> Are there any recommendations or general guidelines I should follow
>> when making Nutch just fetch and index a list of URLs?
>>
>> Best Regards,
>> C.B.
>
> --
> *Lewis*

