that helps a lot! thanks!

2009/3/2 yanky young <[email protected]>
> Hi:
>
> I am not a Nutch expert, but I think your problem is easy.
>
> 1. make a list of seed urls in a file under the urls folder
> 2. add all of the domains that you want to crawl to crawl-urlfilter.txt,
>    just like this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*aaa.edu/
> +^http://([a-z0-9]*\.)*bbb.edu/
> ......
>
> good luck!
>
> yanky
>
> 2009/3/3 Tony Wang <[email protected]>
>
> > Can someone on this list give me some instructions about how to crawl
> > multiple websites in each run? Should I make a list of websites in the
> > urls folder? but how to set up the crawl-urlfilter.txt?
> >
> > thanks!
> >
> > --
> > Are you RCholic? www.RCholic.com
> > 温 良 恭 俭 让 仁 义 礼 智 信
>

--
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
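
For anyone finding this thread later: the two steps above (a seed list under urls, plus one accept line per domain in crawl-urlfilter.txt) can be generated from a single list of domains. This is only a rough sketch, not something from the Nutch docs. The function names and the "http://www.<domain>/" seed form are my own assumptions, and the closing "-." line assumes the stock filter file ends with a "skip everything else" rule, so check it against your own copy. The sketch also escapes the dots in the domain, which is slightly stricter than the lines quoted above.

```python
# Hypothetical helper: builds Nutch seed URLs and crawl-urlfilter.txt
# accept rules from a plain list of domains. Names are illustrative only.

def build_filter_rules(domains):
    """Return one '+' accept rule per domain, matching the host and any subdomain."""
    rules = []
    for domain in domains:
        # Escape dots so "aaa.edu" matches a literal dot, not any character.
        escaped = domain.replace(".", r"\.")
        rules.append(r"+^http://([a-z0-9]*\.)*" + escaped + "/")
    return rules


def build_seed_urls(domains):
    """Return one seed URL per domain (assumes each site answers on www.<domain>)."""
    return ["http://www." + domain + "/" for domain in domains]


def render_filter_block(domains):
    """Return text for the accept section of crawl-urlfilter.txt.

    The trailing '-.' (reject everything else) is an assumption about how
    the rest of the filter file is laid out.
    """
    lines = ["# accept hosts in the listed domains"]
    lines.extend(build_filter_rules(domains))
    lines.append("# skip everything else")
    lines.append("-.")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    domains = ["aaa.edu", "bbb.edu"]
    print("\n".join(build_seed_urls(domains)))   # contents for a file under urls/
    print(render_filter_block(domains))          # paste into crawl-urlfilter.txt
```

Write the first output to a file in the urls folder and merge the second into crawl-urlfilter.txt by hand, so that any other rules already in that file are kept.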
