Hi, I am not a Nutch expert, but I think your problem is easy.
1. Make a list of seed URLs in a file under the urls folder.
2. Add every domain that you want to crawl to crawl-urlfilter.txt, like this:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*aaa.edu/
+^http://([a-z0-9]*\.)*bbb.edu/
......

Good luck!

yanky

2009/3/3 Tony Wang <[email protected]>

> Can someone on this list give me some instructions about how to crawl
> multiple websites in each run? Should I make a list of websites in the urls
> folder? but how to set up the crawl-urlfilter.txt?
>
> thanks!
>
> --
> Are you RCholic? www.RCholic.com
> 温 良 恭 俭 让 仁 义 礼 智 信
>
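
P.S. If you want to sanity-check the filter lines before starting a crawl, here is a rough Python sketch. It is not part of Nutch. It only imitates how I understand the filter file to be read: one rule per line, "+" accepts, "-" rejects, and the first rule whose pattern is found in the URL decides. aaa.edu and bbb.edu are the placeholder domains from the example above; the final "-." catch-all line, the function names, and the test URLs are my own assumptions for illustration.

```python
import re

# Rules in crawl-urlfilter.txt syntax. The two "+" lines are the placeholder
# domains from the reply; the closing "-." (reject everything else) is an
# assumption about what the rest of the file contains.
RULES = r"""
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*aaa.edu/
+^http://([a-z0-9]*\.)*bbb.edu/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn filter lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return the verdict of the first rule whose pattern is found in url."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched: treat the URL as rejected


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ["http://www.aaa.edu/index.html",
                "http://cs.dept.bbb.edu/",
                "http://www.ccc.com/"]:
        print(url, "accepted" if accepts(url, rules) else "rejected")
```

Running your seed list through something like this shows quickly whether a seed URL would be dropped by the filter before the crawl even starts.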
