Hi:

I am not a Nutch expert, but I think your problem is easy to solve.

1. Make a list of seed URLs in a file under the urls folder.
2. Add every domain that you want to crawl to crawl-urlfilter.txt, one
line per domain, like this:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*aaa\.edu/
+^http://([a-z0-9]*\.)*bbb\.edu/
......
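
If you want to check your rules before a crawl, here is a rough Python
sketch (not part of Nutch) that mimics how I understand the filter to
work: rules are tried top to bottom, the first pattern that matches
decides, "+" accepts and "-" rejects. The first-match behaviour, the
trailing "-." catch-all rule, and the ccc.edu seed are my assumptions
for illustration; aaa.edu and bbb.edu are just the placeholders from
the example above.

```python
import re

# Hypothetical rule list mirroring the filter lines above.
# Assumption: the first matching rule decides, as I believe Nutch does.
RULES = [
    ("+", r"^http://([a-z0-9]*\.)*aaa\.edu/"),
    ("+", r"^http://([a-z0-9]*\.)*bbb\.edu/"),
    ("-", r"."),  # assumed catch-all: skip everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched at all: treat the URL as rejected.
    return False


if __name__ == "__main__":
    # Placeholder seed URLs, one per line as they would appear in the seed file.
    seeds = [
        "http://www.aaa.edu/",
        "http://cs.bbb.edu/people/",
        "http://www.ccc.edu/",
    ]
    for url in seeds:
        print(url, "accepted" if accepts(url) else "rejected")
```

Any seed URL that comes out as rejected needs its own "+" line in the
filter file, otherwise it will probably be dropped from the crawl.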

Good luck!

yanky

2009/3/3 Tony Wang <[email protected]>

> Can someone on this list give me some instructions about how to crawl
> multiple websites in each run? Should I make a list of websites in the urls
> folder? but how to set up the crawl-urlfilter.txt?
>
> thanks!
>
> --
> Are you RCholic? www.RCholic.com
> 温 良 恭 俭 让 仁 义 礼 智 信
>
