That helps a lot! Thanks!

2009/3/2 yanky young <[email protected]>

> Hi:
>
> I am not a Nutch expert, but I think your problem is easy.
>
> 1. Make a list of seed URLs in a file under the urls folder.
> 2. Add every domain that you want to crawl to crawl-urlfilter.txt,
>    like this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*aaa\.edu/
> +^http://([a-z0-9]*\.)*bbb\.edu/
> ......
>
> Good luck!
>
> yanky
>
> 2009/3/3 Tony Wang <[email protected]>
>
> > Can someone on this list give me some instructions on how to crawl
> > multiple websites in each run? Should I make a list of websites in the
> > urls folder? And how should I set up crawl-urlfilter.txt?
> >
> > thanks!
> >
> > --
> > Are you RCholic? www.RCholic.com
> > 温 良 恭 俭 让 仁 义 礼 智 信
> >
>
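
For anyone who wants to sanity-check rules like the ones quoted above before
starting a crawl, here is a rough sketch in Python. It is not Nutch code. It
only imitates the behaviour I understand the regex URL filter to have: rules
are tried in order, the first pattern that matches decides, "+" accepts, "-"
rejects, and a URL that matches nothing is rejected. The names DOMAINS,
build_rules and accepts are made up for this example, the trailing "-." rule
is my assumption about how the stock file ends, and the sketch escapes the
dots in each domain so that "." only matches a literal dot.

```python
import re

# Hypothetical domain list, taken from the example rules in the thread.
DOMAINS = ["aaa.edu", "bbb.edu"]


def build_rules(domains):
    """Build crawl-urlfilter.txt style lines: one accept rule per domain,
    followed by a final rule that rejects everything else."""
    rules = ["+^http://([a-z0-9]*\\.)*" + re.escape(d) + "/" for d in domains]
    rules.append("-.")  # assumption: skip any URL not accepted above
    return rules


def accepts(rules, url):
    """Imitate first-match-wins filtering (a sketch, not Nutch's own code)."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected


if __name__ == "__main__":
    rules = build_rules(DOMAINS)
    print("\n".join(rules))
    for url in ["http://www.aaa.edu/index.html",
                "http://cs.bbb.edu/",
                "http://www.example.com/"]:
        print(url, accepts(rules, url))
```

The printed rule lines can be pasted into crawl-urlfilter.txt, and the seed
file under urls still needs one start URL per site.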



