I'm having a problem setting up the urlfilter. For example,
I want to crawl all student pages at http://www.cs.princeton.edu/,
whose URLs look like http://www.cs.princeton.edu/~abcd
or something like that. So I made the starting page http://www.cs.princeton.edu/
and set up the crawl-urlfilter as:

+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*

But it just doesn't crawl anything.
If I remove the "~", it crawls fine, but it also crawls
many things that I don't need, like ../course/....
How should I set up the urlfilter properly? Thank you all.
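
To make the behaviour easier to reproduce outside the crawler, here is a minimal Python sketch that tries the pattern from the filter line against the two URLs mentioned above. It assumes the filter accepts a URL whenever the pattern is found in it, which may not be exactly how the crawler applies its rules, and the helper name `accepted` is hypothetical (it is not part of any crawler API).

```python
import re

# Pattern copied from the crawl-urlfilter line, without the leading "+".
PATTERN = re.compile(r"^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*")


def accepted(url: str) -> bool:
    """Return True if the pattern is found in the URL.

    Assumption: the filter uses a find-style match, not a full match.
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.cs.princeton.edu/",       # the starting page
        "http://www.cs.princeton.edu/~abcd",  # a student page
    ):
        print(url, accepted(url))
```

Under that assumption the starting page itself does not pass the rule because it contains no "~", while the student page does, so it may be worth checking whether the start URL has to pass the filter as well.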
