I'm having a problem setting up the URL filter. For example, I want to crawl all the student pages at http://www.cs.princeton.edu/, which have URLs like http://www.cs.princeton.edu/~abcd. So I made http://www.cs.princeton.edu/ the starting page and set up crawl-urlfilter as:

+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*

But it doesn't crawl anything. If I remove the "~", it crawls fine, but it also crawls many pages I don't need, like ../course/..... How should I set up the URL filter properly? Thank you all.
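One likely explanation is that the seed page itself is being filtered out. As far as I know, the regex URL filter tries its rules from top to bottom: the first pattern that matches decides (+ keeps the URL, - drops it), and a URL that matches no rule is dropped. http://www.cs.princeton.edu/ has no "~" after the slash, so with only the rule above the seed is rejected and the crawl never starts. Removing the "~" turns the rule into a plain prefix match on the host, which explains why everything gets crawled then. The sketch below is a small Python emulation of that first-match behaviour (it is not Nutch's own code). The "proposed" rules, the %7E handling, and the /courses/ path are assumptions for illustration only.

```python
import re

# Emulation (assumption, not Nutch source) of first-match URL filtering:
# rules are tried in order, the first pattern found in the URL decides
# (+ accept, - reject), and a URL that matches no rule is rejected.
def accepts(url, rules):
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is filtered out

# The single rule from the question.
original_rules = [("+", r"^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*")]

# Hypothetical alternative: accept the seed page itself, then any ~user page,
# also allowing '~' in its percent-encoded form (%7E / %7e).
proposed_rules = [
    ("+", r"^http://www\.cs\.princeton\.edu/$"),
    ("+", r"^http://www\.cs\.princeton\.edu/(~|%7[eE])[a-z0-9]+"),
]

seed = "http://www.cs.princeton.edu/"
student = "http://www.cs.princeton.edu/~abcd"
course = "http://www.cs.princeton.edu/courses/"  # hypothetical path

for name, rules in [("original", original_rules), ("proposed", proposed_rules)]:
    print(name, [accepts(u, rules) for u in (seed, student, course)])
```

With the original rule the seed is rejected while the ~abcd page would be accepted, so the crawler never gets to fetch the page that links to it. In crawl-urlfilter terms, the proposed rules would be the two lines +^http://www\.cs\.princeton\.edu/$ and +^http://www\.cs\.princeton\.edu/(~|%7[eE])[a-z0-9]+. This only works if the ~user pages are linked from pages the filter also accepts. If the student list lives on some intermediate page, that page needs its own + rule too.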
