I have a problem setting up the urlfilter. For example, I want to crawl all student pages at http://www.cs.princeton.edu/, which have URLs like http://www.cs.princeton.edu/~abcd. So I made the starting page http://www.cs.princeton.edu/ and set up the crawl-urlfilter as
+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
But it just doesn't crawl anything.
Your pattern does not match the start page itself, so nothing gets fetched. You also need to accept the start page and the pages between it and the tilde pages, e.g.:
+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$
Doug
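
To see why the extra rule matters, here is a rough Python sketch of how a first-match +/- regex filter might treat these URLs. It is not Nutch's actual code: the `accepts` helper and the rule lists are hypothetical, and it assumes that a pattern only needs to be found somewhere in the URL and that URLs matching no rule are dropped.

```python
import re

# Hypothetical rule lists in crawl-urlfilter style: (sign, pattern).
ORIGINAL_RULES = [
    ("+", r"^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*"),
]
SUGGESTED_RULES = ORIGINAL_RULES + [
    ("+", r"^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$"),
]


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False


START = "http://www.cs.princeton.edu/"

# With only the tilde rule, the start page is rejected, so no links are ever followed.
print(accepts(START, ORIGINAL_RULES))
# With the extra rule, the start page and the two listing pages get through.
print(accepts(START, SUGGESTED_RULES))
print(accepts("http://www.cs.princeton.edu/people/grad.php", SUGGESTED_RULES))
print(accepts("http://www.cs.princeton.edu/~abcd", SUGGESTED_RULES))
```

Under these assumptions the tilde rule alone accepts http://www.cs.princeton.edu/~abcd but not the start page, which would explain why the crawl never gets going.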
