Eric Money wrote:
I have a problem setting up the urlfilter. For example,
I want to crawl all student pages at http://www.cs.princeton.edu/,
which have URLs like http://www.cs.princeton.edu/~abcd
or something like that. So I made the starting page http://www.cs.princeton.edu/
and set up the crawl-urlfilter as

+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*

But it just doesn't crawl anything.

You also need to accept the start page and pages between it and the tilde pages, e.g.:


+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$
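
To see why the original filter crawls nothing, here is a rough Python sketch of how a list of +/- rules might be evaluated. It assumes the first matching rule decides and that a URL matching no rule is rejected, which is how I understand the regex URL filter to behave. Python's re module stands in for Java regexes, parse_rules and accepts are made-up helper names, and the three test URLs are only illustrative.

```python
import re

def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0] not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(url, rules):
    """Assumed semantics: first matching rule wins, no match means reject."""
    for accept, pattern in rules:
        if pattern.search(url):  # partial match, so ^ and $ anchors matter
            return accept
    return False

# The rule from the question and the extra rule from the reply.
ORIGINAL = r"+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*"
SUGGESTED = r"+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$"

START = "http://www.cs.princeton.edu/"
LISTING = "http://www.cs.princeton.edu/people/grad.php"
STUDENT = "http://www.cs.princeton.edu/~abcd"

before = parse_rules(ORIGINAL)
after = parse_rules(ORIGINAL + "\n" + SUGGESTED)

for url in (START, LISTING, STUDENT):
    print(url, accepts(url, before), accepts(url, after))
```

With only the original rule, the start page itself is rejected, so the crawler never fetches it and never sees a link to a tilde page. With the extra rule, the start page and the listing pages are accepted as well, while the tilde pages are still accepted by the original rule.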

Doug
