Eric Money wrote:
I have a problem setting up the urlfilter. For example,
I want to crawl all student pages at http://www.cs.princeton.edu/,
whose URLs look like http://www.cs.princeton.edu/~abcd
or something like that. So I made the starting page http://www.cs.princeton.edu/
and set up the crawl-urlfilter as

+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*

But it just doesn't crawl anything.
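The failure can be reproduced on its own. The minimal Java sketch below makes two assumptions. First, it assumes Nutch's regex URL filter tests each pattern with a substring search, like `java.util.regex.Matcher.find()`. Second, it assumes any URL that no rule accepts is rejected. The class name `SeedCheck` is hypothetical. The sketch shows that the seed URL itself does not match the tilde-only rule, so the crawl is rejected at the very first page and never reaches the student pages linked from it.

```java
import java.util.regex.Pattern;

public class SeedCheck {
    // Eric's rule without its leading '+' (the '+' only means "accept").
    static final Pattern TILDE_RULE =
        Pattern.compile("^http://www.cs.princeton.edu/~([a-z0-9]*\\.//)*");

    // Assumption: the filter uses a substring search; '^' still anchors at the start.
    static boolean accepts(String url) {
        return TILDE_RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        // The seed has nothing after "edu/", so the required '~' is missing.
        System.out.println(accepts("http://www.cs.princeton.edu/"));      // false
        // A student page matches: the '~' is present and the group may repeat zero times.
        System.out.println(accepts("http://www.cs.princeton.edu/~abcd")); // true
    }
}
```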

You also need to accept the start page and pages between it and the tilde pages, e.g.:


+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$
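The combined filter can be sanity-checked with a small Java model of the rule file. This is a sketch, not Nutch's own filter class. It assumes three things: the first matching `+`/`-` line decides, a URL that matches no line is rejected, and patterns are tested with a substring search (`Matcher.find()`). The class and method names (`UrlFilterSketch`, `parse`, `accepts`) are hypothetical, and it needs Java 16+ for the `record`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterSketch {
    // One "+regex" (accept) or "-regex" (reject) line from the filter file.
    record Rule(boolean accept, Pattern pattern) {}

    // Parse rule lines; blank lines and '#' comments are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) continue;
            char sign = s.charAt(0);
            if (sign != '+' && sign != '-') continue; // this sketch ignores other lines
            rules.add(new Rule(sign == '+', Pattern.compile(s.substring(1))));
        }
        return rules;
    }

    // Assumed semantics: first matching rule wins; no match means rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) return r.accept();
        }
        return false;
    }

    // Doug's start-page rule plus Eric's tilde rule (Java string escaping doubles '\').
    static final List<String> FILTER = List.of(
        "+^http://www.cs.princeton.edu/(people/(grad|fac)\\.php)?$",
        "+^http://www.cs.princeton.edu/~([a-z0-9]*\\.//)*"
    );

    public static void main(String[] args) {
        List<Rule> rules = parse(FILTER);
        for (String url : List.of(
                "http://www.cs.princeton.edu/",
                "http://www.cs.princeton.edu/people/grad.php",
                "http://www.cs.princeton.edu/~abcd",
                "http://www.cs.princeton.edu/research/")) {
            System.out.println(accepts(rules, url) + "  " + url);
        }
    }
}
```

The start page, the grad listing page and the tilde page are accepted, while an unrelated page such as /research/ is not. Both lines are `+` rules, so their order does not matter here. Order only starts to matter once `-` rules are mixed in, because the first matching line wins.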

Doug


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
