I have a problem setting up the URL filter. For example, I want to crawl all student pages at http://www.cs.princeton.edu/, whose URLs look something like http://www.cs.princeton.edu/~abcd. So I made http://www.cs.princeton.edu/ the starting page and set up crawl-urlfilter as:
+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
But it just doesn't crawl anything.
You also need to accept the start page and pages between it and the tilde pages, e.g.:
+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$
Doug
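
To see why the start page matters, here is a rough Python sketch of how such a rule list might be evaluated. It is not Nutch's code. It assumes the first matching rule decides, that a URL matching no rule is ignored, and that patterns are matched unanchored unless they carry ^ or $. The names load_rules and accepts are made up for this illustration.

```python
import re

def load_rules(text):
    """Parse '+regex' / '-regex' lines into (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Assumed semantics: the first matching rule wins, no match means reject."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False

# The rule from the question, on its own.
ORIGINAL_RULES = load_rules(r"""
+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
""")

# The same rule plus the extra rule suggested in the reply.
FIXED_RULES = load_rules(r"""
+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
+^http://www.cs.princeton.edu/(people/(grad|fac)\.php)?$
""")

if __name__ == "__main__":
    for url in ("http://www.cs.princeton.edu/",
                "http://www.cs.princeton.edu/people/grad.php",
                "http://www.cs.princeton.edu/~abcd"):
        print(url, accepts(ORIGINAL_RULES, url), accepts(FIXED_RULES, url))
```

Under these assumptions the original rule alone rejects the start page, because that URL contains no tilde, so the crawl never gets as far as any link to follow. With the second rule added, the start page and the two listing pages are accepted, and the tilde pages linked from them can be reached.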
