Hi Eric,

I might be wrong (and people here will correct me if I am), but the
crawl-urlfilter file lists the URLs _to be crawled_, not the URLs
_to be indexed_.

A solution might be to keep your regexp, add one that allows the /people/
pages to be crawled, and start the crawl at these links:

http://www.cs.princeton.edu/people/grad.php
http://www.cs.princeton.edu/people/ugrad.php
http://www.cs.princeton.edu/people/techstaff.php

and at any other page that might contain links to personal pages.

Hope this helps.

Regards,
Sébastien.

--- Eric Money <[EMAIL PROTECTED]> wrote:

> I have a problem setting up the urlfilter. For example,
> I want to crawl all student pages at http://www.cs.princeton.edu/,
> which look something like http://www.cs.princeton.edu/~abcd.
> So I made the starting page
> http://www.cs.princeton.edu/
> and set up the crawl-urlfilter as
>
> +^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
>
> But it just doesn't crawl anything.
> If I remove the "~", it crawls fine, but it also crawls
> many things that I don't need, like ../course/....
> How should I set up the urlfilter properly? Thank you all.
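
To illustrate the suggestion in the reply, here is a minimal Python sketch of how such a filter might behave. It is only a simplified model, not the crawler's actual code: it assumes rules are tried top to bottom, that the first regex found in the URL decides, and that a leading "+" means accept. The accepts() helper and the extra /people/ rule are hypothetical names and values made up for this example.

```python
import re

def accepts(rules, url):
    # Simplified model (an assumption): rules are tried top to bottom,
    # the first regex found in the URL decides, '+' accepts and '-' rejects.
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is skipped

# Eric's rule on its own, copied from the quoted message.
original = [r"+^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*"]

# Eric's rule plus a hypothetical rule that lets the /people/ pages through.
suggested = original + [r"+^http://www\.cs\.princeton\.edu/people/"]

urls = [
    "http://www.cs.princeton.edu/",                 # Eric's start page
    "http://www.cs.princeton.edu/people/grad.php",  # suggested start page
    "http://www.cs.princeton.edu/~abcd",            # a personal page
    "http://www.cs.princeton.edu/courses/",         # unwanted section
]

for url in urls:
    print(url, accepts(original, url), accepts(suggested, url))
```

Under this model the original rule rejects the start page http://www.cs.princeton.edu/ itself, which would explain why nothing gets crawled if the start page is also subject to the filter. With the extra rule, the /people/ pages and the ~ pages are accepted, while other paths are still skipped.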
