Hi Eric, 

I might be wrong (and people here will correct me if I am), but
the crawl-urlfilter file controls which URLs are _crawled_, not which
are _indexed_. Your start page, http://www.cs.princeton.edu/, has no
"~" in it, so it probably fails your own rule and the crawl stops before
it starts. One solution might be to keep your regexp, add a second one
that allows the /people/ pages to be crawled, and start the crawl at
these links:

http://www.cs.princeton.edu/people/grad.php
http://www.cs.princeton.edu/people/ugrad.php
http://www.cs.princeton.edu/people/techstaff.php

and any others that might contain links to personal pages.
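
To check the rules before launching a crawl, here is a rough Python
sketch of how I understand the filter to work. It is not Nutch code.
It assumes the rules are tried top to bottom, that the first pattern
found in the URL decides (+ accepts, - rejects), and that a URL matching
no rule is rejected. The two "+" patterns are only my suggestion, and
the RULES and accepts names are made up for this example.

```python
import re

# Hypothetical rule set mirroring crawl-urlfilter lines: (sign, pattern).
# Assumption: rules are checked in order and the first match decides.
RULES = [
    ("+", r"^http://www\.cs\.princeton\.edu/~[a-z0-9]+"),   # personal pages
    ("+", r"^http://www\.cs\.princeton\.edu/people/"),      # listing pages
    ("-", r"."),                                            # skip everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule is ignored.
    return False


if __name__ == "__main__":
    for url in [
        "http://www.cs.princeton.edu/people/grad.php",
        "http://www.cs.princeton.edu/~abcd",
        "http://www.cs.princeton.edu/",
        "http://www.cs.princeton.edu/course/",
    ]:
        print(url, accepts(url))
```

With these rules the /people/ pages and the ~user pages pass, while the
site root and the course pages do not, which is why the crawl has to be
seeded with the /people/ links rather than with the root.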

Hope this helps.

Regards,
Sébastien.



--- Eric Money <[EMAIL PROTECTED]> wrote:
> I have a problem setting up the urlfilter. For example,
> I want to crawl all student pages at http://www.cs.princeton.edu/,
> whose URLs look something like http://www.cs.princeton.edu/~abcd.
> So I made the starting page
> http://www.cs.princeton.edu/
> and set up the crawl-urlfilter as
> 
> +^http://www.cs.princeton.edu/~([a-z0-9]*\.//)*
> 
> But it just doesn't crawl anything.
> If I remove the "~", it crawls fine, but it also crawls
> many things that I don't need, like ../course/....
> How should I set up the urlfilter properly? Thank you all.
> 


        

        
                