I just ran a generate on the DB to create a fresh segment to
fetch, and I have this set up in the urlfilter:


# skip 'file:', 'ftp:', 'gopher:', 'mailto:' and 'https:' urls
-^file:
-^ftp:
-^gopher:
-^mailto:
-^https:

Is that the correct way to define those? I added FTP
since FTP slows the crawler to a standstill (it doesn't
seem to end gracefully, or it fills up all the
threads). I didn't want a bunch of spam addresses from
mailto: links, and since there is no parser for https by
default (or I didn't have it enabled), I set that up too.

I'm still seeing https URLs come along.
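
To make clear what I expect, here is a minimal Python sketch (not Nutch code) of how I understand the filter to evaluate these rules: the first pattern that matches a URL decides, '-' rejects, '+' accepts, and a URL that matches nothing is dropped. The names parse_rules and accepts are made up for illustration, and the trailing "+." catch-all is an assumption about the rest of my filter file.

```python
import re

# Rules from the filter above. The final "+." (accept everything else)
# is an assumption; it is not part of the excerpt quoted in this mail.
RULES = """\
# skip unwanted schemes
-^file:
-^ftp:
-^gopher:
-^mailto:
-^https:
+.
"""


def parse_rules(text):
    """Turn filter lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL matching no rule is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://example.com/",
                "https://example.com/",
                "ftp://example.com/pub/",
                "mailto:someone@example.com"):
        print(url, "accepted" if accepts(rules, url) else "rejected")
```

If that reading is right, a lowercase https URL should never get past the filter, which is why the ones I am seeing confuse me.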


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
