I just did a generate on the DB to create a fresh segment to fetch, and I have this set up in the urlfilter:

# skip 'file:' urls
-^file:
-^ftp:
-^gopher:
-^mailto:
-^https:

Is that the correct way to define those? I added ftp because FTP slows the crawler to a standstill (it doesn't seem to end gracefully, or it fills up all the threads). I didn't want a bunch of spam addresses from mailto: links, and since there is no parser for https by default (or I didn't have it enabled), I excluded that as well.

I'm still seeing https URLs come through.

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
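
For reference, here is a minimal Python sketch of how I understand rules like the ones above to be evaluated: each rule is a '+' or '-' followed by a regex, rules are tried in order, and the first one that matches decides. This is not Nutch's code. The names parse_rules and accepts are made up for illustration, the trailing '+.' catch-all rule is not from my file, and the first-match behaviour is an assumption on my part.

```python
import re

# The rules from the post, one per line. The final "+." catch-all is an
# assumption added for illustration; it is not part of the original setup.
RULES_TEXT = """\
# skip 'file:' urls
-^file:
-^ftp:
-^gopher:
-^mailto:
-^https:
# accept anything else (assumed)
+.
"""


def parse_rules(text):
    """Turn rule lines into (allow, compiled_regex) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first matching rule decides; no match means reject."""
    for allow, regex in rules:
        if regex.search(url):
            return allow
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ("http://example.com/", "https://example.com/",
                "ftp://example.com/pub/", "mailto:someone@example.com"):
        print(url, accepts(rules, url))
```

Under that assumption the -^https: line does reject https URLs, so the rule syntax by itself would not explain the ones still coming through.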
