You can use the prefix-based URL filter to do what banned-host.txt did, and it works just as well, if not faster.
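To illustrate why prefix matching can stay fast as the ban list grows, here is a minimal sketch in Python. It is not Nutch's code or its filter file format; the class name `PrefixFilter`, the method `accepts`, and the example hosts are all hypothetical. The banned prefixes go into a character trie, so checking a URL costs time proportional to the URL's length rather than to the number of banned entries.

```python
class PrefixFilter:
    """Rejects any URL that starts with one of the banned prefixes.

    Hypothetical illustration only, not the Nutch implementation.
    """

    _END = object()  # sentinel key marking the end of a banned prefix

    def __init__(self, banned_prefixes):
        self._root = {}
        for prefix in banned_prefixes:
            if not prefix:
                continue  # skip blank entries so they do not ban everything
            node = self._root
            for ch in prefix:
                node = node.setdefault(ch, {})
            node[self._END] = True

    def accepts(self, url):
        """Return False if some banned prefix is a prefix of url."""
        node = self._root
        for ch in url:
            if self._END in node:
                return False  # a complete banned prefix has been matched
            node = node.get(ch)
            if node is None:
                return True  # no banned prefix continues along this path
        # The URL ran out; it is banned only if it equals a banned prefix.
        return self._END not in node


# Example hosts are made up for the illustration.
url_filter = PrefixFilter([
    "http://spam.example.com/",
    "http://ads.example.net/",
])
```

With this, `url_filter.accepts("http://spam.example.com/page.html")` is rejected, while a URL on a host that is not listed passes through. A long list of regular expressions, by contrast, is typically tried pattern by pattern for every URL.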
The filter setup & notes are in the archive. You should be able to search for messages regarding "large regex", where I was asking about this very same issue :)

--- Kashif Khadim <[EMAIL PROTECTED]> wrote:
> "Nutch filters out less porn than the major search
> engines. For an example, use the keyword
> "cheerleaders"."
>
> In my opinion the search quality also depends on
> filters. Nutch offers a regex-urlfilter.txt file that
> can be used for this. I am voting for the
> banned-host.txt file to make a comeback in release
> 0.5, because even if I use regex-urlfilter.txt I end
> up with many spam sites, which makes the search results
> not that good. I cannot make regex-urlfilter.txt large,
> or else the operation of injecting etc. take for

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
