You can use the prefix-based URL filter to do what
banned-host.txt did, and it works just as well, if
not faster.

The filter setup and notes are in the archive. You should
be able to search for messages regarding "large regex",
where I was asking about this very same issue :)
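
To illustrate why prefix matching can stay fast where one large regex
slows down, here is a rough sketch of the idea. It is not Nutch's actual
filter code or configuration format: the PrefixSet class, the
is_allowed helper, and the example.com hosts are all made up for this
example, and it assumes the prefixes are used as a deny list, the way
banned-host.txt was.

```python
# Hypothetical sketch of prefix-based URL filtering, not Nutch's API.
# All prefixes go into a character trie, so one lookup costs at most
# len(url) steps regardless of how many prefixes are loaded.

_END = ""  # marker key; cannot collide with single-character keys


class PrefixSet:
    def __init__(self, prefixes):
        self._root = {}
        for prefix in prefixes:
            node = self._root
            for ch in prefix:
                node = node.setdefault(ch, {})
            node[_END] = True  # a complete prefix ends at this node

    def matches(self, url):
        """Return True if url starts with any stored prefix."""
        node = self._root
        for ch in url:
            if _END in node:
                return True
            node = node.get(ch)
            if node is None:
                return False
        return _END in node


def is_allowed(url, banned):
    """Deny-list use (an assumption): reject URLs under a banned prefix."""
    return not banned.matches(url)


# Made-up hosts, for illustration only.
banned = PrefixSet([
    "http://spam.example.com/",
    "http://ads.example.net/",
])
```

With this sketch, is_allowed("http://spam.example.com/page.html", banned)
is False and is_allowed("http://good.example.org/", banned) is True.
Adding more banned prefixes grows the trie but does not add work per
URL, which is the property a long list of regex rules lacks.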

--- Kashif Khadim <[EMAIL PROTECTED]> wrote:
> "Nutch filters out less porn than the major search
> engines. For an example, use the keyword
> "cheerleaders"."
>  
> 
> In my opinion the search quality also depends on
> filters. Nutch offers the regex-urlfilter.txt file that
> can be used for this. I am voting for the
> banned-host.txt file to make a comeback in release
> 0.5, because even if I use "regex-urlfilter.txt" I end
> up with many spam sites, which makes the search results
> not that good. I cannot make "regex-urlfilter.txt" large,
> or else operations such as injecting take for



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
