Hi,
I'm trying to determine if there's a better way to whitelist a large
number of domains than just adding them as a regular expression in the
filter.
We're setting up a regional search engine and using the filter file to
determine which URLs make it into the db. We've added specific domain
extensions and a list of IP addresses in the region, but that barely
scratches the surface. So we've taken to adding individual domains to
the filter list - lots of them. My guess is that we're eventually
going to need hundreds of thousands of domains whitelisted in
our filter.
I don't think this is going to be workable. Updating the
database with a filter like this will simply take too long - likely
hours or days.
Is there a more scalable way of doing this?
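To make the question concrete, the kind of thing I have in mind is a hash-set lookup on the host name instead of one regular expression per domain. Below is a rough sketch in Python. It is not Nutch code, and the domain names and the function name `host_allowed` are made up purely for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist; in practice this would be loaded from a file,
# one domain per line.
ALLOWED_DOMAINS = frozenset({"example.org", "example.co.nz"})


def host_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host, or any parent domain of it, is whitelisted."""
    host = urlsplit(url).hostname
    if not host:
        return False
    labels = host.rstrip(".").split(".")
    # Check "a.b.example.org", then "b.example.org", then "example.org", ...
    for i in range(len(labels)):
        if ".".join(labels[i:]) in allowed:
            return True
    return False


if __name__ == "__main__":
    for url in ("http://www.example.org/page", "http://example.com/"):
        print(url, host_allowed(url))
```

With something like this, the cost per URL is a handful of set lookups (one per label in the host name) no matter how many domains are in the list, whereas a list of regular expressions has to be tried one after another. I don't know whether the filter can be made to work this way, hence the question.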
Thanks,
g.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general