So, pull and parse the PhishTank document every day or so. XML, JSON,
whatever floats your boat. It can run as an offline process.
Store ALL of the document's data in a persistent data object for later.
Then pull out just the URLs - all of the unique ones - and keep them in
memory as an application-scoped list or array.
Each time a URL is submitted, scan the in-memory list for a match.
If you find one, go back to the stored data, find out why the URL is
listed, and act on that.
It takes some RAM, but that seems to me like the way to handle any real
load, at least until the PhishTank document becomes unwieldy or your
cluster grows a lot.
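A rough sketch of the idea in Python (the feed element names below are made-up placeholders, not PhishTank's actual schema, and I'm using a set instead of a list so the per-request membership test is O(1) rather than a scan):

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; the real PhishTank schema may differ.
SAMPLE_FEED = """<entries>
  <entry>
    <phish_id>1001</phish_id>
    <url>http://bad.example.com/login</url>
    <verified>yes</verified>
  </entry>
  <entry>
    <phish_id>1002</phish_id>
    <url>http://evil.example.net/paypal</url>
    <verified>yes</verified>
  </entry>
</entries>"""

def load_feed(xml_text):
    """Offline step: parse the feed once, keep full records keyed by URL."""
    records = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        url = entry.findtext("url")
        if url:
            # Keep every attribute of the record for the later "why" lookup.
            records[url] = {child.tag: child.text for child in entry}
    return records

records = load_feed(SAMPLE_FEED)
url_set = set(records)  # the in-memory lookup structure

def check(submitted_url):
    """Per-request step: fast membership test, then pull details on a hit."""
    if submitted_url in url_set:
        return records[submitted_url]  # the full record explains why
    return None
```

So `check("http://bad.example.com/login")` hands back the whole record, while a clean URL returns None without ever touching the parsed document again.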
Al
On 12/16/2010 10:35 AM, Jason King wrote:
Looks like there are approximately 830k entries in the xml file. Some
only have a few attributes, some of them have hundreds.
So basically, every time I scan the doc it has to parse through nearly
1 million unique parent elements.
--
Open BlueDragon Public Mailing List
http://www.openbluedragon.org/ http://twitter.com/OpenBlueDragon
official manual: http://www.openbluedragon.org/manual/
Ready2Run CFML http://www.openbluedragon.org/openbdjam/
mailing list - http://groups.google.com/group/openbd?hl=en