So, pull and parse the PhishTank document every day or so. XML, JSON,
whatever floats your boat. It can run as an offline process.
Store ALL of the document's data in a persistent data object for later.
Then pull out just the URLs - all of the unique ones - and keep them in
memory as an application-scoped list or array.
Each time a URL is submitted, scan the in-memory list for a match.
If you find one, go back to the stored data, find out why the URL is
listed, and act on that.
It takes some RAM, but that seems to me like the way to handle any real
load, at least until the PhishTank document becomes unwieldy or your
cluster grows a lot.
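A rough sketch of the idea in Python (the feed element names below are made-up placeholders, not PhishTank's actual schema, and I'm using a set instead of a list so the per-request membership test is O(1) rather than a scan):

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; the real PhishTank schema may differ.
SAMPLE_FEED = """<entries>
  <entry>
    <phish_id>1001</phish_id>
    <url>http://bad.example.com/login</url>
    <verified>yes</verified>
  </entry>
  <entry>
    <phish_id>1002</phish_id>
    <url>http://evil.example.net/paypal</url>
    <verified>yes</verified>
  </entry>
</entries>"""

def load_feed(xml_text):
    """Offline step: parse the feed once, keep full records keyed by URL."""
    records = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        url = entry.findtext("url")
        if url:
            # Keep every attribute of the record for the later "why" lookup.
            records[url] = {child.tag: child.text for child in entry}
    return records

records = load_feed(SAMPLE_FEED)
url_set = set(records)  # the in-memory lookup structure

def check(submitted_url):
    """Per-request step: fast membership test, then pull details on a hit."""
    if submitted_url in url_set:
        return records[submitted_url]  # the full record explains why
    return None
```

So `check("http://bad.example.com/login")` hands back the whole record, while a clean URL returns None without ever touching the parsed document again.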
Al
On 12/16/2010 10:35 AM, Jason King wrote:
Looks like there are approximately 830k entries in the xml file. Some
only have a few attributes, some of them have hundreds.
So basically, every time I scan the doc it has to parse through nearly
1 million unique parent elements.
--
Open BlueDragon Public Mailing List
http://www.openbluedragon.org/ http://twitter.com/OpenBlueDragon
official manual: http://www.openbluedragon.org/manual/
Ready2Run CFML http://www.openbluedragon.org/openbdjam/
mailing list - http://groups.google.com/group/openbd?hl=en