Gotcha. So rather than hitting the original XML doc every time, maybe create a custom-tailored SQL table that holds only the info I need, and just import the XML data into that? I was actually considering that... That way, I could write all my searching and such in standard SQL.
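A minimal sketch of that import, using SQLite and a made-up two-entry feed (the element names and fields here are illustrative assumptions, not the actual phishtank schema):

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical minimal feed resembling the phishtank XML; the real
# document's element names and fields may differ.
SAMPLE_XML = """
<phishes>
  <entry><phish_id>1</phish_id><url>http://bad.example/a</url><verified>yes</verified></entry>
  <entry><phish_id>2</phish_id><url>http://bad.example/b</url><verified>yes</verified></entry>
</phishes>
"""

def import_feed(xml_text, db_path=":memory:"):
    """Load only the fields we care about into a small SQL table."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS phish (
                        phish_id INTEGER PRIMARY KEY,
                        url TEXT UNIQUE,
                        verified TEXT)""")
    root = ET.fromstring(xml_text)
    rows = [(int(e.findtext("phish_id")),
             e.findtext("url"),
             e.findtext("verified"))
            for e in root.iter("entry")]
    # INSERT OR REPLACE lets the daily re-import update existing rows.
    conn.executemany("INSERT OR REPLACE INTO phish VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = import_feed(SAMPLE_XML)
hit = conn.execute("SELECT phish_id FROM phish WHERE url = ?",
                   ("http://bad.example/a",)).fetchone()
print(hit)  # -> (1,)
```

With an index on `url` (the UNIQUE constraint gives you one), each lookup is a single indexed query instead of a parse over ~830k elements. For the real 800k-entry document you would stream it with `ET.iterparse` rather than load it all at once.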
-jason

On Thu, Dec 16, 2010 at 1:40 PM, Alan Holden <[email protected]> wrote:
> So, pull and parse the phishtank document every day or so. XML, JSON,
> whatever floats your boat. It'll be an offline process.
> Store ALL the document's stuff in a data object for later.
> Then take out just the urls - all of the unique ones - and store them in
> memory as an application list or array.
> Each time a url is submitted, scan the memory list for a match.
> If you find a match, go back to the data, find out why and act on that.
>
> Takes some RAM, but seems to me like the way to handle any real load.
> Until that phishtank document becomes unwieldy or your cluster grows a lot.
>
> Al
>
>
> On 12/16/2010 10:35 AM, Jason King wrote:
>
>> Looks like there is approx 830k entries in the xml file. Some only have a
>> few attributes, some of them have hundreds.
>>
>> So basically, everytime I scan the doc it has to parse through nearly
>> 1million unique parent elements.
>>
>
> --
> Open BlueDragon Public Mailing List
> http://www.openbluedragon.org/ http://twitter.com/OpenBlueDragon
> official manual: http://www.openbluedragon.org/manual/
> Ready2Run CFML http://www.openbluedragon.org/openbdjam/
>
> mailing list - http://groups.google.com/group/openbd?hl=en
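Alan's in-memory approach above can be sketched like this (the record fields are illustrative, not the actual phishtank schema):

```python
# Keep the full records keyed by URL, plus a set of just the URLs for a
# fast membership test on every submitted url. Field names here are
# made-up placeholders, not the real phishtank document's attributes.
records = {
    "http://bad.example/a": {"phish_id": 1, "verified": "yes"},
    "http://bad.example/b": {"phish_id": 2, "verified": "yes"},
}
url_set = set(records)  # the in-memory "application list" of unique urls

def check(url):
    if url in url_set:        # O(1) set membership, not a linear scan
        return records[url]   # go back to the data to find out why
    return None               # no match: url is not in the feed

print(check("http://bad.example/a"))  # -> the matching record
print(check("http://good.example/"))  # -> None
```

A hash-based set makes each lookup constant time, so even 830k URLs stays fast; the RAM cost is roughly the URLs themselves plus hashing overhead, which is the trade-off Alan mentions.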
