Gotcha.

So rather than hitting the original XML doc, maybe create a custom-tailored
SQL table that only holds the info I need, and just import the XML data into
that? I was actually considering that... This way, I could write all my
searching and such in standard SQL.
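Something like this rough Python sketch, maybe (the element names and table
layout here are my guesses, not the actual PhishTank schema, so they'd need
adjusting to the real dump):

```python
# Sketch: import only the fields we need from a PhishTank-style XML dump
# into a SQL table, then do all lookups with plain SQL.
# NOTE: <entry>, <phish_id>, and <url> are assumed element names, not the
# verified PhishTank schema.
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE_XML = """<phishtank>
  <entry><phish_id>1</phish_id><url>http://bad.example/login</url></entry>
  <entry><phish_id>2</phish_id><url>http://evil.example/paypal</url></entry>
</phishtank>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE phish (phish_id INTEGER PRIMARY KEY, url TEXT)")
db.execute("CREATE INDEX idx_phish_url ON phish (url)")

# For the real ~830k-entry file, ET.iterparse would keep memory flat;
# fromstring is fine for this tiny sample.
root = ET.fromstring(SAMPLE_XML)
rows = [(int(e.findtext("phish_id")), e.findtext("url"))
        for e in root.iter("entry")]
db.executemany("INSERT INTO phish VALUES (?, ?)", rows)
db.commit()

def is_phish(url):
    """Indexed lookup -- one cheap SQL query per submitted URL."""
    cur = db.execute("SELECT 1 FROM phish WHERE url = ?", (url,))
    return cur.fetchone() is not None
```

The index on `url` is the whole point: the daily import pays the parsing
cost once, and every search after that is an indexed SQL lookup instead of
a scan through a million XML elements.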

-jason
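P.S. For comparison, the in-memory approach Alan describes below might look
roughly like this in Python (the record fields here are invented for
illustration):

```python
# Sketch of the in-memory approach: parse the dump once (offline), keep the
# full records keyed by URL, plus a set of unique URLs for fast membership
# checks. The "verified" field is a made-up example attribute.
records = {
    "http://bad.example/login": {"phish_id": 1, "verified": True},
    "http://evil.example/paypal": {"phish_id": 2, "verified": False},
}
url_set = set(records)  # set membership is O(1) per submitted URL

def check(url):
    """Return the full record if the URL is a known phish, else None."""
    if url in url_set:
        return records[url]  # match: go back to the data and act on it
    return None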

On Thu, Dec 16, 2010 at 1:40 PM, Alan Holden <[email protected]> wrote:

>  So, pull and parse the phishtank document every day or so. XML, JSON,
> whatever floats your boat. It'll be an offline process.
> Store ALL the document's stuff in a data object for later.
> Then take out just the URLs - all of the unique ones - and store them in
> memory as an application list or array.
> Each time a URL is submitted, scan the memory list for a match.
> If you find a match, go back to the data object, find out why, and act on that.
>
> Takes some RAM, but seems to me like the way to handle any real load.
> Until that phishtank document becomes unwieldy or your cluster grows a lot.
>
> Al
>
>
> On 12/16/2010 10:35 AM, Jason King wrote:
>
>> Looks like there are approximately 830k entries in the XML file. Some only
>> have a few attributes; some of them have hundreds.
>>
>> So basically, every time I scan the doc it has to parse through nearly
>> 1 million unique parent elements.
>>
>>
> --
> Open BlueDragon Public Mailing List
> http://www.openbluedragon.org/   http://twitter.com/OpenBlueDragon
> official manual: http://www.openbluedragon.org/manual/
> Ready2Run CFML http://www.openbluedragon.org/openbdjam/
>
> mailing list - http://groups.google.com/group/openbd?hl=en
>
