They actually say that. If you don't have a key, they only let you hit the
list a few times per hour. Then they go on to say that they only update the
list every hour anyway, so it's essentially pointless to keep hitting the
API when you could simply download the list, get better performance, and
save them money and bandwidth, etc.
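The hourly-update point suggests a simple cache-then-fetch pattern: only re-download when the local copy is older than the update interval. A minimal sketch in Python — the URL and file names here are placeholders, not the provider's real endpoint:

```python
# Sketch of an hourly-cached fetch, matching the provider's stated
# hourly update cycle. FEED_URL is a placeholder, not the real endpoint.
import os
import time
import urllib.request

FEED_URL = "http://example.com/feed.xml"  # placeholder; substitute the real feed URL
CACHE_PATH = "feed-cache.xml"
MAX_AGE_SECONDS = 3600  # list updates hourly, so refetch at most once an hour

def cache_is_fresh(path, max_age, now=None):
    """Return True if the cached copy exists and is younger than max_age."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age

def get_feed():
    """Download the feed only when the local copy is stale."""
    if not cache_is_fresh(CACHE_PATH, MAX_AGE_SECONDS):
        urllib.request.urlretrieve(FEED_URL, CACHE_PATH)
    return CACHE_PATH
```

This is the kind of thing a scheduled task or cron job could run once an hour; everything downstream then reads the cached file instead of their server.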

Since my DB server is sandboxed and isolated from the Internet, I guess I
would have to use a server with web access, maybe even the webserver itself,
to fetch the file, open it, and import it into the DB. If it's just a
straight XML-to-SQL-Server insert, with no real logic beyond the structured
import, would it be wise to just have the webserver do it? Or could that
potentially cause a big hit and lock the server up until the import is done?
I could actually set up a VM that does nothing but fetch that file, parse
it, and insert it into the DB. My webserver is Windows, but I could just use
CentOS and PHP for the insert if that's all the VM would be doing. My app is
written in CFML, but a dedicated VM would be separate.
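For what that dedicated fetch-parse-insert job could look like, here's a rough sketch in Python (the box would be separate from the CFML app anyway, so the language is open). sqlite3 stands in for SQL Server just to keep the example self-contained, and the element names (`entry`, `phish_id`, `url`) are assumptions, not the feed's actual schema:

```python
# Sketch of the import job a dedicated VM would run. sqlite3 is a
# stand-in for SQL Server here; the XML element names are assumed,
# not taken from the real feed.
import sqlite3
import xml.etree.ElementTree as ET

def import_feed(xml_text, conn):
    """Parse the feed and bulk-insert the rows in one transaction."""
    root = ET.fromstring(xml_text)
    rows = [
        (entry.findtext("phish_id"), entry.findtext("url"))
        for entry in root.iter("entry")
    ]
    with conn:  # single transaction: readers never see a half-loaded table
        conn.execute("CREATE TABLE IF NOT EXISTS phish (phish_id TEXT, url TEXT)")
        conn.execute("DELETE FROM phish")  # full refresh on each run
        conn.executemany("INSERT INTO phish (phish_id, url) VALUES (?, ?)", rows)
    return len(rows)
```

Doing the delete-and-reload inside one transaction is the part that matters for the "lock the server down" worry: the import holds its locks briefly and the app never queries a partially loaded table.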



On Thu, Dec 16, 2010 at 2:26 PM, Alan Holden <[email protected]> wrote:

>  Sure. You're pretty darn close to what I said.
>
> But that's an important point: I doubt that any of these list providers
> (phishing, email blocklists, rss feeds, etc) want you to actually hit their
> server every time you get a request of your own. Most of them expect you to
> pull their data occasionally and bank it locally; and will even block you if
> you request it too often.
>
> Al
>
>
>
> On 12/16/2010 12:07 PM, Jason King wrote:
>
> Gotcha.
>
> So rather than hitting the original XML doc, maybe create a custom-tailored
> SQL table that only holds the info I need, and just import the XML data
> into that? I was actually considering that... This way, I could write all
> my searching and such in standard SQL.
>
> -jason
>
> On Thu, Dec 16, 2010 at 1:40 PM, Alan Holden <[email protected]> wrote:
>
>>  So, pull and parse the phishtank document every day or so. XML, JSON,
>> whatever floats your boat. It'll be an offline process.
>> Store ALL the document's stuff in a data object for later.
>> Then take out just the urls - all of the unique ones - and store them in
>> memory as an application list or array.
>> Each time a url is submitted, scan the memory list for a match.
>> If you find a match, go back to the data, find out why and act on that.
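Alan's steps above can be sketched as follows — a set for the fast membership test, with the full records kept alongside for the "find out why" step (the record fields here are assumptions):

```python
# Sketch of the in-memory lookup described above: keep just the unique
# URLs in a set for fast membership tests, and go back to the full
# data only on a hit. The "why" field is an assumed example.
def build_index(records):
    """records: parsed feed entries, each a dict with at least a 'url' key."""
    data = {r["url"]: r for r in records}   # full detail, keyed by URL
    urls = set(data)                        # unique URLs, O(1) membership test
    return urls, data

def check(url, urls, data):
    """Return the matching record if the URL is on the list, else None."""
    return data.get(url) if url in urls else None
```

Each submitted URL costs one hash lookup instead of a scan, and the detail record is only touched when there's actually a match.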
>>
>>    --
> Open BlueDragon Public Mailing List
> http://www.openbluedragon.org/ http://twitter.com/OpenBlueDragon
> official manual: http://www.openbluedragon.org/manual/
> Ready2Run CFML http://www.openbluedragon.org/openbdjam/
>
> mailing list - http://groups.google.com/group/openbd?hl=en
>
