On 2/20/2018 9:42 PM, Rob McEwen wrote:
Google might easily start putting captchas in the way or otherwise consider such lookups to be abusive and/or mistake them for malicious bots...

This prediction turned out to be 100% true. Even though others have mentioned that they were able to do high-volume lookups with no problems... and granted, I wasn't implementing a multi-server or multi-IP lookup strategy... I don't think I was doing nearly as many lookups as others have claimed to. I took a batch of 55,000 spams collected over the past 4 weeks, where those spams were maliciously using the Google shortener to get their spam delivered by hiding their spammy domain names from spam filters. I started checking those by looking up the redirect from Google's redirector, but without actually visiting the site the redirector pointed to. Please note that I was doing the lookups one at a time, not starting the next lookup until the last one had completed. After ONLY about 1,400 lookups, ALL of my subsequent lookups started hitting captchas. See attached screenshot. Also, other than not sending from multiple IPs, I was doing everything correctly to make my script look/act like a regular browser.
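For anyone wanting to replicate the lookup step described above, here is a minimal Python sketch: it sends a request to the shortener but refuses to follow the redirect, reading only the Location header. The resolve_short_url helper, the User-Agent string, and the optional opener parameter are my own illustration, not the actual script I used.

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Refuse to follow redirects; we only want the Location header.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def resolve_short_url(url, opener=None):
    """Return the redirect target of a shortened URL without visiting it."""
    opener = opener or urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(
        url, method="HEAD",
        headers={"User-Agent": "Mozilla/5.0"})  # look like a regular browser
    try:
        resp = opener.open(req, timeout=10)
        # No redirect happened (e.g. captcha page served as 200).
        return resp.headers.get("Location")
    except urllib.error.HTTPError as e:
        # With NoRedirect installed, urllib surfaces the 3xx as an HTTPError;
        # its headers still carry the redirect target.
        return e.headers.get("Location")
```

A 200 response instead of a 3xx (returning None here) is one signal that a captcha interstitial has replaced the normal redirect.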

I'll try spreading the lookups across multiple IPs to try to avoid rate limits... However, this is still cause for concern for high-volume lookups in large production systems... those may have to be implemented more carefully if they're going to do these kinds of lookups!

Just because small or medium production systems are able to do this... or because somebody went out of their way to get more sophisticated with it and made it work... doesn't mean it's going to work in large production systems that are trying to use "canned" software or plugins. This is a particular challenge for anti-spam blacklists because they typically process a very high volume of spam. Hopefully the randomness of the ones I process as they come in will spread the lookups out enough to avoid rate limiting?
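If the incoming stream isn't naturally spread out, one simple hedge is to add a randomized pause between lookups so they don't arrive at a machine-regular cadence. A minimal sketch (the delay values here are guesses to be tuned against wherever the captcha threshold actually sits, not known-safe numbers):

```python
import random
import time

def paced(lookups, base_delay=2.0, jitter=1.5):
    """Yield lookup items with a randomized pause between each one,
    so the requests arrive at an irregular, human-ish rate."""
    for i, item in enumerate(lookups):
        if i:  # no pause before the very first lookup
            time.sleep(base_delay + random.uniform(0, jitter))
        yield item

# usage: for short_url in paced(short_urls): resolve and record the redirect
```

This only slows a single source IP down; combining it with the multi-IP spreading mentioned above would be the more robust approach for truly high volumes.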

It was my hope to start processing these live with my own DNSBL engine, so that I could start blacklisting the domains they redirect to... in those cases where they weren't already blacklisted... Now I'm going to have to constantly make sure I'm not hitting this captcha, along with implementing some other strategies to hopefully prevent that.

But this brings up a whole other issue... one that is more of a policy or legal question: is Google basically making a statement that automated lookups are not welcome? Or that they are considered abusive?

(btw, I could have collected orders of magnitude more than 55,000 of THESE types of spams, but this was merely what was left over from an after-the-fact search of my archives, after many otherwise-redundant spams had already been purged from my system.)

PS - Once I gather this information, I will submit more details about the results of this testing. But what is shocking right now is that less than four tenths of 1% of these redirect URLs have been terminated, even though they are on average two weeks old, with some almost a month old.

--
Rob McEwen
https://www.invaluement.com
+1 (478) 475-9032