You might take a look at
https://developers.google.com/url-shortener/v1/getting_started

1 million requests per day is the default limit.
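
For reference, a rough Python sketch of expanding a short URL through that
API instead of scraping the redirector (assuming the v1 "url" endpoint and
an API key from the console; the field names are from memory and may
differ):

    import requests  # pip install requests

    API_KEY = "YOUR_API_KEY"  # placeholder - create one in the Google API console

    def expand_with_api(short_url):
        """Expand a goo.gl link via the URL Shortener API instead of
        fetching the redirect from the public redirector."""
        resp = requests.get(
            "https://www.googleapis.com/urlshortener/v1/url",
            params={"shortUrl": short_url, "key": API_KEY},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()
        # "longUrl" is the target; "status" reportedly reflects OK/REMOVED/etc.
        return data.get("status"), data.get("longUrl")

    print(expand_with_api("https://goo.gl/example"))  # hypothetical short link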

On Wed, 14 Mar 2018, Rob McEwen wrote:

On 2/20/2018 9:42 PM, Rob McEwen wrote:
      Google might easily start putting captchas in the way or
      otherwise consider such lookups to be abusive and/or mistake
      them for malicious bots...

This prediction turned out to be 100% true. Others have mentioned that they
were able to do high-volume lookups with no problems, and granted, I wasn't
using a multi-server or multi-IP lookup strategy, but I don't think I was
doing nearly as many lookups as others have claimed to do. I took a batch of
55,000 spams collected over the past 4 weeks, all of which maliciously used
the Google shortener to get delivered by hiding their spammy domain names
from spam filters. I started checking those by looking up the redirect from
Google's redirector, without actually visiting the site the redirector
pointed to. Note that I was doing the lookups one at a time, not starting
the next lookup until the previous one had completed. After ONLY about 1,400
lookups, ALL of my subsequent lookups started hitting captchas. See attached
screenshot. Also, other than not sending from multiple IPs, I was doing
everything correctly to make my script look and act like a regular browser.
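
For illustration, the kind of one-at-a-time lookup described above amounts
to something like this in Python (a rough sketch, not the actual script;
the captcha-detection heuristic is a guess):

    import requests  # pip install requests

    def resolve_first_hop(short_url):
        """Fetch only the redirect from Google's redirector, without
        following it to the site it points at."""
        resp = requests.get(
            short_url,
            allow_redirects=False,                   # just the first hop
            headers={"User-Agent": "Mozilla/5.0"},   # look like a normal browser
            timeout=10,
        )
        target = resp.headers.get("Location", "")
        # Guess: the captcha/interstitial shows up as a redirect to a
        # google.com/sorry/ page or as a 403/429 instead of a 301/302.
        hit_captcha = "google.com/sorry" in target or resp.status_code in (403, 429)
        return target, hit_captcha

    target, blocked = resolve_first_hop("https://goo.gl/example")  # hypothetical link
    print(target, blocked)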

I'll try spreading the lookups out across multiple IPs to avoid the rate
limits. However, this is still cause for concern for high-volume lookups in
large production systems; those may have to be implemented a little more
carefully if they're going to do this kind of lookup!
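
Roughly what spreading the lookups out across source IPs might look like,
as a stdlib-only sketch (the local addresses and the fixed delay are
placeholders, not tested values):

    import http.client
    import itertools
    import time
    from urllib.parse import urlsplit

    LOCAL_IPS = ["192.0.2.10", "192.0.2.11"]  # placeholder IPs bound to this host
    ip_cycle = itertools.cycle(LOCAL_IPS)

    def lookup_from_ip(short_url, source_ip):
        """One redirector lookup bound to a specific local source IP."""
        parts = urlsplit(short_url)
        conn = http.client.HTTPSConnection(
            parts.netloc, timeout=10, source_address=(source_ip, 0))
        conn.request("GET", parts.path or "/",
                     headers={"User-Agent": "Mozilla/5.0"})
        resp = conn.getresponse()
        location = resp.getheader("Location", "")
        conn.close()
        return resp.status, location

    for url in ["https://goo.gl/example1", "https://goo.gl/example2"]:  # hypothetical
        status, target = lookup_from_ip(url, next(ip_cycle))
        print(status, target)
        time.sleep(2)  # crude pacing between lookups; tune to taste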

Just because small or medium production systems are able to do this, or
because somebody went out of their way to build something more sophisticated
that works, doesn't mean it's going to work in large production systems
running "canned" software or plugins. This is a particular challenge for
anti-spam blacklists because they typically process a very high volume of
spam. Hopefully the randomness of the messages I process as they come in
will spread the lookups out enough to avoid rate limiting?

My hope was to start processing these live with my own DNSBL engine, so that
I could blacklist the domains they redirect to in those cases where they
were not already blacklisted. Now I'm going to have to constantly make sure
that I'm not hitting this captcha, along with implementing some other
strategies to hopefully prevent that.
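
The "not already blacklisted" check could be a plain DNS query against the
list before anything else happens, along these lines with dnspython (the
zone name is a placeholder for whichever domain/URI list is being queried):

    import dns.resolver  # pip install dnspython
    from urllib.parse import urlsplit

    DNSBL_ZONE = "dbl.example.invalid"  # placeholder zone for a domain-based list

    def already_listed(target_url):
        """Return (listed, return_codes) for the domain a shortener points at."""
        domain = urlsplit(target_url).hostname
        try:
            answer = dns.resolver.resolve(f"{domain}.{DNSBL_ZONE}", "A")
            return True, [r.to_text() for r in answer]  # 127.0.0.x style codes
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False, []

    print(already_listed("http://spammy-example.invalid/landing"))  # hypothetical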

But this brings up a whole other issue, more of a policy or legal one: is
Google basically making a statement that automated lookups are not welcome,
or are considered abusive?

(BTW, I could have collected orders of magnitude more than 55,000 of THESE
types of spams; this was merely what was left over in an after-the-fact
search of my archives, after a lot of otherwise redundant spams had already
been purged from my system.)

PS - Once I gather this information, I will submit more details about the
results of this testing. But what is shocking right now is that less than
four tenths of 1% of these redirect URLs have been terminated, even though
they average two weeks old, with some almost a month old.



--
Public key #7BBC68D9 at            |                 Shane Williams
http://pgp.mit.edu/                |      System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |              sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew
