> > So, who's going to yell at us?
>
> With all you data miners out there clicking and downloading everything
> in sight, pretty soon you will only measure the noise created by data
> miners, web crawlers and the like.
If someone operated a free, global service where we could get that
information (as the OEmbed standard calls for), then we could ask
without being counted as traffic. In the meantime, I'm offering a valuable service to
my audience by unrolling the shortened URL to something meaningful. I
hope you bothered to look at the pages I gave to understand what that
value is. The canonicalization does NOT click/crawl anything on the
final page... it just follows the redirections and frame-busting as
needed to get to the actual content.
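
A minimal sketch of the redirect-following step described above, not the
actual code behind the site. It assumes plain HTTP redirects only
(frame-busting pages would need extra handling that is not shown), and
the names canonicalize, http_head and the ExampleLinkBot User-Agent
string are hypothetical. Only HEAD requests are made, so nothing on the
final page is fetched or clicked.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def http_head(url, user_agent="ExampleLinkBot/0.1 (+http://example.com/bot)"):
    """Issue one HEAD request; return (status, Location header or None).

    The User-Agent value is a made-up example of a self-identifying bot.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def canonicalize(url, head=http_head, max_hops=10):
    """Follow redirects hop by hop and return the last URL reached."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:  # redirect loop: give up and keep what we have
            break
        seen.add(url)
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url
        url = urljoin(url, location)  # Location may be relative
    return url
```

Passing a different head callable makes it possible to exercise the
hop-following logic without touching the network.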
> Google, Yandex and the rest are already a significant amount of the
> traffic for small sites.
Oh, I know it... that's why offering a sitemap.xml, a robots.txt and an
OEmbed endpoint on your sites is a really good idea. See http://oembed.com/
for the use of the latter.
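
For illustration, a consumer asks an OEmbed endpoint about a page by
passing the page address in a url query parameter, optionally with a
format parameter. A small sketch of building such a request follows; the
endpoint address and the function name build_oembed_request are made up
for the example.

```python
from urllib.parse import urlencode


def build_oembed_request(endpoint, page_url, fmt="json"):
    """Build the GET URL a consumer would send to an OEmbed endpoint.

    `endpoint` is whatever address the provider advertises; the one used
    in the example below is hypothetical.
    """
    query = urlencode({"url": page_url, "format": fmt})
    return "{}?{}".format(endpoint, query)


print(build_oembed_request("http://example.com/oembed",
                           "http://example.com/page?id=1"))
```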
> What this means is that because you are introducing more and more
> background noise into your data, you will only be able to measure the
> really strong signals. That narrows what you can find, and you risk
> that eventually you find only obvious things.
I'm not introducing noise in my OWN data because I'm correctly
rendering the links with rel="nofollow", so Google and other
well-behaved crawlers won't follow them. What I'm measuring is the
click-through rate ON MY SITE of links leading off-site. This is
standard behavior.
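
A tiny sketch of what rendering an off-site link with rel="nofollow"
can look like. The helper name render_outbound_link is hypothetical and
this is not the site's actual template code.

```python
from html import escape


def render_outbound_link(href, text):
    """Return an anchor tag that asks crawlers not to follow the link."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(href, quote=True), escape(text)
    )


print(render_outbound_link("http://example.com/?a=1&b=2", "An example page"))
```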
Sadly, I will agree that my crawl from the RawLink to the canonical
link will add noise to that destination site's numbers. I hope that
following the best practice of sending a User-Agent that identifies
itself as a bot helps them filter their statistics on their end. I know
that I have had to recognize those UAs and count them correctly myself.
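
On the receiving end, separating self-identified bots from other hits
could look roughly like the sketch below. The substring list is an
assumed heuristic, not a standard, and it only catches crawlers that
declare themselves in the User-Agent header.

```python
import re

# Assumed heuristic: tokens that self-identifying crawlers commonly use.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_declared_bot(user_agent):
    """True when the User-Agent string announces itself as a crawler."""
    return bool(user_agent and BOT_PATTERN.search(user_agent))


def count_hits(user_agents):
    """Split a list of User-Agent strings into bot and non-bot counts."""
    counts = {"bots": 0, "others": 0}
    for ua in user_agents:
        counts["bots" if is_declared_bot(ua) else "others"] += 1
    return counts
```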
Marc