[twitter-dev] Re: t.co Is cool, and I might have an issue with it anyway.

2010-06-10 Thread @IDisposable
> > Oh, I know it... that's why a Sitemap.xml, ROBOTS.TXT and offering an
> > OEmbed endpoint on your sites is a really good idea. See http://oembed.com/
> > for the use of the latter.
>
> What's their business model? What do they sell to whom?

OEmbed.com is the place where the standard is spelled out... i.e. what
you should provide as a web developer if you want to encourage
embedding and/or reduce crawling loads.  As such, there's no business
model (for them), but some website owner might have one that
incentivizes you to use the standard.

There are also services that provide OEmbed data for tons of sites
already; my favorite is http://api.embed.ly/.  I have no idea what
their business model is, but they have a wicked-cool service.
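
For anyone who hasn't used the standard, here is a rough sketch of the
consumer side: build the request URL, then check the response. The
endpoint https://example.com/oembed and both function names are made up
for illustration; the "url", "format" and "maxwidth" request parameters
and the required "type" and "version" response fields come from the
spec at oembed.com.

```python
import json
from urllib.parse import urlencode


def build_oembed_request(endpoint, page_url, max_width=None):
    """Build an oEmbed consumer request URL for page_url.

    endpoint is whatever oEmbed URL the provider advertises
    (hypothetical in this example).
    """
    params = {"url": page_url, "format": "json"}
    if max_width is not None:
        params["maxwidth"] = str(max_width)
    # urlencode percent-encodes the embedded page URL for us.
    return endpoint + "?" + urlencode(params)


def parse_oembed_response(body):
    """Parse a JSON oEmbed response and check the required fields."""
    data = json.loads(body)
    for field in ("type", "version"):
        if field not in data:
            raise ValueError("missing required oEmbed field: " + field)
    return data
```

One request like this replaces a crawl of the page itself, which is
where the reduced crawling load comes from.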

Marc


Re: [twitter-dev] Re: t.co Is cool, and I might have an issue with it anyway.

2010-06-10 Thread M. Edward (Ed) Borasky

Quoting "@IDisposable" :



> If someone would operate a free global place where we could get that
> information (like the OEmbed standard calls for) then we could ask
> without counting.  In the meantime, I'm offering a valuable service to
> my audience by unrolling the shortened URL to something meaningful.  I
> hope you bothered to look at the pages I gave to understand what that
> value is.  The canonicalization does NOT click/crawl anything on the
> final page... it just follows the redirections and frame-busting as
> needed to get to the actual content.
>
> > Google, Yandex and the rest are already a significant amount of the
> > traffic for small sites.
>
> Oh, I know it... that's why a Sitemap.xml, ROBOTS.TXT and offering an
> OEmbed endpoint on your sites is a really good idea. See http://oembed.com/
> for the use of the latter.


What's their business model? What do they sell to whom?





[twitter-dev] Re: t.co Is cool, and I might have an issue with it anyway.

2010-06-10 Thread @IDisposable
> > So, who's going to yell at us?
>
> With all you data miners out there clicking and downloading everything
> in sight, pretty soon you will only measure the noise created by data
> miners, web crawlers and the like.

If someone would operate a free global place where we could get that
information (like the OEmbed standard calls for) then we could ask
without counting.  In the meantime, I'm offering a valuable service to
my audience by unrolling the shortened URL to something meaningful.  I
hope you bothered to look at the pages I gave to understand what that
value is.  The canonicalization does NOT click/crawl anything on the
final page... it just follows the redirections and frame-busting as
needed to get to the actual content.
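
To make "just follows the redirections" concrete, here is a minimal
sketch of that kind of unrolling, not the actual code behind the
service. It issues HEAD requests only, so nothing on the final page is
fetched or clicked. The names canonicalize and http_head, the hop
limit, and the ExampleLinkBot User-Agent string are all assumptions
made for the example, and frame-busting is not handled here.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_head(url, user_agent="ExampleLinkBot/0.1 (+http://example.com/bot)"):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # Identify as a bot so the destination can filter its stats.
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def canonicalize(url, fetch_head=http_head, max_hops=10):
    """Follow redirects hop by hop and return the final URL."""
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch_head(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the canonical URL
        url = urljoin(url, location)  # Location may be relative
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
    raise ValueError("too many redirects")
```

Passing the fetch function in as a parameter keeps the hop-following
logic testable without touching the network.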

> Google, Yandex and the rest are already a significant amount of the
> traffic for small sites.

Oh, I know it... that's why a Sitemap.xml, ROBOTS.TXT and offering an
OEmbed endpoint on your sites is a really good idea. See http://oembed.com/
for the use of the latter.

> What this means is that because you are introducing more and more
> background noise into your data, you will only be able to measure the
> really strong signals. That narrows what you can find, and you risk
> that eventually you find only obvious things.

I'm not introducing noise in my OWN data because I'm correctly
rendering the links with rel="nofollow" so Google and other
well-behaved crawlers won't follow them. What I'm measuring is the
click-through rate ON MY SITE of links leading off-site. This is
standard behavior.
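
A small sketch of what rendering an off-site link with rel="nofollow"
can look like. The function name render_outbound_link is made up for
the example; the only point is that the attribute is emitted on every
outbound anchor and that the URL and link text are HTML-escaped.

```python
from html import escape


def render_outbound_link(href, text):
    """Render an off-site link that well-behaved crawlers won't follow."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(href, quote=True),  # escape &, <, > and quotes in the URL
        escape(text),
    )
```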

Sadly, I will agree that my crawl from the RawLink to the canonical link
will add noise to that destination site's numbers. I hope that the
fact that I follow the best practice of using a User-Agent that
identifies itself as a bot helps with statistics on their end. I know
that I have had to understand and honor/count those UAs correctly.
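
On the receiving end, counting those UAs separately can be as simple as
the sketch below. The marker list is a crude assumption for
illustration, not a standard, and it only catches bots that announce
themselves; the function names are likewise hypothetical.

```python
from collections import Counter

# Assumed substrings that self-identifying bots tend to include.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_declared_bot(user_agent):
    """True if the User-Agent string announces itself as a bot."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def count_hits(user_agents):
    """Tally hits into 'bot' and 'other' buckets by User-Agent."""
    counts = Counter()
    for ua in user_agents:
        counts["bot" if is_declared_bot(ua) else "other"] += 1
    return counts
```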

Marc