Michael Loftis wrote:
--On Tuesday, August 10, 2010 9:05 PM +0100 Rob S
<[email protected]> wrote:
Hi,
On one site we run behind varnish, we've got a "most popular" widget
displayed on every page (much like http://www.bbc.co.uk/news/). However,
this pollutes search engine results: a search for a specific popular
headline tends to link not to the article itself, but to one of the
index pages with a high Google PageRank or similar.
What I'd like to know is how other Varnish users have served
different ESI content depending on whether the client is a bot.
My initial idea was to set an "X-Not-For-Bots: 1" header on the URL that
generates the most-popular fragment, then do something like (though
untested):
ESI goes through all the normal steps, so an <esi:include
src="/esi/blargh"> is fired off starting with vcl_recv, looking
exactly as if the browser had hit the cache with that as the req.url --
the entire req object is the same. I am *not* certain that headers
you've added get propagated, as I've not tested that (and all of my
rules are built on the assumption that they are not, just to be sure).
So there's no need to do it in vcl_deliver; in fact, you're far better
off handling it in vcl_recv and/or vcl_hash. You really SHOULD handle
it in vcl_hash and change the hash for these search-engine-specific
objects, or else you'll serve them to regular users.
for example -- assume vcl_recv sets X-BotDetector in the req header...
(not tested)::
sub vcl_hash {
    // always take into account the url and host
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    // keep the bot variant as a separate cache object
    if (req.http.X-BotDetector == "1") {
        set req.hash += "bot detector";
    }
}
You still have to do the detection inside Varnish; I don't see any
way around that. Only Varnish knows who it's talking to, and Varnish
needs to decide which object to send.
When this works properly, the web server sends back a 'template' for
the page containing the page-specific content plus pointers to a bunch
of ESI fragments. The ESI fragments are also cache objects/requests,
so the cache takes this template and fills in the ESI fragments (from
cache if it can, fetching them if it needs to, treating each one just
as if the web browser had requested the ESI URL).
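As a rough illustration of the template side, here is a minimal sketch
that asks Varnish to parse page templates for <esi:include> tags. It
assumes Varnish 2.1 syntax and a hypothetical /esi/ URL prefix for
fragments; neither is confirmed anywhere in this thread.

```vcl
sub vcl_fetch {
    // Assumption: fragments live under /esi/ and contain no nested
    // includes; every other response is treated as a page template.
    if (req.url !~ "^/esi/") {
        esi;    // Varnish 2.1 syntax: scan this response for ESI tags
    }
}
```

Each include found this way is then looked up (or fetched) as its own
request, so it passes through vcl_recv and vcl_hash like any other URL.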
This is exactly how I handle menus that change based on a user's
authentication status. The browser gets a cookie. The ESI URL is
formed as one of 'authenticated', 'personalized' or 'global':
authenticated means it varies only on the client's login state;
personalized takes into account the actual session we're working
with; and global means everyone gets the same cached object
regardless. We strip cookies going into and coming from the global
ESI URLs in the vcl_recv/vcl_fetch code. The vcl_fetch code looks for
special headers indicating that vcl_recv has decided it needs to
ditch Set-Cookie headers. This is mostly a safety measure to prevent
a session sticking to a client it shouldn't, due to any bugs in the
code.
The basic idea is borrowed from
<http://varnish-cache.org/wiki/VCLExampleCachingLoggedInUsers> and
<http://varnish-cache.org/wiki/VCLExampleCacheCookies>
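The cookie stripping for 'global' fragments described above might look
roughly like this. It is only a sketch: the /esi/global/ prefix and the
X-Strip-Cookies header name are made up for illustration, and the
syntax assumes Varnish 2.1 (where the backend response is beresp).

```vcl
sub vcl_recv {
    // Hypothetical URL scheme: global fragments are identical for
    // every client, so the request cookie must not reach the backend.
    if (req.url ~ "^/esi/global/") {
        unset req.http.Cookie;
        // Marker header (hypothetical name) read again in vcl_fetch.
        set req.http.X-Strip-Cookies = "1";
    }
}

sub vcl_fetch {
    // Safety net: never cache or deliver a Set-Cookie on a global
    // fragment, even if the backend sends one by mistake.
    if (req.http.X-Strip-Cookies == "1") {
        unset beresp.http.Set-Cookie;
    }
}
```

Marking the request in vcl_recv keeps the URL-matching logic in one
place; vcl_fetch only has to check the marker.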
HTH!
Thanks. We've proved this works with a simple setup:
sub vcl_recv {
    ....
    // Establish if the visitor is a search engine:
    set req.http.X-IsABot = "0";
    if (req.http.user-agent ~ "Yahoo! Slurp") {
        set req.http.X-IsABot = "1";
    }
    if (req.http.X-IsABot == "0" && req.http.user-agent ~ "Googlebot") {
        set req.http.X-IsABot = "1";
    }
    if (req.http.X-IsABot == "0" && req.http.user-agent ~ "msnbot") {
        set req.http.X-IsABot = "1";
    }
    ....
}
...
sub vcl_hash {
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }
    if (req.http.X-IsABot == "1") {
        set req.hash += "for-bot";
    } else {
        set req.hash += "for-non-bot";
    }
    hash;
}
The main HTML has a simple ESI, which loads a page fragment whose PHP reads:
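// The header is absent if the request bypassed Varnish, so test it explicitly.
if (isset($_SERVER["HTTP_X_ISABOT"]) && $_SERVER["HTTP_X_ISABOT"] === "1") {
    echo "<!-- The list of popular posts is not displayed to search engines -->";
} else {
    // calculate most popular
    echo "The most popular article is XYZ";
}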
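For what it's worth, the three User-Agent checks in vcl_recv above
could probably be collapsed into a single regex. This is an untested
sketch that reuses the same X-IsABot header and the same three bot
names; the list is not meant to be complete.

```vcl
sub vcl_recv {
    // One alternation instead of three chained tests. The header is
    // always overwritten, so a client cannot pre-set it on its own.
    if (req.http.User-Agent ~ "(Yahoo! Slurp|Googlebot|msnbot)") {
        set req.http.X-IsABot = "1";
    } else {
        set req.http.X-IsABot = "0";
    }
}
```

Adding another crawler then only means extending the alternation.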
Thanks again.