Hi,

Whilst this looks excellent, and I may use it to serve different content to other types of users, I think you should read (if you haven't already) this page, which discourages this sort of behaviour:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66355

Great VCL though!

Stew

On 11 August 2010 16:20, Rob S <[email protected]> wrote:
>
> Michael Loftis wrote:
>>
>> --On Tuesday, August 10, 2010 9:05 PM +0100 Rob S <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> On one site we run behind Varnish, we've got a "most popular" widget
>>> displayed on every page (much like http://www.bbc.co.uk/news/). However,
>>> this pollutes search engines: searches for a specific popular headline
>>> tend not to link directly to the article itself, but to one of the index
>>> pages with high Google PageRank or similar.
>>>
>>> What I'd like to know is how other Varnish users might have served
>>> different ESI content based on whether the client is a bot or not.
>>>
>>> My initial idea was to set an "X-Not-For-Bots: 1" header on the URL that
>>> generates the most-popular fragment, then do something like (though
>>> untested):
>>>
>>
>> ESI goes through all the normal steps, so an <esi:include
>> src="/esi/blargh"> is processed starting in vcl_recv, looking exactly
>> as if the browser had hit the cache with that as the req.url; the
>> entire req object is the same. I am *not* certain that headers you've
>> added get propagated, as I've not tested that (and all of my rules are
>> built on the assumption that they are not, just to be sure).
>>
>> So there's no need to do it in vcl_deliver; in fact, you're far better
>> off handling it in vcl_recv and/or vcl_hash. (Actually, you really
>> SHOULD handle it in vcl_hash and change the hash for these
>> search-engine-specific objects, or else you'll serve them to regular
>> users.)
>>
>> For example, assuming vcl_recv sets X-BotDetector in the req headers
>> (not tested):
>>
>> sub vcl_hash {
>>     // always take into account the url and host
>>     set req.hash += req.url;
>>     if (req.http.host) {
>>         set req.hash += req.http.host;
>>     } else {
>>         set req.hash += server.ip;
>>     }
>>
>>     if (req.http.X-BotDetector == "1") {
>>         set req.hash += "bot detector";
>>     }
>> }
>>
>> You still have to do the detection inside Varnish; I don't see any way
>> around that. Only Varnish knows who it's talking to, and Varnish needs
>> to decide which object to return. When it's working properly, the web
>> server essentially sends back a 'template' for the page, containing the
>> page-specific content plus pointers to a set of ESI fragments. The ESI
>> fragments are themselves cache objects/requests. The cache takes this
>> template and fills in the ESI fragments (from cache if it can, fetching
>> them if it needs to), treating each one just as if the web browser had
>> requested the ESI URL directly.
>>
>> This is exactly how I handle menus that change based on a user's
>> authentication status. The browser gets a cookie. The ESI URL is formed
>> as either 'authenticated', 'personalized', or 'global': authenticated
>> means it varies only on the client's login state, personalized takes
>> into account the actual session we're working with, and global means
>> everyone gets the same cached copy regardless. (We strip cookies going
>> into and coming from these ESI URLs in vcl_recv/vcl_fetch; the
>> vcl_fetch code looks for special headers, set in vcl_recv, indicating
>> that it needs to discard Set-Cookie. This is mostly a safety measure to
>> prevent a session from sticking to a client it shouldn't because of
>> bugs in the code.)
>>
>> The basic idea is borrowed from
>> <http://varnish-cache.org/wiki/VCLExampleCachingLoggedInUsers> and
>> <http://varnish-cache.org/wiki/VCLExampleCacheCookies>
>>
>> HTH!
>
> Thanks. We've proved this works with a simple setup:
>
> sub vcl_recv {
>     ....
>     // Establish if the visitor is a search engine:
>     set req.http.X-IsABot = "0";
>     if (req.http.user-agent ~ "Yahoo! Slurp") {
>         set req.http.X-IsABot = "1";
>     }
>     if (req.http.X-IsABot == "0" && req.http.user-agent ~ "Googlebot") {
>         set req.http.X-IsABot = "1";
>     }
>     if (req.http.X-IsABot == "0" && req.http.user-agent ~ "msnbot") {
>         set req.http.X-IsABot = "1";
>     }
>     ....
> }
> ...
> sub vcl_hash {
>     set req.hash += req.url;
>     if (req.http.host) {
>         set req.hash += req.http.host;
>     } else {
>         set req.hash += server.ip;
>     }
>
>     if (req.http.X-IsABot == "1") {
>         set req.hash += "for-bot";
>     } else {
>         set req.hash += "for-non-bot";
>     }
>     hash;
> }
>
> The main HTML has a simple ESI include, which loads a page fragment whose
> PHP reads:
>
> if (!empty($_SERVER["HTTP_X_ISABOT"])) {
>     echo "<!-- The list of popular posts is not displayed to search engines -->";
> } else {
>     // calculate most popular
>     echo "The most popular article is XYZ";
> }
>
> Thanks again.
>
> _______________________________________________
> varnish-misc mailing list
> [email protected]
> http://lists.varnish-cache.org/mailman/listinfo/varnish-misc
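Rob's vcl_recv/vcl_hash pair could probably be tightened along the following lines. This is a minimal, untested sketch in the same Varnish 2.x syntax the thread uses (`set req.hash +=`, bare `hash;`). Varnish 3.0 and later replace `set req.hash +=` with `hash_data()`. It assumes the most-popular fragment is served from a hypothetical `/esi/most-popular` URL. It relies on Michael's point that ESI subrequests run through vcl_recv and vcl_hash like any other request.

```vcl
# Sketch only, Varnish 2.x-style VCL (untested).
# "/esi/most-popular" is a hypothetical fragment URL; use whatever
# the page's <esi:include src="..."> actually points at.

sub vcl_recv {
    # One regex instead of three chained tests; same three crawlers.
    if (req.http.User-Agent ~ "(Yahoo! Slurp|Googlebot|msnbot)") {
        set req.http.X-IsABot = "1";
    } else {
        set req.http.X-IsABot = "0";
    }
    # No return here, so the built-in vcl_recv still runs afterwards.
}

sub vcl_hash {
    set req.hash += req.url;
    if (req.http.host) {
        set req.hash += req.http.host;
    } else {
        set req.hash += server.ip;
    }

    # Only the fragment differs for bots, so only split its cache entry.
    # Full pages keep a single cached copy shared by bots and humans.
    if (req.url ~ "^/esi/most-popular" && req.http.X-IsABot == "1") {
        set req.hash += "for-bot";
    }
    hash;
}
```

The page template is identical for bots and humans, since it only contains the `<esi:include>` tag. Keying just the fragment on X-IsABot therefore avoids storing two copies of every full page, while the fragment still gets one cache entry per audience. X-IsABot is still set on every request, so the PHP check in the fragment keeps working unchanged.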

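Michael's 'global' fragments might look roughly like the following. In his scheme, cookies are stripped on the way in, and vcl_recv sets a flag telling vcl_fetch to discard Set-Cookie on the way out. This is an untested sketch in Varnish 2.1-style VCL (`beresp`, the bare `esi;` statement). `/esi/global/` and `X-Strip-Set-Cookie` are hypothetical names, not taken from his configuration. In Varnish 3.0 and later, `esi;` becomes `set beresp.do_esi = true;`.

```vcl
# Sketch only, Varnish 2.1-style VCL (untested).
# "/esi/global/" and "X-Strip-Set-Cookie" are hypothetical names.

sub vcl_recv {
    # Never trust a flag supplied by the client itself.
    unset req.http.X-Strip-Set-Cookie;

    # Global fragments are identical for everyone: drop the client's
    # cookies so the built-in vcl_recv does not pass the request.
    if (req.url ~ "^/esi/global/") {
        unset req.http.Cookie;
        # Tell vcl_fetch to discard Set-Cookie on the response as well.
        set req.http.X-Strip-Set-Cookie = "1";
    }
}

sub vcl_fetch {
    # Safety net: a global fragment must never hand a session to a client.
    if (req.http.X-Strip-Set-Cookie == "1") {
        unset beresp.http.Set-Cookie;
    }

    # Parse <esi:include src="..."/> tags in HTML responses
    # (Varnish 2.x syntax; see the note above for 3.0 and later).
    if (beresp.http.Content-Type ~ "text/html") {
        esi;
    }
}
```

Neither sub returns explicitly, so control falls through to the built-in vcl_recv and vcl_fetch. In 2.1 these pass requests carrying a Cookie header and responses carrying Set-Cookie. Stripping both is what lets the global fragments be cached.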