On Wed, Jul 10, 2013 at 9:25 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robot.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.

It's already sitting behind Varnish, but the vast majority of pages on
that site would only ever be hit by crawlers anyway, so I doubt that'd
help a great deal: those pages would likely expire from the cache
before the cache really saved us anything.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
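
As a rough illustration of the expiry argument above, here is a minimal sketch of a single-URL TTL cache. The TTL, the crawler revisit interval, and the request rates are all assumed numbers chosen for illustration; none of them come from the actual git.postgresql.org Varnish setup, and `cache_hits` is a hypothetical helper, not anything Varnish provides.

```python
def cache_hits(request_times, ttl):
    """Count cache hits for one URL behind a simple TTL cache.

    request_times: sorted times (in seconds) at which the URL is requested.
    ttl: how long a cached copy stays fresh (assumed value, not the
         real Varnish configuration).
    """
    hits = 0
    cached_at = None  # time at which the current cached copy was stored
    for t in request_times:
        if cached_at is not None and t - cached_at < ttl:
            hits += 1      # still fresh: served from the cache
        else:
            cached_at = t  # miss: the backend (gitweb) renders the page
    return hits


HOUR = 3600
DAY = 24 * HOUR

# Assumed numbers, for illustration only.
ttl = 1 * HOUR
# A rarely viewed commit page that a crawler fetches once a day for a week.
crawler_only = [d * DAY for d in range(7)]
# A popular page that is fetched every 10 minutes for one day.
popular = [m * 600 for m in range(144)]

crawler_hits = cache_hits(crawler_only, ttl)
popular_hits = cache_hits(popular, ttl)
print(crawler_hits, "hits out of", len(crawler_only), "crawler requests")
print(popular_hits, "hits out of", len(popular), "requests to a popular page")
```

Under these assumptions the crawler-only page never gets a cache hit, because every revisit arrives long after the cached copy has expired, so gitweb still renders it each time. Only pages requested more often than the TTL benefit from the cache.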