On Wed, Jul 10, 2013 at 10:25 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 07/09/2013 11:30 PM, Andres Freund wrote:
>> On 2013-07-09 16:24:42 +0100, Greg Stark wrote:
>>> I note that git.postgresql.org's robots.txt refuses permission to crawl
>>> the git repository:
>>>
>>> http://git.postgresql.org/robots.txt
>>>
>>> User-agent: *
>>> Disallow: /
>>>
>>>
>>> I'm curious what motivates this. It's certainly useful to be able to
>>> search for commits.
>>
>> Gitweb is horribly slow. I don't think anybody with a bigger git repo
>> using gitweb can afford to let all the crawlers go through it.
>
> Wouldn't whacking a reverse proxy in front be a pretty reasonable
> option? There's a disk space cost, but using Apache's mod_proxy or
> similar would do quite nicely.

We already run one; that's what we did to make it survive at all. The
problem is that there are many thousands of different URLs you can get
to on that site, and Google indexes them all by default.

It was before we had the proxy that the site regularly died.
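
For reference, the effect of the two robots.txt rules quoted above can be
checked with Python's standard urllib.robotparser. This is only a minimal
sketch: the helper name can_crawl, the user-agent strings, and the URL path
are made up for illustration and are not taken from the actual site setup.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the quoted rules allow user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The path below is a hypothetical example, not a known page.
    url = "http://git.postgresql.org/some/page"
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, can_crawl(agent, url))
```

Since the only record applies to every user agent and disallows "/", any
crawler that honours robots.txt is kept off every one of those URLs.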


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


