On 07/09/2013 11:24 AM, Greg Stark wrote:
I note that git.postgresql.org's robots.txt refuses permission to crawl
the git repository:
I'm curious what motivates this. It's certainly useful to be able to
search for commits. I frequently type git commit hashes into Google to
find the commit in other projects. I think I've even done it in
Postgres before and not had a problem. Maybe Google brought up github
or something else.
FWIW, the reason I noticed this is that I searched for "postgresql
git log" and the first hit was for "see the commit that fixed the
issue, with all the gory details" which linked to
This was indexed despite the robots.txt because it was linked to from
elsewhere (hence the interesting link title). There are ways to ask
Google not to index pages if that's really what we're after but I
don't see why we would be.
It's certainly not universal. For example, the only reason I found
buildfarm client commit d533edea5441115d40ffcd02bd97e64c4d5814d9, for
which the repo is housed at GitHub, is that Google has indexed the
buildfarm commits mailing list on pgfoundry. Do we have a robots.txt on
the postgres mailing list archives site?
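
For anyone who wants to check what a given set of robots.txt rules permits, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and URLs below are made up for illustration; they are not the actual contents of any postgresql.org robots.txt. Note that this only answers "may a crawler fetch this URL?". As pointed out above, a page can still end up indexed if it is linked to from elsewhere.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real robots.txt
# of git.postgresql.org or of the mailing list archives.
HYPOTHETICAL_RULES = """\
User-agent: *
Disallow: /gitweb/
"""


def may_crawl(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `rules` allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.org is a placeholder host, not a real project URL.
    for url in ("https://example.org/gitweb/some/page",
                "https://example.org/about/"):
        print(url, may_crawl(HYPOTHETICAL_RULES, "Googlebot", url))
```

To test a live site instead, RobotFileParser also provides set_url() and read() to download the file before calling can_fetch().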