On Fri, 2007-06-22 at 01:13 +0200, lars wrote:
> Hi,
> 
> > An alternate would be to put directives in the header if we are not
> > operating at the root of the server.  robots.txt is just simpler :)
> 
> hm - this is a good point: if pootle has no control over the /robots.txt
> file, but still wants to prevent robots from going through all the linked
> data, then we should add the appropriate meta tag to the header of each page
> (http://www.robotstxt.org/wc/meta-user.html).
> 
> The attached patch prevents robots from harvesting all templates for links.
> The dynamically created robots.txt would be somehow obsolete then, but it
> could stay there to keep away robots which do not understand the meta tag.
> 
> What do you think?
> 
> Lars
> 

I think this is a very good solution. I think it is important to limit
robots, since we have gazillions of links as soon as you go into the
projects, some of them causing quite a bit of server activity, depending
on the server setup.

I would, however, handle a few of these differently. Forgive the
long-winded comments.

> Index: templates/projectadmin.html

None of the admin pages can be reached by a robot, so I don't think we
need to add the tag to these.

> Index: templates/register.html

I am no expert on search engine optimisation, but I don't think the
register, activate and login pages are that interesting to a bot. Perhaps
noindex?


> Index: templates/home.html

home.html can only be reached by a logged-in user.

> Index: templates/error.html

Let's hope a search engine never gets this one :-)  Perhaps noindex here
as well?

> Index: templates/translatepage.html

This one is interesting, since somebody might put up a link directly to
a translate page. Do we want them to index it? If they got the meta tag,
they already got the page anyway, so there is probably no harm. But I
don't think a spider will be able to get to this page if it followed the
rules otherwise. In other words, I agree with this one.

> Index: templates/options.html

Only available to logged in users.

> Index: templates/pootlepage.html

This is not a template for any real page produced by Pootle. It only
contains elements that are reused. I guess the idea is that this is the
master template for all the other pages, but it doesn't yet work that
way. It is probably ok to patch this anyway as you suggest.

> Index: templates/redirect.html

I don't know if there will ever be anything to index on a redirect
page. Is there?

> Index: templates/index.html

I'm wondering about nofollow on this page: that would stop bots from
going to about.html, am I right? That would not be ideal, but I'm not
sure what we can do about it. We can't afford to let bots go into the top
level pages for the languages or projects, so the meta tag alone doesn't
quite help here. The robots.txt solved this one nicely, I believe. Nicolas
has some changes in mind for Debian that might make it quite safe to let
the robots go into the top level language and project pages, so perhaps
we can revisit this in future.

> Index: templates/about.html

Perhaps we can allow follow here? This will usually contain links to the
main project page and to our project page :-) 
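
To make the suggestions in this mail concrete, here is a rough Python
sketch of a per-template policy that renders the robots meta tag. The
table is only my reading of the comments above, the default of
"index, nofollow" assumes the patch adds nofollow everywhere, and the
names POLICY, DEFAULT and robots_meta are hypothetical, not part of Pootle.

```python
from html import escape

# Hypothetical per-template directives, taken from the suggestions in
# this thread. Nothing here reflects what Pootle actually ships.
POLICY = {
    "register.html": ("noindex", "nofollow"),
    "error.html": ("noindex", "nofollow"),
    "about.html": ("index", "follow"),
}

# Assumed default: the patch as described only stops link harvesting.
DEFAULT = ("index", "nofollow")


def robots_meta(template_name):
    """Return the robots meta tag for the given template name."""
    directives = POLICY.get(template_name, DEFAULT)
    content = ", ".join(directives)
    return '<meta name="robots" content="%s" />' % escape(content, quote=True)
```

For example, robots_meta("about.html") gives a tag with "index, follow",
while any template not listed in the table falls back to the default.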


_______________________________________________
Translate-pootle mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/translate-pootle
