On Vr, 2007-06-22 at 01:13 +0200, lars wrote:
> Hi,
>
> > An alternate would be to put directives in the header if we are not
> > operating at the root of the server. robots.txt is just simpler :)
>
> hm - this is a good point: if pootle has no control over the /robots.txt
> file, but still wants to prevent robots from going through all the linked
> data, then we should add the appropriate meta tag to the header of each
> page (http://www.robotstxt.org/wc/meta-user.html).
>
> The attached patch prevents robots from harvesting all templates for
> links. The dynamically created robots.txt would be somehow obsolete then,
> but it could stay there to keep away robots which do not understand the
> meta tag.
>
> What do you think?
>
> Lars

I think this is a very good solution. I think it is important to limit robots, since we have gazillions of links as soon as you go into the projects, some of them causing quite a bit of server activity, depending on the server setup. I would, however, treat a few of these differently. Forgive the long-winded comments.

> Index: templates/projectadmin.html

None of the admin pages can be reached by a robot, so I don't think we need to add the tag to these.

> Index: templates/register.html

I am no expert on search engine optimisation, but I don't think the register, activate and login pages are that interesting to a bot. Perhaps noindex?

> Index: templates/home.html

home.html can only be reached by a logged-in user.

> Index: templates/error.html

Let's hope a search engine never gets this one :-) Perhaps noindex here as well?

> Index: templates/translatepage.html

This one is interesting, since somebody might put up a link directly to a translate page. Do we want them to index it? If they got the meta tag, they already got the page anyway, so there is probably no harm. But I don't think a spider would be able to get to this page if it followed the rules otherwise. In other words, I agree with this one.

> Index: templates/options.html

Only available to logged-in users.

> Index: templates/pootlepage.html

This is not a template for any real page produced by Pootle. It only contains elements that are reused. I guess the idea is that this is the master template for all the other pages, but it doesn't yet work that way. It is probably ok to patch this anyway, as you suggest.

> Index: templates/redirect.html

I don't know if there will ever be anything to index on a redirect page. Is there?

> Index: templates/index.html

I'm wondering about nofollow on this page: that would stop bots from going to about.html, am I right? That would be non-ideal, but I'm not sure what we can do about it.
We can't afford to let bots go into the top-level pages for the languages or projects, so this doesn't quite help. The robots.txt solved this one nicely, I believe. Nicolas has some changes in mind for Debian that might make it quite safe to let the robots go into the top-level language and project pages, so perhaps we can revisit this in future.

> Index: templates/about.html

Perhaps we can allow follow here? This will usually contain links to the main project page and to our project page :-)
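
To make the per-template suggestions a bit more concrete, here is a rough Python sketch. It is not the attached patch and not existing Pootle code: ROBOTS_POLICY, robots_meta(), build_robots_txt() and the /projects/ and /languages/ paths are made up for illustration, and the directive picked for each template is only one possible reading of the comments above. It pairs the meta tag with a robots.txt that keeps well-behaved robots out of the expensive pages, and uses the standard library's urllib.robotparser to check the result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-template policy; None means "no tag needed" because an
# anonymous robot cannot reach the page anyway (admin / logged-in only).
ROBOTS_POLICY = {
    "projectadmin.html": None,
    "home.html": None,
    "options.html": None,
    "register.html": "noindex,nofollow",
    "error.html": "noindex,nofollow",
    "translatepage.html": "nofollow",
    "about.html": "index,follow",
}


def robots_meta(template_name):
    """Return the robots meta tag for a template, or '' if none is needed."""
    content = ROBOTS_POLICY.get(template_name)
    if content is None:
        return ""
    return '<meta name="robots" content="%s" />' % content


def build_robots_txt(base_path, blocked):
    """Build a robots.txt body that disallows each blocked section.

    base_path and the section names are assumptions; the file only has
    an effect when it is served from the root of the web server.
    """
    lines = ["User-agent: *"]
    for name in blocked:
        lines.append("Disallow: %s%s/" % (base_path, name))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(robots_meta("about.html"))
    text = build_robots_txt("/", ["projects", "languages"])
    print(text)

    # Check the generated rules the way a polite robot would read them.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    print(parser.can_fetch("*", "/projects/pootle/"))
    print(parser.can_fetch("*", "/about.html"))
```

With rules like these, index.html would not need nofollow: robots could still reach about.html, while the robots.txt keeps them out of the language and project pages. That only holds when Pootle controls /robots.txt, which is exactly the case the meta tag is meant to cover when it does not.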
