Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-14 Thread Ineiev
On Sun, May 14, 2023 at 09:34:50AM +0200, Thérèse Godefroy wrote: > > The problem is that the www-fr web repo doesn't seem to be checked out. > Dora added a blank index page, but we still get a 404 with the usual URL ... There are more problems: * The repository belongs to the www-fr group, but it

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-14 Thread Thérèse Godefroy
On 14/05/2023 at 09:34, Thérèse Godefroy wrote: From: Karl Berry Date: Fri, 12 May 2023 15:10:46 -0600 [...] Thus, using a separate and private repo like www-fr as Therese suggested sounds to me like the best solution, both technically and philosophically. I'm sure it is possible

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-14 Thread Thérèse Godefroy
From: Karl Berry Date: Fri, 12 May 2023 15:10:46 -0600 [...] Thus, using a separate and private repo like www-fr as Therese suggested sounds to me like the best solution, both technically and philosophically. I'm sure it is possible somehow to restrict viewing the www-fr web pages to www

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-13 Thread Thérèse Godefroy
On 12/05/2023 at 22:24, Ian Kelling wrote: [...] robots.txt isn't about just not making things public, it is perfectly valid to use for SEO and helping google turn up the best results for your site and avoiding non-canonical duplicate pages. _Non-canonical_ is the most important word here.

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-13 Thread Thérèse Godefroy
On 12/05/2023 at 22:24, Ian Kelling wrote: [...] robots.txt isn't about just not making things public, it is perfectly valid to use for SEO and helping google turn up the best results for your site and avoiding non-canonical duplicate pages. I've seen www-commits list archive pages come up in

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Karl Berry
> In sum, this private playground is something webmasters want and need, > and search engines should have no business indexing it. Is it possible? robots.txt will not stop ill-behaved robots from indexing. Although Google and Duck Duck Go, to the best of my knowledge, do respect it, it
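The point about well-behaved versus ill-behaved robots can be illustrated with a minimal sketch. It assumes a hypothetical `Disallow` rule for the www-commits archive path (the actual robots.txt on lists.gnu.org is not quoted in this thread) and uses Python's standard `urllib.robotparser` to show the check a compliant crawler performs before fetching a page. The helper name `is_allowed` and the second URL are illustrative only. Nothing enforces the check: a crawler that skips it can still fetch and index the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set, for illustration only; not the real lists.gnu.org file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/html/www-commits/
"""


def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # URL taken from the thread; blocked by the hypothetical rule.
    print(is_allowed(
        "https://lists.gnu.org/archive/html/www-commits/2023-05/msg00082.html"
    ))
    # Illustrative path outside the disallowed prefix; still allowed.
    print(is_allowed("https://lists.gnu.org/archive/html/www-discuss/"))
```

The rule matches by path prefix, so every message under the www-commits archive would be covered by the single line, while other list archives stay crawlable.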

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Ian Kelling
Thérèse Godefroy writes: > Hello, > > I was searching for an article with DuckDuckGo, and guess what appeared > on top of the results... > https://lists.gnu.org/archive/html/www-commits/2023-05/msg00082.html > and > https://lists.gnu.org/archive/html/www-commits/2023-05/msg00062.html !! > > I

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Bob Proulx
Thérèse Godefroy wrote: > I understand it wouldn't be practical to make this list private, but it > could at least be off-limits for crawlers. I am not a member of the www team so I just don't know the answer to this question but why not? If the www team doesn't need the public web archive then

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Ineiev
On Fri, May 12, 2023 at 07:52:41PM +0200, Thérèse Godefroy wrote: > On 12/05/2023 at 19:24, Alfred M. Szmidt wrote: > > Might be worth noting that www.gnu.org is mostly usable locally from > > the CVS checkout as well, if one needs to look things over. > > > > It would work if the checked out

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Thérèse Godefroy
On 12/05/2023 at 19:24, Alfred M. Szmidt wrote: Might be worth noting that www.gnu.org is mostly usable locally from the CVS checkout as well, if one needs to look things over. It would work if the checked out files were complete HTML pages. Our problem is that they are not. The banner,

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Alfred M. Szmidt
That sounds reasonable to me. I would assume that if webmasters don't have access to change robots.txt, it would just be the fsf tech team. They do. It is in www. I'm still sorta opposing this .. we have always been transparent. That search engines "have no business" is as saying that

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Alfred M. Szmidt
Might be worth noting that www.gnu.org is mostly usable locally from the CVS checkout as well, if one needs to look things over. So a patch to www-discuss@ or whatever for an unpublished article would be sufficient in postponing any publishing. It would just be a matter of applying the patch, and

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Ian Kelling
Dora Scilipoti writes: > Hi, > > the gnu/server/staging directory was created years ago because GNU > webmasters expressed the wish to have a private space where they could > play around, test and see how different styles would look in the > website. Things like where to put the nav bar, which

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-12 Thread Dora Scilipoti
Hi, the gnu/server/staging directory was created years ago because GNU webmasters expressed the wish to have a private space where they could play around, test and see how different styles would look in the website. Things like where to put the nav bar, which colors to use, where to use h3s or

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-11 Thread Thérèse Godefroy
On 11/05/2023 at 19:43, Ineiev wrote: On Thu, May 11, 2023 at 10:17:52AM +0200, Thérèse Godefroy wrote: Instead, we could use the www-fr cvs repo for things that should stay unpublished. It is empty and is not linked anymore from the www-fr project page. It still shall be available for

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-11 Thread Ineiev
On Thu, May 11, 2023 at 10:17:52AM +0200, Thérèse Godefroy wrote: > > Instead, we could use the www-fr cvs repo for things that should stay > unpublished. It is empty and is not linked anymore from the www-fr > project page. It still shall be available for anonymous checkouts and at

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-11 Thread Thérèse Godefroy
On 11/05/2023 at 07:04, Alfred M. Szmidt wrote: [...] What about putting such articles / pages on fp? Or sharing them on some internal list? The difference between the staging area and other workspaces is that it acts as a preview. The unpublished article looks exactly as it will in its

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Alfred M. Szmidt
On 10/05/2023 at 20:50, Alfred M. Szmidt wrote: > > You've not explained the actual problem. What are you trying to > > solve? > > "it" is the www-commits list, which registers all changes to the www > directory, including to pages that are not published

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Corwin Brust
On Wed, May 10, 2023 at 2:04 PM Thérèse Godefroy wrote: > > On 10/05/2023 at 20:50, Alfred M. Szmidt wrote: > > > > The purpose of robots.txt is to avoid overloading a web site, it is > > not to disallow access to pages. > > > > This still does not explain what the problem is -- "don't let

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Thérèse Godefroy
On 10/05/2023 at 20:50, Alfred M. Szmidt wrote: > You've not explained the actual problem. What are you trying to > solve? "it" is the www-commits list, which registers all changes to the www directory, including to pages that are not published yet. I suspect most of the

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Alfred M. Szmidt
> You've not explained the actual problem. What are you trying to > solve? "it" is the www-commits list, which registers all changes to the www directory, including to pages that are not published yet. I suspect most of the other *-commits lists deal with source code repositories,

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Thérèse Godefroy
On 10/05/2023 at 19:38, Alfred M. Szmidt wrote: Because it registers every single commit to www, What is "it"? How is this different from _any_ -commits list we have? including to working directories that webmasters have disallowed, for instance */po/, /server/staging/,

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Alfred M. Szmidt
Because it registers every single commit to www, What is "it"? How is this different from _any_ -commits list we have? including to working directories that webmasters have disallowed, for instance */po/, /server/staging/, */workshop/, /prep/gnumaint/, etc. Ok, and? Please

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Thérèse Godefroy
On 10/05/2023 at 19:14, Alfred M. Szmidt wrote: I was searching for an article with DuckDuckGo, and guess what appeared on top of the results... https://lists.gnu.org/archive/html/www-commits/2023-05/msg00082.html and

Re: [Savannah-hackers-public] Please disallow www-commits in robots.txt

2023-05-10 Thread Alfred M. Szmidt
I was searching for an article with DuckDuckGo, and guess what appeared on top of the results... https://lists.gnu.org/archive/html/www-commits/2023-05/msg00082.html and https://lists.gnu.org/archive/html/www-commits/2023-05/msg00062.html !! What is the problem? I understand