bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-10 Thread pelzflorian (Florian Pelz)
Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.

Closing.  Thank you all!





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-08 Thread Ludovic Courtès
Hi,

"pelzflorian (Florian Pelz)"  skribis:

> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.

Yay!

> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
> the patch CC0.

Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.

> Shall I just push?  A reconfigure of berlin will be necessary but is
> not urgent.

Yes, sounds good!

We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.

Thanks,
Ludo’.





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-05 Thread YLC
Thank you for your help! Everything goes fine now.





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-05 Thread pelzflorian (Florian Pelz)
Hello all,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese.  (This is a minor issue.  The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)

The patch was tested on a berlin VM.

There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright.  I hereby license
the patch CC0.

Shall I just push?  A reconfigure of berlin will be necessary but is
not urgent.

Regards,
Florian

From: Florian Pelz
Date: Thu, 4 Mar 2021 20:29:27 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] nginx: berlin: Normalize Accept-Language language code zh to
 zh-CN.

Now web browsers requesting any kind of Chinese get the website in
mainland Chinese.

zh, zh-Hans, zh-Hans-CN all are synonymous with zh-CN now.

* hydra/nginx/berlin.scm (accept-languages): New procedure.
(%extra-content): Normalize $lang variable with it.
---
 hydra/nginx/berlin.scm | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 85aaf38..4b9d297 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -995,12 +995,37 @@ PUBLISH-URL."
(uri "~ /(.*)")
(body (list "return 301 $scheme://guixwl.org/$1;"
 
+(define (accept-languages language-lists)
+  "Returns nginx configuration code to set up the $lang variable
+according to the Accept-Language header in the HTTP request.  The
+requesting user agent will be served the files at /$lang/some/url.
+Each list in LANGUAGE-LISTS starts with the $lang and is followed by
+synonymous IETF language tags that should be mapped to the same $lang."
+  (define (language-mappings language-list)
+    (define (language-mapping language)
+      (string-join (list ""  language (car language-list) ";")))
+    (string-join (map language-mapping language-list) "\n"))
+
+  (let ((directives
+         `(,(string-join
+             `("set_from_accept_language $lang_unmapped"
+               ,@(map string-join language-lists)
+               ";"))
+           "map $lang_unmapped $lang {"
+           ,@(map language-mappings language-lists)
+           "}")))
+    (string-join directives "\n")))
+
 (define %extra-content
   (list
    "default_type  application/octet-stream;"
    "sendfile on;"
 
-   "set_from_accept_language $lang en de es fr zh-CN;"
+   (accept-languages '(("en")
+                       ("de")
+                       ("es")
+                       ("fr")
+                       ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))
 
    ;; Maximum chunk size to send.  Partly this is a workaround for
    ;; , but also the nginx docs mention that
-- 
2.30.1
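
To make the effect of the patch easier to see, here is a rough Python port of the accept-languages procedure as it appears in the diff above. It is only a sketch: the helper string_join is a hypothetical stand-in for Guile's string-join (whose default delimiter is a single space), and the exact whitespace of the generated nginx text depends on the string literals in the real berlin.scm, which may differ from this archived copy.

```python
def string_join(parts, delimiter=" "):
    """Stand-in for Guile's string-join; the default delimiter is one space."""
    return delimiter.join(parts)


def accept_languages(language_lists):
    """Build the nginx text that sets $lang from the Accept-Language header.

    Each inner list starts with the canonical $lang value, followed by
    synonymous language tags that should map to it.
    """
    def language_mappings(language_list):
        target = language_list[0]
        # One "source target ;" line per tag, including the canonical tag itself.
        return string_join(
            [string_join(["", language, target, ";"]) for language in language_list],
            "\n")

    directives = [
        # All tags, canonical and synonyms, are offered to the module.
        string_join(["set_from_accept_language $lang_unmapped",
                     *[string_join(tags) for tags in language_lists],
                     ";"]),
        "map $lang_unmapped $lang {",
        *[language_mappings(tags) for tags in language_lists],
        "}",
    ]
    return string_join(directives, "\n")


CONFIG = accept_languages([["en"], ["de"], ["es"], ["fr"],
                           ["zh-CN", "zh", "zh-Hans", "zh-Hans-CN"]])
print(CONFIG)
```

The printed map block contains one line per tag, so zh, zh-Hans and zh-Hans-CN all resolve to zh-CN while every canonical tag maps to itself.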



bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-04 Thread pelzflorian (Florian Pelz)
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.

I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):

diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
  ("de_DE" . "de")
  ("es_ES" . "es")
  ("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))

Note that the prior zh-cn URLs will be broken.

I will play around with nginx’ map directive to make zh-cn and zh
Accept-Language settings direct to the proper URL later, afterwards I
will close this bug.  zh-cn URLs remain invalid.  Links to the manual
continue to use zh-cn.

For testing I dug out the VM code

where I had removed parts of berlin that are not relevant to the
website.  The change breaks neither website nor manual.

Thanks ylc991 for the report!

Regards,
Florian





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-01 Thread pelzflorian (Florian Pelz)
Hello,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately?  Does that ring a bell?

Tobias’ analysis is likely correct.  I haven’t yet built a current
berlin virtual machine to test it, though.

We’re not normalizing language tags at all currently.  Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order.  The many lines

(redirect "/blog/2006/purely-functional-software-deployment-model" 
"/$lang/blog/2006/purely-functional-software-deployment-model/")

and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion.  I would not like one line
for each package.

Regards,
Florian





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-03-01 Thread Ludovic Courtès
Hello,

ylc991  skribis:

> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and
> https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.

Florian, could it be that we’re not normalizing language tags
appropriately?  Does that ring a bell?

Thanks for your report!

Ludo’.





bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-02-27 Thread Julien Lepiller
It might be related to translations. When you use zh-cn, we have a translation 
for that language, so you're redirected to it. Not sure why you get a 404 
though.

On 26 February 2021 21:18:12 GMT-05:00, ylc991 wrote:
>Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl,
>'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
>
>
>The first time I noticed it was on 2021-02-23, and it did not happen
>about one or two months ago. I think there may be something wrong with
>the web server.


bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-02-27 Thread Tobias Geerinckx-Rice via Bug reports for GNU Guix

Ylc991,

Thanks for the report!

My verbose notes so far; I need to (finally!) set up a local build 
of the Web site first.


ylc991 wrote:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
> default, and https://guix.gnu.org returns 404.


Indeed, handling of zh-CN specifically is broken.  :-(

--8<---cut here---start->8---
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
--8<---cut here---end--->8---

This is because our nginx configuration 
(maintenance/hydra/nginx/berlin.scm) does:


--8<---cut here---start->8---
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
--8<---cut here---end--->8---

i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...

--8<---cut here---start->8---
nckx@berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
--8<---cut here---end--->8---

...lowercase.  This questionable choice comes from 
artwork/po/ietf-tags.scm:


--8<---cut here---start->8---
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages.  The language tag results from the translation
;;; team’s language code from
;;; .  The underscore
;;; in the team’s code is replaced by a hyphen.  For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
("zh_CN" . "zh-cn"))
--8<---cut here---end--->8---

Questionable only because, while a lowercase region is technically 
valid, it's so rare that it's likely to cause problems -- as we 
found out.



> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]


These are valid, so the nginx accept-language module accepts them, 
but then looks for a subdirectory that doesn't exist and returns 
404.



> 'zh-cn' is 404


This is valid, but since we configure the accept-language module 
to use ‘zh-CN’ it normalises $lang to the latter.  Which is good, 
but it causes the same 404 as above.



> 'zh_CN' is 200.


This is bogus (‘_’ is not valid), hence ignored, and so the site 
falls back to English 200.



> 'zh' [is 200]


Valid but the accept-language module is not clever; we need to add 
an explicit 'zh' entry for that to work:


--8<---cut here---start->8---
set_from_accept_language $lang en de es fr zh-CN zh en;
--8<---cut here---end--->8---

I expect that adding it and changing ietf-tags.scm to use "zh-CN" 
will fix both 404s, but need to check that it doesn't break 
anything else.
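
The analysis above can be replayed with a small Python model. This is a simplified sketch of the behaviour described in this message, not the accept-language module's actual code: pick_lang and status are hypothetical names, matching is assumed to be case-insensitive with the configured spelling returned, the first configured language is assumed to be the fallback, and the directory lookup is assumed to be case-sensitive.

```python
def pick_lang(accept_language, supported):
    """Return the configured spelling of the first requested tag that is
    supported; otherwise fall back to the first supported tag."""
    by_lower = {tag.lower(): tag for tag in supported}
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()  # drop any ;q= weight
        if tag in by_lower:
            return by_lower[tag]
    return supported[0]


def status(accept_language, supported, directories):
    """Model try_files: 200 if /$lang/ exists (case-sensitive), else 404."""
    lang = pick_lang(accept_language, supported)
    return 200 if lang in directories else 404


# Configuration and on-disk layout as described in this message.
SUPPORTED = ["en", "de", "es", "fr", "zh-CN"]
DIRECTORIES = {"en", "de", "es", "fr", "zh-cn"}

REPORTED = {header: status(header, SUPPORTED, DIRECTORIES)
            for header in ["zh-CN,zh", "zh-CN", "zh-cn", "zh", "zh_CN"]}
print(REPORTED)
```

Under these assumptions the model reproduces the reported pattern: the three zh-CN spellings give 404, while 'zh' and 'zh_CN' fall back to English and give 200. Renaming the directory to zh-CN makes the zh-CN requests succeed in the model.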


The other untested solution is using lowercase

--8<---cut here---start->8---
set_from_accept_language $lang en de es fr zh-cn zh en;
--8<---cut here---end--->8---

but, assuming that even works, I'm not fond of making the
unconventional the norm.


Kind regards,

T G-R




bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'

2021-02-27 Thread ylc991
Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.

The first time I noticed it was on 2021-02-23, and it did not happen about one or two months ago. I think there may be something wrong with the web server.