Re: [PATCH]: ls: do not show long iso time format for en_* locales
Jim Meyering j...@meyering.net writes: However, I'm a little reluctant to change back. Let's wait a day or two, in case Paul Eggert has an objection. Here are some objections to the change, under the assumption that we're in a poorly-configured environment (as the behavior is unaffected in well-configured ones): * The change will cause column-alignment problems in non-English locales where %b generates different numbers of columns for different months. * There are more users in non-English locales than in non-C English locales, and the harm in the non-English case (incomprehensible dates) is much greater than the harm in the English case (comprehensible but ugly dates). * The existing code is better for the poorly-configured case where only one of the two translations is missing. Unless I'm missing something, I'd leave it alone, as it sounds like the change will cause more trouble than it'll cure.
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Pádraig Brady wrote: Jim Meyering wrote: The only advantage is that my patch uses the existing framework, rather than adding special case code in ls.c proper. Whether that is worth the apparent complexity... If you prefer his patch and want to adjust it and handle the rest, I have no objection. Yes it's debatable. I'll take a closer look at both and send an updated one if I think the special case in ls.c more appropriate. Thinking more about this I'm wondering about special casing en_* at all. The result of this patch is that for most people the usual timestamp format changes from 1 (ISO) to 3 fields (POSIX). So the first minor issue I have is that ISO has been the usual format for 4 years at least, so I suspect that this might trigger bugs in scripts parsing ls output. I do prefer the traditional POSIX specified format myself and I'm surprised that no one reported this until now. In summary I'm about 60:40 for making the change, and if we do I'll add appropriate text to NEWS. The other question I have is why do we assume ISO anyway when a format translation it not available? For example we've no translations for en_PH or tl_PH and so at the moment they'll get ISO format even though Tagalog month abbreviations are available: $ LANG=tl_PH locale abmon Ene;Peb;Mar;Abr;May;Hun;Hul;Ago;Sep;Okt;Nob;Dis Now if we do apply the special casing for en_* then you'll have different date formats for en_PH and tl_PH. Really the date format is associated with the country rather than the language. (Note I don't think we can determine whether abmon is specific to the locale or whether it's just the C default). So I think if we accept the first point above that we would change the default format to POSIX for most people I think we should just remove the code defaulting to ISO if a translation is not available ? I.E. revert part of 6837183d as follows: diff --git a/src/ls.c b/src/ls.c index 1bb6873..4531b94 100644 --- a/src/ls.c +++ b/src/ls.c @@ -2014,7 +2014,6 @@ decode_switches (int argc, char **argv) break; case long_iso_time_style: - case_long_iso_time_style: long_time_format[0] = long_time_format[1] = %Y-%m-%d %H:%M; break; @@ -2030,13 +2029,8 @@ decode_switches (int argc, char **argv) formats. If not, fall back on long-iso format. */ int i; for (i = 0; i 2; i++) - { -char const *locale_format = - dcgettext (NULL, long_time_format[i], LC_TIME); -if (locale_format == long_time_format[i]) - goto case_long_iso_time_style; -long_time_format[i] = locale_format; - } + long_time_format[i] = +dcgettext (NULL, long_time_format[i], LC_TIME); } } /* Note we leave %5b etc. alone so user widths/flags are honored. */
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Pádraig Brady wrote: Pádraig Brady wrote: Jim Meyering wrote: The only advantage is that my patch uses the existing framework, rather than adding special case code in ls.c proper. Whether that is worth the apparent complexity... If you prefer his patch and want to adjust it and handle the rest, I have no objection. Yes it's debatable. I'll take a closer look at both and send an updated one if I think the special case in ls.c more appropriate. Thinking more about this I'm wondering about special casing en_* at all. The result of this patch is that for most people the usual timestamp format changes from 1 (ISO) to 3 fields (POSIX). ISO is 2 fields actually :P So the first minor issue I have is that ISO has been the usual format for 4 years at least, so I suspect that this might trigger bugs in scripts parsing ls output. I do prefer the traditional POSIX specified format myself and I'm surprised that no one reported this until now. In summary I'm about 60:40 for making the change, and if we do I'll add appropriate text to NEWS. The other question I have is why do we assume ISO anyway when a format translation it not available? For example we've no translations for en_PH or tl_PH and so at the moment they'll get ISO format even though Tagalog month abbreviations are available: $ LANG=tl_PH locale abmon Ene;Peb;Mar;Abr;May;Hun;Hul;Ago;Sep;Okt;Nob;Dis Now if we do apply the special casing for en_* then you'll have different date formats for en_PH and tl_PH. Really the date format is associated with the country rather than the language. (Note I don't think we can determine whether abmon is specific to the locale or whether it's just the C default). So I think if we accept the first point above that we would change the default format to POSIX for most people I think we should just remove the code defaulting to ISO if a translation is not available ? The full patch is attached. cheers, Pádraig. From 2829fa07edc1ef3f550521b53999dc53c89ff215 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Mon, 28 Sep 2009 17:32:15 +0100 Subject: [PATCH] ls: use the POSIX date style when the locale does not specify one Previously we defaulted to long-iso format in locales without specific format translations, like the en_* locales for example. This reverts part of commit 6837183d, 08-11-2005, ls ... acts like --time-style='posix-long-iso' if the locale settings are messed up * src/ls.c (decode_switches): Only use the ISO format when specified. * NEWS: Mention the change in behavior. Reported at http://bugzilla.redhat.com/525134 --- NEWS |5 + src/ls.c | 10 ++ 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/NEWS b/NEWS index 1571c9c..0380975 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,11 @@ GNU coreutils NEWS-*- outline -*- last component (possibly via a dangling symlink) can be created, since mkdir will succeed in that case. + ls -l now uses the traditional three field time style rather than + the two field numeric ISO style, in locales where a specific style + has not been specified. This currently affects all English language + locales for example. [old behavior was introduced in coreutils-6.0] + ** Improvements rm: rewrite to use gnulib's fts diff --git a/src/ls.c b/src/ls.c index 1bb6873..4531b94 100644 --- a/src/ls.c +++ b/src/ls.c @@ -2014,7 +2014,6 @@ decode_switches (int argc, char **argv) break; case long_iso_time_style: - case_long_iso_time_style: long_time_format[0] = long_time_format[1] = %Y-%m-%d %H:%M; break; @@ -2030,13 +2029,8 @@ decode_switches (int argc, char **argv) formats. If not, fall back on long-iso format. */ int i; for (i = 0; i 2; i++) - { -char const *locale_format = - dcgettext (NULL, long_time_format[i], LC_TIME); -if (locale_format == long_time_format[i]) - goto case_long_iso_time_style; -long_time_format[i] = locale_format; - } + long_time_format[i] = +dcgettext (NULL, long_time_format[i], LC_TIME); } } /* Note we leave %5b etc. alone so user widths/flags are honored. */ -- 1.6.2.5
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Pádraig Brady wrote: The full patch is attached. ... That looks fine. Thanks. However, I'm a little reluctant to change back. Let's wait a day or two, in case Paul Eggert has an objection. Would you please add a test to exercise this, perhaps based on the one from Ondřej? From 2829fa07edc1ef3f550521b53999dc53c89ff215 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?P=C3=A1draig=20Brady?= p...@draigbrady.com Date: Mon, 28 Sep 2009 17:32:15 +0100 Subject: [PATCH] ls: use the POSIX date style when the locale does not specify one ...
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Jim Meyering wrote: Pádraig Brady wrote: Paul Eggert wrote: Ondřej Vašík ova...@redhat.com writes: as reported in https://bugzilla.redhat.com/show_bug.cgi?id=525134 by Daniel Qarras, ls -l shows iso long format for en_* locales. I just now read that Bugzilla report, and the diagnosis and the patch do not seem correct. The diagnosis says: In ls.c (case locale_time_style) is dcgettext (NULL, long_time_format[i], LC_TIME); ... that translates the string, but the translation is THE SAME as the default - as the format is the same for en_* locales. But that is not what the ls.c source code does. The code does this: char const *locale_format = dcgettext (NULL, long_time_format[i], LC_TIME); if (locale_format == long_time_format[i]) goto case_long_iso_time_style; The == test returns true when dcgettext returns the msgid (its 2nd argument) because it finds no translation. Right. We don't have any translations for en. This is key. Looking at Makefile.in.in, you can see that the file /usr/share/locale/en/LC_TIME/coreutils.mo is never installed, and neither is .../LC_MESSAGES/coreutils.mo, because coreutils has no po/en.po file. Pre-optimizers might argue that not installing the latter is a good thing, because gettext will then not waste time with the fstat+mmap+close that it normally performs after each successful .mo file open. Not to mention the cost of each subsequent failed message lookup. I try not to pre-optimize, so question whether the Makefile.in.in patch is worthwhile, but from an aesthetics point of view, I prefer not to install the LC_MESSAGES/coreutils.mo file. The core of the patch is this two-line addition: ++lang=en; for lc in '' $(EXTRA_LOCALE_CATEGORIES); do \ ++ test -n $$lc mv -f $(message_mo) $(lc_mo_file); done All of the rest is factoring. I'm leaning towards rewriting it to insert the non-factored equivalent. However, one advantage of using the patch is that it records the context: if we update to a newer gettext that changes the body of that rule, the patch will no longer apply, and that will make bootstrap fail. One alternative that could be used with the 3-line-insertion approach is to record (and always cross-check) a checksum of the rule contents. With this patch, in an en* locale, ls -l now uses the conventional date formats. Here's an incomplete patch. It needs a test and a NEWS entry. Ondřej, can you adjust your test to work (or skip) if there is no en* locale? From 53c88d531d88e5d4ef393a61758bc3fee4894e48 Mon Sep 17 00:00:00 2001 From: Jim Meyering meyer...@redhat.com Date: Sat, 26 Sep 2009 14:59:05 +0200 Subject: [PATCH] ls: use conventional date format in long listing for en* locales * bootstrap.conf: Generate po/en.po with identity translations for the two LC_TIME strings required by ls.c. * configure.ac: Require gettext version 0.17. * po/Makefile.in.in-patch: Patch autopoint-provided file so that it provides the LC_TIME .mo file, but not the LC_MESSAGES one. * po/.gitattributes: Allow space-before-TAB in the patch. --- bootstrap.conf | 41 + configure.ac|2 +- po/.gitattributes |1 + po/Makefile.in.in-patch | 65 +++ So that will apply generate an en.po with the traditional unix format to apply to en_* for e.g. $ locale -a | sed -n 's/\(en_..\).*/\1/p' | sort -u | while read LANG; do echo $LANG $(locale territory); done en_AG Antigua and Barbuda en_AU Australia en_BW Botswana en_CA Canada en_DK Denmark en_GB Great Britain en_HK Hong Kong en_IE Ireland en_IN India en_NG Nigeria en_NZ New Zealand en_PH Philippines en_SG Singapore en_US USA en_ZA South Africa en_ZW Zimbabwe Is that not functionally equivalent to Ondřej's patch which is much simpler and probably more efficient? His patch will also use an en_ translation if supplied (say if en_HK wanted a different format). cheers, Pádraig.
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Pádraig Brady wrote: ... So that will apply generate an en.po with the traditional unix format to apply to en_* for e.g. Right. $ locale -a | sed -n 's/\(en_..\).*/\1/p' | sort -u | while read LANG; do echo $LANG $(locale territory); done en_AG Antigua and Barbuda en_AU Australia en_BW Botswana ... Is that not functionally equivalent to Ondřej's patch Yes. However, ... which is much simpler and probably more efficient? His patch will also use an en_ translation if supplied his added setlocale call (unnecessary in a non-en_* locale) is not a plus. Though that could be fixed. (say if en_HK wanted a different format). The only advantage is that my patch uses the existing framework, rather than adding special case code in ls.c proper. Whether that is worth the apparent complexity... If you prefer his patch and want to adjust it and handle the rest, I have no objection.
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Pádraig Brady wrote: Paul Eggert wrote: Ondřej Vašík ova...@redhat.com writes: as reported in https://bugzilla.redhat.com/show_bug.cgi?id=525134 by Daniel Qarras, ls -l shows iso long format for en_* locales. I just now read that Bugzilla report, and the diagnosis and the patch do not seem correct. The diagnosis says: In ls.c (case locale_time_style) is dcgettext (NULL, long_time_format[i], LC_TIME); ... that translates the string, but the translation is THE SAME as the default - as the format is the same for en_* locales. But that is not what the ls.c source code does. The code does this: char const *locale_format = dcgettext (NULL, long_time_format[i], LC_TIME); if (locale_format == long_time_format[i]) goto case_long_iso_time_style; The == test returns true when dcgettext returns the msgid (its 2nd argument) because it finds no translation. Right. We don't have any translations for en. This is key. Looking at Makefile.in.in, you can see that the file /usr/share/locale/en/LC_TIME/coreutils.mo is never installed, and neither is .../LC_MESSAGES/coreutils.mo, because coreutils has no po/en.po file. Pre-optimizers might argue that not installing the latter is a good thing, because gettext will then not waste time with the fstat+mmap+close that it normally performs after each successful .mo file open. Not to mention the cost of each subsequent failed message lookup. I try not to pre-optimize, so question whether the Makefile.in.in patch is worthwhile, but from an aesthetics point of view, I prefer not to install the LC_MESSAGES/coreutils.mo file. The core of the patch is this two-line addition: ++ lang=en; for lc in '' $(EXTRA_LOCALE_CATEGORIES); do \ ++test -n $$lc mv -f $(message_mo) $(lc_mo_file); done All of the rest is factoring. I'm leaning towards rewriting it to insert the non-factored equivalent. However, one advantage of using the patch is that it records the context: if we update to a newer gettext that changes the body of that rule, the patch will no longer apply, and that will make bootstrap fail. One alternative that could be used with the 3-line-insertion approach is to record (and always cross-check) a checksum of the rule contents. With this patch, in an en* locale, ls -l now uses the conventional date formats. Here's an incomplete patch. It needs a test and a NEWS entry. Ondřej, can you adjust your test to work (or skip) if there is no en* locale? From 53c88d531d88e5d4ef393a61758bc3fee4894e48 Mon Sep 17 00:00:00 2001 From: Jim Meyering meyer...@redhat.com Date: Sat, 26 Sep 2009 14:59:05 +0200 Subject: [PATCH] ls: use conventional date format in long listing for en* locales * bootstrap.conf: Generate po/en.po with identity translations for the two LC_TIME strings required by ls.c. * configure.ac: Require gettext version 0.17. * po/Makefile.in.in-patch: Patch autopoint-provided file so that it provides the LC_TIME .mo file, but not the LC_MESSAGES one. * po/.gitattributes: Allow space-before-TAB in the patch. --- bootstrap.conf | 41 + configure.ac|2 +- po/.gitattributes |1 + po/Makefile.in.in-patch | 65 +++ 4 files changed, 108 insertions(+), 1 deletions(-) create mode 100644 po/.gitattributes create mode 100644 po/Makefile.in.in-patch diff --git a/bootstrap.conf b/bootstrap.conf index 726092c..c2e349d 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -320,10 +320,51 @@ tar- # Automake requires that ChangeLog exist. touch ChangeLog || exit 1 +# The following is to accommodate coreutils' use of LC_TIME. +# Two steps: +# - create a nearly empty po/en.po +# - patch po/Makefile.in.in to do what we want at install-time +# +# The existence of po/en.po ensures that $(localedir)/en/LC_TIME/coreutils.mo +# will be created at install time. That file is required by ls.c's +# dcgettext call, when run in an en* locale. +date=$(date '+%F %H:%M%z') +ver=$(cat .prev-version) +pkg=$(sed -n 's/AC_INIT(\[GNU \(.*\)\],.*/\1/p' configure.ac) + +cat EOF po/en.po +# Copyright 2009 Free Software Foundation, Inc. +# This file is distributed under the same license as the $pkg package. +msgid +msgstr +Project-Id-Version: $pkg-$ver\n +Report-Msgid-Bugs-To: bug-$...@gnu.org\n +POT-Creation-Date: $date\n +PO-Revision-Date: $date\n +Last-Translator: nobody\n +Language-Team: none\n +MIME-Version: 1.0\n +Content-Type: text/plain; charset=UTF-8\n +Content-Transfer-Encoding: 8bit\n + +msgid %b %e %Y +msgstr %b %e %Y + +msgid %b %e %H:%M +msgstr %b %e %H:%M +EOF + bootstrap_epilogue() { # Change paths in gnulib-tests/gnulib.mk from ../.. to ... m=gnulib-tests/gnulib.mk sed 's,\.\./\.\.,..,g' $m $m-t mv -f $m-t $m + + # Patch the autopoint-supplied po/Makefile.in.in to ensure that we + # install en/LC_TIME/coreutils.mo, but not,
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Paul Eggert wrote: Ondřej Vašík ova...@redhat.com writes: as reported in https://bugzilla.redhat.com/show_bug.cgi?id=525134 by Daniel Qarras, ls -l shows iso long format for en_* locales. I just now read that Bugzilla report, and the diagnosis and the patch do not seem correct. The diagnosis says: In ls.c (case locale_time_style) is dcgettext (NULL, long_time_format[i], LC_TIME); ... that translates the string, but the translation is THE SAME as the default - as the format is the same for en_* locales. But that is not what the ls.c source code does. The code does this: char const *locale_format = dcgettext (NULL, long_time_format[i], LC_TIME); if (locale_format == long_time_format[i]) goto case_long_iso_time_style; The == test returns true when dcgettext returns the msgid (its 2nd argument) because it finds no translation. If it found a translation, dcgettext would return a different string, so the == test would return false, and the code would use the translation. Even if the translation has the same _contents_ as the msgid, it will have a different _address_, so the code is correct as-is and does not need this modification. Ah, sorry... you are right, the address should be different, so the code is correct, I got confused somehow. Also, the proposed patch would use U.S. styles for all English locales, which certainly is not right. I suspect the diagnosis given by Jim Meyering in comment #3 at that bug report is correct, and that something is going wrong at install time. But as Pádraig wrote in the reply, there are no translation for en_* languages, so long iso style is used - which is imho wrong. The patch is fixing it (although it seems to be only a workaround and wrong approach). Better would be to have translations even for en_* locales in some cases - like this. This is better way than this workaround... Is it possible add those translations? Sorry for noise... Greetings, Ondřej signature.asc Description: Toto je digitálně podepsaná část zprávy
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Jim Meyering wrote: Here's an incomplete patch. It needs a test and a NEWS entry. Ondřej, can you adjust your test to work (or skip) if there is no en* locale? Maybe something like that (attachment)? It's checking for existence of locale binary and en_US locale and performing the test only if both exists. I guess en_US locale is better than random choosing of one en_* locale. Also NEWS entry added although it maybe needs some tweaks as usually. Greetings, Ondřej From 3e327dd41c55c5f25180c779a766549b90af59fa Mon Sep 17 00:00:00 2001 From: =?utf-8?q?Ond=C5=99ej=20Va=C5=A1=C3=ADk?= ova...@redhat.com Date: Fri, 25 Sep 2009 15:20:47 +0200 Subject: [PATCH] tests/misc/ls-time: ls shouldn't show long iso time format for en_* locales * tests/misc/ls-time: test if ls doesn't show long iso time format when en_US locale is present * NEWS: mention that ls no longer shows long iso time format for en_* locales --- NEWS |4 tests/misc/ls-time | 13 + 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/NEWS b/NEWS index 1571c9c..2cf8b33 100644 --- a/NEWS +++ b/NEWS @@ -16,6 +16,10 @@ GNU coreutils NEWS-*- outline -*- when the color of a more specific type is disabled. [bug introduced in coreutils-5.90] + ls -l now correctly shows en_* locale timestamps of files instead of + long iso format. + [bug introduced in coreutils-6.0] + ** Portability On Solaris 9, many commands would mistakenly treat file/ the same as diff --git a/tests/misc/ls-time b/tests/misc/ls-time index abdd429..39108f0 100755 --- a/tests/misc/ls-time +++ b/tests/misc/ls-time @@ -123,4 +123,17 @@ EOF fail=1 fi +# The output for english locale should differ from long iso format +# This failed between 6.0 and 7.7 +# Do the test only when locale binary and any en_US locale is available +if (locale --version) /dev/null 21; then + en_locale=`locale -a | grep ^en_US$` + if test -n $en_locale; then +LC_ALL=$en_locale ls -l c en_output +ls -l --time-style=long-iso c liso_output +compare en_output liso_output { fail=1; + echo Long format timestamp for $en_locale locale is same as for long iso. 12; } + fi +fi + Exit $fail -- 1.5.6.1.156.ge903b signature.asc Description: Toto je digitálně podepsaná část zprávy
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Ondřej Vašík ova...@redhat.com writes: as reported in https://bugzilla.redhat.com/show_bug.cgi?id=525134 by Daniel Qarras, ls -l shows iso long format for en_* locales. I just now read that Bugzilla report, and the diagnosis and the patch do not seem correct. The diagnosis says: In ls.c (case locale_time_style) is dcgettext (NULL, long_time_format[i], LC_TIME); ... that translates the string, but the translation is THE SAME as the default - as the format is the same for en_* locales. But that is not what the ls.c source code does. The code does this: char const *locale_format = dcgettext (NULL, long_time_format[i], LC_TIME); if (locale_format == long_time_format[i]) goto case_long_iso_time_style; The == test returns true when dcgettext returns the msgid (its 2nd argument) because it finds no translation. If it found a translation, dcgettext would return a different string, so the == test would return false, and the code would use the translation. Even if the translation has the same _contents_ as the msgid, it will have a different _address_, so the code is correct as-is and does not need this modification. Also, the proposed patch would use U.S. styles for all English locales, which certainly is not right. I suspect the diagnosis given by Jim Meyering in comment #3 at that bug report is correct, and that something is going wrong at install time.
Re: [PATCH]: ls: do not show long iso time format for en_* locales
Paul Eggert wrote: Ondřej Vašík ova...@redhat.com writes: as reported in https://bugzilla.redhat.com/show_bug.cgi?id=525134 by Daniel Qarras, ls -l shows iso long format for en_* locales. I just now read that Bugzilla report, and the diagnosis and the patch do not seem correct. The diagnosis says: In ls.c (case locale_time_style) is dcgettext (NULL, long_time_format[i], LC_TIME); ... that translates the string, but the translation is THE SAME as the default - as the format is the same for en_* locales. But that is not what the ls.c source code does. The code does this: char const *locale_format = dcgettext (NULL, long_time_format[i], LC_TIME); if (locale_format == long_time_format[i]) goto case_long_iso_time_style; The == test returns true when dcgettext returns the msgid (its 2nd argument) because it finds no translation. Right. We don't have any translations for en. I noticed this previously and assumed it was on purpose. I.E. we can override the default time style by providing an en translation for the time formats. Note one can't use LC_TIME=C to set the format, as that will also cause nl_langinfo to lookup the abbreviated months in that locale. I.E. the following would show english abbreviations: LC_TIME=C LANG=fr_FR.utf8 ls -l What I do is to set TIME_STYLE as follows: # traditional unix time format with abbreviated month translated from locale export TIME_STYLE='+%b %e %Y %b %e %H:%M' cheers, Pádraig.