Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-19 03:43:02 +0100, Vincent Lefevre wrote:
> On 2007-01-18 17:39:40 +0100, Bruno Haible wrote:
> > Vincent, do you have time to report that to the Apple people? No need
> > to mention 'ls' - a simple
> >   printf 'E\xcc\x81\t2nd column\nFoo\t2nd column\n'
> > should be all you need to demonstrate the bug. I'm not in such a good
> > position to report it, since I'm using an older version of MacOS X.
>
> Done. FYI, the ID is 4940781 (but since the bug reports are not public,
> I doubt this ID is useful). However I have reported several bugs for
> more than a year, and none of them are fixed.

Fixed by Apple.

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Vincent Lefevre [EMAIL PROTECTED] wrote:
> On 2007-01-19 01:23:44 +0100, Bruno Haible wrote:
> > Apple Terminal version 1.4.6, part of MacOS X 10.3.9, is affected.
>
> I forgot to say. This is still not fixed in Terminal 1.5 (133) from
> Mac OS X 10.4.8.

Thanks. I've checked this in:

  * coreutils.texi (ls: General output formatting): Mention the
  workarounds to accommodate the Apple Terminal bug.

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6fc6704..89e97d8 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -6419,6 +6419,13 @@ Assume that each tab stop is @var{cols} columns wide. The default is 8.
 @command{ls} uses tabs where possible in the output, for efficiency. If
 @var{cols} is zero, do not use tabs at all.

[EMAIL PROTECTED] FIXME: remove in 2009, if Apple Terminal has been fixed for long enough.
+Some terminal emulators (at least Apple Terminal 1.5 (133) from Mac OS X 10.4.8)
+do not properly align columns to the right of a TAB following a
[EMAIL PROTECTED] byte. If you use such a terminal emulator, use the
[EMAIL PROTECTED] option or put @code{TABSIZE=0} in your environment to tell
[EMAIL PROTECTED] to align using spaces, not tabs.
+
 @item -w
 @itemx [EMAIL PROTECTED]
 @opindex -w
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-15 21:05:53 -0600, Vincent Lefevre [EMAIL PROTECTED] said:
> Hi,
>
> Under Mac OS X 10.4.8 with ls (GNU coreutils) 5.97 (installed via
> MacPorts), in a 80-column terminal (uxterm), I get:
>
> $ ls
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890
>
> instead of:
>
> $ ls
> É                                y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890
>
> Note:
>
> $ locale
> LANG=POSIX
> LC_COLLATE=POSIX
> LC_CTYPE=en_US.UTF-8
> LC_MESSAGES=POSIX
> LC_MONETARY=POSIX
> LC_NUMERIC=POSIX
> LC_TIME=POSIX
> LC_ALL=POSIX/en_US.UTF-8/POSIX/POSIX/POSIX/POSIX
>
> Regards,

How to reproduce, please?

Does changing the Apple Terminal Window Settings aka Terminal Inspector
help? In particular, select the tab named Display, and try the first
three checkmarks under the Text Font section there. Sometimes the
Anti-Alias setting is enough to push the width of the character cell
over so that the rest of the printed line lines up properly. The next
two checkmarks are for wide glyphs; sometimes Terminal needs to be
fooled with these settings for accented chars anyway.

How does iTerm behave? They've been working on some enhancements of
their own (nevermind Apple ;) ).
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Vincent Lefevre wrote:
> Hmm... I forgot that ls was an alias (the same one on all my accounts).
> So, back on Mac OS X:
>
> prunille:~/blah> \ls -C --color=always | hexdump -C
> 00000000  1b 5b 30 30 6d 1b 5b 30  6d 45 cc 81 1b 5b 30 30  |.[00m.[0mE...[00|
> 00000010  6d 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |m               |
> 00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
> 00000030  1b 5b 30 6d 79 31 32 33  34 35 36 37 38 39 30 31  |.[0my12345678901|
> 00000040  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
> 00000050  38 39 30 1b 5b 30 30 6d  0a 1b 5b 30 6d 78 31 32  |890.[00m..[0mx12|
> 00000060  33 34 35 36 37 38 39 30  31 32 33 34 35 36 37 38  |3456789012345678|
> 00000070  39 30 31 32 33 34 35 36  37 38 39 30 1b 5b 30 30  |901234567890.[00|
> 00000080  6d 20 20 1b 5b 30 6d 7a  31 32 33 34 35 36 37 38  |m  .[0mz12345678|
> 00000090  39 30 31 32 33 34 35 36  37 38 39 30 31 32 33 34  |9012345678901234|
> 000000a0  35 36 37 38 39 30 1b 5b  30 30 6d 0a 1b 5b 6d     |567890.[00m..[m|
> 000000af

That makes - except for the escape sequences - an E, a combining accent
and 31 spaces. So it's the same bug as in "ls -C -T0".

> > I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
> > U+0301 is COMBINING ACUTE ACCENT. So here is the problem: MacOS'
> > wcwidth is buggy for combining characters like accents.
>
> OK. Can't autoconf detect that and use another implementation?

Yes. We can do that in gnulib. I'll work on this issue in the next few
weeks. Please remind us (on the bug-gnulib mailing list) in 1 or 2 months.

And, as we have seen, the other issue is that Apple Terminal has problems
estimating the width of tabs when there are non-ASCII characters. Since
you can start a telnet/ssh session from MacOS X to any platform (Linux,
Solaris, etc.), the fix needs to be platform independent. Here is such
a fix:

2007-01-18  Bruno Haible  [EMAIL PROTECTED]

  Avoid problems with tabs after non-ASCII characters in some
  terminals.
  * src/ls.c (nonascii_in_this_line): New variable.
  (quote_name): Update nonascii_in_this_line.
  (print_many_per_line, print_horizontal): Set nonascii_in_this_line
  to false at the beginning of each line.
  (indent): Use spaces for indentation when nonascii_in_this_line.

diff -c -3 -r1.447 ls.c
*** src/ls.c  2 Jan 2007 06:29:12 -0000  1.447
--- src/ls.c  18 Jan 2007 14:38:14 -0000
***************
*** 851,856 ****
--- 851,859 ----
     for the separating white space.  */
  #define MIN_COLUMN_WIDTH 3

+ /* True if some non-ASCII character has been output on this line.  */
+ static bool nonascii_in_this_line;
+
  /* This zero-based index is used solely with the --dired option.
     When that option is in effect, this counter is incremented for each
***************
*** 3704,3710 ****
      }

    if (out != NULL)
!     fwrite (buf, 1, len, out);
    if (width != NULL)
      *width = displayed_width;
    return len;
--- 3702,3722 ----
      }

    if (out != NULL)
!     {
!       /* Update nonascii_in_this_line indicator.  */
!       char const *p = buf;
!       char const *plimit = buf + len;
!
!       for (; p < plimit; p++)
!         if (!isascii (to_uchar (*p)))
!           {
!             nonascii_in_this_line = true;
!             break;
!           }
!
!       /* Actually output the quoted representation.  */
!       fwrite (buf, 1, len, out);
!     }
    if (width != NULL)
      *width = displayed_width;
    return len;
***************
*** 3957,3962 ****
--- 3969,3975 ----
        size_t pos = 0;

        /* Print the next row.  */
+       nonascii_in_this_line = false;
        while (1)
          {
            size_t name_length = length_of_file_name_and_frills (files + filesno);
***************
*** 3984,3989 ****
--- 3997,4004 ----
    size_t name_length = length_of_file_name_and_frills (files);
    size_t max_name_length = line_fmt->col_arr[0];

+   nonascii_in_this_line = false;
+
    /* Print first entry.  */
    print_file_name_and_frills (files);

***************
*** 3996,4001 ****
--- 4011,4017 ----
          {
            putchar ('\n');
            pos = 0;
+           nonascii_in_this_line = false;
          }
        else
          {
***************
*** 4047,4060 ****
  }

  /* Assuming cursor is at position FROM, indent up to position TO.
!    Use a TAB character instead of two or more spaces whenever possible.  */

  static void
  indent (size_t from, size_t to)
  {
    while (from < to)
      {
!       if (tabsize != 0 && to / tabsize > (from + 1) / tabsize)
          {
            putchar ('\t');
            from += tabsize - from % tabsize;
--- 4063,4085 ----
  }

  /* Assuming cursor is at position FROM, indent up to position TO.
!    Use a TAB character instead of two or more spaces whenever possible.
!    Depends on the TABSIZE option and on the current value of
!    NONASCII_IN_THIS_LINE.  */

  static void
  indent (size_t from, size_t to)
  {
    while (from
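The patch is cut off in the archive before the new body of indent(). As a
rough, self-contained sketch of the behaviour the ChangeLog entry describes
(TABs where possible, spaces only once a non-ASCII byte has been seen on the
line), something like the following could be used. The names has_nonascii
and indent_to_buffer and the avoid_tabs parameter are made up for this
illustration and are not taken from ls.c; only the TAB condition is the one
visible in the old version of indent() in the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from ls.c): report whether BUF, LEN bytes
   long, contains a byte outside the 7-bit ASCII range.  */
static bool
has_nonascii (char const *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) buf[i] > 0x7f)
      return true;
  return false;
}

/* Hypothetical variant of indent() that writes into OUT instead of
   stdout.  It appends the padding that moves the cursor from column
   FROM to column TO, using a TAB instead of two or more spaces when
   TABSIZE is nonzero and AVOID_TABS is false.  OUT must have room for
   TO - FROM + 1 bytes.  Returns the number of padding bytes written.  */
static size_t
indent_to_buffer (char *out, size_t from, size_t to,
                  size_t tabsize, bool avoid_tabs)
{
  size_t n = 0;

  while (from < to)
    {
      if (!avoid_tabs && tabsize != 0
          && to / tabsize > (from + 1) / tabsize)
        {
          out[n++] = '\t';
          from += tabsize - from % tabsize;
        }
      else
        {
          out[n++] = ' ';
          from++;
        }
    }
  out[n] = '\0';
  return n;
}
```

Called as indent_to_buffer (buf, 2, 33, 8, false), where 2 is the width ls
computes for the accented name when wcwidth counts the accent as one column,
this produces four TABs and one space, i.e. the 09 09 09 09 20 bytes seen in
the hexdumps elsewhere in this thread. With avoid_tabs set to the result of
has_nonascii() for the names printed so far on the line, or with tabsize 0,
it produces 31 spaces instead.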
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Bruno Haible [EMAIL PROTECTED] wrote:
> Vincent Lefevre wrote:
> ...
> > > I see that the first call to wcwidth() gives: wcwidth(0x0301) = 1.
> > > U+0301 is COMBINING ACUTE ACCENT. So here is the problem: MacOS'
> > > wcwidth is buggy for combining characters like accents.
> >
> > OK. Can't autoconf detect that and use another implementation?
>
> Yes. We can do that in gnulib. I'll work on this issue in the next few
> weeks. Please remind us (on the bug-gnulib mailing list) in 1 or 2
> months.

Thanks for volunteering to do that.

> And, as we have seen, the other issue is that Apple Terminal has
> problems estimating the width of tabs when there are non-ASCII
> characters. Since you can start a telnet/ssh session from MacOS X to
> any platform (Linux, Solaris, etc.), the fix needs to be platform
> independent. Here is such a fix:
>
> 2007-01-18  Bruno Haible  [EMAIL PROTECTED]
>
>   Avoid problems with tabs after non-ASCII characters in some
>   terminals.
>   * src/ls.c (nonascii_in_this_line): New variable.
>   (quote_name): Update nonascii_in_this_line.
>   (print_many_per_line, print_horizontal): Set nonascii_in_this_line
>   to false at the beginning of each line.
>   (indent): Use spaces for indentation when nonascii_in_this_line.

Thank you for working on this.

As I understand the goal, you'd like to make ls act differently
(outputting spaces, not TABs, for column alignment) on all systems
for each line containing a non-ASCII byte.

The proposed change in behavior would serve solely to make it so
columns line up better when displaying on a buggy Apple Terminal.

That change would contradict the documentation of -T, but more
importantly, it would make the output significantly larger when
there are wide columns and many lines containing a non-ASCII byte,
thus penalizing all users in order to cater to a buggy terminal
emulator.

I would rather simply have someone who cares about Apple Terminal
report the bug, and in the mean time, advise people to use -T0 (or set
TABSIZE=0 in their environment) if they care about alignment when
using a buggy version of that particular terminal emulator.
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Jim Meyering wrote:
> As I understand the goal, you'd like to make ls act differently
> (outputting spaces, not TABs, for column alignment) on all systems
> for each line containing a non-ASCII byte.

Yes, this is what the proposed patch does.

> That change would contradict the documentation of -T

The --color option also has the effect of turning tabs into spaces; yet
this is undocumented. Actually the doc states

  `ls' uses tabs where possible in the output, for efficiency. If COLS
  is zero, do not use tabs at all.

and the phrase "where possible" is vague enough. It is not possible to
use tabs with --color, and it is not possible to use tabs after
non-ASCII characters.

> but more importantly, it would make the output significantly larger
> when there are wide columns and many lines containing a non-ASCII
> byte, thus penalizing all users in order to cater to a buggy terminal
> emulator.

I thought with xterm, as with most terminal emulators, the network
transmit time is negligible compared to the rendering time on the X side.

Besides that, your argument trades correctness of display against
efficiency.

> I would rather simply have someone who cares about Apple Terminal
> report the bug, and in the mean time, advise people to use -T0 (or set
> TABSIZE=0 in their environment) if they care about alignment when
> using a buggy version of that particular terminal emulator.

Vincent, do you have time to report that to the Apple people? No need to
mention 'ls' - a simple
  printf 'E\xcc\x81\t2nd column\nFoo\t2nd column\n'
should be all you need to demonstrate the bug. I'm not in such a good
position to report it, since I'm using an older version of MacOS X.

Bruno
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Bruno Haible [EMAIL PROTECTED] wrote:
> Jim Meyering wrote:
> > As I understand the goal, you'd like to make ls act differently
> > (outputting spaces, not TABs, for column alignment) on all systems
> > for each line containing a non-ASCII byte.
>
> Yes, this is what the proposed patch does.
>
> > That change would contradict the documentation of -T
>
> The --color option also has the effect of turning tabs into spaces;
> yet this is undocumented. Actually the doc states
>
>   `ls' uses tabs where possible in the output, for efficiency. If COLS
>   is zero, do not use tabs at all.
>
> and the phrase "where possible" is vague enough. It is not possible to
> use tabs with --color, and it is not possible to use tabs after
> non-ASCII characters.

Um... it *is* possible to use TABs after non-ASCII bytes and get correct
alignment. The only requirement is that you be using a reasonable
(non-buggy) terminal emulator.

> > but more importantly, it would make the output significantly larger
> > when there are wide columns and many lines containing a non-ASCII
> > byte, thus penalizing all users in order to cater to a buggy
> > terminal emulator.
>
> I thought with xterm, as with most terminal emulators, the network
> transmit time is negligible compared to the rendering time on the X
> side.
>
> Besides that, your argument trades correctness of display against
> efficiency.

Not at all. I merely refuse to pessimize ls output for everyone,
solely to accommodate some currently buggy version of Apple Terminal.

> > I would rather simply have someone who cares about Apple Terminal
> > report the bug, and in the mean time, advise people to use -T0 (or
> > set TABSIZE=0 in their environment) if they care about alignment
> > when using a buggy version of that particular terminal emulator.

Do you really think it would be better to make everyone pay (even a tiny
bit) when there is such an easy work-around?
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Jim Meyering wrote:
> Um... it *is* possible to use TABs after non-ASCII bytes and get
> correct alignment. The only requirement is that you be using a
> reasonable (non-buggy) terminal emulator.

Yes, sure. I was only pointing out that the proposed change wouldn't need
a doc change, because the wording in the doc is already vague.

> > > in the mean time, advise people to use -T0 (or set TABSIZE=0 in
> > > their environment) if they care about alignment when using a buggy
> > > version of that particular terminal emulator.
>
> Do you really think it would be better to make everyone pay (even a
> tiny bit) when there is such an easy work-around?

Given that
  - Apple Terminal is the default/normal terminal emulator on MacOS X,
  - networking/pipe speed are not critical nowadays (in the times of
    internet radio and streaming video),
  - the bug was tricky enough to analyze, that an average user couldn't
    do it by himself,
I would say "yes" in this case.

Bruno
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Bruno Haible [EMAIL PROTECTED] wrote:
> > > > in the mean time, advise people to use -T0 (or set TABSIZE=0 in
> > > > their environment) if they care about alignment when using a
> > > > buggy version of that particular terminal emulator.
> >
> > Do you really think it would be better to make everyone pay (even a
> > tiny bit) when there is such an easy work-around?
>
> Given that
>   - Apple Terminal is the default/normal terminal emulator on MacOS X,
>   - networking/pipe speed are not critical nowadays (in the times of
>     internet radio and streaming video),
>   - the bug was tricky enough to analyze, that an average user couldn't
>     do it by himself,
> I would say "yes" in this case.

We disagree.
IMHO, it would be unwise to make such a global sacrifice for
a single, buggy, closed-source terminal emulator.

However, if someone tells me which version of Apple Terminal is
affected, I'll mention the work-around in the coreutils README file.
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Paul Eggert wrote:
> Long ago I regularly used terminal emulators that mishandled tabs.
> Eventually they got fixed (or I stopped using them).

Long ago I used terminals where the tab stops were customizable, and the
previous user had set them to weird values. At that time, I stopped using
tabs. :-)

Bruno
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-18 17:39:40 +0100, Bruno Haible wrote:
> The --color option also has the effect of turning tabs into spaces;
> yet this is undocumented. Actually the doc states
>
>   `ls' uses tabs where possible in the output, for efficiency. If COLS
>   is zero, do not use tabs at all.
>
> and the phrase "where possible" is vague enough. It is not possible to
> use tabs with --color, and it is not possible to use tabs after
> non-ASCII characters.

BTW, it shouldn't use tabs when the output does not correspond to
a terminal. For instance, the user may want to send the file by
mail or may want to indent it. Incorrect results can be obtained
if there are tabs.

A solution could be to have tabsize set to 0 by default. Users who
need 8 (or some other value) because of a slow network (without
compression, since a sequence of spaces should be compressed) could
change its value.

> Vincent, do you have time to report that to the Apple people? No need
> to mention 'ls' - a simple
>   printf 'E\xcc\x81\t2nd column\nFoo\t2nd column\n'
> should be all you need to demonstrate the bug. I'm not in such a good
> position to report it, since I'm using an older version of MacOS X.

Done. FYI, the ID is 4940781 (but since the bug reports are not public,
I doubt this ID is useful). However I have reported several bugs for
more than a year, and none of them are fixed.

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-19 01:23:44 +0100, Bruno Haible wrote:
> Apple Terminal version 1.4.6, part of MacOS X 10.3.9, is affected.

I forgot to say. This is still not fixed in Terminal 1.5 (133) from
Mac OS X 10.4.8.

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Eric Blake wrote:
> coreutils does not handle multi-byte locales well.

True,

> The problem is that no one has yet written a patch that makes it easy
> to handle multibyte locales without penalizing single-byte locales.

There are patches for multibyte locale support for many of the text
utilities, written in 2001. They are based on the mbchar and mbiter
modules that are now in gnulib.

But regardless how they were written, Jim preferred not to use them:

  - If the code used multibyte functions always, it was too much of a
    slowdown compared to the older implementation that worked only for
    unibyte locales. Everyone agreed on this.

  - If the code used an

      if (MB_CUR_MAX > 1)
        ... code which uses mb* functions ...
      else
        ... unibyte code ...

    Jim objected that there was too much code duplication between the
    multibyte and the unibyte branch.

  - If the code used macros that can expand to multibyte or unibyte
    primitives, depending on the situation, one could put the code that
    uses these macros into a separate file, say, fold-subroutines.h,
    and in the main fold.c write

      #define DO_MULTIBYTE 1
      #include "fold-subroutines.h"   /* defines fold_multibyte */
      #define DO_UNIBYTE 1
      #include "fold-subroutines.h"   /* defines fold_unibyte */

      if (MB_CUR_MAX > 1)
        fold_multibyte (...);
      else
        fold_unibyte (...);

    Here Jim said that it was too many macros for him.

There has been no progress since then, since no one sees how one can get
all 3 of Jim's requirements simultaneously:
  - Good speed for the unibyte case.
  - No code duplication.
  - No macros.

Bruno
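To make the second option in the list above concrete, here is a small
sketch of the "if (MB_CUR_MAX > 1) ... else ..." dispatch, applied to the
toy task of counting characters in a buffer. It is only an illustration
of the pattern and of the duplication it implies; the function names
count_unibyte, count_multibyte and count_chars are invented here and do
not come from coreutils or from the 2001 patches.

```c
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>   /* memset */
#include <wchar.h>    /* mbrtowc, mbstate_t */

/* Unibyte branch: every byte is one character.  */
static size_t
count_unibyte (char const *s, size_t n)
{
  (void) s;
  return n;
}

/* Multibyte branch: decode with mbrtowc in the current locale.
   An invalid or incomplete sequence is counted as one character per
   byte, which is an arbitrary choice made for this sketch.  */
static size_t
count_multibyte (char const *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  memset (&state, 0, sizeof state);
  while (n > 0)
    {
      size_t k = mbrtowc (NULL, s, n, &state);

      if (k == (size_t) -1 || k == (size_t) -2)
        {
          /* Skip one byte and restart from the initial shift state.  */
          k = 1;
          memset (&state, 0, sizeof state);
        }
      else if (k == 0)
        k = 1;                  /* an embedded NUL occupies one byte */

      s += k;
      n -= k;
      count++;
    }
  return count;
}

/* Dispatch once per call, so that unibyte locales keep the fast path.  */
static size_t
count_chars (char const *s, size_t n)
{
  return MB_CUR_MAX > 1 ? count_multibyte (s, n) : count_unibyte (s, n);
}
```

In a program that has called setlocale (LC_ALL, "") under a UTF-8 locale,
count_chars would take the mbrtowc path; without that call the program
runs in the "C" locale, where MB_CUR_MAX is 1 and the byte count is
returned directly. The two branches express the same loop twice, which is
the duplication objected to above.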
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Vincent Lefevre wrote:
> > Therefore: can you also show wrong behaviour when you set
> > LC_ALL=en_US.UTF-8 ?
>
> Yes:
>
> prunille:~/blah> export LC_ALL=en_US.UTF-8
> prunille:~/blah> locale
> LANG=POSIX
> LC_COLLATE=en_US.UTF-8
> LC_CTYPE=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
> LC_MONETARY=en_US.UTF-8
> LC_NUMERIC=en_US.UTF-8
> LC_TIME=en_US.UTF-8
> LC_ALL=en_US.UTF-8
> prunille:~/blah> ls
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890

On MacOS X 10.3.9 I can reproduce this.

Let's look at the hexdump of ls' output:
  1) In an Apple Terminal
  2) In an xterm, launched with "LC_ALL=en_US.UTF-8 xterm"
  3) In an xterm running on Linux, with an ssh to MacOS X
In all three cases the output of ls is the same:

$ LC_ALL=en_US.UTF-8 ls -C | hd
000000 45 CC 81 09 09 09 09 20 79 31 32 33 34 35 36 37  E...... y1234567
000010 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000020 34 35 36 37 38 39 30 0A 78 31 32 33 34 35 36 37  4567890.x1234567
000030 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000040 34 35 36 37 38 39 30 20 20 7A 31 32 33 34 35 36  4567890  z123456
000050 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32  7890123456789012
000060 33 34 35 36 37 38 39 30 0A                       34567890.

You see, it starts with E, the accent - on MacOS X, filenames are
represented in decomposed Unicode form -, 4 tabs and a space. So that
the second column of filenames should start in screen column 33 (where
the leftmost is screen column 0).

But the output in the terminal looks like this:

1) In an Apple Terminal

É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

2), 3)

É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

So what you see is that Apple Terminal has problems knowing the width of
combining characters like accents when it expands tabs.

If you tell 'ls' to emit spaces instead of tabs, like this:
  ls -C -T0
or
  TABSIZE=0 ls -C
then the output looks the same in all kinds of terminals.

Conclusion: What you see is not an ls bug, but an Apple Terminal bug
with tabs.

But there is an ls bug:

$ ls -C -T0
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
$ ls -C -T0 | hd
000000 45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20  E..
000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
000020 20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33    y1234567890123
000030 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000040 30 0A 78 31 32 33 34 35 36 37 38 39 30 31 32 33  0.x1234567890123
000050 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000060 30 20 20 7A 31 32 33 34 35 36 37 38 39 30 31 32  0  z123456789012
000070 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38  3456789012345678
000080 39 30 0A                                         90.

What 'ls' here outputs is: an E, a combining accent and 31 spaces - text
that moves to column 32, not 33.

When I set a breakpoint in wcwidth, I see that the first call to
wcwidth() gives: wcwidth(0x0301) = 1. U+0301 is COMBINING ACUTE ACCENT.
So here is the problem: MacOS' wcwidth is buggy for combining characters
like accents.

Bruno

(*) 'hd' is a shell script:
#!/bin/sh
hexdump -e '"%06.6_ax " 16/1 "%02X "' -e '" " 16/1 "%_p" "\n"' "$@"
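The column arithmetic in the analysis above can be checked with a small
model. The sketch below computes the screen column reached after printing
a UTF-8 byte string from column 0, with tab stops every 8 columns, and
lets the caller choose how many columns a combining mark occupies: 0 for
a terminal that handles U+0301 correctly, 1 to mimic the wcwidth result
reported for MacOS. It is deliberately simplified and rests on stated
assumptions: only U+0300..U+036F are treated as combining, every other
code point counts as one column, and the input is assumed to be valid
UTF-8. The names decode_utf8 and end_column are invented for this
illustration.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence starting at S into *CP and return its
   length in bytes.  Never reads past a terminating NUL.  Malformed
   input yields a meaningless code point (simplifying assumption).  */
static size_t
decode_utf8 (unsigned char const *s, unsigned long *cp)
{
  size_t len = 1;
  unsigned long c = s[0];

  if (c >= 0xF0)
    c &= 0x07;
  else if (c >= 0xE0)
    c &= 0x0F;
  else if (c >= 0xC0)
    c &= 0x1F;

  /* Fold in continuation bytes (10xxxxxx).  */
  while ((s[len] & 0xC0) == 0x80)
    {
      c = (c << 6) | (s[len] & 0x3Fu);
      len++;
    }
  *cp = c;
  return len;
}

/* Return the column reached after printing the NUL-terminated string S
   starting at column 0.  COMBINING_WIDTH is the number of columns
   charged for a code point in U+0300..U+036F.  */
static size_t
end_column (char const *s, int combining_width)
{
  unsigned char const *p = (unsigned char const *) s;
  size_t col = 0;

  while (*p != '\0')
    {
      unsigned long cp;

      p += decode_utf8 (p, &cp);
      if (cp == '\t')
        col += 8 - col % 8;     /* advance to the next tab stop */
      else if (cp >= 0x0300 && cp <= 0x036F)
        col += (size_t) combining_width;
      else
        col += 1;
    }
  return col;
}
```

For the bytes 45 CC 81 followed by four TABs and a space, the model gives
column 33 whether the accent is charged 0 or 1 columns, since the first
tab stop absorbs the difference. For 45 CC 81 followed by 31 spaces it
gives 32 with a zero-width accent and 33 with a one-column accent, which
is the off-by-one described above for "ls -C -T0".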
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-18 03:14:37 +0100, Bruno Haible wrote:
> Conclusion: What you see is not an ls bug, but an Apple Terminal bug
> with tabs.

I don't use the Apple Terminal (and never use it). As I said in my
bug report, I'm using uxterm here. More precisely:

prunille:~> uxterm -version
XFree86 4.3.99.903(184)

With the same uxterm, after a ssh to a Linux machine:

vin:~tmp/blah> LC_ALL=en_US.UTF-8 \ls -C | hd
00000000  45 cc 81 09 09 09 09 20  79 31 32 33 34 35 36 37  |E...... y1234567|
00000010  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000020  34 35 36 37 38 39 30 0a  78 31 32 33 34 35 36 37  |4567890.x1234567|
00000030  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000040  34 35 36 37 38 39 30 20  20 7a 31 32 33 34 35 36  |4567890  z123456|
00000050  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000060  33 34 35 36 37 38 39 30  0a                       |34567890.|
00000069
vin:~tmp/blah> LC_ALL=en_US.UTF-8 \ls -C
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

No problem.

Hmm... I forgot that ls was an alias (the same one on all my accounts).
So, back on Mac OS X:

prunille:~/blah> \ls
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
prunille:~/blah> \ls --color=always
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890
prunille:~/blah> \ls -C | hexdump -C
00000000  45 cc 81 09 09 09 09 20  79 31 32 33 34 35 36 37  |E...... y1234567|
00000010  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000020  34 35 36 37 38 39 30 0a  78 31 32 33 34 35 36 37  |4567890.x1234567|
00000030  38 39 30 31 32 33 34 35  36 37 38 39 30 31 32 33  |8901234567890123|
00000040  34 35 36 37 38 39 30 20  20 7a 31 32 33 34 35 36  |4567890  z123456|
00000050  37 38 39 30 31 32 33 34  35 36 37 38 39 30 31 32  |7890123456789012|
00000060  33 34 35 36 37 38 39 30  0a                       |34567890.|
00000069
prunille:~/blah> \ls -C --color=always | hexdump -C
00000000  1b 5b 30 30 6d 1b 5b 30  6d 45 cc 81 1b 5b 30 30  |.[00m.[0mE...[00|
00000010  6d 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |m               |
00000020  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000030  1b 5b 30 6d 79 31 32 33  34 35 36 37 38 39 30 31  |.[0my12345678901|
00000040  32 33 34 35 36 37 38 39  30 31 32 33 34 35 36 37  |2345678901234567|
00000050  38 39 30 1b 5b 30 30 6d  0a 1b 5b 30 6d 78 31 32  |890.[00m..[0mx12|
00000060  33 34 35 36 37 38 39 30  31 32 33 34 35 36 37 38  |3456789012345678|
00000070  39 30 31 32 33 34 35 36  37 38 39 30 1b 5b 30 30  |901234567890.[00|
00000080  6d 20 20 1b 5b 30 6d 7a  31 32 33 34 35 36 37 38  |m  .[0mz12345678|
00000090  39 30 31 32 33 34 35 36  37 38 39 30 31 32 33 34  |9012345678901234|
000000a0  35 36 37 38 39 30 1b 5b  30 30 6d 0a 1b 5b 6d     |567890.[00m..[m|
000000af

> But there is an ls bug:
>
> $ ls -C -T0
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890
> $ ls -C -T0 | hd
> 000000 45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20  E..
> 000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
> 000020 20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33    y1234567890123
> [...]

OK, so I think I was seeing this bug.

> What 'ls' here outputs is: an E, a combining accent and 31 spaces -
> text that moves to column 32, not 33.
>
> When I set a breakpoint in wcwidth, I see that the first call to
> wcwidth() gives: wcwidth(0x0301) = 1. U+0301 is COMBINING ACUTE
> ACCENT. So here is the problem: MacOS' wcwidth is buggy for combining
> characters like accents.

OK. Can't autoconf detect that and use another implementation?

> (*) 'hd' is a shell script:
> #!/bin/sh
> hexdump -e '"%06.6_ax " 16/1 "%02X "' -e '" " 16/1 "%_p" "\n"' "$@"

It's a bit like (or identical to) hexdump -C, then.

Regards,

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-15 22:29:41 -0800, Paul Eggert wrote:
> Most likely this has something to do with how mbrtowc and/or wcwidth
> behaves on MacOS X. Perhaps you can debug the quote_name function
> of 'ls' on the affected file name, and see why it's computing the
> width that it's computing?

First, do you know any freely available test suite for functions such
as mbrtowc and wcwidth? It would be easier to know where the problem is.

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Vincent Lefevre [EMAIL PROTECTED] writes:
> First, do you know any freely available test suite for functions such
> as mbrtowc and wcwidth? It would be easier to know where the problem
> is.

There are some tests in glibc. For most of them it should be possible to
run them standalone, too.

Andreas.

--
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
And now for something completely different.
Alignment bug in ls with UTF-8 filenames under Mac OS X
Hi,

Under Mac OS X 10.4.8 with ls (GNU coreutils) 5.97 (installed via
MacPorts), in a 80-column terminal (uxterm), I get:

$ ls
É                               y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

instead of:

$ ls
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

Note:

$ locale
LANG=POSIX
LC_COLLATE=POSIX
LC_CTYPE=en_US.UTF-8
LC_MESSAGES=POSIX
LC_MONETARY=POSIX
LC_NUMERIC=POSIX
LC_TIME=POSIX
LC_ALL=POSIX/en_US.UTF-8/POSIX/POSIX/POSIX/POSIX

Regards,

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
According to Vincent Lefevre on 1/15/2007 8:05 PM:
> Hi,
>
> Under Mac OS X 10.4.8 with ls (GNU coreutils) 5.97 (installed via
> MacPorts), in a 80-column terminal (uxterm), I get:
>
> $ ls
> É                               y123456789012345678901234567890
> x123456789012345678901234567890  z123456789012345678901234567890

This is yet another symptom of a much larger issue - namely, coreutils
does not handle multi-byte locales well. The problem is that no one has
yet written a patch that makes it easy to handle multibyte locales
without penalizing single-byte locales.

--
Don't work too hard, make some time for fun as well!

Eric Blake [EMAIL PROTECTED]
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
On 2007-01-15 20:13:02 -0700, Eric Blake wrote:
> According to Vincent Lefevre on 1/15/2007 8:05 PM:
> > Under Mac OS X 10.4.8 with ls (GNU coreutils) 5.97 (installed via
> > MacPorts), in a 80-column terminal (uxterm), I get:
> >
> > $ ls
> > É                               y123456789012345678901234567890
> > x123456789012345678901234567890  z123456789012345678901234567890
>
> This is yet another symptom of a much larger issue - namely, coreutils
> does not handle multi-byte locales well. The problem is that no one
> has yet written a patch that makes it easy to handle multibyte locales
> without penalizing single-byte locales.

But I don't have this problem under Linux (Debian). Note: with the
example above, one needs LC_COLLATE=en_US.UTF-8 so that the É comes
first.

$ ls
É                                y123456789012345678901234567890
x123456789012345678901234567890  z123456789012345678901234567890

In fact the problem seems to be due to the combining character under
Mac OS X. The filename É is encoded as 45 cc 81.

--
Vincent Lefèvre [EMAIL PROTECTED] - Web: http://www.vinc17.org/
100% accessible validated (X)HTML - Blog: http://www.vinc17.org/blog/
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Re: Alignment bug in ls with UTF-8 filenames under Mac OS X
Vincent Lefevre [EMAIL PROTECTED] writes:
> In fact the problem seems to be due to the combining character under
> Mac OS X. The filename É is encoded as 45 cc 81.

Most likely this has something to do with how mbrtowc and/or wcwidth
behaves on MacOS X. Perhaps you can debug the quote_name function
of 'ls' on the affected file name, and see why it's computing the
width that it's computing?