Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-28 Thread Chih-Hsuan Yen
2018-07-28 3:05 GMT+08:00 Paul Eggert:
> Bruno Haible wrote:
>> You can assume that mbrtowc returns 0 if and only if the multibyte sequence is a NUL byte - but you had chosen srcend in such a way that this would not happen in the loop.
>
> Thanks for the correction. I mistakenly

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-27 Thread Paul Eggert
Bruno Haible wrote:
> You can assume that mbrtowc returns 0 if and only if the multibyte sequence is a NUL byte - but you had chosen srcend in such a way that this would not happen in the loop.
Thanks for the correction. I mistakenly thought that C allows multibyte encodings in which a null

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-27 Thread Bruno Haible
Paul Eggert wrote:
> my earlier patch neglected the possibility that mbrtowc can return 0
I wouldn't see this as a bug: You can assume that mbrtowc returns 0 if and only if the multibyte sequence is a NUL byte - but you had chosen srcend in such a way that this would not happen in the loop.
>
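The point about srcend can be illustrated with a minimal sketch. This is not the patch discussed in the thread; `count_chars` is a hypothetical helper, and the choice to skip one byte on an encoding error is an assumption made for the example. Because srcend is set to the position of the terminating NUL, mbrtowc never sees a NUL byte inside the loop, so under the assumption stated above the zero-return branch is only defensive.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): count the characters in
   the NUL-terminated multibyte string SRC, counting each byte of an
   invalid or incomplete sequence as one character.  */
size_t
count_chars (const char *src)
{
  /* srcend points at the terminating NUL, so the range [src, srcend)
     contains no NUL byte.  */
  const char *srcend = src + strlen (src);
  mbstate_t state;
  size_t count = 0;

  memset (&state, 0, sizeof state);
  while (src < srcend)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, (size_t) (srcend - src), &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Encoding error or incomplete sequence: consume one byte
             and reset the conversion state.  */
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else if (n == 0)
        {
          /* Defensive only: a return of 0 means a NUL byte was
             converted, and the range excludes NUL bytes.  */
          n = 1;
        }
      src += n;
      count++;
    }
  return count;
}
```

The result depends on the current locale's encoding, so a caller would normally have called setlocale first.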

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-26 Thread Paul Eggert
Pádraig Brady wrote:
> I've pushed the c_iscntrl patch since it's simplest and probably most appropriate patch for an existing release.
Yes, that makes sense for a quick patch. However, for the next release I think it'd be better to catch encoding errors and multibyte control characters, given

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-26 Thread Pádraig Brady
On 26/07/18 02:01, Paul Eggert wrote:
> Chih-Hsuan Yen wrote:
>> How about following the idea from Pádraig Brady and filter \n only?
>
> Given the later comments it seems better to filter out encoding errors and control characters. Programs that parse the output already cannot trust the

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-26 Thread Bruno Haible
Paul Eggert wrote:
> Revised proposed patch(es) attached.
Looks good to me, except for one little thing:
> memcpy (dst, src, n);
src and dst may overlap. Therefore memmove should be used instead of memcpy.
Bruno
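A small sketch of why overlap matters, under the assumption that bytes are being shifted left inside a single buffer. `delete_range` is a hypothetical name, not a function from the patch. When the tail being moved is longer than the gap it moves across, source and destination overlap; memcpy has undefined behavior in that case, while memmove is specified to handle it.

```c
#include <string.h>

/* Hypothetical helper (not from the patch): delete LEN bytes starting
   at offset START from the NUL-terminated string S, in place.
   Assumes START + LEN <= strlen (S).  */
void
delete_range (char *s, size_t start, size_t len)
{
  size_t total = strlen (s);

  /* The tail, including the terminating NUL, is copied within the
     same buffer, so the regions may overlap: use memmove.  */
  memmove (s + start, s + start + len, total - start - len + 1);
}
```

For example, deleting 3 bytes at offset 3 from "abcXYZdefgh" moves the 6 bytes "defgh" plus the NUL onto a region that overlaps their old position.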

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-26 Thread Paul Eggert
Chih-Hsuan Yen wrote:
> How about following the idea from Pádraig Brady and filter \n only?
Given the later comments it seems better to filter out encoding errors and control characters. Programs that parse the output already cannot trust the strings to be exactly right, since newlines are

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-25 Thread Chih-Hsuan Yen
2018-07-23 5:40 GMT+08:00 Bruno Haible:
> Pádraig Brady wrote:
>> > This patch is correct (because the characters that you test for in c_iscntrl are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte character in the EUC-JP, EUC-KR, GB2312, EUC-TW,

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Pádraig Brady wrote:
> > This patch is correct (because the characters that you test for in c_iscntrl are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).
> > ... It might be worth
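The byte-range argument can be sketched as follows. This is an illustration only, not the pushed patch: `ascii_iscntrl` is a hypothetical stand-in for a locale-independent test over the range quoted above (0x00..0x1F and 0x7F), and replacing with '?' is an assumption borrowed from elsewhere in the thread. Since the test looks at single bytes and, per the message above, those values do not occur as a second or later byte in the listed encodings, a byte-wise pass does not split a multibyte character there. The same holds in UTF-8, where every byte of a multibyte sequence is 0x80 or above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a locale-independent control-character
   test: true only for 0x00..0x1F and 0x7F.  */
bool
ascii_iscntrl (int c)
{
  return (0 <= c && c <= 0x1F) || c == 0x7F;
}

/* Hypothetical helper: replace each ASCII control byte in the
   NUL-terminated string S with '?', leaving all other bytes alone.  */
void
hide_cntrl (char *s)
{
  for (; *s; s++)
    if (ascii_iscntrl ((unsigned char) *s))
      *s = '?';
}
```

Unlike iscntrl, this test does not consult the locale, so bytes of 0x80 and above are never classified as control characters.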

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Pádraig Brady wrote:
> but I did want to only avoid \n etc. that might cause issues for programs that parsed output from df on a line by line basis.
The current code (which uses iscntrl) also catches escape sequences that can cause weird output on the screen, in a terminal emulator. This is

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Paul Eggert
Pádraig Brady wrote: I did want to only avoid \n etc. that might cause issues for programs that parsed output from df on a line by line basis. This subset of control characters is safe to identify It seems problematic to start eliding improperly encoded mount points for example, rather than just

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Pádraig Brady
On 21/07/18 15:43, Bruno Haible wrote:
> Hi Pádraig,
>
>> I've attached a gnulib patch to document for iscntrl at least.
>
>> +This function does not support arguments outside of the range of the
>> +unsigned char type in locales with large character sets, on some platforms.
>> +OS X 10.5 will

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Pádraig Brady
On 22/07/18 08:12, Paul Eggert wrote:
> Pádraig Brady wrote:
>> I've also attached an alternative patch for df (in your name).
>
> That still has problems, since it can generate improperly-encoded strings in UTF-8 locales (if the inputs are improperly encoded), and can replace parts of

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 23:12 GMT+08:00 Paul Eggert:
> Pádraig Brady wrote:
>> I've also attached an alternative patch for df (in your name).
>
> That still has problems, since it can generate improperly-encoded strings in UTF-8 locales (if the inputs are improperly encoded), and can replace parts of

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Paul Eggert
Pádraig Brady wrote:
> I've also attached an alternative patch for df (in your name).
That still has problems, since it can generate improperly-encoded strings in UTF-8 locales (if the inputs are improperly encoded), and can replace parts of multibyte characters with '?' in non-UTF-8 locales.

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 18:46 GMT+08:00 Bruno Haible:
> Chih-Hsuan Yen wrote:
>> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me if you want me to test other patches, thanks!
>
> You could test how it behaves with mount points that contain U+2028 or U+2029 characters. On Linux, I'd

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Chih-Hsuan Yen wrote:
> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me if you want me to test other patches, thanks!
You could test how it behaves with mount points that contain U+2028 or U+2029 characters. On Linux, I'd test it like this. Hope it's similar on macOS:
$

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 6:43 GMT+08:00 Bruno Haible:
> Hi Pádraig,
>
>> I've attached a gnulib patch to document for iscntrl at least.
>
>> +This function does not support arguments outside of the range of the
>> +unsigned char type in locales with large character sets, on some platforms.
>> +OS X 10.5 will

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-21 Thread Bruno Haible
Hi Pádraig,
> I've attached a gnulib patch to document for iscntrl at least.
> +This function does not support arguments outside of the range of the
> +unsigned char type in locales with large character sets, on some platforms.
> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-21 Thread Pádraig Brady
On 21/07/18 07:20, Chih-Hsuan Yen wrote:
> Hi coreutils developers,
>
> I'm using coreutils on macOS High Sierra (10.13). I noticed that with `LANG=zh_TW.UTF-8`, `df` output is corrupted.
>
> �?�?系統 容�?? 已�?� �?��?� 已�?�% �??�?�?
> /dev/disk1s1234G 151G81G65% /
> /dev/disk1s4