Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 6:43 GMT+08:00 Bruno Haible: > Hi Pádraig, > >> I've attached a gnulib patch to document for iscntrl at least. > >> +This function does not support arguments outside of the range of the >> +unsigned char type in locales with large character sets, on some platforms. >> +OS X 10.5 will

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Paul Eggert
Pádraig Brady wrote: > I've also attached an alternative patch for df (in your name). That still has problems, since it can generate improperly-encoded strings in UTF-8 locales (if the inputs are improperly encoded), and can replace parts of multibyte characters with '?' in non-UTF-8 locales.

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Pádraig Brady
On 22/07/18 08:12, Paul Eggert wrote: > Pádraig Brady wrote: >> I've also attached an alternative patch for df (in your name). > > That still has problems, since it can generate improperly-encoded strings in > UTF-8 locales (if the inputs are improperly encoded), and can replace parts > of >

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 23:12 GMT+08:00 Paul Eggert: > Pádraig Brady wrote: >> >> I've also attached an alternative patch for df (in your name). > > > That still has problems, since it can generate improperly-encoded strings in > UTF-8 locales (if the inputs are improperly encoded), and can replace parts > of

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Paul Eggert
Pádraig Brady wrote: I did want to only avoid \n etc. that might cause issues for programs that parsed output from df on a line by line basis. This subset of control characters is safe to identify It seems problematic to start eliding improperly encoded mount points for example, rather than just

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Pádraig Brady
On 21/07/18 15:43, Bruno Haible wrote: > Hi Pádraig, > >> I've attached a gnulib patch to document for iscntrl at least. > >> +This function does not support arguments outside of the range of the >> +unsigned char type in locales with large character sets, on some platforms. >> +OS X 10.5 will

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Chih-Hsuan Yen wrote: > The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me > if you want me to test other patches, thanks! You could test how it behaves with mount points that contain U+2028 or U+2029 characters. On Linux, I'd test it like this. Hope it's similar on macOS: $

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Chih-Hsuan Yen
2018-07-22 18:46 GMT+08:00 Bruno Haible: > Chih-Hsuan Yen wrote: >> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me >> if you want me to test other patches, thanks! > > You could test how it behaves with mount points that contain U+2028 or > U+2029 characters. On Linux, I'd

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Pádraig Brady wrote: > but I did want to only avoid \n etc. that might cause issues for > programs that parsed output from df on a line by line basis. The current code (which uses iscntrl) also catches escape sequences that can cause weird output on the screen, in a terminal emulator. This is

Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS

2018-07-22 Thread Bruno Haible
Pádraig Brady wrote: > > This patch is correct (because the characters that you test for in c_iscntrl > > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a > > multibyte > > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings). > > ... It might be worth
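
The byte-range argument quoted above (only 0x00..0x1F and 0x7F are tested, and those values do not occur as later bytes of a multibyte character in the listed encodings) can be illustrated with a small sketch. This is not the patch discussed in the thread: the helper names `ascii_iscntrl` and `hide_ascii_controls` are hypothetical, and the check is written out by hand rather than calling gnulib's `c_iscntrl()`, on the assumption that it covers the same byte range as stated in the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: true only for the ASCII control bytes
   0x00..0x1F and 0x7F, regardless of the current locale. */
static int ascii_iscntrl(unsigned char c)
{
    return c <= 0x1F || c == 0x7F;
}

/* Hypothetical helper: replace each ASCII control byte in the
   NUL-terminated string s with '?', in place. Bytes >= 0x80, such as
   the bytes of a UTF-8 multibyte character, are never modified. */
static void hide_ascii_controls(char *s)
{
    for (; *s != '\0'; s++) {
        if (ascii_iscntrl((unsigned char) *s))
            *s = '?';
    }
}
```

Calling `hide_ascii_controls` on a buffer holding `"/mnt/a\nb"` would turn the newline into `?` while leaving every other byte alone. Because the test is purely byte-wise, it does not detect non-ASCII separators such as U+2028 or U+2029, which is the case Bruno Haible suggests testing separately.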