2018-07-28 3:05 GMT+08:00 Paul Eggert :
> Bruno Haible wrote:
>>
>> You can assume that mbrtowc returns
>> 0 if and only if the multibyte sequence is a NUL byte - but you had
>> chosen srcend in such a way that this would not happen in the loop.
>
>
> Thanks for the correction. I mistakenly
Bruno Haible wrote:
You can assume that mbrtowc returns
0 if and only if the multibyte sequence is a NUL byte - but you had
chosen srcend in such a way that this would not happen in the loop.
Thanks for the correction. I mistakenly thought that C allows multibyte
encodings in which a null
Paul Eggert wrote:
> my earlier patch
> neglected the possibility that mbrtowc can return 0
I wouldn't see this as a bug: You can assume that mbrtowc returns
0 if and only if the multibyte sequence is a NUL byte - but you had
chosen srcend in such a way that this would not happen in the loop.
>
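The point about mbrtowc return values can be illustrated with a small loop. This is only a sketch under stated assumptions, not code from any patch in this thread: the helper name count_mb_chars and its bad counter are hypothetical, and the policy of skipping one byte after an error is one possible choice among several. The n == 0 branch only matters when the scanned range can contain a NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread's patches): count the
   multibyte characters in the LEN bytes starting at SRC.  Each byte
   that cannot be decoded is counted as one unit and tallied in *BAD.  */
static size_t
count_mb_chars (const char *src, size_t len, size_t *bad)
{
  const char *srcend = src + len;
  mbstate_t state;
  size_t count = 0;

  assert (src != NULL && bad != NULL);
  memset (&state, 0, sizeof state);
  *bad = 0;

  while (src < srcend)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, (size_t) (srcend - src), &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Encoding error or incomplete sequence: reset the shift
             state and step over a single byte (an assumed policy).  */
          memset (&state, 0, sizeof state);
          ++*bad;
          n = 1;
        }
      else if (n == 0)
        {
          /* mbrtowc returned 0, i.e. it decoded the null character,
             which occupies one byte.  */
          n = 1;
        }

      src += n;
      count++;
    }
  return count;
}
```

If srcend is chosen so that no NUL byte lies before it, the n == 0 branch is never taken, which matches the observation quoted above.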
Pádraig Brady wrote:
I've pushed the c_iscntrl patch since it's the simplest
and probably the most appropriate patch for an existing release.
Yes, that makes sense for a quick patch. However, for the next release I think
it'd be better to catch encoding errors and multibyte control characters, given
On 26/07/18 02:01, Paul Eggert wrote:
> Chih-Hsuan Yen wrote:
>> How about following the idea from Pádraig Brady
>> and filter \n only?
>
> Given the later comments it seems better to filter out encoding errors and
> control characters. Programs that parse the output already cannot trust the
>
Paul Eggert wrote:
> Revised proposed patch(es) attached.
Looks good to me, except for one little thing:
> memcpy (dst, src, n);
src and dst may overlap. Therefore memmove should be used instead of memcpy.
Bruno
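The memmove remark can be shown with an in-place shift, where source and destination lie in the same buffer. This is a minimal sketch, assuming a filter that compacts a string in place; the helper shift_down and its parameters are hypothetical and do not come from the proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the proposed patch): copy the N bytes
   at BUF + FROM down to BUF + TO within the same buffer.  The two
   ranges may overlap, so memmove is required; memcpy has undefined
   behavior when its source and destination overlap.  */
static void
shift_down (char *buf, size_t to, size_t from, size_t n)
{
  assert (buf != NULL && to <= from);
  memmove (buf + to, buf + from, n);
}
```

For example, calling shift_down (buf, 2, 3, 5) on a buffer holding "ab\ncdef" moves "cdef" and its terminating NUL one position left, which removes the newline.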
Chih-Hsuan Yen wrote:
How about following the idea from Pádraig Brady
and filter \n only?
Given the later comments it seems better to filter out encoding errors and
control characters. Programs that parse the output already cannot trust the
strings to be exactly right, since newlines are
2018-07-23 5:40 GMT+08:00 Bruno Haible :
> Pádraig Brady wrote:
>> > This patch is correct (because the characters that you test for in
>> > c_iscntrl
>> > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a
>> > multibyte
>> > character in the EUC-JP, EUC-KR, GB2312, EUC-TW,
Pádraig Brady wrote:
> > This patch is correct (because the characters that you test for in c_iscntrl
> > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a
> > multibyte
> > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).
>
> ... It might be worth
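The byte-wise approach discussed above can be sketched as follows. This is an illustration only: ascii_iscntrl is a local stand-in written for this example that tests the same range quoted above (0x00..0x1F and 0x7F), it is not gnulib's c_iscntrl itself, and hide_ascii_controls is a hypothetical name, not the function in df.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a locale-independent control-character test.
   It covers the range quoted in the thread: 0x00..0x1F and 0x7F.  */
static int
ascii_iscntrl (unsigned char c)
{
  return c <= 0x1F || c == 0x7F;
}

/* Hypothetical helper: replace every ASCII control byte in the
   NUL-terminated string S with '?'.  Bytes >= 0x80 are left alone,
   so multibyte sequences made only of such bytes are not split.  */
static void
hide_ascii_controls (char *s)
{
  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (ascii_iscntrl ((unsigned char) *s))
      *s = '?';
}
```

Because the test never matches a byte of 0x80 or above, it leaves UTF-8 sequences intact, but it also does not detect encoding errors or non-ASCII control characters such as U+2028.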
Pádraig Brady wrote:
> but I did want to only avoid \n etc. that might cause issues for
> programs that parsed output from df on a line by line basis.
The current code (which uses iscntrl) also catches escape sequences
that can cause weird output on the screen, in a terminal emulator.
This is
Pádraig Brady wrote:
I did want to only avoid \n etc. that might cause issues for
programs that parsed output from df on a line by line basis.
This subset of control characters is safe to identify
It seems problematic to start eliding improperly encoded
mount points for example, rather than just
On 21/07/18 15:43, Bruno Haible wrote:
> Hi Pádraig,
>
>> I've attached a gnulib patch to document for iscntrl at least.
>
>> +This function does not support arguments outside of the range of the
>> +unsigned char type in locales with large character sets, on some platforms.
>> +OS X 10.5 will
On 22/07/18 08:12, Paul Eggert wrote:
> Pádraig Brady wrote:
>> I've also attached an alternative patch for df (in your name).
>
> That still has problems, since it can generate improperly-encoded strings in
> UTF-8 locales (if the inputs are improperly encoded), and can replace parts
> of
>
2018-07-22 23:12 GMT+08:00 Paul Eggert :
> Pádraig Brady wrote:
>>
>> I've also attached an alternative patch for df (in your name).
>
>
> That still has problems, since it can generate improperly-encoded strings in
> UTF-8 locales (if the inputs are improperly encoded), and can replace parts
> of
Pádraig Brady wrote:
I've also attached an alternative patch for df (in your name).
That still has problems, since it can generate improperly-encoded strings in
UTF-8 locales (if the inputs are improperly encoded), and can replace parts of
multibyte characters with '?' in non-UTF-8 locales.
2018-07-22 18:46 GMT+08:00 Bruno Haible :
> Chih-Hsuan Yen wrote:
>> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me
>> if you want me to test other patches, thanks!
>
> You could test how it behaves with mount points that contain U+2028 or
> U+2029 characters. On Linux, I'd
Chih-Hsuan Yen wrote:
> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me
> if you want me to test other patches, thanks!
You could test how it behaves with mount points that contain U+2028 or
U+2029 characters. On Linux, I'd test it like this. Hope it's similar
on macOS:
$
2018-07-22 6:43 GMT+08:00 Bruno Haible :
> Hi Pádraig,
>
>> I've attached a gnulib patch to document for iscntrl at least.
>
>> +This function does not support arguments outside of the range of the
>> +unsigned char type in locales with large character sets, on some platforms.
>> +OS X 10.5 will
Hi Pádraig,
> I've attached a gnulib patch to document for iscntrl at least.
> +This function does not support arguments outside of the range of the
> +unsigned char type in locales with large character sets, on some platforms.
> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8
On 21/07/18 07:20, Chih-Hsuan Yen wrote:
> Hi coreutils developers,
>
> I'm using coreutils on macOS High Sierra (10.13). I noticed that with
> `LANG=zh_TW.UTF-8`, `df` output is corrupted.
>
> �?�?系統 容�?? 已�?� �?��?� 已�?�% �??�?�?
> /dev/disk1s1234G 151G81G65% /
> /dev/disk1s4