Hi,
Stuart Henderson wrote on Sun, Jan 17, 2016 at 07:46:23PM +0000:
> On 2016/01/17 14:29, Ted Unangst wrote:
>> Ingo Schwarze wrote:
>>> The old ls(1) also weeded out non-printable bytes, in particular
>>> control codes.
>> The old ls only had this behavior for terminals however.
>> Redirecting to a file or pipe would always output the original bytes.
> I've used this a few times in the past, for example "ls | hexdump -C"
> or .."| vis", to find out what the characters used in some filename are.
> I'd find it surprising for this to not work.
Oops. What we currently have in the tree is broken in that respect,
including the -q option; i broke it.
Current behaviour is:

 * SMALL: fully works, but no UTF-8 support
 * not SMALL:
    - LC_CTYPE=C on a tty or with -q: does '?', ok
    - LC_CTYPE=en_US.UTF-8 on a tty or with -q: does '?', ok
    - LC_CTYPE=C neither tty nor -q: does '?', wrong
    - LC_CTYPE=en_US.UTF-8 neither tty nor -q: does '?', wrong

The following patch fixes the last two cases.
It is similar in spirit to what Martijn originally sent,
but fixes two issues with his patch:

 1) Do not invent a new global variable; use the existing f_nonprint.
 2) For valid, but non-printable codepoints, print all bytes of the
    codepoint's encoding rather than just the first byte.

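
To illustrate the intended behaviour, here is a rough, self-contained
sketch. It is not the patch itself: the function name sketch_filter()
and the nonprint parameter (standing in for the f_nonprint flag) are
made up for this example, and it writes into a caller-supplied buffer
instead of stdout so that it can be tested in isolation. It also uses
iswprint(3) as the printability test, which is an assumption of the
sketch.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, for illustration only.
 * Copy the file name mbs to out.  If nonprint is set, replace invalid
 * bytes and non-printable characters with '?'; otherwise pass the
 * original bytes through unchanged.
 * out needs room for strlen(mbs) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
static size_t
sketch_filter(char *out, const char *mbs, int nonprint)
{
	wchar_t	 wc;
	size_t	 n = 0;
	int	 len;

	(void)mbtowc(NULL, NULL, 0);	/* start from the initial state */
	while (*mbs != '\0') {
		len = mbtowc(&wc, mbs, MB_CUR_MAX);
		if (len == -1) {
			/* Invalid encoding: reset state, consume one byte. */
			(void)mbtowc(NULL, NULL, 0);
			out[n++] = nonprint ? '?' : *mbs;
			len = 1;
		} else if (!iswprint((wint_t)wc)) {
			/*
			 * Valid but non-printable: either one '?', or
			 * all bytes of the encoding, not just the first.
			 */
			if (nonprint) {
				out[n++] = '?';
			} else {
				memcpy(out + n, mbs, (size_t)len);
				n += (size_t)len;
			}
		} else {
			/* Printable character: copy its bytes verbatim. */
			memcpy(out + n, mbs, (size_t)len);
			n += (size_t)len;
		}
		mbs += len;
	}
	out[n] = '\0';
	return n;
}
```

With nonprint set (tty or -q), a control byte in a name comes out as
'?'. With nonprint unset (file or pipe), the original bytes come out
unchanged, which is what "ls | hexdump -C" relies on.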
Should i commit this?
Yours,
Ingo