I have various torture test utf8 in test/files/utf8. To see a trailing combining character eat a following space, you can do:
$ head -c 5 tests/files/utf8/test3.txt; echo ' ' Or $ head -c 5 tests/files/utf8/test1.txt; echo 'X' The problem is, it's easy to see visually, but I'm not quite sure how an automated test is expected to care? (Other than going "it produced the right output"... The OTHER fun thing is that -C and -x care about terminal size, or default to 80 columns when they can't determine it. In theory there's an ls -w option that lets you set the width, but I never needed/implemented it. (Andrew: you're welcome to if you like. I could walk you through it or you could read http://landley.net/toybox/code.html#lib_args and try to work it out yourself.) So if you did want to test this, you'd want to "touch" a test file with a funky utf8 name, make another test file with a long enough name to force minimal spacing, and then... do what? You'd just be proving they did or didn't fit on the same line. (Basically reproducing known/expected output. If they had one space between, these two would fit, with a minimum of two spaces, they'll wrap.) What _is_ a good test is "-C is utf8 fontmetrics, not a byte count". So if you do: $ mkdir sub $ cd sub $ touch $(cat ../tests/files/utf8/test3.txt) $ toybox ls The result fits on one line, when the file was 198 bytes long. (Note: if you don't put quotes around the "$()" it'll split it into 5 words because whitespace in the sentence. That said: $ wc -m ../tests/files/utf8/test3.txt 110 ../tests/files/utf8/test3.txt And I _visually_ count that as 22 characters. (That's the debian wc, not toybox saying that. Toybox matches debian here, but it seems conceptually wrong. It's like we want another option measuring combined characters. Maybe -M or something?) But testing how your terminal displays the result for "ls bridged the space or they're still visually distinct"...? I have no idea how to automate short of doing a screen capture in a known font and comparing the bitmap, which isn't remotely realistic. :) (There are, sadly, a lot of tests I know how to _perform_ but not _automate_. And the pending mkroot stuff merely _reduces_ that gap, not eliminate it.) Rob On 10/17/19 10:44 AM, enh via Toybox wrote: > new tests? (including the difficult combining-character case?) > > On Thu, Oct 17, 2019 at 8:35 AM Andrew Ilijic <ilijic.and...@gmail.com> wrote: >> >> From d4c03b672d8b18f7ce021e4fcb02e8bb86f38f5f Mon Sep 17 00:00:00 2001 >> From: Andrew Ilijic <ilijic.and...@gmail.com> >> Date: Thu, 17 Oct 2019 11:03:19 -0400 >> Subject: [PATCH] ls: Ensure file names are separated by 2 spaces >> >> We need two spaces between filenames because that is the convention >> followed by other implementations. More importantly, if we do not have >> two spaces, certain Unicode file names cause filenames to run together. >> In Unicode, combining characters come before the character they modify. >> If a filename ends in a combining character, the combining character >> attaches to the space that follows it, causing the space not to be >> visible. Having a two-space gap stops the above issue from happening. >> >> For context and a bit more information, see mailing list link below. >> https://www.mail-archive.com/toybox@lists.landley.net/msg05986.html >> --- >> toys/posix/ls.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/toys/posix/ls.c b/toys/posix/ls.c >> index 0956902f..d4c0211a 100644 >> --- a/toys/posix/ls.c >> +++ b/toys/posix/ls.c >> @@ -384,7 +384,8 @@ static void listfiles(int dirfd, struct dirtree *indir) >> memset(colsizes, 0, columns*sizeof(unsigned)); >> for (ul=0; ul<dtlen; ul++) { >> entrylen(sort[next_column(ul, dtlen, columns, &c)], len); >> - *len += totpad+1; >> + // Add `2` to `totpad` to ensure two spaces between filenames >> + *len += totpad+2; >> if (c == columns) break; >> // Expand this column if necessary, break if that puts us over >> budget >> if (*len > colsizes[c]) { >> -- >> 2.11.0 >> _______________________________________________ >> Toybox mailing list >> Toybox@lists.landley.net >> http://lists.landley.net/listinfo.cgi/toybox-landley.net > _______________________________________________ > Toybox mailing list > Toybox@lists.landley.net > http://lists.landley.net/listinfo.cgi/toybox-landley.net > _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net