Hi Martijn,

Martijn van Duren wrote on Sun, Jan 17, 2016 at 12:58:38PM +0100:

> I've come across a fair amount of malformed file names by all sorts
> of causes.  Be it malware or just human error.  When such a malformed
> character is in an inconvenient place and can't be auto-completed
> I usually fix this by something like the following:
>
> $ cd "`ls | tail -1`"
> ksh: cd: /home/martijn/Muziek/Mot????rhead/N?? Sleep at All - No such file
> or directory
> $ cd "`/usr/src/bin/ls/ls | tail -1`"

Why not just

 $ cd N*Sleep*

That seems simpler to me.
Sure, if you have more than one file in the same directory showing
this problem, and the names are too similar to be independently
globbed, and you want to keep them, it's a bit more work:

 $ i=1; for f in N*Sleep*; do mv "$f" "tmp$i"; i=$((i+1)); done

I don't see a real reason to use ls(1) in such situations.

> My patch maintains the question marks when stdout is a tty, but returns
> the original byte otherwise.  Afaik the only logical use for the length
> is when doing formatted output, which is only when printing to a tty.

The rationale for weeding out invalid bytes and non-printable
characters is not that we don't know how many display columns the
terminal might use to represent them.  The rationale is that they
might cause the terminal to change state, to interpret them as
in-band control codes.
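
To illustrate the kind of filtering meant here, a minimal sketch
follows.  It is not the code in ls(1): the name sanitize_name() is
hypothetical, and it assumes a plain ASCII view in which every byte
outside 0x20..0x7e is replaced, so valid UTF-8 is not preserved the
way mbsprint() tries to.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from ls(1): copy name into out,
 * replacing every byte outside printable ASCII (0x20..0x7e) with '?'.
 * Bytes >= 0x80 are replaced as well, which is an assumption made
 * to keep the sketch independent of the locale.
 * Output is truncated to fit outsz; returns the number of bytes replaced.
 */
size_t
sanitize_name(const char *name, char *out, size_t outsz)
{
	size_t i, replaced = 0;

	assert(name != NULL && out != NULL && outsz > 0);
	for (i = 0; name[i] != '\0' && i + 1 < outsz; i++) {
		unsigned char c = (unsigned char)name[i];

		if (c < 0x20 || c > 0x7e) {
			out[i] = '?';	/* control code or non-ASCII byte */
			replaced++;
		} else
			out[i] = (char)c;
	}
	out[i] = '\0';
	return replaced;
}
```

With the input "N\033[2JSleep", where ESC [ 2 J erases the display on
VT100-compatible terminals, only the ESC byte is replaced; the
remaining bytes are printable and no longer form a control sequence.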

> This doesn't solve the case when ls is run over ssh -t and the content
> is redirected client-side, but you can't win them all.

Indeed.  I worry that might result in security violations.

The old ls(1) also weeded out non-printable bytes, in particular
control codes.

Even though this question is only tangentially related to UTF-8:
Do we want to change that?  Should ls(1) sometimes pass through
bytes that might be control codes for some terminals?  That would
make the user responsible for ensuring that ls(1) output written
to a non-terminal never ends up on a terminal.
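
The choice being asked about can be modelled per byte.  The sketch
below is a simplification of the quoted patch, not the patch itself:
emit_byte() is a hypothetical name, and a printable-ASCII test stands
in for the mbtowc(3) and wcwidth(3) checks.

```c
#include <assert.h>

/*
 * Hypothetical per-byte model of the proposed behaviour:
 * printable ASCII is always passed through; any other byte becomes
 * '?' when stdout is a tty and is passed through raw otherwise.
 * istty is assumed to hold the result of isatty(STDOUT_FILENO).
 */
int
emit_byte(int istty, unsigned char raw)
{
	if (raw >= 0x20 && raw <= 0x7e)
		return raw;
	return istty ? '?' : raw;
}
```

In a pipeline such as "ls | cat", ls(1) sees a pipe, so istty is 0
and an ESC byte in a file name reaches the terminal unchanged.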

I tend to think "no".  I wouldn't want my ls(1) to sometimes generate
output that isn't safe on a terminal.  I doubt I would get into the
habit of considering whether or not ls(1) is safe to run in a given
situation.  I'd rather have it just be safe, always.  And deal with
the occasional insane file name with different tools.

What do people think?
  Ingo


> Index: ls.c
> ===================================================================
> RCS file: /cvs/src/bin/ls/ls.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 ls.c
> --- ls.c      1 Dec 2015 18:36:13 -0000       1.44
> +++ ls.c      17 Jan 2016 10:57:03 -0000
> @@ -94,6 +94,7 @@ int f_type;                 /* add type character for 
>  int f_typedir;                       /* add type character for directories */
>  
>  int rval;
> +int istty = 0;
>  
>  int
>  ls_main(int argc, char *argv[])
> @@ -110,6 +111,7 @@ ls_main(int argc, char *argv[])
>  
>       /* Terminal defaults to -Cq, non-terminal defaults to -1. */
>       if (isatty(STDOUT_FILENO)) {
> +             istty = 1;
>               if ((p = getenv("COLUMNS")) != NULL)
>                       width = strtonum(p, 1, INT_MAX, NULL);
>               if (width == 0 &&
> Index: utf8.c
> ===================================================================
> RCS file: /cvs/src/bin/ls/utf8.c,v
> retrieving revision 1.1
> diff -u -p -r1.1 utf8.c
> --- utf8.c    1 Dec 2015 18:36:13 -0000       1.1
> +++ utf8.c    17 Jan 2016 10:57:03 -0000
> @@ -21,6 +21,8 @@
>  #include <stdlib.h>
>  #include <wchar.h>
>  
> +extern int istty;
> +
>  int
>  mbsprint(const char *mbs, int print)
>  {
> @@ -33,12 +35,12 @@ mbsprint(const char *mbs, int print)
>               if ((len = mbtowc(&wc, mbs, MB_CUR_MAX)) == -1) {
>                       (void)mbtowc(NULL, NULL, MB_CUR_MAX);
>                       if (print)
> -                             putchar('?');
> +                             putchar(istty ? '?' : *mbs);
>                       total_width++;
>                       len = 1;
>               } else if ((width = wcwidth(wc)) == -1) {
>                       if (print)
> -                             putchar('?');
> +                             putchar(istty ? '?' : *mbs);
>                       total_width++;
>               } else {
>                       if (print)
