Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Albretch Mueller
On 12/11/23, Greg Wooledge  wrote:
> 1) Many implementations of echo will interpret parts of their argument(s),
>    in addition to processing options like -n.  If you want to print a
>    variable's contents to standard output without *any* interpretation,
>    use printf.
>
> printf %s "$myvar"
> printf '%s\n' "$myvar"
>

 I will use "printf ..." from now on.

> 2) As tomas already told you, the square brackets in
>
> tr -c -s '[A-Za-z0-9.]' _
>
>    are literal.  You're using a command which will keep left and right
>    square brackets in the input, *not* replacing them with underscores.
>    This may not be what you want.

 My mistake, even though it didn't get in the way of what I was trying
to do. I replaced :alnum: by what I thought it meant and left the
brackets.

> 3) In locales other than C or POSIX, ranges like A-Z are *not* necessarily
>    synonyms for [:upper:].  As I've already mentioned, GNU tr is known to
>    contain bugs, so you're getting lucky here.  The bugs in GNU tr happen
>    to work the way you're expecting, so that A-Z is treated like [:upper:]
>    when it should not be.  If at some point in the future GNU tr is fixed
>    to conform to POSIX, your script may break.
>
>    The correct tr command you should be using if you want to retain
>    accented letters (as defined in your locale) is:
>
> tr -c -s '[:alnum:].' _
>
>    If you want to discard accented letters, then either of these is OK:
>
> LC_COLLATE=C tr -c -s '[:alnum:].' _
> LC_COLLATE=C tr -c -s 'A-Za-z0-9.' _
>

 I like your second one liner much better (LC_COLLATE=C tr -c -s 'A-Za-z0-9.' _)

 I tend to avoid '[:alnum:].' because the intended meaning of
"ALphabetic et NUMeric" characters, even though it depends on the
locale, has a strong ASCII accent to it.

> Thus, we come full circle.

 Yes, we did. Thank you, lbrtchx



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread debian-user
Albretch Mueller  wrote:
> echo "abc123" > file.txt
> ftype=$(file --brief file.txt)
> echo "// __ \$ftype: |${ftype}|"
> ftypelen=${#ftype}
> echo "// __ \$ftypelen: |${ftypelen}|"
> 
> # removing spaces ...
> ftype2=$(echo "${ftype}" | tr --complement --squeeze-repeats
> '[A-Za-z0-9.]' '_');
> echo "// __ \$ftype2: |${ftype2}|"
> ftype2len=${#ftype2}
> echo "// __ \$ftype2len: |${ftype2len}|"
> 
> lbrtchx

Short answer. tr doesn't append anything. echo does output a linefeed
at the end of the string, unless you stop it. tr dutifully translates
that to an underscore.
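
A minimal sketch of that short answer, assuming a POSIX shell and the usual tr. The variable names are made up for illustration, and LC_ALL=C is added only so the ranges mean plain ASCII:

```shell
ftype='ASCII text'

# echo appends a newline; tr -c treats it like any other character outside
# the set and turns it into an underscore.
with_echo=$(echo "$ftype" | LC_ALL=C tr -cs 'A-Za-z0-9.' '_')

# printf %s writes the string and nothing else, so there is nothing extra
# for tr to translate.
with_printf=$(printf %s "$ftype" | LC_ALL=C tr -cs 'A-Za-z0-9.' '_')

printf '%s\n' "echo:   |$with_echo|" "printf: |$with_printf|"
```

The first line should end in an underscore and the second should not.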



Re: "echo" literally in sh scripts (was: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...)

2023-12-11 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 10:16:35AM -0500, Stefan Monnier wrote:
> > 1) Many implementations of echo will interpret parts of their argument(s),
> >    in addition to processing options like -n.  If you want to print a
> >    variable's contents to standard output without *any* interpretation,
> >    use printf.
> >
> > printf %s "$myvar"
> > printf '%s\n' "$myvar"
> 
> Interesting.  I used the following instead:
> 
> bugit_echo () {
> # POSIX `echo` has all kinds of "features" we don't want, such as
> # handling of \c and -n.
> cat <<ENDDOC
> $*
> ENDDOC
> }

That requires an external command (one fork), plus whatever overhead is
used by the << implementation (temp file or pipe, depending on shell and
version).  It's not wrong, but an implementation using nothing but
builtins is usually preferable.

echo() { printf '%s\n' "$*"; }

It's also worth mentioning that both of these rely on the expansion of $*
with a default or nearly-default IFS variable.  If you want it to work
when IFS may have been altered, you can do this in bash:

echo() { local IFS=' '; printf '%s\n' "$*"; }

In sh, you'd need to fork a subshell:

echo() { (IFS=' '; printf '%s\n' "$*"); }

Or if you're a golfer:

echo() (IFS=' '; printf '%s\n' "$*")

I *really* dislike that syntax, but that's just me.  Some people use it.
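
To see why IFS matters here, a small sketch. The names say_plain and say_safe are hypothetical and only stand in for the two variants above; the point is that "$*" joins the arguments with the first character of IFS:

```shell
# Hypothetical helper names; neither appears in the thread.
say_plain() { printf '%s\n' "$*"; }
say_safe()  { (IFS=' '; printf '%s\n' "$*"); }

IFS=:
plain=$(say_plain one two three)   # joined with ':' because IFS starts with ':'
safe=$(say_safe one two three)     # joined with ' ' regardless of the caller's IFS
unset IFS                          # back to default field splitting

printf '%s\n' "plain: $plain" "safe:  $safe"
```

The subshell form costs a fork, as noted, but it works in any POSIX sh.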



"echo" literally in sh scripts (was: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...)

2023-12-11 Thread Stefan Monnier
> 1) Many implementations of echo will interpret parts of their argument(s),
>    in addition to processing options like -n.  If you want to print a
>    variable's contents to standard output without *any* interpretation,
>    use printf.
>
> printf %s "$myvar"
> printf '%s\n' "$myvar"

Interesting.  I used the following instead:

bugit_echo () {
# POSIX `echo` has all kinds of "features" we don't want, such as
# handling of \c and -n.
cat <<ENDDOC
$*
ENDDOC
}

Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread tomas
On Mon, Dec 11, 2023 at 09:55:54AM -0500, Greg Wooledge wrote:

[...]

Greg, your analyses are always impressive. And enjoyable.

Thanks for this

cheers
-- 
t




Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 02:00:49PM +, Albretch Mueller wrote:
>  Ach, yes! I forgot echo by default appends a new line character at
> the end of every string it spits out. In order to suppress it you need
> to use the "n" option: "echo -n ..."
> 
> _FL_TYPE="   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡ §
> ASCII  ä ö ü ß Ä Ö Ü Text"
> echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"
> _FL_TYPE=$(echo "${_FL_TYPE}" | xargs)
> echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"
> _FL_TYPE=$(echo -n "${_FL_TYPE}" |  tr --complement --squeeze-repeats
> '[A-Za-z0-9.]' '_');
> echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"
> 
> // __ $_FL_TYPE: |   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere
> ¿ ¡ § ASCII  ä ö ü ß Ä Ö Ü Text|
> // __ $_FL_TYPE: |abc á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡
> § ASCII ä ö ü ß Ä Ö Ü Text|
> // __ $_FL_TYPE: |abc_123_birdie_here_ASCII_Text|

OK.  Tomas's analysis was better than mine in this case.  Looks like CR
was not the issue this time around.  I do have some comments, though.

1) Many implementations of echo will interpret parts of their argument(s),
   in addition to processing options like -n.  If you want to print a
   variable's contents to standard output without *any* interpretation,
   use printf.

printf %s "$myvar"
printf '%s\n' "$myvar"

2) As tomas already told you, the square brackets in

tr -c -s '[A-Za-z0-9.]' _

   are literal.  You're using a command which will keep left and right
   square brackets in the input, *not* replacing them with underscores.
   This may not be what you want.

3) In locales other than C or POSIX, ranges like A-Z are *not* necessarily
   synonyms for [:upper:].  As I've already mentioned, GNU tr is known to
   contain bugs, so you're getting lucky here.  The bugs in GNU tr happen
   to work the way you're expecting, so that A-Z is treated like [:upper:]
   when it should not be.  If at some point in the future GNU tr is fixed
   to conform to POSIX, your script may break.

   The correct tr command you should be using if you want to retain
   accented letters (as defined in your locale) is:

tr -c -s '[:alnum:].' _

   If you want to discard accented letters, then either of these is OK:

LC_COLLATE=C tr -c -s '[:alnum:].' _
LC_COLLATE=C tr -c -s 'A-Za-z0-9.' _

4) The xargs command, which you used above, uses quotation mark characters
   as well as whitespace to define input words.  Your example worked only
   because your input does not contain any single or double quotes.
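
A quick sketch of point 1, assuming a POSIX shell. The value -n is just a convenient troublemaker; what echo does with it differs between implementations, so only the printf result is something to rely on:

```shell
myvar='-n'

# Most echo implementations treat a lone -n as an option and print nothing;
# the exact behaviour is implementation-specific, so it is not relied on here.
from_echo=$(echo "$myvar")

# printf with a %s format prints the argument verbatim.
from_printf=$(printf '%s\n' "$myvar")

printf 'echo gave:   |%s|\n' "$from_echo"
printf 'printf gave: |%s|\n' "$from_printf"
```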

Here's a demonstration of A-Z not equating to [:upper:] using GNU sed,
which is behaving correctly:

unicorn:~$ x='   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ '
unicorn:~$ printf '%s\n' "$x" | sed 's/[A-Z]//g'
   abc  á é í ó ú ü ñ123 birdiehere ¿ 
unicorn:~$ printf '%s\n' "$x" | LC_COLLATE=C sed 's/[A-Z]//g'
   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ 

The meaning of [A-Z] in the sed command depends on the locale.  In my
locale, which is en_US.utf8, characters like Á are part of the A-Z range.
In the C locale, they aren't, as seen in the last command above.

The use of [A-Z] in regular expressions and globs is a *very* heavily
debated topic, and I'm only scratching the surface here.  Honestly, you
really should avoid using it.  It's just too unpredictable.

Here's an example of xargs failing when its input contains a quote:

unicorn:~$ echo 'foo "bar' | xargs
xargs: unmatched double quote; by default quotes are special to xargs 
unless you use the -0 option
foo

You can't use xargs to normalize whitespace safely.  In fact, the proper
way to normalize whitespace is...

unicorn:~$ printf 'foo "bar \t\t \t  baz  \n' | tr -s ' \t' ' '
foo "bar baz 

Thus, we come full circle.
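
Note that tr -s leaves one space at each end if the input started or ended with whitespace. If those should go too, something along these lines might do; normalize_ws is a hypothetical name and the sed step is an addition, not part of the command above:

```shell
# normalize_ws is a hypothetical helper name.
normalize_ws() {
    # tr squeezes every run of spaces and tabs into one space;
    # sed then drops the single space that may remain at either end.
    printf '%s\n' "$1" | tr -s ' \t' ' ' | sed -e 's/^ //' -e 's/ $//'
}

tab=$(printf '\t')
normalized=$(normalize_ws "  foo \"bar ${tab}${tab}  baz  ")
printf '|%s|\n' "$normalized"
```

Unlike xargs, this does not care about the unmatched double quote in the input.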



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Max Nikulin

On 11/12/2023 21:00, Albretch Mueller wrote:

// __ $_FL_TYPE: |abc á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡
§ ASCII ä ö ü ß Ä Ö Ü Text|
// __ $_FL_TYPE:|abc_123_birdie_here_ASCII_Text|


https://pypi.org/project/Unidecode/
should be more friendly to languages other than English.



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Albretch Mueller
 Ach, yes! I forgot echo by default appends a newline character at
the end of every string it spits out. In order to suppress it you need
to use the "-n" option: "echo -n ..."

_FL_TYPE="   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡ § ASCII  ä ö ü ß Ä Ö Ü Text"
echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"
_FL_TYPE=$(echo "${_FL_TYPE}" | xargs)
echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"
_FL_TYPE=$(echo -n "${_FL_TYPE}" |  tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_');
echo "// __ \$_FL_TYPE: |${_FL_TYPE}|"

// __ $_FL_TYPE: |   abc  á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡ § ASCII  ä ö ü ß Ä Ö Ü Text|
// __ $_FL_TYPE: |abc á é í ó ú ü ñ Á É Í Ó Ú Ü Ñ 123 birdiehere ¿ ¡ § ASCII ä ö ü ß Ä Ö Ü Text|
// __ $_FL_TYPE: |abc_123_birdie_here_ASCII_Text|

 Thank you,
 lbrtchx



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 02:11:46PM +0100, to...@tuxteam.de wrote:
> On Mon, Dec 11, 2023 at 07:42:10AM -0500, Greg Wooledge wrote:
> > Looks like GNU tr in Debian 12 still doesn't handle multibyte characters
> > correctly:
> > 
> > unicorn:~$ echo 'mañana' | tr ñ X
> > maXXana
> 
> Hey, you just gave us a handy way to count how many encoding units
> a character takes:
> 
>   tomas@trotzki:~$ echo 'birdiehere' | tr -c 'a-z' X
>   birdiehereX

Cute as that is, there are better ways.

unicorn:~$ x=ñ; (echo "${#x}"; LC_ALL=C; echo "${#x}")
1
2
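
For plain sh, where ${#x} and locale switching are less predictable, a byte count can also be had from wc. A sketch, with the character built from its UTF-8 bytes as an assumption about the encoding in use:

```shell
# Build the character from its UTF-8 bytes so the example does not depend
# on how this file itself is encoded.
x=$(printf '\303\261')    # ñ

# wc -c counts bytes, whatever the locale; tr -d strips the padding that
# some wc implementations put in front of the number.
bytes=$(printf %s "$x" | wc -c | tr -d ' ')

printf 'bytes: %s\n' "$bytes"
```

wc -m would count characters instead, but its answer depends on the current locale.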



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread tomas
On Mon, Dec 11, 2023 at 07:42:10AM -0500, Greg Wooledge wrote:
> On Mon, Dec 11, 2023 at 09:37:42AM +0100, to...@tuxteam.de wrote:
> >  2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
> >think it does. It will match '[', 'A' to 'Z', 'a' to 'z','.' and
> >']'. I guess you want to say 'A-Za-z0-9.'
> 
> Well spotted.
> 
> >  3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
> >you. No idea whether this is a GNU extension
> 
> It's POSIX.  100% portable, as long as you ignore any bugs in GNU tr.
> 
> Looks like GNU tr in Debian 12 still doesn't handle multibyte characters
> correctly:
> 
> unicorn:~$ echo 'mañana' | tr ñ X
> maXXana
> 
> So... as long as you're working in the C locale, where [:alnum:] is
> just the ASCII capital and lowercase letters and digits, you should be
> fine.

Hey, you just gave us a handy way to count how many encoding units
a character takes:

  tomas@trotzki:~$ echo 'birdiehere' | tr -c 'a-z' X
  birdiehereX

;-)

Cheers
-- 
t




Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 09:37:42AM +0100, to...@tuxteam.de wrote:
>  2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
>     think it does. It will match '[', 'A' to 'Z', 'a' to 'z','.' and
>     ']'. I guess you want to say 'A-Za-z0-9.'

Well spotted.

>  3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
>     you. No idea whether this is a GNU extension

It's POSIX.  100% portable, as long as you ignore any bugs in GNU tr.

Looks like GNU tr in Debian 12 still doesn't handle multibyte characters
correctly:

unicorn:~$ echo 'mañana' | tr ñ X
maXXana

So... as long as you're working in the C locale, where [:alnum:] is
just the ASCII capital and lowercase letters and digits, you should be
fine.
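
When a multibyte character really has to be replaced, one possible workaround (an assumption on my part, not something tested across every sed) is to let sed match the whole byte sequence instead of letting tr translate byte by byte:

```shell
n_tilde=$(printf '\303\261')       # ñ as UTF-8 bytes
word="ma${n_tilde}ana"

# The two bytes of ñ are matched as one sequence, so exactly one X comes out.
replaced=$(printf '%s\n' "$word" | sed "s/${n_tilde}/X/g")
printf '%s\n' "$replaced"
```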



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 11:25:13AM +, Albretch Mueller wrote:
> In the case of: "ASCII text"
>  what should come out of it is: "ASCII_text"
>  not: "ASCII_text_"
>  no underscore at the end. That is the question I have.

OK, here's my guess.

The lines of code that you showed us are not actually in a script.
They're just in a FILE, and you're running a command like this:

sh myfile

Furthermore, I am guessing that the lines of code in this file have
Microsoft CR+LF line endings.  Therefore, when you do a variable
assignment like

ftype=$(file --brief "$whatever")

you end up with a Carriage Return character at the end of the variable's
content (because there is one at the end of this command).

Since you never actually SHOWED US the command you ran, or the output that
was produced, which could have made this really, really obvious, we're
forced to guess.  My guess might be right, or wrong.  But it's the best
guess I have with the limited information you've chosen to share with us.

What I mean by "obvious" is this.  Here's part of your code:

echo "abc123" > file.txt
ftype=$(file --brief file.txt)
echo "// __ \$ftype: |${ftype}|"

If my guess is correct, you got output that looks like this:

|/ __ $ftype: |ASCII text

Showing this would have made it immediately clear that a CR is involved.
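
If a stray CR is suspected, it can be made visible and removed along these lines. This is only a sketch: the CR is simulated here, and ftype_clean is a made-up name:

```shell
cr=$(printf '\r')
ftype="ASCII text${cr}"            # simulated value read from a CR+LF file

# od -c makes the otherwise invisible \r show up in the dump.
printf %s "$ftype" | od -c

# Remove one trailing CR, if present, with POSIX parameter expansion.
ftype_clean=${ftype%"$cr"}
printf '|%s|\n' "$ftype_clean"
```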



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread tomas
On Mon, Dec 11, 2023 at 11:25:13AM +, Albretch Mueller wrote:
>  "tr --complement --squeeze-repeats ..." makes sure that the replaced
> characters only appear once (that it doesn't immediately repeat). Say
> you have something like "  " (two spaces) or "?$|" (three characters)
> which will be replaced by just an underscore.

...which would change the length, as I wrote.

> In the case of: "ASCII text"
>  what should come out of it is: "ASCII_text"
>  not: "ASCII_text_"
>  no underscore at the end. That is the question I have.

That depends on whether your "ASCII text" has some thingy at the end
which you don't see. A newline, perchance?

>  I use such constructs as: "[A-Za-z0-9.]" to make explicit to myself
> and other people what I mean. I work in corpora research dealing with
> text based various alphabets not just in ASCII so I avoid any kinds of
> linguistic/cultural shortcuts and abbreviations.

What has this to do with how tr works? It will treat [ and ] as characters
not to substitute. I pointed that out, because it might have been unintended:

  echo -n 'This is a  text with [some brackets] in   it' | tr -cs "[A-Za-z0-9.]" "_"
  This_is_a_text_with_[some_brackets]_in_it

(Note this "-n" on the echo, btw? Without it, I'd be getting a "_" at the
end, the transliterated newline).
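
Side by side, as a sketch (printf instead of echo -n, and LC_ALL=C added so that the ranges are plain ASCII; the variable names are invented):

```shell
text='This is a  text with [some brackets] in   it'

# Brackets inside the set are ordinary members, so they survive.
kept=$(printf %s "$text" | LC_ALL=C tr -cs '[A-Za-z0-9.]' '_')

# Without them, '[' and ']' fall in the complement and are replaced.
replaced=$(printf %s "$text" | LC_ALL=C tr -cs 'A-Za-z0-9.' '_')

printf '%s\n' "$kept" "$replaced"
```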

Do whatever you want :-)

Cheers
-- 
t




Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Albretch Mueller
 "tr --complement --squeeze-repeats ..." makes sure that the replaced
characters only appear once (that it doesn't immediately repeat). Say
you have something like "  " (two spaces) or "?$|" (three characters)
which will be replaced by just an underscore.

In the case of: "ASCII text"
 what should come out of it is: "ASCII_text"
 not: "ASCII_text_"
 no underscore at the end. That is the question I have.

 I use such constructs as: "[A-Za-z0-9.]" to make explicit to myself
and other people what I mean. I work in corpora research dealing with
text based various alphabets not just in ASCII so I avoid any kinds of
linguistic/cultural shortcuts and abbreviations.

 lbrtchx

On 12/11/23, to...@tuxteam.de  wrote:
> On Mon, Dec 11, 2023 at 08:04:06AM +, Albretch Mueller wrote:
>> On 12/11/23, Greg Wooledge  wrote:
>> > Please tell us ...
>>
>>  OK, here is what I did as a t-table
>
> [...]
>
> Your style is confusing, to say the least. Why not play with minimal
> examples and work your way up from that?
>
>> the two strings are not the same length even though you are just
>> replacing ASCII characters, why did:
>> echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_'
>> place a character at the end?
>
> Two things stick out:
>
>  1. with --squeeze-repeats you are challenging tr to output less
>characters than the input has:
>
>trotzki:~$ echo -n "this is a #   string ###" | tr -cs 'a-z' '_'
>=> this_is_a_string_
>
>(I allowed myself to simplify things a bit) See? tr is squeezing
>repeats (repeated matches), the space-plus-three-hashes at the
>end gets squeezed to just one _, thus changing the length.
>If your strings contain more than one non-alphanumeric (something
>I don't feel like even trying a guess at), this is bound to happen.
>You ordered it.
>
>  2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
>think it does. It will match '[', 'A' to 'Z', 'a' to 'z','.' and
>']'. I guess you want to say 'A-Za-z0-9.'
>
>  3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
>you. No idea whether this is a GNU extension
>
>  4. In case of doubt, read the man page :)
>
> Cheers
> --
> t
>



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread tomas
On Mon, Dec 11, 2023 at 08:04:06AM +, Albretch Mueller wrote:
> On 12/11/23, Greg Wooledge  wrote:
> > Please tell us ...
> 
>  OK, here is what I did as a t-table

[...]

Your style is confusing, to say the least. Why not play with minimal
examples and work your way up from that?

> the two strings are not the same length even though you are just
> replacing ASCII characters, why did:
> echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_'
> place a character at the end?

Two things stick out:

 1. with --squeeze-repeats you are challenging tr to output fewer
   characters than the input has:

   trotzki:~$ echo -n "this is a #   string ###" | tr -cs 'a-z' '_'
   => this_is_a_string_

   (I allowed myself to simplify things a bit) See? tr is squeezing
   repeats (repeated matches), the space-plus-three-hashes at the
   end gets squeezed to just one _, thus changing the length.
   If your strings contain more than one non-alphanumeric (something
   I don't feel like even trying a guess at), this is bound to happen.
   You ordered it.

 2. This is tr, not regexp, so '[A-Za-z0-9.]' isn't doing what you
   think it does. It will match '[', 'A' to 'Z', 'a' to 'z','.' and
   ']'. I guess you want to say 'A-Za-z0-9.'

 3. As a convenience, tr has char classes. Perhaps [:alnum:] is for
   you. No idea whether this is a GNU extension

 4. In case of doubt, read the man page :)
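
To make point 1 concrete, a sketch comparing lengths with and without -s. It assumes a POSIX shell; the variable names are invented, and printf %s stands in for echo -n:

```shell
input='this is a #   string ###'

# Without -s every character outside a-z becomes its own underscore,
# so the length is unchanged.
translated=$(printf %s "$input" | LC_ALL=C tr -c 'a-z' '_')

# With -s each run of underscores is squeezed to one, so the result is shorter.
squeezed=$(printf %s "$input" | LC_ALL=C tr -cs 'a-z' '_')

printf '%2d |%s|\n' "${#input}" "$input"
printf '%2d |%s|\n' "${#translated}" "$translated"
printf '%2d |%s|\n' "${#squeezed}" "$squeezed"
```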

Cheers
-- 
t




Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-11 Thread Albretch Mueller
On 12/11/23, Greg Wooledge  wrote:
> Please tell us ...

 OK, here is what I did as a t-table

echo "abc123" > file.txt  # obvious text file
ftype=$(file --brief file.txt)  # got its type as reported by the "file" utility
echo "// __ \$ftype: |${ftype}|"
ftypelen=${#ftype}  # length of the string containing the file type
echo "// __ \$ftypelen: |${ftypelen}|"

# removing spaces et any other char which is not '[A-Za-z0-9.]', replacing with underscores ...
# here is what I think to be an error: instead of just replacing ... by underscores
# it adds an underscore at the end?
ftype2=$(echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_');
echo "// __ \$ftype2: |${ftype2}|"
ftype2len=${#ftype2}
echo "// __ \$ftype2len: |${ftype2len}|"

the two strings are not the same length even though you are just
replacing ASCII characters. Why did:
echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_'
place a character at the end?
Probably echo and tr are not dancing well together. echo might be
tailgating an end of string character which tr then replaces with an
underscore.
Which option do I use with echo for that not to happen?
Should I probably play with IFS ...?

lbrtchx


On 12/11/23, Greg Wooledge  wrote:
> On Mon, Dec 11, 2023 at 02:53:07AM +, Albretch Mueller wrote:
>> echo "abc123" > file.txt
>> ftype=$(file --brief file.txt)
>> echo "// __ \$ftype: |${ftype}|"
>> ftypelen=${#ftype}
>> echo "// __ \$ftypelen: |${ftypelen}|"
>>
>> # removing spaces ...
>> ftype2=$(echo "${ftype}" | tr --complement --squeeze-repeats
>> '[A-Za-z0-9.]' '_');
>> echo "// __ \$ftype2: |${ftype2}|"
>> ftype2len=${#ftype2}
>> echo "// __ \$ftype2len: |${ftype2len}|"
>
> Please tell us:
>
>  * What you are trying to do.
>
>  * What you did (is the previous code all in a script?  if so, this is a
>good answer for this part).
>
>  * What result you got.
>
>  * What you expected to get.



Re: why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-10 Thread Greg Wooledge
On Mon, Dec 11, 2023 at 02:53:07AM +, Albretch Mueller wrote:
> echo "abc123" > file.txt
> ftype=$(file --brief file.txt)
> echo "// __ \$ftype: |${ftype}|"
> ftypelen=${#ftype}
> echo "// __ \$ftypelen: |${ftypelen}|"
> 
> # removing spaces ...
> ftype2=$(echo "${ftype}" | tr --complement --squeeze-repeats
> '[A-Za-z0-9.]' '_');
> echo "// __ \$ftype2: |${ftype2}|"
> ftype2len=${#ftype2}
> echo "// __ \$ftype2len: |${ftype2len}|"

Please tell us:

 * What you are trying to do.

 * What you did (is the previous code all in a script?  if so, this is a
   good answer for this part).

 * What result you got.

 * What you expected to get.



why would "tr --complement --squeeze-repeats ..." append the substitution char once more? ...

2023-12-10 Thread Albretch Mueller
echo "abc123" > file.txt
ftype=$(file --brief file.txt)
echo "// __ \$ftype: |${ftype}|"
ftypelen=${#ftype}
echo "// __ \$ftypelen: |${ftypelen}|"

# removing spaces ...
ftype2=$(echo "${ftype}" | tr --complement --squeeze-repeats '[A-Za-z0-9.]' '_');
echo "// __ \$ftype2: |${ftype2}|"
ftype2len=${#ftype2}
echo "// __ \$ftype2len: |${ftype2len}|"

lbrtchx