Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Chet Ramey
On 12/3/18 11:31 AM, Ole Tange wrote:
> On Mon, Dec 3, 2018 at 3:56 PM Chet Ramey  wrote:
> 
>> There has to be a compelling reason to change this, especially at a point
>> so close to a major release.
> 
> The reason for my submission was that I needed a bunch of random
> numbers in a shell script, but I needed them to be high quality.
> Luckily I did not just assume that Bash delivers high quality random
> numbers, but I read the source code, and then found that the quality
> was low. I do not think most users would do that.

This is always requirements-driven. Nobody expects to get cryptographic-
quality PRNGs out of the shell (or any of the libc interfaces, tbh), and
that's never been promised or expected. You can't really expect that from
something that only promises 16 bits.

However, for common scripting tasks like generating temporary filenames,
it's perfectly adequate.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Eduardo Bustamante
On Mon, Dec 3, 2018 at 9:36 AM Greg Wooledge  wrote:
>
> On Mon, Dec 03, 2018 at 05:31:18PM +0100, Ole Tange wrote:
> > Luckily I did not just assume that Bash delivers high quality random
> > numbers, but I read the source code, and then found that the quality
> > was low. I do not think most users would do that.
>
> You're correct.  Most users would not have to read the source code to
> know that the built-in PRNG in bash (or in libc, or in basically ANY
> other standard thing) is of lower than cryptographic quality.
>
> Most users already KNOW this.

I have to echo this. If you are writing an application that requires
high quality random numbers, the onus is on YOU to ensure that you're
using quality sources and a good CSPRNG. It would be a user mistake to
just use whatever the standard library of the run-time you're using
provides. Do we have to change C's rand() too? Or python's "random"
module? Or perl's "rand"? Or ruby's? (etc etc)


I do agree that adding a note in the manual to this effect would be nice though.
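
For readers who need better numbers than $RANDOM in a script, here is a
minimal sketch that reads from the kernel's random device instead. It assumes
a system that provides /dev/urandom and a POSIX od; the function names
urandom_u32 and urandom_below are hypothetical and do not come from bash or
from this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read 4 bytes from /dev/urandom (assumed to exist)
# and print them as one unsigned 32-bit decimal integer.
urandom_u32() {
    od -An -N4 -tu4 /dev/urandom | tr -d '[:space:]'
}

# Hypothetical helper: print a uniform integer in [0, bound).
# Rejection sampling discards values above the largest multiple of
# bound, which avoids the bias of a plain modulo.
urandom_below() {
    local bound=$1 limit value
    limit=$(( (4294967296 / bound) * bound ))
    while :; do
        value=$(urandom_u32)
        (( value < limit )) && break
    done
    printf '%s\n' "$(( value % bound ))"
}

urandom_below 1000
```

Whether this is good enough still depends on the application's requirements;
the sketch only moves the source of randomness out of the shell's built-in
generator.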



Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Greg Wooledge
On Mon, Dec 03, 2018 at 05:31:18PM +0100, Ole Tange wrote:
> Luckily I did not just assume that Bash delivers high quality random
> numbers, but I read the source code, and then found that the quality
> was low. I do not think most users would do that.

You're correct.  Most users would not have to read the source code to
know that the built-in PRNG in bash (or in libc, or in basically ANY
other standard thing) is of lower than cryptographic quality.

Most users already KNOW this.



Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Ole Tange
On Mon, Dec 3, 2018 at 3:56 PM Chet Ramey  wrote:

> There has to be a compelling reason to change this, especially at a point
> so close to a major release.

The reason for my submission was that I needed a bunch of random
numbers in a shell script, but I needed them to be high quality.
Luckily I did not just assume that Bash delivers high quality random
numbers, but I read the source code, and then found that the quality
was low. I do not think most users would do that.

The man page does not warn about the low quality either, and it does
not point to a way to get high quality numbers. Somehow we expect the
user to simply know this.

So from personal experience I have wasted a few hours on that account.

Had I simply assumed the numbers were high quality, it might have
caused problems for me at a later stage.

And it is to protect users who do not read the man page and source code
that I suggest the change.

> You might be expecting too much from bash's random number generator. Is
> the problem that its period is at most 2**16? For its intended uses, the
> cycle length is acceptable. Do you disagree?

If I read the man page, I do not see what the intended use is. Where
is that documented?

If the user's view on the intended use differs from the developers',
then there is a risk of misaligned expectations. Documenting the
developers' view is IMHO a poor way of mitigating this, if there is a
simple solution that will satisfy the demanding user.

I see software daily that is being used in ways it was not intended.
Usually it does not break, and for GNU tools this (in my experience)
is especially true, because the GNU project officially endorses
writing robust programs.

So my suggestion is really just to be proactive, so that when users do
not use it in the intended way, it will still not break.

If you choose not to implement a CSPRNG, then please at least make it
clear in the man page that $RANDOM is a poor RNG, and what the
intended use is.


/Ole
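
One property that can be checked without reading the source: assigning a
value to RANDOM seeds the generator, and the same seed replays the same
sequence within one shell. A small sketch (the seed 42 and the variable names
are arbitrary choices for illustration):

```bash
#!/usr/bin/env bash
# Seed the generator, draw three values, then reseed and draw again.
RANDOM=42
first=("$RANDOM" "$RANDOM" "$RANDOM")

RANDOM=42
second=("$RANDOM" "$RANDOM" "$RANDOM")

# Each value is an integer between 0 and 32767.
printf 'first:  %s\n' "${first[*]}"
printf 'second: %s\n' "${second[*]}"

if [[ "${first[*]}" == "${second[*]}" ]]; then
    echo "same seed, same sequence"
fi
```

A sequence that is fully determined by a small seed is fine for games and
shuffling, and unsuitable for anything an attacker must not predict.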



Re: remove empty '' in ${var@Q} result?

2018-12-03 Thread Chet Ramey
On 11/28/18 9:58 PM, Clark Wang wrote:
> On Wed, Nov 8, 2017 at 9:46 PM Chet Ramey wrote:
> 
> On 11/7/17 11:38 PM, Clark Wang wrote:
> 
> >         I made a patch (also attached). Please see if it's ok.
> >
> >
> >     Updated by dealing with empty strings (and malloc'ing 2 more bytes)
> >     though I'm not sure if it's necessary since the func
> >     sh_quote_reusable() already handles empty strings.
> >
> >
> > Hi Chet, do you have a look at my patch?
> 
> I did. It's on the list of possible things for the next version. Since
> it's only a cosmetic issue, it's not a high priority.
> 
> 
> Hi Chet,
> 
> Is it possible to make the change in the coming 5.0 release?

I am not making changes for bash-5.0 at this point, only bug fixes.

Chet
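
For context on why the extra '' is only cosmetic: the output of ${var@Q} is
meant to be reusable as shell input, so what matters is that it evaluates
back to the original value. A sketch that checks this round trip, assuming
bash 4.4 or later (the sample strings are arbitrary, and the exact quoted
form printed may differ between versions):

```bash
#!/usr/bin/env bash
# Requires bash 4.4+ for the ${var@Q} transformation.
samples=("" "plain" "it's" "'" "two words")
all_ok=1

for value in "${samples[@]}"; do
    quoted=${value@Q}
    # Evaluate the quoted form back into a variable.
    eval "copy=$quoted"
    printf '%-12s -> %s\n' "[$value]" "$quoted"
    [[ $copy == "$value" ]] || all_ok=0
done

(( all_ok )) && echo "every quoted form evaluates back to its input"
```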



Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Greg Wooledge
On Mon, Dec 03, 2018 at 09:56:33AM -0500, Chet Ramey wrote:
> There has to be a compelling reason to change this, especially at a point
> so close to a major release.
> 
> You might be expecting too much from bash's random number generator. Is
> the problem that its period is at most 2**16? For its intended uses, the
> cycle length is acceptable. Do you disagree?

I suspect he doesn't share the same understanding of the "intended
uses" of bash's $RANDOM as the rest of us.

It's meant for displaying a random wallpaper image from a directory,
or for playing a random audio file from a directory.  Or for playing a
number guessing game with a 6 year old.

It is emphatically NOT for generating passwords.
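
The kind of use described above can be sketched as follows. The function name
pick_random_file is hypothetical, and the modulo introduces a slight bias that
does not matter for this purpose:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one randomly chosen entry from a directory.
# Returns 1 if the directory has no visible entries.
pick_random_file() {
    local dir=$1 files
    files=("$dir"/*)
    # Without nullglob an unmatched pattern stays literal, so verify
    # that the first element really exists.
    [[ -e ${files[0]} ]] || return 1
    printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

A call such as pick_random_file ~/wallpapers (the directory is only an
example) prints one path from that directory on each run.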



Re: Incorrect path canonicalisation at autocompletion

2018-12-03 Thread Mattias Andrée
On Mon, 3 Dec 2018 09:58:33 -0500
Chet Ramey  wrote:

> On 12/3/18 9:49 AM, Mattias Andrée wrote:
> > On Mon, 3 Dec 2018 09:33:48 -0500
> > Chet Ramey  wrote:
> >   
> >> On 12/1/18 3:12 PM, Mattias Andrée wrote:  
> >>> Using Bash 4.4.023, type
> >>>
> >>>   cd
> >>>   mkdir -p 1/2
> >>>   cd 1/2
> >>>   touch ../../3
> >>>   ln -s ~ 4
> >>>   touch 5
> >>>   ls 4/../
> >>>
> >>> without pressing enter at the last line,
> >>> instead press <tab> twice. 4/ and 5 will
> >>> be suggested, but if you press <enter>
> >>> you will see that it should suggest the files in /home.
> >>
> >> It's not a bug. Bash maintains a logical view of the file system and the
> >> current directory for cd, pwd, and $PWD, as Posix specifies.  One of the
> >> consequences is that the pathname of the current directory depends on the
> >> path used to reach it, which affects how bash canonicalizes `..'. Bash is
> >> consistent in  its use of this logical view across shell features, which
> >> includes completion.
> >>
> >> If you want to see a physical view of the file system, use
> >> `set -o physical'.
> >>  
> > 
> > Is there a way to only get physical view for
> > completion but logical view for cd, pwd, and $PWD?  
> 
> You can by using set -o physical within a completion function or running
> with set -o physical all the time and using `cd -L' and `pwd -L'.
> 

Thanks!



Re: Incorrect path canonicalisation at autocompletion

2018-12-03 Thread Chet Ramey
On 12/3/18 9:49 AM, Mattias Andrée wrote:
> On Mon, 3 Dec 2018 09:33:48 -0500
> Chet Ramey  wrote:
> 
>> On 12/1/18 3:12 PM, Mattias Andrée wrote:
>>> Using Bash 4.4.023, type
>>>
>>> cd
>>> mkdir -p 1/2
>>> cd 1/2
>>> touch ../../3
>>> ln -s ~ 4
>>> touch 5
>>> ls 4/../
>>>
>>> without pressing enter at the last line,
>>> instead press <tab> twice. 4/ and 5 will
>>> be suggested, but if you press <enter>
>>> you will see that it should suggest the files in /home.
>>
>> It's not a bug. Bash maintains a logical view of the file system and the
>> current directory for cd, pwd, and $PWD, as Posix specifies.  One of the
>> consequences is that the pathname of the current directory depends on the
>> path used to reach it, which affects how bash canonicalizes `..'. Bash is
>> consistent in  its use of this logical view across shell features, which
>> includes completion.
>>
>> If you want to see a physical view of the file system, use
>> `set -o physical'.
>>
> 
> Is there a way to only get physical view for
> completion but logical view for cd, pwd, and $PWD?

You can by using set -o physical within a completion function or running
with set -o physical all the time and using `cd -L' and `pwd -L'.




Re: $RANDOM not Cryptographically secure pseudorandom number generator

2018-12-03 Thread Chet Ramey
On 12/2/18 6:13 PM, Ole Tange wrote:
> On Wed, Nov 21, 2018 at 11:45 PM Chet Ramey  wrote:
>> On 11/21/18 3:07 PM, Ole Tange wrote:
>>> 'brand' in variables.c is comparable in size to ChaCha20 and ChaCha20
>>> is not completely broken:
>>> https://en.wikipedia.org/wiki/Salsa20
>>>
>>> Could we please replace 'brand' with ChaCha20?
>>
>> What is your application that you need something more complicated than
>> the existing PRNG?
> 
> I do not have that currently, but it seems like a fairly small change
> and it seems odd to have modern software not use modern algorithms.

There has to be a compelling reason to change this, especially at a point
so close to a major release.

You might be expecting too much from bash's random number generator. Is
the problem that its period is at most 2**16? For its intended uses, the
cycle length is acceptable. Do you disagree?




Re: Incorrect path canonicalisation at autocompletion

2018-12-03 Thread Mattias Andrée
On Mon, 3 Dec 2018 09:33:48 -0500
Chet Ramey  wrote:

> On 12/1/18 3:12 PM, Mattias Andrée wrote:
> > Using Bash 4.4.023, type
> > 
> > cd
> > mkdir -p 1/2
> > cd 1/2
> > touch ../../3
> > ln -s ~ 4
> > touch 5
> > ls 4/../
> > 
> > without pressing enter at the last line,
> > instead press <tab> twice. 4/ and 5 will
> > be suggested, but if you press <enter>
> > you will see that it should suggest the files in /home.
> 
> It's not a bug. Bash maintains a logical view of the file system and the
> current directory for cd, pwd, and $PWD, as Posix specifies.  One of the
> consequences is that the pathname of the current directory depends on the
> path used to reach it, which affects how bash canonicalizes `..'. Bash is
> consistent in  its use of this logical view across shell features, which
> includes completion.
> 
> If you want to see a physical view of the file system, use
> `set -o physical'.
> 

Is there a way to only get physical view for
completion but logical view for cd, pwd, and $PWD?



Re: Incorrect path canonicalisation at autocompletion

2018-12-03 Thread Chet Ramey
On 12/1/18 3:12 PM, Mattias Andrée wrote:
> Using Bash 4.4.023, type
> 
>   cd
>   mkdir -p 1/2
>   cd 1/2
>   touch ../../3
>   ln -s ~ 4
>   touch 5
>   ls 4/../
> 
> without pressing enter at the last line,
> instead press <tab> twice. 4/ and 5 will
> be suggested, but if you press <enter>
> you will see that it should suggest the files in /home.

It's not a bug. Bash maintains a logical view of the file system and the
current directory for cd, pwd, and $PWD, as Posix specifies.  One of the
consequences is that the pathname of the current directory depends on the
path used to reach it, which affects how bash canonicalizes `..'. Bash is
consistent in  its use of this logical view across shell features, which
includes completion.

If you want to see a physical view of the file system, use
`set -o physical'.

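
The difference between the logical and the physical view can be reproduced
outside completion. The sketch below rebuilds the reported layout in a
temporary directory; the directory named home stands in for the user's home
directory, and all variable names are illustrative only:

```bash
#!/usr/bin/env bash
# Build the layout from the report under a scratch directory.
demo=$(mktemp -d)
demo=$(cd -P "$demo" && pwd -P)        # resolve symlinks in the temp path
mkdir -p "$demo/home/1/2"
ln -s "$demo/home" "$demo/home/1/2/4"  # like `ln -s ~ 4' in the report

cd "$demo/home/1/2/4" || exit 1

# Logical view: `..' is resolved against the path used to get here.
logical_parent=$(cd -L .. && pwd -L)

# Physical view: `..' is resolved against the real directory.
physical_parent=$(cd -P .. && pwd -P)

printf 'logical parent:  %s\n' "$logical_parent"
printf 'physical parent: %s\n' "$physical_parent"
```

The logical parent ends in home/1/2, while the physical parent is the scratch
directory that contains home, which mirrors the 4/../ case in the report.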



Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Chet Ramey
On 11/24/18 4:32 PM, Bize Ma wrote:

> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
> 
> What is your locale?
> 
>  
> The locale used was en_US.utf-8 but also happens with  459
> locales out of 868 available under Debian (not in C, for example).
> 
> Also in all locales affected (except one), setting either
> LC_ALL=$loc or LC_COLLATE=$loc did the same.
> Except in zh_CN.gb18030
> 
> But IMO locale collation should not be used for an explicit list.

Collation order is used for each individual character in a bracket
expression when compared against the string, as posix specifies.

> I have been made aware that there is a
>   cstart = cend = FOLD (cstart);
> inside the `sm_loop.c` file that will convert into a range many
> individual character. If that understanding is correct that is the
> source of the difference with other shells.

I'm not sure what you mean by "convert into a range." If cstart and cend
were treated as a range, the start and end characters would be the
same. If cstart == cend, a character that collates >= cstart and <= cend
would have to collate equal to cstart and cend.





Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Chet Ramey
On 11/28/18 2:05 AM, Bize Ma wrote:
> Chet Ramey (chet.ra...@case.edu) wrote:
>  
> 
> I can't reproduce this:
> 
> 
> If you could take a look at https://unix.stackexchange.com/a/483835/265604
> you will see that it has been confirmed on "Ubuntu 17.10 (glibc 2.26) and on
> Ubuntu 18.04 (glibc 2.27), but it seems to be fixed on Ubuntu 18.10 (glibc
> 2.28)"

I must have used systems without this problem.

> It is interesting that (finally) glibc 2.28 has added a fourth sort key
> equal to the Unicode code point. That forces the order of all characters
> to be unique.

One of the POSIX future directions.





Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Chet Ramey
On 11/28/18 2:45 AM, Bize Ma wrote:
> Chet Ramey (chet.ra...@case.edu) wrote:
> 
> On 11/24/18 2:32 PM, Chet Ramey wrote:
> 
> >> But IMO locale collation should not be used for an explicit list.
> >
> > Collation order is used for each individual character in a bracket
> > expression when compared against the string, as posix specifies.
> 
> 
> Yes, values resulting from a glob expansion should be compared with strcoll.
> 
> How many characters should there be in a range like [0-0] ?
> Or to be more precise: in a [0] bracket expression? one?

There should be one character ("0") that matches as many characters as
collate equal to the character "0", as per the POSIX quote in my previous
message.

> 
> If I were you, I would file a bug report with Debian against wcscoll.
> 
> 
> And I would be told that wcscoll is doing what the collation file 14651 is
> telling it to do.

Sure.

> 
> And, that in any case, that file has been updated in glibc 2.28 anyway.

That should fix the problem without forcing applications to attempt to
impose a total ordering even when strcoll/wcscoll returns 0.

> It returns 0 (equal) for L"٠" and L"0" without setting errno. That's
> clearly a problem with wcscoll (if the character isn't valid in the
> current locale) or the locale definition.
> 
> 
> Both characters collate to the same position as I have already explained.

Yes, so the locale definition files imposing a total ordering will be a
clear improvement.

> 
> I don't follow you about what you mean with: /(if the character isn't
> valid in the current locale)./

There are codepoints that correspond to characters in one locale but don't
map to a valid character in another.

Chet




Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Bize Ma
Chet Ramey wrote:


> I can't reproduce this:
>

If you could take a look at https://unix.stackexchange.com/a/483835/265604
you will see that it has been confirmed on "Ubuntu 17.10 (glibc 2.26) and on
Ubuntu 18.04 (glibc 2.27), but it seems to be fixed on Ubuntu 18.10 (glibc
2.28)"

It is interesting that (finally) glibc 2.28 has added a fourth sort key
equal to the Unicode code point. That forces the order of all characters
to be unique.


Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Chet Ramey
On 11/28/18 2:29 AM, Bize Ma wrote:
> Chet Ramey (chet.ra...@case.edu) wrote:
> 
> On 11/24/18 4:32 PM, Bize Ma wrote:
> 
>  [...]
> 
> > I have been made aware that there is a
> >   cstart = cend = FOLD (cstart);
> > inside the `sm_loop.c` file that will convert into a range many
> > individual character. If that understanding is correct that is the
> > source of the difference with other shells.
> 
> I'm not sure what you mean by "convert into a range." If cstart and cend
> were treated as a range, the start and end characters would be the
> same. If cstart == cend, a character that collates >= cstart and <= cend
> would have to collate equal to cstart and cend.
> 
> 
> Yes, exactly, a range where the start and the end are the same.

A range like that is exactly equivalent to a single ordinary character.

POSIX: "An ordinary character in the list should only match that character,
but may match any single character that collates equally with that
character"





Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Bize Ma
Chet Ramey wrote:

> On 11/24/18 2:32 PM, Chet Ramey wrote:
>
> >> But IMO locale collation should not be used for an explicit list.
> >
> > Collation order is used for each individual character in a bracket
> > expression when compared against the string, as posix specifies.
>

Yes, values resulting from a glob expansion should be compared with strcoll.

How many characters should there be in a range like [0-0] ?
Or to be more precise: in a [0] bracket expression? one?

> If I were you, I would file a bug report with Debian against wcscoll.
>

And I would be told that wcscoll is doing what the collation file 14651 is
telling it to do.

And, that in any case, that file has been updated in glibc 2.28 anyway.


> It returns 0 (equal) for L"٠" and L"0" without setting errno. That's
> clearly a problem with wcscoll (if the character isn't valid in the current
> locale) or the locale definition.
>

Both characters collate to the same position as I have already explained.

I don't follow what you mean by:
*(if the character isn't valid in the current locale).*


Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Bize Ma
Chet Ramey wrote:

> On 11/24/18 4:32 PM, Bize Ma wrote:

 [...]

> > I have been made aware that there is a
> >   cstart = cend = FOLD (cstart);
> > inside the `sm_loop.c` file that will convert into a range many
> > individual character. If that understanding is correct that is the
> > source of the difference with other shells.
>
> I'm not sure what you mean by "convert into a range." If cstart and cend
> were treated as a range, the start and end characters would be the
> same. If cstart == cend, a character that collates >= cstart and <= cend
> would have to collate equal to cstart and cend.
>

Yes, exactly, a range where the start and the end are the same.

Try:

$ touch 0 1 ٠ ١  ۰ ۱ ߀ ߁ ० १
$ echo [1]
1  ١

It is converted to the same range as this

$ echo [1-1]
1  ١

That happens because up to glibc 2.27 this has been the collation order of
those characters (search in /usr/share/i18n/locales/iso14651_t1_common) :

 <0>;;;IGNORE
 <0>;;;IGNORE

Collate to exactly the same values. This breaks the capacity to detect that
a character is absent in a list ordered by the collation order.
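
Since the behaviour was reported not to occur in the C locale, a script that
needs a bracket expression to match only the listed character can switch the
locale for the glob. A sketch under that assumption (the scratch directory and
variable names are illustrative; what the second printf shows depends on the
system's locale and glibc version):

```bash
#!/usr/bin/env bash
# Create files named with ASCII and Arabic-Indic digits in a scratch dir.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch 0 1 ٠ ١

# Expand the pattern in a subshell with the C locale, where each
# character collates by its own byte value.
c_matches=$(LC_ALL=C; printf '%s\n' [1])
printf 'C locale matches for [1]: %s\n' "$c_matches"

# For comparison, expand the same pattern in the inherited locale.
printf 'inherited locale matches for [1]: %s\n' [1]
```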


Re: Bash removes unrequested characters in bracket expressions (not a range).

2018-12-03 Thread Chet Ramey
On 11/24/18 2:32 PM, Chet Ramey wrote:

>> But IMO locale collation should not be used for an explicit list.
> 
> Collation order is used for each individual character in a bracket
> expression when compared against the string, as posix specifies.
> 
>> I have been made aware that there is a
>>   cstart = cend = FOLD (cstart);
>> inside the `sm_loop.c` file that will convert into a range many
>> individual character. If that understanding is correct that is the
>> source of the difference with other shells.
> 
> I'm not sure what you mean by "convert into a range." If cstart and cend
> were treated as a range, the start and end characters would be the
> same. If cstart == cend, a character that collates >= cstart and <= cend
> would have to collate equal to cstart and cend.

If I were you, I would file a bug report with Debian against wcscoll.

It returns 0 (equal) for L"٠" and L"0" without setting errno. That's
clearly a problem with wcscoll (if the character isn't valid in the current
locale) or the locale definition.
