If the dictionary contains more than 65536 words
(/usr/share/hunspell/en_US.dic does not) and you want the last words to have
a chance of being picked, then -N 2 should become -N 3.
Well, except that it will take time to get a valid word number if there are,
say, 65537 words. Indeed, 65537/2^24 = 0.00390631, that is a probability
below 0.4% of picking a valid word number. It then takes log(0.5) /
log(1-0.00390631) = 177 iterations (and accesses to the entropy pool) to have
an even chance of choosing each word, and about 1/0.00390631 = 256 iterations
on average.
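
The arithmetic above can be checked with a small awk call. This is only a sketch: the function name rejection_stats is made up for illustration, and it assumes an awk that supports the ^ operator and log(). It prints the chance p that one draw is a valid word number, the number of draws needed for a 50% chance of success (the log(0.5)/log(1-p) figure), and the mean number of draws 1/p of the geometric distribution.

```shell
# rejection_stats WORDS BYTES  (hypothetical helper, not part of the original command)
# Odds of drawing a valid word number when BYTES random bytes are read
# and every value >= WORDS is rejected.
rejection_stats() {
    awk -v n="$1" -v b="$2" 'BEGIN {
        p = n / 2^(8 * b)                # chance that one draw is a valid word number
        median = log(0.5) / log(1 - p)   # draws needed for a 50% chance of success
        mean = 1 / p                     # expected number of draws
        printf "p=%.8f median=%.0f mean=%.0f\n", p, median, mean
    }'
}

rejection_stats 65537 3   # prints: p=0.00390631 median=177 mean=256
```
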
The "if" test guarantees that every word has the *exact* same probability of
being picked. But it is not worth it. It is good enough to take a random,
potentially large number (4 bytes below) and reduce it modulo the number of
words:
$ words=6; dic=/usr/share/hunspell/en_US.dic; max=`wc -l < $dic`; for i in `seq $words`; do r=`od -A n -N 4 -t u4 /dev/random`; cut -d / -f 1 $dic | sed -n `expr $r % $max + 1`p; done | tr '\n' ' '
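
For anybody who wants to follow the one-liner step by step, here is a commented sketch of the same idea written as a shell function. The name pick_words and the optional third argument (the random source) are my own additions for illustration. It should behave like the command above, except that it extracts the line first and strips the hunspell flags afterwards.

```shell
# pick_words DICT COUNT [RANDOM_SOURCE]  (hypothetical helper)
# Prints COUNT words from DICT, separated by spaces.
pick_words() {
    dic=$1
    count=$2
    src=${3:-/dev/random}               # assumption: defaults to /dev/random, as in the one-liner
    max=$(wc -l < "$dic")               # number of lines, i.e. candidate words
    i=0
    while [ "$i" -lt "$count" ]; do
        # Read 4 random bytes and print them as one unsigned 32-bit integer.
        r=$(od -A n -N 4 -t u4 "$src" | tr -d ' ')
        # Map that integer to a line number between 1 and max.
        line=$((r % max + 1))
        # Print that line, dropping the hunspell flags that follow the "/".
        sed -n "${line}p" "$dic" | cut -d / -f 1
        i=$((i + 1))
    done | tr '\n' ' '
}

# Example: pick_words /usr/share/hunspell/en_US.dic 6
```
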
Unlike the command in the post right above, this one gives the first words in
the dictionary a slightly greater probability of being picked. For instance,
since /usr/share/hunspell/en_US.dic has 62155 words and 2^32 = 62155 x 69100
+ 56796, the first 56796 words are more likely to be chosen by a factor of
69101 / 69100 = 1.000014472, i.e., about 0.0014% more likely. Who cares? If
you do, just increase the two numbers "4" in the command above and you will
gain many more zeros in that last percentage. With 5 bytes, it would already
be 0.000006% (though od typically accepts only integer sizes such as 1, 2, 4
and 8 for -t u, so in practice the next step up is probably 8).
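
The bias figures can be recomputed for any dictionary size with a few lines of awk. Again, this is just a sketch: modulo_bias is a made-up name, and since awk works with floating-point numbers the result is only exact for up to about 6 random bytes.

```shell
# modulo_bias WORDS BYTES  (hypothetical helper)
# With 2^(8*BYTES) = q * WORDS + rem, the first rem words are picked
# q+1 times out of 2^(8*BYTES), the others only q times.
modulo_bias() {
    awk -v n="$1" -v b="$2" 'BEGIN {
        range = 2^(8 * b)        # number of distinct random values
        q = int(range / n)       # full passes over the dictionary
        rem = range - q * n      # words that get one extra chance
        printf "first %d words: factor %.9f (%.7f%% more likely)\n", rem, (q + 1) / q, 100 / q
    }'
}

modulo_bias 62155 4   # prints: first 56796 words: factor 1.000014472 (0.0014472% more likely)
```

Calling it with 5 instead of 4 shows how quickly the bias shrinks with each additional byte.
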
Anyway, I am writing to myself, thinking up fun little solutions instead of
actually working! But if somebody here wants to understand the command, I
would be happy to have that additional opportunity to escape the (lot of)
work I have to do!