Re: caesar(6) documents incorrect frequencies
> On Tue, Aug 01, 2017 at 08:41:32AM -0500, Matthew Martin wrote:
> > On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > > > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > > > I've known about ETAONRISHetc basically forever. Where is this new
> > > > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > > > >
> > > > > Citation please?
> > > >
> > > > I'm just updating the man page to reflect the percentages in caesar.c
> > > > which claims to get its numbers from "some unix(tm) documentation".
> > >
> > > Is it possible you've got the fix backwards? I think ETAONRISHetc is
> > > from some well-known research, but ETSAOR* is brand new and even google
> > > cannot find a reference to that ordering. It seems there is a bug here,
> > > but is it perhaps the other frequency table?
> >
> > I certainly don't claim to know which frequencies are more accurate.
> > Does anyone have a preferred source for which percentages to use?
> >
> > - Matthew Martin
>
> If no one has a better suggestion,
> https://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
> seems to be fairly middle of the road in its frequencies.

I disagree. I think this toy program should remain the same...
Re: caesar(6) documents incorrect frequencies
On Tue, Aug 01, 2017 at 08:41:32AM -0500, Matthew Martin wrote:
> On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > > I've known about ETAONRISHetc basically forever. Where is this new
> > > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > > >
> > > > Citation please?
> > >
> > > I'm just updating the man page to reflect the percentages in caesar.c
> > > which claims to get its numbers from "some unix(tm) documentation".
> >
> > Is it possible you've got the fix backwards? I think ETAONRISHetc is
> > from some well-known research, but ETSAOR* is brand new and even google
> > cannot find a reference to that ordering. It seems there is a bug here,
> > but is it perhaps the other frequency table?
>
> I certainly don't claim to know which frequencies are more accurate.
> Does anyone have a preferred source for which percentages to use?
>
> - Matthew Martin

If no one has a better suggestion,
https://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
seems to be fairly middle of the road in its frequencies.

- Matthew Martin
Re: caesar(6) documents incorrect frequencies
On Thu, Aug 03, 2017 at 11:20:15AM +0200, Daniel Hartmeier wrote:
> Maybe you mean Etaoin Shrdlu, it has a fascinating story
>
> https://archive.org/details/FarewellEtaoinShrdlu

Wow, just wow. Thanks for this piece of history :)
Re: caesar(6) documents incorrect frequencies
Maybe you mean Etaoin Shrdlu, it has a fascinating story

https://archive.org/details/FarewellEtaoinShrdlu
Re: caesar(6) documents incorrect frequencies
> No one agrees.

I think you are mistaken. This is not an exact science, it is an
approximation. However one of these is well-known, and the others are
calculation-du-jour.
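One hedged way to put a number on "approximation": treat each ordering as a ranking of the 26 letters and compute Spearman's rank correlation between them. The sketch below is meant for comparing the old man page order (ETAONRISHDLFCMUGPYWBVKXJQZ) with the proposed one from the diff (ETSAORINDHLCPMUYFWGBVKXQZJ). The function names are made up for illustration, nothing here comes from caesar.c, and both strings are assumed to contain each capital letter exactly once.

```c
#include <string.h>

/* 0-based position of letter c in a 26-letter order string, -1 if absent. */
static int
rank_of(const char *order, int c)
{
	const char *p = strchr(order, c);

	return p == NULL ? -1 : (int)(p - order);
}

/* Sum over A..Z of the squared difference in rank between two orderings. */
int
rank_dist_sq(const char *a, const char *b)
{
	int c, d, sum = 0;

	for (c = 'A'; c <= 'Z'; c++) {
		d = rank_of(a, c) - rank_of(b, c);
		sum += d * d;
	}
	return sum;
}

/*
 * Spearman's rho for two tie-free rankings of n = 26 items:
 * rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).  1.0 means identical order.
 */
double
spearman_rho(const char *a, const char *b)
{
	const double n = 26.0;

	return 1.0 - 6.0 * rank_dist_sq(a, b) / (n * (n * n - 1.0));
}
```

For these two strings the squared rank differences sum to 100, so rho = 1 - 600/17550, about 0.966. S and F move five places and P four, while E and T at the top and B, V, K, X near the bottom keep their positions.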
Re: caesar(6) documents incorrect frequencies
On Tue, Aug 1, 2017 at 9:49 AM, Theo de Raadt wrote:
>> > Is it possible you've got the fix backwards? I think ETAONRISHetc is
>> > from some well-known research, but ETSAOR* is brand new and even google
>> > cannot find a reference to that ordering. It seems there is a bug here,
>> > but is it perhaps the other frequency table?
>>
>> I certainly don't claim to know which frequencies are more accurate.
>> Does anyone have a preferred source for which percentages to use?
>
> I suggest a google search for ETAONRISH, which leads to a handful of
> references from 1960, 1963, etc. Of course it is only an estimate, and
> will vary between regions and countries EH?
>
> I think that frequency order is still the most accepted.
>

No one agrees. Wikipedia: "compares to eotha sinrd luymw fgcbp kvjqxz of
modern English"
( https://en.wikipedia.org/wiki/Letter_frequency )
from:
http://www.math.ucsd.edu/~crypto/Projects/MarshaMoreno/TimeComparisonFrequency.pdf

Note that the paper Wikipedia references talks about English and uses the
Bible??? The tables can be sorted, and give: ETAOINSHR DLC ...

Meh

--
- Knowing is not enough; we must apply. Willing is not enough; we must do
Re: caesar(6) documents incorrect frequencies
> > Is it possible you've got the fix backwards? I think ETAONRISHetc is
> > from some well-known research, but ETSAOR* is brand new and even google
> > cannot find a reference to that ordering. It seems there is a bug here,
> > but is it perhaps the other frequency table?
>
> I certainly don't claim to know which frequencies are more accurate.
> Does anyone have a preferred source for which percentages to use?

I suggest a google search for ETAONRISH, which leads to a handful of
references from 1960, 1963, etc. Of course it is only an estimate, and
will vary between regions and countries EH?

I think that frequency order is still the most accepted.
Re: caesar(6) documents incorrect frequencies
On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > I've known about ETAONRISHetc basically forever. Where is this new
> > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > >
> > > Citation please?
> >
> > I'm just updating the man page to reflect the percentages in caesar.c
> > which claims to get its numbers from "some unix(tm) documentation".
>
> Is it possible you've got the fix backwards? I think ETAONRISHetc is
> from some well-known research, but ETSAOR* is brand new and even google
> cannot find a reference to that ordering. It seems there is a bug here,
> but is it perhaps the other frequency table?

I certainly don't claim to know which frequencies are more accurate.
Does anyone have a preferred source for which percentages to use?

- Matthew Martin
Re: caesar(6) documents incorrect frequencies
> On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > I've known about ETAONRISHetc basically forever. Where is this new
> > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> >
> > Citation please?
>
> I'm just updating the man page to reflect the percentages in caesar.c
> which claims to get its numbers from "some unix(tm) documentation".

Is it possible you've got the fix backwards? I think ETAONRISHetc is
from some well-known research, but ETSAOR* is brand new and even google
cannot find a reference to that ordering. It seems there is a bug here,
but is it perhaps the other frequency table?

> - Matthew Martin
>
> > > On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > > > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > > > The man page documents frequencies that are different than the code
> > > > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > > > >
> > > > > - Matthew Martin
> > > > >
> > > >
> > > > morning.
> > > >
> > > > i don't see what harm there is in providing this information. the man
> > > > page itself is hardly a huge read, and who knows whether someone might
> > > > find it useful.
> > > >
> > > > i'd be willing to commit a diff which updates the man page.
> > > >
> > > > jmc
> > >
> > > Morning
> > >
> > > I assumed someone who cared to know the frequencies would open the
> > > source, and removing them means there is one less thing to get out of
> > > sync. But I have no strong opinion here.
> > >
> > > - Matthew Martin
> > >
> > >
> > > diff --git caesar.6 caesar.6
> > > index 9dc040a7a6d..94ad082327e 100644
> > > --- caesar.6
> > > +++ caesar.6
> > > @@ -64,13 +64,13 @@ their content.
> > > .Pp
> > > The frequency (from most common to least) of English letters is as follows:
> > > .Bd -filled -offset indent
> > > -ETAONRISHDLFCMUGPYWBVKXJQZ
> > > +ETSAORINDHLCPMUYFWGBVKXQZJ
> > > .Ed
> > > .Pp
> > > Their frequencies as a percentage are as follows:
> > > .Bd -filled -offset indent
> > > -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> > > -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> > > -P(1.9), Y(1.9),
> > > -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> > > +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> > > +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> > > +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> > > +Z(0.06), J(0.04).
> > > .Ed
> > >
> > >
Re: caesar(6) documents incorrect frequencies
On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> I've known about ETAONRISHetc basically forever. Where is this new
> order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
>
> Citation please?

I'm just updating the man page to reflect the percentages in caesar.c
which claims to get its numbers from "some unix(tm) documentation".

- Matthew Martin

> > On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > > The man page documents frequencies that are different than the code
> > > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > > >
> > > > - Matthew Martin
> > > >
> > >
> > > morning.
> > >
> > > i don't see what harm there is in providing this information. the man
> > > page itself is hardly a huge read, and who knows whether someone might
> > > find it useful.
> > >
> > > i'd be willing to commit a diff which updates the man page.
> > >
> > > jmc
> >
> > Morning
> >
> > I assumed someone who cared to know the frequencies would open the
> > source, and removing them means there is one less thing to get out of
> > sync. But I have no strong opinion here.
> >
> > - Matthew Martin
> >
> >
> > diff --git caesar.6 caesar.6
> > index 9dc040a7a6d..94ad082327e 100644
> > --- caesar.6
> > +++ caesar.6
> > @@ -64,13 +64,13 @@ their content.
> > .Pp
> > The frequency (from most common to least) of English letters is as follows:
> > .Bd -filled -offset indent
> > -ETAONRISHDLFCMUGPYWBVKXJQZ
> > +ETSAORINDHLCPMUYFWGBVKXQZJ
> > .Ed
> > .Pp
> > Their frequencies as a percentage are as follows:
> > .Bd -filled -offset indent
> > -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> > -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> > -P(1.9), Y(1.9),
> > -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> > +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> > +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> > +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> > +Z(0.06), J(0.04).
> > .Ed
> >
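How the table is used bears on how precise it needs to be. My understanding, which is an assumption not checked against caesar.c here, is that caesar(6) counts the letters in its input and scores each of the 26 rotations against the expected frequencies, keeping the best. A minimal sketch of that idea follows. `expected_pct` and `guess_rotation` are hypothetical names, the table holds the proposed percentages from the diff rather than anything from caesar.c, and ASCII letters are assumed.

```c
/*
 * Hypothetical expected frequencies in percent, indexed A..Z.  These are
 * the proposed caesar.6 values, used for illustration only; this is not
 * the table from caesar.c.
 */
const double expected_pct[26] = {
	7.97, 1.35, 3.61, 4.78, 12.37, 2.01, 1.46, 4.49, 6.39, 0.04,
	0.42, 3.81, 2.69, 5.92, 6.96, 2.91, 0.08, 6.63, 8.77, 9.68,
	2.62, 0.81, 1.88, 0.23, 2.07, 0.06
};

/*
 * Return the rotation r (0..25) that, applied forward to the letters of
 * text, best matches expected_pct.  The score for r is the dot product of
 * the observed letter counts with the table shifted by r.  Assumes ASCII.
 */
int
guess_rotation(const char *text)
{
	int obs[26] = { 0 };
	int i, r, best = 0;
	double score, best_score = -1.0;
	const char *p;

	/* count letters, folding case; anything else is ignored */
	for (p = text; *p != '\0'; p++) {
		if (*p >= 'a' && *p <= 'z')
			obs[*p - 'a']++;
		else if (*p >= 'A' && *p <= 'Z')
			obs[*p - 'A']++;
	}

	for (r = 0; r < 26; r++) {
		/* ciphertext letter i would decode to letter (i + r) % 26 */
		score = 0.0;
		for (i = 0; i < 26; i++)
			score += obs[i] * expected_pct[(i + r) % 26];
		/* strict comparison: on a tie the smaller rotation wins */
		if (score > best_score) {
			best_score = score;
			best = r;
		}
	}
	return best;
}
```

For example, "hhhhhhhhwwwwwdddd" (a run of e, t and a rotated forward by 3) gives 23, the rotation that maps it back. Since only the best-scoring rotation is kept, a table like this only has to be roughly right for ordinary English text; very short or unusual input can still fool it.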
Re: caesar(6) documents incorrect frequencies
I've known about ETAONRISHetc basically forever. Where is this new
order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?

Citation please?

> On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > The man page documents frequencies that are different than the code
> > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > >
> > > - Matthew Martin
> > >
> >
> > morning.
> >
> > i don't see what harm there is in providing this information. the man
> > page itself is hardly a huge read, and who knows whether someone might
> > find it useful.
> >
> > i'd be willing to commit a diff which updates the man page.
> >
> > jmc
>
> Morning
>
> I assumed someone who cared to know the frequencies would open the
> source, and removing them means there is one less thing to get out of
> sync. But I have no strong opinion here.
>
> - Matthew Martin
>
>
> diff --git caesar.6 caesar.6
> index 9dc040a7a6d..94ad082327e 100644
> --- caesar.6
> +++ caesar.6
> @@ -64,13 +64,13 @@ their content.
> .Pp
> The frequency (from most common to least) of English letters is as follows:
> .Bd -filled -offset indent
> -ETAONRISHDLFCMUGPYWBVKXJQZ
> +ETSAORINDHLCPMUYFWGBVKXQZJ
> .Ed
> .Pp
> Their frequencies as a percentage are as follows:
> .Bd -filled -offset indent
> -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> -P(1.9), Y(1.9),
> -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> +Z(0.06), J(0.04).
> .Ed
>
Re: caesar(6) documents incorrect frequencies
On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > The man page documents frequencies that are different than the code
> > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > a man page. If anyone prefers the letter ordering be kept, the correct
> > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> >
> > - Matthew Martin
> >
>
> morning.
>
> i don't see what harm there is in providing this information. the man
> page itself is hardly a huge read, and who knows whether someone might
> find it useful.
>
> i'd be willing to commit a diff which updates the man page.
>
> jmc

Morning

I assumed someone who cared to know the frequencies would open the
source, and removing them means there is one less thing to get out of
sync. But I have no strong opinion here.

- Matthew Martin


diff --git caesar.6 caesar.6
index 9dc040a7a6d..94ad082327e 100644
--- caesar.6
+++ caesar.6
@@ -64,13 +64,13 @@ their content.
.Pp
The frequency (from most common to least) of English letters is as follows:
.Bd -filled -offset indent
-ETAONRISHDLFCMUGPYWBVKXJQZ
+ETSAORINDHLCPMUYFWGBVKXQZJ
.Ed
.Pp
Their frequencies as a percentage are as follows:
.Bd -filled -offset indent
-E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
-D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
-P(1.9), Y(1.9),
-W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
+E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
+D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
+F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
+Z(0.06), J(0.04).
.Ed
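The order string in this diff can at least be checked mechanically: ETSAORINDHLCPMUYFWGBVKXQZJ is simply the letters of the proposed percentage table sorted by frequency, just as ETAONRISHDLFCMUGPYWBVKXJQZ follows from the old table. A small sketch of such a check, the kind that would keep the documented string and the numbers from drifting apart, is below. `old_pct`, `new_pct`, `order_from_pct` and `order_matches` are hypothetical names; the values are copied from the diff, not read from caesar.c.

```c
#include <string.h>

/* Old and proposed percentages from the caesar.6 diff, indexed A..Z. */
const double old_pct[26] = {
	8.1, 1.4, 2.7, 3.8, 13, 2.9, 2, 5.2, 6.3, 0.13,		/* A-J */
	0.4, 3.4, 2.5, 7.1, 7.9, 1.9, 0.11, 6.8, 6.1, 10.5,	/* K-T */
	2.4, 0.9, 1.5, 0.15, 1.9, 0.07				/* U-Z */
};
const double new_pct[26] = {
	7.97, 1.35, 3.61, 4.78, 12.37, 2.01, 1.46, 4.49, 6.39, 0.04,
	0.42, 3.81, 2.69, 5.92, 6.96, 2.91, 0.08, 6.63, 8.77, 9.68,
	2.62, 0.81, 1.88, 0.23, 2.07, 0.06
};

/*
 * Write A..Z into out[] from most to least frequent, then a NUL.
 * Insertion sort is stable, so tied letters (P and Y in the old table)
 * stay in alphabetical order.
 */
void
order_from_pct(const double pct[26], char out[27])
{
	int idx[26];
	int i, j;

	for (i = 0; i < 26; i++) {
		/* shift less frequent letters right, then drop letter i in */
		for (j = i; j > 0 && pct[idx[j - 1]] < pct[i]; j--)
			idx[j] = idx[j - 1];
		idx[j] = i;
	}
	for (i = 0; i < 26; i++)
		out[i] = (char)('A' + idx[i]);
	out[26] = '\0';
}

/* Return 1 if a documented order string agrees with the percentages. */
int
order_matches(const double pct[26], const char *documented)
{
	char buf[27];

	order_from_pct(pct, buf);
	return strcmp(buf, documented) == 0;
}
```

This only shows that each order string is consistent with its own percentage table; it says nothing about which table better reflects English text.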
Re: caesar(6) documents incorrect frequencies
On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> The man page documents frequencies that are different than the code
> uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> a man page. If anyone prefers the letter ordering be kept, the correct
> order is ETSAORINDHLCPMUYFWGBVKXQZJ .
>
> - Matthew Martin
>

morning.

i don't see what harm there is in providing this information. the man
page itself is hardly a huge read, and who knows whether someone might
find it useful.

i'd be willing to commit a diff which updates the man page.

jmc

>
> diff --git caesar.6 caesar.6
> index 9dc040a7a6d..889f24c6548 100644
> --- caesar.6
> +++ caesar.6
> @@ -61,16 +61,3 @@ and in some of the databases used by the
> program to
> .Dq disguise
> their content.
> -.Pp
> -The frequency (from most common to least) of English letters is as follows:
> -.Bd -filled -offset indent
> -ETAONRISHDLFCMUGPYWBVKXJQZ
> -.Ed
> -.Pp
> -Their frequencies as a percentage are as follows:
> -.Bd -filled -offset indent
> -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> -P(1.9), Y(1.9),
> -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> -.Ed
>