Re: caesar(6) documents incorrect frequencies

2017-08-10 Thread Theo de Raadt
> On Tue, Aug 01, 2017 at 08:41:32AM -0500, Matthew Martin wrote:
> > On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > > > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > > > I've known about ETAONRISHetc basically forever.  Where is this new
> > > > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > > > > 
> > > > > Citation please?
> > > > 
> > > > I'm just updating the man page to reflect the percentages in caesar.c
> > > > which claims to get its numbers from "some unix(tm) documentation".
> > > 
> > > Is it possible you've got the fix backwards?  I think ETAONRISHetc is
> > > from some well-known research, but ETSAOR* is brand new and even google
> > > cannot find a reference to that ordering.  It seems there is a bug here,
> > > but is it perhaps the other frequency table?
> > 
> > I certainly don't claim to know which frequencies are more accurate.
> > Does anyone have a preferred source for which percentages to use?
> > 
> > - Matthew Martin
> 
> If no one has a better suggestion,
> https://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
> seems to be fairly middle of the road in its frequencies.

I disagree.  I think this toy program should remain the same...



Re: caesar(6) documents incorrect frequencies

2017-08-09 Thread Matthew Martin
On Tue, Aug 01, 2017 at 08:41:32AM -0500, Matthew Martin wrote:
> On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > > I've known about ETAONRISHetc basically forever.  Where is this new
> > > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > > > 
> > > > Citation please?
> > > 
> > > I'm just updating the man page to reflect the percentages in caesar.c
> > > which claims to get its numbers from "some unix(tm) documentation".
> > 
> > Is it possible you've got the fix backwards?  I think ETAONRISHetc is
> > from some well-known research, but ETSAOR* is brand new and even google
> > cannot find a reference to that ordering.  It seems there is a bug here,
> > but is it perhaps the other frequency table?
> 
> I certainly don't claim to know which frequencies are more accurate.
> Does anyone have a preferred source for which percentages to use?
> 
> - Matthew Martin

If no one has a better suggestion,
https://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html
seems to be fairly middle of the road in its frequencies.

- Matthew Martin



Re: caesar(6) documents incorrect frequencies

2017-08-03 Thread Landry Breuil
On Thu, Aug 03, 2017 at 11:20:15AM +0200, Daniel Hartmeier wrote:
> Maybe you mean Etaoin Shrdlu, it has a fascinating story
> 
>   https://archive.org/details/FarewellEtaoinShrdlu

Wow, just wow. Thanks for this piece of history :)



Re: caesar(6) documents incorrect frequencies

2017-08-03 Thread Daniel Hartmeier
Maybe you mean Etaoin Shrdlu, it has a fascinating story

  https://archive.org/details/FarewellEtaoinShrdlu



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Theo de Raadt
> No one agrees,

I think you are mistaken.

This is not an exact science, it is an approximation.  However, one of
them is well-known, and the others are calculation-du-jour.
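
As a rough illustration of the approximation point, here is a minimal Python sketch that guesses a rotation by scoring every candidate shift against a frequency table. It is not the code in caesar.c; the names parse_table, rotate and guess_shift, the scoring rule, and the sample text are all assumptions made up for this example. The two tables are the ones from the caesar.6 diff in this thread, and on this sample both select the same rotation.

```python
import re
import string

# Percentages copied from the old and new versions of caesar.6 in the diff.
OLD_TABLE = """
E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
P(1.9), Y(1.9),
W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
"""
NEW_TABLE = """
E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
Z(0.06), J(0.04).
"""

# Hypothetical sample plaintext, chosen only for this illustration.
SAMPLE = ("it was the best of times it was the worst of times "
          "it was the age of wisdom it was the age of foolishness")


def parse_table(text):
    """Turn 'E(13), T(10.5), ...' into {'E': 13.0, 'T': 10.5, ...}."""
    return {letter: float(pct)
            for letter, pct in re.findall(r"([A-Z])\(([\d.]+)\)", text)}


def rotate(text, shift):
    """Rotate ASCII letters by `shift` positions; leave everything else alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext, table):
    """Return the rotation (0-25) whose output scores highest against `table`.

    The score is simply the sum of the table percentage of every letter in
    the rotated text (an assumed scoring rule, not taken from caesar.c).
    """
    def score(text):
        return sum(table.get(ch.upper(), 0.0) for ch in text)

    return max(range(26), key=lambda shift: score(rotate(ciphertext, shift)))


if __name__ == "__main__":
    cipher = rotate(SAMPLE, 13)
    for name, raw in (("old", OLD_TABLE), ("new", NEW_TABLE)):
        shift = guess_shift(cipher, parse_table(raw))
        print(name, shift, rotate(cipher, shift))
```

A short or unusual text could still fool a scorer this simple with either table, so this says nothing about which table is more accurate.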



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread sven falempin
On Tue, Aug 1, 2017 at 9:49 AM, Theo de Raadt  wrote:
>> > Is it possible you've got the fix backwards?  I think ETAONRISHetc is
>> > from some well-known research, but ETSAOR* is brand new and even google
>> > cannot find a reference to that ordering.  It seems there is a bug here,
>> > but is it perhaps the other frequency table?
>>
>> I certainly don't claim to know which frequencies are more accurate.
>> Does anyone have a preferred source for which percentages to use?
>
> I suggest a google search for ETAONRISH, which leads to a handful of
> references from 1960, 1963, etc.  Of course it is only an estimate, and
> will vary between regions and countries EH?
>
> I think that frequency order is still the most accepted.
>

No one agrees,

Wikipedia : compares to < eotha sinrd luymw fgcbp kvjqxz of modern
English > ( https://en.wikipedia.org/wiki/Letter_frequency )

from: 
http://www.math.ucsd.edu/~crypto/Projects/MarshaMoreno/TimeComparisonFrequency.pdf

Note that the paper referenced by Wikipedia talks about English but
uses the Bible ???

The tables can be sorted, which gives: ETAOINSHR DLC ...



Meh

-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Theo de Raadt
> > Is it possible you've got the fix backwards?  I think ETAONRISHetc is
> > from some well-known research, but ETSAOR* is brand new and even google
> > cannot find a reference to that ordering.  It seems there is a bug here,
> > but is it perhaps the other frequency table?
> 
> I certainly don't claim to know which frequencies are more accurate.
> Does anyone have a preferred source for which percentages to use?

I suggest a google search for ETAONRISH, which leads to a handful of
references from 1960, 1963, etc.  Of course it is only an estimate, and
will vary between regions and countries EH?

I think that frequency order is still the most accepted.



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Matthew Martin
On Tue, Aug 01, 2017 at 07:38:28AM -0600, Theo de Raadt wrote:
> > On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > > I've known about ETAONRISHetc basically forever.  Where is this new
> > > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > > 
> > > Citation please?
> > 
> > I'm just updating the man page to reflect the percentages in caesar.c
> > which claims to get its numbers from "some unix(tm) documentation".
> 
> Is it possible you've got the fix backwards?  I think ETAONRISHetc is
> from some well-known research, but ETSAOR* is brand new and even google
> cannot find a reference to that ordering.  It seems there is a bug here,
> but is it perhaps the other frequency table?

I certainly don't claim to know which frequencies are more accurate.
Does anyone have a preferred source for which percentages to use?

- Matthew Martin



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Theo de Raadt
> On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> > I've known about ETAONRISHetc basically forever.  Where is this new
> > order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> > 
> > Citation please?
> 
> I'm just updating the man page to reflect the percentages in caesar.c
> which claims to get its numbers from "some unix(tm) documentation".

Is it possible you've got the fix backwards?  I think ETAONRISHetc is
from some well-known research, but ETSAOR* is brand new and even google
cannot find a reference to that ordering.  It seems there is a bug here,
but is it perhaps the other frequency table?
 
> - Matthew Martin
> 
> > > On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > > > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > > > The man page documents frequencies that are different than the code
> > > > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > > > > 
> > > > > - Matthew Martin
> > > > > 
> > > > 
> > > > morning.
> > > > 
> > > > i don't see what harm there is in providing this information. the man
> > > > page itself is hardly a huge read, and who knows whether someone might
> > > > find it useful.
> > > > 
> > > > i'd be willing to commit a diff which updates the man page.
> > > > 
> > > > jmc
> > > 
> > > Morning
> > > 
> > > I assumed someone who cared to know the frequencies would open the
> > > source, and removing them means there's one less thing to get out of
> > > sync. But I have no strong opinion here.
> > > 
> > > - Matthew Martin
> > > 
> > > 
> > > diff --git caesar.6 caesar.6
> > > index 9dc040a7a6d..94ad082327e 100644
> > > --- caesar.6
> > > +++ caesar.6
> > > @@ -64,13 +64,13 @@ their content.
> > >  .Pp
> > >  The frequency (from most common to least) of English letters is as follows:
> > >  .Bd -filled -offset indent
> > > -ETAONRISHDLFCMUGPYWBVKXJQZ
> > > +ETSAORINDHLCPMUYFWGBVKXQZJ
> > >  .Ed
> > >  .Pp
> > >  Their frequencies as a percentage are as follows:
> > >  .Bd -filled -offset indent
> > > -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> > > -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> > > -P(1.9), Y(1.9),
> > > -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> > > +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> > > +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> > > +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> > > +Z(0.06), J(0.04).
> > >  .Ed
> > > 
> > 
> 



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Matthew Martin
On Tue, Aug 01, 2017 at 07:28:39AM -0600, Theo de Raadt wrote:
> I've known about ETAONRISHetc basically forever.  Where is this new
> order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?
> 
> Citation please?

I'm just updating the man page to reflect the percentages in caesar.c
which claims to get its numbers from "some unix(tm) documentation".

- Matthew Martin

> > On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > > The man page documents frequencies that are different than the code
> > > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > > > 
> > > > - Matthew Martin
> > > > 
> > > 
> > > morning.
> > > 
> > > i don't see what harm there is in providing this information. the man
> > > page itself is hardly a huge read, and who knows whether someone might
> > > find it useful.
> > > 
> > > i'd be willing to commit a diff which updates the man page.
> > > 
> > > jmc
> > 
> > Morning
> > 
> > I assumed someone who cared to know the frequencies would open the
> > source, and removing them means there's one less thing to get out of
> > sync. But I have no strong opinion here.
> > 
> > - Matthew Martin
> > 
> > 
> > diff --git caesar.6 caesar.6
> > index 9dc040a7a6d..94ad082327e 100644
> > --- caesar.6
> > +++ caesar.6
> > @@ -64,13 +64,13 @@ their content.
> >  .Pp
> >  The frequency (from most common to least) of English letters is as follows:
> >  .Bd -filled -offset indent
> > -ETAONRISHDLFCMUGPYWBVKXJQZ
> > +ETSAORINDHLCPMUYFWGBVKXQZJ
> >  .Ed
> >  .Pp
> >  Their frequencies as a percentage are as follows:
> >  .Bd -filled -offset indent
> > -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> > -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> > -P(1.9), Y(1.9),
> > -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> > +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> > +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> > +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> > +Z(0.06), J(0.04).
> >  .Ed
> > 
> 



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Theo de Raadt
I've known about ETAONRISHetc basically forever.  Where is this new
order (ETSAORINDHLCPMUYFWGBVKXQZJ) coming from?

Citation please?

> On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> > On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > > The man page documents frequencies that are different than the code
> > > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > > a man page. If anyone prefers the letter ordering be kept, the correct
> > > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > > 
> > > - Matthew Martin
> > > 
> > 
> > morning.
> > 
> > i don't see what harm there is in providing this information. the man
> > page itself is hardly a huge read, and who knows whether someone might
> > find it useful.
> > 
> > i'd be willing to commit a diff which updates the man page.
> > 
> > jmc
> 
> Morning
> 
> I assumed someone who cared to know the frequencies would open the
> source, and removing them means there's one less thing to get out of
> sync. But I have no strong opinion here.
> 
> - Matthew Martin
> 
> 
> diff --git caesar.6 caesar.6
> index 9dc040a7a6d..94ad082327e 100644
> --- caesar.6
> +++ caesar.6
> @@ -64,13 +64,13 @@ their content.
>  .Pp
>  The frequency (from most common to least) of English letters is as follows:
>  .Bd -filled -offset indent
> -ETAONRISHDLFCMUGPYWBVKXJQZ
> +ETSAORINDHLCPMUYFWGBVKXQZJ
>  .Ed
>  .Pp
>  Their frequencies as a percentage are as follows:
>  .Bd -filled -offset indent
> -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> -P(1.9), Y(1.9),
> -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> +E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
> +D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
> +F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
> +Z(0.06), J(0.04).
>  .Ed
> 



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Matthew Martin
On Tue, Aug 01, 2017 at 09:36:13AM +0100, Jason McIntyre wrote:
> On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> > The man page documents frequencies that are different than the code
> > uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> > a man page. If anyone prefers the letter ordering be kept, the correct
> > order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> > 
> > - Matthew Martin
> > 
> 
> morning.
> 
> i don't see what harm there is in providing this information. the man
> page itself is hardly a huge read, and who knows whether someone might
> find it useful.
> 
> i'd be willing to commit a diff which updates the man page.
> 
> jmc

Morning

I assumed someone who cared to know the frequencies would open the
source, and removing them means there's one less thing to get out of
sync. But I have no strong opinion here.

- Matthew Martin


diff --git caesar.6 caesar.6
index 9dc040a7a6d..94ad082327e 100644
--- caesar.6
+++ caesar.6
@@ -64,13 +64,13 @@ their content.
 .Pp
 The frequency (from most common to least) of English letters is as follows:
 .Bd -filled -offset indent
-ETAONRISHDLFCMUGPYWBVKXJQZ
+ETSAORINDHLCPMUYFWGBVKXQZJ
 .Ed
 .Pp
 Their frequencies as a percentage are as follows:
 .Bd -filled -offset indent
-E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
-D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
-P(1.9), Y(1.9),
-W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
+E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
+D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
+F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
+Z(0.06), J(0.04).
 .Ed
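
The letter orderings in the diff above can be cross-checked against the percentages listed next to them. The following is a minimal Python sketch that does so; the helper names parse_table and order_by_frequency are made up for this example, and the tables are copied from the removed and added lines of the diff.

```python
import re

# Percentages copied from the "-" (old) and "+" (new) lines of the diff.
OLD_TABLE = """
E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
P(1.9), Y(1.9),
W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
"""
NEW_TABLE = """
E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
Z(0.06), J(0.04).
"""


def parse_table(text):
    """Turn 'E(13), T(10.5), ...' into {'E': 13.0, 'T': 10.5, ...}."""
    return {letter: float(pct)
            for letter, pct in re.findall(r"([A-Z])\(([\d.]+)\)", text)}


def order_by_frequency(table):
    """Return the letters from most to least frequent.

    sorted() is stable, so letters with equal percentages (P and Y in the
    old table) keep the order in which the table lists them.
    """
    return "".join(sorted(table, key=table.get, reverse=True))


if __name__ == "__main__":
    print(order_by_frequency(parse_table(OLD_TABLE)))
    print(order_by_frequency(parse_table(NEW_TABLE)))
```

Each ordering line in the diff agrees with its own percentage line, so the disagreement in this thread is between the two tables, not within either one.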



Re: caesar(6) documents incorrect frequencies

2017-08-01 Thread Jason McIntyre
On Thu, Jul 27, 2017 at 01:36:15AM -0500, Matthew Martin wrote:
> The man page documents frequencies that are different than the code
> uses e.g. C (3.61 vs 2.7) and D (4.78 vs 3.8). This seems a bit much for
> a man page. If anyone prefers the letter ordering be kept, the correct
> order is ETSAORINDHLCPMUYFWGBVKXQZJ .
> 
> - Matthew Martin
> 

morning.

i don't see what harm there is in providing this information. the man
page itself is hardly a huge read, and who knows whether someone might
find it useful.

i'd be willing to commit a diff which updates the man page.

jmc

> 
> diff --git caesar.6 caesar.6
> index 9dc040a7a6d..889f24c6548 100644
> --- caesar.6
> +++ caesar.6
> @@ -61,16 +61,3 @@ and in some of the databases used by the
>  program to
>  .Dq disguise
>  their content.
> -.Pp
> -The frequency (from most common to least) of English letters is as follows:
> -.Bd -filled -offset indent
> -ETAONRISHDLFCMUGPYWBVKXJQZ
> -.Ed
> -.Pp
> -Their frequencies as a percentage are as follows:
> -.Bd -filled -offset indent
> -E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
> -D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
> -P(1.9), Y(1.9),
> -W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
> -.Ed
>
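
For reference, the size of the mismatch described above (C 3.61 vs 2.7, D 4.78 vs 3.8) can be tabulated for all letters. This is a minimal Python sketch, assuming the old values are the ones in the man page text quoted above and the new values are the ones from the update diff posted later in the thread; parse_table and largest_differences are names made up for the example.

```python
import re

# Old values: the man page text removed by the diff quoted above.
OLD_TABLE = """
E(13), T(10.5), A(8.1), O(7.9), N(7.1), R(6.8), I(6.3), S(6.1), H(5.2),
D(3.8), L(3.4), F(2.9), C(2.7), M(2.5), U(2.4), G(2),
P(1.9), Y(1.9),
W(1.5), B(1.4), V(.9), K(.4), X(.15), J(.13), Q(.11), Z(.07).
"""
# New values: as given in the later update diff, said to match caesar.c.
NEW_TABLE = """
E(12.37), T(9.68), S(8.77), A(7.97), O(6.96), R(6.63), I(6.39), N(5.92),
D(4.78), H(4.49), L(3.81), C(3.61), P(2.91), M(2.69), U(2.62), Y(2.07),
F(2.01), W(1.88), G(1.46), B(1.35), V(0.81), K(0.42), X(0.23), Q(0.08),
Z(0.06), J(0.04).
"""


def parse_table(text):
    """Turn 'E(13), T(10.5), ...' into {'E': 13.0, 'T': 10.5, ...}."""
    return {letter: float(pct)
            for letter, pct in re.findall(r"([A-Z])\(([\d.]+)\)", text)}


def largest_differences(old, new, count=5):
    """Return (letter, new - old) pairs, largest absolute difference first."""
    diffs = {letter: round(new[letter] - old[letter], 2) for letter in old}
    ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    old, new = parse_table(OLD_TABLE), parse_table(NEW_TABLE)
    for letter, diff in largest_differences(old, new):
        print(f"{letter}: {old[letter]} -> {new[letter]} ({diff:+.2f})")
```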