On Thu, 15 Feb 2001 10:33:02 -0000, [EMAIL PROTECTED] wrote:

> I remember that during the Samsboss debate someone developed a
> technique to compare emails to each other (word content, grammar,
> style) to try to determine who Samsboss was... Is that what you were
> thinking of, Gavin?
> 
You mean stuff like this?

To: [email protected]
Subject: More conspiracy theories...
From: Ian Collier <[EMAIL PROTECTED]>
Date: Thu, 25 Jun 1998 17:38:48 +0100 (BST)

I took collected postings from 11 list members starting in October
1996 and attempted to extract the text from them (excluding headers,
sigs and quoted things).  I thought it might fuel the conspiracy
theories if I invented some silly analysis techniques and tried them
out.

It might be of interest to know the verbosity of each person, so,
counting the number of words extracted from each player (and I
apologise to those who were left out - and to those who were included,
for that matter - it was just an almost-random sampling of list
members), we have:

   73144 bob.words
   66890 cookie.words
   49559 andrew.words
   45106 imc.words
   28029 davehooper.words
   25964 gavin.words
   18147 samsboss.words
   10413 bill.words
    6971 robert.words
    5338 dave.words         [that's Dave Ledbury by the way]
    3117 matthew.words
  332678 total

It's been suggested to me that I might measure the relative
frequencies of different lengths of word.  So, for example, it turns
out that 5.3% of all my words contain a single letter, while I output
19.8% four-letter words (!).  In the following table the number in the
ith column is the percentage of i-letter words.

             1    2    3    4    5    6    7    8    9   10   11   12   13   14
andrew     4.5 17.7 20.5 19.5 11.5  8.0  6.8  5.1  3.0  1.7  0.9  0.4  0.2  0.1
bill       4.8 18.7 23.6 21.7 11.5  7.0  5.5  3.4  2.2  0.9  0.5  0.0  0.0  0.0
bob        4.8 19.3 22.5 21.0 11.8  7.3  5.4  3.7  2.3  1.1  0.5  0.2  0.1  0.0
cookie     5.3 18.5 21.0 19.5 11.2  8.4  6.0  4.3  2.7  1.5  0.8  0.3  0.2  0.1
dave       4.9 17.3 21.2 20.6 11.2  8.5  6.8  4.0  2.6  1.8  0.6  0.3  0.2  0.0
davehooper 5.9 17.3 22.1 18.9 12.2  8.1  5.4  4.2  3.1  1.3  0.8  0.3  0.2  0.1
gavin      5.3 17.5 23.0 20.8 12.1  7.8  5.7  3.9  2.1  1.1  0.4  0.2  0.1  0.0
imc        5.3 18.7 21.4 19.8 11.0  7.8  6.6  4.2  2.5  1.4  0.8  0.4  0.1  0.0
matthew    5.7 18.7 21.9 22.2 10.3  6.9  6.0  3.7  2.1  1.3  0.6  0.1  0.2  0.0
robert     5.6 19.3 23.1 20.4 12.6  5.8  6.0  3.5  2.2  0.9  0.5  0.1  0.0  0.0
samsboss   4.9 19.3 23.8 22.4 11.6  6.7  4.6  3.3  1.7  1.0  0.4  0.1  0.1  0.0
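
For anyone wanting to try this at home, a row of the table above could
be produced roughly as follows. This is only a sketch and not the script
that was actually used: the function name length_percentages, the
MAX_LEN constant and the letters-only definition of a "word" are all
assumptions made for illustration.

```python
import re
from collections import Counter

MAX_LEN = 14  # assumption: one column per word length, 1 to 14 letters


def length_percentages(text, max_len=MAX_LEN):
    """Return the percentage of i-letter words for i = 1..max_len."""
    # Assumption: a "word" is any unbroken run of ASCII letters.
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return [0.0] * max_len
    counts = Counter(len(word) for word in words)
    total = len(words)
    # Longer words still count towards the total but get no column.
    return [100.0 * counts[i] / total for i in range(1, max_len + 1)]


if __name__ == "__main__":
    sample = "I took collected postings from list members"
    print(" ".join(f"{p:4.1f}" for p in length_percentages(sample)))
```

Feeding it the contents of one of the .words files would give one line
of percentages in the same layout as the table.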

It's possible to correlate different lines in this table to see how
similar they are and thus find out who is a clone of whom.  As you can
see they are all pretty close to each other (which probably goes to
show that we are all writing in English) so the results should be
taken with a pinch of salt.
Anyway, the *least* correlated are:

davehooper + matthew     0.9813
andrew + samsboss        0.9836
dave + robert            0.9836
andrew + robert          0.9855
andrew + matthew         0.9865

but the *most* correlated are:

davehooper + gavin       0.9962
bob + cookie             0.9963
andrew + cookie          0.9964
andrew + imc             0.9964
bob + imc                0.9966
bill + gavin             0.9971
bob + samsboss           0.9971
bill + bob               0.9980
bill + samsboss          0.9985
cookie + imc             0.9988

Erk!  I'm a clone of Simon!  Apart from that, I find the next three
results very interesting indeed. :-)
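
The pairwise comparison could be sketched like this. The post does not
say which correlation measure was used, so the Pearson coefficient here
is an assumption, as are the names pearson and rank_pairs. The three
rows are copied from the length table; since those figures are rounded
to one decimal place, the output need not match the numbers quoted
above exactly.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(table):
    """Return (r, name_a, name_b) for every pair, least correlated first."""
    pairs = [
        (pearson(table[a], table[b]), a, b)
        for a, b in combinations(sorted(table), 2)
    ]
    return sorted(pairs)


# Three rows copied from the word-length table (rounded percentages).
rows = {
    "cookie": [5.3, 18.5, 21.0, 19.5, 11.2, 8.4, 6.0, 4.3, 2.7, 1.5, 0.8, 0.3, 0.2, 0.1],
    "imc": [5.3, 18.7, 21.4, 19.8, 11.0, 7.8, 6.6, 4.2, 2.5, 1.4, 0.8, 0.4, 0.1, 0.0],
    "samsboss": [4.9, 19.3, 23.8, 22.4, 11.6, 6.7, 4.6, 3.3, 1.7, 1.0, 0.4, 0.1, 0.1, 0.0],
}

if __name__ == "__main__":
    for r, a, b in rank_pairs(rows):
        print(f"{a} + {b:<12} {r:.4f}")
```

With all 11 rows in the dictionary this would print all 55 pairs, from
least to most correlated.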

More later...

imc
-- 
Nev - no longer at [EMAIL PROTECTED] and getting no spam at all (yet)
Webpage under construction at www,nfy53,demon,co,uk
also hiding on ICQ
