On Thu, 15 Feb 2001 10:33:02 -0000, [EMAIL PROTECTED] wrote:
> I remember a time during the time of the Samsboss debate that someone
> developed a technique to compare emails to each other (word content, grammar,
> style) to try to determine who Samsboss was... Is that what you were
> thinking of, Gavin?
>

you mean stuff like this ?

To: [email protected]
Subject: More conspiracy theories...
From: Ian Collier <[EMAIL PROTECTED]>
Date: Thu, 25 Jun 1998 17:38:48 +0100 (BST)

I took collected postings from 11 list members starting in October 1996
and attempted to extract the text from them (excluding headers, sigs and
quoted things). I thought it might fuel the conspiracy theories if I
invented some silly analysis techniques and tried them out.

It might be of interest to know the verbosity of each person, so, counting
the number of words extracted from each player (and I apologise to those
who were left out - and to those who were included, for that matter - it
was just an almost-random sampling of list members), we have:

   73144 bob.words
   66890 cookie.words
   49559 andrew.words
   45106 imc.words
   28029 davehooper.words
   25964 gavin.words
   18147 samsboss.words
   10413 bill.words
    6971 robert.words
    5338 dave.words        [that's Dave Ledbury by the way]
    3117 matthew.words
  332678 total

It's been suggested to me that I might measure the relative frequencies
of different lengths of word. So, for example, it turns out that 5.3% of
all my words contain a single letter, while I output 19.8% four-letter
words (!). In the following table the number in the ith column is the
percentage of i-letter words.

               1    2    3    4    5    6    7    8    9   10   11   12   13   14
andrew       4.5 17.7 20.5 19.5 11.5  8.0  6.8  5.1  3.0  1.7  0.9  0.4  0.2  0.1
bill         4.8 18.7 23.6 21.7 11.5  7.0  5.5  3.4  2.2  0.9  0.5  0.0  0.0  0.0
bob          4.8 19.3 22.5 21.0 11.8  7.3  5.4  3.7  2.3  1.1  0.5  0.2  0.1  0.0
cookie       5.3 18.5 21.0 19.5 11.2  8.4  6.0  4.3  2.7  1.5  0.8  0.3  0.2  0.1
dave         4.9 17.3 21.2 20.6 11.2  8.5  6.8  4.0  2.6  1.8  0.6  0.3  0.2  0.0
davehooper   5.9 17.3 22.1 18.9 12.2  8.1  5.4  4.2  3.1  1.3  0.8  0.3  0.2  0.1
gavin        5.3 17.5 23.0 20.8 12.1  7.8  5.7  3.9  2.1  1.1  0.4  0.2  0.1  0.0
imc          5.3 18.7 21.4 19.8 11.0  7.8  6.6  4.2  2.5  1.4  0.8  0.4  0.1  0.0
matthew      5.7 18.7 21.9 22.2 10.3  6.9  6.0  3.7  2.1  1.3  0.6  0.1  0.2  0.0
robert       5.6 19.3 23.1 20.4 12.6  5.8  6.0  3.5  2.2  0.9  0.5  0.1  0.0  0.0
samsboss     4.9 19.3 23.8 22.4 11.6  6.7  4.6  3.3  1.7  1.0  0.4  0.1  0.1  0.0

It's possible to correlate different lines in this table to see how
similar they are and thus find out who is a clone of whom. As you can
see they are all pretty close to each other (which probably goes to show
that we are all writing in English) so the results should be taken with
a pinch of salt. Anyway, the *least* correlated are:

  davehooper + matthew    0.9813
  andrew + samsboss       0.9836
  dave + robert           0.9836
  andrew + robert         0.9855
  andrew + matthew        0.9865

but the *most* correlated are:

  davehooper + gavin      0.9962
  bob + cookie            0.9963
  andrew + cookie         0.9964
  andrew + imc            0.9964
  bob + imc               0.9966
  bill + gavin            0.9971
  bob + samsboss          0.9971
  bill + bob              0.9980
  bill + samsboss         0.9985
  cookie + imc            0.9988

Erk! I'm a clone of Simon! Apart from that, I find the next three
results very interesting indeed. :-)

More later...

imc

-- 
Nev - no longer at [EMAIL PROTECTED] and getting no spam at all (yet)
Webpage under construction at www,nfy53,demon,co,uk
also hiding on ICQ

