Dear mail server admins :-)
I am trying to analyse the size distribution of emails, to see whether it is Zipf-distributed or whether there is a better approximation. I have my own mails, obviously, and found an old analysis at [0], but I would like to have an up-to-date and somewhat representative sample.
For this, I would like to ask for your assistance: Could you, if you
have access to a large number of mail boxes, obtain the file sizes of
each mail, and email me that list?
E.g. if every mail is a file in the folder inbox/:
$ find inbox/ -type f -printf '%s\n'
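If you'd prefer to use a backup tar, you can do something like this
(with GNU tar; only regular files are printed, so directory entries
do not show up as size 0):
$ tar -tvzf mails.tar.gz | awk '$1 ~ /^-/ { print $3 }'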
Any compressed format would be fine. The more files, the better ;-)
Spam-free would also be preferred. Please also let me know if
your user base might be a slightly biased sample.
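For example, a minimal sketch that collects the sizes into a gzipped list, assuming GNU find and gzip are available. The function name list_sizes and the default file name sizes.txt.gz are only illustrations, not part of any tool:

```shell
# Sketch: write the size in bytes of every regular file under a mail
# folder to a gzipped list, one size per line.
# list_sizes and the default names are hypothetical, chosen for illustration.
list_sizes() {
  dir="${1:-inbox/}"          # folder holding one file per mail
  out="${2:-sizes.txt.gz}"    # compressed list to send
  find "$dir" -type f -printf '%s\n' | gzip > "$out"
}
```

You could then run `list_sizes inbox/ sizes.txt.gz` and check the number of entries with `gzip -dc sizes.txt.gz | wc -l` before sending the file.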
If you wish, I can let you know my findings. It would help me a
lot.
Best regards,
Johannes
(Student at the TU Vienna, Austria)
[0] http://osdir.com/ml/freebsd.devel.net/2002-10/msg00203.html
--
Johannes Buchner
mail: [email protected]
xmpp: [email protected]
icq: 163390666
skype:johannes_buchner
I welcome PGP/GPG encrypted/signed mails!
