Hi Mark,
> Wow! I did not realize I had even posted this many messages. While
> this is a low number, I thought I was quieter than that. How did you
> derive this information?
That's over a one-year period, which works out to roughly one mail every
2-3 days.
I wrote the following awk script. Basically, it looks for a line
starting with "From: ..." that comes right after a line starting with
"From ..." (if you download an archive file, you will see that this
pair systematically and unambiguously marks the start of a post).
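BEGIN {
  # Monthly archive files downloaded from the list
  filelist = "2005-April.txt,2005-August.txt,2005-December.txt,"
  filelist = filelist "2005-July.txt,2005-June.txt,2005-May.txt,"
  filelist = filelist "2005-November.txt,2005-October.txt,2005-September.txt,"
  filelist = filelist "2006-February.txt,2006-January.txt,2006-March.txt"
  n = split(filelist, afiles, ",")
  for (i = 1; i <= n; i++) {
    print afiles[i]
    lineB4 = 0
    # getline returns -1 on error, so test for > 0 to avoid looping forever
    while ((getline < afiles[i]) > 0) {
      # A "From: " header right after a "From " separator starts a post
      if (lineB4 == 1 && $0 ~ /^From: /) {
        gsub("\"", "", $0)
        gsub("^From:[\t ]*", "", $0)
        gsub(" at ", "@", $0)   # archives write user@host as "user at host"
        frFrom[$0]++
      }
      lineB4 = 0
      if ($0 ~ /^From /) {
        lineB4 = 1
      }
    }
    close(afiles[i])
  }
  # Print the raw count for each sender (before merging synonyms)
  for (sender in frFrom) {
    print frFrom[sender] "\t" sender
  }
}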
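For anyone who prefers Python to awk, here is a minimal sketch of the same detection rule: a "From: " header immediately following a "From " separator line counts as one post. It works on an in-memory list of lines; the function name count_senders is hypothetical, and reading the monthly archive files is left out.

```python
from collections import Counter


def count_senders(lines):
    """Count posts per sender in mbox-style archive lines.

    A post starts where a "From: " header immediately follows a
    "From " separator line, as in the awk script.
    """
    counts = Counter()
    prev_was_separator = False
    for line in lines:
        if prev_was_separator and line.startswith("From: "):
            sender = line.replace('"', "")                  # drop quotes
            sender = sender[len("From:"):].lstrip(" \t")    # drop header name
            sender = sender.replace(" at ", "@")            # undo obfuscation
            counts[sender] += 1
        prev_was_separator = line.startswith("From ")
    return counts
```

Note that a "From: " line not preceded by a separator (for example, quoted inside a message body) is ignored, which is what makes the rule unambiguous.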
Because some of us changed email address over the year, I added a
synonym system: if a synonym exists for an address, its count is added
to that of the person's main address. For example:
# Andre Garzia
synonym["[EMAIL PROTECTED] (Andre Garzia)"] = "[EMAIL PROTECTED] (Andre Garzia)"
I checked for synonyms by sorting both on the name in parentheses
(e.g. "(Andre Garzia)") and on the name part of the address, flagging
duplicates in Excel with =IF(B2=B1; "!!!"; ""). I was particularly
careful about this for the top 20 contributors on the list.
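The spreadsheet check simply compares each sorted row with the one above it. A rough Python equivalent, assuming the display names have already been extracted into a list (the function name flag_adjacent_duplicates is hypothetical):

```python
def flag_adjacent_duplicates(names):
    """Sort names and mark each one that equals the previous row,
    mirroring the spreadsheet formula IF(B2=B1; "!!!"; "")."""
    rows = sorted(names)
    flagged = []
    for i, name in enumerate(rows):
        mark = "!!!" if i > 0 and name == rows[i - 1] else ""
        flagged.append((name, mark))
    return flagged
```

Every row marked "!!!" is a candidate for a synonym entry.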
Once I was satisfied that I had declared all the alternative addresses
for each person, I ran the program again. It loops through frFrom to
add each synonym's count to that of the main address, then loops
through frFrom once more and prints the results.
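The merging step can be sketched in Python as follows, assuming the counts come from a dictionary like frFrom and the synonyms from a mapping like the awk synonym array (the function name merge_synonyms is hypothetical). Unlike the awk version, which updates frFrom in place, this builds a new counter, so the raw counts stay available for checking.

```python
from collections import Counter


def merge_synonyms(counts, synonym):
    """Fold each alternative address's count into its main address.

    `synonym` maps an alternative address to the main one, like the
    awk array synonym[alt] = main; addresses without an entry are
    kept under their own key.
    """
    merged = Counter()
    for address, n in counts.items():
        merged[synonym.get(address, address)] += n
    return merged
```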
The "||||" bars were produced in Excel with =REPT("|", n), where n is
the cell holding frFrom[key] for that sender.
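The same bars can be drawn without a spreadsheet. A minimal Python sketch, assuming the merged counts are in a dictionary; listing the most active senders first is a choice made here, not something the original specifies, and the function name bar_chart is hypothetical.

```python
def bar_chart(counts):
    """Return one line per sender, most active first, with a bar of
    '|' characters like Excel's REPT("|", n)."""
    lines = []
    for sender, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{sender}\t{n}\t{'|' * n}")
    return lines
```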
If you need to hire a coder for medium to complex parsing problems,
get in touch ;-).
Marielle
--------------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguist
Alternative emails: [EMAIL PROTECTED],
Homepage                             http://homepages.widged.com/mlange/
Easy access to lexical databases     http://lexicall.widged.com/
Supporting Education Technologists   http://revolution.widged.com/wiki/
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution