Hi Mark,

Wow! I did not realize I even posted this many messages. While this is a low number, I thought I was quieter than that. How did you derive this information?

That's over a one-year period, which works out to about one mail every 2-3 days.

I wrote the following awk script. Basically, it looks for a line starting with "From: ..." that is right after a line starting with "From ..." (if you download an archive file you will see that this systematically and unambiguously corresponds to the start of a post).

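{
        # Run the body once only, on the first input record.
        c++; if (c > 1) { exit }

        # Monthly archive files to scan.
        filelist = "2005-April.txt,2005-August.txt,2005-December.txt,2005-July.txt,2005-June.txt,2005-May.txt,2005-November.txt,2005-October.txt,2005-September.txt,2006-February.txt,2006-January.txt,2006-March.txt"
        split(filelist, afiles, ",")
        for (i in afiles) {
                print afiles[i]
                lineB4 = 0
                # The "> 0" test ends the loop at end of file and on a read error
                # (a missing file would otherwise loop forever).
                while ((getline < afiles[i]) > 0) {
                        # "From: " right after a "From " line marks the start of a post.
                        if (lineB4 == 1 && $0 ~ /^From: /) {
                                gsub("\"", "", $0)
                                gsub("^From:[\t ]*", "", $0)
                                gsub(" at ", "@", $0)
                                frFrom[$0]++
                        }
                        lineB4 = 0
                        if ($0 ~ /^From /) {
                                lineB4 = 1
                        }
                }
                close(afiles[i])
        }
}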

Because some of us changed email address over the year, I added a synonym system: if a synonym exists for an address, its count is added to that of the main address. For example:
        # Andre Garzia
        synonym["[EMAIL PROTECTED] (Andre Garzia)"] = "[EMAIL PROTECTED] (Andre Garzia)"

I checked for synonyms by sorting on the "(Andre Garzia)" part and on the name part of the email, with alerts for duplicates (I used Excel for this, with if(B2=B1; "!!!", "")). I was particularly careful about this for the first 20 contributors on the list.
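
The duplicate check described above could also be scripted. The following is only a minimal Python sketch, not the spreadsheet that was actually used. It assumes each sender string has the form "address (Display Name)"; the function name and the sample addresses are hypothetical.

```python
import re


def flag_duplicate_names(senders):
    """Sort senders by display name and flag rows that repeat the previous name."""

    def display_name(sender):
        # Assumed format: "address (Display Name)"; fall back to the whole string.
        match = re.search(r"\(([^)]*)\)\s*$", sender)
        name = match.group(1) if match else sender
        return name.strip().lower()

    flagged = []
    previous = None
    for sender in sorted(senders, key=display_name):
        name = display_name(sender)
        # Same test as the spreadsheet formula: mark a row equal to the one above it.
        flagged.append((sender, "!!!" if name == previous else ""))
        previous = name
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        "a.person@example.com (A. Person)",
        "b.other@example.org (B. Other)",
        "aperson@example.net (A. Person)",
    ]
    for sender, flag in flag_duplicate_names(sample):
        print(flag, sender)
```

Flagged rows still need a manual look, since two different people can share a display name.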

Once I was satisfied that I had declared all the alternative emails for a given person, I ran the program again. It loops through the frFrom array to add each synonym's count to that of the main address, then loops through frFrom again and prints the results.
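
The merging step could look roughly like the sketch below. It is written in Python rather than awk and is an illustration under assumptions, not the original program: the dictionary names mirror frFrom and synonym from the script, while the function name and sample addresses are hypothetical.

```python
from collections import Counter


def merge_synonyms(counts, synonym):
    """Add each alternative address's count to the count of its main address."""
    merged = Counter()
    for sender, n in counts.items():
        # Senders with no declared synonym keep their own key.
        merged[synonym.get(sender, sender)] += n
    return merged


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    fr_from = {
        "old@example.com (A. Person)": 3,
        "new@example.com (A. Person)": 5,
        "b@example.org (B. Other)": 2,
    }
    synonym = {"old@example.com (A. Person)": "new@example.com (A. Person)"}
    for sender, n in merge_synonyms(fr_from, synonym).most_common():
        print(n, sender)
```

Building a new table, instead of editing the counts while looping over them, avoids changing a collection during iteration.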

The "||||" representation is obtained in Excel with rept("|", n), where n is the count held in frFrom[key].
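
For anyone without a spreadsheet at hand, the same bar can be produced by repeating the "|" character. This is a minimal Python sketch; the function name, the tab-separated output layout, and the sample addresses are assumptions, not the format of the original listing.

```python
def bar_lines(counts):
    """Return one line per sender: address, count, and a bar of '|' characters."""
    lines = []
    # Highest counts first; ties are ordered alphabetically by sender.
    for sender, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        # "|" * n plays the role of rept("|", n) in the spreadsheet.
        lines.append(f"{sender}\t{n}\t{'|' * n}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    for line in bar_lines({"a@example.com": 3, "b@example.com": 5}):
        print(line)
```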

If you need to hire a coder for medium to complex parsing problems, get in touch ;-).

Marielle


--------------------------------------------------------------------------------
Marielle Lange (PhD),  Psycholinguist

Alternative emails: [EMAIL PROTECTED],

Homepage http://homepages.widged.com/mlange/
Easy access to lexical databases http://lexicall.widged.com/
Supporting Education Technologists http://revolution.widged.com/wiki/


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
