On 8/16/08, Emad Nawfal (عماد نوفل) <[EMAIL PROTECTED]> wrote:
> #!/usr/bin/python
> # Chi-squared collocation discovery
> # Important definitions first. Let's suppose that we
> # are trying to find whether "powerful computers" is a collocation
> # N = The number of all bigrams in the corpus
> # O11 = how many times the bigram "powerful computers" occurs in the corpus
> # O22 = the number of bigrams not having either word in our collocation = N - O11
> # O12 = The number of bigrams whose second word is our second word
> # but whose first word is not "powerful"

This is just the number of occurrences of the second word - O11, isn't it?

> # O21 = The number of bigrams whose first word is our first word, but whose second word
> # is different from our second word

This is the number of occurrences of the first word - O11.

So one way to solve this would be to make two dictionaries: one that counts bigrams and one that counts words. Then you would get the numbers with just three dictionary lookups.

Kent
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
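
The two-dictionary idea can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `contingency_counts` and the sample sentence are made up, and `collections.Counter` stands in for the plain dictionaries. O22 is taken here as whatever is left of N after the other three cells, which is an assumption beyond what the reply states. Using plain word counts is also slightly approximate at the corpus boundaries (the very first token is never the second word of a bigram, and the last token is never the first).

```python
from collections import Counter


def contingency_counts(tokens, first, second):
    """Return (O11, O12, O21, O22, N) for the bigram (first, second).

    Hypothetical helper: the name and return layout are illustrative only.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts each adjacent word pair
    words = Counter(tokens)                     # counts each individual word
    n = sum(bigrams.values())                   # N: total number of bigrams

    o11 = bigrams[(first, second)]  # lookup 1: the bigram itself
    o12 = words[second] - o11       # lookup 2: second word, different first word
    o21 = words[first] - o11        # lookup 3: first word, different second word
    o22 = n - o11 - o12 - o21       # assumption: remainder of the 2x2 table
    return o11, o12, o21, o22, n


# Made-up sample text, just to exercise the function.
tokens = "powerful computers need powerful chips and fast computers".split()
print(contingency_counts(tokens, "powerful", "computers"))  # (1, 1, 1, 4, 7)
```

A `Counter` returns 0 for a missing key, so a bigram that never occurs does not need special handling.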