Hi Julio,

Thanks for your question. In NSP we are always counting ngrams, so the order of the words making up the ngram is considered. When we are counting bigrams (the default case for NSP), word1 is always the first word in a bigram and word2 is always the second word. I think in other presentations of PMI word1 and word2 are simply co-occurrences, so the order does not matter. However, for NSP order does matter, and so n1p is the number of times word1 occurs as the first word in an ngram.
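To make the positional counting concrete, here is a minimal Python sketch of the idea. It is not NSP code: the function names are made up for illustration, tokens are split on whitespace (simpler than NSP's own tokenization), bigrams are only formed within a line (roughly what --newline does), and the PMI formula is the usual textbook log2(n11 * npp / (n1p * np1)), which I am assuming here rather than quoting from pmi.pm.

```python
import math
from collections import Counter


def count_bigrams(lines):
    """Count ordered bigrams within each line (hypothetical helper, not part of NSP)."""
    n11 = Counter()  # joint count of the ordered pair (word1, word2)
    n1p = Counter()  # times a word occurs as the FIRST word of a bigram
    np1 = Counter()  # times a word occurs as the SECOND word of a bigram
    for line in lines:
        tokens = line.split()  # assumption: plain whitespace tokenization
        for w1, w2 in zip(tokens, tokens[1:]):
            n11[(w1, w2)] += 1
            n1p[w1] += 1
            np1[w2] += 1
    npp = sum(n11.values())  # total number of bigrams
    return n11, n1p, np1, npp


def pmi(w1, w2, n11, n1p, np1, npp):
    """Textbook PMI using the positional marginals (assumed formula)."""
    expected = n1p[w1] * np1[w2] / npp
    return math.log2(n11[(w1, w2)] / expected)


lines = ["cat mouse", "cat mouse", "cat mouse", "house cat"]
n11, n1p, np1, npp = count_bigrams(lines)

print(npp)
for (w1, w2), joint in n11.items():
    # Same column order as a count.pl bigram line: joint count, n1p, np1
    print(f"{w1}<>{w2}<>{joint} {n1p[w1]} {np1[w2]}")
```

In this toy input "cat" appears four times overall, but n1p["cat"] is 3 and np1["cat"] is 1, which is exactly the difference between a positional marginal and a plain word frequency.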
Here's a very simple example where cat occurs as the first word in a bigram 3 times and as the second word in a bigram 1 time. Note that I've used the --newline option so that ngrams do not extend across lines.

ukko(14): cat test
cat mouse
cat mouse
cat mouse
house cat
ukko(15): count.pl --newline test.cnt test
ukko(16): cat test.cnt
4
cat<>mouse<>3 3 3
house<>cat<>1 1 1

This is described in more detail in the NSP paper (see below), which would be a reasonable reference I think.

I hope this helps, and please let us know if other questions arise.

Cordially,
Ted

The Design, Implementation, and Use of the Ngram Statistics Package
<http://www.d.umn.edu/~tpederse/Pubs/cicling2003-2.pdf>
(Banerjee and Pedersen) - Appears in the Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 370-381, February 17-21, 2003, Mexico City.

On Sun, May 14, 2017 at 12:10 AM, Julio Santisteban <pulsso....@gmail.com> wrote:

> Hi Ted & Satanjeev,
>
> I am Julio from Peru and I have a small query. In your Perl implementation
> of PMI you mention about the contingency table: "n1p is the number of
> times in total that word1 occurs as the first word in a bigram". But this
> is not the case; usually PMI is worked out with n1p as the marginals (total
> frequency of word1) from the contingency table.
>
> I am sure you are correct, I just want to ask you some reference about it.
>
> http://search.cpan.org/~tpederse/Text-NSP-1.31/lib/Text/NSP/Measures/2D/MI/pmi.pm
>
>            word2   ~word2
>   word1    n11      n12 | n1p
>  ~word1    n21      n22 | n2p
>            --------------
>            np1      np2   npp
>
> Regards,
> Julio Santisteban