Hi Julio,

Thanks for your question. In NSP we are always counting ngrams, so the
order of the words making up the ngram is considered. When we are counting
bigrams (the default case for NSP), word1 is always the first word in a
bigram, and word2 is always the second word. I think in other presentations
of PMI word1 and word2 are simply co-occurrences, so the order does not
matter. However, for NSP order does matter, and so n1p is the number of
times word1 occurs as the first word in an ngram.

Here's a very simple example where cat occurs as the first word in a bigram
3 times and as the second word in a bigram 1 time. Note that I've used the
--newline option so that ngrams do not extend across lines.

ukko(14): cat test
cat mouse
cat mouse
cat mouse
house cat
ukko(15): count.pl --newline test.cnt test
ukko(16): cat test.cnt
4
cat<>mouse<>3 3 3
house<>cat<>1 1 1
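
To make the counting concrete, here is a minimal Python sketch of the same
idea. It is not NSP code: the names count_bigrams and pmi are made up for
illustration, and the PMI function assumes the usual definition
log2(n11 / m11), where m11 = n1p * np1 / npp is the expected count under
independence. It should print the same three lines as test.cnt above.

```python
import math
from collections import Counter

# The four lines of the "test" file shown above.
lines = ["cat mouse", "cat mouse", "cat mouse", "house cat"]


def count_bigrams(lines):
    """Count ordered bigrams within each line (mimics --newline:
    no bigram extends across a line break)."""
    n11 = Counter()  # joint count of the ordered pair (word1, word2)
    n1p = Counter()  # times a word occurs as the FIRST word of a bigram
    np1 = Counter()  # times a word occurs as the SECOND word of a bigram
    for line in lines:
        tokens = line.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            n11[(w1, w2)] += 1
            n1p[w1] += 1
            np1[w2] += 1
    npp = sum(n11.values())  # total number of bigrams
    return n11, n1p, np1, npp


def pmi(n11, n1p, np1, npp):
    """PMI under the assumed definition log2(observed / expected)."""
    m11 = n1p * np1 / npp  # expected joint count if the words were independent
    return math.log2(n11 / m11)


n11, n1p, np1, npp = count_bigrams(lines)

print(npp)
for (w1, w2), joint in n11.items():
    print(f"{w1}<>{w2}<>{joint} {n1p[w1]} {np1[w2]}")
```

Note that cat appears 4 times in the file, but n1p["cat"] is 3 and
np1["cat"] is 1, because each marginal only counts one position in the
bigram.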

This is described in more detail in the NSP paper (see below), which would
be a reasonable reference, I think. I hope this helps, and please let us
know if other questions arise.

Cordially,
Ted

The Design, Implementation, and Use of the Ngram Statistics Package
<http://www.d.umn.edu/~tpederse/Pubs/cicling2003-2.pdf> (Banerjee and
Pedersen) - Appears in the Proceedings of the Fourth International
Conference on Intelligent Text Processing and Computational Linguistics,
pp. 370-381, February 17-21, 2003, Mexico City.



On Sun, May 14, 2017 at 12:10 AM, Julio Santisteban <pulsso....@gmail.com>
wrote:

> Hi Ted & Satanjeev ,
>
> I am Julio from Peru and I have a small query. In your Perl implementation
> of PMI you mention about the contingency table: "n1p is the number of
> times in total that word1 occurs as the first word in a bigram". But this
> is not the usual case; PMI is usually worked out with n1p as the marginal
> (total frequency of word1) from the contingency table.
>
> I am sure you are correct, I just want to ask you for a reference on it.
>
> http://search.cpan.org/~tpederse/Text-NSP-1.31/lib/Text/NSP/Measures/2D/MI/pmi.pm
>
>           word2   ~word2
>   word1    n11      n12 | n1p
>  ~word1    n21      n22 | n2p
>            --------------
>            np1      np2   npp
>
>
> Regards,
> Julio Santisteban
>
