[ngram] Re: the NSP trigram calculations don't match mine??

2009-06-11 Thread gunnlyse
Hello, 
thank you for this clarification!

The NSP values still do not match mine, and I see that this affects ll, pmi, ps 
as well as tmi for trigrams. Evidently there must be some error, which probably 
lies in the observed or estimated frequencies (since all four measures produce 
results different from mine).

I need to ask for two clarifications:
(1) Estimated frequency: the webpage/pmi file says:

  m111 = (n1pp * np1p * npp1) / nppp

but the file 3D.pm says 

  $m111=$n1pp*$np1p*$npp1/($nppp**2); 

which I take to mean that we divide, not by nppp, but by its square:

  m111 = (n1pp * np1p * npp1) / (nppp * nppp)

If so, which one should I really use? 
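
One way to see where the squared nppp could come from: if the three positions are treated as independent, the expected count is nppp times the product of the three marginal probabilities, and each of those probabilities already contains one division by nppp. The sketch below checks that this agrees with the expression quoted from 3D.pm. The function names are made up for illustration and are not part of NSP.

```python
from fractions import Fraction

def expected_from_probabilities(n1pp, np1p, npp1, nppp):
    """Expected trigram count under independence: N * P(w1) * P(w2) * P(w3)."""
    p1 = Fraction(n1pp, nppp)  # probability of w1 in the first position
    p2 = Fraction(np1p, nppp)  # probability of w2 in the second position
    p3 = Fraction(npp1, nppp)  # probability of w3 in the third position
    return nppp * p1 * p2 * p3

def expected_as_in_3d_pm(n1pp, np1p, npp1, nppp):
    """The same quantity written the way the quoted 3D.pm line writes it."""
    return Fraction(n1pp * np1p * npp1, nppp ** 2)

# Marginal counts and total taken from the example trigram in this message.
counts = (7073841, 9391062, 5872364, 355663266)
same = expected_from_probabilities(*counts) == expected_as_in_3d_pm(*counts)
print(same)
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.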

(2) Furthermore, let us return to the example trigram. When I compute the 
example trigram's pmi in the way I understand the code, I get the value 
-15.24452, instead of the NSP package's  6.4127. 
All the observed frequencies needed for pmi are directly available in the 
example trigram line, so the only thing that can explain diverging results is 
HOW we compute the value.
May I therefore ask if you agree with the way I understand the code? 

For the trigram 
355663266
atdeter262744 7073841 9391062 5872364 1234064 647295 1064083

I compute m111 as:

  m111 = (7073841 * 9391062 * 5872364) / 355663266
       = 1.0968417e+12

  and PMI = log (262744 / 1.0968417e+12) = -15.24452

  NSP's pmi (using the command line:
  statistic.pl --ngram 3 pmi outputfile inputfile )
  produces the following line:
atdeter1 6.4127 262744 7073841 9391062 5872364 1234064 647295 1064083
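
As a numeric cross-check, the sketch below evaluates both readings of the formula on the counts from the example line. It is not NSP code. The base-2 logarithm in the second variant is an assumption, chosen only because, together with the nppp**2 denominator from 3D.pm, it lands on the reported 6.4127; the natural logarithm with a single nppp gives the -15.24452 above.

```python
import math

# Counts copied from the example trigram line.
nppp = 355663266
n111, n1pp, np1p, npp1 = 262744, 7073841, 9391062, 5872364

# Reading 1: divide by nppp once, natural logarithm.
m_single = n1pp * np1p * npp1 / nppp
pmi_natural = math.log(n111 / m_single)

# Reading 2: divide by nppp squared (as in the quoted 3D.pm line),
# base-2 logarithm (assumed, see the note above).
m_squared = n1pp * np1p * npp1 / nppp ** 2
pmi_base2 = math.log2(n111 / m_squared)

print(round(pmi_natural, 4), round(pmi_base2, 4))
```

So the gap between the two figures can be explained by the denominator and the logarithm base together, without any difference in the observed counts.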

Best,
Gunn



Re: [ngram] Re: the NSP trigram calculations don't match mine??

2009-06-11 Thread Ted Pedersen
Hi Gunn,

Here's a simple test case whose results would be useful to see...
Below I'm providing the output from the short form of the input name...
I'll do the same with the long form too...

input :

this is a test of mine i want to test many things with this
this is a good day for testing i think i will test many things
I have a test of mine that I went to test

count.pl --ngram 3 test.cnt input

statistic.pl --ngram 3 pmi test.pmi test.cnt
statistic.pl --ngram 3 ll test.ll test.cnt
statistic.pl --ngram 3 ps test.ps test.cnt
statistic.pl --ngram 3 tmi test.tmi test.cnt
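
For anyone who wants to reproduce the counts by hand, here is a rough Python sketch of the counting step. It assumes plain whitespace tokenization, case-sensitive tokens, and trigrams that run across line breaks; these assumptions are inferred from the counts in the output, not taken from the count.pl source.

```python
from collections import Counter

text = """this is a test of mine i want to test many things with this
this is a good day for testing i think i will test many things
I have a test of mine that I went to test"""

tokens = text.split()  # whitespace tokens, case preserved, line breaks ignored
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))

n111 = Counter(trigrams)                 # joint trigram counts
n1pp = Counter(t[0] for t in trigrams)   # word in first position
np1p = Counter(t[1] for t in trigrams)   # word in second position
npp1 = Counter(t[2] for t in trigrams)   # word in third position
nppp = len(trigrams)                     # total number of trigrams

tri = ("test", "many", "things")
print(nppp, n111[tri], n1pp[tri[0]], np1p[tri[1]], npp1[tri[2]])
# prints: 37 2 4 2 2
```

Note that the marginals are positional: "test" occurs five times in the text, but only four times as the first word of a trigram.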

marimba(46): more test.pmi
37
good<>day<>for<>1 10.4189 1 1 1 1 1 1 1
day<>for<>testing<>1 10.4189 1 1 1 1 1 1 1
that<>I<>went<>2 9.4189 1 1 2 1 1 1 1
a<>good<>day<>3 8.8339 1 3 1 1 1 1 1
think<>i<>will<>3 8.8339 1 1 3 1 1 1 1
for<>testing<>i<>3 8.8339 1 1 1 3 1 1 1
testing<>i<>think<>3 8.8339 1 1 3 1 1 1 1
with<>this<>this<>4 8.4189 1 1 2 2 1 1 1
many<>things<>with<>4 8.4189 1 2 2 1 2 1 1
of<>mine<>that<>4 8.4189 1 2 2 1 2 1 1
I<>went<>to<>4 8.4189 1 2 1 2 1 1 1
things<>I<>have<>4 8.4189 1 2 2 1 1 1 1
things<>with<>this<>4 8.4189 1 2 1 2 1 1 1
mine<>that<>I<>4 8.4189 1 2 1 2 1 1 1
i<>want<>to<>5 7.8339 1 3 1 2 1 1 1
is<>a<>good<>5 7.8339 1 2 3 1 2 1 1
I<>have<>a<>5 7.8339 1 2 1 3 1 1 1
mine<>i<>want<>5 7.8339 1 2 3 1 1 1 1
this<>this<>is<>5 7.8339 1 3 2 1 1 1 1
test<>many<>things<>6 7.4189 2 4 2 2 2 2 2
test<>of<>mine<>6 7.4189 2 4 2 2 2 2 2
to<>test<>many<>6 7.4189 1 1 4 2 1 1 2
many<>things<>I<>6 7.4189 1 2 2 2 2 1 1
will<>test<>many<>6 7.4189 1 1 4 2 1 1 2
this<>is<>a<>7 7.2490 2 3 2 3 2 2 2
i<>think<>i<>7 7.2490 1 3 1 3 1 1 1
want<>to<>test<>8 7.0970 1 1 2 5 1 1 2
went<>to<>test<>8 7.0970 1 1 2 5 1 1 2
a<>test<>of<>9 6.8339 2 3 4 2 2 2 2
of<>mine<>i<>9 6.8339 1 2 2 3 2 1 1
have<>a<>test<>10 6.5120 1 1 3 5 1 1 2
i<>will<>test<>10 6.5120 1 3 1 5 1 1 1
is<>a<>test<>11 5.5120 1 2 3 5 2 1 2
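
The scores in this listing can be recomputed from the counts on each line. The sketch below does this for three rows. It assumes the expected count is n1pp*np1p*npp1/nppp**2 (the expression quoted from 3D.pm earlier in the thread) and a base-2 logarithm; the base is an assumption, chosen only because it reproduces the listed values.

```python
import math

NPPP = 37  # total trigram count, first line of test.pmi

def pmi(n111, n1pp, np1p, npp1, nppp=NPPP):
    """Pointwise mutual information of a trigram, in bits."""
    m111 = n1pp * np1p * npp1 / nppp ** 2  # expected count under independence
    return math.log2(n111 / m111)

# (n111, n1pp, np1p, npp1) copied from the listing above.
rows = {
    "good day for": (1, 1, 1, 1),
    "test many things": (2, 4, 2, 2),
    "is a test": (1, 2, 3, 5),
}
for name, counts in rows.items():
    print(name, round(pmi(*counts), 4))
```

Only the first four counts on each line enter this calculation; the pairwise counts (the last three fields) are not needed for pmi.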

marimba(48): more test.ps
37
test<>many<>things<>1 8.2848 2 4 2 2 2 2 2
test<>of<>mine<>1 8.2848 2 4 2 2 2 2 2
this<>is<>a<>2 8.0492 2 3 2 3 2 2 2
a<>test<>of<>3 7.4739 2 3 4 2 2 2 2
good<>day<>for<>4 6.2218 1 1 1 1 1 1 1
day<>for<>testing<>4 6.2218 1 1 1 1 1 1 1
that<>I<>went<>5 5.5287 1 1 2 1 1 1 1
a<>good<>day<>6 5.1232 1 3 1 1 1 1 1
think<>i<>will<>6 5.1232 1 1 3 1 1 1 1
for<>testing<>i<>6 5.1232 1 1 1 3 1 1 1
testing<>i<>think<>6 5.1232 1 1 3 1 1 1 1
with<>this<>this<>7 4.8355 1 1 2 2 1 1 1
many<>things<>with<>7 4.8355 1 2 2 1 2 1 1
of<>mine<>that<>7 4.8355 1 2 2 1 2 1 1
I<>went<>to<>7 4.8355 1 2 1 2 1 1 1
things<>I<>have<>7 4.8355 1 2 2 1 1 1 1
things<>with<>this<>7 4.8355 1 2 1 2 1 1 1
mine<>that<>I<>7 4.8355 1 2 1 2 1 1 1
i<>want<>to<>8 4.4301 1 3 1 2 1 1 1
is<>a<>good<>8 4.4301 1 2 3 1 2 1 1
I<>have<>a<>8 4.4301 1 2 1 3 1 1 1
mine<>i<>want<>8 4.4301 1 2 3 1 1 1 1
this<>this<>is<>8 4.4301 1 3 2 1 1 1 1
to<>test<>many<>9 4.1424 1 1 4 2 1 1 2
many<>things<>I<>9 4.1424 1 2 2 2 2 1 1
will<>test<>many<>9 4.1424 1 1 4 2 1 1 2
i<>think<>i<>10 4.0246 1 3 1 3 1 1 1
want<>to<>test<>11 3.9193 1 1 2 5 1 1 2
went<>to<>test<>11 3.9193 1 1 2 5 1 1 2
of<>mine<>i<>12 3.7369 1 2 2 3 2 1 1
have<>a<>test<>13 3.5138 1 1 3 5 1 1 2
i<>will<>test<>13 3.5138 1 3 1 5 1 1 1
is<>a<>test<>14 2.8206 1 2 3 5 2 1 2
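
The ps scores can be cross-checked in the same way. The sketch below uses the usual Poisson-Stirling form n111 * (ln(n111) - ln(m111) - 1) with a natural logarithm. That NSP's ps measure is computed exactly like this is an assumption; it is supported here only by the fact that the formula reproduces the listed scores.

```python
import math

NPPP = 37  # total trigram count, first line of test.ps

def poisson_stirling(n111, n1pp, np1p, npp1, nppp=NPPP):
    """Poisson-Stirling score: n111 * (ln(n111 / m111) - 1)."""
    m111 = n1pp * np1p * npp1 / nppp ** 2  # expected count under independence
    return n111 * (math.log(n111 / m111) - 1)

# (n111, n1pp, np1p, npp1) copied from the listing above.
print(round(poisson_stirling(2, 4, 2, 2), 4))  # test many things
print(round(poisson_stirling(1, 1, 1, 1), 4))  # good day for
print(round(poisson_stirling(1, 2, 3, 5), 4))  # is a test
```

Because the score is multiplied by n111, the trigrams observed twice move to the top of the ps ranking even though their pmi is lower.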

marimba(51): more test.tmi
37
test<>many<>things<>1 0.4986 2 4 2 2 2 2 2
test<>of<>mine<>1 0.4986 2 4 2 2 2 2 2
this<>is<>a<>2 0.4590 2 3 2 3 2 2 2
many<>things<>with<>3 0.4286 1 2 2 1 2 1 1
of<>mine<>that<>3 0.4286 1 2 2 1 2 1 1
a<>test<>of<>4 0.4265 2 3 4 2 2 2 2
many<>things<>I<>5 0.3756 1 2 2 2 2 1 1
good<>day<>for<>6 0.3585 1 1 1 1 1 1 1
day<>for<>testing<>6 0.3585 1 1 1 1 1 1 1
of<>mine<>i<>7 0.3564 1 2 2 3 2 1 1
is<>a<>good<>8 0.3541 1 2 3 1 2 1 1
is<>a<>test<>9 0.3506 1 2 3 5 2 1 2
to<>test<>many<>10 0.3205 1 1 4 2 1 1 2
will<>test<>many<>10 0.3205 1 1 4 2 1 1 2
that<>I<>went<>11 0.3045 1 1 2 1 1 1 1
want<>to<>test<>12 0.2974 1 1 2 5 1 1 2
went<>to<>test<>12 0.2974 1 1 2 5 1 1 2
a<>good<>day<>13 0.2841 1 3 1 1 1 1 1
think<>i<>will<>13 0.2841 1 1 3 1 1 1 1
for<>testing<>i<>13 0.2841 1 1 1 3 1 1 1
testing<>i<>think<>13 0.2841 1 1 3 1 1 1 1
with<>this<>this<>14 0.2515 1 1 2 2 1 1 1
I<>went<>to<>14 0.2515 1 2 1 2 1 1 1
things<>I<>have<>14 0.2515 1 2 2 1 1 1 1
things<>with<>this<>14 0.2515 1 2 1 2 1 1 1
mine<>that<>I<>14 0.2515 1 2 1 2 1 1 1
i<>want<>to<>15 0.2323 1 3 1 2 1 1 1
I<>have<>a<>15 0.2323 1 2 1 3 1 1 1
mine<>i<>want<>15 0.2323 1 2 3 1 1 1 1
this<>this<>is<>15 0.2323 1 3 2 1 1 1 1
have<>a<>test<>16 0.2265 1 1 3 5 1 1 2
i<>think<>i<>17 0.2142 1 3 1 3 1 1