John -
This looks like a good alternative. It appears to slightly
under-approximate the variance but is so much faster (>20,000x) that it
may be worth the trade-off. Now if only my Bayesian method would give
better results!

Here's my comparison:
NB. Different ways to approximate normal distribution (on 5 scores), having
NB. (approximately) same mean and standard deviation as empirical data.
NB. Following suggested by John Randall: fit binomial distribution:
meanfreq =:(+/ .* i.@#) % +/ NB.* meanfreq: mean of frequency table
binn =:<:@# NB.* binn: binomial n from frequency table
binp =:meanfreq % <:@# NB.* binp: binomial p from frequency table
NB.* bindist: binomial dist from freq table
bindist =:[: p. (binp^binn);(binn # -.@%@binp)
NB. Versus an iterative estimator:
adjustNormalDist=: 3 : 0
  mstarg=. y. [ maxiter=. 20 [ ctr=. 0
  'madj sadj'=. mstarg
  msrs=. adjmsd sn=. (%+/)(madj+sadj*i:2j4) pdfnc mstarg
  while. (1e_5 +./ . <:|msrs-mstarg) *. maxiter>:ctr=. >:ctr do.
    'madj sadj'=. (madj,sadj)+msrs-mstarg
    msrs=. adjmsd sn=. (%+/)(madj+sadj*i:2j4) pdfnc mstarg
  end.
  sn
)
NB.* adjmsd: show adjusted mean and SD
adjmsd=: (([:mean 1 2 3 4 5+/ .*~]),[:stddev 1 2 3 4 5#~[:<.0.5+1e6*])
NB.* pdfnc: prob density fnc of normal curve for given SD and mean at points x.
pdfnc=: 4 : '(%sd*%:o. 2)*^-(*:x.-mn)%+:*:sd [ ''mn sd''=. y.'
crecs=. 1{"1 getUserRecs&>5{.UUIDS NB. Cust recs: movie, cust, rate, dt
mnssds=. (mean,stddev)&>2{&.>crecs NB. adjustNormalDist takes means & SDs
mnssds;(adjmsd"1 bindist"1]5{.CPROBS);adjmsd"1 adjustNormalDist"1 mnssds
+--------------------+--------------------+--------------------+
|3.4185304 0.83555495|3.4185304 0.97786064|3.4185272 0.83555661|
|4.0113507 0.89942311|4.0113507 0.86272412|4.0113546 0.89942198|
|4.2142857 0.78975397|4.2142857 0.79459397|4.2142815 0.78975647|
|3.3923077 1.0471226 |3.3923077 0.98057379|3.39231 1.0471132 |
|3.4814815 1.2206672 |3.4814815 0.97058952|3.4814534 1.2206439 |
+--------------------+--------------------+--------------------+
NB. Comparison of distribution approximations:
bindist"1]5{.CPROBS
0.024434501 0.14947004 0.34287521 0.34957109 0.13364915
0.0037318916 0.045468236 0.2077392 0.42183858 0.32122209
0.0014887392 0.024361187 0.1494891 0.40769756 0.41696341
0.026095869 0.15532661 0.34669791 0.34393318 0.12794643
0.020770187 0.1357661 0.33279251 0.36255445 0.14811676
adjustNormalDist"1 mnssds
0.0082269678 0.11550491 0.41299218 0.37606586 0.087210083
0.0058771044 0.051522575 0.20903631 0.39249664 0.34106737
0.0012439144 0.02226285 0.15342208 0.40711008 0.41596107
0.038271035 0.16036229 0.32599857 0.32152189 0.15384622
0.07274651 0.15352443 0.24164998 0.28368735 0.24839172
NB. Timings: bindist requires only the probability distribution:
6!:2 'bindist"1]100{.CPROBS'
0.0038555179
NB. Versus:
6!:2 'crecs=. 1{"1 getUserRecs&>100{.UUIDS'
9.9790119
6!:2 'adjustNormalDist"1 (mean,stddev)&>2{&.>crecs'
77.649548
(+/77.649548 9.9790119)%0.0038555179
22728.091 NB. Ratio of times
NB. Alternate use of "adjustNormalDist" works w/ests of the prob dists:
6!:2 'mnssds=. adjmsd"1]100{.CPROBS'
5.5266708
6!:2 'adjustNormalDist"1 mnssds'
77.952477
(+/77.952477 5.5266708)%0.0038555179
21651.864
NB. Little difference in relative times
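
A note on the middle column of the comparison table above: once n is
fixed, a binomial fit has only one free parameter, so matching the mean
also pins down the standard deviation. A rough sketch of the arithmetic,
assuming the scores 1..5 are shifted to 0..4 (so n = 4) and writing m for
the shifted mean (m and sigma_bin are my notation here, not names from the
code above), using the first row of the table:

```latex
\begin{aligned}
p &= \frac{m}{n}, \qquad
\sigma_{\text{bin}} = \sqrt{n\,p\,(1-p)} = \sqrt{m\left(1 - \frac{m}{n}\right)} \le \frac{\sqrt{n}}{2} \\
m &= 3.4185304 - 1 = 2.4185304,\quad n = 4: \qquad
\sigma_{\text{bin}} = \sqrt{2.4185304 \times 0.3953674} \approx 0.978
\end{aligned}
```

That is in line with the 0.97786064 shown for the binomial fit. With n = 4
the bound is 1, so on five scores the binomial fit cannot reproduce an SD
above 1, which would account for the gap in the last row (empirical
1.2206672 against 0.97058952).
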
On 1/16/07, John Randall <[EMAIL PROTECTED]> wrote:
Devon McCormick wrote:
> What I'd like to do is to construct an equivalent 5-element
> distribution with the same mean and standard deviation but (more or
> less) normally distributed.
How about fitting a binomial distribution to the data? If a frequency
table on i.(n+1) has mean m, the binomial distribution with
generating function (q+px)^n has the same mean if np=m.
mf =:(+/ .* i.@#) % +/ NB. mean of frequency table
n =:<:@# NB. binomial n from frequency table
p =:mf % <:@# NB. binomial p from frequency table
b =:[: p. (p^n);(n # -.@%@p) NB. binomial dist from frequency table
d =:0 0.13333333 0.4 0.46666667 0 NB. data
b d NB. binomial with same mean
0.0301408 0.168789 0.354456 0.330826 0.115789
mf d
2.33333
mf b d
2.33333
Best wishes,
John
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
--
Devon McCormick
^me^ at acm.org is my preferred e-mail