Devon,

        I don't know why you are doing all this, but let me
observe that your final answer simply reproduces the mean and
standard deviation of your original sample.

   mean =: +/ % #
   std =: ([:+/&.:*: (- mean))%%:@(<:@#)
   std _1 0 1
1
   (mean,std)2 4 4 3 3 3 4 2 3 4 4 3 4 3 4
3.33333 0.723747
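
        One way to see this: the shift and scale reached by trial
and error in the quoted message (_0.461 and 1.376) are close to what
the sample mean and standard deviation give directly. The sketch
below is only illustrative; the names r, z and nd are made up for
this example and do not appear in the original code.

```j
NB. Illustrative sketch; 'r', 'z' and 'nd' are hypothetical names.
mean=: +/ % #
std=: ([:+/&.:*: (- mean))%%:@(<:@#)
r=: 2 4 4 3 3 3 4 2 3 4 4 3 4 3 4
% std r                          NB. scale factor, roughly 1.38
(3 - mean r) % std r             NB. shift, roughly _0.46
z=: ((>:i.5) - mean r) % std r   NB. ratings 1..5 as standard scores
nd=: (% +/) ^ - -: *: z          NB. normal weights at z, scaled to sum to 1
```

Because the curve is cut off at ratings 1 and 5 and sampled at only
five points, the mean and spread of nd need not match the targets
exactly.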
        If all you want is a good estimate of customer
692's rating distribution, it is exactly your frequency
distribution:
+ 2 2
+ 6 3
+ 7 4
+

        Btw, another way to get that distribution is as
follows.

   freqcount=: (/: {:"1)@(~. ,.~ #/.~)
   freqcount 2 4 4 3 3 3 4 2 3 4 4 3 4 3 4
2 2
6 3
7 4
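
        The counts column of that table can be scaled into
probabilities directly, and the probability-weighted mean then agrees
with the mean of the raw ratings. A minimal sketch, assuming the
table layout above (counts ,. values); probs, pmean and t are names
made up for illustration.

```j
NB. Sketch only; 'probs', 'pmean' and 't' are hypothetical names.
freqcount=: (/: {:"1)@(~. ,.~ #/.~)
probs=: (% +/)@:({."1)       NB. counts column scaled to sum to 1
pmean=: probs +/ .* {:"1     NB. probability-weighted mean of the values column
t=: freqcount 2 4 4 3 3 3 4 2 3 4 4 3 4 3 4
probs t                      NB. shares of ratings 2, 3 and 4
pmean t                      NB. same as mean of the raw ratings
```

Ratings 1 and 5 never occur in this sample, so they are simply absent
from the table rather than listed with probability 0.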

On Mon, 15 Jan 2007, Devon McCormick wrote:

+ Members of the Forum -
+
+ let's say I'm looking at Netflix ratings as probability distributions
+ instead of as point estimates. So, if customer 692 has these ratings:
+
+    'rc cr692'=. getUserRecs 692
+    2{cr692                               NB. The ratings
+ 2 4 4 3 3 3 4 2 3 4 4 3 4 3 4
+
+ The frequency table looks like this:
+
+    frtab 2{cr692 NB. counts ,. values
+ 2 2
+ 6 3
+ 7 4
+
+ [ where
+ frtab=: 3 : 0
+    y.=. y./:y.
+    difs=. 2-~/\(#y.),~I. 1,2~:/\y.
+    if. -.isNum y. do. difs=. <"0 difs [ y.=. <"1 ,.y. end.
+    difs,.~.y.
+ )
+ ]
+
+ Converting these to probabilities for ratings of >:i.5:
+
+    ]pr692=. ([:(%+/)0{1 0-~[:|:[:frtab 1 2 3 4 5,]) 2{cr692
+ 0 0.13333333 0.4 0.46666667 0
+
+ [I concatenate 1 2 3 4 5 to ensure an entry for any missing rating
+ and (%+/) to make the probabilities sum to one.]
+
+ Thus, customer 692 has given a rating of "3" 40% of the time and
+ a rating of "4" about 47% of the time.
+
+ This distribution has a mean and standard deviation:
+    (mean,stddev) 2{cr692
+ 3.3333333 0.72374686
+
+ Alternate mean calculation from the probability vector:
+    pr692 +/ . * >:i.5
+ 3.3333333
+
+ What I'd like to do is to construct an equivalent 5-element distribution
+ with the same mean and standard deviation but (more or less) normally
+ distributed.
+
+ I can easily do this for a standard normal (mean 0 and SD 1):
+
+ NB.* pdfnc: probability density fnc for normal curve w/given mean and SD.
+ pdfnc=: 4 : '(%sd*%:o. 2)*^-(*:x.-mn)%+:*:sd [ ''mn sd''=. y.'
+
+ Assuming the end-points are two standard deviations from the mean
+ ((i:2j4) -: _2 _1 0 1 2):
+
+    ]sn=. (%+/)(i:2j4) pdfnc 0 1
+ 0.054488685 0.24420134 0.40261995 0.24420134 0.054488685
+
+ This distribution "sn" has mean of 3
+    sn +/ . * >:i.5
+ 3
+
+ and an approximate standard deviation of
+    stddev (<.0.5+1e6*sn)#>:i.5
+ 0.96141298
+
+ This is slightly less than one because I adjusted the distribution
+ using (%+/) to force summation to one. I'm sure there's a more exact,
+ analytic way to calculate the standard deviation but this works well
+ enough for now and I'm mostly concerned with the mean.
+
+ I can see an iterative way to get where I want:
+
+ NB. First, adjust the mean:
+    (sn=. (%+/)(_0.3+i:2j4) pdfnc 0 1)+/ . * >:i.5
+ 3.2758083
+    (sn=. (%+/)(_0.35+i:2j4) pdfnc 0 1)+/ . * >:i.5
+ 3.3211532
+    (sn=. (%+/)(_0.37+i:2j4) pdfnc 0 1)+/ . * >:i.5
+ 3.3392133
+ . . .
+    (sn=. (%+/)(_0.3635+i:2j4) pdfnc 0 1)+/ . * >:i.5
+ 3.3333489
+
+ NB. Now work on the standard deviation:
+    stddev (<.0.5+1e7*sn=. (%+/)(_0.3635+1.1*i:2j4) pdfnc 0 1)#>:i.5
+ 0.884115
+    stddev (<.0.5+1e7*sn=. (%+/)(_0.3635+1.2*i:2j4) pdfnc 0 1)#>:i.5
+ 0.82181792
+    stddev (<.0.5+1e7*sn=. (%+/)(_0.3635+1.5*i:2j4) pdfnc 0 1)#>:i.5
+ 0.66592555
+    stddev (<.0.5+1e7*sn=. (%+/)(_0.3635+1.4*i:2j4) pdfnc 0 1)#>:i.5
+ 0.71245182
+
+ NB. Of course, this throws off the mean:
+    sn +/ . * >:i.5
+ 3.2584499
+    adjmsd=: (([:mean 1 2 3 4 5+/ .*~]),[:stddev 1 2 3 4 5#~[:<.0.5+1e7*])
+   NB. Combine target measures...
+ . . .
+    adjmsd sn=. (%+/)(_0.461+1.376*i:2j4) pdfnc 0 1
+ 3.3331422 0.7238559
+ NB. Not too bad compared to:
+    (mean,stddev) 2{cr692
+ 3.3333333 0.72374686
+
+ This is probably workable but there must be an analytic solution,
+ probably a fairly straightforward one.
+
+ Any ideas?
+
+ --
+ Devon McCormick
+ ^me^ at acm.
+ org is my
+ preferred e-mail
+ ----------------------------------------------------------------------
+ For information about J forums see http://www.jsoftware.com/forums.htm
+

(B=) <----------my "sig"

Brian Schott
Atlanta, GA, USA
schott DOT bee are eye eh en AT gee em ae eye el DOT com
http://schott.selfip.net/~brian/