Quoting Dennis Lee Bieber

limitedNormal ( 75, 20 )
        computed statistics: mu = 75.5121294828 sigma = 8.16374859991

        Note how computing the input sigma such that 3*sigma does not exceed
 boundaries results in a narrow bell curve (hmm, and for this set, no one
 scored 95-100)

 retryNormal ( 75, 20 )
        computed statistics: mu = 73.283826412  sigma = 16.9151951316

        The retry model produces a skew, but perhaps somewhat unnatural; a
 real skew should still have relatively few entries in the 95-100 bin,
 whereas this is still rather symmetrical about the mean.  Compare the
 45, 65, and 80 bins between these two, those show most of the retried
 values that otherwise clipped to 100 below.

 clippedNormal ( 75, 20 )
        computed statistics: mu = 75.3240108464 sigma = 18.1008966385

 Note the unnatural peak of grades in the 95-100 range resulting from
 clipping out of range values into the range.
*See the full results below.*

Wow, thanks for this Dennis. I am actually trying to simulate scores on the
Step 1 medical boards exam; the actual score distribution is not readily
available. I like your limitedNormal approach and think it is the way for me
to go: I can come up with reasonable mean and sd numbers from actual results,
and then this approach seems the best way to simulate them. I should actually
know more about this, since I do a lot of stats, but mostly regressions. I
need to read up some more on the F distribution, but I think your limited
approach is the way to go. Trying to combine learning Python and simulating
the medical residency application process has been interesting. Here is a
graph of past test results; I realize they are not on a 0-100 scale, but that
is easy to address (rough sketch below the graph).
[image: step1_score_distribution_custom.GIF]
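
Roughly what I mean by "easy to address" -- a small variant of your
limitedNormal that takes the exam's own lower/upper score limits instead of
assuming 0-100. The 1-300 range and the mean of 221 below are only
placeholder numbers for illustration, not real Step 1 statistics:

import random

def limitedNormalRange(mu, lo=0.0, hi=100.0):
    # same 3-sigma idea as limitedNormal, but for an arbitrary score
    # range [lo, hi]: choose sigma so that 3*sigma stays inside whichever
    # boundary is closer to the mean, then draw from the normal distribution
    sigma = min(mu - lo, hi - mu) / 3.0
    return random.normalvariate(mu, sigma)

# hypothetical usage with placeholder Step 1 numbers
scores = [limitedNormalRange(221.0, lo=1.0, hi=300.0) for i in range(100)]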

Thanks
Vincent Davis
720-301-3003


On Sat, Jun 20, 2009 at 7:43 PM, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

>
>        I must be bored today...
>
>        Expanded variation:
>
> -=-=-=-=-=-
> """
>    Random Scores
> """
>
> import random
> import numpy
>
> def limitedNormal(mu, sigma=None):
>    """
>        returns a random score from a normal (bell) distribution in
>        which the mean, mu, is supplied by the caller, and in which the
>        standard deviation, sigma, is computed such that 3-sigma does
>        not drop below 0 [for mu < 50] or rise above 100 [for mu > 50]
>        sigma is shown as a parameter but is not used -- it permits
>        using the same arguments for all three *Normal() methods
>    """
>    if mu < 50.0:
>        sigma = mu / 3.0
>    else:
>        sigma = (100.0 - mu) / 3.0
>    return random.normalvariate(mu, sigma)
>
> def clippedNormal(mu, sigma):
>    """
>        returns a random score from a normal distribution in which
>        the mean, mu, and standard deviation, sigma, are supplied
>        by the caller.
>        the result is clipped to the range 0..100
>    """
>    return max(0.0,
>               min(100.0,
>                    random.normalvariate(mu, sigma)))
>
> def retryNormal(mu, sigma):
>    """
>        returns a random score from a normal distribution in which
>        the mean, mu, and the standard deviation, sigma, are supplied
>        by the caller.
>        if the result falls outside the range 0..100 a new score
>        is generated.
>        extremely large sigma, or mu close to the range end points
>        will cause slowness, as many results are thrown out and retried
>    """
>    score = -1
>    while not (0.0 <= score <= 100.0):
>        score = random.normalvariate(mu, sigma)
>    return score
>
>
> def clippedGamma(mu, B):
>    """
>        returns a random score from a gamma distribution in which the
>        "shape", a, is computed as the mean, mu, as from a normal
>        distribution divided by the B, "rate". Both mu and B are
>        supplied by the caller.
>        as the gamma distribution has a long tail to the right, for
>        mu > 50 the result is computed as 100 - gamma(100-mu) to
>        reflect the desired skewness
>        results are clipped to the boundaries 0..100 as there is no easy
>        way to compute B to limit results, as is done for sigma in
>        limitedNormal()
>        NOTE: while the mean of the results will approach mu, the peak
>        of the curve will be to the right (for mu>50) or left (for mu<50)
>        relative to a normal curve
>    """
>    if mu < 50.0:
>        return max(0.0,
>                   min(100.0,
>                       random.gammavariate(mu / B, B)))
>    else:
>        return 100.0 - max(0.0,
>                           min(100.0,
>                               random.gammavariate((100.0 - mu) / B,
>                                                   B)))
>
> def retryGamma(mu, B):
>    """
>        returns a random score from a gamma distribution in which the
>        "shape", a, is computed as the mean, mu, as from a normal
>        distribution divided by the B, "rate". Both mu and B are
>        supplied by the caller.
>        as the gamma distribution has a long tail to the right, for
>        mu > 50 the result is computed as 100 - gamma(100-mu) to
>        reflect the desired skewness
>        results outside the boundaries 0..100 will be retried
>        NOTE: while the mean of the results will approach mu, the peak
>        of the curve will be to the right (for mu>50) or left (for mu<50)
>        relative to a normal curve
>    """
>    score = -1
>    while not (0.0 <= score <= 100.0):
>        if mu < 50.0:
>            score = random.gammavariate(mu / B, B)
>        else:
>            score = 100.0 - random.gammavariate((100.0 - mu) / B, B)
>    return score
>
> if __name__ == "__main__":
>    tries = [   ("limitedNormal", limitedNormal, (75, 20)),
>                ("retryNormal", retryNormal, (75, 20)),
>                ("clippedNormal", clippedNormal, (75, 20)),
>                ("clippedGamma", clippedGamma, (75, 10)),
>                ("retryGamma", retryGamma, (75, 10))    ]
>
>    state = random.getstate()
>
>    for (name, func, args) in tries:
>        random.setstate(state)  #reset random number generator so
>                                #each run has the same sequence
>        scores = [func(*args) for i in range(100)]
>        scores.sort()   #so listing is easier to scan visually
>        mu = numpy.mean(scores)
>        sigma = numpy.std(scores)
>        print "\n\n%s ( %s, %s )" % tuple([name] + list(args))
>        print "\tcomputed statistics: mu = %s\tsigma = %s" % (mu, sigma)
>        (histv, histb) = numpy.histogram(scores,
>                                         bins=20,
>                                         range=(0.0, 100.0))
>        print "\t\thistogram"
>        for i, v in enumerate(histv):
>            print "%4d\t%s" % (histb[i+1], "*" * v)
>        print ""
> ##        print "\t\tsorted scores"
> ##        print scores
>        print ""
>
> -=-=-=-=-=-=-=-
>
> limitedNormal ( 75, 20 )
>        computed statistics: mu = 75.5121294828 sigma = 8.16374859991
>                histogram
>   5
>  10
>  15
>  20
>  25
>  30
>  35
>  40
>  45
>  50
>  55    *
>  60    ***
>  65    ******
>  70    *****************
>  75    *******************
>  80    ************************
>  85    *****************
>  90    **********
>  95    ***
>  100
>
>
>        Note how computing the input sigma such that 3*sigma does not exceed
> boundaries results in a narrow bell curve (hmm, and for this set, no one
> scored 95-100)
>
>
>
>
> retryNormal ( 75, 20 )
>        computed statistics: mu = 73.283826412  sigma = 16.9151951316
>                histogram
>   5
>  10
>  15
>  20
>  25
>  30    *
>  35    *
>  40    **
>  45    ****
>  50    **
>  55    ****
>  60    ******
>  65    **********
>  70    *********
>  75    *********
>  80    *************
>  85    *************
>  90    ********
>  95    *********
>  100    *********
>
>        The retry model produces a skew, but perhaps somewhat unnatural; a
> real skew should still have relatively few entries in the 95-100 bin,
> whereas this is still rather symmetrical about the mean.  Compare the
> 45, 65, and 80 bins between these two, those show most of the retried
> values that otherwise clipped to 100 below.
>
>
> clippedNormal ( 75, 20 )
>        computed statistics: mu = 75.3240108464 sigma = 18.1008966385
>                histogram
>   5
>  10
>  15
>  20
>  25
>  30    *
>  35    *
>  40    **
>  45    ***
>  50    **
>  55    ****
>  60    ******
>  65    **********
>  70    *********
>  75    ********
>  80    *********
>  85    ************
>  90    ********
>  95    *******
>  100    ******************
>
>        Note the unnatural peak of grades in the 95-100 range resulting from
> clipping out of range values into the range.
>
>
> clippedGamma ( 75, 20 )
>        computed statistics: mu = 75.2343345947 sigma = 21.1537553145
>                histogram
>   5    *
>  10
>  15
>  20
>  25    *
>  30    *
>  35    ***
>  40    *
>  45    *****
>  50    ****
>  55    **
>  60    **
>  65    ********
>  70    *****
>  75    *******
>  80    *********
>  85    ******
>  90    ***********
>  95    ******************
>  100    ****************
>
>        One entry was clipped to 0.0 as it doesn't show up below
>
>
>
> retryGamma ( 75, 20 )
>        computed statistics: mu = 75.7551006676 sigma = 19.8990180392
>                histogram
>   5
>  10
>  15
>  20
>  25    *
>  30    *
>  35    ***
>  40    *
>  45    *****
>  50    ****
>  55    ***
>  60    **
>  65    ********
>  70    *****
>  75    *******
>  80    *********
>  85    ******
>  90    ***********
>  95    ******************
>  100    ****************
>
>
>
>
> --
>        Wulfraed        Dennis Lee Bieber               KD6MOG
>        wlfr...@ix.netcom.com           wulfr...@bestiaria.com
>                HTTP://wlfraed.home.netcom.com/
>        (Bestiaria Support Staff:               web-a...@bestiaria.com)
>                HTTP://www.bestiaria.com/
>