Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 3129 by [email protected]: Drastic change to sympy.stats:
Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129
Currently, you create a random variable from a distribution like this:
X = Binomial(n, p)
This emulates the standard mathematical notation ``X ~ Binomial(n, p)``.
That is, X *samples* from the Binomial distribution with count n and
probability p. But the current notation can also be read as saying that X
*equals* this Binomial distribution, and it isn't obvious that the function
Binomial (or any of the distribution functions) returns a random variable
and not the distribution itself. In fact, sympy.stats does not have any
class or concept of Distribution.
My suggestion is to add ProbabilityDistribution to sympy.stats and change
the current syntax for creating new random variables. I'm not sure how
this would interact with the current ProbabilitySpaces (maybe we can just
rename BinomialPSpace to Binomial and leave it at that). The distribution
should be visible to the user, unlike, say, PSpace, so the user can work
with it directly, as well as with random variables.
We would then create a random variable like so:
X = RandomSymbol('X', dist=Binomial(n, p))
or another notation I was thinking of,
X = Binomial(n, p).new_symbol('X')
'Binomial' would in this case be a type of ProbabilityDistribution. This is
more verbose than the current way, but it makes it explicit that X is a
random symbol and not a distribution. This also gets rid of the issue of
generating default random symbol names. Currently you have to write
X = Binomial(n, p, symbol='X')
to bind the symbol name 'X' to the variable; otherwise it uses a default,
auto-incremented symbol name. The first notation appeals to me because it
is similar to the notation for creating non-random symbols. The second
might be more pleasant if we replace 'new_symbol' with something shorter...
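A minimal, stdlib-only sketch of how the two proposed notations might fit together. Everything here is hypothetical: `ProbabilityDistribution`, `RandomSymbol`, `new_symbol` and this `Binomial` are illustrative stand-ins, not existing sympy.stats classes. Distributions are frozen dataclasses, so they compare equal when their parameters match, while random symbols keep Python's default identity comparison.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ProbabilityDistribution:
    """Hypothetical base class: an immutable value compared by its parameters."""

    def new_symbol(self, name):
        # Second proposed notation: Binomial(n, p).new_symbol('X')
        return RandomSymbol(name, dist=self)


@dataclass(frozen=True)
class Binomial(ProbabilityDistribution):
    # Hypothetical distribution; the generated __eq__ compares (n, p)
    # and only matches other Binomial instances.
    n: int
    p: Fraction


class RandomSymbol:
    """Hypothetical random symbol; keeps default identity-based equality."""

    def __init__(self, name, dist):
        # First proposed notation: RandomSymbol('X', dist=Binomial(n, p))
        if not isinstance(dist, ProbabilityDistribution):
            raise TypeError("dist must be a ProbabilityDistribution")
        self.name = name
        self.dist = dist

    def __repr__(self):
        return f"RandomSymbol({self.name!r}, dist={self.dist!r})"


X = RandomSymbol('X', dist=Binomial(10, Fraction(1, 2)))
Y = Binomial(10, Fraction(1, 2)).new_symbol('Y')
print(X.dist == Y.dist, X == Y)  # True False
```

Making the distribution a frozen value object is one way to get "equal parameters means equal distributions" for free, and it also makes distributions hashable.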
Adding distributions would raise a bunch of interesting issues. Two
distributions with the same parameters should be equal to each other, but
two variables sampled from the same distribution aren't always equal.
>>> BinomA = Binomial(1, S.Half)
>>> BinomB = Binomial(1, S.Half)
>>> BinomA == BinomB
True
>>> X = RandomSymbol('X', BinomA)
>>> Y = RandomSymbol('Y', BinomA)
>>> P(Eq(X, Y))
0.5
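The 0.5 in the example above holds when X and Y are independent. A small sketch that checks it by exact enumeration, assuming independence and representing each distribution as a plain {value: probability} dict (the helper names `binomial_pmf` and `prob_equal` are made up for illustration):

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(n, p):
    """Return {k: P(K = k)} for K ~ Binomial(n, p), using exact fractions."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}


def prob_equal(pmf_x, pmf_y):
    """P(X == Y) for *independent* X and Y with the given pmfs.

    Independence means the joint pmf is the product of the marginals,
    so we sum P(X = x) * P(Y = y) over all pairs with x == y.
    """
    return sum(px * py
               for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items())
               if x == y)


# The same distribution is used for both variables, as with BinomA above.
binom_a = binomial_pmf(1, Fraction(1, 2))
print(prob_equal(binom_a, binom_a))  # 1/2
```

X and Y agree when both are 0 or both are 1, each with probability 1/4, which gives the 1/2 (0.5) above even though the two distributions are equal.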
Also, you shouldn't be able to call E (the expected value) on a distribution,
though a distribution should store its mean as a static property:
>>> E(X) == BinomA.mean
True
>>> Var(X) == BinomA.variance
True
>>> Density(X) == BinomA.pdf
True
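One way to keep E, Var and Density defined only on random variables while distributions carry their statistics as plain properties. This is a hypothetical sketch: the class layout and the `TypeError` raised for distributions are assumptions, not sympy's API, and `pdf` here returns the probability mass function since Binomial is discrete.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import comb


@dataclass(frozen=True)
class Binomial:
    """Hypothetical distribution object carrying its own summary statistics."""
    n: int
    p: Fraction

    @property
    def mean(self):
        return self.n * self.p

    @property
    def variance(self):
        return self.n * self.p * (1 - self.p)

    def pdf(self, k):
        """Probability mass at k (Binomial is discrete, so this is a pmf)."""
        if not 0 <= k <= self.n:
            return Fraction(0)
        return comb(self.n, k) * self.p**k * (1 - self.p)**(self.n - k)


class RandomSymbol:
    """Hypothetical random symbol: a name bound to a distribution."""

    def __init__(self, name, dist):
        self.name = name
        self.dist = dist


def _require_rv(X):
    # In this sketch, E/Var/Density reject distributions and accept only random symbols.
    if not isinstance(X, RandomSymbol):
        raise TypeError(f"expected a random symbol, got {type(X).__name__}")


def E(X):
    _require_rv(X)
    return X.dist.mean


def Var(X):
    _require_rv(X)
    return X.dist.variance


def Density(X):
    _require_rv(X)
    return X.dist.pdf


BinomA = Binomial(1, Fraction(1, 2))
X = RandomSymbol('X', BinomA)
print(E(X) == BinomA.mean, Var(X) == BinomA.variance)  # True True
```

Calling `E(BinomA)` in this sketch raises `TypeError`, matching the idea that the expectation belongs to a random variable while the distribution only stores its mean.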
But can you multiply distributions or transform them? They are, after all,
generalized functions...
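On transforming and combining distributions: one possible reading (an assumption, not something the proposal settles) is that transforming a distribution means taking the distribution of g(X), and "multiplying" two distributions means taking the distribution of X*Y (or X + Y) for independent X and Y. A sketch over finite pmfs, with hypothetical helper names `pushforward` and `combine`:

```python
import operator
from collections import defaultdict
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """{k: P(K = k)} for K ~ Binomial(n, p), using exact fractions."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}


def pushforward(pmf, g):
    """Distribution of g(X): probabilities of outcomes mapped to the same value are added."""
    out = defaultdict(Fraction)
    for x, px in pmf.items():
        out[g(x)] += px
    return dict(out)


def combine(pmf_x, pmf_y, op):
    """Distribution of op(X, Y) for independent X and Y (product of marginals)."""
    out = defaultdict(Fraction)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[op(x, y)] += px * py
    return dict(out)


half = Fraction(1, 2)
b1 = binomial_pmf(1, half)
print(pushforward(binomial_pmf(2, half), lambda k: k % 2))  # parity of K
print(combine(b1, b1, operator.mul))  # X * Y
print(combine(b1, b1, operator.add))  # X + Y, i.e. a convolution
```

With `operator.add`, combining two Binomial(1, 1/2) distributions gives Binomial(2, 1/2), the familiar result that sums of independent binomials with the same p are binomial.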
To summarize:
- Add the concept of ProbabilityDistribution to sympy.stats
- Functions like Binomial, Bernoulli and Gamma become instances or subclasses
of ProbabilityDistribution.
- Change the syntax of creating a random variable to be unambiguous.
- Distributions are static objects: they carry information such as the mean,
variance and pdf, and two distributions are equal if they have the same
parameters.
Benefits:
- Removes the redundancy of creating a PSpace class for each kind of
distribution and then a function to get the random variable of that PSpace.
- Symbol names are given explicitly; no more default symbols with
incrementing numbers.
- Unambiguous creation of new random variables
- Simple ProbabilityDistribution concept visible to users.
Drawbacks:
- More verbose to create a new RV
- May be seen as complicating the already complicated sympy.stats class
hierarchy