Issue 3129 in sympy: Drastic change to sympy.stats: Adding concept of Probability Distributions on surface level

sympy Sun, 04 Mar 2012 23:19:07 -0800

Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 3129 by [email protected]: Drastic change to sympy.stats: Adding concept of Probability Distributions on surface level

http://code.google.com/p/sympy/issues/detail?id=3129

Currently, you create a random variable from a distribution like this:

X = Binomial(n, p)


This emulates the standard mathematical notation ``X ~ Binomial(n, p)``

That is, X *samples* from the Binomial distribution with count n and probability p. But the current notation can also be interpreted as X *equals* this Binomial distribution, and it's unclear that the function Binomial (or any of the distribution functions) returns a random variable and not the distribution itself. In fact, sympy.stats does not have any class or concept of Distribution.

My suggestion is to add ProbabilityDistribution to sympy.stats and change the current syntax for creating new random variables. I'm not exactly sure how this would interact with the current ProbabilitySpaces (maybe we can just rename BinomialPSpace to Binomial and leave it at that). It should be visible to the user, unlike, say, PSpace, so the user can play with it as well as with random variables.


Under this proposal, we would create a random variable like so:

X = RandomSymbol('X', dist=Binomial(n, p))

or another notation I was thinking of,

X = Binomial(n, p).new_symbol('X')

'Binomial' would in this case be a type of ProbabilityDistribution. This is more verbose than the current way, but it makes it explicit that X is a random symbol and not a distribution. This also gets rid of the issue of generating default random symbol names. Previously you'd have to write

X = Binomial(n, p, symbol='X')

to bind the symbol name 'X' to the variable. Otherwise it would use a default, incrementing symbol. The first notation appeals to me because it is similar to the notation for creating non-random symbols. The second might be more pleasant if we replace 'new_symbol' with something shorter...
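To make the two proposed notations concrete, here is a minimal pure-Python sketch. It is not sympy code: the `Binomial` and `RandomSymbol` classes below are hypothetical stand-ins written only for illustration, and the way they store parameters is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Binomial:
    """Hypothetical distribution object: it only holds parameters."""
    n: int
    p: Fraction

    def new_symbol(self, name):
        # Second notation from the proposal: Binomial(n, p).new_symbol('X')
        return RandomSymbol(name, dist=self)


@dataclass(frozen=True)
class RandomSymbol:
    """Hypothetical random symbol: an explicit name bound to a distribution."""
    name: str
    dist: Binomial


# First notation: mirrors how a non-random symbol is created.
X = RandomSymbol('X', dist=Binomial(10, Fraction(1, 3)))
# Second notation: ask the distribution for a new symbol.
Y = Binomial(10, Fraction(1, 3)).new_symbol('Y')

print(X.dist == Y.dist)  # True: same parameters, same distribution
print(X == Y)            # False: different names, different symbols
```

Either way the name is always supplied by the user, so no default incrementing names are needed.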

Adding distributions would add a bunch of interesting issues. Two distributions with the same parameters should be equal to each other, but two variables sampled from the same distribution aren't always equal.

BinomA = Binomial(1, S.Half)
BinomB = Binomial(1, S.Half)
BinomA == BinomB

True

X = RandomSymbol('X', BinomA)
Y = RandomSymbol('Y', BinomA)
P(Eq(X, Y))

0.5
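The 0.5 can be checked by hand with a short enumeration. The sketch below is plain Python rather than sympy, represents a distribution as a dictionary of probabilities, and assumes the two variables are independent; the helper names `binomial_pmf` and `prob_equal` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(n, p):
    """Return {k: P(K = k)} for a Binomial(n, p) distribution."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}


def prob_equal(pmf_x, pmf_y):
    """P(X == Y) for independent X and Y with the given mass functions."""
    return sum(px * py
               for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items())
               if x == y)


binom_a = binomial_pmf(1, Fraction(1, 2))
binom_b = binomial_pmf(1, Fraction(1, 2))

print(binom_a == binom_b)            # True: identical parameters
print(prob_equal(binom_a, binom_a))  # 1/2: two independent draws agree half the time
```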

Also, you shouldn't be able to call E (expected value) on a distribution, though the distribution should store its mean as a static property.

E(X) == BinomA.mean

True

Var(X) == BinomA.variance

True

Density(X) == BinomA.pdf

True
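As a rough illustration of these identities, the following pure-Python sketch stores the mean and variance as properties on a hypothetical `Binomial` class and compares them with values computed by summing over the support. The class, the `expectation` helper, and the closed-form properties are assumptions for this example, not an existing sympy API.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import comb


@dataclass(frozen=True)
class Binomial:
    """Hypothetical static distribution object."""
    n: int
    p: Fraction

    def pdf(self, k):
        # Probability mass at k.
        return comb(self.n, k) * self.p**k * (1 - self.p)**(self.n - k)

    @property
    def mean(self):
        return self.n * self.p

    @property
    def variance(self):
        return self.n * self.p * (1 - self.p)


def expectation(dist, f=lambda k: k):
    """E[f(X)] for X drawn from dist, summed over the support 0..n."""
    return sum(f(k) * dist.pdf(k) for k in range(dist.n + 1))


binom_a = Binomial(4, Fraction(1, 4))
mean = expectation(binom_a)
var = expectation(binom_a, lambda k: (k - mean) ** 2)

print(mean == binom_a.mean)     # True
print(var == binom_a.variance)  # True
```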

But can you multiply distributions or transform them? They are, after all, generalized functions...


To summarize:
- Add the concept of ProbabilityDistribution to sympy.stats

- Functions like Binomial, Bernoulli, Gamma are now instances or subclasses of ProbabilityDistribution.

- Change the syntax of creating a random variable to be unambiguous.

- Distributions are static objects: they carry information like mean, variance, pdf, and two distributions are equal if they have the same parameters


Benefits:

- Get rid of the redundancy of creating a class for each type of PSpace and then a function to get the random variable of that PSpace.
- Symbol names are created explicitly; no more default symbols with increasing numbers

- Unambiguous creation of new random variables
- Simple ProbabilityDistribution concept visible to users.

Drawbacks:
- More verbose to create a new RV

- May be seen as complicating the already complicated sympy.stats class hierarchy

