subject:"\[Numpy\-discussion\] pareto docstring"

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread T J

On Mon, May 10, 2010 at 8:37 PM,  josef.p...@gmail.com wrote:

 I went googling and found a new interpretation

 numpy.random.pareto is actually the Lomax distribution also known as Pareto 2,
 Pareto (II) or Pareto Second Kind distribution


Great!


 So, from this it looks like numpy.random does not have a Pareto
 distribution, only Lomax, and the confusion came maybe because
 somewhere in the history the (II) (second kind) got dropped in the
 explanations.

 and actually it is in scipy.stats.distributions, but without rvs

 # LOMAX (Pareto of the second kind.)
 #  Special case of Pareto of the first kind (location=-1.0)


I understand the point with this last comment, but I think it can be
confusing in that the Pareto (of the first kind) has no location
parameter and people might think you are referring to the Generalized
Pareto distribution.  I think its much clearer to say:

 # Special case of the Pareto of the first kind, but shifted to the
left by 1.x -- x + 1



  2) Modify numpy/random/mtrand/distributions.c in the following way:

 double rk_pareto(rk_state *state, double a)
 {
   //return exp(rk_standard_exponential(state)/a) - 1;
   return 1.0 / rk_double(state)**(1.0 / a);
 }

 I'm not an expert on random number generator, but using the uniform 
 distribution
 as in
 http://en.wikipedia.org/wiki/Pareto_distribution#Generating_a_random_sample_from_Pareto_distribution
 and your Devroy reference seems better, than based on the relationship to
 the exponential distribution
 http://en.wikipedia.org/wiki/Pareto_distribution#Relation_to_the_exponential_distribution



Correct.  The exp relationship was for the existing implementation
(which corresponds to the Lomax).  I commented that line out and just
used 1/U^(1/a).


 I think without changing the source we can rewrite the docstring that
 this is Lomax (or
 Pareto of the Second Kind), so that at least the documentation is less
 misleading.

 But I find calling it Pareto very confusing, and I'm not the only one anymore,
 (and I don't know if anyone has used it assuming it is classical Pareto),
 so my preferred solution would be

 * rename numpy.random.pareto to numpy.random.lomax
 * and create a real (classical, first kind) pareto distribution (even
 though it's just
  adding or subtracting 1, ones we know it)


I personally have used numpy.random.pareto thinking it was the Pareto
distribution of the first kind---which led to this post in the first
place.  So, I'm in strong agreement.  While doing this, perhaps we
should increase functionality and allow users the ability to specify
the scale of the distribution (instead of just the shape)?

I can make a ticket for this and give a stab at creating the necessary patch.



 What's the backwards compatibility policy with very confusing names in numpy?


It seems reasonable that we might have to follow the deprecation
route, but I'd be happier with a faster fix.

1.5
  - Provide numpy.random.lomax.  Make numpy.random.pareto raise a
DeprecationWarning and then call lomax.
2.0 (if there is no 1.6)
  - Make numpy.random.pareto behave as Pareto distribution of 1st kind.

Immediately though, we can modify the docstring that is currently in
there to make the situation clear, instructing users how they can
generate samples from the standard Pareto distribution.  This is the
first patch I'll submit.  Perhaps it is better to only change the
docstring and then save all changes in functionality for 2.0.
Deferring to others on this one...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread David Goldsmith

On Tue, May 11, 2010 at 12:23 AM, T J tjhn...@gmail.com wrote:

 On Mon, May 10, 2010 at 8:37 PM,  josef.p...@gmail.com wrote:
 
  I went googling and found a new interpretation
 
  numpy.random.pareto is actually the Lomax distribution also known as
 Pareto 2,
  Pareto (II) or Pareto Second Kind distribution
 

 Great!

 
  So, from this it looks like numpy.random does not have a Pareto
  distribution, only Lomax, and the confusion came maybe because
  somewhere in the history the (II) (second kind) got dropped in the
  explanations.
 
  and actually it is in scipy.stats.distributions, but without rvs
 
  # LOMAX (Pareto of the second kind.)
  #  Special case of Pareto of the first kind (location=-1.0)
 

 I understand the point with this last comment, but I think it can be
 confusing in that the Pareto (of the first kind) has no location
 parameter and people might think you are referring to the Generalized
 Pareto distribution.  I think its much clearer to say:

  # Special case of the Pareto of the first kind, but shifted to the
 left by 1.x -- x + 1

 
 
   2) Modify numpy/random/mtrand/distributions.c in the following way:
 
  double rk_pareto(rk_state *state, double a)
  {
//return exp(rk_standard_exponential(state)/a) - 1;
return 1.0 / rk_double(state)**(1.0 / a);
  }
 
  I'm not an expert on random number generator, but using the uniform
 distribution
  as in
 
 http://en.wikipedia.org/wiki/Pareto_distribution#Generating_a_random_sample_from_Pareto_distribution
  and your Devroy reference seems better, than based on the relationship to
  the exponential distribution
 
 http://en.wikipedia.org/wiki/Pareto_distribution#Relation_to_the_exponential_distribution
 
 

 Correct.  The exp relationship was for the existing implementation
 (which corresponds to the Lomax).  I commented that line out and just
 used 1/U^(1/a).


  I think without changing the source we can rewrite the docstring that
  this is Lomax (or
  Pareto of the Second Kind), so that at least the documentation is less
  misleading.
 
  But I find calling it Pareto very confusing, and I'm not the only one
 anymore,
  (and I don't know if anyone has used it assuming it is classical Pareto),
  so my preferred solution would be
 
  * rename numpy.random.pareto to numpy.random.lomax
  * and create a real (classical, first kind) pareto distribution (even
  though it's just
   adding or subtracting 1, ones we know it)
 

 I personally have used numpy.random.pareto thinking it was the Pareto
 distribution of the first kind---which led to this post in the first
 place.  So, I'm in strong agreement.  While doing this, perhaps we
 should increase functionality and allow users the ability to specify
 the scale of the distribution (instead of just the shape)?

 I can make a ticket for this and give a stab at creating the necessary
 patch.


 
  What's the backwards compatibility policy with very confusing names in
 numpy?
 

 It seems reasonable that we might have to follow the deprecation
 route, but I'd be happier with a faster fix.

 1.5
  - Provide numpy.random.lomax.  Make numpy.random.pareto raise a
 DeprecationWarning and then call lomax.
 2.0 (if there is no 1.6)
  - Make numpy.random.pareto behave as Pareto distribution of 1st kind.

 Immediately though, we can modify the docstring that is currently in
 there to make the situation clear, instructing users how they can
 generate samples from the standard Pareto distribution.  This is the
 first patch I'll submit.  Perhaps it is better to only change the
 docstring and then save all changes in functionality for 2.0.
 Deferring to others on this one...


Elsewhere in the mailing list, it has been stated that our policy is to
document desired/intended behavior, when such differs from actual (current)
behavior.  This can be done in advance of a code fix to implement the
desired behavior, but we have discouraged (to the point of saying don't do
it) documenting current behavior when it is known that this should (and
presumably will) be changed.

DG


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread Pauli Virtanen

Tue, 11 May 2010 00:23:52 -0700, T J wrote:
[clip]
 It seems reasonable that we might have to follow the deprecation route,
 but I'd be happier with a faster fix.
 
 1.5
   - Provide numpy.random.lomax.  Make numpy.random.pareto raise a
 DeprecationWarning and then call lomax.

 2.0 (if there is no 1.6)
   - Make numpy.random.pareto behave as Pareto distribution of 1st kind.

I think the next Numpy release will be 2.0.

How things were done with changes in the histogram function were:

1) Add a new=False keyword argument, and raise a DeprecationWarning
   if new==False. The user then must call it with pareto(..., new=True)
   to get the correct behaviour.

2) In the next release, change the default to new=True.

Another option would be to add a correct implementation with a different 
name, e.g. `pareto1` to signal it's the first kind, and deprecate the old 
function altogether.

A third option would be just to silently fix the bug. In any case the 
change should be mentioned noticeably in the release notes.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread Kevin Jacobs jac...@bioinformed.com

On Tue, May 11, 2010 at 4:14 AM, Pauli Virtanen p...@iki.fi wrote:

 A third option would be just to silently fix the bug. In any case the
 change should be mentioned noticeably in the release notes.


I see this as two bugs: the Lomax distribution was named incorrectly and the
Parato distribution was incorrect or confusingly labeled.  Both should be
fixed and clearly documented.  Unlike cases of changing tastes and
preferences, it seems unduly complicated and confusing to perseverate with
backward compatibility shims.  The next release is NumPy 2.0, which will
have other known and well advertised API and ABI incompatibilities.

Just my 2e-10 cents,
-Kevin
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread josef . pktd

On Tue, May 11, 2010 at 8:11 AM, Kevin Jacobs jac...@bioinformed.com
bioinfor...@gmail.com wrote:
 On Tue, May 11, 2010 at 4:14 AM, Pauli Virtanen p...@iki.fi wrote:

 A third option would be just to silently fix the bug. In any case the
 change should be mentioned noticeably in the release notes.


 I see this as two bugs: the Lomax distribution was named incorrectly and the
 Parato distribution was incorrect or confusingly labeled.  Both should be
 fixed and clearly documented.  Unlike cases of changing tastes and
 preferences, it seems unduly complicated and confusing to perseverate with
 backward compatibility shims.  The next release is NumPy 2.0, which will
 have other known and well advertised API and ABI incompatibilities.
 Just my 2e-10 cents,
 -Kevin

I would have also considered it as a bug fix, except that there might be users
who use the correction (+1) as a workaround. In that case, just changing the
behavior without raising an exception for the current usage will introduce
hard to find bugs. (It's difficult to see whether the random numbers are correct
or as expected without proper testing.)

For example, we use the work-around in the docstring of
http://docs.scipy.org/numpy/docs/numpy.random.mtrand.RandomState.power/

and actually, reading the numpy.random.pareto docstring again more
carefully, the example does the correction also::

Draw samples from the distribution:
 a, m = 3., 1. # shape and mode
 s = np.random.pareto(a, 1000) + m


But it's very confusing, also there is a relationship between
Pareto/Lomax and GPD, but I'm not sure yet my algebra is
correct. (and I have misplaced the graphs and tables with the
relationships between different distributions)

To minimize backwards compatibility problems we could attach a *big*
warning text to pareto (use at your own risk)
and create new random variates, as Pauli proposed

pareto1 - classical pareto
pareto2 or lomax - with random variates the same as current pareto

both could then get clear,  unambiguous descriptions.


and a note to using the uniform distribution for the generation of
random numbers.
python 2.5 random.py uses the half open uniform distribution to
avoid division by zero, I don't know how numpy.random handles
boundary values

def paretovariate(self, alpha):
Pareto distribution.  alpha is the shape parameter.
# Jain, pg. 495

u = 1.0 - self.random()
return 1.0 / pow(u, 1.0/alpha)


Josef


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-11 Thread josef . pktd

Assuming no typos

Relationship between Pareto, Pareto(II)/Lomax and Generalized Pareto,GPD

 import sympy as sy
 x = sy.Symbol('x')
 k = sy.Symbol('k')
 c = sy.Symbol('c')
 a = sy.Symbol('a')
 m = sy.Symbol('m')
 mgpd = sy.Symbol('mgpd')

 gpd0 = (1 - c*x/k)**(1/c - 1)/k  #JKB notation (c reversed sign)
 gpd = 1/k/(1 + c*(x-mgpd)/k)**(1/c + 1) #similar to Wikipedia
 par = a*k**a/(x-m)**(1+a)  #JKB
 lom = a/k/(1+x/k)**(1+a)

 lom.subs(k,1)  #Pareto(II), Lomax  (loc=0, scale=1)
a*(1 + x)**(-1 - a)
 par.subs(k,1).subs(m,-1)  #Pareto with loc=-1, scale=1
a*(1 + x)**(-1 - a)
 gpd.subs(c,1/a).subs(k,1/a).subs(mgpd,0) #GPD with loc=0, scale=1/a
a*(1 + x)**(-1 - a)


 par.subs(k,1).subs(m,0)  # standard Pareto (loc=0, scale=1)
a*x**(-1 - a)
 gpd.subs(c,1/a).subs(k,1/a).subs(mgpd,1) #GPD with loc=1, scale=1/a
a*x**(-1 - a)

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-10 Thread T J

On Sun, May 9, 2010 at 4:49 AM,  josef.p...@gmail.com wrote:

 I think this is the same point, I was trying to make last year.

 Instead of renormalizing, my conclusion was the following,
 (copied from the mailinglist August last year)

 
 my conclusion:
 -
 What numpy.random.pareto actually produces, are random numbers from a
 pareto distribution with lower bound m=1, but location parameter
 loc=-1, that shifts the distribution to the left.

 To actually get useful  random numbers (that are correct in the usual
 usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
 add 1 to them.
 stats.distributions doesn't use mtrand.pareto

 rvs_pareto = 1 + numpy.random.pareto(a, size)

 

 I still have to work though the math of your argument, but maybe we
 can come to an agreement how the docstrings (or the function) should
 be changed, and what numpy.random.pareto really means.

 Josef
 (grateful, that there are another set of eyes on this)




Yes, I think my renormalizing statement is incorrect as it is really
just sampling from a different pdf altogether.  See the following image:

http://www.dumpt.com/img/viewer.php?file=q9tfk7ehxsw865vn067c.png

It plots histograms of the various implementations against the pdfs.
Summarizing:

The NumPy implementation is based on (Devroye p. 262).  The pdf listed
there is:

a / (1+x)^(a+1)

This differs from the standard Pareto pdf:

a / x^(a+1)

It also differs from the pdf of the generalized Pareto distribution,
with scale=1 and location=0:

(1 + a x)^(-1/a - 1)

And it also differs from the pdf of the generalized Pareto
distribution with scale=1 and location=-1  or location=1.

random.paretovariate and scipy.stats.pareto sample from the standard
Pareto, and this is the desired behavior, IMO.  Its true that 1 +
np.random.pareto provides the fix, but I think we're better off
changing the underlying implementation.  Devroye has a more recent
paper:

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.8760

which states the Pareto distribution in the standard way.  So I think
it is safe to make this change.  Backwards compatibility might be the
only argument for not making this change.  So here is my proposal:

  1) Remove every mention of the generalized Pareto distribution from
the docstring.  As far as I can see, the generalized Pareto
distribution does not reduce to the standard Pareto at all.  We can
still mention scipy.stats.distributions.genpareto and
scipy.stats.distributions.pareto.  The former is something different
and the latter will (now) be equivalent to the NumPy function.

  2) Modify numpy/random/mtrand/distributions.c in the following way:

double rk_pareto(rk_state *state, double a)
{
   //return exp(rk_standard_exponential(state)/a) - 1;
   return 1.0 / rk_double(state)**(1.0 / a);
}

Does this sound good?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-10 Thread David Goldsmith

On Mon, May 10, 2010 at 11:14 AM, T J tjhn...@gmail.com wrote:

 On Sun, May 9, 2010 at 4:49 AM,  josef.p...@gmail.com wrote:
 
  I think this is the same point, I was trying to make last year.
 
  Instead of renormalizing, my conclusion was the following,
  (copied from the mailinglist August last year)
 
  
  my conclusion:
  -
  What numpy.random.pareto actually produces, are random numbers from a
  pareto distribution with lower bound m=1, but location parameter
  loc=-1, that shifts the distribution to the left.
 
  To actually get useful  random numbers (that are correct in the usual
  usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
  add 1 to them.
  stats.distributions doesn't use mtrand.pareto
 
  rvs_pareto = 1 + numpy.random.pareto(a, size)
 
  
 
  I still have to work though the math of your argument, but maybe we
  can come to an agreement how the docstrings (or the function) should
  be changed, and what numpy.random.pareto really means.
 
  Josef
  (grateful, that there are another set of eyes on this)
 
 


 Yes, I think my renormalizing statement is incorrect as it is really
 just sampling from a different pdf altogether.  See the following image:

 http://www.dumpt.com/img/viewer.php?file=q9tfk7ehxsw865vn067c.png

 It plots histograms of the various implementations against the pdfs.
 Summarizing:

 The NumPy implementation is based on (Devroye p. 262).  The pdf listed
 there is:

a / (1+x)^(a+1)

 This differs from the standard Pareto pdf:

a / x^(a+1)

 It also differs from the pdf of the generalized Pareto distribution,
 with scale=1 and location=0:

(1 + a x)^(-1/a - 1)

 And it also differs from the pdf of the generalized Pareto
 distribution with scale=1 and location=-1  or location=1.

 random.paretovariate and scipy.stats.pareto sample from the standard
 Pareto, and this is the desired behavior, IMO.  Its true that 1 +
 np.random.pareto provides the fix, but I think we're better off
 changing the underlying implementation.  Devroye has a more recent
 paper:

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.8760

 which states the Pareto distribution in the standard way.  So I think
 it is safe to make this change.  Backwards compatibility might be the
 only argument for not making this change.  So here is my proposal:

  1) Remove every mention of the generalized Pareto distribution from
 the docstring.  As far as I can see, the generalized Pareto
 distribution does not reduce to the standard Pareto at all.  We can
 still mention scipy.stats.distributions.genpareto and
 scipy.stats.distributions.pareto.  The former is something different
 and the latter will (now) be equivalent to the NumPy function.

  2) Modify numpy/random/mtrand/distributions.c in the following way:

 double rk_pareto(rk_state *state, double a)
 {
   //return exp(rk_standard_exponential(state)/a) - 1;
   return 1.0 / rk_double(state)**(1.0 / a);
 }

 Does this sound good?
 ___


Whatever the community decides, don't forget to please go through the formal
procedure of submitting a bug ticket so all of this is recorded in the
right way in the right place.  Thanks!

DG
-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

2010-05-10 Thread josef . pktd

On Mon, May 10, 2010 at 2:14 PM, T J tjhn...@gmail.com wrote:
 On Sun, May 9, 2010 at 4:49 AM,  josef.p...@gmail.com wrote:

 I think this is the same point, I was trying to make last year.

 Instead of renormalizing, my conclusion was the following,
 (copied from the mailinglist August last year)

 
 my conclusion:
 -
 What numpy.random.pareto actually produces, are random numbers from a
 pareto distribution with lower bound m=1, but location parameter
 loc=-1, that shifts the distribution to the left.

 To actually get useful  random numbers (that are correct in the usual
 usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
 add 1 to them.
 stats.distributions doesn't use mtrand.pareto

 rvs_pareto = 1 + numpy.random.pareto(a, size)

 

 I still have to work though the math of your argument, but maybe we
 can come to an agreement how the docstrings (or the function) should
 be changed, and what numpy.random.pareto really means.

 Josef
 (grateful, that there are another set of eyes on this)




 Yes, I think my renormalizing statement is incorrect as it is really
 just sampling from a different pdf altogether.  See the following image:

 http://www.dumpt.com/img/viewer.php?file=q9tfk7ehxsw865vn067c.png

 It plots histograms of the various implementations against the pdfs.
 Summarizing:

 The NumPy implementation is based on (Devroye p. 262).  The pdf listed
 there is:

    a / (1+x)^(a+1)

 This differs from the standard Pareto pdf:

    a / x^(a+1)

 It also differs from the pdf of the generalized Pareto distribution,
 with scale=1 and location=0:

    (1 + a x)^(-1/a - 1)

 And it also differs from the pdf of the generalized Pareto
 distribution with scale=1 and location=-1  or location=1.

 random.paretovariate and scipy.stats.pareto sample from the standard
 Pareto, and this is the desired behavior, IMO.  Its true that 1 +
 np.random.pareto provides the fix, but I think we're better off
 changing the underlying implementation.  Devroye has a more recent
 paper:

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.8760

 which states the Pareto distribution in the standard way.  So I think
 it is safe to make this change.  Backwards compatibility might be the
 only argument for not making this change.  So here is my proposal:

  1) Remove every mention of the generalized Pareto distribution from
 the docstring.  As far as I can see, the generalized Pareto
 distribution does not reduce to the standard Pareto at all.  We can
 still mention scipy.stats.distributions.genpareto and
 scipy.stats.distributions.pareto.  The former is something different
 and the latter will (now) be equivalent to the NumPy function.

  2) Modify numpy/random/mtrand/distributions.c in the following way:

 double rk_pareto(rk_state *state, double a)
 {
   //return exp(rk_standard_exponential(state)/a) - 1;
   return 1.0 / rk_double(state)**(1.0 / a);
 }

 Does this sound good?

I went googling and found a new interpretation

numpy.random.pareto is actually the Lomax distribution also known as Pareto 2,
Pareto (II) or Pareto Second Kind distribution

http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/pa2pdf.htm
http://www.mathwave.com/articles/pareto_2_lomax_distribution.html
http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/VGAM/html/lomax.html

which is different from the Pareto (First Kind) distribution
http://www.mathwave.com/help/easyfit/html/analyses/distributions/pareto.html
http://en.wikipedia.org/wiki/Pareto_distribution#Density_function

The R-VGAM docs have the reference to
http://books.google.ca/books?id=pGsAZ3W7uEACpg=PA226lpg=PA226dq=Kleiber,+C.+and+Kotz,+S.+%282003%29+Statistical+Size+Distributions+in+Economics+and+Actuarial+Sciences+pareto+lomaxsource=blots=j1-AoRxm5Esig=KDrWehJW5kt-EKH1VjDe-lFMRpwhl=enei=csPoS_bgDYT58Aau46z2Cgsa=Xoi=book_resultct=resultresnum=1ved=0CBUQ6AEwAA#v=onepageqf=false

which states on page 227

X is distributed as Lomax(b,q) = X + b is distributed Pareto(b, q)

where b is the scale parameter b=1 in numpy.random.pareto notation
and q is the shape parameter alpha

quote:
... the Pareto (II) distribution - after all, it's just a shifted
classical Pareto distribution - ...

So, from this it looks like numpy.random does not have a Pareto
distribution, only Lomax, and the confusion came maybe because
somewhere in the history the (II) (second kind) got dropped in the
explanations.

and actually it is in scipy.stats.distributions, but without rvs

# LOMAX (Pareto of the second kind.)
#  Special case of Pareto of the first kind (location=-1.0)

class lomax_gen(rv_continuous):
def _pdf(self, x, c):
return c*1.0/(1.0+x)**(c+1.0)
def _cdf(self, x, c):
return 1.0-1.0/(1.0+x)**c
def _ppf(self, q, c):
return pow(1.0-q,-1.0/c)-1
def _stats(self, c):
mu, mu2, g1, g2 = pareto.stats(c, loc=-1.0, moments='mvsk')
return mu, mu2, g1, g2
def _entropy(self, c):
return 1+1.0/c-log(c)

Re: [Numpy-discussion] pareto docstring

2010-05-09 Thread josef . pktd

On Sun, May 9, 2010 at 1:01 AM, T J tjhn...@gmail.com wrote:
 The docstring for np.pareto says:

    This is a simplified version of the Generalized Pareto distribution
    (available in SciPy), with the scale set to one and the location set to
    zero. Most authors default the location to one.

 and also:

    The probability density for the Pareto distribution is

    .. math:: p(x) = \frac{am^a}{x^{a+1}}

    where :math:`a` is the shape and :math:`m` the location

 These two statements seem to be in contradiction.  I think what was
 meant is that m is the scale, rather than the location.  For if m were
 equal to zero, as the first portion of the docstring states, then the
 entire pdf would be zero for all shapes a0.


 

 Also,  I'm not quite understanding how the stated pdf is actually the
 same as the pdf for the generalized pareto with the scale=1 and
 location=0.  By the wikipedia definition of the generalized Pareto
 distribution, if we take \sigma=1 (scale equal to one)  and \mu=0
 (location equal to zero), then we get:

 (1 + a x)^(-1/a - 1)

 which is normalized over $x \in (0, \infty)$.  If we compare this to
 the distribution stated in the docstring (with m=1)

 a x^{-a-1}

 we see that it is normalized over $x \in (1, \infty)$.  And indeed,
 the distribution requires x  scale = 1.

 If we integrate the generalized Pareto (with scale=1, location=0) over
 $x \in (1, \infty)$ then we have to re-normalize.  So should the
 docstring say:

   This is a simplified version of the Generalized Pareto distribution
   (available in Scipy), with the scale set to one, the location set to zero,
   and the distribution re-normalized over the range (1, \infty). Most
   authors default the location to one.

I think this is the same point, I was trying to make last year.

Instead of renormalizing, my conclusion was the following,
(copied from the mailinglist August last year)


my conclusion:
-
What numpy.random.pareto actually produces, are random numbers from a
pareto distribution with lower bound m=1, but location parameter
loc=-1, that shifts the distribution to the left.

To actually get useful  random numbers (that are correct in the usual
usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
add 1 to them.
stats.distributions doesn't use mtrand.pareto

rvs_pareto = 1 + numpy.random.pareto(a, size)



I still have to work though the math of your argument, but maybe we
can come to an agreement how the docstrings (or the function) should
be changed, and what numpy.random.pareto really means.

Josef
(grateful, that there are another set of eyes on this)


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

[Numpy-discussion] pareto docstring

2010-05-08 Thread T J

The docstring for np.pareto says:

This is a simplified version of the Generalized Pareto distribution
(available in SciPy), with the scale set to one and the location set to
zero. Most authors default the location to one.

and also:

The probability density for the Pareto distribution is

.. math:: p(x) = \frac{am^a}{x^{a+1}}

where :math:`a` is the shape and :math:`m` the location

These two statements seem to be in contradiction.  I think what was
meant is that m is the scale, rather than the location.  For if m were
equal to zero, as the first portion of the docstring states, then the
entire pdf would be zero for all shapes a0.




Also,  I'm not quite understanding how the stated pdf is actually the
same as the pdf for the generalized pareto with the scale=1 and
location=0.  By the wikipedia definition of the generalized Pareto
distribution, if we take \sigma=1 (scale equal to one)  and \mu=0
(location equal to zero), then we get:

(1 + a x)^(-1/a - 1)

which is normalized over $x \in (0, \infty)$.  If we compare this to
the distribution stated in the docstring (with m=1)

a x^{-a-1}

we see that it is normalized over $x \in (1, \infty)$.  And indeed,
the distribution requires x  scale = 1.

If we integrate the generalized Pareto (with scale=1, location=0) over
$x \in (1, \infty)$ then we have to re-normalize.  So should the
docstring say:

   This is a simplified version of the Generalized Pareto distribution
   (available in Scipy), with the scale set to one, the location set to zero,
   and the distribution re-normalized over the range (1, \infty). Most
   authors default the location to one.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

Re: [Numpy-discussion] pareto docstring

[Numpy-discussion] pareto docstring

11 matches

Site Navigation

Mail list logo

Footer information