
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-12 Thread Ralf Gommers

On Sat, Jun 11, 2016 at 2:53 PM, Ralf Gommers 
wrote:

> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron  wrote:
>
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>> values versus actual data values for visualization of fit to a
>> distribution.  First a 1-D array of expected percentiles is generated for
>> a sample of size N; then that is passed to dist.ppf, the percent point
>> function for the chosen distribution, to return an array of expected
>> values.  The visualized data points are pairs of expected and actual
>> values, and a linear regression is done on these to produce the line that
>> data points from this distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>>
>> My question concerns the plot display.
>>
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct for
>> their intended interpretations.  First, the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points.  For a normal distribution with sigma=1,
>> they represent standard deviations.  Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will be different for different distributions.  Rather,
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into groups of the same size.  This
>> can be done in something like the following way.
>>
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>

Sent a PR for this: https://github.com/scipy/scipy/pull/6249

Ralf



>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3]
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread Mark Gawron

Ok,

Our messages crossed.  I understand now.

Thanks.

Mark

Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd

On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron  wrote:

> Thanks, Josef.  This is very helpful.  And I will direct this
> to one of the other mailing lists, once I read the previous posts.
>
> Regarding your remark:  Maybe I'm having a terminology problem.  It seems
> to me once you do
>
> >> osm = dist.ppf(osm_uniform)
>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2).  And the idea of a quantile is
> that it’s a division point in a UNIFORM division of the probability axis.
>


Yes and no: quantiles, i.e. what you get from ppf, are in units of the random
variable. So they are on the scale of the random variable, not on a probability
scale. The axis labels are in units of the random variable.

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities but non-uniform in the values of the random variable.

The difficult part to follow is when the plot is drawn uniformly in one scale,
but the axes are labeled non-uniformly in the other scale. That's what Paul's
probscale does and what you have in mind, AFAIU.
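
A rough sketch of the two kinds of plot coordinates may help here. It is not taken from scipy or statsmodels: it uses the standard library's statistics.NormalDist as a stand-in for a scipy.stats distribution (inv_cdf playing the role of ppf), and the plotting positions (i - 0.5) / n are just one simple convention chosen for illustration.

```python
import random
from statistics import NormalDist

random.seed(1)
sample = sorted(random.gauss(0.0, 1.0) for _ in range(200))
n = len(sample)
dist = NormalDist()  # stand-in for a scipy.stats distribution

# Plotting positions: an assumed, simple convention (not scipy's exact one).
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Q-Q plot: both axes are in units of the random variable.
qq_x = [dist.inv_cdf(p) for p in probs]  # theoretical quantiles
qq_y = sample                            # sample quantiles

# P-P plot: both axes are probabilities between 0 and 1.
pp_x = [dist.cdf(v) for v in sample]     # theoretical cumulative probabilities
pp_y = probs                             # empirical cumulative probabilities

print("Q-Q x range:", round(min(qq_x), 2), "to", round(max(qq_x), 2))
print("P-P x range:", round(min(pp_x), 3), "to", round(max(pp_x), 3))
```

The Q-Q x values spread over several units of the variable, while the P-P x values stay inside the unit interval.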

Josef



Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread Mark Gawron

Thanks, Josef.  This is very helpful.  And I will direct this
to one of the other mailing lists, once I read the previous posts.

Regarding your remark:  Maybe I'm having a terminology problem.  It seems to me 
once you do

>> osm = dist.ppf(osm_uniform)

you’re back in the value space for the particular distribution. So this
gives you known probability intervals, but not UNIFORM probability
intervals (the interval between 0 and 1 STD covers a bigger prob interval
than the interval between 1 and 2).  And the idea of a quantile is
that it’s a division point in a UNIFORM division of the probability axis.
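
To make the non-uniformity concrete, here is a small check. It assumes a standard normal distribution and uses the standard library's statistics.NormalDist in place of a scipy.stats distribution, so it runs without SciPy; the tick values are simply the integers a default plot would typically show.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sigma 1
ticks = [-3, -2, -1, 0, 1, 2, 3]

# Probability mass that falls between each pair of neighbouring ticks.
masses = [
    std_normal.cdf(hi) - std_normal.cdf(lo)
    for lo, hi in zip(ticks, ticks[1:])
]

for (lo, hi), mass in zip(zip(ticks, ticks[1:]), masses):
    print(f"[{lo:+d}, {hi:+d}]: {mass:.4f}")
```

The interval from 0 to 1 holds roughly 0.34 of the probability and the interval from 1 to 2 roughly 0.14, so equally spaced ticks on the value axis do not mark equal-sized groups.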

Mark

Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread josef . pktd

On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers 
wrote:

> Hi Mark,
>
> Note that the scipy-dev or scipy-user mailing list would have been more
> appropriate for this question.
>
>
> On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron  wrote:
>
>>
>>
>> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
>> values versus actual data values for visualization of fit to a
>> distribution.  First a 1-D array of expected percentiles is generated for
>> a sample of size N; then that is passed to dist.ppf, the percent point
>> function for the chosen distribution, to return an array of expected
>> values.  The visualized data points are pairs of expected and actual
>> values, and a linear regression is done on these to produce the line that
>> data points from this distribution should lie on.
>>
>> Where x is the input data array and dist the chosen distribution we have:
>>
>> osr = np.sort(x)
>> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>> osm = dist.ppf(osm_uniform)
>> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>>
>>
>> My question concerns the plot display.
>>
>> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>>
>>
>> The x-axis of the resulting plot is labeled quantiles, but the xticks and
>> xticklabels produced by qqplot and probplot do not seem correct for
>> their intended interpretations.  First, the numbers on the x-axis do
>> not represent quantiles; the intervals between them do not in general
>> contain equal numbers of points.  For a normal distribution with sigma=1,
>> they represent standard deviations.  Changing the label on the x-axis does
>> not seem like a very good solution, because the interpretation of the
>> values on the x-axis will be different for different distributions.  Rather,
>> the right solution seems to be to actually show quantiles on the x-axis.
>> The numbers on the x-axis can stay as they are, representing quantile
>> indexes, but they need to be spaced so as to show the actual division
>> points that carve the population up into groups of the same size.  This
>> can be done in something like the following way.
>>
>
> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3]
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf
>
>
As a related link:
http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html

Paul Hobson has done a lot of work on getting different probability
scales attached to pp-plots or generalized versions of probability plots. I
think qqplots are less ambiguous because they are on the original or
standardized scale.

I haven't worked my way through the various interpretations of the probability
axis yet because I find it "not obvious". It might be easier for fields
that have a tradition of using probability paper.

It's planned to be added to the statsmodels probability plots so that there
will be a large choice of axis labels and scales.

Josef



Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-11 Thread Ralf Gommers

Hi Mark,

Note that the scipy-dev or scipy-user mailing list would have been more
appropriate for this question.


On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron  wrote:

>
>
> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
> values versus actual data values for visualization of fit to a
> distribution.  First a 1-D array of expected percentiles is generated for
> a sample of size N; then that is passed to dist.ppf, the percent point
> function for the chosen distribution, to return an array of expected
> values.  The visualized data points are pairs of expected and actual
> values, and a linear regression is done on these to produce the line that
> data points from this distribution should lie on.
>
> Where x is the input data array and dist the chosen distribution we have:
>
> osr = np.sort(x)
> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
> osm = dist.ppf(osm_uniform)
> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>
>
> My question concerns the plot display.
>
> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>
>
> The x-axis of the resulting plot is labeled quantiles, but the xticks and
> xticklabels produced by qqplot and probplot do not seem correct for
> their intended interpretations.  First, the numbers on the x-axis do
> not represent quantiles; the intervals between them do not in general
> contain equal numbers of points.  For a normal distribution with sigma=1,
> they represent standard deviations.  Changing the label on the x-axis does
> not seem like a very good solution, because the interpretation of the
> values on the x-axis will be different for different distributions.  Rather,
> the right solution seems to be to actually show quantiles on the x-axis.
> The numbers on the x-axis can stay as they are, representing quantile
> indexes, but they need to be spaced so as to show the actual division
> points that carve the population up into groups of the same size.  This
> can be done in something like the following way.
>

The ticks are correct I think, but they're theoretical quantiles and not
sample quantiles. This was discussed in [1] and is consistent with R [2]
and statsmodels [3]. I see that we just forgot to add "theoretical" to the
x-axis label (mea culpa). Does adding that resolve your concern?

[1] https://github.com/scipy/scipy/issues/1821
[2] http://data.library.virginia.edu/understanding-q-q-plots/
[3]
http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
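
As a quick illustration of what "theoretical quantile" means for the tick values, here is a sketch for the standard normal case. It uses the standard library's statistics.NormalDist rather than scipy.stats (cdf and inv_cdf stand in for dist.cdf and dist.ppf), and the tick values are only an example.

```python
from statistics import NormalDist

dist = NormalDist()  # standard normal, standing in for scipy.stats.norm
ticks = [-2, -1, 0, 1, 2]

# Each tick t is the theoretical quantile at probability level cdf(t).
levels = {t: dist.cdf(t) for t in ticks}

for t, p in levels.items():
    # Mapping the level back through the inverse CDF recovers the tick.
    print(f"tick {t:+d} is the quantile at p = {p:.4f}; ppf(p) = {dist.inv_cdf(p):+.4f}")
```

So the ticks are quantiles of the chosen distribution, just not at evenly spaced probability levels.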

Ralf

[Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

2016-06-10 Thread Mark Gawron



The scipy.stats.qqplot and scipy.stats.probplot functions plot expected values 
versus actual data values for visualization of fit to a distribution.  First a 
1-D array of expected percentiles is generated for a sample of size N; then 
that is passed to dist.ppf, the percent point function for the chosen 
distribution, to return an array of expected values.  The visualized data 
points are pairs of expected and actual values, and a linear regression is done 
on these to produce the line that data points from this distribution should lie on.

Where x is the input data array and dist the chosen distribution we have:

> osr = np.sort(x)
> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
> osm = dist.ppf(osm_uniform)
> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)

My question concerns the plot display.  

> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
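
For anyone who wants to run these steps without SciPy's private helper, here is a minimal sketch. It rests on several assumptions: the uniform order statistic medians are taken to follow Filliben's estimate, the standard library's statistics.NormalDist stands in for dist (with inv_cdf in place of ppf), and the line is fitted by plain least squares instead of stats.linregress. The function names are made up for this illustration.

```python
import random
from statistics import NormalDist, mean


def uniform_order_statistic_medians(n):
    """Approximate medians of the n uniform order statistics (Filliben's estimate).

    Assumption: this mirrors what the private SciPy helper computes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [0.5]
    last = 0.5 ** (1.0 / n)
    middle = [(i - 0.3175) / (n + 0.365) for i in range(2, n)]
    return [1.0 - last] + middle + [last]


def probplot_points(x, dist=None):
    """Return (osm, osr, slope, intercept) for a probability plot of x."""
    dist = dist or NormalDist()
    osr = sorted(x)  # ordered sample values
    osm = [dist.inv_cdf(p) for p in uniform_order_statistic_medians(len(osr))]
    # Ordinary least squares fit of osr on osm.
    mean_osm, mean_osr = mean(osm), mean(osr)
    sxx = sum((a - mean_osm) ** 2 for a in osm)
    sxy = sum((a - mean_osm) * (b - mean_osr) for a, b in zip(osm, osr))
    slope = sxy / sxx
    intercept = mean_osr - slope * mean_osm
    return osm, osr, slope, intercept


random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(500)]
osm, osr, slope, intercept = probplot_points(sample)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

For a normal sample plotted against the standard normal, the slope and intercept of the fitted line estimate the sample's scale and location.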


The x-axis of the resulting plot is labeled quantiles, but the xticks and 
xticklabels produced by qqplot and probplot do not seem correct for 
their intended interpretations.  First, the numbers on the x-axis do not 
represent quantiles; the intervals between them do not in general contain equal 
numbers of points.  For a normal distribution with sigma=1, they represent 
standard deviations.  Changing the label on the x-axis does not seem like a 
very good solution, because the interpretation of the values on the x-axis will 
be different for different distributions.  Rather, the right solution seems to 
be to actually show quantiles on the x-axis. The numbers on the x-axis can stay 
as they are, representing quantile indexes, but they need to be spaced so as to 
show the actual division points that carve the population up into groups of 
the same size.  This can be done in something like the following way. 

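> import numpy as np
> import matplotlib.pyplot as plt
> # ax is the Axes holding the plot above; dist is the chosen distribution
> xt = np.arange(-3, 3, dtype=int)
>
> # Probabilities at the 5 interior points that divide the population into
> # sixths, plus end points just inside 0 and 1 (ppf(0), ppf(1) may be infinite)
> percentiles = [x*.167 + .502 for x in xt]
> percentiles = np.array(percentiles + [.999])
> vals = dist.ppf(percentiles)
> ax.set_xticks(vals)
> xt = np.array(list(xt) + [3])
> ax.set_xticklabels(xt)
> ax.set_xlabel('Quantile')
> plt.show()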
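
The same idea can be written for any number of groups. The sketch below is only an illustration: equal_mass_ticks is a hypothetical helper, and the standard library's statistics.NormalDist stands in for dist so the example runs without SciPy. It returns only the interior division points and leaves out the near-0 and near-1 end points used above.

```python
from statistics import NormalDist


def equal_mass_ticks(k, dist=None):
    """Interior division points that split dist into k groups of equal probability."""
    dist = dist or NormalDist()
    return [dist.inv_cdf(i / k) for i in range(1, k)]


ticks = equal_mass_ticks(6)
print([round(t, 3) for t in ticks])
```

With a matplotlib Axes at hand, these values could be passed to ax.set_xticks in the same way as vals above.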



I’ve attached two images to show the difference between the current 
visualization and the suggested one.

Mark Gawron




