Re: [Numpy-discussion] NumPy lesson at EuroScipy2016?
Dear Bartocz,

thank you very much for proposing a tutorial on advanced NumPy for
Euroscipy 2016! I think it's an awesome idea! Before the call for
proposals, I did a survey about the subjects that people were interested
in for the advanced tutorials, and advanced NumPy scored very high (see
the poll on
https://docs.google.com/forms/d/1H0vDPNgRVyESM1LYHSXXmunTgorNvVmu_psS56u9MOk/viewanalytics
and my blog post on the results on
http://emmanuelle.github.io/euroscipy-tutorials-results-from-the-opinion-poll.html).
Therefore, I would be very grateful if you were willing to submit a
proposal for a tutorial on advanced NumPy, in the advanced track.

For the beginners track, there is already a tutorial on NumPy, which will
be given by Gert Ingold (a contributor to the Scipy Lecture Notes). He's
planning to cover the intro chapter of the scipy lecture notes about NumPy
http://www.scipy-lectures.org/intro/numpy/index.html

Since you mentioned the Scipy Lecture Notes in your e-mail, if you think
that you would be interested in updating/improving the part on advanced
NumPy of the lecture notes, that'd be really awesome!

All the best,
Emma

On Sat, Jun 11, 2016 at 03:51:08PM +0200, Ralf Gommers wrote:
> On Thu, Jun 9, 2016 at 11:25 PM, wrote:
> > Hi all,
> > Recently I taught "Advanced NumPy" lesson at a Software Carpentry
> > workshop [1]. It covered a review of basic operations on numpy arrays
> > and also more advanced topics: indexing, broadcasting, dtypes and
> > memory layout. I would greatly appreciate your feedback on the lesson
> > materials, which are available on github pages [2].
> > I am also thinking of proposing this lesson as a EuroScipy 2016
> > tutorial. Is anyone already planning to teach NumPy there? If so,
> > would you be interested to team up for this lesson (as a
> > co-instructor, helper or mentor)?
>
> There's always a Numpy tutorial at EuroScipy. Emmanuelle (Cc'd) is the
> tutorial chair, she can tell you the plan and I'm sure she appreciates
> your offer of help.
>
> Cheers,
> Ralf

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
OK, our messages crossed. I understand now. Thanks.

Mark

On Jun 11, 2016, at 12:24 PM, josef.p...@gmail.com wrote:

> Yes and no: a quantile, i.e. what you get from ppf, is in units of the
> random variable. So it is on the scale of the random variable, not on a
> probability scale. The axis labels are in units of the random variable.
>
> pp-plots have probabilities on the axes and are uniformly scaled in
> probabilities but non-uniform in the values of the random variable.
>
> The difficult part to follow is when the plot is drawn uniform in one
> scale but the axes are labeled, non-uniformly, in the other scale. That's
> what Paul's probscale does and what you have in mind, AFAIU.
>
> Josef
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
On Sat, Jun 11, 2016 at 2:49 PM, Mark Gawron wrote:

> Thanks, Josef. This is very helpful. And I will direct this to one of
> the other mailing lists, once I read the previous posts.
>
> Regarding your remark: Maybe I'm having a terminology problem. It seems
> to me once you do
>
>     >>> osm = dist.ppf(osm_uniform)
>
> you’re back in the value space for the particular distribution. So this
> gives you known probability intervals, but not UNIFORM probability
> intervals (the interval between 0 and 1 STD covers a bigger prob interval
> than the interval between 1 and 2). And the idea of a quantile is that
> it’s a division point in a UNIFORM division of the probability axis.

Yes and no: a quantile, i.e. what you get from ppf, is in units of the
random variable. So it is on the scale of the random variable, not on a
probability scale. The axis labels are in units of the random variable.

pp-plots have probabilities on the axes and are uniformly scaled in
probabilities but non-uniform in the values of the random variable.

The difficult part to follow is when the plot is drawn uniform in one
scale but the axes are labeled, non-uniformly, in the other scale. That's
what Paul's probscale does and what you have in mind, AFAIU.

Josef
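The "uniform in one scale, labeled in the other" idea can be sketched without any plotting. This is only an illustration, not how mpl-probscale is implemented: it uses the standard library's `statistics.NormalDist` as a stand-in for the thread's `dist`, and the names `probs`, `tick_positions` and `tick_labels` are made up for the example.

```python
from statistics import NormalDist

# Standard normal as a stand-in for the thread's `dist` (assumption).
dist = NormalDist()

# Probabilities we would like to read off the axis.
probs = [0.01, 0.10, 0.50, 0.90, 0.99]

# On a Q-Q plot the x-axis is in units of the random variable, so each
# probability label has to be placed at the corresponding quantile (ppf).
tick_positions = [dist.inv_cdf(p) for p in probs]
tick_labels = [f"{100 * p:g}%" for p in probs]

for pos, label in zip(tick_positions, tick_labels):
    print(f"{label:>4} -> x = {pos:+.3f}")
```

The printed positions are unevenly spaced even though they are ordinary points on a linear x-axis, which is the source of the confusion discussed above: the axis is linear in the variable, while the labels are probabilities.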
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
Thanks, Josef. This is very helpful. And I will direct this to one of the
other mailing lists, once I read the previous posts.

Regarding your remark: Maybe I'm having a terminology problem. It seems to
me once you do

    >>> osm = dist.ppf(osm_uniform)

you’re back in the value space for the particular distribution. So this
gives you known probability intervals, but not UNIFORM probability
intervals (the interval between 0 and 1 STD covers a bigger prob interval
than the interval between 1 and 2). And the idea of a quantile is that
it’s a division point in a UNIFORM division of the probability axis.

Mark

On Jun 11, 2016, at 10:03 AM, josef.p...@gmail.com wrote:

> as related link
> http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html
>
> Paul Hobson has done a lot of work for getting different probability
> scales attached to pp-plots or generalized versions of probability plots.
> I think qqplots are less ambiguous because they are on the original or
> standardized scale.
>
> I haven't worked my way through the various interpretations of the
> probability axis yet because I find it "not obvious". It might be easier
> for fields that have a tradition of using probability papers.
>
> It's planned to be added to the statsmodels probability plots so that
> there will be a large choice of axis labels and scales.
>
> Josef
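The remark about unequal probability intervals is easy to check numerically. The sketch below assumes a standard normal distribution and uses the standard library's `statistics.NormalDist` rather than the scipy objects from the thread. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between 0 and 1 SD, and between 1 and 2 SD.
p_0_to_1 = std_normal.cdf(1.0) - std_normal.cdf(0.0)
p_1_to_2 = std_normal.cdf(2.0) - std_normal.cdf(1.0)
print(f"P(0 < Z < 1) = {p_0_to_1:.4f}")
print(f"P(1 < Z < 2) = {p_1_to_2:.4f}")

# Cut points that DO split the probability axis uniformly (quartiles).
# They are quantiles too, but are unevenly spaced in units of the variable.
quartile_points = [std_normal.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
print(quartile_points)
```

Equal steps along the value axis therefore cover unequal probability, and equal steps in probability land at unequal values. Both sets of numbers are legitimately called quantiles, only at different probability levels.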
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
On Sat, Jun 11, 2016 at 8:53 AM, Ralf Gommers wrote:

> The ticks are correct I think, but they're theoretical quantiles and not
> sample quantiles. This was discussed in [1] and is consistent with R [2]
> and statsmodels [3]. I see that we just forgot to add "theoretical" to the
> x-axis label (mea culpa). Does adding that resolve your concern?
>
> [1] https://github.com/scipy/scipy/issues/1821
> [2] http://data.library.virginia.edu/understanding-q-q-plots/
> [3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
>
> Ralf

as related link
http://phobson.github.io/mpl-probscale/tutorial/closer_look_at_viz.html

Paul Hobson has done a lot of work for getting different probability
scales attached to pp-plots or generalized versions of probability plots.
I think qqplots are less ambiguous because they are on the original or
standardized scale.

I haven't worked my way through the various interpretations of the
probability axis yet because I find it "not obvious". It might be easier
for fields that have a tradition of using probability papers.

It's planned to be added to the statsmodels probability plots so that
there will be a large choice of axis labels and scales.

Josef
Re: [Numpy-discussion] NumPy lesson at EuroScipy2016?
On Thu, Jun 9, 2016 at 11:25 PM, wrote:

> Hi all,
>
> Recently I taught "Advanced NumPy" lesson at a Software Carpentry workshop
> [1]. It covered a review of basic operations on numpy arrays and also more
> advanced topics: indexing, broadcasting, dtypes and memory layout. I would
> greatly appreciate your feedback on the lesson materials, which are
> available on github pages [2].
>
> I am also thinking of proposing this lesson as a EuroScipy 2016 tutorial.
> Is anyone already planning to teach NumPy there? If so, would you be
> interested to team up for this lesson (as a co-instructor, helper or
> mentor)?

There's always a Numpy tutorial at EuroScipy. Emmanuelle (Cc'd) is the
tutorial chair, she can tell you the plan and I'm sure she appreciates your
offer of help.

Cheers,
Ralf
Re: [Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling
Hi Mark,

Note that the scipy-dev or scipy-user mailing list would have been more
appropriate for this question.

On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron wrote:

> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
> values versus actual data values for visualization of fit to a
> distribution. First a one-D array of expected percentiles is generated for
> a sample of size N; then that is passed to dist.ppf, the percent point
> function for the chosen distribution, to return an array of expected
> values. The visualized data points are pairs of expected and actual
> values, and a linear regression is done on these to produce the line that
> data points from this distribution should lie on.
>
> Where x is the input data array and dist the chosen distribution we have:
>
>     osr = np.sort(x)
>     osm_uniform = _calc_uniform_order_statistic_medians(len(x))
>     osm = dist.ppf(osm_uniform)
>     slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>
> My question concerns the plot display.
>
>     ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>
> The x-axis of the resulting plot is labeled quantiles, but the xticks and
> xticklabels produced by qqplot and probplot do not seem correct for their
> intended interpretations. First the numbers on the x-axis do not represent
> quantiles; the intervals between them do not in general contain equal
> numbers of points. For a normal distribution with sigma=1, they represent
> standard deviations. Changing the label on the x-axis does not seem like a
> very good solution, because the interpretation of the values on the x-axis
> will be different for different distributions. Rather the right solution
> seems to be to actually show quantiles on the x-axis. The numbers on the
> x-axis can stay as they are, representing quantile indexes, but they need
> to be spaced so as to show the actual division points that carve the
> population up into groups of the same size. This can be done in something
> like the following way.

The ticks are correct I think, but they're theoretical quantiles and not
sample quantiles. This was discussed in [1] and is consistent with R [2]
and statsmodels [3]. I see that we just forgot to add "theoretical" to the
x-axis label (mea culpa). Does adding that resolve your concern?

[1] https://github.com/scipy/scipy/issues/1821
[2] http://data.library.virginia.edu/understanding-q-q-plots/
[3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot

Ralf
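For readers who want to reproduce the quantities discussed in this message, here is a minimal, self-contained sketch of the pipeline Mark quotes. It is not scipy's code, and several pieces are assumptions:

- `uniform_order_statistic_medians` is a hand-written Filliben-style estimate standing in for scipy's private `_calc_uniform_order_statistic_medians`.
- `statistics.NormalDist().inv_cdf` stands in for `dist.ppf`.
- `np.polyfit` stands in for `stats.linregress`.
- The sample `x` is synthetic.

```python
import numpy as np
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians of the order statistics of n uniform draws.

    Filliben-style estimate; an assumption about what the private scipy
    helper named in the thread computes, not a copy of it. Requires n >= 2.
    """
    v = np.empty(n, dtype=float)
    v[-1] = 0.5 ** (1.0 / n)
    v[0] = 1.0 - v[-1]
    i = np.arange(2, n)  # 1-based positions of the interior points
    v[1:-1] = (i - 0.3175) / (n + 0.365)
    return v


# Synthetic sample: normal with mean 10 and standard deviation 2.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=200)

osr = np.sort(x)                                   # ordered sample values
osm_uniform = uniform_order_statistic_medians(len(x))
ppf = NormalDist().inv_cdf                         # standard normal ppf
osm = np.array([ppf(float(p)) for p in osm_uniform])  # theoretical quantiles

# Least-squares line through (osm, osr); polyfit returns [slope, intercept].
slope, intercept = np.polyfit(osm, osr, 1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

Because `osm` holds theoretical quantiles of the standard normal, the x-axis is in standard-deviation units. The fitted slope and intercept then estimate the scale and location of the sample, which is why "theoretical quantiles" is the fitting axis label.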
Re: [Numpy-discussion] Integers to integer powers, let's make a decision
On Fr, 2016-06-10 at 20:16 +, Ian Henriksen wrote:
> On Fri, Jun 10, 2016 at 12:01 PM Nathaniel Smith wrote:
> > On Jun 10, 2016 10:50, "Alan Isaac" wrote:
> > > On 6/10/2016 1:34 PM, Nathaniel Smith wrote:
> > > > You keep pounding on this example. It's a fine example, but,
> > > > c'mon. **2 is probably at least 100x more common in real source
> > > > code. Maybe 1000x more common. Why should we break the
> > > > common case for your edge case?
> > >
> > > It is hardly an "edge case".
> > > Again, **almost all** integer combinations overflow: that's the
> > > point.
> >
> > When you say "almost all", you're assuming inputs that are
> > uniformly sampled integers. I'm much more interested in what
> > proportion of calls to the ** operator involve inputs that can
> > overflow, and in real life those inputs are very heavily biased
> > towards small numbers.
> > (I also think we should default to raising an error on overflow in
> > general, with a seterr switch to turn it off when desired. But
> > that's another discussion...)
> > -n
>
> Another thing that would need separate discussion...
> Making 64 bit integers default in more cases would help here.
> Currently arange gives 32 bit integers on 64 bit Windows, but
> 64 bit integers on 64 bit Linux/OSX. Using size_t (or even
> int64_t) as the default size would help with overflows in
> the more common use cases. It's a hefty backcompat
> break, but 64 bit systems are much more common now,
> and using 32 bit integers on 64 bit windows is a bit odd.
> Anyway, hopefully that's not too off-topic.
> Best,

I agree, at least on Python 3 (the reason is that on Python 3 the
subclass thingy goes away, so it is less likely to break anything). I
think we could have a shot at this. It is quirky, but the current
inconsistency is pretty bad too (and has probably caused a lot of bugs out
in the wild, because of tests running on systems where long is 64 bits).

It is a different issue, though I wouldn't mind if someone ponders this a
bit more and maybe creates a pull request.

- Sebastian

> Ian Henriksen
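To make the overflow concern concrete, the sketch below compares the same integer power in 32-bit and 64-bit arrays. The dtypes are set explicitly so the behaviour does not depend on the platform default. The bases and the exponent are arbitrary choices for illustration, and the array names are made up.

```python
import numpy as np

# What the default integer dtype is on this machine. The thread reports
# that it differed between 64-bit Windows and 64-bit Linux/OSX at the time.
print(np.arange(3).dtype)

bases32 = np.array([2, 3, 10], dtype=np.int32)
bases64 = bases32.astype(np.int64)

pow32 = bases32 ** 10   # stays int32; 10**10 does not fit and wraps around
pow64 = bases64 ** 10   # int64 has room for all three results

# Exact results computed with Python's arbitrary-precision integers.
exact = [b ** 10 for b in (2, 3, 10)]

print(pow32.dtype, pow32.tolist())
print(pow64.dtype, pow64.tolist())
print("exact:", exact)
```

Code that is only tested where the default integer is 64 bits would never see the wrapped value in the 32-bit case, which is the kind of latent bug mentioned above.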