Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call
On Wed, Oct 26, 2016 at 3:20 PM,wrote: > > > On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril < > mathewsyr...@gmail.com> wrote: > >> >> >> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith wrote: >> >>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer >>> wrote: >>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril >>> > wrote: >>> >> >>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer >>> wrote: >>> >>> >>> >>> I wonder if the goals of this addition could be achieved by simply >>> adding >>> >>> an optional `cov` argument >>> >>> >>> >>> to np.corr, which would provide a pre-computed covariance. >>> >> >>> >> >>> >> That's a fair suggestion which I'm happy to switch to. This >>> eliminates the >>> >> need for two new functions. >>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that >>> returns >>> >> a tuple (corr, cov) instead. >>> >> >>> >>> >>> >>> >>> >>> Either way, `covcorr` feels like a helper function that could exist >>> in >>> >>> user code rather than numpy proper. >>> >> >>> >> >>> >> The user would have to re-implement the part that converts the >>> covariance >>> >> matrix to a correlation >>> >> coefficient. I made this PR to avoid that code duplication. >>> > >>> > >>> > With the API I was envisioning (or even your proposed API, for that >>> matter), >>> > this function would only be a few lines, e.g., >>> > >>> > def covcorr(x): >>> > cov = np.cov(x) >>> > corr = np.corrcoef(x, cov=cov) >>> >>> IIUC, if you have a covariance matrix then you can compute the >>> correlation matrix directly, without looking at 'x', so corrcoef(x, >>> cov=cov) is a bit odd-looking. I think probably the API that makes the >>> most sense is just to expose something like the covtocorr function >>> (maybe it could have a less telegraphic name?)? And then, yeah, users >>> can use that to build their own covcorr or whatever if they want it. >>> >> >> Right, agreed, this is why I said `x` becomes redundant when `cov` is >> specified >> when calling `numpy.corrcoef`. So we have two alternatives: >> >> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False` >> that lets >> one obtain a tuple containing the covariance and the correlation matrices >> in the same call >> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a >> better >> name) but remove `np.covcorr` since this is easy for the user to add. >> >> My preference is option 2. >> > > cov2corr is a useful function > http://www.statsmodels.org/dev/generated/statsmodels.stats. > moment_helpers.cov2corr.html > I also wrote the inverse function corr2cov, but AFAIR use it only in some > test cases. > > > I don't think adding any of the options to corrcoef or covcor is useful > since there is no computational advantage to it. > I'm not sure I agree with that statement. If a user wants to calculate both a covariance and correlation matrix, they currently have two options: A) Call np.cov and np.corrcoef separately, which takes at least twice as long as one call to np.cov. For data-sets that I am used to, a np.cov call takes 5-10 seconds. B) Call np.cov and then separately implement their own correlation matrix code, which means the user isn't able to fully take advantage of code that is already in numpy. In any case, I've updated the PR: https://github.com/numpy/numpy/pull/8211 Relative to my original PR, it: a) removes the numpy.covcorr function which the user can easily implement b) have numpy.cov2corr be the function exposed in the API (previously called numpy.covtocorr in the PR), which accepts a pre-calculated covariance matrix c) have numpy.corrcoef call numpy.cov2corr > > > >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call
On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyerwrote: > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril > wrote: >> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer wrote: >>> >>> I wonder if the goals of this addition could be achieved by simply adding >>> an optional `cov` argument >>> >>> to np.corr, which would provide a pre-computed covariance. >> >> >> That's a fair suggestion which I'm happy to switch to. This eliminates the >> need for two new functions. >> I'll add an optional `cov = False` argument to numpy.corrcoef that returns >> a tuple (corr, cov) instead. >> >>> >>> >>> Either way, `covcorr` feels like a helper function that could exist in >>> user code rather than numpy proper. >> >> >> The user would have to re-implement the part that converts the covariance >> matrix to a correlation >> coefficient. I made this PR to avoid that code duplication. > > > With the API I was envisioning (or even your proposed API, for that matter), > this function would only be a few lines, e.g., > > def covcorr(x): > cov = np.cov(x) > corr = np.corrcoef(x, cov=cov) IIUC, if you have a covariance matrix then you can compute the correlation matrix directly, without looking at 'x', so corrcoef(x, cov=cov) is a bit odd-looking. I think probably the API that makes the most sense is just to expose something like the covtocorr function (maybe it could have a less telegraphic name?)? And then, yeah, users can use that to build their own covcorr or whatever if they want it. -n -- Nathaniel J. Smith -- https://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call
On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril < mathewsyr...@gmail.com> wrote: > On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyerwrote: > >> I wonder if the goals of this addition could be achieved by simply adding >> an optional `cov` argument >> > to np.corr, which would provide a pre-computed covariance. >> > > That's a fair suggestion which I'm happy to switch to. This eliminates the > need for two new functions. > I'll add an optional `cov = False` argument to numpy.corrcoef that returns > a tuple (corr, cov) instead. > > >> >> Either way, `covcorr` feels like a helper function that could exist in >> user code rather than numpy proper. >> > > The user would have to re-implement the part that converts the covariance > matrix to a correlation > coefficient. I made this PR to avoid that code duplication. > With the API I was envisioning (or even your proposed API, for that matter), this function would only be a few lines, e.g., def covcorr(x): cov = np.cov(x) corr = np.corrcoef(x, cov=cov) return (cov, corr) Generally, functions this short should be provided as recipes (if at all) rather than be added to numpy proper, unless the need for them is extremely common. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call
On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyerwrote: > I wonder if the goals of this addition could be achieved by simply adding > an optional `cov` argument > to np.corr, which would provide a pre-computed covariance. > That's a fair suggestion which I'm happy to switch to. This eliminates the need for two new functions. I'll add an optional `cov = False` argument to numpy.corrcoef that returns a tuple (corr, cov) instead. > > Either way, `covcorr` feels like a helper function that could exist in > user code rather than numpy proper. > The user would have to re-implement the part that converts the covariance matrix to a correlation coefficient. I made this PR to avoid that code duplication. Mathew > > On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril < > mathewsyr...@gmail.com> wrote: > >> Hi all, >> >> I posted a pull request: >> https://github.com/numpy/numpy/pull/8211 >> >> which adds a function `numpy.covcorr` that calculates both >> the covariance matrix and correlation coefficient with a single >> call to `numpy.cov` (which is often an expensive call for large >> data-sets). A function `numpy.covtocorr` has also been added >> that converts a covariance matrix to a correlation coefficent, >> and `numpy.corrcoef` has been modified to call this. The >> motivation here is that one often needs the covariance for >> subsequent analysis and the correlation coefficient for >> visualization, so instead of forcing the user to write their own >> code to convert one to the other, we want to allow both to >> be obtained from `numpy` as efficiently as possible. >> >> Best, >> Mathew >> >> >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call
I wonder if the goals of this addition could be achieved by simply adding an optional `cov` argument to np.corr, which would provide a pre-computed covariance. Either way, `covcorr` feels like a helper function that could exist in user code rather than numpy proper. On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril < mathewsyr...@gmail.com> wrote: > Hi all, > > I posted a pull request: > https://github.com/numpy/numpy/pull/8211 > > which adds a function `numpy.covcorr` that calculates both > the covariance matrix and correlation coefficient with a single > call to `numpy.cov` (which is often an expensive call for large > data-sets). A function `numpy.covtocorr` has also been added > that converts a covariance matrix to a correlation coefficent, > and `numpy.corrcoef` has been modified to call this. The > motivation here is that one often needs the covariance for > subsequent analysis and the correlation coefficient for > visualization, so instead of forcing the user to write their own > code to convert one to the other, we want to allow both to > be obtained from `numpy` as efficiently as possible. > > Best, > Mathew > > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion