Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 3:20 PM,  wrote:

>
>
> On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <
> mathewsyr...@gmail.com> wrote:
>
>>
>>
>> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith  wrote:
>>
>>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer 
>>> wrote:
>>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>>> >  wrote:
>>> >>
>>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer 
>>> wrote:
>>> >>>
>>> >>> I wonder if the goals of this addition could be achieved by simply
>>> adding
>>> >>> an optional `cov` argument
>>> >>>
>>> >>> to np.corr, which would provide a pre-computed covariance.
>>> >>
>>> >>
>>> >> That's a fair suggestion which I'm happy to switch to. This
>>> eliminates the
>>> >> need for two new functions.
>>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
>>> returns
>>> >> a tuple (corr, cov) instead.
>>> >>
>>> >>>
>>> >>>
>>> >>> Either way, `covcorr` feels like a helper function that could exist
>>> in
>>> >>> user code rather than numpy proper.
>>> >>
>>> >>
>>> >> The user would have to re-implement the part that converts the
>>> covariance
>>> >> matrix to a correlation
>>> >> coefficient. I made this PR to avoid that code duplication.
>>> >
>>> >
>>> > With the API I was envisioning (or even your proposed API, for that
>>> matter),
>>> > this function would only be a few lines, e.g.,
>>> >
>>> > def covcorr(x):
>>> > cov = np.cov(x)
>>> > corr = np.corrcoef(x, cov=cov)
>>>
>>> IIUC, if you have a covariance matrix then you can compute the
>>> correlation matrix directly, without looking at 'x', so corrcoef(x,
>>> cov=cov) is a bit odd-looking. I think probably the API that makes the
>>> most sense is just to expose something like the covtocorr function
>>> (maybe it could have a less telegraphic name?)? And then, yeah, users
>>> can use that to build their own covcorr or whatever if they want it.
>>>
>>
>> Right, agreed, this is why I said `x` becomes redundant when `cov` is
>> specified
>> when calling `numpy.corrcoef`.  So we have two alternatives:
>>
>> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
>> that lets
>> one obtain a tuple containing the covariance and the correlation matrices
>> in the same call
>> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
>> better
>> name) but remove `np.covcorr` since this is easy for the user to add.
>>
>> My preference is option 2.
>>
>
> cov2corr is a useful function
> http://www.statsmodels.org/dev/generated/statsmodels.stats.
> moment_helpers.cov2corr.html
> I also wrote the inverse function corr2cov, but AFAIR use it only in some
> test cases.
>
>
> I don't think adding any of the options to corrcoef or covcor is useful
> since there is no computational advantage to it.
>

I'm not sure I agree with that statement. If a user wants to calculate both
a covariance and correlation matrix,
they currently have two options:
A) Call np.cov and np.corrcoef separately, which takes at least twice as
long as one call to np.cov. For data-sets that
I am used to, a np.cov call takes 5-10 seconds.
B) Call np.cov and then separately implement their own correlation matrix
code, which means the user
isn't able to fully take advantage of code that is already in numpy.

In any case, I've updated the PR:
https://github.com/numpy/numpy/pull/8211

Relative to my original PR, it:
a) removes the numpy.covcorr function which the user can easily implement
b) have numpy.cov2corr be the function exposed in the API (previously
called numpy.covtocorr in the PR), which accepts a pre-calculated covariance
matrix
c) have numpy.corrcoef call numpy.cov2corr




>
>
>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Nathaniel Smith
On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer  wrote:
> On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>  wrote:
>>
>> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:
>>>
>>> I wonder if the goals of this addition could be achieved by simply adding
>>> an optional `cov` argument
>>>
>>> to np.corr, which would provide a pre-computed covariance.
>>
>>
>> That's a fair suggestion which I'm happy to switch to. This eliminates the
>> need for two new functions.
>> I'll add an optional `cov = False` argument to numpy.corrcoef that returns
>> a tuple (corr, cov) instead.
>>
>>>
>>>
>>> Either way, `covcorr` feels like a helper function that could exist in
>>> user code rather than numpy proper.
>>
>>
>> The user would have to re-implement the part that converts the covariance
>> matrix to a correlation
>> coefficient. I made this PR to avoid that code duplication.
>
>
> With the API I was envisioning (or even your proposed API, for that matter),
> this function would only be a few lines, e.g.,
>
> def covcorr(x):
> cov = np.cov(x)
> corr = np.corrcoef(x, cov=cov)

IIUC, if you have a covariance matrix then you can compute the
correlation matrix directly, without looking at 'x', so corrcoef(x,
cov=cov) is a bit odd-looking. I think probably the API that makes the
most sense is just to expose something like the covtocorr function
(maybe it could have a less telegraphic name?)? And then, yeah, users
can use that to build their own covcorr or whatever if they want it.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Stephan Hoyer
On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:
>
>> I wonder if the goals of this addition could be achieved by simply adding
>> an optional `cov` argument
>>
> to np.corr, which would provide a pre-computed covariance.
>>
>
> That's a fair suggestion which I'm happy to switch to. This eliminates the
> need for two new functions.
> I'll add an optional `cov = False` argument to numpy.corrcoef that returns
> a tuple (corr, cov) instead.
>
>
>>
>> Either way, `covcorr` feels like a helper function that could exist in
>> user code rather than numpy proper.
>>
>
> The user would have to re-implement the part that converts the covariance
> matrix to a correlation
> coefficient. I made this PR to avoid that code duplication.
>

With the API I was envisioning (or even your proposed API, for that
matter), this function would only be a few lines, e.g.,

def covcorr(x):
cov = np.cov(x)
corr = np.corrcoef(x, cov=cov)
return (cov, corr)

Generally, functions this short should be provided as recipes (if at all)
rather than be added to numpy proper, unless the need for them is extremely
common.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Mathew S. Madhavacheril
On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer  wrote:

> I wonder if the goals of this addition could be achieved by simply adding
> an optional `cov` argument
>
to np.corr, which would provide a pre-computed covariance.
>

That's a fair suggestion which I'm happy to switch to. This eliminates the
need for two new functions.
I'll add an optional `cov = False` argument to numpy.corrcoef that returns
a tuple (corr, cov) instead.


>
> Either way, `covcorr` feels like a helper function that could exist in
> user code rather than numpy proper.
>

The user would have to re-implement the part that converts the covariance
matrix to a correlation
coefficient. I made this PR to avoid that code duplication.

Mathew


>
> On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril <
> mathewsyr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I posted a pull request:
>> https://github.com/numpy/numpy/pull/8211
>>
>> which adds a function `numpy.covcorr` that calculates both
>> the covariance matrix and correlation coefficient with a single
>> call to `numpy.cov` (which is often an expensive call for large
>> data-sets). A function `numpy.covtocorr` has also been added
>> that converts a covariance matrix to a correlation coefficent,
>> and `numpy.corrcoef` has been modified to call this. The
>> motivation here is that one often needs the covariance for
>> subsequent analysis and the correlation coefficient for
>> visualization, so instead of forcing the user to write their own
>> code to convert one to the other, we want to allow both to
>> be obtained from `numpy` as efficiently as possible.
>>
>> Best,
>> Mathew
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

2016-10-26 Thread Stephan Hoyer
I wonder if the goals of this addition could be achieved by simply adding
an optional `cov` argument to np.corr, which would provide a pre-computed
covariance.

Either way, `covcorr` feels like a helper function that could exist in user
code rather than numpy proper.

On Wed, Oct 26, 2016 at 10:27 AM, Mathew S. Madhavacheril <
mathewsyr...@gmail.com> wrote:

> Hi all,
>
> I posted a pull request:
> https://github.com/numpy/numpy/pull/8211
>
> which adds a function `numpy.covcorr` that calculates both
> the covariance matrix and correlation coefficient with a single
> call to `numpy.cov` (which is often an expensive call for large
> data-sets). A function `numpy.covtocorr` has also been added
> that converts a covariance matrix to a correlation coefficent,
> and `numpy.corrcoef` has been modified to call this. The
> motivation here is that one often needs the covariance for
> subsequent analysis and the correlation coefficient for
> visualization, so instead of forcing the user to write their own
> code to convert one to the other, we want to allow both to
> be obtained from `numpy` as efficiently as possible.
>
> Best,
> Mathew
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion