Re: [Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices

2015-04-08 Thread Blake Griffith
I like your idea Josef, I'll add it to the PR. Just to be clear, we should
have something like:

Have a single check_valid keyword arg, which will default to 'warn', since
that is the current behavior. It will check for approximate symmetry, PSDness,
and NaNs and infs. The other options for the check_valid keyword arg will be
'ignore' and 'raise'.

What should happen when 'fix' is passed for check_valid? Set negative
eigenvalues to 0 and symmetrize the matrix?
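
One possible reading of 'fix', as a rough sketch only. The helper name fix_cov
is hypothetical and is not part of the PR; it symmetrizes first and then clips
negative eigenvalues, which is just one of several reasonable orderings.

```python
import numpy as np

def fix_cov(cov):
    """Hypothetical helper: symmetrize cov and zero out negative eigenvalues."""
    cov = np.asarray(cov, dtype=float)
    sym = (cov + cov.T) / 2.0        # symmetrize
    w, v = np.linalg.eigh(sym)       # eigendecomposition of the symmetric part
    w = np.clip(w, 0.0, None)        # set negative eigenvalues to 0
    return np.dot(v * w, v.T)        # rebuild V diag(w) V^T

# Symmetric but indefinite input (eigenvalues 3 and -1).
cov = np.array([[1.0, 2.0],
                [2.0, 1.0]])
fixed = fix_cov(cov)
```

Note that this changes the matrix the user passed in, so the samples come from
a different distribution than the one nominally requested.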

On Mon, Mar 30, 2015 at 8:34 AM, josef.p...@gmail.com wrote:

 On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith
 blake.a.griff...@gmail.com wrote:
  I have an open PR which lets users control the checks on the input
  covariance matrix. The matrix is required to be symmetric and positive
  semi-definite (PSD). The current behavior is that NumPy raises a warning
  if the matrix is not PSD, and does not even check for symmetry.
 
  I added a symmetry check, which raises a warning when the input is not
  symmetric, and added two keyword args which users can use to turn off the
  checks/warnings when the matrix is ill-formed. So this would only cause
  one additional warning to be raised in existing code.
 
  This is needed because sometimes the covariance matrix is only *almost*
  symmetric or PSD due to roundoff error.
 
  Thoughts?

 My only question is why is **exact** symmetry relevant?

 AFAIU
 An empirical covariance matrix might not be exactly symmetric unless we
 specifically force it to be. But I don't see why some roundoff errors
 that violate symmetry should be relevant.

 Use allclose with a floating-point rtol, or equivalent?

 Some user code might suddenly get irrelevant warnings.

 BTW:
 neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0)
 doesn't need to be calculated if cov_psd is false.

 -

 some more:

 svd can hang if the values are not finite, i.e. nan or infs

 A counter proposal would be to add a `check_valid` keyword with options
 ignore, warn, raise, and fix

 and raise an error if there are nans and check_valid is not ignore.

 -

 aside:
 np.random.multivariate_normal is only relevant if you have a new cov
 each call (or don't mind repeated possibly expensive calculations),
 so, I guess, adding checks by default won't upset many users.


 Josef


 
 
  PR: https://github.com/numpy/numpy/pull/5726
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


Re: [Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices

2015-03-30 Thread josef.pktd
On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith
blake.a.griff...@gmail.com wrote:
 I have an open PR which lets users control the checks on the input
 covariance matrix. The matrix is required to be symmetric and positive
 semi-definite (PSD). The current behavior is that NumPy raises a warning if
 the matrix is not PSD, and does not even check for symmetry.

 I added a symmetry check, which raises a warning when the input is not
 symmetric, and added two keyword args which users can use to turn off the
 checks/warnings when the matrix is ill-formed. So this would only cause
 one additional warning to be raised in existing code.

 This is needed because sometimes the covariance matrix is only *almost*
 symmetric or PSD due to roundoff error.

 Thoughts?

My only question is why is **exact** symmetry relevant?

AFAIU
An empirical covariance matrix might not be exactly symmetric unless we
specifically force it to be. But I don't see why some roundoff errors
that violate symmetry should be relevant.

Use allclose with a floating-point rtol, or equivalent?
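
Something along these lines, as a sketch. The function name and the default
tolerances are placeholders for illustration, not what the PR uses:

```python
import numpy as np

def is_approx_symmetric(cov, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: symmetry check that tolerates roundoff."""
    cov = np.asarray(cov)
    is_square = cov.ndim == 2 and cov.shape[0] == cov.shape[1]
    return is_square and np.allclose(cov, cov.T, rtol=rtol, atol=atol)

exact = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

# Roundoff-sized asymmetry: fails an exact comparison, passes allclose.
noisy = exact.copy()
noisy[0, 1] += 1e-12

# A genuinely asymmetric matrix should still be flagged.
bad = np.array([[2.0, 0.3],
                [0.9, 1.0]])
```

Here an exact test like (noisy == noisy.T).all() would flag noisy, while
is_approx_symmetric(noisy) accepts it and is_approx_symmetric(bad) does not.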

Some user code might suddenly get irrelevant warnings.

BTW:
neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0)
doesn't need to be calculated if cov_psd is false.
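
For readers not familiar with that line, a small sketch of what it computes,
assuming u, s, v are the SVD factors of a symmetric cov. For a symmetric
matrix the singular values are the absolute eigenvalues, and the left and
right singular vectors point in opposite directions exactly where the
eigenvalue is negative. The function name is made up for illustration:

```python
import numpy as np

def negative_directions(cov):
    """Hypothetical helper: flag singular directions with a negative eigenvalue."""
    u, s, v = np.linalg.svd(cov)   # v holds the right singular vectors as rows
    # Row-wise dot product of u[:, i] and v[i]; negative means u_i = -v_i.
    return (np.sum(u.T * v, axis=1) < 0) & (s > 0)

indefinite = np.array([[1.0, 2.0],
                       [2.0, 1.0]])   # eigenvalues 3 and -1
psd = np.array([[2.0, 0.3],
                [0.3, 1.0]])

neg_indefinite = negative_directions(indefinite)
neg_psd = negative_directions(psd)
```

np.any(neg_indefinite) is True and np.any(neg_psd) is False, so the mask is
only useful when the PSD check is actually requested.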

-

some more:

svd can hang if the values are not finite, i.e. nan or infs

A counter proposal would be to add a `check_valid` keyword with options
ignore, warn, raise, and fix

and raise an error if there are nans and check_valid is not ignore.
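
A rough sketch of how the dispatch could look, only to make the proposal
concrete. The function name validate_cov, the tol parameter, and the messages
are assumptions for illustration; 'fix' is left out because its semantics are
still undecided:

```python
import warnings
import numpy as np

def validate_cov(cov, check_valid="warn", tol=1e-8):
    """Hypothetical helper sketching the proposed check_valid behavior."""
    if check_valid not in ("ignore", "warn", "raise"):
        raise ValueError("check_valid must be 'ignore', 'warn' or 'raise'")
    cov = np.asarray(cov, dtype=float)
    if check_valid == "ignore":
        return cov
    # Non-finite values are always an error unless checks are ignored.
    if not np.all(np.isfinite(cov)):
        raise ValueError("covariance contains NaN or inf")
    problems = []
    if not np.allclose(cov, cov.T, rtol=tol, atol=tol):
        problems.append("not symmetric")
    # Test PSDness on the symmetric part, with a small tolerance for roundoff.
    if np.linalg.eigvalsh((cov + cov.T) / 2.0).min() < -tol:
        problems.append("not positive semi-definite")
    if problems:
        msg = "covariance is " + " and ".join(problems)
        if check_valid == "raise":
            raise ValueError(msg)
        warnings.warn(msg, RuntimeWarning)
    return cov
```

Checking finiteness before any decomposition also avoids handing NaNs or infs
to svd.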

-

aside:
np.random.multivariate_normal is only relevant if you have a new cov
each call (or don't mind repeated possibly expensive calculations),
so, I guess, adding checks by default won't upset many users.


Josef




 PR: https://github.com/numpy/numpy/pull/5726
