Re: [Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices
I like your idea, Josef; I'll add it to the PR. Just to be clear, we should have something like this: a single check_valid keyword arg, which will default to warn, since that is the current behavior. It will check for approximate symmetry, PSDness, and NaNs/infs. The other options for the check_valid keyword arg will be ignore and raise. What should happen when fix is passed for check_valid? Set negative eigenvalues to 0 and symmetrize the matrix?

On Mon, Mar 30, 2015 at 8:34 AM, josef.p...@gmail.com wrote:

On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith blake.a.griff...@gmail.com wrote:

I have an open PR which lets users control the checks on the input covariance matrix. The matrix is required to be symmetric and positive semi-definite (PSD). The current behavior is that NumPy raises a warning if the matrix is not PSD, and does not even check for symmetry. I added a symmetry check, which raises a warning when the input is not symmetric, and added two keyword args which users can use to turn off the checks/warnings when the matrix is ill-formed. So this would only cause another new warning to be raised in existing code. This is needed because sometimes the covariance matrix is only *almost* symmetric or PSD due to roundoff error. Thoughts?

My only question is: why is **exact** symmetry relevant? AFAIU an empirical covariance matrix might not be exactly symmetric unless we specifically force it to be. But I don't see why some roundoff errors that violate symmetry should be relevant. Use allclose with a floating point rtol or equivalent? Some user code might suddenly get irrelevant warnings.

BTW: neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0) doesn't need to be calculated if cov_psd is false.

- some more: svd can hang if the values are not finite, i.e. nans or infs. A counter proposal would be to add a `check_valid` keyword with options ignore, warn, raise, and fix, and to raise an error if there are nans and check_valid is not ignore.
- aside: np.random.multivariate_normal is only relevant if you have a new cov each call (or don't mind repeated possibly expensive calculations), so, I guess, adding checks by default won't upset many users. Josef PR: https://github.com/numpy/numpy/pull/5726 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
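A minimal sketch of the check_valid semantics discussed in this thread, in Python. The helper name `validate_cov` and its `tol` parameter are hypothetical, not NumPy's API. The "fix" branch follows Blake's suggestion (symmetrize, then set negative eigenvalues to 0). It tests PSDness with `np.linalg.eigh` on the symmetrized matrix rather than with the SVD-based `neg` test in the current code, and it checks for NaN/inf first because, as Josef notes, the decomposition can misbehave on non-finite input.

```python
import warnings

import numpy as np


def validate_cov(cov, check_valid="warn", tol=1e-8):
    """Hypothetical helper sketching the proposed check_valid keyword.

    check_valid is one of "ignore", "warn", "raise" or "fix".
    Returns the covariance matrix, repaired when check_valid == "fix".
    """
    if check_valid not in ("ignore", "warn", "raise", "fix"):
        raise ValueError("check_valid must be 'ignore', 'warn', 'raise' or 'fix'")
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        raise ValueError("cov must be a square 2-D array")
    if check_valid == "ignore":
        return cov

    # Non-finite values are an error for every option except "ignore",
    # because the decomposition below can misbehave on NaN/inf input.
    if not np.all(np.isfinite(cov)):
        raise ValueError("covariance contains NaN or inf")

    problems = []
    # Approximate symmetry, so roundoff alone does not trigger a warning.
    if not np.allclose(cov, cov.T, rtol=tol, atol=tol):
        problems.append("not symmetric")

    # PSD test on the symmetrized matrix; the tolerance scales with the
    # largest eigenvalue magnitude (an assumption of this sketch).
    sym = (cov + cov.T) / 2
    eigvals, eigvecs = np.linalg.eigh(sym)
    scale = max(np.abs(eigvals).max(), 1.0)
    if eigvals.min() < -tol * scale:
        problems.append("not positive semi-definite")

    if not problems:
        return cov
    msg = "covariance is " + " and ".join(problems)
    if check_valid == "raise":
        raise ValueError(msg)
    if check_valid == "warn":
        warnings.warn(msg, RuntimeWarning)
        return cov

    # "fix": rebuild from the symmetrized matrix with negative eigenvalues set to 0.
    clipped = np.clip(eigvals, 0.0, None)
    fixed = (eigvecs * clipped) @ eigvecs.T
    return (fixed + fixed.T) / 2
```

The choice of tolerance is the main open design question; a real implementation would need to pick one and document it.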
[Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices
I have an open PR which lets users control the checks on the input covariance matrix. The matrix is required to be symmetric and positive semi-definite (PSD). The current behavior is that NumPy raises a warning if the matrix is not PSD, and does not even check for symmetry. I added a symmetry check, which raises a warning when the input is not symmetric, and added two keyword args which users can use to turn off the checks/warnings when the matrix is ill-formed. So this would only cause another new warning to be raised in existing code. This is needed because sometimes the covariance matrix is only *almost* symmetric or PSD due to roundoff error. Thoughts?

PR: https://github.com/numpy/numpy/pull/5726
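To illustrate the roundoff point: a matrix whose off-diagonal entries come from different floating-point computations can fail an exact symmetry test while passing an approximate one such as `np.allclose`. A small Python example:

```python
import numpy as np

# An "almost symmetric" matrix: 0.1 + 0.2 and 0.3 differ in the last bits.
cov = np.array([[1.0, 0.1 + 0.2],
                [0.3, 1.0]])

exactly_symmetric = np.array_equal(cov, cov.T)  # exact comparison
approx_symmetric = np.allclose(cov, cov.T)      # default rtol/atol

print(exactly_symmetric)  # False
print(approx_symmetric)   # True
```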
Re: [Numpy-discussion] PEP8
I think a good solution would be to add a git_hooks directory with a pre-commit git hook, along with a git hook installation script. A note should also be added to DEV_README.txt suggesting installing the git hooks for pep8 compatibility. I personally use this as a pre-commit hook:

#!/bin/sh
# Collect staged Python files, skipping deleted ones.
FILES=$(git diff --cached --name-status | grep -v '^D' | awk '$1 $2 { print $2 }' | grep -e '\.py$')
# Quote $FILES so the test is false when nothing is staged.
if [ -n "$FILES" ]; then
    pep8 -r $FILES
fi

which is from here: https://gist.github.com/lentil/810399#comment-303703

On Mon, Sep 9, 2013 at 10:54 AM, Nathaniel Smith n...@pobox.com wrote:

On Mon, Sep 9, 2013 at 3:29 PM, Charles R Harris charlesr.har...@gmail.com wrote:

On Mon, Sep 9, 2013 at 8:12 AM, Richard Hattersley rhatters...@gmail.com wrote:

Something we have done in matplotlib is that we have made PEP8 a part of the tests. In Iris and Cartopy we've also done this and it works well. While we transition we have an exclusion list (which is gradually getting shorter). We've had mixed experiences with automatic reformatting, so we prefer to keep the human in the loop.

I agree with keeping a human in the loop; the script would be intended to get things into the right neighborhood, and the submitter would have to review the changes after. If the script isn't too strict there will be more than one way to do some things, and those bits would rely on the good taste of the coder.

So if I understand right, the goal is to have some script that developers can run before (or after) submitting a PR, like tools/autopep8-my-changes numpy/, that will fix up their changes but leave the rest of numpy alone? And the proposed mechanism is to come up with a combination of changes to the numpy source and an autopep8 configuration such that autopep8 --our-config numpy/ becomes a no-op, and then we can use this as an implementation of tools/autopep8-my-changes? If that's right, then my feeling is that the goal seems worthwhile but the approach seems difficult and unlikely to survive for long.
As soon as someone overrides autopep8 once, we either have to disable the rule for the whole project or keep overriding it manually forever. You're already suggesting taking out the spaces-around-arithmetic rule, which strikes me as one of the most useful -- sure, it gets things wrong sometimes, but I feel like we're constantly reviewing PRs where all*the*(arithmetic+is)-written**like*this.

Maybe a better approach would be to spend that time hacking up some script that uses git and autopep8 together to run autopep8 over all and only those lines which the current branch has actually touched? It's pretty easy to parse 'git diff' output to get a list of all line numbers which have been modified, and then we could run autopep8 over the modified files and pull out only those changes which touch those lines.

-n

P.S.: definitely [:, :, 2]
Re: [Numpy-discussion] Upcoming 1.8 release.
I would like to have the ufunc overrides in 1.8 if it is possible. On Thu, Aug 15, 2013 at 9:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: I don't see any that *have* to go in, but there are a few that could be included. The most significant is probably the inplace fancy indexing if it is ready. The nanmean etc. functions are not committed yet, but I think they are ready. If the Polynomial import fixes show up, they can go in. There are the usual janitorial things, the release notes need some clean up, the docs need merging, and the HOWTO_RELEASE document needs updating. For datetime64, I think a comment should be added to the release notes that it is still experimental and that changes are expected in 1.9. Hopefully the next release will come out next spring. I think we are also about ready for a 1.7.2 release.
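Assuming "inplace fancy indexing" refers to what shipped in 1.8 as the `ufunc.at` method, this Python example shows the difference it makes when an index is repeated:

```python
import numpy as np

counts = np.zeros(4, dtype=int)
idx = np.array([0, 1, 1, 3, 3, 3])

# Buffered fancy indexing: repeated indices are written once, not accumulated.
buffered = counts.copy()
buffered[idx] += 1

# Unbuffered in-place operation: every occurrence of an index is applied.
unbuffered = counts.copy()
np.add.at(unbuffered, idx, 1)

print(buffered)    # [1 1 0 1]
print(unbuffered)  # [1 2 0 3]
```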
Re: [Numpy-discussion] Upcoming 1.8 release.
I think it is nearly complete, although there are some recent changes that need review. I still need to go back and make changes to the original NEP noting the differences in the final implementation.

On Thu, Aug 15, 2013 at 11:52 AM, Charles R Harris charlesr.har...@gmail.com wrote:

On Thu, Aug 15, 2013 at 10:48 AM, Blake Griffith blake.a.griff...@gmail.com wrote:

I would like to have the ufunc overrides in 1.8 if it is possible.

On Thu, Aug 15, 2013 at 9:21 AM, Charles R Harris charlesr.har...@gmail.com wrote:

I don't see any that *have* to go in, but there are a few that could be included. The most significant is probably the inplace fancy indexing if it is ready. The nanmean etc. functions are not committed yet, but I think they are ready. If the Polynomial import fixes show up, they can go in. There are the usual janitorial things, the release notes need some clean up, the docs need merging, and the HOWTO_RELEASE document needs updating. For datetime64, I think a comment should be added to the release notes that it is still experimental and that changes are expected in 1.9. Hopefully the next release will come out next spring. I think we are also about ready for a 1.7.2 release.

What is the status of that? I've been leaving that commit up to Pauli.

Chuck
[Numpy-discussion] ufunc overrides
Hello NumPy,

Part of my GSoC is compatibility between SciPy's sparse matrices and NumPy's ufuncs. Currently there is no feasible way to do this without changing ufuncs a bit. I've been considering a mechanism to override ufuncs based on checking the ufunc's arguments for a __ufunc_override__ attribute, then handing off the operation to a function specified by that attribute. I prototyped this in Python and did a demo in a blog post here: http://cwl.cx/posts/week-6-ufunc-overrides.html

This is similar to a previously discussed but never implemented change: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html

However, it seems like the ufunc machinery might be ripped out and replaced with a true multi-method implementation soon. See Travis' blog post: http://technicaldiscovery.blogspot.com/2013/07/thoughts-after-scipy-2013-and-specific.html So I'd like to make my changes as forward compatible as possible. However, I'm not sure what I should even consider here, or how forward compatible my current implementation is. Thoughts?

Until then, I'm writing up a NEP. It is still pretty incomplete; it can be found here: https://github.com/cowlicks/numpy/blob/ufunc-override/doc/neps/ufunc-overrides.rst
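A pure-Python sketch of the dispatch idea, in the spirit of the prototype described in the blog post but not a copy of it. The wrapper `overridable`, the class `ToySparse`, and the handler signature `(self, ufunc, args, kwargs)` are all hypothetical; only the attribute name `__ufunc_override__` comes from the proposal.

```python
import numpy as np


def overridable(ufunc):
    """Hypothetical wrapper: if any argument's class defines
    __ufunc_override__, hand the whole call to that handler."""

    def wrapper(*args, **kwargs):
        for arg in args:
            handler = getattr(type(arg), "__ufunc_override__", None)
            if handler is not None:
                # Hypothetical handler signature: (self, ufunc, args, kwargs).
                return handler(arg, ufunc, args, kwargs)
        # No override found: fall back to the ordinary ufunc.
        return ufunc(*args, **kwargs)

    return wrapper


class ToySparse:
    """Hypothetical stand-in for a sparse matrix that stores only nonzeros."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __ufunc_override__(self, ufunc, args, kwargs):
        # Apply the ufunc to the stored values only.
        return ToySparse(ufunc(self.data, **kwargs))


sqrt = overridable(np.sqrt)
print(sqrt(np.array([4.0, 9.0])))          # ndarray path: [2. 3.]
print(sqrt(ToySparse([16.0, 25.0])).data)  # override path: [4. 5.]
```

Because `ToySparse` applies the ufunc only to its stored values, this toy is only correct for ufuncs with ufunc(0) == 0, such as sqrt.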
Re: [Numpy-discussion] GSoC proposal -- Numpy SciPy
Oh wow, I just assumed that `dot` was a ufunc... However, it would still be useful to have ufuncs working well with the sparse package. I don't understand everything that is going on in https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object.c but I assumed that I would be able to add the ability to check for something like _ufunc_override_. I'm not sure where this piece of logic should be inserted, or what the performance implications for NumPy would be... I'm trying to figure this out. But major optimizations to ufuncs are out of the scope of this GSoC. I will look into what can be done about the `dot` function.

On Tue, Apr 30, 2013 at 6:53 PM, Nathaniel Smith n...@pobox.com wrote:

On Tue, Apr 30, 2013 at 4:02 PM, Pauli Virtanen p...@iki.fi wrote:

30.04.2013 22:37, Nathaniel Smith wrote: [clip]

How do you plan to go about this? The obvious option of just calling scipy.sparse.issparse() on ufunc entry raises some problems, since numpy can't depend on or even import scipy, and we might be reluctant to add such a special case for what's a rather more general problem. OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py but I don't know if you want to get into such a debate within the scope of your GSoC. What were you thinking?

To me it seems that the right thing to do here is the general solution. Do you see immediate problems in e.g. just enabling something like your _ufunc_override_?

Just that we might want to think a bit about the design space before implementing something. E.g., apparently doing Python attribute lookup is very expensive -- we recently had a patch to skip __array_interface__ checks whenever possible -- is adding another such per-operation overhead ok?
I guess we could use similar checks (skip checking for known types like int/float/ndarray), or only check for _ufunc_override_ on the class (not the instance) and cache the result per class? The easy thing is that there are no backward compatibility problems here, since if the magic is missing, the old logic is used.

Currently, numpy dot() and ufuncs also, most of the time, do nothing sensible with sparse matrix inputs, even though in some cases they return values. This makes writing generic sparse/dense code more painful than just __mul__ being matrix multiplication.

I agree, but if the main target is 'dot', then the current _ufunc_override_ design alone won't do it, since 'dot' is not a ufunc...

-n
Re: [Numpy-discussion] GSoC proposal -- Numpy SciPy
There are several situations where that comes up (like comparing two sparse matrices, A == B). There is a SparseEfficiencyWarning that can be thrown, but the way this should be implemented still needs to be discussed. I will be writing a specification on how ufuncs and ndarrays are handled by the sparse package; the spec can be found here: https://github.com/cowlicks/scipy-sparse-ndarray-and-ufunc-spec/blob/master/Spec.markdown

In general, a unary ufunc operating on a sparse matrix should return a sparse matrix. If you really want to do cos(sparse), you will be able to. But if you are just interested in the initially nonzero elements, you should probably do something like:

sparse.data = np.cos(sparse.data)

On Wed, May 1, 2013 at 1:32 PM, Daπid davidmen...@gmail.com wrote:

On 1 May 2013 20:12, Blake Griffith blake.a.griff...@gmail.com wrote:

However, it would still be useful to have ufuncs working well with the sparse package.

How are you planning to deal with ufunc(0) != 0? cos(sparse) is actually dense.
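A short illustration of Daπid's point and the `.data` workaround, using a SciPy CSR matrix (this assumes SciPy is installed). Because cos(0) == 1, applying cos to every entry makes the result dense, while applying it to the stored values only keeps the sparsity pattern:

```python
import numpy as np
from scipy import sparse as sp

dense = np.array([[0.0, np.pi],
                  [0.0, 0.0]])
m = sp.csr_matrix(dense)  # stores only the single nonzero entry

# Applying cos to the whole matrix: cos(0) == 1, so every zero becomes 1.0.
full = np.cos(m.toarray())

# Applying cos only to the stored (initially nonzero) entries keeps sparsity.
stored_only = m.copy()
stored_only.data = np.cos(stored_only.data)

print(full)                   # former zeros are now 1.0
print(stored_only.toarray())  # zeros stay zero; pi maps to -1.0
```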
[Numpy-discussion] GSoC proposal -- Numpy SciPy
Hello,

I'm writing a GSoC proposal, mostly concerning SciPy, but it involves a few changes to NumPy. The proposal is titled: Improvements to the sparse package of Scipy: support for bool dtype and better interaction with NumPy, and it can be found on my GitHub: https://github.com/cowlicks/GSoC-proposal/blob/master/proposal.markdown#numpy-interactionsjuly-8th-to-august-26th-7-weeks

Basically, I want to change the ufunc class to be aware of SciPy's sparse matrices, so that when a ufunc is passed a sparse matrix as an argument, it will dispatch to a function in the sparse matrix package, which will then decide what to do. I just wanted to ping NumPy to make sure this is reasonable and that I'm not totally off track. Suggestions, feedback and criticism welcome. Thanks!