[Numpy-discussion] Confused by spec of numpy.linalg.solve
Versions:
  sys.version '3.3.2 (default, Mar 5 2014, 08:21:05) \n[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]'
  numpy.__version__ '1.8.0'

Problem: I'm trying to unpick the shape requirements of numpy.linalg.solve(). The help text says:

solve(a, b)
    a : (..., M, M) array_like
        Coefficient matrix.
    b : {(..., M,), (..., M, K)}, array_like
        Ordinate or dependent variable values.

It's the requirements on b that are giving me grief. My read of the help text is that b must have a shape with either its final axis or its penultimate axis equal to M in size. Which axis the matrix contraction is along depends on the size of the final axis of b.

So, according to my reading, if b has shape (6,3) then the first choice, (..., M,), is invoked but if a has shape (3,3) and b has shape (3,6) then the second choice, (..., M, K), is invoked. I find this weird, but I've dealt with (much) weirder.

However, this is not what I see. When b has shape (3,6) everything goes as expected. When b has shape (6,3) I get an error message that 6 is not equal to 3:

ValueError: solve: Operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m,n)->(m,n) (size 6 is different from 3)

Obviously my reading is incorrect. Can somebody elucidate for me exactly what the requirements are on the shape of b?

Example code:

import numpy
import numpy.linalg

# Works.
M = numpy.array([[1.0,     1.0/2.0, 1.0/3.0],
                 [1.0/2.0, 1.0/3.0, 1.0/4.0],
                 [1.0/3.0, 1.0/4.0, 1.0/5.0]])

yy1 = numpy.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
print(yy1.shape)
xx1 = numpy.linalg.solve(M, yy1)
print(xx1)

# Works too.
yy2 = numpy.array([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]])
print(yy2.shape)
xx2 = numpy.linalg.solve(M, yy2)
print(xx2)

# Fails.
yy3 = numpy.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
print(yy3.shape)
xx3 = numpy.linalg.solve(M, yy3)
print(xx3)

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve
On Di, 2014-04-01 at 15:31 +0100, Bob Dowling wrote:

Problem: I'm trying to unpick the shape requirements of numpy.linalg.solve(). The help text says:

solve(a, b)
    a : (..., M, M) array_like
        Coefficient matrix.
    b : {(..., M,), (..., M, K)}, array_like
        Ordinate or dependent variable values.

It's the requirements on b that are giving me grief. My read of the help text is that b must have a shape with either its final axis or its penultimate axis equal to M in size. Which axis the matrix contraction is along depends on the size of the final axis of b. So, according to my reading, if b has shape (6,3) then the first choice, (..., M,), is invoked but if a has shape (3,3) and b has shape (3,6) then the second choice, (..., M, K), is invoked. I find this weird, but I've dealt with (much) weirder.

I bet the documentation needs some more info there (if you have time, please write a pull request). If you look at the code (that part is just Python code), you will see what really happens. If `a` has exactly one dimension more than `b`, the first case is used. Otherwise (..., M, K) is used instead.

To make sure you always get the expected result, it may be best to make sure that the number of broadcasting (...) dimensions of `a` and `b` are identical (I am not sure if you expect this to be the case or not). The shape itself does not matter; only the (relative) number of dimensions matters for the decision of which of the two signatures is used. In other words, since you do not use `...`, your examples always use the (M, K) logic.

- Sebastian

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
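For illustration, a minimal sketch (not from the original mail) of the dimension-count rule described above, as observed with the numpy of that era; the shapes are arbitrary examples and `a` is just some invertible 3x3 matrix:

import numpy as np

a = np.random.rand(3, 3) + 3 * np.eye(3)   # an arbitrary, well-conditioned (3, 3) matrix

# a.ndim is exactly one more than b.ndim -> the (..., M) "vector" signature is used.
b_vec = np.ones(3)
print(np.linalg.solve(a, b_vec).shape)     # (3,)

# a.ndim equals b.ndim -> the (..., M, K) "matrix" signature is used.
b_mat = np.ones((3, 6))
print(np.linalg.solve(a, b_mat).shape)     # (3, 6)

# Same number of dimensions again, so this is also the (M, K) case and the
# core dimension check fails because 6 != 3.
b_bad = np.ones((6, 3))
try:
    np.linalg.solve(a, b_bad)
except ValueError as err:
    print(err)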
Re: [Numpy-discussion] Resolving the associativity/precedence debate for @
On Mon, Mar 24, 2014 at 6:33 PM, Nathaniel Smith n...@pobox.com wrote: On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote:

After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this.

Response in this thread so far seems (AFAICT) to have pretty much converged on same-left. If you think that this would be terrible and there is some compelling argument against it, then please speak up! Otherwise, if no-one objects, then I'll go ahead in the next few days and put same-left into the PEP.

I think we should take a close look at broadcasting before deciding on the precedence.

Can you elaborate? Like what, concretely, do you think we need to do now?

Mostly I like to think of the '@' operator like commas in a function call, where each argument gets evaluated before the matrix multiplications take place. That would put it at lower precedence than '*', but still higher than '+' and '-'. However, since most matrix expressions seem to be small it may not matter much, and the same result could be gotten with parentheses. But I do think it would make it easier to read and parse matrix expressions, as the '@' would serve as a natural divider. So 'A @ B*v' would be equivalent to 'A @ (B*v)' and not '(A @ B)*v'.

Hmm, now that I stare at it, it may actually be easier to simply read left to right and use parentheses when needed. So put me down as neutral at this point, and maybe trending towards equal precedence.

Chuck

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
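To make the grouping question concrete, a small sketch (values chosen arbitrarily) showing that the two readings of 'A @ B*v' generally give different results; np.dot stands in for the proposed '@', which did not exist in Python at the time:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
v = np.array([[2.0],
              [5.0]])          # a column vector, so the elementwise product scales rows of B

star_first = np.dot(A, B * v)  # the 'A @ (B*v)' reading
at_first = np.dot(A, B) * v    # the '(A @ B)*v' reading

print(star_first)              # [[10.  2.] [20.  6.]]
print(at_first)                # [[ 4.  2.] [20. 15.]]
print(np.allclose(star_first, at_first))   # False: the precedence choice matters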
Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve
On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg sebast...@sipsolutions.net wrote:

If `a` has exactly one dimension more than `b`, the first case is used. Otherwise (..., M, K) is used instead. To make sure you always get the expected result, it may be best to make sure that the number of broadcasting (...) dimensions of `a` and `b` are identical (I am not sure if you expect this to be the case or not). The shape itself does not matter, only the (relative) number of dimensions does for the decision which of the two signatures is used.

Oh, really? This seems really unfortunate -- AFAICT it makes it impossible to write a generic broadcasting matrix-solve or vector-solve :-/ (except by explicitly checking shapes and prepending ones by hand, more or less doing the broadcasting manually).

Surely it would be better to use PEP 465 style broadcasting, where the only special case is if `b` has exactly 1 dimension?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
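A sketch of the manual workaround alluded to above (not from the original mail): under the numpy 1.8-era dispatch rule described in this thread (later numpy versions resolve the ambiguity differently), prepending a length-1 axis to `a` forces the vector signature, so a (6, 3) `b` is treated as a stack of six length-3 vectors:

import numpy as np

a = np.random.rand(3, 3) + 3 * np.eye(3)   # arbitrary invertible (3, 3) matrix
b = np.ones((6, 3))                        # intended as six right-hand-side vectors

# solve(a, b) would pick the (M, K) signature and fail because 6 != 3.
# With a[np.newaxis], a has exactly one dimension more than b, so the
# (..., M) signature is chosen and the leading axes broadcast.
x = np.linalg.solve(a[np.newaxis], b)
print(x.shape)                             # (6, 3): one solution per row of b

# Sanity check against solving one vector at a time.
print(np.allclose(x[0], np.linalg.solve(a, b[0])))   # True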
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
On Mon, Mar 31, 2014 at 3:09 PM, Matthew Brett matthew.br...@gmail.comwrote: I am hopelessly lost here, but it looks as though Python extension modules get loaded via hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); See: http://hg.python.org/cpython/file/3a1db0d2747e/Python/dynload_win.c#l195 I think this means that the first directory on the search path is indeed the path containing the extension module: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx#alternate_search_order_for_desktop_applications yup -- that seems to be what it says... So I'm guessing that it would not work putting DLLs into the 'DLLs' directory - unless the extension modules went in there too. and yet there is a bunch of stuff there, so something is going on...It looks like my Windows box is down at the moment, but I _think_ there are a bunch of dependency dlls in there -- and not the extensions themselves. But I'm way out of my depth, too. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Mon, Mar 31, 2014 at 7:19 PM, Nathaniel Smith n...@pobox.com wrote: The difference is that datetime.datetime doesn't provide any iso string parsing. Sure it does. datetime.strptime, with the %z modifier in particular. that's not ISO parsing, that's parsing according to a user-defined format string, which can be used for ISO parsing, but the user is in control of how that's done. And I see this: For a naive object, the %z and %Z format codes are replaced by empty strings. though I'm not entirely sure what that means -- probably only for writing. The use case I'm imagining is for folks with ISO strings with a Z on the end -- they'll need to deal with pre-parsing the strings to strip off the Z, when it wouldn't change the result. Maybe this is an argument for UTC always rather than naive? Probably it is, but that approach seems a lot harder to extend to proper tz support later, plus being more likely to cause trouble for pandas's proper tz support now. I was originally advocating for naive to begin with ;-) Someone else pushed for UTC -- I thought it was you! (but I guess not) It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier. -CHB -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
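A hypothetical helper (nothing like this exists in numpy or the stdlib) sketching the kind of pre-parsing mentioned above: strip a trailing 'Z' and return a naive datetime, rejecting any other offset. At the time, datetime.strptime's %z accepted numeric offsets such as +0000 but not a literal 'Z':

from datetime import datetime

def parse_iso_naive(s):
    """Parse 'YYYY-MM-DDTHH:MM:SS' with an optional trailing 'Z' as a naive datetime.

    A trailing 'Z' means UTC; since the result is naive either way, it is simply
    stripped.  Any other timezone suffix is rejected rather than silently ignored.
    """
    if s.endswith('Z'):
        s = s[:-1]
    if '+' in s[10:] or '-' in s[10:]:      # only look past the date part
        raise ValueError("timezone offsets not supported for naive datetimes: %r" % s)
    return datetime.strptime(s, '%Y-%m-%dT%H:%M:%S')

print(parse_iso_naive('2014-04-01T12:00:00Z'))
print(parse_iso_naive('2014-04-01T12:00:00'))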
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:

"For a naive object, the %z and %Z format codes are replaced by empty strings." though I'm not entirely sure what that means -- probably only for writing.

That's right:

>>> from datetime import *
>>> datetime.now().strftime('%z')
''
>>> datetime.now(timezone.utc).strftime('%z')
'+0000'

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote: It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier. Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky ndar...@mac.com wrote: On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:

It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier.

Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT.

That's not how we handle other types, e.g.:

In [5]: a = np.zeros(1, dtype=float)
In [6]: a[0] = "garbage"
ValueError: could not convert string to float: garbage

(Cf. "Errors should never pass silently.") Any reason why datetime64 should be different?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
Hi, On Tue, Apr 1, 2014 at 9:04 AM, Chris Barker chris.bar...@noaa.gov wrote: On Mon, Mar 31, 2014 at 3:09 PM, Matthew Brett matthew.br...@gmail.com wrote: I am hopelessly lost here, but it looks as though Python extension modules get loaded via hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); See: http://hg.python.org/cpython/file/3a1db0d2747e/Python/dynload_win.c#l195 I think this means that the first directory on the search path is indeed the path containing the extension module: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx#alternate_search_order_for_desktop_applications yup -- that seems to be what it says... So I'm guessing that it would not work putting DLLs into the 'DLLs' directory - unless the extension modules went in there too. and yet there is a bunch of stuff there, so something is going on...It looks like my Windows box is down at the moment, but I _think_ there are a bunch of dependency dlls in there -- and not the extensions themselves. I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via: hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); will in turn (by default) search for its dependent DLLs in their own directory.Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote: I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via: hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); will in turn (by default) search for its dependent DLLs in their own directory.Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory. So in conclusion, if we just drop our compiled dependencies next to the compiled module files then we're good, even on older Windows versions? That sounds much simpler than previous discussions, but good news if it's true... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
Hi, On Tue, Apr 1, 2014 at 10:43 AM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote: I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via: hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); will in turn (by default) search for its dependent DLLs in their own directory.Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory. So in conclusion, if we just drop our compiled dependencies next to the compiled module files then we're good, even on older Windows versions? That sounds much simpler than previous discussions, but good news if it's true... I think that's right, but as you can see, I am not sure. It might explain why Carl Kleffner found that he could drop libopenblas.dll in numpy/core and it just worked [1]. Well, if all the extensions using blas / lapack are in fact in numpy/core. Christoph - have you tried doing the same with MKL? Cheers, Matthew [1] http://numpy-discussion.10968.n7.nabble.com/Default-builds-of-OpenBLAS-development-branch-are-now-fork-safe-td36523.html ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
I agree with that interpretation of naive as well. I'll change the proposal to reflect that. So any modifier should raise an error then? (At the risk of breaking people's code.) The only question is, should we consider accepting the modifier and disregard it with a warning, letting the user know that this is only for temporary compatibility purposes? As of now, it's not clear to me which of those options is better. Cheers, Sankarshan On Apr 1, 2014, at 1:12 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky ndar...@mac.com wrote: On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote: It seems this committee of two has come to a consensus on naive -- and you're probably right, raise an exception if there is a time zone specifier. Count me as +1 on naive, but consider converting garbage (including strings with trailing Z) to NaT. That's not how we handle other types, e.g.: In [5]: a = np.zeros(1, dtype=float) In [6]: a[0] = garbage ValueError: could not convert string to float: garbage (Cf, Errors should never pass silently.) Any reason why datetime64 should be different? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sankarshan Mudkavi Undergraduate in Physics, University of Waterloo www.smudkavi.com signature.asc Description: Message signed with OpenPGP using GPGMail ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
Hi, I just noticed this C reference implementation of blas: https://github.com/rljames/coblas No lapack, no benchmarks, but tests, and BSD. I wonder if it is possible to craft a Frankenlibrary from OpenBLAS and reference implementations to avoid broken parts of OpenBLAS? Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
While most other Python packages (scipy, pandas) use the default ddof=1 for the calculation of the standard deviation (i.e. they calculate the sample standard deviation), the Numpy implementation uses the default ddof=0.

Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas.

I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1?

Thomas

---
Prof. (FH) PD Dr. Thomas Haslwanter
School of Applied Health and Social Sciences
University of Applied Sciences Upper Austria
FH OÖ Studienbetriebs GmbH
Garnisonstraße 21
4020 Linz/Austria
Tel.: +43 (0)5 0804 -52170
Fax: +43 (0)5 0804 -52171
E-Mail: thomas.haslwan...@fh-linz.at
Web: me-research.fh-linz.at or work.thaslwanter.at

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
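For concreteness, a small example (data values arbitrary) of what the ddof argument changes; both variants compute sqrt(sum((x - mean)**2) / (n - ddof)), they just disagree on ddof:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size

print(np.std(x))            # 1.118..., numpy's default: divide by n (ddof=0)
print(np.std(x, ddof=1))    # 1.290..., sample standard deviation: divide by n - 1
                            # (this is what pandas' Series.std() and R's sd() use by default)

# Both are the same formula with a different divisor:
print(np.sqrt(((x - x.mean()) ** 2).sum() / (n - 1)))   # matches np.std(x, ddof=1)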
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith n...@pobox.com wrote:

In [6]: a[0] = "garbage"
ValueError: could not convert string to float: garbage

(Cf. "Errors should never pass silently.") Any reason why datetime64 should be different?

datetime64 is different because it has NaT support from the start. NaN support for floats seems to be an afterthought, if not an accident of implementation. And it looks like some errors do pass silently:

a[0] = "1"  # not a TypeError

But I withdraw my suggestion. The closer datetime64 behavior is to numeric types the better.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
Because np.mean() is ddof=0? (I mean effectively, not that it actually has a parameter for that) There is consistency within the library, and I certainly wouldn't want to have NaN all of the sudden coming from my calls to mean() that I apply to an arbitrary non-empty array of values that happened to have only one value. So, if we can't change the default for mean, then it only makes sense to keep np.std() consistent with np.mean(). My 2 cents... Ben Root On Tue, Apr 1, 2014 at 2:27 PM, Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: While most other Python applications (scipy, pandas) use for the calculation of the standard deviation the default ddof=1 (i.e. they calculate the sample standard deviation), the Numpy implementation uses the default ddof=0. Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1? Thomas --- Prof. (FH) PD Dr. Thomas Haslwanter School of Applied Health and Social Sciences *University of Applied Sciences* *Upper Austria* *FH OÖ Studienbetriebs GmbH* Garnisonstraße 21 4020 Linz/Austria Tel.: +43 (0)5 0804 -52170 Fax: +43 (0)5 0804 -52171 E-Mail: thomas.haslwan...@fh-linz.at Web: me-research.fh-linz.at http://work.thaslwanter.at or work.thaslwanter.at ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
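The single-value case mentioned above, as a quick sketch:

import numpy as np

single = np.array([3.14])            # arbitrary one-element array

print(np.std(single))                # 0.0 with the ddof=0 default
print(np.std(single, ddof=1))        # nan (with a RuntimeWarning), since n - ddof == 0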
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas.

ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian estimation. If you are not estimating from a sample, but rather calculating for the whole population, you always want ddof=0.

What does Matlab do by default? (Yes, it is a rhetorical question.)

I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1?

It is a bad idea to suddenly break everyone's code.

Sturla

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
I agree; breaking code over this would be ridiculous. Also, I prefer the zero default, despite the mean/std combo probably being more common. On Tue, Apr 1, 2014 at 10:02 PM, Sturla Molden sturla.mol...@gmail.comwrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. If you are not eatimating from a sample, but rather calculating for the whole population, you always want ddof=0. What does Matlab do by default? (Yes, it is a retorical question.) I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1? It is a bad idea to suddenly break everyone's code. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If you are not eatimating from a sample, but rather calculating for the whole population, you always want ddof=0. What does Matlab do by default? (Yes, it is a retorical question.) R (which is probably a more relevant comparison) does do ddof=1 by default. I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1? It is a bad idea to suddenly break everyone's code. It would be a disruptive transition, but OTOH having inconsistencies like this guarantees the ongoing creation of new broken code. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve
On 04/01/2014 04:25 PM, Nathaniel Smith wrote: On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg sebast...@sipsolutions.net wrote: If `a` has exactly one dimension more then `b`, the first case is used. Otherwise (..., M, K) is used instead. To make sure you always get the expected result, it may be best to make sure that the number of broadcasting (...) dimensions of `a` and `b` are identical (I am not sure if you expect this to be the case or not). The shape itself does not matter, only the (relative) number of dimensions does for the decision which of the two signatures is used. Oh, really? This seems really unfortunate It also seems quite counter-intuitive. It means that an array a of shape (3,3) will behave radically differently to one of shape (1,3,3). But thank you for the explanation. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve
On Di, 2014-04-01 at 16:25 +0100, Nathaniel Smith wrote: On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg sebast...@sipsolutions.net wrote: If `a` has exactly one dimension more then `b`, the first case is used. Otherwise (..., M, K) is used instead. To make sure you always get the expected result, it may be best to make sure that the number of broadcasting (...) dimensions of `a` and `b` are identical (I am not sure if you expect this to be the case or not). The shape itself does not matter, only the (relative) number of dimensions does for the decision which of the two signatures is used. Since b is a system of equations if it is 2-dim, I think it basically doesn't make sense to have a (M, K) shaped b anyway, since you could use a (K, M) shaped b with broadcasting logic (though I guess that is slower unless you add extra logic). - Sebastian Oh, really? This seems really unfortunate -- AFAICT it makes it impossible to write a generic broadcasting matrix-solve or vector-solve :-/ (except by explicitly checking shapes and prepending ones by hand, more or less doing the broadcasting manually). Surely it would be better to use PEP 467 style broadcasting, where the only special case is if `b` has exactly 1 dimension? -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If we could still choose here without any costs, obviously that's true. This particular ship sailed a long time ago though. By the way, there isn't even a `scipy.stats.std`, so we're comparing with differently named functions (nanstd for example). If you are not eatimating from a sample, but rather calculating for the whole population, you always want ddof=0. What does Matlab do by default? (Yes, it is a retorical question.) R (which is probably a more relevant comparison) does do ddof=1 by default. I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1? It is a bad idea to suddenly break everyone's code. It would be a disruptive transition, but OTOH having inconsistencies like this guarantees the ongoing creation of new broken code. Not much of an argument to change return values for a so heavily used function. Ralf -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas.

ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian estimation.

It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy.

If you are not estimating from a sample, but rather calculating for the whole population, you always want ddof=0. What does Matlab do by default? (Yes, it is a rhetorical question.)

R (which is probably a more relevant comparison) does do ddof=1 by default.

I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1?

It is a bad idea to suddenly break everyone's code.

It would be a disruptive transition, but OTOH having inconsistencies like this guarantees the ongoing creation of new broken code.

This topic comes up regularly. The original choice was made for numpy 1.0b1 by Travis; see this later thread: http://thread.gmane.org/gmane.comp.python.numeric.general/25720/focus=25721 At this point it is probably best to leave it alone.

Chuck

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If we could still choose here without any costs, obviously that's true. This particular ship sailed a long time ago though. By the way, there isn't even a `scipy.stats.std`, so we're comparing with differently named functions (nanstd for example). Presumably nanstd is a lot less heavily used than std, and presumably people expect 'nanstd' to be a 'nan' version of 'std' -- what do you think of changing nanstd to ddof=0 to match numpy? (With appropriate FutureWarning transition, etc.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 4:54 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If you are not eatimating from a sample, but rather calculating for the whole population, you always want ddof=0. What does Matlab do by default? (Yes, it is a retorical question.) R (which is probably a more relevant comparison) does do ddof=1 by default. I am wondering if there is a good reason to stick to ddof=0 as the default for std, or if others would agree with my suggestion to change the default to ddof=1? It is a bad idea to suddenly break everyone's code. It would be a disruptive transition, but OTOH having inconsistencies like this guarantees the ongoing creation of new broken code. This topic comes up regularly. The original choice was made for numpy 1.0b1 by Travis, see this later thread. At this point it is probably best to leave it alone. I don't have any opinion about this debate, but I love the justification in that thread Any surprise that is created by the different default should be mitigated by the fact that it's an opportunity to learn something about what you are doing. This masterpiece of rhetoric will surely help me win many internet arguments in the future! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve
On Tue, Apr 1, 2014 at 9:50 PM, Sebastian Berg sebast...@sipsolutions.net wrote: On Di, 2014-04-01 at 16:25 +0100, Nathaniel Smith wrote: On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg sebast...@sipsolutions.net wrote: If `a` has exactly one dimension more then `b`, the first case is used. Otherwise (..., M, K) is used instead. To make sure you always get the expected result, it may be best to make sure that the number of broadcasting (...) dimensions of `a` and `b` are identical (I am not sure if you expect this to be the case or not). The shape itself does not matter, only the (relative) number of dimensions does for the decision which of the two signatures is used. Since b is a system of equations if it is 2-dim, I think it basically doesn't make sense to have a (M, K) shaped b anyway, since you could use a (K, M) shaped b with broadcasting logic (though I guess that is slower unless you add extra logic). Not sure I'm following your point exactly, but the argument for having (M, M) `a` and (M, K) `b` is that solve(a, b) is the same as dot(inv(a), b), which obviously accepts 2d `a` and `b`... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
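A quick numerical check of the identity mentioned above (matrix values arbitrary): solve(a, b) matches dot(inv(a), b), it just computes the result more stably without forming the inverse:

import numpy as np

a = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # an arbitrary invertible (M, M) matrix
b = np.arange(6.0).reshape(3, 2)     # an (M, K) right-hand side

x = np.linalg.solve(a, b)

print(np.allclose(x, np.dot(np.linalg.inv(a), b)))   # True
print(np.allclose(np.dot(a, x), b))                  # True: x really satisfies dot(a, x) == b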
Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote: On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote: Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote: Personally I cannot think of many applications where it would be desired to calculate the standard deviation with ddof=0. In addition, I feel that there should be consistency between standard modules such as numpy, scipy, and pandas. ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian estimation. It's true, but the counter-arguments are also strong. And regardless of whether ddof=1 or ddof=0 is better, surely the same one is better for both numpy and scipy. If we could still choose here without any costs, obviously that's true. This particular ship sailed a long time ago though. By the way, there isn't even a `scipy.stats.std`, so we're comparing with differently named functions (nanstd for example). Presumably nanstd is a lot less heavily used than std, and presumably people expect 'nanstd' to be a 'nan' version of 'std' -- what do you think of changing nanstd to ddof=0 to match numpy? (With appropriate FutureWarning transition, etc.) numpy is numpy, a numerical library scipy.stats is stats and behaves differently. (axis=0) nanstd in scipy.stats will hopefully also go away soon, so I don't think it's worth changing there either. pandas came later and thought ddof=1 is worth more than consistency. I don't think ddof defaults's are worth jumping through deprecation hoops. (bias in cov, corrcoef is non-standard ddof) Josef -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote: I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via: hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); will in turn (by default) search for its dependent DLLs in their own directory.Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory. So in conclusion, if we just drop our compiled dependencies next to the compiled module files then we're good, even on older Windows versions? That sounds much simpler than previous discussions, but good news if it's true... That does not work very well in my experience: - numpy has extension modules in multiple directories, so we would need to copy the dlls in multiple subdirectories - copying dlls means that windows will load that dll multiple times, with all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen serious issues when doing something similar for hdf5 dll and pytables/h5py). David -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
On Tue, Apr 1, 2014 at 11:58 PM, David Cournapeau courn...@gmail.com wrote: On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote: I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via: hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH); will in turn (by default) search for its dependent DLLs in their own directory.Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory. So in conclusion, if we just drop our compiled dependencies next to the compiled module files then we're good, even on older Windows versions? That sounds much simpler than previous discussions, but good news if it's true... That does not work very well in my experience: - numpy has extension modules in multiple directories, so we would need to copy the dlls in multiple subdirectories - copying dlls means that windows will load that dll multiple times, with all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen serious issues when doing something similar for hdf5 dll and pytables/h5py). We could just ship all numpy's extension modules in the same directory if we wanted. It would be pretty easy to stick some code at the top of numpy/__init__.py to load them from numpy/all_dlls/ and then slot them into the appropriate places in the package namespace. Of course scipy and numpy will still both have to ship BLAS etc., and so I guess it will get loaded at least twice in *any* binary install system. I'm not sure why this would be a problem (Windows, unlike Unix, carefully separates DLL namespaces, right?), but if it is a problem then it's a very fundamental one for any binaries we ship. Do the binaries we ship now have this problem? Or are we currently managing to statically link everything? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
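A purely hypothetical sketch, not actual numpy code, of what the suggestion above might look like; the numpy/all_dlls/ directory name comes from the message, and ctypes pre-loads each bundled DLL so that extension modules imported afterwards resolve those dependencies against the copies already in the process:

# Hypothetical lines for the top of numpy/__init__.py (sketch only).
import os
import sys
import ctypes

if sys.platform == 'win32':
    _dll_dir = os.path.join(os.path.dirname(__file__), 'all_dlls')
    if os.path.isdir(_dll_dir):
        for _name in sorted(os.listdir(_dll_dir)):
            if _name.lower().endswith('.dll'):
                # Load by absolute path; Windows normally reuses the already-loaded
                # module when a dependent extension later asks for the same DLL name.
                ctypes.WinDLL(os.path.join(_dll_dir, _name))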
Re: [Numpy-discussion] ANN: NumPy 1.8.1 release
On Wed, Apr 2, 2014 at 12:36 AM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 11:58 PM, David Cournapeau courn...@gmail.com wrote: On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote:

I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via:

hDLL = LoadLibraryEx(pathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);

will in turn (by default) search for its dependent DLLs in their own directory. Or maybe in the directory of the first DLL to be loaded with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the documentation. Looking forward to doing my tax return after this. But - anyway - that means that any extensions in the DLLs directory will get their dependencies from the DLLs directory, but that is only true for extensions in that directory.

So in conclusion, if we just drop our compiled dependencies next to the compiled module files then we're good, even on older Windows versions? That sounds much simpler than previous discussions, but good news if it's true...

That does not work very well in my experience:
- numpy has extension modules in multiple directories, so we would need to copy the dlls in multiple subdirectories
- copying dlls means that windows will load that dll multiple times, with all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen serious issues when doing something similar for hdf5 dll and pytables/h5py).

We could just ship all numpy's extension modules in the same directory if we wanted. It would be pretty easy to stick some code at the top of numpy/__init__.py to load them from numpy/all_dlls/ and then slot them into the appropriate places in the package namespace. Of course scipy and numpy will still both have to ship BLAS etc., and so I guess it will get loaded at least twice in *any* binary install system. I'm not sure why this would be a problem (Windows, unlike Unix, carefully separates DLL namespaces, right?)

It does not really matter here. For pure blas/lapack, that may be ok because the functions are stateless, but I would not count on it either. The cleanest solution I can think of is to have 'privately shared DLL', but that would AFAIK require patching python, so not really an option.

, but if it is a problem then it's a very fundamental one for any binaries we ship. Do the binaries we ship now have this problem? Or are we currently managing to statically link everything?

We currently statically link everything. The main challenge is that 'new' (>= 4) versions of mingw don't easily allow statically linking all the mingw-related dependencies. While the options are there, every time I tried to do it with an official build of mingw, I had some weird, very hard to track crashes. The other alternative that has been suggested is to build one's own toolchain where everything is static by default. I am not sure why that works, and that brings the risk of depending on a toolchain that we can't really maintain.

David

-n
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion