[Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Bob Dowling
Versions:

>>> sys.version
'3.3.2 (default, Mar  5 2014, 08:21:05) \n[GCC 4.8.2 20131212 (Red Hat
4.8.2-7)]'

>>> numpy.__version__
'1.8.0'



Problem:

I'm trying to unpick the shape requirements of numpy.linalg.solve().
The help text says:

solve(a, b) -
 a : (..., M, M) array_like
 Coefficient matrix.
 b : {(..., M,), (..., M, K)}, array_like
 Ordinate or dependent variable values.

It's the requirements on b that are giving me grief.  My read of the
help text is that b must have a shape with either its final axis or
its penultimate axis equal to M in size.  Which axis the matrix
contraction is along depends on the size of the final axis of b.


So, according to my reading, if b has shape (6,3) then the first
choice, (..., M,), is invoked but if a has shape (3,3) and b has
shape (3,6) then the second choice, (..., M, K), is invoked.  I find
this weird, but I've dealt with (much) weirder.


However, this is not what I see.  When b has shape (3,6) everything
goes as expected.  When b has shape (6,3) I get an error message that
6 is not equal to 3:

 ValueError: solve: Operand 1 has a mismatch in its core dimension 0,
 with gufunc signature (m,m),(m,n)->(m,n) (size 6 is different from 3)



Obviously my reading is incorrect.  Can somebody elucidate for me
exactly what the requirements are on the shape of b?



Example code:

import numpy
import numpy.linalg

# Works.
M = numpy.array([
 [1.0, 1.0/2.0, 1.0/3.0],
 [1.0/2.0, 1.0/3.0, 1.0/4.0],
 [1.0/3.0, 1.0/4.0, 1.0/5.0]
 ] )

yy1 = numpy.array([
 [1.0, 0.0, 0.0],
 [0.0, 1.0, 0.0],
 [0.0, 0.0, 1.0]
 ])
print(yy1.shape)
xx1 = numpy.linalg.solve(M, yy1)
print(xx1)

# Works too.
yy2 = numpy.array([
 [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
 [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
 [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
 ])
print(yy2.shape)
xx2 = numpy.linalg.solve(M, yy2)
print(xx2)

# Fails.
yy3 = numpy.array([
 [1.0, 0.0, 0.0],
 [0.0, 1.0, 0.0],
 [0.0, 0.0, 1.0],
 [1.0, 0.0, 0.0],
 [0.0, 1.0, 0.0],
 [0.0, 0.0, 1.0]
 ])
print(yy3.shape)
xx3 = numpy.linalg.solve(M, yy3)
print(xx3)




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Sebastian Berg
On Di, 2014-04-01 at 15:31 +0100, Bob Dowling wrote:
 Versions:
 
  sys.version
 '3.3.2 (default, Mar  5 2014, 08:21:05) \n[GCC 4.8.2 20131212 (Red Hat
 4.8.2-7)]'
 
  numpy.__version__
 '1.8.0'
 
 
 
 Problem:
 
 I'm trying to unpick the shape requirements of numpy.linalg.solve().
 The help text says:
 
 solve(a, b) -
  a : (..., M, M) array_like
  Coefficient matrix.
  b : {(..., M,), (..., M, K)}, array_like
  Ordinate or dependent variable values.
 
 It's the requirements on b that are giving me grief.  My read of the
 help text is that b must have a shape with either its final axis or
 its penultimate axis equal to M in size.  Which axis the matrix
 contraction is along depends on the size of the final axis of b.
 
 
 So, according to my reading, if b has shape (6,3) then the first
 choice, (..., M,), is invoked but if a has shape (3,3) and b has
 shape (3,6) then the second choice, (..., M, K), is invoked.  I find
 this weird, but I've dealt with (much) weirder.
 

I bet the documentation needs some more info there (if you have time,
please write a pull request). If you look at the code (that part is just
python code), you will see what really happens.

If `a` has exactly one dimension more than `b`, the first case is used.
Otherwise (..., M, K) is used instead. To make sure you always get the
expected result, it may be best to make sure that the number of
broadcasting (...) dimensions of `a` and `b` is identical (I am not
sure if you expect this to be the case or not). The shape itself does
not matter; only the (relative) number of dimensions matters for the
decision of which of the two signatures is used.

In other words, since you do not use `...`, your examples always use the
(M, K) logic.
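
For example, a minimal sketch of that rule using the shapes from the
original post (the actual values are irrelevant here):

import numpy

a = numpy.eye(3)                 # shape (3, 3)
b = numpy.ones((6, 3))           # shape (6, 3)
# a.ndim == b.ndim, so the (M, K) signature is tried and fails (6 != 3):
# numpy.linalg.solve(a, b)       # ValueError

a3 = a[numpy.newaxis, ...]       # shape (1, 3, 3): a3.ndim == b.ndim + 1
x = numpy.linalg.solve(a3, b)    # (M,) signature; b is treated as 6 stacked vectors
print(x.shape)                   # (6, 3)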

 - Sebastian

 
 However, this is not what I see.  When b has shape (3,6) everything
 goes as expected.  When b has shape (6,3) I get an error message that
 6 is not equal to 3:
 
  ValueError: solve: Operand 1 has a mismatch in its core dimension 0,
  with gufunc signature (m,m),(m,n)->(m,n) (size 6 is different from 3)
 
 
 
 Obviously my reading is incorrect.  Can somebody elucidate for me
 exactly what the requirements are on the shape of b?
 
 
 
 Example code:
 
 import numpy
 import numpy.linalg
 
 # Works.
 M = numpy.array([
  [1.0, 1.0/2.0, 1.0/3.0],
  [1.0/2.0, 1.0/3.0, 1.0/4.0],
  [1.0/3.0, 1.0/4.0, 1.0/5.0]
  ] )
 
 yy1 = numpy.array([
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0]
  ])
 print(yy1.shape)
 xx1 = numpy.linalg.solve(M, yy1)
 print(xx1)
 
 # Works too.
 yy2 = numpy.array([
  [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
  ])
 print(yy2.shape)
 xx2 = numpy.linalg.solve(M, yy2)
 print(xx2)
 
 # Fails.
 yy3 = numpy.array([
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0],
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0],
  [0.0, 0.0, 1.0]
  ])
 print(yy3.shape)
 xx3 = numpy.linalg.solve(M, yy3)
 print(xx3)
 
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Resolving the associativity/precedence debate for @

2014-04-01 Thread Charles R Harris
On Mon, Mar 24, 2014 at 6:33 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith n...@pobox.com wrote:
 
  On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote:
   After 88 emails we don't have a conclusion in the other thread (see
   [1] for background). But we have to come to some conclusion or another
   if we want @ to exist :-). So I'll summarize where the discussion
   stands and let's see if we can find some way to resolve this.
 
  Response in this thread so far seems (AFAICT) to have pretty much
  converged on same-left.
 
  If you think that this would be terrible and there is some compelling
  argument against it, then please speak up! Otherwise, if no-one
  objects, then I'll go ahead in the next few days and put same-left
  into the PEP.
 
 
  I think we should take a close look at broadcasting before deciding on
 the
  precedence.

 Can you elaborate? Like what, concretely, do you think we need to do now?


Mostly I like to think of the '@' operator like commas in a function call
where each argument gets evaluated before the matrix multiplications take
place, so that would put it at lower precedence than '*', but still higher
than '+, -'. However, since most matrix expressions seem to be small it
may not matter much and the same result could be gotten with parentheses.
But I do think it would make it easier to read and parse matrix expressions,
as the '@' would serve as a natural divider. So 'A @ B*v' would be
equivalent to 'A @ (B*v)' and not '(A @ B)*v'.
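
To make the grouping difference concrete with today's functions (numpy.dot
standing in for the proposed '@'; v is deliberately a column vector here,
since for a row-broadcast vector the two groupings happen to coincide):

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
v = np.array([[10.], [20.]])            # column vector, shape (2, 1)

print(np.dot(A, B * v))    # 'A @ (B*v)'  -> [[40., 10.], [80., 30.]]
print(np.dot(A, B) * v)    # '(A @ B)*v'  -> [[20., 10.], [80., 60.]]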

Hmm, now that I stare at it, it may actually be easier to simply read left
to right and use parentheses when needed. So put me down as neutral at this
point and maybe trending towards equal precedence.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 If `a` has exactly one dimension more then `b`, the first case is used.
 Otherwise (..., M, K) is used instead. To make sure you always get the
 expected result, it may be best to make sure that the number of
 broadcasting (...) dimensions of `a` and `b` are identical (I am not
 sure if you expect this to be the case or not). The shape itself does
 not matter, only the (relative) number of dimensions does for the
 decision which of the two signatures is used.

Oh, really? This seems really unfortunate -- AFAICT it makes it
impossible to write a generic broadcasting matrix-solve or
vector-solve :-/ (except by explicitly checking shapes and prepending
ones by hand, more or less doing the broadcasting manually). Surely it
would be better to use PEP 465 style broadcasting, where the only
special case is if `b` has exactly 1 dimension?
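
For reference, doing the broadcasting manually for a generic vector-solve
looks roughly like this (just a sketch; `solve_vec` is a made-up name):

import numpy as np

def solve_vec(a, b):
    # a is (..., M, M), b is (..., M).  Force the (M,) signature by making
    # a exactly one dimension longer than b, then let broadcasting do the rest.
    a = np.asarray(a)
    b = np.asarray(b)
    while a.ndim < b.ndim + 1:
        a = a[np.newaxis, ...]
    while b.ndim + 1 < a.ndim:
        b = b[np.newaxis, ...]
    return np.linalg.solve(a, b)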

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Chris Barker
On Mon, Mar 31, 2014 at 3:09 PM, Matthew Brett matthew.br...@gmail.comwrote:

 I am hopelessly lost here, but it looks as though Python extension
 modules get loaded via

 hDLL = LoadLibraryEx(pathname, NULL,
  LOAD_WITH_ALTERED_SEARCH_PATH);

 See:
 http://hg.python.org/cpython/file/3a1db0d2747e/Python/dynload_win.c#l195

 I think this means that the first directory on the search path is
 indeed the path containing the extension module:


 http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx#alternate_search_order_for_desktop_applications


yup -- that seems to be what it says...

So I'm guessing that it would not work putting DLLs into the 'DLLs'
 directory - unless the extension modules went in there too.


and yet there is a bunch of stuff there, so something is going on... It
looks like my Windows box is down at the moment, but I _think_ there are a
bunch of dependency dlls in there -- and not the extensions themselves.

But I'm way out of my depth, too.

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Chris Barker
On Mon, Mar 31, 2014 at 7:19 PM, Nathaniel Smith n...@pobox.com wrote:

  The difference is that datetime.datetime doesn't provide any iso string
 parsing.

 Sure it does. datetime.strptime, with the %z modifier in particular.


that's not ISO parsing, that's parsing according to a user-defined format
string, which can be used for ISO parsing, but the user is in control of
how that's done. And I see this:

For a naive object, the %z and %Z format codes are replaced by empty
strings.

 though I'm not entirely sure what that means -- probably only for writing.

 The use case I'm imagining is for folks with ISO strings with a Z on the
 end -- they'll need to deal with pre-parsing the strings to strip off the
 Z, when it wouldn't change the result.
 
  Maybe this is an argument for UTC always rather than naive?

 Probably it is, but that approach seems a lot harder to extend to proper
 tz support later, plus being more likely to cause trouble for pandas's
 proper tz support now.

I was originally advocating for naive to begin with ;-) Someone else pushed
for UTC -- I thought it was you! (but I guess not)

It seems this committee of two has come to a consensus on naive -- and
you're probably right, raise an exception if there is a time zone specifier.
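
For the trailing-'Z' case, the pre-parsing mentioned above is at least
short (a sketch; the sample string is made up):

from datetime import datetime

s = '2014-04-01T12:30:00Z'
naive = datetime.strptime(s.rstrip('Z'), '%Y-%m-%dT%H:%M:%S')   # strip 'Z', parse as naive
print(naive)   # 2014-04-01 12:30:00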

-CHB







  -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Alexander Belopolsky
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:

 For a naive object, the %z and %Z format codes are replaced by empty
 strings.

  though I'm not entirely sure what that means -- probably only for writing.


That's right:

>>> from datetime import *
>>> datetime.now().strftime('%z')
''
>>> datetime.now(timezone.utc).strftime('%z')
'+0000'
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Alexander Belopolsky
On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:

 It seems this committee of two has come to a consensus on naive -- and
 you're probably right, raise an exception if there is a time zone specifier.


Count me as +1 on naive, but consider converting garbage (including strings
with trailing Z) to NaT.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky ndar...@mac.com wrote:

 On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:

 It seems this committee of two has come to a consensus on naive -- and
 you're probably right, raise an exception if there is a time zone specifier.


 Count me as +1 on naive, but consider converting garbage (including strings
 with trailing Z) to NaT.

That's not how we handle other types, e.g.:

In [5]: a = np.zeros(1, dtype=float)

In [6]: a[0] = "garbage"
ValueError: could not convert string to float: garbage

(Cf. "Errors should never pass silently.") Any reason why datetime64
should be different?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Matthew Brett
Hi,

On Tue, Apr 1, 2014 at 9:04 AM, Chris Barker chris.bar...@noaa.gov wrote:
 On Mon, Mar 31, 2014 at 3:09 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 I am hopelessly lost here, but it looks as though Python extension
 modules get loaded via

 hDLL = LoadLibraryEx(pathname, NULL,
  LOAD_WITH_ALTERED_SEARCH_PATH);

 See:
 http://hg.python.org/cpython/file/3a1db0d2747e/Python/dynload_win.c#l195

 I think this means that the first directory on the search path is
 indeed the path containing the extension module:


 http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx#alternate_search_order_for_desktop_applications


 yup -- that seems to be what it says...

 So I'm guessing that it would not work putting DLLs into the 'DLLs'
 directory - unless the extension modules went in there too.


 and yet there is a bunch of stuff there, so something is going on...It looks
 like my Windows box is down at the moment, but I _think_ there are a bunch
 of dependency dlls in there -- and not the extensions themselves.

I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded via:

hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);

will in turn (by default) search for its dependent DLLs in their own
directory. Or maybe in the directory of the first DLL to be loaded
with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
documentation.  Looking forward to doing my tax return after this.

But - anyway - that means that any extensions in the DLLs directory
will get their dependencies from the DLLs directory, but that is only
true for extensions in that directory.

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote:
 I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded 
 via:

 hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);

 will in turn (by default) search for its dependent DLLs in their own
 directory.Or maybe in the directory of the first DLL to be loaded
 with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
 documentation.  Looking forward to doing my tax return after this.

 But - anyway - that means that any extensions in the DLLs directory
 will get their dependencies from the DLLs directory, but that is only
 true for extensions in that directory.

So in conclusion, if we just drop our compiled dependencies next to
the compiled module files then we're good, even on older Windows
versions? That sounds much simpler than previous discussions, but good
news if it's true...

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Matthew Brett
Hi,

On Tue, Apr 1, 2014 at 10:43 AM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com wrote:
 I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL loaded 
 via:

 hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);

 will in turn (by default) search for its dependent DLLs in their own
 directory.Or maybe in the directory of the first DLL to be loaded
 with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
 documentation.  Looking forward to doing my tax return after this.

 But - anyway - that means that any extensions in the DLLs directory
 will get their dependencies from the DLLs directory, but that is only
 true for extensions in that directory.

 So in conclusion, if we just drop our compiled dependencies next to
 the compiled module files then we're good, even on older Windows
 versions? That sounds much simpler than previous discussions, but good
 news if it's true...

I think that's right, but as you can see, I am not sure.

It might explain why Carl Kleffner found that he could drop
libopenblas.dll in numpy/core and it just worked [1].  Well, if all
the extensions using blas / lapack are in fact in numpy/core.

Christoph - have you tried doing the same with MKL?

Cheers,

Matthew

[1] 
http://numpy-discussion.10968.n7.nabble.com/Default-builds-of-OpenBLAS-development-branch-are-now-fork-safe-td36523.html
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Sankarshan Mudkavi
I agree with that interpretation of naive as well. I'll change the proposal to 
reflect that. So any modifier should raise an error then? (At the risk of 
breaking people's code.)

The only question is, should we consider accepting the modifier and disregarding
it with a warning, letting the user know that this is only for temporary
compatibility purposes?


As of now, it's not clear to me which of those options is better.

Cheers,
Sankarshan

On Apr 1, 2014, at 1:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 5:22 PM, Alexander Belopolsky ndar...@mac.com wrote:
 
 On Tue, Apr 1, 2014 at 12:10 PM, Chris Barker chris.bar...@noaa.gov wrote:
 
 It seems this committee of two has come to a consensus on naive -- and
 you're probably right, raise an exception if there is a time zone specifier.
 
 
 Count me as +1 on naive, but consider converting garbage (including strings
 with trailing Z) to NaT.
 
 That's not how we handle other types, e.g.:
 
 In [5]: a = np.zeros(1, dtype=float)
 
 In [6]: a[0] = garbage
 ValueError: could not convert string to float: garbage
 
 (Cf, Errors should never pass silently.) Any reason why datetime64
 should be different?
 
 -n
 
 -- 
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Sankarshan Mudkavi
Undergraduate in Physics, University of Waterloo
www.smudkavi.com








___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Matthew Brett
Hi,

I just noticed this C reference implementation of blas:

https://github.com/rljames/coblas

No lapack, no benchmarks, but tests, and BSD.  I wonder if it is
possible to craft a Frankenlibrary from OpenBLAS and reference
implementations to avoid broken parts of OpenBLAS?

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Haslwanter Thomas
While most other Python applications (scipy, pandas) use ddof=1 by default for
the calculation of the standard deviation (i.e. they calculate the sample
standard deviation), the Numpy implementation uses the default ddof=0.
Personally I cannot think of many applications where it would be desired to
calculate the standard deviation with ddof=0. In addition, I feel that there
should be consistency between standard modules such as numpy, scipy, and pandas.

I am wondering if there is a good reason to stick to ddof=0 as the default 
for std, or if others would agree with my suggestion to change the default to 
ddof=1?
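
For concreteness, the two defaults differ like this (a quick sketch):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.std(x))             # 1.118...  (ddof=0: divides by N)
print(np.std(x, ddof=1))     # 1.290...  (ddof=1: divides by N-1, the sample estimate)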

Thomas

---
Prof. (FH) PD Dr. Thomas Haslwanter
School of Applied Health and Social Sciences
University of Applied Sciences Upper Austria
FH OÖ Studienbetriebs GmbH
Garnisonstraße 21
4020 Linz/Austria
Tel.: +43 (0)5 0804 -52170
Fax: +43 (0)5 0804 -52171
E-Mail: thomas.haslwan...@fh-linz.at
Web: me-research.fh-linz.at or work.thaslwanter.at

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dates and times and Datetime64 (again)

2014-04-01 Thread Alexander Belopolsky
On Tue, Apr 1, 2014 at 1:12 PM, Nathaniel Smith n...@pobox.com wrote:

 In [6]: a[0] = garbage
 ValueError: could not convert string to float: garbage

 (Cf, Errors should never pass silently.) Any reason why datetime64
 should be different?


datetime64 is different because it has NaT support from the start.  NaN
support for floats seems to be an afterthought if not an accident of
implementation.

And it looks like some errors do pass silently:

>>> a[0] = '1'
# not a TypeError

But I withdraw my suggestion.  The closer datetime64 behavior is to numeric
types the better.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Benjamin Root
Because np.mean() is ddof=0? (I mean effectively, not that it actually has
a parameter for that.) There is consistency within the library, and I
certainly wouldn't want to have NaN all of a sudden coming from my calls
to np.std() applied to an arbitrary non-empty array of values that
happened to have only one value. So, if we can't change the default for
mean, then it only makes sense to keep np.std() consistent with np.mean().
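
The single-value case mentioned above looks like this (sketch):

import numpy as np

a = np.array([5.0])          # perfectly valid one-element array
print(np.std(a))             # 0.0 with the current default ddof=0
print(np.std(a, ddof=1))     # nan (division by N - 1 == 0); numpy may also warn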

My 2 cents...
Ben Root



On Tue, Apr 1, 2014 at 2:27 PM, Haslwanter Thomas 
thomas.haslwan...@fh-linz.at wrote:

 While most other Python applications (scipy, pandas) use for the
 calculation of the standard deviation the default ddof=1 (i.e. they
 calculate the sample standard deviation), the Numpy implementation uses the
 default ddof=0.

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy,
 and pandas.



 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?



 Thomas



 ---
 Prof. (FH) PD Dr. Thomas Haslwanter
 School of Applied Health and Social Sciences

 *University of Applied Sciences* *Upper Austria*
 *FH OÖ Studienbetriebs GmbH*
 Garnisonstraße 21
 4020 Linz/Austria
 Tel.: +43 (0)5 0804 -52170
 Fax: +43 (0)5 0804 -52171
 E-Mail: thomas.haslwan...@fh-linz.at
 Web: me-research.fh-linz.at or work.thaslwanter.at



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Sturla Molden
Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy, 
 and pandas.

ddof=0 is the maximum likelihood estimate. It is also needed in Bayesian
estimation.

If you are not estimating from a sample, but rather calculating for the
whole population, you always want ddof=0.
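
As a quick check of what ddof actually changes (only the denominator of
the variance), a sketch:

import numpy as np

x = np.random.randn(100)
for ddof in (0, 1):
    by_hand = np.sqrt(np.sum((x - x.mean())**2) / (len(x) - ddof))
    assert np.allclose(by_hand, np.std(x, ddof=ddof))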

What does Matlab do by default? (Yes, it is a rhetorical question.)


 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?

It is a bad idea to suddenly break everyone's code. 


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Eelco Hoogendoorn
I agree; breaking code over this would be ridiculous. Also, I prefer the
zero default, despite the mean/std combo probably being more common.


On Tue, Apr 1, 2014 at 10:02 PM, Sturla Molden sturla.mol...@gmail.comwrote:

 Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel that
  there should be consistency between standard modules such as numpy,
 scipy, and pandas.

 ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian
 estimation.

 If you are not eatimating from a sample, but rather calculating for the
 whole population, you always want ddof=0.

 What does Matlab do by default? (Yes, it is a retorical question.)


  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?

 It is a bad idea to suddenly break everyone's code.


 Sturla

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com wrote:
 Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:

 Personally I cannot think of many applications where it would be desired
 to calculate the standard deviation with ddof=0. In addition, I feel that
 there should be consistency between standard modules such as numpy, scipy, 
 and pandas.

 ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian
 estimation.

It's true, but the counter-arguments are also strong. And regardless
of whether ddof=1 or ddof=0 is better, surely the same one is better
for both numpy and scipy.

 If you are not eatimating from a sample, but rather calculating for the
 whole population, you always want ddof=0.

 What does Matlab do by default? (Yes, it is a retorical question.)

R (which is probably a more relevant comparison) does do ddof=1 by default.

 I am wondering if there is a good reason to stick to ddof=0 as the
 default for std, or if others would agree with my suggestion to change
 the default to ddof=1?

 It is a bad idea to suddenly break everyone's code.

It would be a disruptive transition, but OTOH having inconsistencies
like this guarantees the ongoing creation of new broken code.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Bob Dowling
On 04/01/2014 04:25 PM, Nathaniel Smith wrote:
 On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 If `a` has exactly one dimension more then `b`, the first case is used.
 Otherwise (..., M, K) is used instead. To make sure you always get the
 expected result, it may be best to make sure that the number of
 broadcasting (...) dimensions of `a` and `b` are identical (I am not
 sure if you expect this to be the case or not). The shape itself does
 not matter, only the (relative) number of dimensions does for the
 decision which of the two signatures is used.
 Oh, really? This seems really unfortunate

It also seems quite counter-intuitive.  It means that an array a of 
shape (3,3) will behave radically differently to one of shape (1,3,3).  
But thank you for the explanation.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Sebastian Berg
On Di, 2014-04-01 at 16:25 +0100, Nathaniel Smith wrote:
 On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  If `a` has exactly one dimension more then `b`, the first case is used.
  Otherwise (..., M, K) is used instead. To make sure you always get the
  expected result, it may be best to make sure that the number of
  broadcasting (...) dimensions of `a` and `b` are identical (I am not
  sure if you expect this to be the case or not). The shape itself does
  not matter, only the (relative) number of dimensions does for the
  decision which of the two signatures is used.
 

Since b is a system of equations if it is 2-dim, I think it basically
doesn't make sense to have a (M, K) shaped b anyway, since you could use
a (K, M) shaped b with broadcasting logic (though I guess that is slower
unless you add extra logic).

- Sebastian

 Oh, really? This seems really unfortunate -- AFAICT it makes it
 impossible to write a generic broadcasting matrix-solve or
 vector-solve :-/ (except by explicitly checking shapes and prepending
 ones by hand, more or less doing the broadcasting manually). Surely it
 would be better to use PEP 465 style broadcasting, where the only
 special case is if `b` has exactly 1 dimension?
 
 -n
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Ralf Gommers
On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel
 that
  there should be consistency between standard modules such as numpy,
 scipy, and pandas.
 
  ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.


If we could still choose here without any costs, obviously that's true.
This particular ship sailed a long time ago though. By the way, there isn't
even a `scipy.stats.std`, so we're comparing with differently named
functions (nanstd for example).


   If you are not eatimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.
 
  What does Matlab do by default? (Yes, it is a retorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by default.

  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


Not much of an argument to change return values for such a heavily used
function.

Ralf



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Charles R Harris
On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be desired
  to calculate the standard deviation with ddof=0. In addition, I feel
 that
  there should be consistency between standard modules such as numpy,
 scipy, and pandas.
 
  ddof=0 is the maxiumum likelihood estimate. It is also needed in Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

  If you are not eatimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.
 
  What does Matlab do by default? (Yes, it is a retorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by default.

  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to change
  the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


This topic comes up regularly. The original choice was made for numpy 1.0b1
by Travis, see this later thread:
http://thread.gmane.org/gmane.comp.python.numeric.general/25720/focus=25721
At this point it is probably best to leave it alone.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be
  desired
  to calculate the standard deviation with ddof=0. In addition, I feel
  that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maxiumum likelihood estimate. It is also needed in
  Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

 If we could still choose here without any costs, obviously that's true. This
 particular ship sailed a long time ago though. By the way, there isn't even
 a `scipy.stats.std`, so we're comparing with differently named functions
 (nanstd for example).

Presumably nanstd is a lot less heavily used than std, and presumably
people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
think of changing nanstd to ddof=0 to match numpy? (With appropriate
FutureWarning transition, etc.)

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread alex
On Tue, Apr 1, 2014 at 4:54 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 2:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be
  desired
  to calculate the standard deviation with ddof=0. In addition, I feel
  that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maxiumum likelihood estimate. It is also needed in
  Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

  If you are not eatimating from a sample, but rather calculating for the
  whole population, you always want ddof=0.
 
  What does Matlab do by default? (Yes, it is a retorical question.)

 R (which is probably a more relevant comparison) does do ddof=1 by
 default.

  I am wondering if there is a good reason to stick to ddof=0 as the
  default for std, or if others would agree with my suggestion to
  change
  the default to ddof=1?
 
  It is a bad idea to suddenly break everyone's code.

 It would be a disruptive transition, but OTOH having inconsistencies
 like this guarantees the ongoing creation of new broken code.


 This topic comes up regularly. The original choice was made for numpy 1.0b1
 by Travis, see this later thread. At this point it is probably best to leave
 it alone.

I don't have any opinion about this debate, but I love the
justification in that thread: "Any surprise that is created by the
different default should be mitigated by the fact that it's an
opportunity to learn something about what you are doing."  This
masterpiece of rhetoric will surely help me win many internet
arguments in the future!
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Confused by spec of numpy.linalg.solve

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 9:50 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Di, 2014-04-01 at 16:25 +0100, Nathaniel Smith wrote:
 On Tue, Apr 1, 2014 at 3:57 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  If `a` has exactly one dimension more then `b`, the first case is used.
  Otherwise (..., M, K) is used instead. To make sure you always get the
  expected result, it may be best to make sure that the number of
  broadcasting (...) dimensions of `a` and `b` are identical (I am not
  sure if you expect this to be the case or not). The shape itself does
  not matter, only the (relative) number of dimensions does for the
  decision which of the two signatures is used.


 Since b is a system of equations if it is 2-dim, I think it basically
 doesn't make sense to have a (M, K) shaped b anyway, since you could use
 a (K, M) shaped b with broadcasting logic (though I guess that is slower
 unless you add extra logic).

Not sure I'm following your point exactly, but the argument for having
(M, M) `a` and (M, K) `b` is that solve(a, b) is the same as
dot(inv(a), b), which obviously accepts 2d `a` and `b`...
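
For example, a quick check for a 2-d `b` (matrices chosen arbitrarily):

import numpy as np

a = np.array([[3., 1.], [1., 2.]])
b = np.array([[1., 0.], [0., 1.]])   # (M, K) with K == 2
print(np.allclose(np.linalg.solve(a, b), np.dot(np.linalg.inv(a), b)))   # True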

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Standard Deviation (std): Suggested change for ddof default value

2014-04-01 Thread josef . pktd
On Tue, Apr 1, 2014 at 5:11 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Apr 1, 2014 at 9:51 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:



 On Tue, Apr 1, 2014 at 10:08 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 9:02 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:
  Haslwanter Thomas thomas.haslwan...@fh-linz.at wrote:
 
  Personally I cannot think of many applications where it would be
  desired
  to calculate the standard deviation with ddof=0. In addition, I feel
  that
  there should be consistency between standard modules such as numpy,
  scipy, and pandas.
 
  ddof=0 is the maxiumum likelihood estimate. It is also needed in
  Bayesian
  estimation.

 It's true, but the counter-arguments are also strong. And regardless
 of whether ddof=1 or ddof=0 is better, surely the same one is better
 for both numpy and scipy.

 If we could still choose here without any costs, obviously that's true. This
 particular ship sailed a long time ago though. By the way, there isn't even
 a `scipy.stats.std`, so we're comparing with differently named functions
 (nanstd for example).

 Presumably nanstd is a lot less heavily used than std, and presumably
 people expect 'nanstd' to be a 'nan' version of 'std' -- what do you
 think of changing nanstd to ddof=0 to match numpy? (With appropriate
 FutureWarning transition, etc.)

numpy is numpy, a numerical library
scipy.stats is stats and behaves differently.  (axis=0)

nanstd in scipy.stats will hopefully also go away soon, so I don't
think it's worth changing there either.

pandas came later and thought ddof=1 is worth more than consistency.

I don't think ddof defaults are worth jumping through deprecation hoops.

(bias in cov, corrcoef is non-standard ddof)

Josef



 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread David Cournapeau
On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com
 wrote:
  I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL
 loaded via:
 
  hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);
 
  will in turn (by default) search for its dependent DLLs in their own
  directory.Or maybe in the directory of the first DLL to be loaded
  with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
  documentation.  Looking forward to doing my tax return after this.
 
  But - anyway - that means that any extensions in the DLLs directory
  will get their dependencies from the DLLs directory, but that is only
  true for extensions in that directory.

 So in conclusion, if we just drop our compiled dependencies next to
 the compiled module files then we're good, even on older Windows
 versions? That sounds much simpler than previous discussions, but good
 news if it's true...


That does not work very well in my experience:

  - numpy has extension modules in multiple directories, so we would need
to copy the dlls in multiple subdirectories
  - copying dlls means that windows will load that dll multiple times, with
all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen
serious issues when doing something similar for hdf5 dll and pytables/h5py).

David


 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread Nathaniel Smith
On Tue, Apr 1, 2014 at 11:58 PM, David Cournapeau courn...@gmail.com wrote:
 On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com
 wrote:
  I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL
  loaded via:
 
  hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);
 
  will in turn (by default) search for its dependent DLLs in their own
  directory.Or maybe in the directory of the first DLL to be loaded
  with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
  documentation.  Looking forward to doing my tax return after this.
 
  But - anyway - that means that any extensions in the DLLs directory
  will get their dependencies from the DLLs directory, but that is only
  true for extensions in that directory.

 So in conclusion, if we just drop our compiled dependencies next to
 the compiled module files then we're good, even on older Windows
 versions? That sounds much simpler than previous discussions, but good
 news if it's true...


 That does not work very well in my experience:

   - numpy has extension modules in multiple directories, so we would need to
 copy the dlls in multiple subdirectories
   - copying dlls means that windows will load that dll multiple times, with
 all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen
 serious issues when doing something similar for hdf5 dll and pytables/h5py).

We could just ship all numpy's extension modules in the same directory
if we wanted. It would be pretty easy to stick some code at the top of
numpy/__init__.py to load them from numpy/all_dlls/ and then slot them
into the appropriate places in the package namespace.
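
A rough sketch of the loading half of that idea (everything here is
hypothetical -- the directory name, file names, and use of imp are only
illustrative, not an actual numpy layout):

import imp
import os

def _load_ext_modules(dll_dir):
    # Load every extension module found in dll_dir and return them keyed by
    # name; the caller would then assign them into the usual package locations.
    mods = {}
    for fname in os.listdir(dll_dir):
        base, ext = os.path.splitext(fname)
        if ext == '.pyd':
            mods[base] = imp.load_dynamic(base, os.path.join(dll_dir, fname))
    return mods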

Of course scipy and numpy will still both have to ship BLAS etc., and
so I guess it will get loaded at least twice in *any* binary install
system. I'm not sure why this would be a problem (Windows, unlike
Unix, carefully separates DLL namespaces, right?), but if it is a
problem then it's a very fundamental one for any binaries we ship.

Do the binaries we ship now have this problem? Or are we currently
managing to statically link everything?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.8.1 release

2014-04-01 Thread David Cournapeau
On Wed, Apr 2, 2014 at 12:36 AM, Nathaniel Smith n...@pobox.com wrote:

 On Tue, Apr 1, 2014 at 11:58 PM, David Cournapeau courn...@gmail.com
 wrote:
  On Tue, Apr 1, 2014 at 6:43 PM, Nathaniel Smith n...@pobox.com wrote:
 
  On Tue, Apr 1, 2014 at 6:26 PM, Matthew Brett matthew.br...@gmail.com
  wrote:
   I'm guessing that the LOAD_WITH_ALTERED_SEARCH_PATH means that a DLL
   loaded via:
  
   hDLL = LoadLibraryEx(pathname, NULL,  LOAD_WITH_ALTERED_SEARCH_PATH);
  
   will in turn (by default) search for its dependent DLLs in their own
   directory.Or maybe in the directory of the first DLL to be loaded
   with LOAD_WITH_ALTERED_SEARCH_PATH, damned if I can follow the
   documentation.  Looking forward to doing my tax return after this.
  
   But - anyway - that means that any extensions in the DLLs directory
   will get their dependencies from the DLLs directory, but that is only
   true for extensions in that directory.
 
  So in conclusion, if we just drop our compiled dependencies next to
  the compiled module files then we're good, even on older Windows
  versions? That sounds much simpler than previous discussions, but good
  news if it's true...
 
 
  That does not work very well in my experience:
 
- numpy has extension modules in multiple directories, so we would
 need to
  copy the dlls in multiple subdirectories
- copying dlls means that windows will load that dll multiple times,
 with
  all the ensuing problems (I don't know for MKL/OpenBlas, but we've seen
  serious issues when doing something similar for hdf5 dll and
 pytables/h5py).

 We could just ship all numpy's extension modules in the same directory
 if we wanted. It would be pretty easy to stick some code at the top of
 numpy/__init__.py to load them from numpy/all_dlls/ and then slot them
 into the appropriate places in the package namespace.

 Of course scipy and numpy will still both have to ship BLAS etc., and
 so I guess it will get loaded at least twice in *any* binary install
 system. I'm not sure why this would be a problem (Windows, unlike
 Unix, carefully separates DLL namespaces, right?)


It does not really matter here. For pure blas/lapack, that may be ok
because the functions are stateless, but I would not count on it either.

The cleanest solution I can think of is to have 'privately shared DLL', but
that would AFAIK require patching python, so not really an option.


, but if it is a
 problem then it's a very fundamental one for any binaries we ship.

 Do the binaries we ship now have this problem? Or are we currently
 managing to statically link everything?


We currently statically link everything. The main challenge is that 'new'
(>= 4) versions of mingw don't easily allow statically linking all the
mingw-related dependencies. While the options are there, every time I tried
to do it with an official build of mingw, I had some weird, very hard to
track crashes. The other alternative that has been suggested is to build
one own's toolchain where everything is static by default. I am not sure
why that works, and that brings the risk of depending on a toolchain that
we can't really maintain.

David


 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion