Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-25 Thread josef.pktd
On Fri, Jul 25, 2014 at 4:25 PM, RayS r...@blue-cove.com wrote:

 At 11:29 AM 7/25/2014, you wrote:
 On Fri, Jul 25, 2014 at 5:56 PM, RayS r...@blue-cove.com wrote:
   The important point was that it would be best if all of the
  methods affected
   by summing 32 bit floats with 32 bit accumulators had the same Notes as
   numpy.mean(). We went through a lot of code yesterday, assuming that
 any
   numpy or Scipy.stats functions that use accumulators suffer the same
 issue,
   whether noted or not, and found it true.
 
 Do you have a list of the functions that are affected?

 We only tested a few we used, but
 scipy.stats.nanmean, or any .*mean()
 numpy.sum, mean, average, std, var,...

 via something like:

 import numpy
 import scipy.stats
 print numpy.__version__
 print scipy.__version__
 onez = numpy.ones((2**25, 1), numpy.float32)
 step = 2**10
 func = scipy.stats.nanmean
 for s in range(2**24 - step, 2**25, step):
     if func(onez[:s + step]) != 1.:
         print '\nbroke', s, func(onez[:s + step])
         break
     else:
         print '\r', s,

   That said, it does seem that np.mean could be implemented better than
 it is, even given float32's inherent limitations. If anyone wants to
 implement better algorithms for computing the mean, variance, sums,
 etc., then we would love to add them to numpy.

 Others have pointed out the possible tradeoffs in summation algos,
 perhaps a method arg would be appropriate, better depending on your
 desire for speed vs. accuracy.


I think this would be a good improvement.
But it doesn't remove the need for users to be aware of the problems. I think
the docstring and the description of the dtype argument are pretty clear.

I'm largely with Eelco: whatever precision or algorithm we use, with
floating point calculations we run into problems in some cases. And I don't
think this is a broken function but a design decision that takes the
different tradeoffs into account.
Whether it's the right decision is an open question, if there are better
algorithms with essentially no disadvantages.

Two examples:
I had problems verifying some results against Stata at more than a few
significant digits, until I realized that Stata had used float32 for the
calculations by default in this case, while I was working with float64.
Using single-precision linear algebra causes the same numerical problems
that numpy.mean runs into.

A few years ago I tried to match some tougher NIST examples that were
intentionally very badly scaled. numpy.mean at float64 had quite large
errors, but a simple trick with two passes through the data managed to get
very close to the certified NIST examples.
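(I don't have that script anymore; the two-pass trick is roughly the
following -- a minimal sketch with made-up badly scaled data, not the NIST
certified values, and `two_pass_mean` is just an illustrative helper.)

import numpy as np

def two_pass_mean(x):
    # first pass: rough estimate of the location
    shift = x.mean()
    # second pass: average the (much smaller) residuals and add the shift back
    return shift + (x - shift).mean()

# made-up badly scaled data: huge offset, tiny variation around it
x = 1e9 + 1e-3 * np.random.standard_normal(10**6)
print(x.mean())
print(two_pass_mean(x))  # the second pass averages well-scaled residuals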


my conclusion:
Don't use float32 unless you know you don't need any higher precision.
Even with float64 it is possible to run into extreme cases where you get
numerical garbage or large precision losses.
However, in the large majority of cases a boring, fast, naive
implementation is enough.
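(To make the speed-versus-accuracy tradeoff concrete, here is a purely
illustrative sketch of compensated (Kahan) summation against a naive
float32 loop; the helper names are made up and this is not how numpy
implements sum.)

import numpy as np

def naive_sum32(x):
    # straight left-to-right accumulation in float32
    s = np.float32(0.0)
    for v in x:
        s = s + v
    return s

def kahan_sum32(x):
    # compensated summation: carry a small correction term alongside the sum
    s = np.float32(0.0)
    c = np.float32(0.0)
    for v in x:
        y = v - c
        t = s + y
        c = (t - s) - y
        s = t
    return s

# one big value followed by many small ones: the classic failure mode
x = np.concatenate([np.array([2.0**24], dtype=np.float32),
                    np.ones(1000, dtype=np.float32)])
print(naive_sum32(x))   # 16777216.0 -- the added ones are rounded away
print(kahan_sum32(x))   # 16778216.0 -- the exact float32 result, but slower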

Also, whether we use mean, sum or dot in a calculation is an implementation
detail; dot, for example, doesn't have a dtype argument and always depends
on the dtype of the arrays, AFAIK.
Separating the accumulation dtype from the array dtype would require a lot
of work, except in the simplest cases like those that only use sum and mean
with an explicit dtype argument.


trying out the original example:

>>> X = np.ones((5, 1024), np.float32)
>>> X.mean()
1.0
>>> X.mean(dtype=np.float32)
1.0


>>> np.dot(X.ravel(), np.ones(X.ravel().shape) * 1. / X.ravel().shape)
1.02299174

>>> np.__version__
'1.5.1'

Win32

Josef





 It just occurred to me that if the STSI folks (who count photons)
 took the mean() or other such func of an image array from Hubble
 sensors to find background value, they'd better always be using float64.

   - Ray



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root ben.r...@ou.edu wrote:

 I could get behind the context manager approach. It would help keep
 backwards compatibility, while providing a very easy (and clean) way of
 consistently using the same reduction operation. Adding kwargs is just a
 road to hell.


Wouldn't a context manager require a global state that changes how
everything is calculated?

Josef




 Cheers!
 Ben Root


 On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor 
 jtaylor.deb...@googlemail.com wrote:

 On 26.07.2014 15:38, Eelco Hoogendoorn wrote:
 
  Why is it not always used?

 for 1d reduction the iterator blocks by 8192 elements even when no
 buffering is required. There is a TODO in the source to fix that by
 adding additional checks. Unfortunately nobody knows what these
 additional tests would need to be and Mark Wiebe who wrote it did not
 reply to a ping yet.

 Also along the non-fast axes the iterator optimizes the reduction to
 remove the strided access, see:
 https://github.com/numpy/numpy/pull/4697#issuecomment-42752599


 Instead of having a keyword argument to mean I would prefer a context
 manager that changes algorithms for different requirements.
 This would easily allow changing the accuracy and performance of third
 party functions using numpy without changing the third party library as
 long as they are using numpy as the base.
 E.g.
 with np.precisionstate(sum=kahan):
   scipy.stats.nanmean(d)

 We also have cases where numpy uses algorithms that are far more precise
 than most people need. E.g. np.hypot and the related complex
 absolute value and division.
 These are very slow with glibc as it provides 1ulp accuracy, which is
 hardly ever needed.
 Another case that could use dynamic changing is flushing subnormals to
 zero.

 But this API is, like Nathaniel's parameterizable dtypes, just an idea
 floating in my head which needs a proper design and implementation written
 down. The issue is, as usual, ENOTIME.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 2:44 PM, Benjamin Root ben.r...@ou.edu wrote:

 That is one way of doing it, and probably the cleanest way. Or else you
 have to pass in the context object everywhere anyway. But I am not so
 concerned about that (we do that for other things as well). Bigger concerns
 would be nested contexts. For example, what if one of the scikit functions
 uses such a context to explicitly state that it needs a particular
 reduction algorithm, but the call to that scikit function is buried under a
 few layers of user functions, at the top of which sits a context manager
 that states a different reduction op.

 Whose context wins? Naively, the scikit's context wins (because that's how
 contexts work). But does that break with the very broad design goal here,
 to let the user specify the reduction kernel? Practically speaking, we
 could see users naively putting in context managers all over the place in
 their libraries, possibly choosing incorrect algorithms (I am serious here;
 how often have we seen stackoverflow instructions just blindly parrot
 certain arguments just because?). This gives the user no real mechanism to
 override the library, largely defeating the purpose.

 My other concern would be with multi-threaded code (which is where a
 global state would be bad).



statsmodels has so far avoided anything that smells like a global state
that changes calculations.
(We never even implemented different global warning levels.)

https://groups.google.com/d/msg/pystatsmodels/-J9WXKLjyH4/5xvKu9_mbbEJ


Josef
There be Dragons.




 Ben



 On Sat, Jul 26, 2014 at 2:29 PM, josef.p...@gmail.com wrote:




 On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root ben.r...@ou.edu wrote:

 I could get behind the context manager approach. It would help keep
 backwards compatibility, while providing a very easy (and clean) way of
 consistently using the same reduction operation. Adding kwargs is just a
 road to hell.


 Wouldn't a context manager require a global state that changes how
 everything is calculated ?

 Josef




 Cheers!
 Ben Root


 On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor 
 jtaylor.deb...@googlemail.com wrote:

 On 26.07.2014 15:38, Eelco Hoogendoorn wrote:
 
  Why is it not always used?

 for 1d reduction the iterator blocks by 8192 elements even when no
 buffering is required. There is a TODO in the source to fix that by
 adding additional checks. Unfortunately nobody knows what these
 additional tests would need to be and Mark Wiebe who wrote it did not
 reply to a ping yet.

 Also along the non-fast axes the iterator optimizes the reduction to
 remove the strided access, see:
 https://github.com/numpy/numpy/pull/4697#issuecomment-42752599


 Instead of having a keyword argument to mean I would prefer a context
 manager that changes algorithms for different requirements.
 This would easily allow changing the accuracy and performance of third
 party functions using numpy without changing the third party library as
 long as they are using numpy as the base.
 E.g.
 with np.precisionstate(sum=kahan):
   scipy.stats.nanmean(d)

 We also have cases where numpy uses algorithms that are far more precise
 than most people need. E.g. np.hypot and the related complex
 absolute value and division.
 These are very slow with glibc as it provides 1ulp accuracy, which is
 hardly ever needed.
 Another case that could use dynamic changing is flushing subnormals to
 zero.

 But this API is, like Nathaniel's parameterizable dtypes, just an idea
 floating in my head which needs a proper design and implementation written
 down. The issue is, as usual, ENOTIME.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-27 Thread josef.pktd
On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 Robert Kern robert.k...@gmail.com wrote:

  It would presumably require a global threading.RLock for protecting the
  global state.
 
  We would use thread-local storage like we currently do with the
  np.errstate() context manager. Each thread will have its own global
  state.

 That sounds like a better plan, yes :)


Any global state that changes how things are calculated will have
unpredictable results.

And I don't trust python users to be disciplined enough.

issue: Why do I get different results after `import this_funy_package`?

Josef





 Sturla

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-27 Thread josef.pktd
On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sun, Jul 27, 2014 at 7:04 AM,  josef.p...@gmail.com wrote:
 
  On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com
  wrote:
 
  Robert Kern robert.k...@gmail.com wrote:
 
   It would presumably require a global threading.RLock for protecting
 the
   global state.
  
   We would use thread-local storage like we currently do with the
   np.errstate() context manager. Each thread will have its own global
   state.
 
  That sounds like a better plan, yes :)
 
  Any global state that changes how things are calculated will have
  unpredictable results.
 
  And I don't trust python users to be disciplined enough.
 
  issue: Why do I get different results after `import this_funy_package`?

 That's why the suggestion is that it be controlled by a context
 manager. The state change will only be limited to the `with:`
 statement. You would not be able to fire-and-forget the state
 change.


Can you implement a context manager without introducing a global variable
that everyone could set, and forget?
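(To make the question concrete: a minimal sketch of an errstate-style
context manager backed by thread-local storage. `precision_state` and
`_local` are hypothetical names, not a numpy API; the point is that the
setting still lives in a module-level variable that code could, in
principle, set directly and forget.)

import threading
from contextlib import contextmanager

_local = threading.local()  # per-thread "global" state

def get_sum_method():
    # library code would consult this; the default is the current behavior
    return getattr(_local, "sum_method", "naive")

@contextmanager
def precision_state(sum="naive"):
    # errstate-style: change the setting only inside the with-block
    old = get_sum_method()
    _local.sum_method = sum
    try:
        yield
    finally:
        _local.sum_method = old  # restored even if an exception is raised

# usage sketch:
# with precision_state(sum="kahan"):
#     ...  # code running in this thread sees get_sum_method() == "kahan"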

Josef



 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] inplace unary operations?

2014-08-31 Thread josef.pktd
On Sat, Aug 30, 2014 at 1:45 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Aug 30, 2014 at 6:43 PM,  josef.p...@gmail.com wrote:
  Is there a way to negate a boolean, or to change the sign of a float
  inplace?

 np.logical_not(arr, out=arr)
 np.negative(arr, out=arr)


Thanks Nathaniel.

np.negative might save a bit of memory and time when we have to negate the
loglikelihood all the time.
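(For the record, roughly how I'd use it -- trivial, illustrative example:)

import numpy as np

loglike = np.array([-1.5, -0.3, -2.0])
np.negative(loglike, out=loglike)  # in place, no temporary unlike `-loglike`
# loglike is now [ 1.5,  0.3,  2. ]

mask = np.array([True, False, True])
np.logical_not(mask, out=mask)     # in-place boolean negation
# mask is now [False,  True, False]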

Josef




 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] SFMT (faster mersenne twister)

2014-09-08 Thread josef.pktd
On Sun, Sep 7, 2014 at 4:23 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 Benjamin Root ben.r...@ou.edu wrote:
  In addition to issues with reproducibility, think of all of the unit
 tests
  that would break!

 That is a reproducibility problem :)

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



Related aside about reproducibility of random numbers:

IMO:
scipy.stats.distributions.rvs does not yet guarantee the values of the
random numbers, except for those that are directly produced by numpy.
In contrast to numpy.random, scipy's distributions don't have unit tests
for the specific values of the rvs, and the rvs code for specific
distributions could still be improved in some cases, I guess.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2014-09-08 Thread josef.pktd
On Mon, Sep 8, 2014 at 10:41 AM, Sturla Molden sturla.mol...@gmail.com
wrote:

 Stefan Otte stefan.o...@gmail.com wrote:

  stack([[a, b], [c, d]])
 
  In my case `stack` replaced `hstack` and `vstack` almost completely.
 
  If you're interested in including it in numpy I created a pull request
  [1]. I'm looking forward to getting some feedback!


np.asarray(np.bmat()) ?
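(i.e. something along these lines -- small made-up example, a through d are
just placeholder blocks:)

import numpy as np

a = np.ones((2, 2))
b = np.zeros((2, 3))
c = np.zeros((3, 2))
d = np.ones((3, 3))

# bmat builds the block matrix, asarray turns it back into a plain ndarray
block = np.asarray(np.bmat([[a, b], [c, d]]))
print(block.shape)  # (5, 5)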

Josef




 As far as I can see, it uses hstack and vstack. But that means a and b have
 to have the same number of rows, c and d must have the same number of rows,
 and hstack((a,b)) and hstack((c,d)) must have the same number of columns.

 Thus it requires a regularity like this:

 AAAABB
 AAAABB
 CCCDDD
 CCCDDD
 CCCDDD
 CCCDDD

 What if we just ignore this constraint, and only require the output to be
 rectangular? Now we have a 'tetris game':

 AAAABB
 AAAABB
 CCCCBB
 CCCCBB
 CCCCDD
 CCCCDD

 or

 AAAABB
 AAAABB
 AAAABB
 CCCCBB
 CCCCBB
 CCCCBB

 This should be 'stackable', yes? Or perhaps we need another stacking
 function for this, say numpy.tetris?

 And while we're at it, what about higher dimensions? should there be an
 ndstack function too?


 Sturla

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2014-09-08 Thread josef.pktd
On Mon, Sep 8, 2014 at 12:10 PM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden sturla.mol...@gmail.com
 wrote:

 Stefan Otte stefan.o...@gmail.com wrote:

  stack([[a, b], [c, d]])
 
  In my case `stack` replaced `hstack` and `vstack` almost completely.
 
  If you're interested in including it in numpy I created a pull request
  [1]. I'm looking forward to getting some feedback!

 As far as I can see, it uses hstack and vstack. But that means a and b
 have
 to have the same number of rows, c and d must have the same number of
 rows,
 and hstack((a,b)) and hstack((c,d)) must have the same number of columns.

 Thus it requires a regularity like this:

 AAAABB
 AAAABB
 CCCDDD
 CCCDDD
 CCCDDD
 CCCDDD

 What if we just ignore this constraint, and only require the output to be
 rectangular? Now we have a 'tetris game':

 AAAABB
 AAAABB
 CCCCBB
 CCCCBB
 CCCCDD
 CCCCDD

 or

 AAAABB
 AAAABB
 AAAABB
 CCCCBB
 CCCCBB
 CCCCBB

 This should be 'stackable', yes? Or perhaps we need another stacking
 function for this, say numpy.tetris?

 And while we're at it, what about higher dimensions? should there be an
 ndstack function too?


 This is starting to look like the second time in a row Stefan tries to
 extend numpy with a simple convenience function, and he gets tricked into
 implementing some sophisticated algorithm...


Maybe the third time is a gem.




 For his next PR I expect nothing less than an NP-complete problem. ;-)


How about cholesky or qr updating?   I could use one right now.

Josef






 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Conejo. Copy Conejo into your signature and help him in his
 plans for world domination.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2014-09-09 Thread josef.pktd
On Tue, Sep 9, 2014 at 5:42 AM, Stefan Otte stefan.o...@gmail.com wrote:

 Hey,

 @Josef, I wasn't aware of `bmat` and `np.asarray(np.bmat())` does
 basically what I want and what I'm already using.


I never needed any tetris or anything similar except for the matched block
version.

Just to point out two more related functions

scipy.sparse also has `bmat` for sparse block matrices
scipy.linalg and scipy.sparse have `block_diag` to complement bmat.


What I sometimes wish for is a sparse pseudo-Kronecker product as a
convenience to bmat, where the first (or the second) matrix contains (0, 1)
flags that specify where to put the blocks.
(I'm not sure what I really mean; similar to block_diag but with a
different filling pattern.)
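(Something in this spirit, for the dense case -- illustrative sketch only;
`pattern` is my made-up name for the (0, 1) flag matrix:)

import numpy as np

block = np.arange(6).reshape(2, 3)
pattern = np.array([[1, 0],
                    [0, 1],
                    [1, 1]])  # 1 = place a copy of the block, 0 = zero block

# np.kron puts `block` wherever pattern is 1 and a zero block elsewhere
tiled = np.kron(pattern, block)
print(tiled.shape)  # (6, 6)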

Josef




 Regarding the Tetris problem: that never happened to me, but stack, as
 Josef pointed out, can handle that already :)

 I like the idea of removing the redundant square brackets:
 stack([[a, b], [c, d]]) -- stack([a, b], [c, d])
 However, if the brackets are there there is no difference between
 creating a `np.array` and stacking arrays with `np.stack`.

 If we want to get fancy and turn this PR into something bigger
 (working our way up to a NP-complete problem ;)) then how about this.
 I sometimes have arrays that look like:
   AB0
   0 C
 Where 0 is a scalar but is supposed to fill the rest of the array.
 Having something like 0 in there might lead to ambiguities though. What
 does
   ABC
   0D0
 mean? One could limit the filler to appear only on the left or the right:
   AB0
   0CD
 But even then the shape is not completely determined. So we could
 require to have one row that only consists of arrays and determines
 the shape. Alternatively we could have a keyword parameter `shape`:
   stack([A, B, 0], [0, C, D], shape=(8, 8))

 Colin, with `bmat` you can do what you're asking for. Directly taken
 from the example:
  np.bmat('A,B; C,D')
 matrix([[1, 1, 2, 2],
 [1, 1, 2, 2],
 [3, 4, 7, 8],
 [5, 6, 9, 0]])


 General question: If `bmat` already offers something like `stack`
 should we even bother implementing `stack`? More code leads to more
 bugs and maintenance work.


 Best,
  Stefan



 On Tue, Sep 9, 2014 at 12:14 AM, cjw c...@ncf.ca wrote:
 
  On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote:
  Le 08/09/2014 15:29, Stefan Otte a écrit :
  Hey,
 
  quite often I work with block matrices. Matlab offers the convenient
 notation
 
[ a b; c d ]
  This would appear to be a desirable way to go.
 
  Numpy has something similar for strings.  The above is neater.
 
  Colin W.
  to stack matrices. The numpy equivalent is kinda clumsy:
 
  vstack([hstack([a,b]), hstack([c,d])])
 
  I wrote the little function `stack` that does exactly that:
 
stack([[a, b], [c, d]])
 
  In my case `stack` replaced `hstack` and `vstack` almost completely.
 
  If you're interested in including it in numpy I created a pull request
  [1]. I'm looking forward to getting some feedback!
 
 
  Best,
 Stefan
 
 
 
  [1] https://github.com/numpy/numpy/pull/5057
 
  The outside brackets are redundant, stack([[a, b], [c, d]]) should be
  stack([a, b], [c, d])
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab

2014-09-09 Thread josef.pktd
On Tue, Sep 9, 2014 at 8:30 AM, josef.p...@gmail.com wrote:




 On Tue, Sep 9, 2014 at 5:42 AM, Stefan Otte stefan.o...@gmail.com wrote:

 Hey,

 @Josef, I wasn't aware of `bmat` and `np.asarray(np.bmat())` does
 basically what I want and what I'm already using.


 I never needed any tetris or anything similar except for the matched block
 version.

 Just to point out two more related functions

 scipy.sparse also has `bmat` for sparse block matrices
 scipy.linalg and scipy.sparse have `block_diag` to complement bmat.


 What I sometimes wish for is a sparse pseudo kronecker product as
 convenience to bmat, where the first (or the second) matrix contains (0,1)
 flags where the 1's specify where to put the blocks.
 (I'm not sure what I really mean, similar to block_diag but with a
 different filling pattern.)


(or in analogy to kronecker, np.nonzero provides the filling pattern, but
the submatrix is multiplied by the value.)

(application: unbalanced repeated measures or panel data)

Josef
()



 Josef




 Regarding the Tetris problem: that never happened to me, but stack, as
 Josef pointed out, can handle that already :)

 I like the idea of removing the redundant square brackets:
 stack([[a, b], [c, d]]) -- stack([a, b], [c, d])
 However, if the brackets are there there is no difference between
 creating a `np.array` and stacking arrays with `np.stack`.

 If we want to get fancy and turn this PR into something bigger
 (working our way up to a NP-complete problem ;)) then how about this.
 I sometimes have arrays that look like:
   AB0
   0 C
 Where 0 is a scalar but is supposed to fill the rest of the array.
 Having something like 0 in there might lead to ambiguities though. What
 does
   ABC
   0D0
 mean? One could limit the filler to appear only on the left or the
 right:
   AB0
   0CD
 But even then the shape is not completely determined. So we could
 require to have one row that only consists of arrays and determines
 the shape. Alternatively we could have a keyword parameter `shape`:
   stack([A, B, 0], [0, C, D], shape=(8, 8))

 Colin, with `bmat` you can do what you're asking for. Directly taken
 from the example:
  np.bmat('A,B; C,D')
 matrix([[1, 1, 2, 2],
 [1, 1, 2, 2],
 [3, 4, 7, 8],
 [5, 6, 9, 0]])


 General question: If `bmat` already offers something like `stack`
 should we even bother implementing `stack`? More code leads to more
 bugs and maintenance work.


 Best,
  Stefan



 On Tue, Sep 9, 2014 at 12:14 AM, cjw c...@ncf.ca wrote:
 
  On 08-Sep-14 4:40 PM, Joseph Martinot-Lagarde wrote:
  Le 08/09/2014 15:29, Stefan Otte a écrit :
  Hey,
 
  quite often I work with block matrices. Matlab offers the convenient
 notation
 
[ a b; c d ]
  This would appear to be a desirable way to go.
 
  Numpy has something similar for strings.  The above is neater.
 
  Colin W.
  to stack matrices. The numpy equivalent is kinda clumsy:
 
  vstack([hstack([a,b]), hstack([c,d])])
 
  I wrote the little function `stack` that does exactly that:
 
stack([[a, b], [c, d]])
 
  In my case `stack` replaced `hstack` and `vstack` almost completely.
 
  If you're interested in including it in numpy I created a pull request
  [1]. I'm looking forward to getting some feedback!
 
 
  Best,
 Stefan
 
 
 
  [1] https://github.com/numpy/numpy/pull/5057
 
  The outside brackets are redundant, stack([[a, b], [c, d]]) should be
  stack([a, b], [c, d])
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linear algebra functions on empty arrays

2014-09-15 Thread josef.pktd
On Mon, Sep 15, 2014 at 5:48 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 Hey all,

 for https://github.com/numpy/numpy/pull/3861/files I would like to allow
 0-sized dimensions for generalized ufuncs, meaning that the gufunc has
 to be able to handle this, but also that it *can* handle it at all.
 However lapack does not support this, so it needs some explicit fixing.
 Also some of the linalg functions currently explicitly allow and others
 explicitly disallow empty arrays.

 For example QR and eigvals do not allow it, but on the other hand
 solve explicitly does (most probably never did, simply because lapack
 does not). So I am wondering if there is some convention for this, or
 what convention we should implement.

What does an empty square matrix/array look like?

np.linalg.solve can have an empty rhs, but what would the shape of an empty lhs `a` be?
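(The empty-rhs case that solve already allows looks like this, as far as I
understand it -- shapes only, illustrative:)

import numpy as np

a = np.eye(3)
b = np.ones((3, 0))         # zero right-hand sides
x = np.linalg.solve(a, b)   # allowed; x.shape is (3, 0)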

If I do a qr(arr) with arr.shape = (0, 5), what is R supposed to be?


I just wrote some loops over linalg.qr, but I always initialized explicitly.

I didn't manage to figure out how empty arrays would be useful.

If an empty square matrix can only be of shape (0, 0), then it's
no use (in my applications).


Josef



 Regards,

 Sebastian

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linear algebra functions on empty arrays

2014-09-15 Thread josef.pktd
On Mon, Sep 15, 2014 at 7:26 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Mo, 2014-09-15 at 07:07 -0400, josef.p...@gmail.com wrote:
 On Mon, Sep 15, 2014 at 5:48 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  Hey all,
 
  for https://github.com/numpy/numpy/pull/3861/files I would like to allow
  0-sized dimensions for generalized ufuncs, meaning that the gufunc has
  to be able to handle this, but also that it *can* handle it at all.
  However lapack does not support this, so it needs some explicit fixing.
  Also some of the linalg functions currently explicitly allow and others
  explicitly disallow empty arrays.
 
  For example the QR and eigvals does not allow it, but on the other hand
  solve explicitly does (most probably never did, simply because lapack
  does not). So I am wondering if there is some convention for this, or
  what convention we should implement.

 What does an empty square matrix/array look like?

 np.linalg.solve   can have empty rhs, but shape of empty lhs, `a`, is ?

 If I do a QR(arr)  with arr.shape=(0, 5), what is R supposed to be ?


 QR may be more difficult since R may itself could not be empty, begging
 the question if you want to error out or fill it sensibly.

I shouldn't have tried it again (I got this a few times last week):

>>> ze = np.ones((z.shape[1], 0))
>>> np.linalg.qr(ze)
 ** On entry to DGEQRF parameter number  7 had an illegal value

crash

z.shape[1] is 3
>>> np.__version__
'1.6.1'

I think, I would prefer an exception if the output would require a
empty square matrix with shape  (0, 0)
I don't see any useful fill value.


 Cholesky would require (0, 0) for example and for eigenvalues it would
 somewhat make sense too, the (0, 0) matrix has 0 eigenvalues.
 I did not go through them all, but I would like to figure out whether we
 should aim to generally allow it, or maybe just allow it for some
 special ones.

If the returned square array has shape (0, 0), then it would make sense,
but I haven't run into a case for it yet.

np.cholesky(np.ones((0, 0)))  ?
(I didn't try, since my interpreter just crashed. :)

Josef


 - Sebastian


 I just wrote some loops over linalg.qr, but I always initialized explicitly.

 I didn't manage to figure out how empty arrays would be useful.

  If an empty square matrix can only be of shape (0, 0), then it's
 no use (in my applications).


 Josef


 
  Regards,
 
  Sebastian
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is this a bug?

2014-09-16 Thread josef.pktd
On Tue, Sep 16, 2014 at 3:42 PM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Sep 16, 2014 at 3:27 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 Hi All,

 It turns out that gufuncs will broadcast the last dimension if it is one.
 For instance, inner1d has signature `(n), (n) - ()`, yet

 In [27]: inner1d([1,1,1], [1])
 Out[27]: 3

 Yes, this looks totally wrong to me too... broadcasting is a feature
 of auto-vectorizing a core operation over a set of dimensions, it
 shouldn't be applied to the dimensions of the core operation itself
 like this.

Are these functions doing any numerical shortcuts in this case?

If yes, this would be convenient.

inner1d(x, weights)   with weights either (n,) or ()

if weights == 1:
    return x.sum()
else:
    return inner1d(x, weights)

Josef


 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

  A small wart in this API is the meaning of
 
shuffle(a, independent=False, axis=None)
 
  It could be argued that the correct behavior is to leave the
  array unchanged. (The current behavior can be interpreted as
  shuffling a 1-d sequence of monolithic blobs; the axis argument
  specifies which axis of the array corresponds to the
  sequence index.  Then `axis=None` means the argument is
  a single monolithic blob, so there is nothing to shuffle.)
  Or an error could be raised.
 
  What do you think?

 It seems to me a perfectly good reason to have two methods instead of
 one. I can't imagine when I wouldn't be using a literal True or False
 for this, so it really should be two different methods.



 I agree, and my first inclination was to propose a different method (and I
 had the bikeshedding conversation with myself about the name: disarrange,
 scramble, disorder, randomize, ashuffle, some other variation of the
 word shuffle, ...), but I figured the first thing folks would say is Why
 not just add options to shuffle?  So, choose your battles and all that.

 What do other folks think of making a separate method?

I'm not a fan of many similar functions.

What's the difference between permute, shuffle and scramble?
And how do I find or remember which is which?





 That said, I would just make the axis=None behavior the same for both
 methods. axis=None does *not* mean treat this like a single
 monolithic blob in any of the axis=-having methods; it means flatten
 the array and do the operation on the single flattened axis. I think
 the latter behavior is a reasonable interpretation of axis=None for
 both methods.



 Sounds good to me.

+1 (since all the arguments have already been given)


Josef
- Why does sort treat columns independently instead of sorting rows?
- because there is lexsort
- Oh, lexsort, I haven't thought about it in 5 years. It's not even next
to sort in the pop-up code completion



 Warren




 --
 Robert Kern
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 11:33 AM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Sun, Oct 12, 2014 at 11:20 AM, josef.p...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com
  wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead of
  one. I can't imagine when I wouldn't be using a literal True or False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method (and
  I
  had the bikeshedding conversation with myself about the name:
  disarrange,
  scramble, disorder, randomize, ashuffle, some other variation of
  the
  word shuffle, ...), but I figured the first thing folks would say is
  Why
  not just add options to shuffle?  So, choose your battles and all that.
 
  What do other folks think of making a separate method?

 I'm not a fan of many similar functions.

 What's the difference between permute, shuffle and scramble?



 The difference between `shuffle` and the new method being proposed is
 explained in the first email in this thread.
 `np.random.permutation` with an array argument returns a shuffled copy of
 the array; it does not modify its argument. (It should also get an `axis`
 argument when `shuffle` gets an `axis` argument.)


 And how do I find or remember which is which?



 You could start with `doc(np.random)` (or `np.random?` in ipython).

If you have to check the docstring each time, then there is something wrong.
In my opinion, a docstring should only need to be read once.

It's like a Windows program where the GUI menus are not **self-explanatory**.

What did Save-As do?

Josef



 Warren





 
 
 
  That said, I would just make the axis=None behavior the same for both
  methods. axis=None does *not* mean treat this like a single
  monolithic blob in any of the axis=-having methods; it means flatten
  the array and do the operation on the single flattened axis. I think
  the latter behavior is a reasonable interpretation of axis=None for
  both methods.
 
 
 
  Sounds good to me.

 +1 (since all the arguments have already been given)


 Josef
 - Why does sort treat columns independently instead of sorting rows?
 - because there is lexsort
 - Oh, lexsort, I haven't thought about it in 5 years. It's not even next
 to sort in the pop up code completion


 
  Warren
 
 
 
 
  --
  Robert Kern
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-12 Thread josef.pktd
On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:

 I created an issue on github for an enhancement
 to numpy.random.shuffle:
 https://github.com/numpy/numpy/issues/5173
 I'd like to get some feedback on the idea.

 Currently, `shuffle` shuffles the first dimension of an array
 in-place.  For example, shuffling a 2D array shuffles the rows:

 In [227]: a
 Out[227]:
 array([[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8],
[ 9, 10, 11]])

 In [228]: np.random.shuffle(a)

 In [229]: a
 Out[229]:
 array([[ 0,  1,  2],
[ 9, 10, 11],
[ 3,  4,  5],
[ 6,  7,  8]])


 To add an axis keyword, we could (in effect) apply `shuffle` to
 `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffles
 the columns:

 In [232]: a = np.arange(15).reshape(3,5)

 In [233]: a
 Out[233]:
 array([[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14]])

 In [234]: axis = 1

 In [235]: np.random.shuffle(a.swapaxes(axis, 0))

 In [236]: a
 Out[236]:
 array([[ 3,  2,  4,  0,  1],
[ 8,  7,  9,  5,  6],
[13, 12, 14, 10, 11]])

 So that's the first part--adding an `axis` keyword.

 The other part of the enhancement request is to add a shuffle
 behavior that shuffles the 1-d slices *independently*.  That is,
 for a 2-d array, shuffling with `axis=0` would apply a different
 shuffle to each column.  In the github issue, I defined a
 function called `disarrange` that implements this behavior:

 In [240]: a
 Out[240]:
 array([[ 0,  1,  2],
[ 3,  4,  5],
[ 6,  7,  8],
[ 9, 10, 11],
[12, 13, 14]])

 In [241]: disarrange(a, axis=0)

 In [242]: a
 Out[242]:
 array([[ 6,  1,  2],
[ 3, 13, 14],
[ 9, 10,  5],
[12,  7,  8],
[ 0,  4, 11]])

 Note that each column has been shuffled independently.

 This behavior is analogous to how `sort` handles the `axis`
 keyword.  `sort` sorts the 1-d slices along the given axis
 independently.

 In the github issue, I suggested the following signature
 for `shuffle` (but I'm not too fond of the name `independent`):

   def shuffle(a, independent=False, axis=0)

 If `independent` is False, the current behavior of `shuffle`
 is used.  If `independent` is True, each 1-d slice is shuffled
 independently (in the same way that `sort` sorts each 1-d
 slice).

 Like most functions that take an `axis` argument, `axis=None`
 means to shuffle the flattened array.  With `independent=True`,
 it would act like `np.random.shuffle(a.flat)`, e.g.

 In [247]: a
 Out[247]:
 array([[ 0,  1,  2,  3,  4],
[ 5,  6,  7,  8,  9],
[10, 11, 12, 13, 14]])

 In [248]: np.random.shuffle(a.flat)

 In [249]: a
 Out[249]:
 array([[ 0, 14,  9,  1, 13],
[ 2,  8,  5,  3,  4],
[ 6, 10,  7, 12, 11]])


 A small wart in this API is the meaning of

   shuffle(a, independent=False, axis=None)

 It could be argued that the correct behavior is to leave the
 array unchanged. (The current behavior can be interpreted as
 shuffling a 1-d sequence of monolithic blobs; the axis argument
 specifies which axis of the array corresponds to the
 sequence index.  Then `axis=None` means the argument is
 a single monolithic blob, so there is nothing to shuffle.)
 Or an error could be raised.

 What do you think?

 Warren




 It is clear from the comments so far that, when `axis` is None, the result
 should be a shuffle of all the elements in the array, for both methods of
 shuffling (whether implemented as a new method or with a boolean argument to
 `shuffle`).  Forget I ever suggested doing nothing or raising an error. :)

 Josef's comment reminded me that `numpy.random.permutation`

which kind of proves my point

I sometimes have problems finding `shuffle` because I want a function
that does permutation.

Josef

returns a
 shuffled copy of the array (when its argument is an array).  This function
 should also get an `axis` argument.  `permutation` shuffles the same way
 `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy.
 If a new method is added for the new shuffling style, then it would be
 consistent to also add a new method that uses the new shuffling style and
 returns a copy of the shuffled array.   Then we would then have four
 methods:

                         In-place      Copy
 Current shuffle style   shuffle       permutation
 New shuffle style       (name TBD)    (name TBD)

 (All of them will have an `axis` argument.)

 I suspect this will make some folks prefer the approach of adding a boolean
 argument to `shuffle` and `permutation`.

 Warren


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org

Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread josef.pktd
On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble',
  but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
 [...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

I like `permute`. That's the one term I'm looking for first.

If np.permute does some kind of deterministic permutation or pivoting,
then I wouldn't find it confusing if np.random.permute does random
permutation.

(I definitely don't like scrambled, sounds like eggs or cable TV that
needs to be unscrambled.)

Josef



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-17 Thread josef.pktd
On Thu, Oct 16, 2014 at 10:50 PM, Nathaniel Smith n...@pobox.com wrote:
 On Fri, Oct 17, 2014 at 2:35 AM,  josef.p...@gmail.com wrote:
 On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  scramble + scrambled shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  shuffle + shuffled to do what shuffle, permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d cards, and then shuffle
  those cards).
 
  permuted remains indefinitely, with the docstring: Deprecated alias
  for 'shuffled'.
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble',
  but that's a second-order decision for the API.)

 I hesitate to use names like randomize because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say it randomizes the
 array. But if told that the random module has a function called
 randomize, then that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original disarrange), but
 randomize seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, permute and permuted
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is permutation.)
 [...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

 I like `permute`. That's the one term I'm looking for first.

 If np.permute does some kind of deterministic permutation or pivoting,
 then I wouldn't find it confusing if np.random.permute does random
 permutation.

 Yeah, but:

 from ... import permute
 # 500 lines later
 def foo(...):
 permute(...)  # what the heck is this

 It definitely *can* be confusing; basically everything else in
 np.random has a name that suggests randomness even without seeing the
 full path.

I usually/always avoid importing names from random into the module namespace

np.random.xxx

from numpy.random import power
power(...)

>>> power(5, 3)
array([ 0.93771162,  0.96180884,  0.80191961])

???

and f and beta and gamma, ...

>>> bytes(10)
'\xa3\xf0%\x88\x11\xda\x0e\x81\x0c\x8e'
>>> bytes(5)
'\xb0B\x8e\xa1\x80'



 It's not a huge deal, though.

 (I definitely don't like scrambled, sounds like eggs or cable TV that
 needs to be unscrambled.)

 I vote that in this kind of bikeshed we try to restrict ourselves to
 arguments that we can at least pretend are motivated by some
 technical/UX concern ;-). (I guess unscrambling eggs would be
 technically impressive tho ;-))

Ignoring the eggs, it still sounds like a cheap encryption and is a
word I would never look for when looking for something to implement a
permutation test.

Josef



 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Choosing between NumPy and SciPy functions

2014-10-27 Thread josef.pktd
On Mon, Oct 27, 2014 at 2:24 PM, Eelco Hoogendoorn 
hoogendoorn.ee...@gmail.com wrote:

 The same occurred to me when reading that question. My personal opinion is
 that such functionality should be deprecated from numpy. I don't know who
 said this, but it really stuck with me: but the power of numpy is first and
 foremost in it being a fantastic interface, not in being a library.

 There is nothing more annoying than every project having its own array
 type. The fact that the whole scientific python stack can so seamlessly
 communicate is where all good things begin.

 In my opinion, that is what numpy should focus on; basic data structures,
 and tools for manipulating them. Linear algebra is way too high level for
 numpy imo, and used by only a small subsets of its 'matlab-like' users.

 When I get serious about linear algebra or ffts or what have you, I'd
 rather import an extra module that wraps a specific library.


We are not always getting serious about linalg; a quick call to pinv
or qr or matrix_rank or similar doesn't necessarily mean we need a linalg
library with all the advanced options.

@  matrix operations and linear algebra are basic stuff.




 On Mon, Oct 27, 2014 at 2:26 PM, D. Michael McFarland dm...@dmmcf.net
 wrote:

 A recent post raised a question about differences in results obtained
 with numpy.linalg.eigh() and scipy.linalg.eigh(), documented at

 http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eigh.html#numpy.linalg.eigh
 and

 http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html#scipy.linalg.eigh
 ,
 respectively.  It is clear that these functions address different
 mathematical problems (among other things, the SciPy routine can solve
 the generalized as well as standard eigenproblems); I am not concerned
 here with numerical differences in the results for problems both should
 be able to solve (the author of the original post received useful
 replies in that thread).

 What I would like to ask about is the situation this illustrates, where
 both NumPy and SciPy provide similar functionality (sometimes identical,
 to judge by the documentation).  Is there some guidance on which is to
 be preferred?  I could argue that using only NumPy when possible avoids
 unnecessary dependence on SciPy in some code, or that using SciPy
 consistently makes for a single interface and so is less error prone.
 Is there a rule of thumb for cases where SciPy names shadow NumPy names?

 I've used Python for a long time, but have only recently returned to
 doing serious numerical work with it.  The tools are very much improved,
 but sometimes, like now, I feel I'm missing the obvious.  I would
 appreciate pointers to any relevant documentation, or just a summary of
 conventional wisdom on the topic.



Just as opinion as user:

Most of the time I don't care and treat this just as different versions.
For example in the linalg case, I use by default numpy.linalg and switch to
scipy if I need the extras.
pinv is the only one that I ever seriously compared.

Some details are nicer: np.linalg.qr(x, mode='r') returns the reduced
matrix instead of the full matrix that scipy.linalg returns.
np.linalg.pinv is faster but maybe slightly less accurate (or has defaults
that make it less accurate in corner cases). scipy often has more overhead
(and an isfinite check by default).

I just checked, I didn't even know scipy.linalg also has an `inv`. One of
my arguments for np.linalg would have been that it's easy to switch between
inv and pinv.
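(What I mean by easy to switch, schematically -- `ols_params` is a made-up
helper, not how statsmodels actually does it:)

import numpy as np

def ols_params(x, y, use_pinv=False):
    # same call site, just swap inv <-> pinv within np.linalg
    invert = np.linalg.pinv if use_pinv else np.linalg.inv
    xtx_inv = invert(np.dot(x.T, x))
    return np.dot(xtx_inv, np.dot(x.T, y))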

For fft I use mostly scipy, IIRC.   (scipy's fft imports numpy's fft,
partially?)


Essentially, I don't care most of the time that there are different ways of
doing essentially the same thing, but some better information about the
differences would be useful.

Josef




 Regards,
 Michael
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Choosing between NumPy and SciPy functions

2014-10-27 Thread josef.pktd
On Mon, Oct 27, 2014 at 10:50 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 josef.p...@gmail.com wrote:

  For fft I use mostly scipy, IIRC.   (scipy's fft imports numpy's fft,
  partially?)

 No. SciPy uses the Fortran library FFTPACK (wrapped with f2py) and NumPy
 uses a smaller C library called fftpack_lite. Algorithmically they are
 similar, but fftpack_lite has fewer features (e.g. no DCT). scipy.fftpack
 does not import numpy.fft. Neither of these libraries are very fast, but
 usually they are fast enough for practical purposes. If we really need a
 kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or
 Apple's Accelerate Framework, or even use tools like CUDA or OpenCL to run
 the FFT on the GPU. But using such tools takes more coding (and reading API
 specifications) than the convenience of just using the FFTs already in
 NumPy or SciPy. So if you count in your own time as well, it might not be
 that FFTW or MKL are the faster FFTs.



Ok, I didn't remember correctly.

I didn't use much fft recently, I never used DCT. My favorite fft
function is fftconvolve.
https://github.com/scipy/scipy/blob/e758c482efb8829685dcf494bdf71eeca3dd77f0/scipy/signal/signaltools.py#L13
   doesn't seem to mind mixing numpy and scipy  (quick github search)
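
For illustration, a minimal fftconvolve usage sketch (my own example, not code
from signaltools): a simple moving-average smoother, freely mixing numpy and scipy.

import numpy as np
from scipy.signal import fftconvolve

x = np.random.randn(1000)
window = np.ones(25) / 25.0            # moving-average kernel
smoothed = fftconvolve(x, window, mode='same')
print(smoothed.shape)                  # (1000,)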


It's sometimes useful to have simplified functions that are good enough
where we don't have to figure out all the extras that the docstring of the
fancy version is mentioning.

Josef




 Sturla



Re: [Numpy-discussion] Choosing between NumPy and SciPy functions

2014-10-27 Thread josef.pktd
On Mon, Oct 27, 2014 at 11:31 PM, josef.p...@gmail.com wrote:



 On Mon, Oct 27, 2014 at 10:50 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:

 josef.p...@gmail.com wrote:

  For fft I use mostly scipy, IIRC.   (scipy's fft imports numpy's fft,
  partially?)

 No. SciPy uses the Fortran library FFTPACK (wrapped with f2py) and NumPy
 uses a smaller C library called fftpack_lite. Algorithmically they are
 similar, but fftpack_lite has fewer features (e.g. no DCT). scipy.fftpack
 does not import numpy.fft. Neither of these libraries are very fast, but
 usually they are fast enough for practical purposes. If we really need a
 kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or
 Apple's Accelerate Framework, or even use tools like CUDA or OpenCL to run
 the FFT on the GPU. But using such tools takes more coding (and reading
 API
 specifications) than the convenience of just using the FFTs already in
 NumPy or SciPy. So if you count in your own time as well, it might not be
 that FFTW or MKL are the faster FFTs.



 Ok, I didn't remember correctly.

 I didn't use much fft recently, I never used DCT. My favorite fft
 function is fftconvolve.

 https://github.com/scipy/scipy/blob/e758c482efb8829685dcf494bdf71eeca3dd77f0/scipy/signal/signaltools.py#L13
doesn't seem to mind mixing numpy and scipy  (quick github search)


 It's sometimes useful to have simplified functions that are good enough
 where we don't have to figure out all the extras that the docstring of the
 fancy version is mentioning.


I take this back (even if it's true),
because IMO the defaults should work, and I have a tendency to pile on
options in my code that are intended for experts.

Josef





 Josef




 Sturla



Re: [Numpy-discussion] Choosing between NumPy and SciPy functions

2014-10-31 Thread josef.pktd
On Fri, Oct 31, 2014 at 11:07 AM, Benjamin Root ben.r...@ou.edu wrote:

 Just to throw in my two cents here. I feel that sometimes, features are
 tried out first elsewhere (possibly in scipy) and then brought down into
 numpy after sufficient shakedown time. So, in some cases, I wonder if the
 numpy version is actually more refined than the scipy version? Of course,
 there is no way to know this from the documentation, which is a problem.
 Didn't scipy have nanmean() for a while before Numpy added it in version
 1.8?


That's true for several functions in scipy.stats. And we have more
deprecations in scipy.stats in favor of numpy pending.

part of polynomials is another case, kind of.

But I don't remember any other ones in my time.

(There is also a reverse extension for scipy binned_stats based on the
np.histogram code.)

Josef





 Ben Root

 On Fri, Oct 31, 2014 at 10:26 AM, D. Michael McFarland dm...@dmmcf.net
 wrote:

 Stefan van der Walt ste...@sun.ac.za writes:

  On 2014-10-27 15:26:58, D. Michael McFarland dm...@dmmcf.net wrote:
  What I would like to ask about is the situation this illustrates, where
  both NumPy and SciPy provide similar functionality (sometimes
 identical,
  to judge by the documentation).  Is there some guidance on which is to
  be preferred?
 
  I'm not sure if you've received an answer to your question so far. My
  advice: use the SciPy functions.  SciPy is often built on more extensive
  Fortran libraries not available during NumPy compilation, and I am not
  aware of any cases where a function in NumPy is faster or more extensive
  than the equivalent in SciPy.

 The whole thread has been interesting reading (now that I've finally
 come back to it...got busy for a few days), but this is the sort of
 answer I was hoping for.  Thank you.

 Best,
 Michael


Re: [Numpy-discussion] simple reduction question

2014-12-24 Thread josef.pktd
On Wed, Dec 24, 2014 at 10:30 AM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 24.12.2014 16:25, Neal Becker wrote:
  What would be the most efficient way to compute:
 
  c[j] = \sum_i (a[i] * b[i,j])
 
  where a[i] is a 1-d vector, b[i,j] is a 2-d array?
 
  This seems to be one way:
 
  import numpy as np
  a = np.arange (3)
  b = np.arange (12).reshape (3,4)
  c = np.dot (a, b).sum()
 
  but np.dot returns a vector, which then needs further reduction.  Don't
 know if
  there's a better way.
 

 the formula maps nice to einsum:

 np.einsum('i,ij->', a, b)

 should also be reasonably efficient, but that probably depends on your
 BLAS library and the sizes of the arrays.


hijacking a bit since I was just trying to replicate various
multidimensional dot products with einsum

Are the older timings for einsum still a useful guide?

e.g.
http://stackoverflow.com/questions/14758283/is-there-a-numpy-scipy-dot-product-calculating-only-the-diagonal-entries-of-the

I didn't pay a lot of attention to the einsum changes, since I haven't
really used it yet.

Josef
X V X.T but vectorized
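
For illustration, a small sketch (my own example, not from the thread) of both
patterns: the full reduction from the original question, and the vectorized
x V x.T quadratic forms for a stack of row vectors.

import numpy as np

a = np.arange(3.0)
b = np.arange(12.0).reshape(3, 4)

# full reduction sum_ij a[i] * b[i, j], same as np.dot(a, b).sum()
print(np.allclose(np.einsum('i,ij->', a, b), np.dot(a, b).sum()))   # True

# row-wise quadratic forms q[k] = x_k V x_k.T without a Python loop
X = np.random.randn(5, 3)
V = np.eye(3)
q = np.einsum('ki,ij,kj->k', X, V, X)
print(np.allclose(q, np.diag(X.dot(V).dot(X.T))))                   # True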





Re: [Numpy-discussion] Any interest in a 'heaviside' ufunc?

2015-02-03 Thread josef.pktd
On Wed, Feb 4, 2015 at 12:18 AM, Warren Weckesser 
warren.weckes...@gmail.com wrote:



 On Tue, Feb 3, 2015 at 11:14 PM, Sturla Molden sturla.mol...@gmail.com
 wrote:

 Warren Weckesser warren.weckes...@gmail.com wrote:

                    0    if x < 0
   heaviside(x) =   0.5  if x == 0
                    1    if x > 0
 

 This is not correct. The discrete form of the Heaviside step function has
 the value 1 for x == 0.

 heaviside = lambda x : 1 - (x < 0).astype(int)





 By discrete form, do you mean discrete time (i.e. a function defined on
 the integers)?  Then I agree, the discrete time unit step function is
 defined as

 u(k) = 0   if k < 0
        1   if k >= 0

 for integer k.

 The domain of the proposed Heaviside function is not discrete; it is
 defined for arbitrary floating point (real) arguments.  In this case, the
 choice heaviside(0) = 0.5 is a common convention. See for example,

 * http://mathworld.wolfram.com/HeavisideStepFunction.html
 * http://www.mathworks.com/help/symbolic/heaviside.html
 * http://en.wikipedia.org/wiki/Heaviside_step_function, in particular
 http://en.wikipedia.org/wiki/Heaviside_step_function#Zero_argument

 Other common conventions are the right-continuous version that you prefer
 (heaviside(0) = 1), or the left-continuous version (heaviside(0) = 0).

 We can accommodate the alternatives with an additional argument that sets
 the value at 0:

 heaviside(x, zero_value=0.5)


What's the use case for a heaviside function?

I don't think I have needed one since I was using mathematica or maple.

(x > 0).astype(...)   (x >= 0).astype(...)
np.sign(x, dtype)
look useful enough for most cases, or not?

(What I wish numpy had is a conditional `place` that doesn't calculate all
the values. (I think there is a helper function in scipy.stats for that.))
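
Just to illustrate, a sketch: the `heaviside` helper and its `zero_value`
argument are the proposal above, not an existing numpy function, and
np.piecewise is shown because it evaluates each piece only on the matching
subset of values.

import numpy as np

def heaviside(x, zero_value=0.5):
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, zero_value))

x = np.array([-2.0, 0.0, 3.0])
print(heaviside(x))                                              # [ 0.   0.5  1. ]
print(np.piecewise(x, [x < 0, x == 0, x > 0], [0.0, 0.5, 1.0]))  # same result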

Josef






 Warren




 Sturla



Re: [Numpy-discussion] suggestion: improve text of failing test

2015-02-05 Thread josef.pktd
On Thu, Feb 5, 2015 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:

 On 5 Feb 2015 12:15, josef.p...@gmail.com wrote:
 
  The assert_allclose text is not precise enough to be helpful to fix a
 test failure that cannot be replicated on every machine, and we cannot just
 quickly grab --pdb-failures.
 
  By how much do I have to lower the precision to make it pass on this
 continuous integration machine?
 
 
  assert_allclose(he, hefd, rtol=5e-10)
File C:\Python27\envs\py3\lib\site-packages\numpy\testing\utils.py,
 line 1297, in assert_allclose
  verbose=verbose, header=header)
File C:\Python27\envs\py3\lib\site-packages\numpy\testing\utils.py,
 line 665, in assert_array_compare
  raise AssertionError(msg)
  AssertionError:
  Not equal to tolerance rtol=5e-10, atol=0
 
  (mismatch 100.0%)
   x: array([[ -2.965667e+01,  -1.988865e+02,  -2.370194e+00,
  -1.003654e+01],
 [ -1.988865e+02,  -1.383377e+03,  -1.592292e+01,  -6.800266e+01],
 [ -2.370194e+00,  -1.592292e+01,  -8.301699e-01,  -8.301699e-01],
 [ -1.003654e+01,  -6.800266e+01,  -8.301699e-01,  -3.449885e+00]])
   y: array([[ -2.965667e+01,  -1.988865e+02,  -2.370194e+00,
  -1.003654e+01],
 [ -1.988865e+02,  -1.383377e+03,  -1.592292e+01,  -6.800266e+01],
 [ -2.370194e+00,  -1.592292e+01,  -8.301699e-01,  -8.301699e-01],
 [ -1.003654e+01,  -6.800266e+01,  -8.301699e-01,  -3.449885e+00]])
 
 
  the suggestion is to add rtol and atol to the mismatch summary, so we
 can see if it's just a precision issue or something serious
 
  rtol = np.max(np.abs(x / y - 1))
  atol = np.max(np.abs(x - y))
 
  (mismatch 100.0%  rtol=xxx  atol=xxx)

 So basically just printing what rtol and/or atol would have to be to make
 the test pass? Sounds useful to me. (There is a bit of an infelicity in
 that if you're using both atol and rtol in the same test then there's no
 easy way to suggest how to fix both simultaneously, but I'm not sure how to
 fix that. Maybe we should also print max(abs(x[y == 0]))?)


I usually check the rtol and atol as above in pdb on failure. Most of the
time it's enough information to figure out how to twist the numbers. There
are only a few cases where I'm fine tuning both rtol and atol at the same
time. I guess there is the sum of the tol from the definition of allclose.
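
Something like this hypothetical helper (a sketch, not part of numpy.testing)
is essentially what I compute in pdb:

import numpy as np

def needed_tolerances(x, y):
    # report the rtol and atol that would have made the comparison pass
    x, y = np.asarray(x, float), np.asarray(y, float)
    atol = np.max(np.abs(x - y))
    nonzero = y != 0
    rtol = np.max(np.abs(x[nonzero] / y[nonzero] - 1)) if nonzero.any() else np.inf
    return rtol, atol

x = np.array([1.0, 2.0, 3.0])
y = x * (1 + 2e-9)
print('rtol=%g  atol=%g' % needed_tolerances(x, y))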

We don't have many cases with y == 0 mixed together with large numbers,
because our reference numbers usually also have numerical noise.

One point is also to just make the test output more informative, to see if
the test machine is just a bit off even if the mismatch=100%.


 Want to submit a pull request?


Not really, I'd rather stick to my corner and let someone else get on the
numpy contributor list :)
(header was suggestion not proposal)

Thanks,

Josef


 -n



Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-08 Thread josef.pktd
On Sun, Feb 8, 2015 at 4:08 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:
 I personally have used Octave and/or Numpy for several years now and never ever
 needed broadcasting.
 But since it is still there, there will be many users who need it, so there will
 be some use for it.

 Uhm, yeah, there is some use for it. I'm all for explicit over implicit, but
 personally current broadcasting rules have never bothered me, certainly not
 to the extent of justifying massive backwards compatibility violations. Take
 it from someone who relies on broadcasting for every other line of code.


 On Sun, Feb 8, 2015 at 10:03 PM, Stefan Reiterer dom...@gmx.net wrote:

 Hi!

 As shortly discussed on github:
 https://github.com/numpy/numpy/issues/5541

 I personally think that silent broadcasting is not a good thing. I recently
 had a lot of trouble with row and column vectors which got broadcast
 together, although it was more annoying than useful, especially since I had
 to search deep down into the code to find out that the problem was nothing
 else than broadcasting...

 I personally have used Octave and/or Numpy for several years now and never ever
 needed broadcasting.
 But since it is still there, there will be many users who need it, so there
 will be some use for it.

 So I suggest that the best would be to throw warnings when arrays get
 broadcast, like Octave does. Python warnings can be caught and handled;
 that would be a great benefit.

 Another idea would be to provide warning levels for broadcasting, e.g.
 0 = Never, 1 = Warn once, 2 = Warn always, 3 = Forbid aka throw exception,
 with 0 as default.
 This would avoid breaking other code, and give the user some control over
 broadcasting.

 Kind regards,
 Stefan


Numpy broadcasting is the greatest feature of numpy !!!

It just takes a bit of getting used to coming from another matrix language.


Josef
what 3!?







Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-08 Thread josef.pktd
On Sun, Feb 8, 2015 at 4:56 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Sun, Feb 8, 2015 at 1:39 PM, Simon Wood sgwoo...@gmail.com wrote:


 On Sun, Feb 8, 2015 at 4:24 PM, Stefan Reiterer dom...@gmx.net wrote:

 I don't think this is a good comparison, especially since broadcasting is
 a feature not a necessity ...
 It's more like turning off/on driving assistance.

 And as already mentioned: other matrix languages also allow it, but they
 warn about it's usage.
 This has indeed it's merits.
 Gesendet: Sonntag, 08. Februar 2015 um 22:17 Uhr
 Von: Charles R Harris charlesr.har...@gmail.com
 An: Discussion of Numerical Python numpy-discussion@scipy.org
 Betreff: Re: [Numpy-discussion] Silent Broadcasting considered harmful


 On Sun, Feb 8, 2015 at 2:14 PM, Stefan Reiterer dom...@gmx.net wrote:

 Yeah I'm aware of that, that's the reason why I suggested a warning level
 as an alternative.
 Setting no warnings as default would avoid breaking existing code.
 Gesendet: Sonntag, 08. Februar 2015 um 22:08 Uhr
 Von: Eelco Hoogendoorn hoogendoorn.ee...@gmail.com
 An: Discussion of Numerical Python numpy-discussion@scipy.org
 Betreff: Re: [Numpy-discussion] Silent Broadcasting considered harmful
  I personally have used Octave and/or Numpy for several years now and never
  ever needed broadcasting.
 But since it is still there, there will be many users who need it, so there
 will be some use for it.

 Uhm, yeah, there is some use for it. I'm all for explicit over implicit,
 but personally current broadcasting rules have never bothered me, certainly
 not to the extent of justifying massive backwards compatibility violations.
 Take it from someone who relies on broadcasting for every other line of
 code.



 It's how numpy works. It would be like getting into your car and being
 warned that it has wheels.

 Chuck


 I agree, I do not think this is a good comparison. All cars have wheels,
 there are no surprises there. This is more like a car that decides to do
 something completely different from everything that you learned about in
 driving school.

 I find the broadcasting aspect of Numpy a turn off. If I go to add a 1x3
 vector to a 3x1 vector, I want the program to warn me or error out. I don't
 want it to do something under the covers that has no mathematical basis or
 definition. Also, Octave may provide a warning, but Matlab errors
 out...Matrix dimensions must agree. Which they must, at least in my world.

 In a previous life, many of us were very serious users of Matlab,
 myself included.

 Matlab / Octave have a model of the array as being a matrix, but numpy
 does not have this model.   There is a Matrix class that implements
 this model, but usually experienced numpy users either never use this,
 or stop using it.

 I can only say - subjectively I know - that I did not personally
 suffer from this when I switched to numpy from Matlab, partly because
 I was fully aware that I was going to have to change the way I thought
 about arrays, for various reasons.  After a short while getting used
 to it, broadcasting seemed like a huge win.  I guess the fact that
 other languages have adopted it means that others have had the same
 experience.

 So, numpy is not a straight replacement of Matlab, in terms of design.

 To pursue the analogy, you have learned to drive an automatic car.
 Numpy is a stick-shift car.  There are good reasons to prefer a
 stick-shift, but it does mean that someone trained on an automatic is
 bound to feel that a stick-shift is uncomfortable for a while.


I think the analogy is Python printing, at startup and all the time, a warning:
"We use indentation, not braces, brackets or `end` to indicate blocks of code."

Josef




 Best,

 Matthew


Re: [Numpy-discussion] Silent Broadcasting considered harmful

2015-02-08 Thread josef.pktd
On Sun, Feb 8, 2015 at 5:17 PM, Stefan Reiterer dom...@gmx.net wrote:
 Actually I have used numpy for several years now, and I love it.
 The reason that I think silent broadcasting of sums is bad
 comes simply from the fact that I had more trouble with it than it helped
 me.

 I won't stop using numpy because of that, but I think this behavior may
 backfire,
 and that's the reason I started this discussion. Till now the only way out of
 the misery
 is to make proper unit tests, and to be careful as hell with dimensions and
 shape checks.

I fully agree with the last part. We need a lot more checks and be a
lot more careful in numpy than in matlab. But that's a fundamental
difference between the array versus matrix approach.

For me the main behavior I had to adjust to was losing a dimension in
any reduce operation, mean, sum, ...

if x is 2d
x - x.mean(1)
we lose a dimension, and it doesn't broadcast in the right direction

x - x.mean(0)
perfect, no `repeat` needed, it just broadcasts the way we need.
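
A small sketch of the keepdims workaround (assuming numpy >= 1.7), which keeps
the reduced axis so the subtraction broadcasts as intended either way:

import numpy as np

x = np.arange(12.0).reshape(3, 4)

demeaned_rows = x - x.mean(1, keepdims=True)   # (3, 4) - (3, 1)
demeaned_cols = x - x.mean(0, keepdims=True)   # (3, 4) - (1, 4), same as x - x.mean(0)
print(demeaned_rows.shape, demeaned_cols.shape)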

Josef


 Providing optional warnings just would be an elegant way out of this.

 Cheers,
 Stefan
 Gesendet: Sonntag, 08. Februar 2015 um 22:56 Uhr
 Von: Matthew Brett matthew.br...@gmail.com

 An: Discussion of Numerical Python numpy-discussion@scipy.org
 Betreff: Re: [Numpy-discussion] Silent Broadcasting considered harmful
 Hi,

 On Sun, Feb 8, 2015 at 1:39 PM, Simon Wood sgwoo...@gmail.com wrote:


 On Sun, Feb 8, 2015 at 4:24 PM, Stefan Reiterer dom...@gmx.net wrote:

 I don't think this is a good comparison, especially since broadcasting is
 a feature not a necessity ...
 It's more like turning off/on driving assistance.

 And as already mentioned: other matrix languages also allow it, but they
 warn about it's usage.
 This has indeed it's merits.
 Gesendet: Sonntag, 08. Februar 2015 um 22:17 Uhr
 Von: Charles R Harris charlesr.har...@gmail.com
 An: Discussion of Numerical Python numpy-discussion@scipy.org
 Betreff: Re: [Numpy-discussion] Silent Broadcasting considered harmful


 On Sun, Feb 8, 2015 at 2:14 PM, Stefan Reiterer dom...@gmx.net wrote:

 Yeah I'm aware of that, that's the reason why I suggested a warning
 level
 as an alternative.
 Setting no warnings as default would avoid breaking existing code.
 Gesendet: Sonntag, 08. Februar 2015 um 22:08 Uhr
 Von: Eelco Hoogendoorn hoogendoorn.ee...@gmail.com
 An: Discussion of Numerical Python numpy-discussion@scipy.org
 Betreff: Re: [Numpy-discussion] Silent Broadcasting considered harmful
  I personally have used Octave and/or Numpy for several years now and never
  ever needed broadcasting.
 But since it is still there, there will be many users who need it, so there
 will be some use for it.

 Uhm, yeah, there is some use for it. I'm all for explicit over implicit,
 but personally current broadcasting rules have never bothered me, certainly
 not to the extent of justifying massive backwards compatibility violations.
 Take it from someone who relies on broadcasting for every other line of code.



 It's how numpy works. It would be like getting into your car and being
 warned that it has wheels.

 Chuck


 I agree, I do not think this is a good comparison. All cars have wheels,
 there are no surprises there. This is more like a car that decides to do
 something completely different from everything that you learned about in
 driving school.

 I find the broadcasting aspect of Numpy a turn off. If I go to add a 1x3
 vector to a 3x1 vector, I want the program to warn me or error out. I
 don't
 want it to do something under the covers that has no mathematical basis or
 definition. Also, Octave may provide a warning, but Matlab errors
 out...Matrix dimensions must agree. Which they must, at least in my
 world.

 In a previous life, many of us were very serious users of Matlab,
 myself included.

 Matlab / Octave have a model of the array as being a matrix, but numpy
 does not have this model. There is a Matrix class that implements
 this model, but usually experienced numpy users either never use this,
 or stop using it.

 I can only say - subjectively I know - that I did not personally
 suffer from this when I switched to numpy from Matlab, partly because
 I was fully aware that I was going to have to change the way I thought
 about arrays, for various reasons. After a short while getting used
 to it, broadcasting seemed like a huge win. I guess the fact that
 other languages have adopted it means that others have had the same
 experience.

 So, numpy is not a straight replacement of Matlab, in terms of design.

 To pursue the analogy, you have learned to drive an automatic car.
 Numpy is a stick-shift car. There are good reasons to prefer a
 stick-shift, but it does mean that someone trained on an automatic is
 bound to feel that a stick-shift is uncomfortable for a while.

 Best,

 Matthew
 

Re: [Numpy-discussion] The future of ndarray.diagonal()

2015-01-05 Thread josef.pktd
On Mon, Jan 5, 2015 at 11:13 AM, Alan G Isaac alan.is...@gmail.com wrote:

 On 1/5/2015 10:48 AM, josef.p...@gmail.com wrote:
  Dtypes are a mess (in terms of code compatibility). Matlab is much
 nicer, it's all just doubles.


 1. Thank goodness for dtypes.
 2. http://www.mathworks.com/help/matlab/numeric-types.html
 3. After translating Matlab code to much nicer NumPy,
 I cannot find any way to say MATLAB is nicer.


Maybe it's my selection bias in matlab, I only wrote or read code in matlab
that used exclusively double.

Of course they are a necessary and great feature.

However, life and code would be simpler if we could just do
x = np.asarray(x, float) or even  x = np.array(x, float)
at the beginning of every function, instead of worrying why a user doesn't
have float and trying to accommodate that choice.

https://github.com/statsmodels/statsmodels/search?q=dtypetype=Issuesutf8=%E2%9C%93

AFAIK, matlab and R still have copy on write, so they don't have to worry
about inplace modifications.

5 lines of code to implement an algorithm, and 50 lines of code for input
checking.


My response was to the issue of code as algorithmic documentation:

There are packages or code supplements to books that come with the
disclaimer that the code is written for educational purposes, to help
understand the algorithm, but is not designed for efficiency or performance
or generality.

The more powerful the language and the fancier the code, the larger is
the maintenance and wrapping work.

another example:

a dot product of a float/double 2d array is independent of any numpy
version, and it will produce the same result in numpy 19.0 (except for
different machine precision rounding errors)

a dot product of an array (without dtype and shape restriction) might be
anything and change within a few numpy versions.

Josef




 Cheers,
 Alan


Re: [Numpy-discussion] Matrix Class

2015-02-14 Thread josef.pktd
On Sat, Feb 14, 2015 at 12:05 PM, cjw c...@ncf.ca wrote:

 On 14-Feb-15 11:35 AM, josef.p...@gmail.com wrote:

 On Wed, Feb 11, 2015 at 4:18 PM, Ryan Nelson rnelsonc...@gmail.com
 wrote:

 Colin,

 I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test
 conda
 environment with Python2.7 and Numpy 1.7.0, and I get the same:

 
 Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014, 16:57:52)
 [MSC v
 .1500 64 bit (AMD64)]
 Type copyright, credits or license for more information.

 IPython 2.3.1 -- An enhanced Interactive Python.
 Anaconda is brought to you by Continuum Analytics.
 Please check out: http://continuum.io/thanks and https://binstar.org
 ? - Introduction and overview of IPython's features.
 %quickref - Quick reference.
 help  - Python's own help system.
 object?   - Details about 'object', use 'object??' for extra details.

 In [1]: import numpy as np

 In [2]: np.__version__
 Out[2]: '1.7.0'

 In [3]: np.mat([4,'5',6])
 Out[3]:
 matrix([['4', '5', '6']],
 dtype='|S1')

 In [4]: np.mat([4,'5',6], dtype=int)
 Out[4]: matrix([[4, 5, 6]])
 ###

 As to your comment about coordinating with Statsmodels, you should see
 the
 links in the thread that Alan posted:
 http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
 http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
 Josef's comments at the time seem to echo the issues the devs (and
 others)
 have with the matrix class. Maybe things have changed with Statsmodels.

 Not changed, we have a strict policy against using np.matrix.

 generic efficient versions for linear operators, kronecker or sparse
 block matrix style operations would be useful, but I would use array
 semantics, similar to using dot or linalg functions on ndarrays.

 Josef
 (long reply canceled because I'm writing too much that might only be
 of tangential interest or has been in some of the matrix discussion
 before.)

 Josef,

 Many thanks.  I have gained the impression that there is some antipathy to
 np.matrix; perhaps this is because, as others have suggested, the array
 doesn't provide an appropriate framework.

It's not directly antipathy, it's cost-benefit analysis.

np.matrix has few advantages, but makes reading and maintaining code
much more difficult.
Having to watch out for multiplication `*` is a lot of extra work.
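
A tiny illustration (my example, not statsmodels code) of why `*` needs
watching: the same expression is elementwise for arrays but a matrix product
for np.matrix.

import numpy as np

a = np.arange(4).reshape(2, 2)
m = np.matrix(a)

print(a * a)   # elementwise: [[0 1], [4 9]]
print(m * m)   # matrix product: [[ 2  3], [ 6 11]]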

Checking shapes and fixing bugs with unexpected dtypes is also a lot
of work, but we have large benefits.
For a long time the policy in statsmodels was to keep pandas out of
the core of functions (i.e. out of the actual calculations) and
restrict it to inputs and returns. However, pandas is becoming more
popular and can do some things much better than plain numpy, so it is
slowly moving inside some of our core calculations.
It's still an easy source of bugs, but we do gain something.

Benefits like these don't exist for np.matrix.


 Where are such policy decisions documented?  Numpy doesn't appear to have a
 BDFL.

In general it's a mix of mailing list discussions and discussion in
issues and PRs.
I'm not directly involved in numpy and don't subscribe to the numpy's
github notifications.

For scipy (and partially for statsmodels): I think large parts of
policies for code and workflow are not explicitly specified, but are
more an understanding of maintainers and developers that can slowly
change over time, build up through spread out discussion as temporary
consensus (or without strong objections).
scipy has a hacking text file to describe some of it, but I haven't
read it in ages.

(long term changes compared to 6 years ago: required code review and
required test coverage.)

Josef



 I had read Alan's links back in February and now have note of them.

 Colin W.




 I know I mentioned Sage and SageMathCloud before. I'll just point out
 that
 there are folks that use this for real research problems, not just as a
 pedagogical tool. They have a Matrix/vector/column_matrix class that do
 what
 you were expecting from your problems posted above. Indeed below is a
 (truncated) cut and past from a Sage Worksheet. (See
 http://www.sagemath.org/doc/tutorial/tour_linalg.html)
 ##
 In : Matrix([1,'2',3])
 Error in lines 1-1
 Traceback (most recent call last):
 TypeError: unable to find a common ring for all elements

 In : Matrix([[1,2,3],[4,5]])
 ValueError: List of rows is not valid (rows are wrong types or lengths)

 In : vector([1,2,3])
 (1, 2, 3)

 In : column_matrix([1,2,3])
 [1]
 [2]
 [3]
 ##

 Large portions of the custom code and wrappers in Sage are written in
 Python. I don't think their Matrix object is a subclass of ndarray, so
 perhaps you could strip out the Matrix stuff from here to make a separate
 project with just the Matrix stuff, if you don't want to go through the
 Sage
 interface.


 On Wed, Feb 11, 2015 at 11:54 AM, cjw c...@ncf.ca wrote:


 On 11-Feb-15 10:21 AM, Ryan Nelson wrote:

 So:

 In [2]: np.mat([4,'5',6])
 Out[2]:
 

Re: [Numpy-discussion] Matrix Class

2015-02-14 Thread josef.pktd
On Sat, Feb 14, 2015 at 4:27 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Sat, Feb 14, 2015 at 12:36 PM, josef.p...@gmail.com wrote:

 On Sat, Feb 14, 2015 at 12:05 PM, cjw c...@ncf.ca wrote:
 
  On 14-Feb-15 11:35 AM, josef.p...@gmail.com wrote:
 
  On Wed, Feb 11, 2015 at 4:18 PM, Ryan Nelson rnelsonc...@gmail.com
  wrote:
 
  Colin,
 
  I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test
  conda
  environment with Python2.7 and Numpy 1.7.0, and I get the same:
 
  
  Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014,
  16:57:52)
  [MSC v
  .1500 64 bit (AMD64)]
  Type copyright, credits or license for more information.
 
  IPython 2.3.1 -- An enhanced Interactive Python.
  Anaconda is brought to you by Continuum Analytics.
  Please check out: http://continuum.io/thanks and https://binstar.org
  ? - Introduction and overview of IPython's features.
  %quickref - Quick reference.
  help  - Python's own help system.
  object?   - Details about 'object', use 'object??' for extra details.
 
  In [1]: import numpy as np
 
  In [2]: np.__version__
  Out[2]: '1.7.0'
 
  In [3]: np.mat([4,'5',6])
  Out[3]:
  matrix([['4', '5', '6']],
  dtype='|S1')
 
  In [4]: np.mat([4,'5',6], dtype=int)
  Out[4]: matrix([[4, 5, 6]])
  ###
 
  As to your comment about coordinating with Statsmodels, you should see
  the
  links in the thread that Alan posted:
  http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
  http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
  Josef's comments at the time seem to echo the issues the devs (and
  others)
  have with the matrix class. Maybe things have changed with
  Statsmodels.
 
  Not changed, we have a strict policy against using np.matrix.
 
  generic efficient versions for linear operators, kronecker or sparse
  block matrix style operations would be useful, but I would use array
  semantics, similar to using dot or linalg functions on ndarrays.
 
  Josef
  (long reply canceled because I'm writing too much that might only be
  of tangential interest or has been in some of the matrix discussion
  before.)
 
  Josef,
 
  Many thanks.  I have gained the impression that there is some antipathy
  to
  np.matrix, perhaps this is because, as others have suggested, the array
  doesn't provide an appropriate framework.

 It's not directly antipathy, it's cost-benefit analysis.

 np.matrix has few advantages, but makes reading and maintaining code
 much more difficult.
 Having to watch out for multiplication `*` is a lot of extra work.

 Checking shapes and fixing bugs with unexpected dtypes is also a lot
 of work, but we have large benefits.
 For a long time the policy in statsmodels was to keep pandas out of
 the core of functions (i.e. out of the actual calculations) and
 restrict it to inputs and returns. However, pandas is becoming more
 popular and can do some things much better than plain numpy, so it is
 slowly moving inside some of our core calculations.
 It's still an easy source of bugs, but we do gain something.


 Any bits of Pandas that might be good for numpy/scipy to steal?

I'm not a Pandas expert.
Some of it comes into statsmodels because we need the data handling
also inside a function, e.g. keeping track of labels, indices, and so
on. Another reason is that contributors are more familiar with
pandas's way of solving a problems, even if I suspect numpy would be
more efficient.

However, a recent change, replaces where I would have used np.unique
with pandas.factorize which is supposed to be faster.
https://github.com/statsmodels/statsmodels/pull/2213

Two or three years ago my numpy way of group handling (using
np.unique, bincount and similar) was still faster than the pandas
`apply` version, I'm not sure that's still true.
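
For illustration, the numpy-only pattern I mean (a sketch, not actual
statsmodels code): group means via np.unique and np.bincount.

import numpy as np

labels = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

uniques, inverse = np.unique(labels, return_inverse=True)
group_means = np.bincount(inverse, weights=values) / np.bincount(inverse)
print(dict(zip(uniques, group_means)))   # {'a': 3.33..., 'b': 3.5, 'c': 4.0}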


And to emphasize: all our heavy stuff especially the big models still
only have numpy and scipy inside (with the exception of one model
waiting in a PR).

Josef



 snip

 Chuck




[Numpy-discussion] suggestion: improve text of failing test

2015-02-05 Thread josef.pktd
The assert_allclose text is not precise enough to be helpful to fix a test
failure that cannot be replicated on every machine, and we cannot just
quickly grab --pdb-failures.

By how much do I have to lower the precision to make it pass on this
continuous integration machine?


assert_allclose(he, hefd, rtol=5e-10)
  File C:\Python27\envs\py3\lib\site-packages\numpy\testing\utils.py,
line 1297, in assert_allclose
verbose=verbose, header=header)
  File C:\Python27\envs\py3\lib\site-packages\numpy\testing\utils.py,
line 665, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=5e-10, atol=0

(mismatch 100.0%)
 x: array([[ -2.965667e+01,  -1.988865e+02,  -2.370194e+00,  -1.003654e+01],
   [ -1.988865e+02,  -1.383377e+03,  -1.592292e+01,  -6.800266e+01],
   [ -2.370194e+00,  -1.592292e+01,  -8.301699e-01,  -8.301699e-01],
   [ -1.003654e+01,  -6.800266e+01,  -8.301699e-01,  -3.449885e+00]])
 y: array([[ -2.965667e+01,  -1.988865e+02,  -2.370194e+00,  -1.003654e+01],
   [ -1.988865e+02,  -1.383377e+03,  -1.592292e+01,  -6.800266e+01],
   [ -2.370194e+00,  -1.592292e+01,  -8.301699e-01,  -8.301699e-01],
   [ -1.003654e+01,  -6.800266e+01,  -8.301699e-01,  -3.449885e+00]])


the suggestion is to add rtol and atol to the mismatch summary, so we can
see if it's just a precision issue or something serious

rtol = np.max(np.abs(x / y - 1))
atol = np.max(np.abs(x - y))

(mismatch 100.0%  rtol=xxx  atol=xxx)


(and as aside to the all close discussion:
I do set the tolerances very carefully especially if the agreement with
comparison numbers is below 1e-6 or so)


Josef


Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-05 Thread josef.pktd
On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith n...@pobox.com wrote:

 I'm afraid that I really don't understand what you're trying to say. Is
 there something that you think numpy should be doing differently?



I liked it better when this raised an exception, instead of creating a
rectangular object array.

Josef



 On Mon, Jan 5, 2015 at 6:40 PM, Colin J. Williams cjwilliam...@gmail.com
 wrote:

 One of the essential characteristics of a matrix is that it be
 rectangular.

 This is neither spelt out or checked currently.

 The Doc description refers to a class:

- *class *numpy.matrix[source]

 http://github.com/numpy/numpy/blob/v1.9.1/numpy/matrixlib/defmatrix.py#L206

 Returns a matrix from an array-like object, or from a string of data. A
 matrix is aspecialized 2-D array that retains its 2-D
 nature through operations. It has certain special operators, such as *
 (matrix multiplication) and ** (matrix power).

 This illustrates a failure, which is reported later in the calculation:

 A2= np.matrix([[1, 2, -2], [-3, -1, 4], [4, 2 -6]])

 Here 2 - 6 is treated as an expression.

 Wikipedia offers:

 In mathematics, a *matrix* (plural *matrices*) is a rectangular *array* of
 numbers, symbols, or expressions, arranged in *rows* and *columns*. The
 individual items in a matrix are called its *elements* or *entries*. An
 example of a matrix with 2 rows and 3 columns is
 \begin{bmatrix} 1 & 9 & -13 \\ 20 & 5 & -6 \end{bmatrix}.
 In the Numpy context, the symbols or expressions need to be evaluable.

 Colin W.









 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org



Re: [Numpy-discussion] Characteristic of a Matrix.

2015-01-05 Thread josef.pktd
On Mon, Jan 5, 2015 at 2:36 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Jan 5, 2015 at 7:18 PM, josef.p...@gmail.com wrote:
 
 
 
  On Mon, Jan 5, 2015 at 1:58 PM, Nathaniel Smith n...@pobox.com wrote:
 
  I'm afraid that I really don't understand what you're trying to say. Is
 there something that you think numpy should be doing differently?
 
 
  I liked it better when this raised an exception, instead of creating a
 rectangular object array.

 Did it really used to raise an exception? Patches accepted :-)  (#5303
 is the relevant bug, like Warren points out. From the discussion there
 it doesn't look like np.array's handling of non-conformable lists has
 any defenders.)


Since I'm usually late in updating numpy, I was for a long time very
familiar with the frequent occurrence of
`ValueError: setting an array element with a sequence.`

based on this, it was up to numpy 1.5
https://github.com/scipy/scipy/pull/2631#issuecomment-20898809

ugly but backwards compatible :)

Josef




 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] The future of ndarray.diagonal()

2015-01-05 Thread josef.pktd
On Mon, Jan 5, 2015 at 4:08 AM, Konrad Hinsen konrad.hin...@fastmail.net
wrote:

 --On 5 janvier 2015 08:43:45 + Sturla Molden sturla.mol...@gmail.com
 wrote:

  To me it seems that algorithms in scientific papers and books are
  described in various forms of pseudo-code.

 That's indeed what people do when they write a paper about an algorithm.
 But many if not most algorithms in computational science are never
 published in a specific article. Very often, a scientific article gives
 only an outline of a method in plain English. The only full documentation
 of the method is the implementation.

  Perhaps we need a notation
  which is universal and eternal like the language of mathematics. But I am
  not sure Python could or should try to be that scripting language.

 Neither Python nor any other programming was designed for that task, and
 none of them is really a good fit. But today's de facto situation is that
 programming languages fulfill the role of algorithmic specification
 languages in computational science. And I don't expect this to change
 rapidly, in particular because to the best of my knowledge there is no
 better choice available at the moment.

 I wrote an article on this topic that will appear in the March 2015 issue
 of Computing in Science and Engineering. It concludes that for now, a
 simple Python script is probably the best you can do for an executable
 specification of an algorithm. However, I also recommend not using big
 libraries such as NumPy in such scripts.

  I also think it is reasonable to ask if journals should require code as
  algorithmic documentation to be written in some ISO standard language
 like
  C or Fortran 90. The behavior of Python and NumPy are not dictated by
  standards, and as such is not better than pseudo-code.

 True, but the ISO specifications of C and Fortran have so many holes
 (undefined behavior) that they are not really much better for the job.
 And again, we can't ignore the reality of the de facto use today: there are
 no such requirements or even guidelines, so Python scripts are often the
 best we have as algorithmic documentation.


Matlab is better defined than numpy.

numpy has too many features.

I think, if you want a runnable python script as algorithmic documentation,
then it will be necessary and relatively easy in most cases to stick to the
stable basic features.
The same for a library, if we want to minimize compatibility problems, then
we shouldn't use features that are most likely a moving target.
One of the issues is whether we want to write safe or fancy code.
(Fancy code might or will be faster, with a specific version.)

For example in most of my use cases having a view or copy of an array makes
a difference to the performance but not the results. I didn't participate
in the `diagonal` debate because I don't have a strong opinion and don't
use it with an assignment. There is an explicit np.fill_diagonal that is
inplace.

Having views or copies of arrays never sounded like having a clear cut
answer, there are too many functions that return views if possible.
When our (statsmodels) code correctness depends on whether it's a view or
copy, then we usually make sure and write the matching unit tests.

In other cases, the behavior of numpy in edge cases like empty arrays is still
in flux. We usually try to avoid relying on implicit behavior.
Dtypes are a mess (in terms of code compatibility). Matlab is much nicer,
it's all just doubles. Now pandas and numpy are making object arrays
popular and introduce strange things like datetime dtypes, and users think
a program written a while ago can handle them.

A related compatibility issue is python 2 versus python 3: for scientific
code that is not string manipulation, the main limitation is to avoid version
specific features, and decide when to use lists versus iterators for range,
zip, map. Other than that, it looks much simpler to me than expected.


Overall I think the current policy of incremental changes in numpy works
very well. Statsmodels needs a few minor adjustments in each version. But
most of those are for cases where numpy became more strict or where we used
a specific behavior in edge cases, AFAIR.

One problem with accumulating changes for a larger version change like
numpy 2 or 3 or 4 is to decide what changes would require this. Most
changes will break some code, if the code requires or uses some exotic or
internal behavior.
If we want to be strict, then we don't change the policy but change the
version numbers, instead of 1.8, 1.9 we have numpy 18 and numpy 19.
However, from my perspective none of the recent changes were fundamental
enough.

BTW: Stata is versioning scripts. Each script can define for which version
of Stata it was written, but I have no idea how they handle the
compatibility issues.  It looks to me that it would be way too much work to
do something like this in an open source project.

Legacy cleanups like removal of numeric compatibility in numpy or weave
(and 

Re: [Numpy-discussion] Adding keyword to asarray and asanyarray.

2015-03-06 Thread josef.pktd
On Fri, Mar 6, 2015 at 7:59 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Thu, Mar 5, 2015 at 10:02 PM, josef.p...@gmail.com wrote:

 On Thu, Mar 5, 2015 at 12:33 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Thu, Mar 5, 2015 at 10:04 AM, Chris Barker chris.bar...@noaa.gov
  wrote:
 
  On Thu, Mar 5, 2015 at 8:42 AM, Benjamin Root ben.r...@ou.edu wrote:
 
  dare I say... datetime64/timedelta64 support?
 
 
  well, the precision of those is 64 bits, yes? so if you asked for less
  than that, you'd still get a dt64. If you asked for 64 bits, you'd get
  it,
  if you asked for datetime128  -- what would you get???
 
  a 128 bit integer? or an Exception, because there is no 128bit datetime
  dtype.
 
  But I think this is the same problem with any dtype -- if you ask for a
  precision that doesn't exist, you're going to get an error.
 
  Is there a more detailed description of the proposed feature anywhere?
  Do
   you specify a dtype as a precision? or just the precision, and let the
  dtype
  figure it out for itself, i.e.:
 
  precision=64
 
  would give you a float64 if the passed in array was a float type, but a
  int64 if the passed in array was an int type, or a uint64 if the passed
  in
  array was a unsigned int type, etc.
 
   But in the end, I wonder about the use case. I generally use asarray
  one
  of two ways:
 
  Without a dtype -- to simple make sure I've got an ndarray of SOME
  dtype.
 
  or
 
  With a dtype - because I really care about the dtype -- usually because
  I
  need to pass it on to C code or something.
 
  I don't think I'd ever need at least some precision, but not care if I
  got
  more than that...
 
 
  The main use that I want to cover is that float64 and complex128 have
  the
  same precision and it would be good if either is acceptable.  Also, one
  might just want either float32 or float64, not just one of the two.
  Another
  intent is to make the fewest possible copies. The determination of the
  resulting type is made using the result_type function.


 How does this work for object arrays, or datetime?

 Can I specify at least float32 or float64, and it raises an exception
 if it cannot be converted?

 The problem we have in statsmodels is that pandas frequently uses
 object arrays and it messes up patsy or statsmodels if it's not
 explicitly converted.


 Object arrays go to object arrays, datetime64 depends.

 In [10]: result_type(ones(1, dtype=object_), float32)
 Out[10]: dtype('O')


 Datetime64 seems to use the highest precision

 In [12]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[us]')
 Out[12]: dtype('M8[us]')

 In [13]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[Y]')
 Out[13]: dtype('M8[D]')

 but doesn't convert to float

 In [11]: result_type(ones(1, dtype='datetime64[D]'), float32)
 ---------------------------------------------------------------------------
 TypeError                                 Traceback (most recent call last)
 <ipython-input-11-e1a09e933dc7> in <module>()
 ----> 1 result_type(ones(1, dtype='datetime64[D]'), float32)

 TypeError: invalid type promotion

 What would you like it to do?

Note: the dtype handling in statsmodels is still a mess, and we just
plugged some of the worst cases.


What we would need is an asarray with at least a minimum precision (e.g.
float32) that raises an exception if the input is not numeric (string,
object, custom dtypes, ...).
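
Roughly something like this hypothetical helper (a sketch, not an existing
numpy keyword): upcast to at least float32 but reject non-numeric input.

import numpy as np

def asarray_at_least_float(x, min_dtype=np.float32):
    arr = np.asarray(x)
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError('expected a numeric array, got dtype %s' % arr.dtype)
    return arr.astype(np.promote_types(arr.dtype, min_dtype), copy=False)

print(asarray_at_least_float(np.arange(3, dtype=np.int8)).dtype)   # float32
print(asarray_at_least_float(np.zeros(3)).dtype)                   # float64 stays float64
# asarray_at_least_float(np.array(['a', 'b']))   # raises TypeError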

However, we need custom dtype handling in statsmodels anyway, so the
enhancement to asarray with exceptions would mainly be a convenient way to
get something to work with, because pandas and numpy are now object
array friendly.

I assume scipy also has insufficient checks for non-numeric dtypes, AFAIR.


Josef



 Chuck



[Numpy-discussion] MKL ValueError: On entry to DLASCL parameter number 5 had an illegal value

2015-03-10 Thread josef.pktd
I got an illegal value message using MKL on Windows 64 while running the
statsmodels test suite.

Kevin is getting the same with more information, which indicates that it
might be numpy.linalg.svd
https://github.com/statsmodels/statsmodels/issues/2308#issuecomment-78086656

Is this serious?

I'm just setting up a new computer and haven't investigated yet.
Given the name of the test class, this might be a warning for an inf or
nan, then it would be our problem in statsmodels

Josef


Re: [Numpy-discussion] MKL ValueError: On entry to DLASCL parameter number 5 had an illegal value

2015-03-10 Thread josef.pktd
On Tue, Mar 10, 2015 at 1:20 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Tue, Mar 10, 2015 at 10:33 AM, josef.p...@gmail.com wrote:


 I got an illegal value message using MKL on Windows 64 while running the
 statsmodels test suite.

 Kevin is getting the same with more information, which indicates that it
 might be numpy.linalg.svd

 https://github.com/statsmodels/statsmodels/issues/2308#issuecomment-78086656

 Is this serious?

 I'm just setting up a new computer and haven't investigated yet.
 Given the name of the test class, this might be a warning for an inf or
 nan, then it would be our problem in statsmodels


 What version of Numpy? Does this also happen if you aren't using MKL? The 
 dlascl
 reference
 http://www.netlib.org/lapack/explore-html/de/d3c/dlascl_8f.html is
 here. It is possible that the problem being solved is numerically sensitive
 and that the treatment of underflow/overflow due to compiler flags might be
 producing zeros or infs.


nosetests says NumPy version 1.9.2rc1
Version is from winpython which I guess uses Gohlke binaries.

It didn't show up with official numpy in 32bit python, and it doesn't show
up on TravisCI.

In contrast to Kevin, I don't get any info or test failure

M:\Notes> nosetests --pdb --pdb-failures -v statsmodels.base.tests.test_data.TestMissingArray
statsmodels.base.tests.test_data.TestMissingArray.test_raise_no_missing ...
ok
statsmodels.base.tests.test_data.TestMissingArray.test_raise ... ok
statsmodels.base.tests.test_data.TestMissingArray.test_drop ... ok
statsmodels.base.tests.test_data.TestMissingArray.test_none ...
Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.

Intel MKL ERROR: Parameter 5 was incorrect on entry to DLASCL.

Intel MKL ERROR: Parameter 4 was incorrect on entry to DLASCL.

Intel MKL ERROR: Parameter 5 was incorrect on entry to DLASCL.
ok
statsmodels.base.tests.test_data.TestMissingArray.test_endog_only_raise ...
ok
statsmodels.base.tests.test_data.TestMissingArray.test_endog_only_drop ...
ok
statsmodels.base.tests.test_data.TestMissingArray.test_mv_endog ... ok
statsmodels.base.tests.test_data.TestMissingArray.test_extra_kwargs_2d ...
ok
statsmodels.base.tests.test_data.TestMissingArray.test_extra_kwargs_1d ...
ok

--
Ran 9 tests in 0.049s

OK

However based on the testcase, it looks like we call np.linalg.svd with an
array that contains nans,
DLASCL might then be used internally in the svd calculations
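
A sketch of the kind of guard we would add on our side (not the actual
statsmodels fix): fail with a clear message on nans instead of a low-level
LAPACK/MKL error.

import numpy as np

def checked_svd(a):
    a = np.asarray(a, dtype=float)
    if not np.isfinite(a).all():
        raise ValueError('array passed to svd contains nan or inf')
    return np.linalg.svd(a, full_matrices=False)

u, s, vt = checked_svd(np.eye(3))
print(s)
# checked_svd(np.array([[1.0, np.nan], [0.0, 1.0]]))   # raises ValueError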

Josef


 Chuck



Re: [Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices

2015-03-30 Thread josef.pktd
On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith
blake.a.griff...@gmail.com wrote:
 I have an open PR which lets users control the checks on the input
 covariance matrix. The matrix is required to be symmetric and positve
 semi-definite (PSD). The current behavior is that NumPy raises a warning if
 the matrix is not PSD, and does not even check for symmetry.

 I added a symmetry check, which raises a warning when the input is not
 symmetric. And added two keyword args which users can use to turn off the
 checks/warnings when the matrix is ill formed. So this would only cause
 another new warning to be raised in existing code.

 This is needed because sometimes the covariance matrix is only *almost*
 symmetric or PSD due to roundoff error.

 Thoughts?

My only question is why is **exact** symmetry relevant?

AFAIU
An empirical covariance matrix might not be exactly symmetric unless we
specifically force it to be. But I don't see why some roundoff errors
that violate symmetry should be relevant.

use allclose with floating point rtol or equivalent?

Some user code might suddenly get irrelevant warnings.

BTW:
neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0)
doesn't need to be calculated if cov_psd is false.

-

some more:

svd can hang if the values are not finite, i.e. nan or inf.

A counter proposal would be to add a `check_valid` keyword with options
ignore, warn, raise, and fix,

and raise an error if there are nans and check_valid is not ignore.
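
A rough sketch of what I have in mind (the check_valid options mirror the
proposal here, they are not the released numpy API): near-symmetry with
floating-point tolerances, a finiteness check, and a PSD check up to tolerance.

import numpy as np
import warnings

def _complain(msg, check_valid):
    if check_valid == 'raise':
        raise ValueError(msg)
    elif check_valid == 'warn':
        warnings.warn(msg)

def check_covariance(cov, rtol=1e-8, atol=1e-8, check_valid='warn'):
    cov = np.asarray(cov, dtype=float)
    if check_valid != 'ignore' and not np.isfinite(cov).all():
        raise ValueError('covariance contains nan or inf')
    # near-symmetry instead of exact equality
    if not np.allclose(cov, cov.T, rtol=rtol, atol=atol):
        _complain('covariance is not (approximately) symmetric', check_valid)
    # PSD up to tolerance, using the symmetrized matrix
    if np.linalg.eigvalsh((cov + cov.T) / 2.0).min() < -atol:
        _complain('covariance is not positive semi-definite', check_valid)
    return cov

check_covariance(np.array([[1.0, 0.2], [0.2, 1.0]]))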

-

aside:
np.random.multivariate_normal   is only relevant if you have a new cov
each call (or don't mind repeated possibly expensive calculations),
so, I guess, adding checks by default won't upset many users.


Josef




 PR: https://github.com/numpy/numpy/pull/5726



Re: [Numpy-discussion] Adding keyword to asarray and asanyarray.

2015-03-05 Thread josef.pktd
On Thu, Mar 5, 2015 at 12:33 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Thu, Mar 5, 2015 at 10:04 AM, Chris Barker chris.bar...@noaa.gov wrote:

 On Thu, Mar 5, 2015 at 8:42 AM, Benjamin Root ben.r...@ou.edu wrote:

 dare I say... datetime64/timedelta64 support?


 well, the precision of those is 64 bits, yes? so if you asked for less
 than that, you'd still get a dt64. If you asked for 64 bits, you'd get it,
 if you asked for datetime128  -- what would you get???

 a 128 bit integer? or an Exception, because there is no 128bit datetime
 dtype.

 But I think this is the same problem with any dtype -- if you ask for a
 precision that doesn't exist, you're going to get an error.

 Is there a more detailed description of the proposed feature anywhere? Do
  you specify a dtype as a precision? or just the precision, and let the dtype
 figure it out for itself, i.e.:

 precision=64

 would give you a float64 if the passed in array was a float type, but a
 int64 if the passed in array was an int type, or a uint64 if the passed in
 array was a unsigned int type, etc.

  But in the end, I wonder about the use case. I generally use asarray one
 of two ways:

 Without a dtype -- to simply make sure I've got an ndarray of SOME dtype.

 or

 With a dtype - because I really care about the dtype -- usually because I
 need to pass it on to C code or something.

 I don't think I'd ever need at least some precision, but not care if I got
 more than that...


 The main use that I want to cover is that float64 and complex128 have the
 same precision and it would be good if either is acceptable.  Also, one
 might just want either float32 or float64, not just one of the two. Another
 intent is to make the fewest possible copies. The determination of the
 resulting type is made using the result_type function.


How does this work for object arrays, or datetime?

Can I specify at least float32 or float64, and have it raise an exception
if the array cannot be converted?

The problem we have in statsmodels is that pandas frequently uses
object arrays, and it messes up patsy or statsmodels if they are not
explicitly converted.
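
As a toy illustration of what I mean (not statsmodels code):

import numpy as np

a = np.array([1.0, 2.0, None], dtype=object)    # like a pandas object column
np.asarray(a).dtype                  # stays object, silently
# np.asarray(a, dtype=float)         # raises TypeError because of the None

# and for the float64/complex128 point, result_type gives
np.result_type(np.float64, np.complex128)        # complex128
np.result_type(np.float32, np.float64)           # float64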

Josef





 Chuck


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 2:03 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
 We probably need more traction on the should this be done? discussion
 than on the can this be done? one, the need for a reordering of the
 axes swings me slightly in favor, but I mostly don't see it yet.

 As a long-time user of numpy, and an advocate and teacher of Python for
 science, here is my perspective:

 Fancy indexing is a horrible design mistake--a case of cleverness run
 amok.  As you can read in the Numpy documentation, it is hard to
 explain, hard to understand, hard to remember.  Its use easily leads to
 unreadable code and hard-to-see errors.  Here is the essence of an
 example that a student presented me with just this week, in the context
 of reordering eigenvectors based on argsort applied to eigenvalues:

 In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

 In [26]: ii = np.arange(4)

 In [27]: print(xx[0])
 [[ 0  1  2  3]
   [ 4  5  6  7]
   [ 8  9 10 11]]

 In [28]: print(xx[0, :, ii])
 [[ 0  4  8]
   [ 1  5  9]
   [ 2  6 10]
   [ 3  7 11]]

 Quickly now, how many numpy users would look at that last expression and
 say, "Of course, that is equivalent to transposing xx[0]"?  And, "Of
 course that expression should give a completely different result from
 xx[0][:, ii]."?

 I would guess it would be less than 1%.  That should tell you right away
 that we have a real problem here.  Fancy indexing can't be *read* by a
 sub-genius--it has to be laboriously figured out piece by piece, with
 frequent reference to the baffling descriptions in the Numpy docs.

 So I think you should turn the question around and ask, What is the
 actual real-world use case for fancy indexing?  How often does real
 code rely on it?  I have taken advantage of it occasionally, maybe you
 have too, but I think a survey of existing code would show that the need
 for it is *far* less common than the need for simple orthogonal
 indexing.  That tells me that it is fancy indexing, not orthogonal
 indexing, that should be available through a function and/or special
 indexing attribute.  The question is then how to make that transition.


Swapping the axis when slices are mixed with fancy indexing was a
design mistake, IMO. But not fancy indexing itself.

 np.triu_indices(5)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64),
array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64))
 m = np.arange(25).reshape(5, 5)[np.triu_indices(5)]
 m
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 12, 13, 14, 18, 19, 24])

 m2 = np.zeros((5,5))
 m2[np.triu_indices(5)] = m
 m2
array([[  0.,   1.,   2.,   3.,   4.],
   [  0.,   6.,   7.,   8.,   9.],
   [  0.,   0.,  12.,  13.,  14.],
   [  0.,   0.,   0.,  18.,  19.],
   [  0.,   0.,   0.,   0.,  24.]])

(I don't remember what's fancy in indexing, just that broadcasting
rules apply.)
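
A minimal sketch of the broadcasting I mean (using the m2 from above):

rows = np.arange(5)[:, None]    # shape (5, 1)
cols = np.arange(5)[None, :]    # shape (1, 5)
m2[rows, cols]                  # index arrays broadcast to (5, 5); here this just reproduces m2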

Josef



 Eric





 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question, if the behaviour of the []
 operator should be changed has sailed with numpy now at version 1.9.
 Given the amount packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

 I guess it would be similar to python 3 string versus bytes, but
 without the overwhelming benefits.

 I don't think I would be in favor of deprecating fancy indexing even
 if it were possible. In general, my impression is that if there is a
 trade-off in numpy between powerful machinery versus easy to learn and
 teach, then the design philosophy went in favor of power.

 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim > 2.

 I'm sure y'all are totally on top of this, but for myself, I would
 like to distinguish:

 * fancy indexing with boolean arrays - I use it all the time and don't
 get confused;
 * fancy indexing with non-boolean arrays - horrendously confusing,
 almost never use it, except on a single axis when I can't confuse it
 with orthogonal indexing:

 In [3]: a = np.arange(24).reshape(6, 4)

 In [4]: a
 Out[4]:
 array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])

 In [5]: a[[1, 2, 4]]
 Out[5]:
 array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[16, 17, 18, 19]])

 I also remember a discussion with Travis O where he was also saying
 that this indexing was confusing and that it would be good if there
 was some way to transition to what he called outer product indexing (I
 think that's the same as 'orthogonal' indexing).

 I think it should be DOA, except as a discussion topic for numpy 3000.

 I think there are two proposals here:

 1) Add some syntactic sugar to allow orthogonal indexing of numpy
 arrays, no backward compatibility break.

 That seems like a very good idea to me - were there any big objections to 
 that?

 2) Over some long time period, move the default behavior of np.array
 non-boolean indexing from the current behavior to the orthogonal
 behavior.

 That is going to be very tough, because it will cause very confusing
 breakage of legacy code.

 On the other hand, maybe it is worth going some way towards that, like this:

 * implement orthogonal indexing as a method arr.sensible_index[...]
 * implement the current non-boolean fancy indexing behavior as a
 method - arr.crazy_index[...]
 * deprecate non-boolean fancy indexing as standard arr[...] indexing;
 * wait a long time;
 * remove non-boolean fancy indexing as standard arr[...] (errors are
 preferable to change in behavior)

 Then if we are brave we could:

 * wait a very long time;
 * make orthogonal indexing the default.

 But the not-brave steps above seem less controversial, and fairly reasonable.

 What about that as an approach?

I also thought the transition would have to be something like that, or
a clear break point like numpy 3.0. I would be in favor of something
like this for the axis-swapping case with ndim > 2.

However, before going that route, you would still have to provide a list
of the behaviors that would be deprecated, and poll various
libraries on how much they are actually used.

My impression is that fancy indexing is used more often than
orthogonal indexing (beyond the trivial case x[:, idx]).
Also, many use cases for orthogonal indexing have moved to pandas, and
numpy is left with the non-orthogonal indexing use cases.
And third, fancy indexing is a superset of orthogonal indexing (with
proper broadcasting), and you still need to justify why everyone
should be restricted to the subset instead of adopting a voluntary constraint
to use code that is easier to understand.

I checked numpy.random.choice which I would have implemented with
fancy indexing, but it uses only `take`, AFAICS.
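
For reference, a small sketch of the take spelling (not the actual
np.random.choice code):

import numpy as np
a = np.arange(24).reshape(6, 4)
np.take(a, [1, 2, 4], axis=0)     # same result as the fancy indexing a[[1, 2, 4]]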

Switching to an explicit method is not really a problem for
maintained library code, but I still don't really see why we should do
this.

Josef


 Cheers,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 11:30 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Apr 2, 2015 at 6:35 PM,  josef.p...@gmail.com wrote:
 (I thought about this because I was looking at accessing off-diagonal
 elements, m2[np.arange(4), np.arange(4) + 1] )

 Psst: np.diagonal(m2, offset=1)

It was just an example (banded or toeplitz).
(I know how indexing works, kind of, but I don't remember exactly what diag or
other functions are doing.)

 m2b = m2.copy()
 m2b[np.arange(4), np.arange(4) + 1]
array([  1.,   7.,  13.,  19.])
 m2b[np.arange(4), np.arange(4) + 1] = np.nan
 m2b
array([[  0.,  nan,   2.,   3.,   4.],
   [  0.,   6.,  nan,   8.,   9.],
   [  0.,   0.,  12.,  nan,  14.],
   [  0.,   0.,   0.,  18.,  nan],
   [  0.,   0.,   0.,   0.,  24.]])

 m2c = m2.copy()
 np.diagonal(m2c, offset=1) = np.nan
SyntaxError: can't assign to function call
 dd = np.diagonal(m2c, offset=1)
 dd[:] = np.nan
Traceback (most recent call last):
  File pyshell#89, line 1, in module
dd[:] = np.nan
ValueError: assignment destination is read-only
 np.__version__
'1.9.2rc1'

 m2d = m2.copy()
 m2d[np.arange(4)[::-1], np.arange(4) + 1] = np.nan

Josef


 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 9:09 PM,  josef.p...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question, if the behaviour of the []
 operator should be changed has sailed with numpy now at version 1.9.
 Given the amount packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

 I guess it would be similar to python 3 string versus bytes, but
 without the overwhelming benefits.

 I don't think I would be in favor of deprecating fancy indexing even
 if it were possible. In general, my impression is that if there is a
 trade-off in numpy between powerful machinery versus easy to learn and
 teach, then the design philosophy went in favor of power.

 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim > 2.

 I think it should be DOA, except as a discussion topic for numpy 3000.

 just my opinion


is this fancy?

 vals
array([6, 5, 4, 1, 2, 3])
 a+b
array([[3, 2, 1, 0],
   [4, 3, 2, 1],
   [5, 4, 3, 2]])
 vals[a+b]
array([[1, 4, 5, 6],
   [2, 1, 4, 5],
   [3, 2, 1, 4]])
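
(For the record, vals, a and b were something like this -- my reconstruction,
not copied from the original session:

vals = np.array([6, 5, 4, 1, 2, 3])
a = np.arange(3)[:, None]     # column vector [[0], [1], [2]]
b = np.arange(3, -1, -1)      # [3, 2, 1, 0]
vals[a + b]                   # gives the (3, 4) array above
)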

https://github.com/scipy/scipy/blob/v0.14.0/scipy/linalg/special_matrices.py#L178

(I thought about this because I was looking at accessing off-diagonal
elements, m2[np.arange(4), np.arange(4) + 1] )


How would you find all the code that would no longer be correct with
a changed definition of indexing and slicing, if there is insufficient
test coverage and it doesn't raise an exception?
If we find it, who fixes all the legacy code? (I don't think it will
be minor, unless there is a new method `fix_[...]` (fancy ix).)

Josef


 Josef



 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

 I'm not advocating adding even more complexity, I'm trying to think
 about ways to make it *less* complex from the typical user's standpoint.

 Eric
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question, if the behaviour of the []
 operator should be changed has sailed with numpy now at version 1.9.
 Given the amount packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

I guess it would be similar to python 3 string versus bytes, but
without the overwhelming benefits.

I don't think I would be in favor of deprecating fancy indexing even
if it were possible. In general, my impression is that if there is a
trade-off in numpy between powerful machinery versus easy to learn and
teach, then the design philosophy went in favor of power.

I think numpy indexing is not too difficult and follows a consistent
pattern, and I completely avoid mixing slices and index arrays with
ndim > 2.

I think it should be DOA, except as a discussion topic for numpy 3000.

just my opinion

Josef



 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

 I'm not advocating adding even more complexity, I'm trying to think
 about ways to make it *less* complex from the typical user's standpoint.

 Eric
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] code snippet: assert all close or large

2015-04-30 Thread josef.pktd
Sorry, hit the wrong key

just an example that I think is not covered by numpy.testing assert

absolute tolerance for `inf`: assert x and y are allclose or x is large if
y is inf

On Thu, Apr 30, 2015 at 2:24 PM, josef.p...@gmail.com wrote:




import numpy as np
from numpy.testing import assert_allclose, assert_array_less

def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """assert x and y are allclose, or x is large if y is inf"""
    # positions where y is inf but x is finite
    mask_inf = np.isinf(y) & ~np.isinf(x)
    # usual element-wise comparison everywhere else
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    # where y is inf, require x to exceed the large threshold ltol
    assert_array_less(ltol, x[mask_inf])


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] binary wheels for numpy?

2015-05-15 Thread josef.pktd
On Fri, May 15, 2015 at 4:07 PM, Chris Barker chris.bar...@noaa.gov wrote:

 Hi folks.,

 I did a little intro to scipy session as part of a larger Python class
 the other day, and was dismayed to find that pip install numpy still
 doesn't work on Windows.

 Thanks mostly to Matthew Brett's work, the whole scipy stack is
 pip-installable on OS-X, it would be really nice if we had that for Windows.

 And no, saying you should go get Python(x,y) or Anaconda, or Canopy,
 or...) is really not a good solution. That is indeed the way to go if
 someone is primarily focusing on computational programming, but if you have
 a web developer, or someone new to Python for general use, they really
 should be able to just grab numpy and play around with it a bit without
 having to start all over again.


Unrelated to the pip/wheel discussion.

In my experience, by far the easiest way to get something running to play with
is WinPython. Download and unzip (and maybe add it to the system path) and
most of the data analysis stack is available.

I haven't even bothered yet to properly install a full system Python on
my Windows machine. I'm just working with 3 WinPython installs. (One even has Julia
and IJulia included, after following the installation instructions for a
short time.)

Josef





 My solution was to point folks to Chris Gohlke's site -- which is a
 Fabulous resource --

 THANK YOU CHRISTOPH!

 But I still think that we should have the basic scipy stack on PyPi as
 Windows Wheels...

 IIRC, the last run through on this discussion got stuck on the what
 hardware should it support -- wheels do not allow a selection at install
 time, so we'd have to decide what instruction set to support, and just
 stick with that. Which would mean that:

 some folks would get a numpy/scipy that would run a bit slower than it
 might
 and
 some folks would get one that wouldn't run at all on their machine.

 But I don't see any reason that we can't find a compromise here -- do a
 build that supports most machines, and be done with it. Even now, people
 have to go get (one way or another) a MKL-based build to get optimum
 performance anyway -- so if we pick an instruction set support by, say (an
 arbitrary, and impossible to determine) 95% of machines out there -- we're
 good to go.

 I take it there are licensing issues that prevent us from putting Chris'
 Binaries up on PyPi?

 But are there technical issues I'm forgetting here, or do we just need to
 come to a consensus as to hardware version to support and do it?

 -Chris







 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

 I assume it does because np.corrcoef uses it, and it's the same type
 of use cases.
 However, I'm not using it very often (I prefer broadcasting), but I've
 seen it often enough when reviewing code.

 This is mainly to point out that it could be a popular function (that
 maybe shouldn't be deprecated)

 https://github.com/search?utf8=%E2%9C%93q=np.outer
 416914

After thinking another minute:

I think it should not be deprecated; it's like toeplitz. We can also use it
to normalize 2d arrays where the column and row scales differ, i.e. not
symmetric as in the corrcoef case.
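
For example, something along these lines (a sketch with made-up scales):

import numpy as np

table = np.arange(12.).reshape(3, 4) + 1
row_scale = table.sum(axis=1)
col_scale = table.sum(axis=0)
normalized = table / np.outer(row_scale, col_scale)   # different row and column scaling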

Josef



 Josef



 On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
  sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that
  might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
   wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
  
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std

 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

 I assume it does because np.corrcoef uses it, and it's the same type
 of use cases.
 However, I'm not using it very often (I prefer broadcasting), but I've
 seen it often enough when reviewing code.

 This is mainly to point out that it could be a popular function (that
 maybe shouldn't be deprecated)

 https://github.com/search?utf8=%E2%9C%93q=np.outer
 416914

 For future reference, that's not the number -- you have to click
 through to Code and then look at a single-language result to get
 anything remotely meaningful. In this case b/c they're different by an
 order of magnitude, and in general because sometimes the top line
 number is completely made up (like it has no relation to the
 per-language numbers on the left and then changes around randomly if
 you simply reload the page).

 (So 29,397 is what you want in this case.)

 Also that count then tends to have tons of duplicates (e.g. b/c there
 are hundreds of copies of numpy itself on github), so you need a big
 grain of salt when looking at the absolute number, but it can be
 useful, esp. for relative comparisons.

My mistake, rushing too much.
github shows only 25 code references in numpy itself.

in quotes, python only (namespace-conscious packages on github)
(I think github counts modules, not instances)

np.cumsum 11,022
np.cumprod 1,290
np.outer 6,838

statsmodels
np.cumsum 21
np.cumprod  2
np.outer 15

Josef


 -n
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

I assume it does because np.corrcoef uses it, and it's the same type
of use cases.
However, I'm not using it very often (I prefer broadcasting), but I've
seen it often enough when reviewing code.

This is mainly to point out that it could be a popular function (that
maybe shouldn't be deprecated)

https://github.com/search?utf8=%E2%9C%93q=np.outer
416914

Josef



 On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
  sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that
  might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
   wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
  
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std

 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com wrote:
 Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
 maybe I can do it as part of the same PR?

 On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net
 wrote:

 Just a general thing, if someone has a few minutes, I think it would
 make sense to add the ufunc.reduce thing to all of these functions at
 least in the See Also or Notes section in the documentation.

 These special attributes are not that well known, and I think that might
 be a nice way to make it easier to find.

 - Sebastian

 On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
  I am, yes.
 
  On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote:
  Ok, I didn't know that.  Are you at pycon by any chance?
 
  On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
  n...@pobox.com wrote:
  On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
  mistersh...@gmail.com wrote:
   Yes, I totally agree with you regarding np.sum and
  np.product, which is why
   I didn't suggest np.add.reduce, np.multiply.reduce.
  I wasn't sure whether
   cumsum and cumprod might be on the line in your
  judgment.
 
  Ah, I see. I think we should treat them the same for
  now -- all the
  comments I made apply to a lesser or greater extent
  (in particular,
  cumsum and cumprod both do the thing where they
  dispatch to .cumsum()
  .cumprod() method).
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



I'm just looking at this thread.

I see outer used quite often

corrcoef = cov / np.outer(std, std)

(even I use it sometimes instead of
cov / std[:,None] / std

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
 Hi,

 snip

 So, how about a slight modification of your proposal?

 1) Raise deprecation warning for np.outer for non 1D arrays for a few
 versions, with depraction in favor of np.multiply.outer, then
 2) Raise error for np.outer on non 1D arrays


 I think that was Neil's proposal a bit earlier, too. +1 for it in any
 case, since at least for the moment I doubt outer is used a lot for non
 1-d arrays. Possible step 3) make it work on higher dims after a long
 period.

sounds ok to me

Some random comments of what I remember or guess in terms of usage

I think there are at most very few np.outer usages with 2d or higher dimension.
(statsmodels has two models that switch between 2d and 1d
parameterization where we don't use outer but it has similar
characteristics. However, we need to control the ravel order, which
IIRC is Fortran)

The current behavior of 0-D scalars in the initial post might be
useful if a numpy function returns a scalar instead of a 1-D array of
size 1. np.diag, which is a common case, doesn't return a scalar (in my
version of numpy).

I don't know any use case where I would ever want to have the 2d
behavior of np.multiply.outer.
I guess we will or would have applications for outer along an axis,
for example if x.shape = (100, 10), then we have
x[:,None, :] * x[:, :, None] (I guess)
Something like this shows up reasonably often in econometrics as an
Outer Product. However, in most cases we can avoid constructing this
matrix and get the final results in a more memory-efficient or faster
way.
(for example, an array of covariance matrices)
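
A minimal sketch of the stacked version I mean (made-up data, the shapes are the point):

import numpy as np
x = np.random.randn(100, 10)
outer_stack = x[:, :, None] * x[:, None, :]   # shape (100, 10, 10), one outer product per row
avg_outer = outer_stack.mean(axis=0)          # e.g. an averaged covariance-like matrix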

Josef





 - Sebastian


 Best,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Fr, 2015-04-17 at 10:47 -0400, josef.p...@gmail.com wrote:
 On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
  Hi,
 
  snip
 
  So, how about a slight modification of your proposal?
 
  1) Raise deprecation warning for np.outer for non 1D arrays for a few
  versions, with depraction in favor of np.multiply.outer, then
  2) Raise error for np.outer on non 1D arrays
 
 
  I think that was Neil's proposal a bit earlier, too. +1 for it in any
  case, since at least for the moment I doubt outer is used a lot for non
  1-d arrays. Possible step 3) make it work on higher dims after a long
  period.

 sounds ok to me

 Some random comments of what I remember or guess in terms of usage

 I think there are at most very few np.outer usages with 2d or higher 
 dimension.
 (statsmodels has two models that switch between 2d and 1d
 parameterization where we don't use outer but it has similar
 characteristics. However, we need to control the ravel order, which
 IIRC is Fortran)

 The current behavior of 0-D scalars in the initial post might be
 useful if a numpy function returns a scalar instead of a 1-D array in
 size=1. np.diag which is a common case, doesn't return a scalar (in my
 version of numpy).

 I don't know any use case where I would ever want to have the 2d
 behavior of np.multiply.outer.
 I guess we will or would have applications for outer along an axis,
 for example if x.shape = (100, 10), then we have
 x[:,None, :] * x[:, :, None] (I guess)
 Something like this shows up reasonably often in econometrics as
 Outer Product. However in most cases we can avoid constructing this
 matrix and get the final results in a more memory efficient or faster
 way.
 (example an array of covariance matrices)


 So basically outer product of stacked vectors (fitting basically into
 how np.linalg functions now work). I think that might be a good idea,
 but even then we first need to do the deprecation and it would be a long
 term project. Or you add np.linalg.outer or such sooner and in the
 longer run it will be an alias to that instead of np.multiple.outer.


Essentially yes, but I don't have an opinion about location or
implementation in numpy, nor do I know enough.

I always considered np.outer conceptually as belonging to linalg; it
provides a more convenient interface than np.dot when both arrays are
1-D (no need to add an extra axis and transpose).
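
i.e. roughly (just to illustrate what I mean):

import numpy as np
a = np.arange(3.)
b = np.arange(4.)
np.allclose(np.outer(a, b), np.dot(a[:, None], b[None, :]))   # True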

Josef



 Josef




 
  - Sebastian
 
 
  Best,
 
  Matthew
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar mistersh...@gmail.com wrote:


 On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
 
 
  On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:
 
  On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
   Hi,
  
   snip
  
   So, how about a slight modification of your proposal?
  
   1) Raise deprecation warning for np.outer for non 1D arrays for a
   few
   versions, with depraction in favor of np.multiply.outer, then
   2) Raise error for np.outer on non 1D arrays
  
  
   I think that was Neil's proposal a bit earlier, too. +1 for it in any
   case, since at least for the moment I doubt outer is used a lot for
   non
   1-d arrays. Possible step 3) make it work on higher dims after a long
   period.
 
  sounds ok to me
 
  Some random comments of what I remember or guess in terms of usage
 
  I think there are at most very few np.outer usages with 2d or higher
  dimension.
  (statsmodels has two models that switch between 2d and 1d
  parameterization where we don't use outer but it has similar
  characteristics. However, we need to control the ravel order, which
  IIRC is Fortran)
 
  The current behavior of 0-D scalars in the initial post might be
  useful if a numpy function returns a scalar instead of a 1-D array in
  size=1. np.diag which is a common case, doesn't return a scalar (in my
  version of numpy).
 
  I don't know any use case where I would ever want to have the 2d
  behavior of np.multiply.outer.
 

 I only understand part of your example, but it looks similar to what
 we are doing in statsmodels.

 
  My use case is pretty simple.  Given an input vector x, and a weight
  matrix
  W, and a model y=Wx, I calculate the gradient of the loss L with respect
  W.
  It is the outer product of x with the vector of gradients dL/dy.  So the
  code is simply:
 
  W -= outer(x, dL_by_dy)

 if you sum/subtract over all the values, isn't this the same as
 np.dot(x, dL_by_dy)


 What?  Matrix subtraction is element-wise:

 In [1]: x = np.array([2,3,4])

 In [2]: dL_by_dy = np.array([7,9])

 In [5]: W = np.zeros((3, 2))

 In [6]: W -= np.outer(x, dL_by_dy)

 In [7]: W
 Out[7]:
 array([[-14., -18.],
[-21., -27.],
[-28., -36.]])


Ok, different use case

mine are more like variations on the following

 a1 = np.arange(18).reshape(6,3)
 a2 = np.arange(12).reshape(6, 2)
 index = [1, 2, 5]


text book version
 np.sum([np.outer(a1[i], a2[i]) for i in index], 0)
array([[180, 204],
   [196, 223],
   [212, 242]])

simpler
 np.dot(a1[index].T, a2[index])
array([[180, 204],
   [196, 223],
   [212, 242]])
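
(another equivalent spelling, just as a sketch, using a1, a2 and index from above:

np.einsum('ij,ik->jk', a1[index], a2[index])   # same (3, 2) result as the dot above
)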



 
  Sometimes, I have some x_indices and y_indices.  Now I want to do:
 
  W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])
 
  Unfortunately, if x_indices or y_indices are int or slice in some way
  that
  removes a dimension, the left side will have fewer dimensions than the
   right.  np.multiply.outer does the right thing without the ugly cases:
 
  if isinstance(x_indices, int): … # ugly hacks follow.

 My usual hacks are either to use np.atleast_1d or np.atleast_1d or
 np.squeeze if there is shape mismatch in some cases.


 Yes, but in this case, the left side is the problem, which has too few
 dimensions.  So atleast_1d doesn't work.  I was conditionally squeezing, but
 that is extremely ugly.  Especially if you're conditionally squeezing based
 on both x_indices and y_indices.

I don't remember if I ever used something like this

 a1[0, 1]
1
 a1[np.atleast_1d(0), np.atleast_1d(1)]
array([1])

 a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]]

 a1[0, 1]  = [[100]]
Traceback (most recent call last):
  File pyshell#314, line 1, in module
a1[0, 1]  = [[100]]
ValueError: setting an array element with a sequence.

Josef





 
  I guess we will or would have applications for outer along an axis,
  for example if x.shape = (100, 10), then we have
  x[:,None, :] * x[:, :, None] (I guess)
  Something like this shows up reasonably often in econometrics as
  Outer Product. However in most cases we can avoid constructing this
  matrix and get the final results in a more memory efficient or faster
  way.
  (example an array of covariance matrices)
 
 
  Not sure I see this.  outer(a, b) should return something that has
  shape:
  (a.shape + b.shape).  If you're doing it along an axis, you mean
  you're
  reshuffling the resulting shape vector?

 No I'm not reshaping the full tensor product.

 It's a vectorized version of looping over independent outer products

 np.array([outer(xi, yi) for xi,yi in zip(x, y)])
 (which I would never use with outer)

 but I have code that works similar for a reduce (or reduce_at) loop over
 this.

 Josef


 
 
  Josef
 
 
 
 
  
   - Sebastian
  
  
   Best,
  
   Matthew
   

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
Neil, please reply inline or at the bottom, which is customary for
numpy/scipy-related mailing lists.
It's sometimes difficult to figure out what the context of your reply
is (and the context is all over the place).

On Fri, Apr 17, 2015 at 11:30 AM, Neil Girdhar mistersh...@gmail.com wrote:
 This relationship between outer an dot only holds for vectors.  For tensors,
 and other kinds of vector spaces, I'm not sure if outer products and dot
 products have anything to do with each other.

That may be the case, and I never figured out what to do with dot in
more than 2 dimensions.

90% (a guess) of what I work on or see is in a 2-D or vectorized 3-D
world with 2-D linalg, or can be reduced to it.

(general tensor algebra creates endless loops in my brain :)

Josef



 On Fri, Apr 17, 2015 at 11:11 AM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Fr, 2015-04-17 at 10:47 -0400, josef.p...@gmail.com wrote:
  On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
   On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
   Hi,
  
   snip
  
   So, how about a slight modification of your proposal?
  
   1) Raise deprecation warning for np.outer for non 1D arrays for a
   few
   versions, with depraction in favor of np.multiply.outer, then
   2) Raise error for np.outer on non 1D arrays
  
  
   I think that was Neil's proposal a bit earlier, too. +1 for it in any
   case, since at least for the moment I doubt outer is used a lot for
   non
   1-d arrays. Possible step 3) make it work on higher dims after a long
   period.
 
  sounds ok to me
 
  Some random comments of what I remember or guess in terms of usage
 
  I think there are at most very few np.outer usages with 2d or higher
  dimension.
  (statsmodels has two models that switch between 2d and 1d
  parameterization where we don't use outer but it has similar
  characteristics. However, we need to control the ravel order, which
  IIRC is Fortran)
 
  The current behavior of 0-D scalars in the initial post might be
  useful if a numpy function returns a scalar instead of a 1-D array in
  size=1. np.diag which is a common case, doesn't return a scalar (in my
  version of numpy).
 
  I don't know any use case where I would ever want to have the 2d
  behavior of np.multiply.outer.
  I guess we will or would have applications for outer along an axis,
  for example if x.shape = (100, 10), then we have
  x[:,None, :] * x[:, :, None] (I guess)
  Something like this shows up reasonably often in econometrics as
  Outer Product. However in most cases we can avoid constructing this
  matrix and get the final results in a more memory efficient or faster
  way.
  (example an array of covariance matrices)
 
 
  So basically outer product of stacked vectors (fitting basically into
  how np.linalg functions now work). I think that might be a good idea,
  but even then we first need to do the deprecation and it would be a long
  term project. Or you add np.linalg.outer or such sooner and in the
  longer run it will be an alias to that instead of np.multiple.outer.


 Essentially yes, but I don't have an opinion about location or
 implementation in numpy, nor do I know enough.

 I always considered np.outer conceptually as belonging to linalg that
 provides a more convenient interface than np.dot if both arrays are
 1-D.  (no need to add extra axis and transpose)

 Josef

 
 
  Josef
 
 
 
 
  
   - Sebastian
  
  
   Best,
  
   Matthew
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 2:56 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Fr, 2015-04-17 at 12:40 -0400, josef.p...@gmail.com wrote:
 On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar mistersh...@gmail.com wrote:
 
 
  On Fri, Apr 17, 2015 at 12:09 PM, josef.p...@gmail.com wrote:
 
  On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com
  wrote:
  
  
   On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:
  
   On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
   sebast...@sipsolutions.net wrote:
On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
Hi,
   
snip
   
So, how about a slight modification of your proposal?
   
1) Raise deprecation warning for np.outer for non 1D arrays for a
few
versions, with depraction in favor of np.multiply.outer, then
2) Raise error for np.outer on non 1D arrays
   
   
I think that was Neil's proposal a bit earlier, too. +1 for it in any
case, since at least for the moment I doubt outer is used a lot for
non
1-d arrays. Possible step 3) make it work on higher dims after a long
period.
  
   sounds ok to me
  
   Some random comments of what I remember or guess in terms of usage
  
   I think there are at most very few np.outer usages with 2d or higher
   dimension.
   (statsmodels has two models that switch between 2d and 1d
   parameterization where we don't use outer but it has similar
   characteristics. However, we need to control the ravel order, which
   IIRC is Fortran)
  
   The current behavior of 0-D scalars in the initial post might be
   useful if a numpy function returns a scalar instead of a 1-D array in
   size=1. np.diag which is a common case, doesn't return a scalar (in my
   version of numpy).
  
   I don't know any use case where I would ever want to have the 2d
   behavior of np.multiply.outer.
  
 
  I only understand part of your example, but it looks similar to what
  we are doing in statsmodels.
 
  
   My use case is pretty simple.  Given an input vector x, and a weight
   matrix
   W, and a model y=Wx, I calculate the gradient of the loss L with respect
   W.
   It is the outer product of x with the vector of gradients dL/dy.  So the
   code is simply:
  
   W -= outer(x, dL_by_dy)
 
  if you sum/subtract over all the values, isn't this the same as
  np.dot(x, dL_by_dy)
 
 
  What?  Matrix subtraction is element-wise:
 
  In [1]: x = np.array([2,3,4])
 
  In [2]: dL_by_dy = np.array([7,9])
 
  In [5]: W = np.zeros((3, 2))
 
  In [6]: W -= np.outer(x, dL_by_dy)
 
  In [7]: W
  Out[7]:
  array([[-14., -18.],
 [-21., -27.],
 [-28., -36.]])


 Ok, different use case

 mine are more like variations on the following

  a1 = np.arange(18).reshape(6,3)
  a2 = np.arange(12).reshape(6, 2)
  index = [1, 2, 5]


 text book version
  np.sum([np.outer(a1[i], a2[i]) for i in index], 0)
 array([[180, 204],
[196, 223],
[212, 242]])

 simpler
  np.dot(a1[index].T, a2[index])
 array([[180, 204],
[196, 223],
[212, 242]])


 
  
   Sometimes, I have some x_indices and y_indices.  Now I want to do:
  
   W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])
  
   Unfortunately, if x_indices or y_indices are int or slice in some way
   that
   removes a dimension, the left side will have fewer dimensions than the
   right.  np.multipy.outer does the right thing without the ugly cases:
  
   if isinstance(x_indices, int): … # ugly hacks follow.
 
  My usual hacks are either to use np.atleast_1d or np.atleast_1d or
  np.squeeze if there is shape mismatch in some cases.
 
 
  Yes, but in this case, the left side is the problem, which has too few
  dimensions.  So atleast_1d doesn't work.  I was conditionally squeezing, 
  but
  that is extremely ugly.  Especially if you're conditionally squeezing based
  on both x_indices and y_indices.

 I don't remember if I ever used something like this

  a1[0, 1]
 1
  a1[np.atleast_1d(0), np.atleast_1d(1)]
 array([1])

  a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]]

  a1[0, 1]  = [[100]]
 Traceback (most recent call last):
   File pyshell#314, line 1, in module
 a1[0, 1]  = [[100]]
 ValueError: setting an array element with a sequence.


 Hehe, yeah, that difference. But if you really want that, you can
 usually do a1[0, 1, ...] if you don't mind the ugliness.

I'm not sure what you mean, although it sounds like a nice trick.
This doesn't work for me

 a1[0, 1, ...]  = [[100]]
Traceback (most recent call last):
  File pyshell#315, line 1, in module
a1[0, 1, ...]  = [[100]]
ValueError: assignment to 0-d array

 np.__version__
'1.9.2rc1'
 a1[0, 1,

Josef




 Josef


 
 
 
  
   I guess we will or would have applications for outer along an axis,
   for example if x.shape = (100, 10), then we have
   x[:,None, :] * x[:, :, None] (I guess)
   Something like this shows up reasonably often in econometrics as
   Outer Product. However in most cases we can avoid constructing 

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 4:03 PM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Fr, 2015-04-17 at 15:18 -0400, josef.p...@gmail.com wrote:
 On Fri, Apr 17, 2015 at 2:56 PM, Sebastian Berg
 snip
  Hehe, yeah, that difference. But if you really want that, you can
  usually do a1[0, 1, ...] if you don't mind the ugliness.

 I'm not sure what you mean, although it sounds like a nice trick.
 This doesn't work for me


 Oh, mindslip. I thought the problem was that maybe scalar assignment
 does not remove trailing dimensions. But the actual reason was that you
 do not have an array on the right hand side. And the assignment code
 isn't sure if you might want to do object assignment in that case, so it
 can't do the funny broadcasting of the left hand side (or trailing
 dimension removing, whichever way around you like to think of it).

Now I'm getting confused

I had thought that these two are the same

 a1[0, 1]  = np.array([[100]])
 a1[0, 1]  = [[100]]

but trying it out and from your explanation, they are not

I thought Neil's initial use case was that

a1[0, 1]  = np.outer(5, 1)

doesn't work, because of
 np.outer(5, 1).shape
(1, 1)

But that works for me.

In any case, the thread is getting long, and I explained my perception
of use cases for np.outer.

Josef


  a1[0, 1, ...]  = [[100]]
 Traceback (most recent call last):
   File pyshell#315, line 1, in module
 a1[0, 1, ...]  = [[100]]
 ValueError: assignment to 0-d array

  np.__version__
 '1.9.2rc1'
  a1[0, 1,

 Josef



 
  Josef
 
 
  
  
  
   
I guess we will or would have applications for outer along an axis,
for example if x.shape = (100, 10), then we have
x[:,None, :] * x[:, :, None] (I guess)
Something like this shows up reasonably often in econometrics as
Outer Product. However in most cases we can avoid constructing 
this
matrix and get the final results in a more memory efficient or 
faster
way.
(example an array of covariance matrices)
   
   
Not sure I see this.  outer(a, b) should return something that has
shape:
(a.shape + b.shape).  If you're doing it along an axis, you mean
you're
reshuffling the resulting shape vector?
  
   No I'm not reshaping the full tensor product.
  
   It's a vectorized version of looping over independent outer products
  
   np.array([outer(xi, yi) for xi,yi in zip(x, y)])
   (which I would never use with outer)
  
   but I have code that works similar for a reduce (or reduce_at) loop 
   over
   this.
  
   Josef
  
  
   
   
Josef
   
   
   
   

 - Sebastian


 Best,

 Matthew
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
   
   
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
   
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-17 Thread josef.pktd
On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar mistersh...@gmail.com wrote:


 On Fri, Apr 17, 2015 at 10:47 AM, josef.p...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
  On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote:
  Hi,
 
  snip
 
  So, how about a slight modification of your proposal?
 
  1) Raise deprecation warning for np.outer for non 1D arrays for a few
  versions, with depraction in favor of np.multiply.outer, then
  2) Raise error for np.outer on non 1D arrays
 
 
  I think that was Neil's proposal a bit earlier, too. +1 for it in any
  case, since at least for the moment I doubt outer is used a lot for non
  1-d arrays. Possible step 3) make it work on higher dims after a long
  period.

 sounds ok to me

 Some random comments of what I remember or guess in terms of usage

 I think there are at most very few np.outer usages with 2d or higher
 dimension.
 (statsmodels has two models that switch between 2d and 1d
 parameterization where we don't use outer but it has similar
 characteristics. However, we need to control the ravel order, which
 IIRC is Fortran)

 The current behavior of 0-D scalars in the initial post might be
 useful if a numpy function returns a scalar instead of a 1-D array in
 size=1. np.diag which is a common case, doesn't return a scalar (in my
 version of numpy).

 I don't know any use case where I would ever want to have the 2d
 behavior of np.multiply.outer.


I only understand part of your example, but it looks similar to what
we are doing in statsmodels.


 My use case is pretty simple.  Given an input vector x, and a weight matrix
 W, and a model y=Wx, I calculate the gradient of the loss L with respect W.
 It is the outer product of x with the vector of gradients dL/dy.  So the
 code is simply:

 W -= outer(x, dL_by_dy)

if you sum/subtract over all the values, isn't this the same as
np.dot(x, dL_by_dy)



 Sometimes, I have some x_indices and y_indices.  Now I want to do:

 W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices])

 Unfortunately, if x_indices or y_indices are int or slice in some way that
 removes a dimension, the left side will have fewer dimensions than the
 right.  np.multiply.outer does the right thing without the ugly cases:

 if isinstance(x_indices, int): … # ugly hacks follow.

My usual hacks are either to use np.atleast_1d or np.atleast_2d or
np.squeeze if there is a shape mismatch in some cases.
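
(Illustrative sketch, added here and not part of the original mail: the
dimension-dropping difference between np.outer and np.multiply.outer that
these hacks work around.)

import numpy as np

a = np.arange(6.).reshape(2, 3)
b = np.arange(2.)
np.outer(a, b).shape           # (6, 2): np.outer ravels its inputs first
np.multiply.outer(a, b).shape  # (2, 3, 2): full tensor product, no flattening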


 I guess we will or would have applications for outer along an axis,
 for example if x.shape = (100, 10), then we have
 x[:,None, :] * x[:, :, None] (I guess)
 Something like this shows up reasonably often in econometrics as
 Outer Product. However in most cases we can avoid constructing this
 matrix and get the final results in a more memory efficient or faster
 way.
 (example an array of covariance matrices)


 Not sure I see this.  outer(a, b) should return something that has shape:
 (a.shape + b.shape).  If you're doing it along an axis, you mean you're
 reshuffling the resulting shape vector?

No I'm not reshaping the full tensor product.

It's a vectorized version of looping over independent outer products

np.array([outer(xi, yi) for xi,yi in zip(x, y)])
(which I would never use with outer)

but I have code that works similar for a reduce (or reduce_at) loop over this.
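
(Added sketch, not part of the original mail: the vectorized equivalent of
that loop, assuming x and y are 2-d with a matching first axis.)

import numpy as np

x = np.random.rand(100, 10)
y = np.random.rand(100, 10)
loop = np.array([np.outer(xi, yi) for xi, yi in zip(x, y)])  # (100, 10, 10)
vec = x[:, :, None] * y[:, None, :]                          # same values, no Python loop
assert np.allclose(loop, vec)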

Josef




 Josef




 
  - Sebastian
 
 
  Best,
 
  Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-09 Thread josef.pktd
On Thu, Apr 9, 2015 at 10:11 AM, Alan G Isaac alan.is...@gmail.com wrote:
   Alan wrote:
 3. I admit, my students are NOT using non-boolean fancy indexing on
 multidimensional arrays. (As far as I know.)  Are yours?

The only confusing case is mixing slices and integer array indexing
for ndim > 2. The rest looks unsurprising, AFAIR

(AFAICS, my last fancy indexing mailing list discussion is at least 4
years old, with Jonathan Taylor. I don't remember when I discovered
the usefulness of the axis argument in take which covers many 3 or
higher dimensional indexing use cases.)



 On 4/9/2015 2:22 AM, Nathaniel Smith wrote:
 Well, okay, this would explain it, since integer fancy indexing is
 exactly the confusing case:-)  On the plus side, this also means that
 even if pigs started doing barrel-rolls through hell's
 winter-vortex-chilled air tomorrow and we simply removed integer fancy
 indexing, your students would be unaffected:-)


 Except that they do use statsmodels, which I believe (?) does make use of
 integer fancy-indexing.

And maybe all work would come to a standstill, because every library
is using fancy integer indexing.

I still don't know what all constitutes fancy indexing.

The two most common use cases for me (statsmodels) are indexing for
selecting elements like diag_indices, triu_indices and maybe nonzero,
and expanding from a unique array like inverse index in np.unique.

And there are just a few, AFAIR, orthogonal indexing cases with
broadcasting index arrays to select rectangular pieces of an array.

Josef


 Alan

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: HDF5 for Python 2.5.0

2015-04-09 Thread josef.pktd
On Thu, Apr 9, 2015 at 2:41 PM, Nathaniel Smith n...@pobox.com wrote:
 (Off-list)

 Congrats! Also btw, you might want to switch to a new subject line format
 for these emails -- the mention of Python 2.5 getting hdf5 support made me
 do a serious double take before I figured out what was going on, and 2.6 and
 2.7 will be even worse :-)

(offlist)
I also had to read the subject line and the first paragraph several
times to see who is using python 2.5

Josef
:}



 On Apr 9, 2015 2:07 PM, Andrew Collette andrew.colle...@gmail.com wrote:

 Announcing HDF5 for Python (h5py) 2.5.0
 

 The h5py team is happy to announce the availability of h5py 2.5.0.

 This release introduces experimental support for the highly-anticipated
 Single Writer Multiple Reader (SWMR) feature in the upcoming HDF5 1.10
 release.  SWMR allows sharing of a single HDF5 file between multiple
 processes without the complexity of MPI or multiprocessing-based
 solutions.

 This is an experimental feature that should NOT be used in production
 code. We are interested in getting feedback from the broader community
 with respect to performance and the API design.

 For more details, check out the h5py user guide:
 http://docs.h5py.org/en/latest/swmr.html

 SWMR support was contributed by Ulrik Pedersen.


 What's h5py?
 

 The h5py package is a Pythonic interface to the HDF5 binary data format.

 It lets you store huge amounts of numerical data, and easily manipulate
 that data from NumPy. For example, you can slice into multi-terabyte
 datasets stored on disk, as if they were real NumPy arrays. Thousands of
 datasets can be stored in a single file, categorized and tagged however
 you want.

 Documentation is at:

 http://docs.h5py.org


 Changes
 ---

 * Experimental SWMR support
 * Group and AttributeManager classes now inherit from the appropriate ABCs
 * Fixed an issue with 64-bit float VLENS
 * Cython warning cleanups related to const
 * Entire code base ported to six; 2to3 removed from setup.py


 Acknowledgements
 ---

 This release incorporates changes from, among others:

 * Ulrik Pedersen
 * James Tocknell
 * Will Parkin
 * Antony Lee
 * Peter H. Li
 * Peter Colberg
 * Ghislain Antony Vaillant


 Where to get it
 ---

 Downloads, documentation, and more are available at the h5py website:

 http://www.h5py.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread josef.pktd
On Wed, Apr 8, 2015 at 1:38 PM, Robert Kern robert.k...@gmail.com wrote:
 On Wed, Apr 8, 2015 at 2:06 AM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 5, 2015 7:04 AM, Robert Kern robert.k...@gmail.com wrote:
 
  On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith n...@pobox.com wrote:
  
   On Apr 4, 2015 4:12 AM, Todd toddr...@gmail.com wrote:
   
There was no break as large as this. In fact I would say this is
even a larger change than any individual change we saw in the python 2 
to 3
switch.  The basic mechanics of indexing are just too fundamental and 
touch
on too many things to make this sort of change feasible.
  
   I'm afraid I'm not clever enough to know how large or feasible a
   change is without even seeing the proposed change.
 
  It doesn't take any cleverness. The change in question was to make the
  default indexing semantics to orthogonal indexing. No matter the details of
  the ultimate proposal to achieve that end, it has known minimum
  consequences, at least in the broad outline. Current documentation and 
  books
  become obsolete for a fundamental operation. Current code must be modified
  by some step to continue working. These are consequences inherent in the
  end, not just the means to the end; we don't need a concrete proposal in
  front of us to know what they are. There are ways to mitigate these
  consequences, but there are no silver bullets that eliminate them. And we
  can compare those consequences to approaches like Jaime's that achieve a
  majority of the benefits of such a change without any of the negative
  consequences. That comparison does not bode well for any proposal.

 Ok, let me try to make my point another way.

 I don't actually care at this stage in the discussion whether the change
 is ultimately viable. And I don't think you should either. (For values of
 you that includes everyone in the discussion, not picking on Robert in
 particular :-).)

 My point is that rational, effective discussion requires giving ideas room
 to breath. Sometimes ideas turn out to be not as bad as they looked.
 Sometimes it turns out that they are, but there's some clever tweak that
 gives you 95% of the benefits for 5% of the cost. Sometimes you generate a
 better understanding of the tradeoffs that subsequently informs later design
 decisions. Sometimes working through the details makes both sides realize
 that there's a third option that solves both their problems. Sometimes you
 merely get a very specific understanding of why the whole approach is
 unreasonable that you can then, say, take to the pandas and netcdf
 developers as evidence of that you made a good faith effort and ask them to
 meet you half way. And all these things require understanding the specifics
 of what *exactly* works or doesn't work about an idea. IMHO, it's
 extremely misleading at this stage to make any assertion about whether
 Jaime's approach gives the majority of benefits of such a change: not
 because it's wrong, but because it
 totally short-circuits the discussion about what benefits we care about.
 Jaime's patch certainly has merits, but does it help us make numpy and
 pandas/netcdf's more compatible? Does it make it easier for Eric to teach?
 Those are some specific merits that we might care about a lot, and for which
 Jaime's patch may or may not help much. But that kind of nuance gets lost
 when we jump straight to debating thumbs-up versus thumbs-down.

 And we can get all of that discussion from discussing Jaime's proposal. I
 would argue that we will get better, more focused discussion from it since
 it is actually a concrete proposal and not just a wish that numpy's indexing
 semantics were something else. I think that a full airing and elaboration of
 Jaime's proposal (as the final PR should look quite different than the
 initial one to incorporate the what is found in the discussion) will give us
 a satisficing solution. I certainly think that that is *more likely* to
 arrive at a satisficing solution than an attempt to change the default
 indexing semantics. I can name specific improvements that would specifically
 address the concerns you named if you would like. Maybe it won't be *quite*
 as good (for some parties) than if Numeric chose orthogonal indexing from
 the get-go, but it will likely be much better for everyone than if numpy
 broke backward compatibility on this feature now.

 I cross-my-heart promise that under the current regime, no PR breaking
 fancy indexing would ever get anywhere *near* numpy master without
 *extensive* discussion and warnings on the list. The core devs just spent
 weeks quibbling about whether a PR that adds a new field to the end of the
 dtype struct would break ABI backcompat (we're now pretty sure it doesn't),
 and the current standard we enforce is that every PR that touches public API
 needs a list discussion, even minor extensions with no compatibility issues
 at all. No one 

Re: [Numpy-discussion] IDE's for numpy development?

2015-04-01 Thread josef.pktd
On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 In a recent exchange Mark Wiebe suggested that the lack of support for numpy
 development in Visual Studio might limit the number of developers attracted
 to the project. I'm a vim/console developer myself and make no claim of
 familiarity with modern development tools, but I wonder if such tools might
 now be available for Numpy. A quick google search turns up a beta plugin for
 Visual Studio, and there is an Xcode IDE for the Mac that apparently offers
 some Python support. The two things that I think are required are: 1)
 support for mixed C, python developement and 2) support for building and
 testing numpy. I'd be interested in information from anyone with experience
 in using such an IDE and ideas of how Numpy might make using some of the
 common IDEs easier.

 Thoughts?

I have no experience with the C/C++ part, but I'm using the C/C++
version of Eclipse with PyDev.

It should have all the extra features available, but I don't use them
and don't have compiler, debugger and so on for C/C++ connected to
Eclipse. It looks like it supports Visual C++ and MingW GCC toolchain.
(I'm not sure the same project can be a C/C++ and a PyDev project at
the same time.)


Josef


 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Adding 'where' to ufunc methods?

2015-04-01 Thread josef.pktd
On Wed, Apr 1, 2015 at 3:47 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Apr 1, 2015 at 11:34 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 This question on StackOverflow:

 http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal

 Got me thinking that I had finally found a use for the 'where' kwarg of
 ufuncs. Unfortunately it is only provided for the ufunc itself, but not for
 any of its methods.

 Is there any fundamental reason these were not implemented back in the day?
 Any frontal opposition to having them now?

 The where= argument stuff was rescued from the last aborted attempt to
 add missing value support to numpy. The only reason they aren't
 implemented for the ufunc methods is that Mark didn't get that far.

 +1 to adding them now.

can you get `where` in ufuncs without missing value support?

what's the result for ufuncs that are not reduce operations?
what's the result for reduce operations along an axis if there is
nothing there (in a row or column or ...)?
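
(Sketch added for context, not from the original mail: what where= already
does for a plain ufunc call; entries that are not selected keep whatever the
out array held.)

import numpy as np

a = np.arange(6.).reshape(2, 3)
out = np.full_like(a, np.nan)
np.add(a, 10., out=out, where=a > 2)  # only entries with a > 2 are written
# out is now [[nan, nan, nan], [13., 14., 15.]]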

Josef


 -n

 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 3:16 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Do, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
  On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:
   So specifically the question is -- if you have an array with five
   items, and a Boolean array with three items, then currently you can use
   the latter to index the former:
  
   arr = np.arange(5)
   mask = np.asarray([True, False, True])
   arr[mask] # returns array([0, 2])
  
   This is justified by the rule that indexing with a Boolean array should
   be the same as indexing with the same array that's been passed to
   np.nonzero().  Empirically, though, this causes constant confusion and
   does not seem very useful, so the question is whether we should
   deprecate it.
 
  One place where the current behavior is particularly baffling and
  annoying is when you have multiple boolean masks in the same indexing
  operation. I think everyone would expect this to index separately on
  each axis (outer product indexing style, like slices do), and that's
  really the only useful interpretation, but that's not what it does...:


 This is not being deprecated in there for the moment, it is a different
 discussion. Though maybe we can improve the error message to mention
 that the array was originally boolean; that has always been bugging me a bit
 (it used to mention it, but for some cases it does not anymore).

 - Sebastian


  In [3]: a = np.arange(9).reshape((3, 3))
 
  In [4]: a
  Out[4]:
  array([[0, 1, 2],
 [3, 4, 5],
 [6, 7, 8]])
 
  In [6]: a[np.asarray([True, False, True]), np.asarray([False, True,
  True])]
  Out[6]: array([1, 8])
 
  In [7]: a[np.asarray([True, False, True]), np.asarray([False, False,
  True])]
  Out[7]: array([2, 8])
 
  In [8]: a[np.asarray([True, False, True]), np.asarray([True, True,
  True])]
 
  ---------------------------------------------------------------------------
  IndexError                                Traceback (most recent call last)
  <ipython-input-8-30b3427bec2a> in <module>()
  ----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]
 
  IndexError: shape mismatch: indexing arrays could not be broadcast
  together with shapes (2,) (3,)
 
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



What is actually being deprecated?
It looks like there are different examples.

wrong length: Nathaniels first example above, where the mask is not
broadcastable to original array because mask is longer or shorter than
shape[axis].
I also wouldn't have expected this to work, although I use np.nonzero and
boolean mask indexing interchangeably, I would assume we need the correct
length for the mask.

The second case where the boolean mask has an extra dimension of length
one, or several boolean arrays might need more checking.
I'm pretty sure I used various version, assuming they are a feature, and
when I see arrays, I usually don't assume outer product indexing  (that
might lead to a similar discussion as the recent fancy versus orthogonal
indexing)


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 11:50 AM, Anne Archibald archib...@astron.nl wrote:



 On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg sebast...@sipsolutions.net
 wrote:

 On Fr, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
 
 snip
 
  What is actually being deprecated?
  It looks like there are different examples.
 
 
  wrong length: Nathaniels first example above, where the mask is not
  broadcastable to original array because mask is longer or shorter than
  shape[axis].
   I also wouldn't have expected this to work, although I use np.nonzero
  and boolean mask indexing interchangeably, I would assume we need the
  correct length for the mask.
 

 For the moment we are only talking about wrong length (along a given
 dimension). Not about wrong number of dimensions or multiple boolean
 indices.


 I am pro-deprecation then, definitely. I don't see a use case for padding
 a wrong-shaped boolean array with Falses, and the padding has burned me in
 the past.

 It's not orthogonal to the wrong-number-of-dimensions issue, though,
 because if your Boolean array has a dimension of length 1, broadcasting
 says duplicate it along that axis to match the indexee, and wrong-length
 says pad it with Falses. This ambiguity/pitfall disappears if the padding
 never happens, and that kind of broadcasting is very useful.


Good argument, now I understand why we only get a single column



 x = np.arange(4*5).reshape(4,5)
 mask = np.array([1,0,1,0,1], bool)

padding with False, this would also be deprecated AFAIU, and Anne pointed
out

 x[mask[:4][:,None]]
array([ 0, 10])
 x[mask[None,:]]
array([0, 2, 4])

masks can only be combined with slices, so no fancy masking allowed nor
defined (yet)

 x[mask[:4][:,None], mask[None,:]]
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
x[mask[:4][:,None], mask[None,:]]
IndexError: too many indices for array


I'm using 1d masks quite often to select rows or columns, which seems to
work in more than two dimensions
(Benjamin's surprise)

 x[:, mask]
array([[ 0,  2,  4],
   [ 5,  7,  9],
   [10, 12, 14],
   [15, 17, 19]])

 x[mask[:4][:,None] * mask[None,:]]
array([ 0,  2,  4, 10, 12, 14])
 x[:,:,None][mask[:4][:,None] * mask[None,:]]
array([[ 0],
   [ 2],
   [ 4],
   [10],
   [12],
   [14]])

Josef




 Anne

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Python 3 and isinstance(np.int64(42), int)

2015-06-23 Thread josef.pktd
On Fri, Jun 19, 2015 at 4:15 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Wed, Jun 17, 2015 at 11:13 PM, Nathaniel Smith n...@pobox.com wrote:

  there's some
 argument that in Python, doing explicit type checks like this is
 usually a sign that one is doing something awkward,


 I tend to agree with that.

 On the other hand, numpy itself is kind-of sort-of statically typed. But
 in that case, if you need to know the type of an array -- check the array's
 dtype.

 Also:

   a = np.zeros(7, int)
   n = a[3]
   type(n)
 <type 'numpy.int64'>

 I never liked declaring numpy arrays with the python types like int or
 float -- in numpy you usually care more about the type, so you should simply
 use int64 if you want a 64 bit int, and float64 if you want a 64 bit
 float. Granted, Python floats have always been float64 (on all platforms??),
 and python ints used to be a reasonable int type, but now that python
 ints are bigInts in py3, it really makes sense to be clear.

 And now that I think about it, in py2, int is 32 bit on win64 and 64 bit
 on *nix64 -- so you're really better off being explicit with your numpy
 arrays.



being late checking some examples

 a = np.zeros(7, int)
 a.dtype
dtype('int32')
 np.__version__
'1.9.2rc1'
 type(a[3])
<class 'numpy.int32'>


 a = np.zeros(7, int)
 a = np.array([88])
 a
array([88], dtype=int64)

 a = np.array([8])
 a
array([8], dtype=object)

 a = np.array([8], dtype=int)
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
a = np.array([8], dtype=int)
OverflowError: Python int too large to convert to C long


Looks like we need to be a bit more careful now.
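
(Added sketch, not from the original mail: being explicit about the width
avoids the platform-dependent meaning of dtype=int.)

import numpy as np

a = np.zeros(7, dtype=np.int64)  # same width on 64-bit Windows and Linux
a.dtype                          # dtype('int64')
type(a[3])                       # <class 'numpy.int64'>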

Josef
Python 3.4.3




 -CHB


 --

 Christopher Barker, Ph.D.
 Oceanographer

 Emergency Response Division
 NOAA/NOS/ORR(206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115   (206) 526-6317   main reception

 chris.bar...@noaa.gov

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Clarification sought on Scipy Numpy version requirements.

2015-06-19 Thread josef.pktd
On Fri, Jun 19, 2015 at 4:08 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:

 Hi All,

 I'm looking to change some numpy deprecations into errors as well as
 remove some deprecated functions. The problem I see is that
 SciPy claims to support Numpy >= 1.5 and Numpy 1.5 is really, really, old.
 So the question is, does support mean compiles with earlier versions
 of Numpy ? If that is the case there is very little that can be done about
 deprecation. OTOH, if it means Scipy can be compiled with more recent numpy
 versions but used with earlier Numpy versions (which is a good feat), I'd
 like to know. I'd also like to know what the interface requirements are, as
 I'd like to remove old_defines.h


numpy 1.6 I think is still accurate
https://github.com/scipy/scipy/pull/4265

As far as I know, you can never compile against a newer and run with an
older version.

We had the discussion recently about backwards versus forwards binary
compatibility

Josef





 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] checking S versus U dtype

2015-06-01 Thread josef.pktd
What's the best way to check whether a numpy array is string or bytes on
python3?

using char?


 A = np.asarray([[1, 0, 0],['E', 1, 0],['E', 'E', 1]], dtype='U1')
 A
array([['1', '0', '0'],
   ['E', '1', '0'],
   ['E', 'E', '1']],
  dtype='U1')
 A.dtype
dtype('U1')
 A.dtype.char
'U'
 A.dtype.char == 'U'
True
 A.dtype.char == 'S'
False
 A.astype('S1').dtype.char == 'S'
True
 A.astype('S1').dtype.char == 'U'
False


background:
I don't know why sometimes I got S and sometimes U on Python 3.4, and I
want the code to work with both

 A == 'E'
array([[False, False, False],
   [ True, False, False],
   [ True,  True, False]], dtype=bool)

 A.astype('S1') == 'E'
False
 A.astype('S1') == b'E'
array([[False, False, False],
   [ True, False, False],
   [ True,  True, False]], dtype=bool)
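
(Added sketch, not part of the original mail: dtype.kind is another way to
branch on bytes versus str; the helper name is hypothetical.)

import numpy as np

def is_bytes_array(arr):
    # hypothetical helper: dtype.kind is 'S' for bytes, 'U' for str
    return arr.dtype.kind == 'S'

A = np.asarray([['1', '0'], ['E', '1']], dtype='U1')
is_bytes_array(A)                # False
is_bytes_array(A.astype('S1'))   # True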


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-01 Thread josef.pktd
About the deprecation warning for using another type than integers, in
ones, reshape, indexing and so on:

Wouldn't it be nicer to accept floats that are equal to an integer?

for example

 5.0 == 5
True

 np.ones(10 / 2)
array([ 1.,  1.,  1.,  1.,  1.])
 10 / 2 == 5
True

or the python 2 version

 np.ones(10. / 2)
array([ 1.,  1.,  1.,  1.,  1.])
 10. / 2 == 5
True

I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional and
doesn't raise if the numbers are not close or equal to an integer (which
would be a bug)


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-01 Thread josef.pktd
On Wed, Jul 1, 2015 at 10:32 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mi, 2015-07-01 at 10:05 -0400, josef.p...@gmail.com wrote:
  About the deprecation warning for using another type than integers, in
  ones, reshape, indexing and so on:
 
 
  Wouldn't it be nicer to accept floats that are equal to an integer?
 

 Hmmm, the biggest point was that the old solution was to basically
 (besides strings) use `int(...)`, which means it does not raise any
 errors as you also mention.
 I am open to think about allowing exact floats for most of this
 (frankly, not advanced indexing at least for the moment, but we never
 did there), I think scipy may be doing that for some functions?

 The disadvantage I see is, that some weirder calculations would possible
 work most of the times, but not always, what I mean is such a case.
 A -- possibly silly -- example:

 In [8]: for i in range(10):
...: print i, i == i * 0.1 * 10
...:
 0 True
 1 True
 2 True
 3 False
 4 True
 5 True
 6 False
 7 False
 8 True
 9 True

 I am somewhat opposed to rounding a lot (i.e. not noticing if you got
 3. somewhere), so not sure if you can define a tolerance
 reasonable here unless it is exact. Though I guess you are right that
 `//` will also just round silently already.


Yes, I thought about this, something like `int_if_close` in analogy to
real_if_close would be useful.

However, given that we need to decide on a threshold in this case, I
thought it's overkill to put that into reshape, ones and indexing and
similar.
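
(Rough sketch of the `int_if_close` idea mentioned above; the helper and its
tolerance are hypothetical, not an existing numpy function.)

import numpy as np

def int_if_close(x, tol=1e-8):
    r = int(round(x))
    if abs(x - r) > tol:
        raise ValueError("%r is not close to an integer" % (x,))
    return r

np.ones(int_if_close(10. / 2))  # works like np.ones(5)
# int_if_close(4.2)             # would raise instead of silently truncating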

Simpler cases would work
number of triangular elements

 for i in range(10): print(i, i * (i - 1) / 2. == int(i * (i - 1) / 2.))

0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 True

also np.ceil and np.trunc return floats, not integers.

One disadvantage of raising or warning after the equality check is that
developers have a tendency to write nice unit tests. Then the casting
doesn't break in the unit tests but might raise an exception at some random
data.


For reference: here are my changes in cleaning up
https://github.com/statsmodels/statsmodels/pull/2490/files


Josef






 - Sebastian

 
  for example
 
 
   5.0 == 5
  True
 
 
   np.ones(10 / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
   10 / 2 == 5
  True
 
 
  or the python 2 version
 
 
   np.ones(10. / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
   10. / 2 == 5
  True
 
 
  I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional
  and doesn't raise if the numbers are not close or equal to an integer
  (which would be a bug)
 
 
 
 
  Josef
 
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] floats for indexing, reshape - too strict ?

2015-07-02 Thread josef.pktd
On Thu, Jul 2, 2015 at 8:51 PM, Chris Barker - NOAA Federal 
chris.bar...@noaa.gov wrote:

 Sent from my iPhone

 
  The disadvantage I see is, that some weirder calculations would possible
  work most of the times, but not always,


   not sure if you can define a tolerance
  reasonable here unless it is exact.

 You could use a relative tolerance, but you'd still have to set that.
 Better to put that decision squarely in the user's hands.

  Though I guess you are right that
  `//` will also just round silently already.

 Yes, but if it's in the user's code, it should be obvious -- and then
 the user can choose to round, or floor, or ceiling


round, floor, ceil don't produce integers.

I'm writing library code, and I don't have control over what everyone does.

round, floor, ceil, and // might hide bugs or user mistakes, if we are
supposed to get something that is like an int but it's 42.6 instead.

Josef
https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe.2C_and_Everything_.2842.29




 -CHB

 
  - Sebastian
 
 
  for example
 
 
  5.0 == 5
  True
 
 
  np.ones(10 / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
  10 / 2 == 5
  True
 
 
  or the python 2 version
 
 
  np.ones(10. / 2)
  array([ 1.,  1.,  1.,  1.,  1.])
  10. / 2 == 5
  True
 
 
  I'm using now 10 // 2, or int(10./2 + 1)   but this is unconditional
  and doesn't raise if the numbers are not close or equal to an integer
  (which would be a bug)
 
 
 
 
  Josef
 
 
 
 
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] annoying Deprecation warnings about non-integers

2015-06-30 Thread josef.pktd
I'm trying to fix some code in statsmodels that creates Deprecation
Warnings from numpy

Most of them are quite easy to fix, mainly cases where we use floats to avoid
integer division

I have two problems

first, I get Deprecation warnings in the test run that don't specify where
they happen.
I try to find them with file searches, but I don't see a `np.ones` that
might cause a problem
(needle in a haystack: Close to 4000 unittests and more than 100,000 lines
of numpython)
Also, I'm not sure the warnings are only from statsmodels, they could be in
numpy, scipy or pandas, couldn't they?


second, what's wrong with non-integers in `np.r_[[np.nan] * head, x,
[np.nan] * tail]` (see below)

I tried to set the warnings filter to `error` but then Python itself
errored right away.
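
(A possible way to localize them, added here and not from the original mail:
make only DeprecationWarning fatal, so the traceback points at the call site;
note this may also trip warnings raised inside numpy/scipy/pandas themselves.)

import warnings

warnings.simplefilter("error", DeprecationWarning)
# or narrow it to the specific numpy message:
warnings.filterwarnings("error", message="using a non-integer number")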

https://travis-ci.org/statsmodels/statsmodels/jobs/68748936
https://github.com/statsmodels/statsmodels/issues/2480


Thanks for any clues

Josef


nosetests  -s --pdb-failures --pdb
M:\j\statsmodels\statsmodels_py34\statsmodels\tsa\tests

..C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\numeric.py:183: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  a = empty(shape, dtype, order)
..


...M:\j\statsmodels\statsmodels_py34\statsmodels\tsa\filters\filtertools.py:28: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return np.r_[[np.nan] * head, x, [np.nan] * tail]
..


...C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\lib\twodim_base.py:231: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  m = zeros((N, M), dtype=dtype)
C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\lib\twodim_base.py:238: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  m[:M-k].flat[i::M+1] = 1
...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of the
 methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of the
 methods, unless explicitly seeded?


that's how I understand it.

I don't see any problems with the clarified proposal for the use cases that
I know of.

Can we choose the version also for the global random state, for example to
fix both version and seed in unit tests, with version > 0?
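
(Added sketch, current API only: the usual way to pin the stream in unit
tests is a locally seeded RandomState; the per-version selection above is
just the proposal under discussion, not an existing argument.)

import numpy as np

rng = np.random.RandomState(12345)  # local, explicitly seeded stream
x = rng.normal(size=5)              # reproducible for a fixed numpy version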


BTW: I would expect that bug fixes are still exempt from backwards
compatibility.

fixing #5851 should be independent of the version, (without having looked
at the issue)

(If you need to replicate bugs, then use an old version of a package.)

Josef



 Thanks,
 Alan
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith n...@pobox.com wrote:

 On May 24, 2015 8:43 AM, josef.p...@gmail.com wrote:
 
  Reminder: we are bottom or inline posting

 Can we stop hassling people about this? Inline replies are a great tool to
 have in your toolkit for complicated technical discussions, but I feel like
 our weird insistence on them has turned into a pointless and exclusionary
 thing. It's not like bottom replying is even any better -- the traditional
 mailing list rule is you trim quotes to just the part you're replying to
 (like this message); quoting the whole thing and replying underneath just
 to give people a bit of exercise for their scrolling finger would totally
 have gotten you flamed too.

 But email etiquette has moved on since the 90s, even regular posters to
 this list violate this rule all the time, it's time to let it go.


It's not a 90's thing; I learned about it around 2009 when I started
here.
I find it very annoying trying to catch up with a longer thread and the
replies are all over the place.


Anne is a few years older than I in terms of numpy and scipy participation
and this was just intended to be a friendly reminder.

And as BTW: I'm glad Anne is back with scipy.


Josef



 -n

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 11:13 AM, Anne Archibald archib...@astron.nl
wrote:

 Do we want a deprecation-like approach, so that eventually people who want
 replicability will specify versions, and everyone else gets bug fixes and
 improvements? This would presumably take several major versions, but it
 might avoid people getting unintentionally trapped on this version.

 Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
 raw random numbers, it breaks repeatability not just for the call that got
 fixed but for all successive random number generations.


Reminder: we are bottom or inline posting





 Anne

 On Sun, May 24, 2015 at 5:04 PM josef.p...@gmail.com wrote:

 On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac alan.is...@gmail.com
 wrote:

 On 5/24/2015 8:47 AM, Ralf Gommers wrote:
  Values only change if you leave out the call to seed()


 OK, but this claim seems to conflict with the following language:
 the global RandomState object should use the latest implementation of
 the methods.
 I take it that this is what Nathan meant by
 I think this is just a bug in the description of the proposal here, not
 in the proposal itself.

 So, is the correct phrasing
 the global RandomState object should use the latest implementation of
 the methods, unless explicitly seeded?


 that's how I understand it.

 I don't see any problems with the clarified proposal for the use cases
 that I know of.

 Can we choose the version also for the global random state, for example
  to fix both version and seed in unit tests, with version > 0?


 BTW: I would expect that bug fixes are still exempt from backwards
 compatibility.

 fixing #5851 should be independent of the version, (without having
 looked at the issue)


I skimmed the issue.
In a strict sense it's not really a bug, the user doesn't get wrong
numbers, he or she gets Not A Number.

So there are no current usages that use the function in that range.

Josef




 (If you need to replicate bugs, then use an old version of a package.)

 Josef



 Thanks,
 Alan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread josef.pktd
On Sun, May 24, 2015 at 5:09 PM, Antony Lee antony@berkeley.edu wrote:

 2015-05-24 13:30 GMT-07:00 Sturla Molden sturla.mol...@gmail.com:

 On 24/05/15 10:22, Antony Lee wrote:

  Comments, and help for writing tests (in particular to make sure
  backwards compatibility is maintained) are welcome.

 I have one comment, and that is what makes random numbers so special?
 This applies to the rest of NumPy too, fixing a bug can sometimes change
 the output of a function.

 Personally I think we should only make guarantees about the data types,
 array shapes, and things like that, but not about the values. Those who
 need a particular version of NumPy for exact reproducibility should
 install the version of Python and NumPy they need. That is why virtual
 environments exist.


 I personally agree with this point of view (see original discussion in
 #5299, for example); if it was only up to me at least I'd make
 RandomState(seed) default to the latest version rather than the original
 one (whether to keep the old versions around is another question).  On the
 other hand, I see that this long-standing debate has prevented obvious
 improvements from being added sometimes for years (e.g. a patch for
 Ziggurat normal variates has been lying around since 2010), or led to
 potential API duplication in order to fix some clearly undesirable behavior
 (dirichlet returning nan being described as in a strict sense not really
 a bug(!)), so I'm willing to compromise to get this moving forward.



It's clearly a different kind of bug than some of the ones we fixed in
the past without backwards compatibility discussion where the distribution
was wrong, i.e. some values shifted so parts have more weight and parts
have less weight.

As I mentioned, I don't see any real problem with the proposal.

Josef




 Antony

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: Deprecate np.int, np.float, etc.?

2015-08-04 Thread josef.pktd
On Tue, Aug 4, 2015 at 4:39 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2015-08-03 at 21:32 +0200, Sturla Molden wrote:
  On 03/08/15 20:51, Chris Barker wrote:
 
    well, IIUC, np.int is the python integer type, which
 is
   a C long in all the implemtations of cPython that I know about -- but
 is
   that a guarantee?in the future as well?
 
  It is a Python int on Python 2.
 
  On Python 3 dtype=np.int means the dtype will be C long, because a
  Python int has no size limit. But np.int aliases Python int. And
   creating an array with dtype=int therefore does not create an array of
  Python int, it creates an array of C long. To actually get dtype=int we
  have to write dtype=object, which is just crazy.
 

  Since it seems there may be a few half truths flying around in this
 thread. See http://docs.scipy.org/doc/numpy/user/basics.types.html



Quote:

Note that, above, we use the *Python* float object as a dtype. NumPy knows
that int refers to np.int_, bool means np.bool_, that float is np.float_ and
complex is np.complex_. The other data-types do not have Python
equivalents.

Is there a conflict with the current thread?

Josef
(I'm not a C person, so most of this is outside my scope, except for
watching bugfixes to make older code work for larger datasets. Use `intp`,
Luke.)
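
(Added sketch, not from the original mail: np.intp is the pointer-sized
integer, which is the safe choice for index arrays across platforms.)

import numpy as np

idx = np.array([0, 2, 4], dtype=np.intp)
arr = np.arange(10)
arr[idx]  # array([0, 2, 4])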




 and also note the sentence below the table (maybe the table should also
 note these):

 Additionally to intc the platform dependent C integer types short, long,
 longlong and their unsigned versions are defined.

 - Sebastian

 
  Sturla
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] np.in1d() sets, bug?

2015-08-10 Thread josef.pktd
On Mon, Aug 10, 2015 at 1:40 PM, Benjamin Root ben.r...@ou.edu wrote:

  Not really, it is simply because ``np.asarray(set([1, 2, 3]))``
  returns an object array

 Holy crap! To be pedantic, it looks like it turns it into a numpy scalar,
 but still! I wouldn't have expected np.asarray() on a set (or dictionary,
 for that matter) to work because order is not guaranteed. Is this expected
 behavior?

 Digging into the implementation of in1d(), I can see now how passing a
 set() wouldn't be useful at all (as an aside, pretty clever algorithm). I
 know sets aren't array-like, but the code that used this seemed to work at
 first, and this problem wasn't revealed until I created some unit tests to
 exercise some possible corner cases. Silently producing possibly erroneous
 results is dangerous. Don't know if better documentation or some better
 sanity checking would be called for here, though.

 Ben Root


 On Mon, Aug 10, 2015 at 1:10 PM, Sebastian Berg 
 sebast...@sipsolutions.net wrote:

 On Mo, 2015-08-10 at 12:09 -0400, Benjamin Root wrote:
  Just came across this one today:
 
   np.in1d([1], set([0, 1, 2]), assume_unique=True)
  array([ False], dtype=bool)
 
   np.in1d([1], [0, 1, 2], assume_unique=True)
 
  array([ True], dtype=bool)
 
 
  I am assuming this has something to do with the fact that order is not
  guaranteed with set() objects? I was kind of hoping that setting
  assume_unique=True would be sufficient to overcome that problem.
  Should sets be rejected as an error?
 

 Not really, it is simply because ``np.asarray(set([1, 2, 3]))``
 returns an object array and 1 is not the same as ``set([1, 2, 3])``.

 I think earlier numpy versions may have had short cuts for short lists
 or something so this may have worked in some cases



is it possible to get at least a UserWarning when creating an object array
and dtype object hasn't been explicitly requested or underlying data is
already in an object dtype?
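
(Minimal illustration, added here and not from the original mail: what
np.asarray does with a set, and the explicit conversion that avoids the
silent 0-d object array.)

import numpy as np

s = set([0, 1, 2])
np.asarray(s).shape                        # (): a 0-d object array wrapping the set
np.in1d([1], list(s), assume_unique=True)  # array([ True]) after explicit conversion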


Josef



 - Sebastian


 
  This was using v1.9.0
 
 
  Cheers!
 
  Ben Root
 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] difference between dtypes

2015-07-23 Thread josef.pktd
Is there an explanation somewhere of what different basic dtypes mean,
across platforms and python versions?

 np.bool8
<type 'numpy.bool_'>
 np.bool_
<type 'numpy.bool_'>
 bool
<type 'bool'>


Are there any rules and recommendations or is it all folks lore?


I'm asking because my intuition picked up by osmosis might be off, and I
thought
https://github.com/scipy/scipy/pull/5076
is weird (i.e. counter intuitive).


Deprecation warnings are always a lot of fun, especially if
"This log is too long to be displayed. Please reduce the verbosity of your
build or download the raw log."

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] difference between dtypes

2015-07-24 Thread josef.pktd
On Fri, Jul 24, 2015 at 3:46 AM, Robert Kern robert.k...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 7:45 PM, josef.p...@gmail.com wrote:
 
  Is there an explanation somewhere of what different basic dtypes mean,
 across platforms and python versions?
 
    np.bool8
   <type 'numpy.bool_'>
    np.bool_
   <type 'numpy.bool_'>
    bool
   <type 'bool'>
 
 
  Are there any rules and recommendations or is it all folks lore?

 This may help a little:


 http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing

 Basically, we accept the builtin Python type objects as a dtype argument
 and do something sensible with them. float - np.float64 because Python
 floats are C doubles. int - np.int32 or np.int64 depending on whatever a C
 long is (i.e. depending on the 64bitness of your CPU and how your OS
 chooses to deal with that). We encode those precision choices as aliases to
 the corresponding specific numpy scalar types (underscored as necessary to
 avoid shadowing builtins of the same name): np.float_ is np.float64, for
 example.

 See here for why the aliases to Python builtin types, np.int, np.float,
 etc. still exist:

 https://github.com/numpy/numpy/pull/6103#issuecomment-123652497

 If you just need to pass a dtype= argument and want the precision that
 matches the native integer and float for your platform, then I prefer to
 use the Python builtin types instead of the underscored aliases; they just
 look cleaner. If you need a true numpy scalar type (e.g. to construct a
 numpy scalar object), of course, you must use one of the numpy scalar
 types, and the underscored aliases are convenient for that. Never use the
 aliases to the Python builtin types.



(I don't have time to follow up on this for at least two weeks)

my thinking was that, if there is no actual difference between bool,
np.bool and np.bool_, then np.bool could become an alias and a replacement
for np.bool_, so we can get rid of an ugly trailing underscore.
If np.float is always float64 it could be mapped to that directly.

As the previous discussion on python int versus numpy int on python 3.x,
int is at least confusing.

Also I'm thinking that maybe adjusting the code to the (mis)interpretation,
instead of killing np.float completely, might be nicer (but
changing np.int would be riskier?)
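
(Added sketch of the aliases as they stood at the time of this thread;
np.float was later removed from numpy, so treat the second line as
historical.)

import numpy as np

np.float_ is np.float64        # True: the underscored name is the numpy scalar type
np.float is float              # True (then): np.float was just the Python builtin
np.dtype(float) == np.float64  # True: the builtin maps to float64 as a dtype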

Josef





 --
 Robert Kern

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread josef.pktd
On Tue, Oct 27, 2015 at 3:32 AM, Ralf Gommers 
wrote:

>
>
> On Tue, Oct 27, 2015 at 8:28 AM, Nathaniel Smith  wrote:
>
>> On Tue, Oct 27, 2015 at 12:19 AM, Ralf Gommers 
>> wrote:
>> >
>> >
>> > On Tue, Oct 27, 2015 at 6:44 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Mon, Oct 26, 2015 at 9:31 PM, Nathaniel Smith 
>> wrote:
>> >> [...]
>> >> > I believe that this would also break both 'easy_install numpy', and
>> >> > attempts to install numpy via the setup_requires= argument to
>> >> > setuptools.setup (because setup_requires= implicitly calls
>> >> > easy_install). install_requires= would *not* be affected, and
>> >> > setup_requires= would still be fine in cases where numpy was already
>> >> > installed.
>> >>
>> >> On further investigation, it looks like the simplest approach to doing
>> >> this would actually treat easy_install and setup_requires= the same
>> >> way as they treat pip, i.e., they would all be allowed. (I was
>> >> misreading some particularly confusing code in setuptools.)
>> >>
>> >> It also looks like easy_installed packages can at least be safely
>> >> upgraded, so I guess allowing this is okay :-).
>> >
>> >
>> > I just discovered https://bitbucket.org/dholth/setup-requires, which
>> ensures
>> > that setup_requires uses pip instead of easy_install. So we can not only
>> > keep setup-requires working, but make it work significantly better.
>>
>> IIUC this is not something that we (= numpy) could use ourselves, but
>> instead something that everyone who does setup_requires=["numpy"]
>> would have to set up in their individual projects?
>>
>
> Right. I was thinking about using it in scipy. Ah well, I'm sure we can
> manage to not break ``setup_requires=numpy`` in some way.
>


What's the equivalent of
python setup.py build_ext --inplace


brief google search (I didn't follow up on those)

https://github.com/pypa/pip/issues/1887
https://github.com/pypa/pip/issues/18


Given that I rely completely on binary distributions for numpy and scipy, I
won't be affected.

(I'm still allergic to pip and will switch only several years after
everybody else.)

Josef



>
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread josef.pktd
On Tue, Oct 27, 2015 at 10:59 AM, Nathaniel Smith  wrote:

> On Oct 27, 2015 6:08 AM,  wrote:
> >
> [...]
> >
> >
> > What's the equivalent of
> > python setup.py build_ext --inplace
>
> It's
> python setup.py build_ext --inplace
>
> ;-)
>
Ok, sorry, I have now read the small print and the issue.

Sounds reasonable, given we can `force` our way out.

(If the reason to run to pip is a misspelled dev version number, then it
looks like a hammer to me.)

Josef



> There's also no replacement for setup.py sdist, or setup.py upload (which
> is broken and should never be used), or setup.py clean (which is also
> broken and should never be used in numpy's case). pip is a better package
> installer than raw distutils or setuptools, for non-installation-related
> tasks it has nothing to offer. (With the partial exception of 'pip wheel'.)
>
> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Bug

2015-10-16 Thread josef.pktd

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
was there a change with reduce operations with recarrays in 1.10 or 1.10.1?

Travis shows a new test failure in the statsmodels testsuite with 1.10.1:

ERROR: test suite for 

  File
"/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
line 131, in _handle_constant
const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
TypeError: cannot perform reduce with flexible type


Sorry for asking so late.
(statsmodels is short on maintainers, and I'm distracted)


statsmodels still has code to support recarrays and structured dtypes from
the time before pandas became popular, but I don't think anyone is using
them together with statsmodels anymore.
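
(Hedged workaround sketch, not from the original mail and with made-up field
names: if every field of the structured array shares one float dtype, a view
gives a plain 2-d array that reduce operations accept.)

import numpy as np

rec = np.array([(1., 2.), (1., 3.)], dtype=[('const', 'f8'), ('x1', 'f8')])
plain = rec.view((np.float64, len(rec.dtype.names)))  # shape (2, 2) float array
plain.ptp(axis=0)                                     # array([ 0.,  1.])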

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug

2015-10-16 Thread josef.pktd
Sorry, wrong shortcut key, question will arrive later.

Josef

On Fri, Oct 16, 2015 at 1:40 PM,  wrote:

>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:

> Looking at the git logs, column_stack appears to have been that way
> (creating a new array with concatenate) since at least NumPy 0.9.2, way
> back in January 2006:
> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>

Then it must have been changed somewhere else between 1.6.1 and 1.9.2rc1

I have my notebook and my desktop with different numpy and python versions
next to each other and I don't see a typo in my command.

I assume python 2.7 versus python 3.4 doesn't make a difference.

--

>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.6.1'
>>> import sys
>>> sys.version
'2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'



>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> np.__version__
'1.9.2rc1'
>>> import sys
>>> sys.version
'3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
(AMD64)]'

---

comparing all flags, owndata also has changed, but I don't think that has
any effect

Josef


>
>
> Stephan
>


[Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
>>> np.column_stack((np.ones(10), np.ones(10))).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False

>>> np.__version__
'1.9.2rc1'


on my notebook which has numpy 1.6.1 it is f_contiguous


I was just trying to optimize a loop over variable adjustment in
regression, and found out that we lost fortran contiguity.

I always thought column_stack is for fortran usage (linalg)

What's the alternative?
column_stack was one of my favorite commands, and I always assumed we have
in statsmodels the right memory layout to call the linalg libraries.

("assumed" means we don't have timing nor unit tests for it.)

Josef


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 1:27 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 1:10 AM, Stephan Hoyer  wrote:
>
>> Looking at the git logs, column_stack appears to have been that way
>> (creating a new array with concatenate) since at least NumPy 0.9.2, way
>> back in January 2006:
>> https://github.com/numpy/numpy/blob/v0.9.2/numpy/lib/shape_base.py#L271
>>
>
> Then it must have been changed somewhere else between 1.6.1 and 1.9.2rc1.
>
> I have my notebook and my desktop with different numpy and python versions
> next to each other and I don't see a typo in my command.
>
> I assume python 2.7 versus python 3.4 doesn't make a difference.
>
> --
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : False
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.6.1'
> >>> import sys
> >>> sys.version
> '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
>
> 
>
> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>   OWNDATA : True
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> >>> np.__version__
> '1.9.2rc1'
> >>> import sys
> >>> sys.version
> '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit
> (AMD64)]'
>
> ---
>
> comparing all flags, owndata also has changed, but I don't think that has
> any effect
>

qualification

It looks like in 1.9 it depends on the order of the 2-d arrays, which it
didn't do in 1.6

>>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


which means the default order looks more like "K" now, not "C", IIUC
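
A quick way to see this (a sketch; the exact flags depend on the numpy
version in use, so treat the expected results as assumptions) is to compare
the result flags for all-C versus all-F inputs:

import numpy as np

a_c = np.ones((10, 2), order='C')
a_f = np.ones((10, 2), order='F')

# With a "K"-like default the output order follows the inputs:
# all-F inputs should give an F-contiguous result, all-C inputs a C-contiguous one.
print(np.column_stack((a_f, a_f)).flags['F_CONTIGUOUS'])
print(np.column_stack((a_c, a_c)).flags['C_CONTIGUOUS'])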

Josef



>
> Josef
>
>
>>
>>
>> Stephan
>>


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-18 Thread josef.pktd
On Mon, Oct 19, 2015 at 12:35 AM,  wrote:

> >>> np.column_stack((np.ones(10), np.ones(10))).flags
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : False
>
> >>> np.__version__
> '1.9.2rc1'
>
>
> on my notebook which has numpy 1.6.1 it is f_contiguous
>
>
> I was just trying to optimize a loop over variable adjustment in
> regression, and found out that we lost fortran contiguity.
>
> I always thought column_stack is for fortran usage (linalg)
>
> What's the alternative?
> column_stack was one of my favorite commands, and I always assumed we have
> in statsmodels the right memory layout to call the linalg libraries.
>
> ("assumed" means we don't have timing nor unit tests for it.)
>

What's the difference between using array and column_stack except for a
transpose and memory order?

My current use case is copying columns on top of each other:

#exog0 = np.column_stack((np.ones(nobs), x0, x0s2))

exog0 = np.array((np.ones(nobs), x0, x0s2)).T

exog_opt = exog0.copy(order='F')


The following part is in a loop, followed by some linear algebra for OLS;
res_optim is a scalar parameter.
exog_opt[:, -1] = np.clip(exog0[:, k] + res_optim, 0, np.inf)

Are my assumptions about memory access correct, or is there a better way?

(I have quite a bit of code in statsmodels that is optimized for
Fortran-ordered memory layout, especially for sequential regression, under
the assumption that column_stack provides that Fortran order.)

Also, do I need to start timing and memory benchmarking or is it obvious
that a loop

for k in range(maxi):
    x = arr[:, :k]


depends on memory order?
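
For what it's worth, here is a self-contained sketch of that pattern (the
sizes and names like nobs and maxi are made up here) that could be used to
time the two layouts; slicing x = arr[:, :k] touches contiguous memory only
for the Fortran-ordered copy:

import numpy as np
from timeit import timeit

nobs, maxi = 10000, 50   # made-up sizes, just for illustration
rng = np.random.RandomState(0)
exog0 = np.column_stack((np.ones(nobs), rng.randn(nobs, maxi - 1)))
exog_c = np.ascontiguousarray(exog0)
exog_f = np.asfortranarray(exog0)

def sweep(arr):
    # mimic the variable-selection loop: slice the leading columns,
    # then overwrite the last column in place
    total = 0.0
    for k in range(1, maxi):
        x = arr[:, :k]
        arr[:, -1] = np.clip(arr[:, k - 1] + 0.1, 0, np.inf)
        total += x[0, 0]
    return total

print("C order:", timeit(lambda: sweep(exog_c), number=100))
print("F order:", timeit(lambda: sweep(exog_f), number=100))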

Josef


>
> Josef
>
>
>


Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris  wrote:

>
>
> On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Fri, Oct 16, 2015 at 11:58 AM,  wrote:
>>
>>> was there a change with reduce operations with recarrays in 1.10 or
>>> 1.10.1?
>>>
>>> Travis shows a new test failure in the statsmodels testsuite with 1.10.1:
>>>
>>> ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>
>>>
>>>   File
>>> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
>>> line 131, in _handle_constant
>>> const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
>>> TypeError: cannot perform reduce with flexible type
>>>
>>>
>>> Sorry for asking so late.
>>> (statsmodels is short on maintainers, and I'm distracted)
>>>
>>>
>>> statsmodels still has code to support recarrays and structured dtypes
>>> from the time before pandas became popular, but I don't think anyone is
>>> using them together with statsmodels anymore.
>>>
>>>
>> There were several commits dealing both recarrays and ufuncs, so this
>> might well be a regression.
>>
>>
> A bisection would be helpful. Also, open an issue.
>


The reason for the test failure might be hiding somewhere else behind
several layers of statsmodels, but it only started to show up with numpy
1.10.1.

I already have the reduce exception with my currently installed numpy
'1.9.2rc1'

>>> x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'), ('x_2',
'f8')]).view(np.recarray)

>>> np.ptp(x, axis=0)
Traceback (most recent call last):
  File "", line 1, in 
  File
"C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
line 2047, in ptp
return ptp(axis, out)
TypeError: cannot perform reduce with flexible type


Sounds like fun, and I don't even know how to automatically bisect.
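
(In case it helps: git bisect can drive the search automatically once there
is a small script that exits 0 on good and 1 on bad, e.g.
`git bisect run python test_recarray_view.py`, assuming numpy is rebuilt at
each step. The script name and the check below are just a sketch.)

# test_recarray_view.py -- exit 0 = good, 1 = bad, for use with `git bisect run`
import sys
import numpy as np

x = np.random.random(9 * 3).view([('const', 'f8'), ('x_1', 'f8'),
                                  ('x_2', 'f8')]).view(np.recarray)
try:
    # mimic the statsmodels helper: view as plain floats, then reduce
    arr = x.view((float, len(x.dtype.names)))
    arr.ptp(axis=0)
except TypeError:
    sys.exit(1)
sys.exit(0)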

Josef


>
> Chuck
>


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:

> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>  np.column_stack((np.ones(10), np.ones(10))).flags
> >   C_CONTIGUOUS : True
> >   F_CONTIGUOUS : False
> >
>  np.__version__
> > '1.9.2rc1'
> >
> >
> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >
> >
> > I was just trying to optimize a loop over variable adjustment in
> regression,
> > and found out that we lost fortran contiguity.
> >
> > I always thought column_stack is for fortran usage (linalg)
> >
> > What's the alternative?
> > column_stack was one of my favorite commands, and I always assumed we
> have
> > in statsmodels the right memory layout to call the linalg libraries.
> >
> > ("assumed" means we don't have timing nor unit tests for it.)
>
> In general practice no numpy functions make any guarantee about memory
> layout, unless that's explicitly a documented part of their contract
> (e.g. 'ascontiguous', or some functions that take an order= argument
> -- I say "some" b/c there are functions like 'reshape' that take an
> argument called order= that doesn't actually refer to memory layout).
> This isn't so much an official policy as just a fact of life -- if
> no-one has any idea that the someone is depending on some memory
> layout detail then there's no way to realize that we've broken
> something. (But it is a good policy IMO.)
>

I understand that in general.

However, I always thought column_stack was an array creation function with
a guaranteed memory layout. And since it stacks by columns, I thought the
order would always be Fortran.
The fact that it doesn't have an order keyword yet, I thought, was just a
missing extension.



>
> If this kind of problem gets caught during a pre-release cycle then we
> generally do try to fix it, because we try not to break code, but if
> it's been broken for 2 full releases then there's no much we can do --
> we can't go back in time to fix it so it sounds like you're stuck
> working around the problem no matter what (unless you want to refuse
> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> case, you just have to do a global search replace of np.column_stack
> with statsmodels.utils.column_stack_f, right?).
>
> And the regression issue seems like the only real argument for
> changing it back -- we'd never guarantee f-contiguity here if starting
> from a blank slate, I think?
>

When the cat is out of the bag, the downstream developer writes
compatibility code or helper functions.

I will do that for at least the parts I know are intentionally designed for
F memory order.
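
Something along the lines of the column_stack_f helper mentioned above
could be as small as this (a sketch, not the actual statsmodels code):

import numpy as np

def column_stack_f(tup):
    # column_stack with a guaranteed Fortran-ordered result;
    # asfortranarray is a no-op if the result is already F-contiguous
    return np.asfortranarray(np.column_stack(tup))

exog = column_stack_f((np.ones(10), np.arange(10.)))
assert exog.flags.f_contiguous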

---

statsmodels doesn't really check or consistently optimize the memory order,
except in some cython functions.
But, I thought we should be doing quite well with getting Fortran ordered
arrays. I only paid attention where we have more extensive loops internally.

Nathaniel, does patsy guarantee memory layout (F-contiguous) when creating
design matrices?

Josef



>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg 
wrote:

> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
> >
>
> 
>
>
> >
> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
> > it didn't do in 1.6
> >
>
> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
> "K" (since "K" did not really exists before 1.7 IIRC).
> Not sure what we can do about it, the order is not something that is
> easily fixed unless explicitly given. It might be optimized (as in this
> case I would guess).
> Whether or not doing the fastest route for these kind of functions is
> faster for the user is of course impossible to know, we can only hope
> that in most cases it is better.
> If someone has an idea how to decide I am all ears, but I think all we
> can do is put in asserts/tests in the downstream code if it relies
> heavily on the order (or just copy, if the order is wrong) :(, another
> example is change of the output order in advanced indexing in some
> cases, it makes it faster sometimes, and probably slower in others, what
> is right seems very much non-trivial.
>

To understand the reason:

Is this to have more efficient memory access during copying?

AFAIU, column_stack needs to create a new array which has to be either F or
C contiguous, so we always have to pick one of the two. With a large number
of 1d arrays it seemed more "intuitive" to me to copy them by columns.
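
In the meantime, the kind of downstream guard Sebastian suggests is cheap to
add (a sketch):

import numpy as np

def assert_fortran(x, name="x"):
    # fail loudly in the test suite if an array lost its Fortran order
    if not x.flags.f_contiguous:
        raise AssertionError("%s is not Fortran-contiguous" % name)

assert_fortran(np.ones((5, 3), order='F'), "exog")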

Josef



>
> - Sebastian
>
>
> >
> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
> >   C_CONTIGUOUS : False
> >   F_CONTIGUOUS : True
> >   OWNDATA : True
> >   WRITEABLE : True
> >   ALIGNED : True
> >   UPDATEIFCOPY : False
> >
> >
> >
> >
> > which means the default order looks more like "K" now, not "C", IIUC
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Josef
> >
> >
> >
> >
> >
> > Stephan
> >


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:00 AM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 5:16 AM, Sebastian Berg <
> sebast...@sipsolutions.net> wrote:
>
>> On Mo, 2015-10-19 at 01:34 -0400, josef.p...@gmail.com wrote:
>> >
>>
>> 
>>
>>
>> >
>> > It looks like in 1.9 it depends on the order of the 2-d arrays, which
>> > it didn't do in 1.6
>> >
>>
>> Yes, it uses concatenate, and concatenate probably changed in 1.7 to use
>> "K" (since "K" did not really exists before 1.7 IIRC).
>> Not sure what we can do about it, the order is not something that is
>> easily fixed unless explicitly given. It might be optimized (as in this
>> case I would guess).
>> Whether or not doing the fastest route for these kind of functions is
>> faster for the user is of course impossible to know, we can only hope
>> that in most cases it is better.
>> If someone has an idea how to decide I am all ears, but I think all we
>> can do is put in asserts/tests in the downstream code if it relies
>> heavily on the order (or just copy, if the order is wrong) :(, another
>> example is change of the output order in advanced indexing in some
>> cases, it makes it faster sometimes, and probably slower in others, what
>> is right seems very much non-trivial.
>>
>
> To understand the reason:
>
> Is this to have more efficient memory access during copying?
>
> AFAIU, column_stack needs to create a new array which has to be either F
> or C contiguous, so we always have to pick one of the two. With a large
> number of 1d arrays it seemed more "intuitive" to me to copy them by
> columns.
>


Just as background:

I was mainly surprised last night about having my long-held beliefs
shattered. I skipped numpy 1.7 and 1.8 in my development environment and
still need to catch up now that I use 1.9 as my main numpy version.

I might have to update my "folk wisdom" a bit; it is not codified
anywhere and doesn't have unit tests.

For example, the improved iteration for Fortran-contiguous or non-contiguous
(neither C nor F) arrays sounded very useful, but I never checked whether it
would affect us.

Josef



>
> Josef
>
>
>
>>
>> - Sebastian
>>
>>
>> >
>> > >>> np.column_stack((np.ones(10), np.ones((10, 2), order='F'))).flags
>> >   C_CONTIGUOUS : False
>> >   F_CONTIGUOUS : True
>> >   OWNDATA : True
>> >   WRITEABLE : True
>> >   ALIGNED : True
>> >   UPDATEIFCOPY : False
>> >
>> >
>> >
>> >
>> > which means the default order looks more like "K" now, not "C", IIUC
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Josef
>> >
>> >
>> >
>> >
>> >
>> > Stephan
>> >


Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-19 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:

> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
> >
> >
> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
> >>
> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
> >>  np.column_stack((np.ones(10), np.ones(10))).flags
> >> >   C_CONTIGUOUS : True
> >> >   F_CONTIGUOUS : False
> >> >
> >>  np.__version__
> >> > '1.9.2rc1'
> >> >
> >> >
> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
> >> >
> >> >
> >> > I was just trying to optimize a loop over variable adjustment in
> >> > regression,
> >> > and found out that we lost fortran contiguity.
> >> >
> >> > I always thought column_stack is for fortran usage (linalg)
> >> >
> >> > What's the alternative?
> >> > column_stack was one of my favorite commands, and I always assumed we
> >> > have
> >> > in statsmodels the right memory layout to call the linalg libraries.
> >> >
> >> > ("assumed" means we don't have timing nor unit tests for it.)
> >>
> >> In general practice no numpy functions make any guarantee about memory
> >> layout, unless that's explicitly a documented part of their contract
> >> (e.g. 'ascontiguous', or some functions that take an order= argument
> >> -- I say "some" b/c there are functions like 'reshape' that take an
> >> argument called order= that doesn't actually refer to memory layout).
> >> This isn't so much an official policy as just a fact of life -- if
> >> no-one has any idea that the someone is depending on some memory
> >> layout detail then there's no way to realize that we've broken
> >> something. (But it is a good policy IMO.)
> >
> >
> > I understand that in general.
> >
> > However, I always thought column_stack is a array creation function which
> > have guaranteed memory layout. And since it's stacking by columns I
> thought
> > that order is always Fortran.
> > And the fact that it doesn't have an order keyword yet, I thought is
> just a
> > missing extension.
>
> I guess I don't know what to say except that I'm sorry to hear that
> and sorry that no-one noticed until several releases later.
>


Were there more contiguity changes in 1.10?
I just saw a large number of test errors and failures in statespace models,
which are heavily based on Cython code, where it's not just a question of
performance.

I don't know yet what's going on, but I just saw that we have some explicit
tests for Fortran contiguity which just started to fail.




>
> >> If this kind of problem gets caught during a pre-release cycle then we
> >> generally do try to fix it, because we try not to break code, but if
> >> it's been broken for 2 full releases then there's no much we can do --
> >> we can't go back in time to fix it so it sounds like you're stuck
> >> working around the problem no matter what (unless you want to refuse
> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
> >> case, you just have to do a global search replace of np.column_stack
> >> with statsmodels.utils.column_stack_f, right?).
> >>
> >> And the regression issue seems like the only real argument for
> >> changing it back -- we'd never guarantee f-contiguity here if starting
> >> from a blank slate, I think?
> >
> >
> > When the cat is out of the bag, the down stream developer writes
> > compatibility code or helper functions.
> >
> > I will do that at at least the parts I know are intentionally designed
> for F
> > memory order.
> >
> > ---
> >
> > statsmodels doesn't really check or consistently optimize the memory
> order,
> > except in some cython functions.
> > But, I thought we should be doing quite well with getting Fortran ordered
> > arrays. I only paid attention where we have more extensive loops
> internally.
> >
> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when creating
> > design matrices?
>
> I never thought about it :-). So: no, it looks like right now patsy
> usually returns C-order matrices (or really, whatever np.empty or
> np.repeat returns), and there aren't any particular guarantees that
> this will continue to be the case in the future.
>
> Is returning matrices in F-contiguous layout really important? Should
> there be a return_type="fortran_matrix" option or something like that?
>

I don't know yet. My intuition was that it would be better because we feed
the arrays directly to pinv/SVD or QR, which, I think, require
Fortran-contiguous input by default.

However, my intuition might not be correct, and it might not make much
difference in a single OLS estimation.

There are a few critical loops in variable selection that I'm planning to
investigate to see how much it matters.
Memory optimization was never a high priority for us compared to expanding
the overall functionality, but reading the Julia mailing list is starting
to worry me a bit. :)

(I'm even starting to see the reason for multiple dispatch.)

Josef


>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org

Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-19 Thread josef.pktd
On Fri, Oct 16, 2015 at 9:31 PM, Allan Haldane 
wrote:

> On 10/16/2015 09:17 PM, josef.p...@gmail.com wrote:
>
>>
>>
>> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane > > wrote:
>>
>> On 10/16/2015 05:31 PM, josef.p...@gmail.com
>>  wrote:
>> >
>> >
>> > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
>> > 
>> > >> wrote:
>> >
>> >
>> >
>> > On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
>> > 
>> > >> wrote:
>> >
>> >
>> >
>> > On Fri, Oct 16, 2015 at 11:58 AM, > 
>>  > >
>> >> wrote:
>>  >
>>  > was there a change with reduce operations with
>> recarrays in
>>  > 1.10 or 1.10.1?
>>  >
>>  > Travis shows a new test failure in the statsmodels
>> testsuite
>>  > with 1.10.1:
>>  >
>>  > ERROR: test suite for >  > 'statsmodels.base.tests.test_data.TestRecarrays'>
>>  >
>>  >   File
>>  >
>>
>> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
>>  > line 131, in _handle_constant
>>  > const_idx = np.where(self.exog.ptp(axis=0) ==
>>  > 0)[0].squeeze()
>>  > TypeError: cannot perform reduce with flexible type
>>  >
>>  >
>>  > Sorry for asking so late.
>>  > (statsmodels is short on maintainers, and I'm
>> distracted)
>>  >
>>  >
>>  > statsmodels still has code to support recarrays and
>>  > structured dtypes from the time before pandas became
>>  > popular, but I don't think anyone is using them
>> together
>>  > with statsmodels anymore.
>>  >
>>  >
>>  > There were several commits dealing both recarrays and
>> ufuncs, so
>>  > this might well be a regression.
>>  >
>>  >
>>  > A bisection would be helpful. Also, open an issue.
>>  >
>>  >
>>  >
>>  > The reason for the test failure might be somewhere else hiding
>> behind
>>  > several layers of statsmodels, but only started to show up with
>> numpy 1.10.1
>>  >
>>  > I already have the reduce exception with my currently installed
>> numpy
>>  > '1.9.2rc1'
>>  >
>>   x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
>>  > ('x_2', 'f8')]).view(np.recarray)
>>  >
>>   np.ptp(x, axis=0)
>>  > Traceback (most recent call last):
>>  >   File "", line 1, in 
>>  >   File
>>  >
>>
>> "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
>>  > line 2047, in ptp
>>  > return ptp(axis, out)
>>  > TypeError: cannot perform reduce with flexible type
>>  >
>>  >
>>  > Sounds like fun, and I don't even know how to automatically bisect.
>>  >
>>  > Josef
>>
>> That example isn't the problem (ptp should definitely fail on
>> structured
>> arrays), but I've tracked down what is - it has to do with views of
>> record arrays.
>>
>> The fix looks simple, I'll get it in for the next release.
>>
>>
>> Thanks,
>>
>> I realized that at that point in the statsmodels code we should have
>> only regular ndarrays, so the array conversion fails somewhere.
>>
>> AFAICS, the main helper function to convert is
>>
>> def struct_to_ndarray(arr):
>>  return arr.view((float, len(arr.dtype.names)))
>>
>> which doesn't look like it will handle other dtypes than float64. Nobody
>> ever complained, so maybe our test suite is the only user of this.
>>
>> What is now the recommended way of converting structured
>> dtypes/recarrays to ndarrays?
>>
>> Josef
>>
>
> Yes, that's the code I narrowed it down to as well. I think the code in
> statsmodels is fine; the problem is actually a bug that I must admit I
> introduced in changes to the way views of recarrays work.
>
> If you are curious, the bug is in this line:
>
> https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467
>
> This line was intended to fix the problem that accessing a nested record
> array field would lose the 'np.record' dtype. I only considered void
> structured arrays, and had forgotten about sub-arrays which statsmodels
> uses.
>
> I think the fix is to replace `issubclass(val.type, 

Re: [Numpy-discussion] when did column_stack become C-contiguous?

2015-10-20 Thread josef.pktd
On Mon, Oct 19, 2015 at 9:51 PM,  wrote:

>
>
> On Mon, Oct 19, 2015 at 9:15 PM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 19, 2015 at 5:55 AM,   wrote:
>> >
>> >
>> > On Mon, Oct 19, 2015 at 2:14 AM, Nathaniel Smith  wrote:
>> >>
>> >> On Sun, Oct 18, 2015 at 9:35 PM,   wrote:
>> >>  np.column_stack((np.ones(10), np.ones(10))).flags
>> >> >   C_CONTIGUOUS : True
>> >> >   F_CONTIGUOUS : False
>> >> >
>> >>  np.__version__
>> >> > '1.9.2rc1'
>> >> >
>> >> >
>> >> > on my notebook which has numpy 1.6.1 it is f_contiguous
>> >> >
>> >> >
>> >> > I was just trying to optimize a loop over variable adjustment in
>> >> > regression,
>> >> > and found out that we lost fortran contiguity.
>> >> >
>> >> > I always thought column_stack is for fortran usage (linalg)
>> >> >
>> >> > What's the alternative?
>> >> > column_stack was one of my favorite commands, and I always assumed we
>> >> > have
>> >> > in statsmodels the right memory layout to call the linalg libraries.
>> >> >
>> >> > ("assumed" means we don't have timing nor unit tests for it.)
>> >>
>> >> In general practice no numpy functions make any guarantee about memory
>> >> layout, unless that's explicitly a documented part of their contract
>> >> (e.g. 'ascontiguous', or some functions that take an order= argument
>> >> -- I say "some" b/c there are functions like 'reshape' that take an
>> >> argument called order= that doesn't actually refer to memory layout).
>> >> This isn't so much an official policy as just a fact of life -- if
>> >> no-one has any idea that the someone is depending on some memory
>> >> layout detail then there's no way to realize that we've broken
>> >> something. (But it is a good policy IMO.)
>> >
>> >
>> > I understand that in general.
>> >
>> > However, I always thought column_stack is a array creation function
>> which
>> > have guaranteed memory layout. And since it's stacking by columns I
>> thought
>> > that order is always Fortran.
>> > And the fact that it doesn't have an order keyword yet, I thought is
>> just a
>> > missing extension.
>>
>> I guess I don't know what to say except that I'm sorry to hear that
>> and sorry that no-one noticed until several releases later.
>>
>
>
> Were there more contiguity changes in 1.10?
> I just saw a large number of test errors and failures in statespace models
> which are heavily based on cython code where it's not just a question of
> performance.
>
> I don't know yet what's going on, but I just saw that we have some
> explicit tests for fortran contiguity which just started to fail.
>
>
>
>
>>
>> >> If this kind of problem gets caught during a pre-release cycle then we
>> >> generally do try to fix it, because we try not to break code, but if
>> >> it's been broken for 2 full releases then there's no much we can do --
>> >> we can't go back in time to fix it so it sounds like you're stuck
>> >> working around the problem no matter what (unless you want to refuse
>> >> to support 1.9.0 through 1.10.1, which I assume you don't... worst
>> >> case, you just have to do a global search replace of np.column_stack
>> >> with statsmodels.utils.column_stack_f, right?).
>> >>
>> >> And the regression issue seems like the only real argument for
>> >> changing it back -- we'd never guarantee f-contiguity here if starting
>> >> from a blank slate, I think?
>> >
>> >
>> > When the cat is out of the bag, the down stream developer writes
>> > compatibility code or helper functions.
>> >
>> > I will do that at at least the parts I know are intentionally designed
>> for F
>> > memory order.
>> >
>> > ---
>> >
>> > statsmodels doesn't really check or consistently optimize the memory
>> order,
>> > except in some cython functions.
>> > But, I thought we should be doing quite well with getting Fortran
>> ordered
>> > arrays. I only paid attention where we have more extensive loops
>> internally.
>> >
>> > Nathniel, Does patsy guarantee memory layout (F-contiguous) when
>> creating
>> > design matrices?
>>
>> I never thought about it :-). So: no, it looks like right now patsy
>> usually returns C-order matrices (or really, whatever np.empty or
>> np.repeat returns), and there aren't any particular guarantees that
>> this will continue to be the case in the future.
>>
>> Is returning matrices in F-contiguous layout really important? Should
>> there be a return_type="fortran_matrix" option or something like that?
>>
>
> I don't know, yet. My intuition was that it would be better because we
> feed the arrays directly to pinv/SVD or QR which, I think, require by
> default Fortran contiguous.
>
> However, my intuition might not be correct, and it might not make much
> difference in a single OLS estimation.
>

I did some quick timing checks of pinv and qr, and the Fortran-ordered
version is only about 5% to 15% faster and uses about the same amount of
memory (watching the Task Manager). So, nothing to get excited about.
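
Roughly the kind of check used here, as a sketch (the array sizes are made
up; the 5-15% figure above came from the original timing, not from this
snippet):

import numpy as np
from timeit import timeit

rng = np.random.RandomState(0)
x_c = np.ascontiguousarray(rng.randn(100000, 20))
x_f = np.asfortranarray(x_c)

for name, x in [("C order", x_c), ("F order", x_f)]:
    t_qr = timeit(lambda: np.linalg.qr(x), number=20)
    t_pinv = timeit(lambda: np.linalg.pinv(x), number=20)
    print(name, "qr: %.3f s" % t_qr, "pinv: %.3f s" % t_pinv)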

Re: [Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

2015-10-16 Thread josef.pktd
On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane 
wrote:

> On 10/16/2015 05:31 PM, josef.p...@gmail.com wrote:
> >
> >
> > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
> > > wrote:
> >
> >
> >
> > On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
> > >
> wrote:
> >
> >
> >
> > On Fri, Oct 16, 2015 at 11:58 AM,  > > wrote:
> >
> > was there a change with reduce operations with recarrays in
> > 1.10 or 1.10.1?
> >
> > Travis shows a new test failure in the statsmodels testsuite
> > with 1.10.1:
> >
> > ERROR: test suite for <class 'statsmodels.base.tests.test_data.TestRecarrays'>
> >
> >   File
> >
>  
> "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py",
> > line 131, in _handle_constant
> > const_idx = np.where(self.exog.ptp(axis=0) ==
> > 0)[0].squeeze()
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sorry for asking so late.
> > (statsmodels is short on maintainers, and I'm distracted)
> >
> >
> > statsmodels still has code to support recarrays and
> > structured dtypes from the time before pandas became
> > popular, but I don't think anyone is using them together
> > with statsmodels anymore.
> >
> >
> > There were several commits dealing both recarrays and ufuncs, so
> > this might well be a regression.
> >
> >
> > A bisection would be helpful. Also, open an issue.
> >
> >
> >
> > The reason for the test failure might be somewhere else hiding behind
> > several layers of statsmodels, but only started to show up with numpy
> 1.10.1
> >
> > I already have the reduce exception with my currently installed numpy
> > '1.9.2rc1'
> >
>  x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
> > ('x_2', 'f8')]).view(np.recarray)
> >
>  np.ptp(x, axis=0)
> > Traceback (most recent call last):
> >   File "", line 1, in 
> >   File
> >
> "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py",
> > line 2047, in ptp
> > return ptp(axis, out)
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sounds like fun, and I don't even know how to automatically bisect.
> >
> > Josef
>
> That example isn't the problem (ptp should definitely fail on structured
> arrays), but I've tracked down what is - it has to do with views of
> record arrays.
>
> The fix looks simple, I'll get it in for the next release.
>

Thanks,

I realized that at that point in the statsmodels code we should have only
regular ndarrays, so the array conversion fails somewhere.

AFAICS, the main helper function to convert is

def struct_to_ndarray(arr):
    return arr.view((float, len(arr.dtype.names)))

which doesn't look like it will handle dtypes other than float64. Nobody
ever complained, so maybe our test suite is the only user of this.

What is now the recommended way of converting structured dtypes/recarrays
to ndarrays?
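
For the homogeneous case, a slightly more general version of the helper
might look like this (a sketch; it still assumes all fields share one
dtype):

import numpy as np

def struct_to_ndarray(arr):
    # use the common field dtype instead of hard-coding float64;
    # a view only works when every field has the same dtype
    dtype = arr.dtype[0]
    if any(arr.dtype[name] != dtype for name in arr.dtype.names):
        raise ValueError("fields do not share a common dtype")
    return np.asarray(arr).view((dtype, len(arr.dtype.names)))

x = np.zeros(5, dtype=[('a', 'f4'), ('b', 'f4')])
print(struct_to_ndarray(x).shape)   # (5, 2)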

Josef



>
> Allan


Re: [Numpy-discussion] Interesting discussion on copyrighting files.

2015-10-17 Thread josef.pktd
On Thu, Oct 15, 2015 at 11:28 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

> Worth a read at A .
>

Thanks, it is worth a read.

Most of the time when I see code copied from scipy or statsmodels, it is
properly attributed.
But every once in a while (like just now) I see code in an
interesting-sounding package on GitHub where I start to recognize parts,
because my code comments are still left in but there is no attribution to
the origin.

It's almost OK if it's MIT- or BSD-licensed, because then I can "borrow
back" the changes, but not if the new license is GPL.

This relates to the point in the discussion about seeing modules or
functions that got isolated from their parent package.

Josef
(slightly grumpy)



>
> Chuck
>

