Re: [Numpy-discussion] Why do mgrid and meshgrid not return broadcast arrays?

2017-03-08 Thread Juan Nunez-Iglesias
Ah, fantastic, thanks Per!

I'd still be interested to hear from the core devs as to why this isn't the 
default, both with meshgrid and mgrid...

Juan.

On 9 Mar 2017, 6:29 PM +1100, per.brodtk...@ffi.no, wrote:
> Hi, Juan.
>
> Meshgrid can actually give what you want, but you must use the options 
> copy=False and indexing='ij'.
>
> In [7]: %timeit np.meshgrid(np.arange(512), np.arange(512))
> 1000 loops, best of 3: 1.24 ms per loop
>
> In [8]: %timeit np.meshgrid(np.arange(512), np.arange(512), copy=False)
> 1 loops, best of 3: 27 µs per loop
>
> In [9]: %timeit np.meshgrid(np.arange(512), np.arange(512), copy=False, 
> indexing='ij')
> 1 loops, best of 3: 23 µs per loop
>
> Best regards
> Per A. Brodtkorb
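
A quick way to verify Per's tip (editor's sketch, not part of the original 
message): with copy=False the outputs are broadcast views, and the zero 
entries in .strides show that no memory is replicated.

import numpy as np
i, j = np.meshgrid(np.arange(512), np.arange(512), copy=False, indexing='ij')
print(i.strides, j.strides)  # e.g. (8, 0) and (0, 8) on a 64-bit build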
>
> From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On Behalf 
> Of Juan Nunez-Iglesias
> Sent: 9. mars 2017 04:20
> To: Discussion of Numerical Python
> Subject: Re: [Numpy-discussion] Why do mgrid and meshgrid not return 
> broadcast arrays?
>
> Hi Warren,
>
> ogrid doesn’t solve my problem. Note that my code returns arrays that would 
> evaluate as equal to the mgrid output. It’s just that they are copied in 
> mgrid into a giant array, instead of broadcast:
>
>
> In [176]: a0, b0 = np.mgrid[:5, :5]
>
> In [177]: a1, b1 = th.broadcast_mgrid((np.arange(5), np.arange(5)))
>
> In [178]: a0
> Out[178]:
> array([[0, 0, 0, 0, 0],
>        [1, 1, 1, 1, 1],
>        [2, 2, 2, 2, 2],
>        [3, 3, 3, 3, 3],
>        [4, 4, 4, 4, 4]])
>
> In [179]: a1
> Out[179]:
> array([[0, 0, 0, 0, 0],
>        [1, 1, 1, 1, 1],
>        [2, 2, 2, 2, 2],
>        [3, 3, 3, 3, 3],
>        [4, 4, 4, 4, 4]])
>
> In [180]: a0.strides
> Out[180]: (40, 8)
>
> In [181]: a1.strides
> Out[181]: (8, 0)
>
>
>
> On 9 Mar 2017, 2:05 PM +1100, Warren Weckesser <warren.weckes...@gmail.com>, 
> wrote:
>
>
>
> On Wed, Mar 8, 2017 at 9:48 PM, Juan Nunez-Iglesias <jni.s...@gmail.com> 
> wrote:
> I was a bit surprised to discover that both meshgrid and mgrid return fully 
> instantiated arrays, when simple broadcasting (i.e. with stride=0 for other 
> axes) is functionally identical and happens much, much faster.
>
>
> Take a look at ogrid: 
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.ogrid.html
> Warren
>
> > I wrote my own function to do this:
> >
> >
> > def broadcast_mgrid(arrays):
> >     shape = tuple(map(len, arrays))
> >     ndim = len(shape)
> >     result = []
> >     for i, arr in enumerate(arrays, start=1):
> >         reshaped = np.broadcast_to(arr[(...,) + (np.newaxis,) * (ndim - i)],
> >                                    shape)
> >         result.append(reshaped)
> >     return result
> >
> >
> > For even a modest-sized 512 x 512 grid, this version is close to 100x 
> > faster:
> >
> >
> > In [154]: %timeit th.broadcast_mgrid((np.arange(512), np.arange(512)))
> > 1 loops, best of 3: 25.9 µs per loop
> >
> > In [156]: %timeit np.meshgrid(np.arange(512), np.arange(512))
> > 100 loops, best of 3: 2.02 ms per loop
> >
> > In [157]: %timeit np.mgrid[:512, :512]
> > 100 loops, best of 3: 4.84 ms per loop
> >
> >
> > Is there a conscious design decision as to why this isn’t what 
> > meshgrid/mgrid do already? Or would a PR be welcome to do this?
> >
> > Thanks,
> >
> > Juan.
> >


Re: [Numpy-discussion] Why do mgrid and meshgrid not return broadcast arrays?

2017-03-08 Thread Juan Nunez-Iglesias
Hi Warren,

ogrid doesn’t solve my problem. Note that my code returns arrays that would 
evaluate as equal to the mgrid output. It’s just that they are copied in mgrid 
into a giant array, instead of broadcast:


In [176]: a0, b0 = np.mgrid[:5, :5]

In [177]: a1, b1 = th.broadcast_mgrid((np.arange(5), np.arange(5)))

In [178]: a0
Out[178]:
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [179]: a1
Out[179]:
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [180]: a0.strides
Out[180]: (40, 8)

In [181]: a1.strides
Out[181]: (8, 0)



On 9 Mar 2017, 2:05 PM +1100, Warren Weckesser <warren.weckes...@gmail.com>, 
wrote:
>
>
> > On Wed, Mar 8, 2017 at 9:48 PM, Juan Nunez-Iglesias <jni.s...@gmail.com> 
> > wrote:
> > > I was a bit surprised to discover that both meshgrid and mgrid return 
> > > fully instantiated arrays, when simple broadcasting (i.e. with stride=0 for 
> > > other axes) is functionally identical and happens much, much faster.
> > >
> >
> >
> > Take a look at ogrid: 
> > https://docs.scipy.org/doc/numpy/reference/generated/numpy.ogrid.html
> >
> > Warren
> >
> >
> > > I wrote my own function to do this:
> > >
> > >
> > > def broadcast_mgrid(arrays):
> > >     shape = tuple(map(len, arrays))
> > >     ndim = len(shape)
> > >     result = []
> > >     for i, arr in enumerate(arrays, start=1):
> > >         reshaped = np.broadcast_to(arr[(...,) + (np.newaxis,) * (ndim - i)],
> > >                                    shape)
> > >         result.append(reshaped)
> > >     return result
> > >
> > >
> > > For even a modest-sized 512 x 512 grid, this version is close to 100x 
> > > faster:
> > >
> > >
> > > In [154]: %timeit th.broadcast_mgrid((np.arange(512), np.arange(512)))
> > > 1 loops, best of 3: 25.9 µs per loop
> > >
> > > In [156]: %timeit np.meshgrid(np.arange(512), np.arange(512))
> > > 100 loops, best of 3: 2.02 ms per loop
> > >
> > > In [157]: %timeit np.mgrid[:512, :512]
> > > 100 loops, best of 3: 4.84 ms per loop
> > >
> > >
> > > Is there a conscious design decision as to why this isn’t what 
> > > meshgrid/mgrid do already? Or would a PR be welcome to do this?
> > >
> > > Thanks,
> > >
> > > Juan.
> > >


[Numpy-discussion] Why do mgrid and meshgrid not return broadcast arrays?

2017-03-08 Thread Juan Nunez-Iglesias
I was a bit surprised to discover that both meshgrid and mgrid return fully 
instantiated arrays, when simple broadcasting (i.e. with stride=0 for other axes) 
is functionally identical and happens much, much faster.

I wrote my own function to do this:


def broadcast_mgrid(arrays):
    shape = tuple(map(len, arrays))
    ndim = len(shape)
    result = []
    for i, arr in enumerate(arrays, start=1):
        # index with a tuple, not a list, so current numpy accepts it
        reshaped = np.broadcast_to(arr[(...,) + (np.newaxis,) * (ndim - i)],
                                   shape)
        result.append(reshaped)
    return result


For even a modest-sized 512 x 512 grid, this version is close to 100x faster:


In [154]: %timeit th.broadcast_mgrid((np.arange(512), np.arange(512)))
1 loops, best of 3: 25.9 µs per loop

In [156]: %timeit np.meshgrid(np.arange(512), np.arange(512))
100 loops, best of 3: 2.02 ms per loop

In [157]: %timeit np.mgrid[:512, :512]
100 loops, best of 3: 4.84 ms per loop
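
A quick equivalence check (editor's sketch, assuming the broadcast_mgrid 
defined above is in scope; "th" in the timings is the author's own module):

import numpy as np
a0, b0 = np.mgrid[:5, :5]
a1, b1 = broadcast_mgrid((np.arange(5), np.arange(5)))
assert (a0 == a1).all() and (b0 == b1).all()  # same values either way
print(a0.strides, a1.strides)  # (40, 8) vs (8, 0): full copy vs broadcast view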


Is there a conscious design decision as to why this isn’t what meshgrid/mgrid 
do already? Or would a PR be welcome to do this?

Thanks,

Juan.


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-18 Thread Juan Nunez-Iglesias
Hi everyone,

Thanks for this. It looks absolutely fantastic. I've been putting off using 
numexpr but it looks like I don't have a choice anymore. ;)

Regarding feature requests, I've always found it off putting that I have to 
wrap my expressions in a string to speed them up. Has anyone explored the 
possibility of using Python 3.6's frame evaluation API to do this? I remember a 
vague discussion on this list a while back but I don't know whether anything 
came of it.

Thanks!

Juan.
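
For readers unfamiliar with the string-wrapping Juan mentions, a minimal 
numexpr sketch (editor's illustration; the array names are arbitrary):

import numpy as np
import numexpr as ne
a, b = np.random.rand(2, 10**6)
result = ne.evaluate('2*a + a*b')  # the expression must be passed as a string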

On 18 Feb 2017, 3:42 AM +1100, Robert McLeod , wrote:
> Hi David,
>
> Thanks for your comments, reply below the fold.
>
> > On Fri, Feb 17, 2017 at 4:34 PM, Daπid  wrote:
> > > This is very nice indeed!
> > >
> > > On 17 February 2017 at 12:15, Robert McLeod  wrote:
> > > > * bytes and unicode support
> > > > * reductions (mean, sum, prod, std)
> > >
> > > I use both a lot, maybe I can help you get them working.
> > >
> > > Also, regarding "Vectorization hasn't been done yet with cmath
> > > functions for real numbers (such as sqrt(), exp(), etc.), only for
> > > complex functions". What is the bottleneck? Is it in GCC or just
> > > someone has to sit down and adapt it?
> >
> > I just haven't done it yet.  Basically I'm moving from Switzerland to 
> > Canada in a week so this was the gap to push something out that's usable if 
> > not perfect. Rather I just import cmath functions, which are inlined but I 
> > suspect what's needed is to break them down into their components. For 
> > example, the complex arccos function looks like this:
> >
> > static void
> > nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r)
> > {
> >     npy_complex64 a;
> >     for( npy_intp I = 0; I < n; I++ ) {
> >         a = x[I];
> >         _inline_mul( x[I], x[I], r[I] );
> >         _inline_sub( Z_1, r[I], r[I] );
> >         _inline_sqrt( r[I], r[I] );
> >         _inline_muli( r[I], r[I] );
> >         _inline_add( a, r[I], r[I] );
> >         _inline_log( r[I] , r[I] );
> >         _inline_muli( r[I], r[I] );
> >         _inline_neg( r[I], r[I]);
> >     }
> > }
> >
> > I haven't sat down and inspected whether the cmath versions get vectorized, 
> > but there's not a huge speed difference between NE2 and 3 for such a 
> > function on float (but there is for complex), so my suspicion is they 
> > aren't.  Another option would be to add a library such as Yeppp! as 
> > LIB_YEPPP or some other library that's faster than glibc.  For example the 
> > glibc function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's 
> > not how it should be.  Yeppp is also built with Python generating C code, 
> > so it could either be very easy or very hard.
> >
> > On bytes and unicode, I haven't seen examples for how people use it, so I'm 
> > not sure where to start. Since there's practically not a limitation on the 
> > number of operations now (the library is 1.3 MB now, compared to 1.2 MB for 
> > NE2 with gcc 5.4) the string functions could grow significantly from what 
> > we have in NE2.
> >
> > With regards to reductions, NumExpr never multi-threaded them, and could 
> > only do outer reductions, so in the end there was no speed advantage to be 
> > had compared to having NumPy do them on the result.  I suspect the primary 
> > value there was in PyTables and Pandas where the expression had to do 
> > everything.  One of the things I've moved away from in NE3 is doing output 
> > buffering (rather it pre-allocates the output array), so for reductions the 
> > understanding NumExpr has of broadcasting would have to be deeper.
> >
> > In any event contributions would certainly be welcome.
> >
> > Robert
> >
> --
> Robert McLeod, Ph.D.
> Center for Cellular Imaging and Nano Analytics (C-CINA)
> Biozentrum der Universität Basel
> Mattenstrasse 26, 4058 Basel
> Work: +41.061.387.3225
> robert.mcl...@unibas.ch
> robert.mcl...@bsse.ethz.ch
> robbmcl...@gmail.com


Re: [Numpy-discussion] Deprecating matrices.

2017-01-07 Thread Juan Nunez-Iglesias
Hi all! I've been lurking on this discussion, and don't have too much to add 
except to encourage a fast deprecation: I can't wait for sparse matrices to 
have an element-wise multiply operator.
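
The pain point Juan describes, sketched (editor's illustration): for 
scipy.sparse matrices, * means matrix product, so element-wise multiplication 
needs a method call.

import numpy as np
from scipy import sparse
a = sparse.csr_matrix(np.eye(3))
b = sparse.csr_matrix(np.ones((3, 3)))
prod = a * b              # matrix product, mirroring np.matrix semantics
elemwise = a.multiply(b)  # element-wise product requires .multiply()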

On 7 Jan 2017, 7:52 PM +1100, Ralf Gommers , wrote:
>
>
> On Sat, Jan 7, 2017 at 9:39 PM, Nathaniel Smith <n...@pobox.com> wrote:
> > On Fri, Jan 6, 2017 at 11:59 PM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> > >
> > >
> > > On Sat, Jan 7, 2017 at 2:52 PM, Charles R Harris wrote:
> > >>
> > >>
> > >>
> > >> On Fri, Jan 6, 2017 at 6:37 PM, <josef.p...@gmail.com> wrote:
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Jan 6, 2017 at 8:28 PM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> > 
> > 
> > 
> >  On Sat, Jan 7, 2017 at 2:21 PM, CJ Carey <perimosocord...@gmail.com> wrote:
> > >
> > >
> > > On Fri, Jan 6, 2017 at 6:19 PM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> > >>
> > >> This sounds like a reasonable idea. Timeline could be something like:
> > >>
> > >> 1. Now: create new package, deprecate np.matrix in docs.
> > >> 2. In say 1.5 years: start issuing visible deprecation warnings in
> > >> numpy
> > >> 3. After 2020: remove matrix from numpy.
> > >>
> > >> Ralf
> > >
> > >
> > > I think this sounds reasonable, and reminds me of the deliberate
> > > deprecation process taken for scipy.weave. I guess we'll see how 
> > > successful
> > > it was when 0.19 is released.
> > >
> > > The major problem I have with removing numpy matrices is the effect on
> > > scipy.sparse, which mostly-consistently mimics numpy.matrix semantics 
> > > and
> > > often produces numpy.matrix results when densifying. The two are 
> > > coupled
> > > tightly enough that if numpy matrices go away, all of the existing 
> > > sparse
> > > matrix classes will have to go at the same time.
> > >
> > > I don't think that would be the end of the world,
> > 
> > 
> >  Not the end of the world literally, but the impact would be pretty
> >  major. I think we're stuck with scipy.sparse, and may at some point 
> >  will add
> >  a new sparse *array* implementation next to it. For scipy we will have 
> >  to
> >  add a dependency on the new npmatrix package or vendor it.
> > >>>
> > >>>
> > >>> That sounds to me like moving maintenance of numpy.matrix from numpy to
> > >>> scipy, if scipy.sparse is one of the main users and still depends on it.
> > >
> > >
> > > Maintenance costs are pretty low, and are partly still for numpy (it has 
> > > to
> > > keep subclasses like np.matrix working. I'm not too worried about the
> > > effort. The purpose here is to remove np.matrix from numpy so beginners 
> > > will
> > > never see it. Educating sparse matrix users is a lot easier, and there 
> > > are a
> > > lot less such users.
> > >
> > >>
> > >> What I was thinking was encouraging folks to use `arr.dot(...)` or `@`
> > >> instead of `*` for matrix multiplication, keeping `*` for scalar
> > >> multiplication.
> > >
> > >
> > > I don't think that change in behavior of `*` is doable.
> >
> > I guess it would be technically possible to have matrix.__mul__ issue
> > a deprecation warning before matrix.__init__ does, to try and
> > encourage people to switch to using .dot and/or @, and thus make it
> > easier to later port their code to regular arrays?
>
> Yes, but that's not very relevant. I'm saying "not doable" since after the 
> debacle with changing diag return to a view my understanding is we decided 
> that it's a bad idea to make changes that don't break code but return 
> different numerical results. There's no good way to work around that here.
>
> With something as widely used as np.matrix, you simply cannot rely on people 
> porting code. You just need to phase out np.matrix in a way that breaks code 
> but never changes behavior silently (even across multiple releases).
>
> Ralf
>


Re: [Numpy-discussion] how to name "contagious" keyword in np.ma.convolve

2016-10-14 Thread Juan Nunez-Iglesias
+1 for propagate_mask. That is the only proposal that immediately makes sense 
to me. "contagious" may be cute but I think approximately 0% of users would 
guess its purpose on first use.

Can you elaborate on what happens with the masks exactly? I didn't quite get 
why propagate_mask=False was unintuitive. My expectation is that any mask 
present in the input will not be set in the output, but the mask will be 
"respected" by the function.

On 15 Oct. 2016, 5:23 AM +1100, Allan Haldane , wrote:
> I think the possibilities that have been mentioned so far (here or in
> the PR) are:
>
> contagious
> contagious_mask
> propagate
> propagate_mask
> propagated
>
> `propagate_mask=False` seemed to imply that the mask would never be set,
> so Eric also suggested
> propagate_mask='any' or propagate_mask='all'
>
>
> I would be happy with 'propagated=False' as the name/default. As Eric
> pointed out, most MaskedArray functions like sum implicitly don't
> propagate, currently, so maybe we should do likewise here.
>
>
> Allan
>
> On 10/14/2016 01:44 PM, Benjamin Root wrote:
> > Why not "propagated"?
> >
> > On Fri, Oct 14, 2016 at 1:08 PM, Sebastian Berg wrote:
> >
> > On Fr, 2016-10-14 at 13:00 -0400, Allan Haldane wrote:
> > > Hi all,
> > >
> > > Eric Wieser has a PR which defines new functions np.ma.correlate and
> > > np.ma.convolve:
> > >
> > > https://github.com/numpy/numpy/pull/7922
> > > We're deciding how to name the keyword arg which determines whether
> > > masked elements are "propagated" in the convolution sums. Currently
> > > we
> > > are leaning towards calling it "contagious", with default of True:
> > >
> > > def convolve(a, v, mode='full', contagious=True):
> > >
> > > Any thoughts?
> > >
> >
> > Sounds a bit overly odd to me to be honest. Just brain storming, you
> > could think/name it the other way around maybe? Should the masked
> > values be considered as zero/ignored?
> >
> > - Sebastian
> >
> >
> > > Cheers,
> > > Allan
> > >


Re: [Numpy-discussion] Numpy set_printoptions, silent failure, bug?

2016-07-19 Thread Juan Nunez-Iglesias
https://github.com/numpy/numpy/issues


From: John Ladasky  
Reply: Discussion of Numerical Python 

Date: 20 July 2016 at 7:49:10 AM
To: Discussion of Numerical Python 

Subject:  Re: [Numpy-discussion] Numpy set_printoptions, silent failure,
bug?

Hi Robert,
>
> Thanks for your reply.  If no one disagrees with you or with me that this
> is a Numpy bug, I would appreciate being directed to the appropriate page
> to submit a bug-fix request.
>
>
> On Tue, Jul 19, 2016 at 2:43 PM, Robert Kern 
> wrote:
>
>> On Tue, Jul 19, 2016 at 10:41 PM, John Ladasky  wrote:
>>
>> > Should this be considered a Numpy bug, or is there some reason that
>> set_printoptions would legitimately need to accept a dictionary as a single
>> argument?
>>
>> There is no such reason. One could certainly add more validation to the
>> arguments to np.set_printoptions().
>>
>> --
>> Robert Kern
>>
> --
> *John J. Ladasky Jr., Ph.D.*
> *Research Scientist*
> *International Technological University*
> *2711 N. First St, San Jose, CA 95134 USA*


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread Juan Nunez-Iglesias
atleast_nd would be useful for nd image processing in a very analogous way
to how atleast_2d is used by scikit-image, assuming it prepends. The
atleast_3d choice is baffling, seems analogous to the 0.5-based indexing
presented at PyCon, and should be "fun" to deprecate. =P



On 6 July 2016 at 2:57:57 PM, Eric Firing (efir...@hawaii.edu) wrote:

On 2016/07/06 8:25 AM, Benjamin Root wrote:
> I wouldn't have the keyword be "where", as that collides with the notion
> of "where" elsewhere in numpy.

Agreed. Maybe "side"?

(I find atleast_1d and atleast_2d to be very helpful for handling
inputs, as Ben noted; I'm skeptical as to the value of atleast_3d and
atleast_nd.)

Eric


Re: [Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

2016-07-06 Thread Juan Nunez-Iglesias
We use np.atleast_2d extensively in scikit-image, and I also use it in a
*lot* of my own code now that scikit-learn stopped accepting 1D arrays as
feature vectors.

> what is the advantage of `np.atleast_nd` over `np.array(a, copy=False,
ndmin=n)`

Readability, clearly.

My only concern is the described behavior of np.atleast_3d, which came as a
surprise. I certainly would expect the "atleast" family to all work in the
same way as broadcasting, i.e. prepending singleton dimensions.
Prepend/append behavior can be controlled either by keyword or simply by
using .T, I don’t mind either way.

Juan.
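
The surprise in question, concretely (editor's sketch):

import numpy as np
a = np.arange(3)
print(np.atleast_2d(a).shape)  # (1, 3): prepends, matching broadcasting rules
print(np.atleast_3d(a).shape)  # (1, 3, 1): pads both ends -- the odd one out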

On 6 July 2016 at 10:22:15 AM, Marten van Kerkwijk (
m.h.vankerkw...@gmail.com) wrote:

Hi All,

I'm with Nathaniel here, in that I don't really see the point of these
routines in the first place: broadcasting takes care of many of the initial
use cases one might think of, and others are generally not all that well
served by them: the examples from scipy to me do not really support
`atleast_?d`, but rather suggest that little thought has been put into
higher-dimensional objects which should be treated as stacks of row or
column vectors. My sense is that we're better off developing the direction
started with `matmul`, perhaps adding `matvecmul` etc.

More to the point of the initial inquiry: what is the advantage of having a
general `np.atleast_nd` routine over doing
```
np.array(a, copy=False, ndmin=n)
```
or, for a list of inputs,
```
[np.array(a, copy=False, ndmin=n) for a in input_list]
```

All the best,

Marten


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:27:47 PM, Skip Montanaro (skip.montan...@gmail.com)
wrote:

Hashing it probably wouldn't work, too
great a chance for collisions.


If the string is ASCII, you can always interpret the bytes as part of an
8-byte integer. Or, you can map unique values to consecutive integers.


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-04 Thread Juan Nunez-Iglesias
On 4 July 2016 at 7:38:48 PM, Skip Montanaro (skip.montan...@gmail.com)
wrote:

Oh, cool. Precisely the sort of solution I was hoping would turn up.


Except it doesn’t seem to meet your original spec, which retrieved the
first item of each *run* of an index value?


Re: [Numpy-discussion] Picking rows with the first (or last) occurrence of each key

2016-07-02 Thread Juan Nunez-Iglesias
Hey Skip,

Any way that you can make your keys numeric? Then you can run np.diff on
that first column, and use the indices of nonzero entries (np.flatnonzero)
to know where values change. With a +1/-1 offset (that I am too lazy to
figure out right now ;) you can then index into the original rows to get
either the first or last occurrence of each run.

Juan.
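
A sketch of the run-boundary approach Juan describes (editor's illustration; 
the keys array is a hypothetical numeric first column):

import numpy as np
keys = np.array([27, 27, 12, 12, 12, 17])
starts = np.flatnonzero(np.diff(keys)) + 1            # where a new run begins
first = np.concatenate(([0], starts))                 # first row of each run
last = np.concatenate((starts - 1, [len(keys) - 1]))  # last row of each run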



On 2 July 2016 at 10:10:16 PM, Skip Montanaro (skip.montan...@gmail.com)
wrote:

(I'm probably going to botch the description...)

Suppose I have a 2D array of Python objects, the first n elements of each
row form a key, the rest of the elements form the value. Each key can (and
generally does) occur multiple times. I'd like to generate a new array
consisting of just the first (or last) row for each key occurrence. Rows
retain their relative order on output.

For example, suppose I have this array with key length 2:

[ 'a', 27, 14.5 ]
[ 'b', 12, 99.0 ]
[ 'a', 27, 15.7 ]
[ 'a', 17, 100.3 ]
[ 'b', 12, -329.0 ]

Selecting the first occurrence of each key would return this array:

[ 'a', 27, 14.5 ]
[ 'b', 12, 99.0 ]
[ 'a', 17, 100.3 ]

while selecting the last occurrence would return this array:

[ 'a', 27, 15.7 ]
[ 'a', 17, 100.3 ]
[ 'b', 12, -329.0 ]

In real life, my array is a bit larger than this example, with the input
being on the order of a million rows, and the output being around 5000
rows. Avoiding processing all those extra rows at the Python level would
speed things up.

I don't know what this filter might be called (though I'm sure I haven't
thought of something new), so searching Google or Bing for it would seem to
be fruitless. It strikes me as something which numpy or Pandas might
already
have in their bag(s) of tricks.

Pointers appreciated,

Skip




Re: [Numpy-discussion] Integers to integer powers, let's make a decision

2016-06-10 Thread Juan Nunez-Iglesias
+1 to Alan's point. Having different type behaviour depending on the values
of x and y for np.arange(x) ** y would be awful, and it would also be awful
to have to worry about overflow here...

...

Having said that, it would be equally annoying to not have a way to define
integer powers...
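
The competing cases, concretely (editor's sketch):

import numpy as np
print(np.arange(10) ** 2)  # exact, provided the result dtype stays integer
print(9 ** 10)             # 3486784401: still fits comfortably in int64
# np.arange(1, 3) ** -2 is the awkward case: no integer result exists, so
# either the dtype must depend on the exponent's sign, or it must be an error.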


From: Alan Isaac  
Reply: Discussion of Numerical Python 

Date: 10 June 2016 at 5:10:57 AM
To: Discussion of Numerical Python 

Subject:  Re: [Numpy-discussion] Integers to integer powers, let's make a
decision

On 6/10/2016 2:42 AM, Nathaniel Smith wrote:
>
> I dunno, with my user hat on I'd be incredibly surprised / confused /
> annoyed if an innocent-looking expression like
>
> np.arange(10) ** 2
>
> started returning floats... having exact ints is a really nice feature
> of Python/numpy as compared to R/Javascript, and while it's true that
> int64 can overflow, there are also large powers that can be more
> precisely represented as int64 than float.

Is np.arange(10)**10 also "innocent looking" to a Python user?

Also, I am confused by what "large powers" means in this context.
Is 2**40 a "large power"?

Finally, is np.arange(1,3)**-2 "innocent looking" to a Python user?

Cheers,
Alan


Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread Juan Nunez-Iglesias
Thanks for clarifying, Jaime, and fwiw I agree with Josef: I would expect
np.bincount to behave like np.sum with regards to promoting weights dtypes.
Including bool.
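
The group-by-sum reading of a weighted bincount, sketched (editor's 
illustration):

import numpy as np
groups = np.array([0, 1, 1, 2])
weights = np.array([10.0, 20.0, 30.0, 40.0])
print(np.bincount(groups, weights=weights))  # [10. 50. 40.]: a group-by sum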

On Sun, Mar 27, 2016 at 1:58 PM,  wrote:

> On Sat, Mar 26, 2016 at 9:54 PM, Joseph Fox-Rabinovitz
>  wrote:
> > Would it make sense to just make the output type large enough to hold the
> > cumulative sum of the weights?
> >
> >
> > - Joseph Fox-Rabinovitz
> >
> > -- Original message--
> >
> > From: Jaime Fernández del Río
> >
> > Date: Sat, Mar 26, 2016 16:16
> >
> > To: Discussion of Numerical Python;
> >
> > Subject:[Numpy-discussion] Make np.bincount output same dtype as weights
> >
> > Hi all,
> >
> > I have just submitted a PR (#7464) that fixes an enhancement request
> > (#6854), making np.bincount return an array of the same type as the
> weights
> > parameter.  This is an important deviation from current behavior, which
> > always casts weights to double, and always returns a double array, so I
> > would like to hear what others think about the worthiness of this.  Main
> > discussion points:
> >
> > np.bincount now works with complex weights (yay!), I guess this should
> be a
> > pretty uncontroversial enhancement.
> > The return is of the same type as weights, which means that small
> integers
> > are very likely to overflow.  This is exactly what #6854 requested, but
> > perhaps we should promote the output for integers to a long, as we do in
> > np.sum?
>
> I always thought of bincount with weights just as a group-by sum. So
> it would be easier to remember and have fewer surprises if it matches
> the behavior of np.sum.
>
> > Boolean arrays stay boolean, and OR, rather than sum, the weights. Is
> this
> > what one would want? If we decide that integer promotion is the way to
> go,
> > perhaps booleans should go in the same pack?
>
> Isn't this calculating the sum, i.e. count of True by group, already?
> Based on a quick example with numpy 1.9.2, I don't think I ever used
> bool weights before.
>
>
> > This new implementation currently supports all of the reasonable native
> > types, but has no fallback for user defined types.  I guess we should
> > attempt to cast the array to double as before if no native loop can be
> > found? It would be good to have a way of testing this though, any
> thoughts
> > on how to go about this?
> > Does a behavior change like this require some deprecation period? What
> would
> > that look like?
> > I have also added broadcasting of weights to the full size of list, so
> that
> > one can do e.g. np.bincount([1, 2, 3], weights=2j) without having to tile
> > the single weight to the size of the bins list.
> >
> > Any other thoughts are very welcome as well!
>
> (2-D weights ?)
>
>
> Josef
>
>
> >
> > Jaime
> >
> > --
> > (__/)
> > ( O.o)
> > ( > <) This is Conejo. Copy Conejo into your signature and help him with
> > his plans for world domination.
> >


Re: [Numpy-discussion] Make np.bincount output same dtype as weights

2016-03-26 Thread Juan Nunez-Iglesias
Just to clarify, this will only affect weighted bincounts, right? I can't tell 
you in how many places my code depends on the return type being integer!!!


On 27 Mar 2016, 7:16 AM +1100, Jaime Fernández del Río, 
wrote:
> Hi all,
>  
> I have just submitted a PR (#7464, https://github.com/numpy/numpy/pull/7464) 
> that fixes an enhancement request 
> (#6854, https://github.com/numpy/numpy/issues/6854), making np.bincount return 
> an array of the same type as the weights parameter. This is an important 
> deviation from current behavior, which always casts weights to double, and 
> always returns a double array, so I would like to hear what others think about 
> the worthiness of this. Main discussion points:
> 
> - np.bincount now works with complex weights (yay!), I guess this should be a 
>   pretty uncontroversial enhancement.
> - The return is of the same type as weights, which means that small integers 
>   are very likely to overflow. This is exactly what #6854 requested, but 
>   perhaps we should promote the output for integers to a long, as we do in 
>   np.sum?
> - Boolean arrays stay boolean, and OR, rather than sum, the weights. Is this 
>   what one would want? If we decide that integer promotion is the way to go, 
>   perhaps booleans should go in the same pack?
> - This new implementation currently supports all of the reasonable native 
>   types, but has no fallback for user-defined types. I guess we should 
>   attempt to cast the array to double as before if no native loop can be 
>   found? It would be good to have a way of testing this though, any thoughts 
>   on how to go about this?
> - Does a behavior change like this require some deprecation period? What 
>   would that look like?
> - I have also added broadcasting of weights to the full size of the bins 
>   list, so that one can do e.g. np.bincount([1, 2, 3], weights=2j) without 
>   having to tile the single weight to the size of the bins list.
> 
> Any other thoughts are very welcome as well!
>  
> Jaime
>  
> --
> (\__/)
> ( O.o)
> (><) This is Conejo. Copy Conejo into your signature and help him with his 
> plans for world domination.


Re: [Numpy-discussion] 100 numpy exercises (80/100)

2016-03-08 Thread Juan Nunez-Iglesias
Thanks for this fantastic resource, Nicolas! I also had never heard of
argpartition and immediately know of many places in my code where I can use
it. I also learned that axis= can take a tuple as an argument.
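
The two features Juan calls out, in a short sketch (editor's illustration):

import numpy as np
a = np.random.rand(100)
k = 5
smallest = a[np.argpartition(a, k)[:k]]  # k smallest values, unordered, O(n)
b = np.ones((2, 3, 4))
print(b.sum(axis=(0, 2)))                # axis accepts a tuple of axes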

On Wed, Mar 9, 2016 at 7:18 AM, Nicolas P. Rougier  wrote:

>
> Hi all,
>
> I've just added some exercises to the collection at
> https://github.com/rougier/numpy-100
> (and in the process, I've discovered np.argpartition... nice!)
>
> If you have some ideas/comments/corrections... Still 20 to go...
>
>
>
> Nicolas
>


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Juan Nunez-Iglesias
Ah! Touché! =) My last and admittedly weak defense is that I've been
writing numpy since before 1.7. =)

On Thu, Feb 18, 2016 at 11:08 AM, Alan Isaac <alan.is...@gmail.com> wrote:

> On 2/17/2016 7:01 PM, Juan Nunez-Iglesias wrote:
>
>> Notice the limitation "1D array-like".
>>
>
>
>
> http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html
> "If an int, the random sample is generated as if a was np.arange(n)"
>
> hth,
>
> Alan Isaac
>


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Juan Nunez-Iglesias
Notice the limitation "1D array-like".

On Thu, Feb 18, 2016 at 10:59 AM, Alan Isaac <alan.is...@gmail.com> wrote:

> On 2/17/2016 6:48 PM, Juan Nunez-Iglesias wrote:
>
>> Also fwiw, I think the 0-based, half-open interval is one of the best
>> features of Python indexing and yes, I do use random integers to index
>> into my arrays and would not appreciate having to litter my code with
>> "-1" everywhere.
>>
>
>
> http://docs.scipy.org/doc/numpy-1.10.0/reference/generated
> /numpy.random.choice.html
>
> fwiw,
> Alan Isaac
>


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Juan Nunez-Iglesias
LOL "random integers" != "random_integers". =D

On Thu, Feb 18, 2016 at 10:52 AM, G Young <gfyoun...@gmail.com> wrote:

> Your statement is a little self-contradictory, but in any case, you
> shouldn't worry about random_integers getting removed from the code-base.
> However, it has been deprecated in favor of randint.
>
> On Wed, Feb 17, 2016 at 11:48 PM, Juan Nunez-Iglesias <jni.s...@gmail.com>
> wrote:
>
>> Also fwiw, I think the 0-based, half-open interval is one of the best
>> features of Python indexing and yes, I do use random integers to index into
>> my arrays and would not appreciate having to litter my code with "-1"
>> everywhere.
>>
>> On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac <alan.is...@gmail.com>
>> wrote:
>>
>>> On 2/17/2016 3:42 PM, Robert Kern wrote:
>>>
>>>> random.randint() was the one big exception, and it was considered a
>>>> mistake for that very reason, soft-deprecated in favor of
>>>> random.randrange().
>>>>
>>>
>>>
>>> randrange also has its detractors:
>>> https://code.activestate.com/lists/python-dev/138358/
>>> and following.
>>>
>>> I think if we start citing persistant conventions, the
>>> persistent convention across *many* languages that the bounds
>>> provided for a random integer range are inclusive also counts for
>>> something, especially when the names are essentially shared.
>>>
>>> But again, I am just trying to be clear about what is at issue,
>>> not push for a change.  I think citing non-existent standards
>>> is not helpful.  I think the discrepancy between the Python
>>> standard library and numpy for a function going by a common
>>> name is harmful.  (But then, I teach.)
>>>
>>> fwiw,
>>>
>>> Alan
>>>
>>>


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Juan Nunez-Iglesias
Also fwiw, I think the 0-based, half-open interval is one of the best
features of Python indexing and yes, I do use random integers to index into
my arrays and would not appreciate having to litter my code with "-1"
everywhere.
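
Why the half-open convention composes with indexing (editor's sketch):

import numpy as np
a = np.arange(10)
idx = np.random.randint(0, len(a), size=5)  # always-valid indices, no -1 needed
print(a[idx])
# The stdlib analogue is inclusive: random.randint(0, len(a)) can return
# len(a), which would be out of bounds here.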

On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac  wrote:

> On 2/17/2016 3:42 PM, Robert Kern wrote:
>
>> random.randint() was the one big exception, and it was considered a
>> mistake for that very reason, soft-deprecated in favor of
>> random.randrange().
>>
>
>
> randrange also has its detractors:
> https://code.activestate.com/lists/python-dev/138358/
> and following.
>
> I think if we start citing persistant conventions, the
> persistent convention across *many* languages that the bounds
> provided for a random integer range are inclusive also counts for
> something, especially when the names are essentially shared.
>
> But again, I am just trying to be clear about what is at issue,
> not push for a change.  I think citing non-existent standards
> is not helpful.  I think the discrepancy between the Python
> standard library and numpy for a function going by a common
> name is harmful.  (But then, I teach.)
>
> fwiw,
>
> Alan
>
>


Re: [Numpy-discussion] julia - Multidimensional algorithms and iteration

2016-02-02 Thread Juan Nunez-Iglesias
Nice. I particularly liked that indices are just arrays, so you can do
array arithmetic on them. I spend a lot of time converting
tuples-to-array-to-tuples. If I understand correctly, indexing-with-arrays
is overloaded in NumPy so the tuple syntax isn't going away any time soon,
is it?
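
The overloading in question, sketched (editor's illustration):

import numpy as np
a = np.zeros((4, 4))
ij = np.array([1, 2])
print(a[tuple(ij)].shape)  # (): a tuple selects the single element a[1, 2]
print(a[ij].shape)         # (2, 4): an array triggers fancy indexing on axis 0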

On Wed, Feb 3, 2016 at 2:33 AM, Neal Becker  wrote:

> http://julialang.org/blog/2016/02/iteration/
>


Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-25 Thread Juan Nunez-Iglesias
I agree that it's not ideal that the return value of as_strided is
writable. However, to be clear, this *would* break the API, which should
not happen between minor releases when using semantic versioning. Even with
a deprecation cycle, for libraries such as scikit-image that want to
maintain broad compatibility with multiple numpy versions, we would then
have to have some code to detect which version of numpy we're dealing with,
and do something different depending on the version. That's a big
development cost for something that has not been shown to cause any
problems.

And btw, although some people might use as_strided that aren't
super-amazing level 42 programmers, I would say that by that stage they are
probably comfortable enough to troubleshoot the shitstorm that's about to
fall on them. =P

On Tue, Jan 26, 2016 at 4:25 AM, Sturla Molden 
wrote:

> On 25/01/16 18:06, Sebastian Berg wrote:
>
> That said, I guess I could agree with you in the regard that there are
>> so many *other* awful ways to use as_strided, that maybe it really is
>> just so bad, that improving one thing doesn't actually help anyway ;).
>>
>
> That is roughly my position on this, yes. :)
>
>
> Sturla
>


Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-24 Thread Juan Nunez-Iglesias
> Yeah, that is a real use case. I am not planning to remove the option,
> but it would be as a `readonly` keyword argument, which means you would
> need to make the code depend on the numpy version if you require a
> writable array [1].
>
> [1] as_strided does not currently support arr.flags.writeable = True for
its result array.

Can you explain this in more detail? I'm writing to the result without
problem. Anyway, it only occurred to me after your response that the
deprecation path would be quite disruptive to downstream libraries,
including scikit-image. My feeling is that unless someone has been bitten
by this (?), the benefit does not outweigh the cost of deprecation. Perhaps
something to push to 2.0?

On Sun, Jan 24, 2016 at 8:17 PM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:

> On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
> > I've used as_strided before to create an "endless" output array when
> > I didn't care about the result of an operation, just the side effect.
> > See eg here. So I would certainly like option to remain to get a
> > writeable array. In general, I'm sceptical about whether the benefits
> > outweigh the costs.
>
> Yeah, that is a real use case. I am not planning to remove the option,
> but it would be as a `readonly` keyword argument, which means you would
> need to make the code depend on the numpy version if you require a
> writable array [1].
> This actually somewhat defeats the purpose of all of this, but
> `np.ndarray` can do this dummy thing for you I think, so you could get
> around that, but
>
> The purpose is that if you actually would use an as_strided array in
> your operation, the result is unpredictable (not just complicated). And
> while as_strided is IMO designed to be used by people who know what
> they are doing, I have a feeling it is being used quite a lot in
> general.
>
> We did a similar thing for the new `broadcast_to`, though I think there
> we decided to skip the readonly until complains happen.
>
> Actually there is one more thing I might do. And that is issue a
> UserWarning when new array quite likely points to invalid memory.
>
> - Sebastian
>
>
> [1] as_strided does not currently support arr.flags.writeable = True for
> its result array.
>
>
> > On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <n...@pobox.com>
> > wrote:
> > > On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg
> > > <sebast...@sipsolutions.net> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I have just opened a PR, to make as_strided writeonly (as
> > > default). The
> > >
> > > I think you meant readonly :-)
> > >
> > > > reasoning for this change is that an `as_strided` array often
> > > have self
> > > > overlapping memory. However, writing to an array where multiple
> > > > elements have the identical memory address can be confusing, and
> > > the
> > > > results are typically unpredictable.
> > > >
> > > > Considering the danger, the proposal is to add a `readonly=True`.
> > > A
> > > > poweruser (who that function is designed for anyway), could thus
> > > still
> > > > get a writeable array.
> > > >
> > > > For the moment, writing to the result would raise a FutureWarning
> > > with
> > > > `readonly="warn"`.
> > >
> > > This should just be a deprecation warning, right? (Because
> > > switching
> > > an array from writeable->readonly might cause previously correct
> > > code
> > > to error out, but not to silently start returning different
> > > results.)
> > >
> > > > Do you agree with this, or would it be a major inconvenience?
> > >
> > > AFAIK the only use cases for as_strided involve self-overlap (for
> > > non-self-overlap you can generally use reshape / indexing / etc.
> > > and
> > > it's much simpler). So +1 from me.
> > >
> > > -n
> > >
> > > --
> > > Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Make as_strided result writeonly

2016-01-23 Thread Juan Nunez-Iglesias
I've used as_strided before to create an "endless" output array when I
didn't care about the result of an operation, just the side effect (see
e.g. here). So I would certainly like the option to remain to get a
writeable array. In general, I'm sceptical about whether the benefits
outweigh the costs.
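
The "endless" output trick, sketched (editor's illustration of the pattern 
Juan describes):

import numpy as np
from numpy.lib.stride_tricks import as_strided
# Every element aliases the same memory, so all writes land in one place --
# useful only when the side effect, not the output, is what matters.
sink = as_strided(np.empty(1), shape=(10**6,), strides=(0,))
np.add(np.arange(10**6), 1.0, out=sink)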

On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith  wrote:

> On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg
>  wrote:
> >
> > Hi all,
> >
> > I have just opened a PR, to make as_strided writeonly (as default). The
>
> I think you meant readonly :-)
>
> > reasoning for this change is that an `as_strided` array often have self
> > overlapping memory. However, writing to an array where multiple
> > elements have the identical memory address can be confusing, and the
> > results are typically unpredictable.
> >
> > Considering the danger, the proposal is to add a `readonly=True`. A
> > poweruser (who that function is designed for anyway), could thus still
> > get a writeable array.
> >
> > For the moment, writing to the result would raise a FutureWarning with
> > `readonly="warn"`.
>
> This should just be a deprecation warning, right? (Because switching
> an array from writeable->readonly might cause previously correct code
> to error out, but not to silently start returning different results.)
>
> > Do you agree with this, or would it be a major inconvenience?
>
> AFAIK the only use cases for as_strided involve self-overlap (for
> non-self-overlap you can generally use reshape / indexing / etc. and
> it's much simpler). So +1 from me.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] deprecating assignment to ndarray.data

2016-01-21 Thread Juan Nunez-Iglesias
Does this apply in any way to the .data attribute in scipy.sparse matrices?
I fiddle with that quite often!
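
For context, a sketch (editor's illustration) of why sparse .data fiddling is 
a different thing: mutating the contents of a sparse matrix's .data array is 
ordinary ndarray use; the deprecation below concerns rebinding an ndarray's 
own .data pointer.

import numpy as np
from scipy import sparse
m = sparse.csr_matrix(np.eye(3))
m.data *= 2  # in-place edit of stored values: a plain ndarray operation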

On Fri, Jan 22, 2016 at 11:21 AM, Nathaniel Smith  wrote:

> So it turns out that ndarray.data supports assignment at the Python
> level, and what it does is just assign to the ->data field of the
> ndarray object:
>
> https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/getset.c#L325
>
> This kind of assignment been deprecated at the C level since 1.7, and
> is totally unsafe -- if there are any views pointing to the array when
> this happens, then they'll be left pointing off into unallocated
> memory.
>
> E.g.:
>
> a = np.arange(10)
> b = np.linspace(0, 1, 10)
> c = a.view()
> a.data = b.data
> # Now c points into free'd memory
>
> Can we deprecate or just remove this?
>
> (Also filed issue: https://github.com/numpy/numpy/issues/7093)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Why does np.repeat build a full array?

2015-12-15 Thread Juan Nunez-Iglesias
On Tue, Dec 15, 2015 at 8:29 PM, Sebastian Berg 
wrote:

> Actually, your particular use-case is covered by the new `broadcast_to`
> function.
>

So it is! Fascinating, thanks for pointing that out! =)
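
For reference, the broadcast_to spelling of the pattern (editor's sketch):

import numpy as np
arr = np.broadcast_to(1.0, (512, 512))  # a constant array, no big allocation
print(arr.strides)                      # (0, 0): nothing is replicated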


[Numpy-discussion] Why does np.repeat build a full array?

2015-12-14 Thread Juan Nunez-Iglesias
Hi,

I've recently been using the following pattern to create arrays of a
specific repeating value:

import numpy as np
from numpy.lib.stride_tricks import as_strided

value = np.ones((1,), dtype=float)
arr = as_strided(value, shape=input_array.shape,
                 strides=(0,) * input_array.ndim)  # one zero stride per axis

I can then use arr e.g. to count certain pairs of elements using
sparse.coo_matrix. It occurred to me that numpy might have a similar
function, and found np.repeat. But it seems that repeat actually creates
the full, replicated array, rather than using stride tricks to keep it
small. Is there any reason for this?

Thanks!

Juan.


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-12 Thread Juan Nunez-Iglesias
Hey Nathaniel,

Fascinating! Thanks for the primer! I didn't know that it would check dtype
of values in the whole array. In that case, I would agree that it would be
bad to infer it magically from just the first value, and this can be left
to the users.

Thanks!

Juan.

On Sat, Dec 12, 2015 at 7:00 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Fri, Dec 11, 2015 at 11:32 PM, Juan Nunez-Iglesias
> <jni.s...@gmail.com> wrote:
> > Nathaniel,
> >
> >> IMO this is better than making np.array(iter) internally call list(iter)
> >> or equivalent
> >
> > Yeah but that's not the only option:
> >
> > from itertools import chain
> > def fromiter_awesome_edition(iterable):
> >     elem = next(iterable)
> >     dtype = whatever_numpy_does_to_infer_dtypes_from_lists(elem)
> >     return np.fromiter(chain([elem], iterable), dtype=dtype)
> >
> > I think this would be a huge win for usability. Always getting tripped
> up by
> > the dtype requirement. I can submit a PR if people like this pattern.
>
> This isn't the semantics of np.array, though -- np.array will look at
> the whole input and try to find a common dtype, so this can't be the
> implementation for np.array(iter). E.g. try np.array([1, 1.0])
>
> I can see an argument for making the dtype= argument to fromiter
> optional, with a warning in the docs that it will guess based on the
> first element and that you should specify it if you don't want that.
> It seems potentially a bit error prone (in the sense that it might
> make it easier to end up with code that works great when you test it
> but then breaks later when something unexpected happens), but maybe
> the usability outweighs that. I don't use fromiter myself so I don't
> have a strong opinion.
>
> > btw, I think np.array(['f', 'o', 'o']) would be exactly the expected
> result
> > for np.array('foo'), but I guess that's just me.
>
> In general np.array(thing_that_can_go_inside_an_array) returns a
> zero-dimensional (scalar) array -- np.array(1), np.array(True), etc.
> all work like this, so I'd expect np.array("foo") to do the same.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-11 Thread Juan Nunez-Iglesias
Nathaniel,

> IMO this is better than making np.array(iter) internally call list(iter)
or equivalent

Yeah but that's not the only option:

from itertools import chain

def fromiter_awesome_edition(iterable):
    elem = next(iterable)
    dtype = whatever_numpy_does_to_infer_dtypes_from_lists(elem)
    return np.fromiter(chain([elem], iterable), dtype=dtype)

I think this would be a huge win for usability. Always getting tripped up
by the dtype requirement. I can submit a PR if people like this pattern.
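
A runnable variant of the pattern above (editor's sketch; fromiter_sketch is 
a made-up name, and the dtype inference is approximated by round-tripping the 
first element through np.array):

import numpy as np
from itertools import chain

def fromiter_sketch(iterable):
    it = iter(iterable)
    elem = next(it)               # peek at the first element
    dtype = np.array(elem).dtype  # infer the dtype from it alone
    return np.fromiter(chain([elem], it), dtype=dtype)

print(fromiter_sketch(i * i for i in range(5)))  # [ 0  1  4  9 16]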

btw, I think np.array(['f', 'o', 'o']) would be exactly the expected result
for np.array('foo'), but I guess that's just me.

Juan.

On Sat, Dec 12, 2015 at 10:12 AM, Nathaniel Smith  wrote:

> Constructing an array from an iterator is fundamentally different from
> constructing an array from an in-memory data structure like a list,
> because in the iterator case it's necessary to either use a
> single-pass algorithm or else create extra temporary buffers that
> cause much higher memory overhead. (Which is undesirable given that
> iterators are mostly used exactly in the case where one wants to
> reduce memory overhead.)
>
> np.fromiter requires the dtype= argument because this is necessary if
> you want to construct the array in a single pass.
>
> np.array(list(iter)) can avoid the dtype argument, because it creates
> that large memory buffer. IMO this is better than making
> np.array(iter) internally call list(iter) or equivalent, because the
> workaround (adding an explicit call to list()) is trivial, while also
> making it obvious to the user what the actual cost of their request
> is. (Explicit is better than implicit.)
>
> In addition, the proposed API has a number of infelicities:
> - We're generally trying to *reduce* the magic in functions like
> np.array (e.g. the discussions of having less magic for lists with
> mismatched numbers of elements, or non-list sequences)
> - There's a strong convention in Python is when making a function like
> np.array generic, it should accept any iter*able* rather any
> iter*ator*. But it would be super confusing if np.array({1: 2})
> returned array([1]), or if array("foo") returned array(["f", "o",
> "o"]), so we don't actually want to handle all iterables the same.
> It's somewhat dubious even for iterators (e.g. someone might want to
> create an object array containing an iterator...)...
>
> hope that helps,
> -n
>
> On Fri, Dec 11, 2015 at 2:27 PM, Stephan Sahm  wrote:
> > numpy.fromiter is not numpy.array, nor does it work like
> > numpy.array(list(...)), since the dtype argument is required
> >
> > is there a reason why np.array(...) should not work on iterators? I have
> > the feeling that such requests get (repeatedly) dismissed, but as yet I
> > haven't found a compelling argument for leaving this feature missing (as
> > a reminder, it is already implemented in a branch)
> >
> > Please let me know if you know about an argument,
> > best,
> > Stephan
> >
> > On 27 November 2015 at 14:18, Alan G Isaac  wrote:
> >>
> >> On 11/27/2015 5:37 AM, Stephan Sahm wrote:
> >>>
> >>> I like to request a generator/iterator support for np.array(...) as far
> >>> as list(...) supports it.
> >>
> >>
> >>
> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html
> >>
> >> hth,
> >> Alan Isaac
> >> ___
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
>
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal for a new function: np.moveaxis

2015-11-05 Thread Juan Nunez-Iglesias
I'm just a lowly user, but I'm a fan of this. +1!
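
For anyone who hasn't clicked through yet, a quick sketch of the proposed
semantics next to rollaxis:

>>> x = np.zeros((3, 4, 5))
>>> np.moveaxis(x, 0, -1).shape   # reads as "move axis 0 to the end"
(4, 5, 3)
>>> np.rollaxis(x, 2).shape       # rolls axis 2 to the front -- easy to misread
(5, 3, 4)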

On Thu, Nov 5, 2015 at 6:42 PM, Stephan Hoyer  wrote:

> I've put up a pull request implementing a new function, np.moveaxis, as an
> alternative to np.transpose and np.rollaxis:
> https://github.com/numpy/numpy/pull/6630
> This functionality has been discussed (even the exact function name)
> several times over the years, but it never made it into a pull request. The
> most pressing issue is that the behavior of np.rollaxis is not intuitive to
> most users:
> https://mail.scipy.org/pipermail/numpy-discussion/2010-September/052882.html
> https://github.com/numpy/numpy/issues/2039
> http://stackoverflow.com/questions/29891583/reason-why-numpy-rollaxis-is-so-confusing
> In this pull request, I also allow the source and destination axes to be
> sequences as well as scalars. This does not add much complexity to the
> code, solves some additional use cases and makes np.moveaxis a proper
> generalization of the other axes manipulation routines (see the pull
> requests for details).
> Best of all, it already works on ndarray duck types (like masked array and
> dask.array), because they have already implemented transpose.
> I think np.moveaxis would be a useful addition to NumPy -- I've found
> myself writing helper functions with a subset of its functionality several
> times over the past few years. What do you think?
> Cheers,
> Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-28 Thread Juan Nunez-Iglesias
Thanks, Jerome! I’ve added it to my to-watch list. It sounds really useful!

Juan.

On Wed, Oct 28, 2015 at 6:36 PM, Jerome Kieffer <jerome.kief...@esrf.fr>
wrote:

> On Tue, 27 Oct 2015 15:35:50 -0700 (PDT)
> "Juan Nunez-Iglesias" <jni.s...@gmail.com> wrote:
>> Can someone here who understands more about distribution maybe write a blog 
>> post detailing:
> Hi,
> Olivier Grisel from sklearn gave a very good talk on this topic at PyCon
> earlier this year:
> http://www.pyvideo.org/video/3473/build-and-test-wheel-packages-on-linux-osx-win
> Very instructive.
> -- 
> Jérôme Kieffer
> tel +33 476 882 445
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread Juan Nunez-Iglesias
Is there a pip equivalent of "python setup.py develop"?

On Tue, Oct 27, 2015 at 5:33 PM Charles R Harris 
wrote:

> On Tue, Oct 27, 2015 at 12:08 AM, Nathaniel Smith  wrote:
>
>> On Mon, Oct 26, 2015 at 11:03 PM, Charles R Harris
>>  wrote:
>> >
>> [...]
>> > I gave it a shot the other day. Pip keeps a record of the path to the
>> > repo and in order to clean up I needed to search out the file and delete
>> > the repo path. There is probably a better way to do that, but it didn't
>> > strike me as less troublesome than `python setup.py install --local`.
>>
>> Sorry, what did you "give a shot", and what problem did it create?
>> What does `setup.py install --local` do? (it doesn't seem to be
>> mentioned in `setup.py install --help`.)
>>
>
>  `pip install --user -e . `. However, `pip install --user .` seems to work
> fine. The pip documentation isn't the best.
>
> Yeah, `--user` not `--local`. It's getting late...
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread Juan Nunez-Iglesias
Can someone here who understands more about distribution maybe write a blog
post detailing:

- why these setup.py commands are bad
- which alternative corresponds to each command and why it's better
- where to find information about this

For example, I had never heard of "twine", and parenthetical statements such as
"setup.py upload (which is broken and should never be used)" are useless to
those who don't know this and useless to those who do.

I understand that this is an "internal" discussion, but it's nice if those
following along to learn can get quick pointers. Since there is a *ton* of
material online telling us *to use* python setup.py install, all the time, it
would be extremely helpful for the community if discussions such as this one
helped to bubble up the Right Way of doing Python packaging and distribution.

Thanks,

Juan.

On Wed, Oct 28, 2015 at 9:16 AM, Ralf Gommers 
wrote:

> On Tue, Oct 27, 2015 at 8:19 AM, Ralf Gommers 
> wrote:
> Updating this list for comments made after I sent it and now that I've
> looked in more detail at what the less common commands do:
>> So if/when we accept the proposal in this thread, I'm thinking we should
>> make a bunch of changes at once:
>> - always use setuptools (this is a new dependency)
>> - error on ``python setup.py install``
>>
> (removed the item about setup_requires, relevant for scipy but not numpy)
>> - error on ``python setup.py clean`` (saying "use `git clean -xdf` (or
>> -Xdf ...) instead")
>> - change ``python setup.py --help`` to first show numpy-specific stuff
>> before setuptools help info
>> - update all our install docs
>>
> - error on ``python setup.py upload`` (saying "use `twine upload -s`
> instead")
> - error on ``python setup.py upload_docs``
> - error on ``python setup.py easy_install`` (I'm not joking, that exists)
> - error on ``python setup.py test`` (saying "use `python runtests.py`
> instead")
> - remove setupegg.py
> Ralf
> And when "pip upgrade" is released (should be soon, see
>> https://github.com/pypa/pip/pull/3194), officially change our mind and
>> recommend the use of install_requires/setup_requires to packages depending
>> on numpy.
>>
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

2015-10-27 Thread Juan Nunez-Iglesias
Thanks Ralf! The pointer to the Python Packaging User Guide is already gold!
(But a wider discussion, e.g. in the NumPy repo, mirroring the docstring
conventions, would also be good!)

On Wed, Oct 28, 2015 at 10:02 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:

> On Tue, Oct 27, 2015 at 11:35 PM, Juan Nunez-Iglesias <jni.s...@gmail.com>
> wrote:
>> Can someone here who understands more about distribution maybe write a
>> blog post detailing:
>>
>> - why these setup.py commands are bad
>> - which alternative corresponds to each command and why it's better
>> - where to find information about this
>>
> Good question. Not that I have a blog, but I can try to write something a
> bit longer the coming weekend.
>>
>> For example, I had never heard of "twine", and parenthetical statements
>> such as "setup.py upload (which is broken and should never be used)" are
>> useless to those who don't know this and useless to those who do.
>>
> IIRC `setup.py upload` sends passwords over plain http. I've also seen it
> do weird things like change one's own PyPI rights from maintainer to owner.
> The most comprehensive overview of all this stuff is
> https://packaging.python.org/en/latest/, which starts with tool
> recommendations. Twine is one of the first things mentioned.
> Ralf
>> I understand that this is an "internal" discussion, but it's nice if those
>> following to learn can get quick pointers. Since there is a *ton* of
>> material online telling us *to use* python setup.py install, all the time,
>> it would be extremely helpful for the community if discussions such as this
>> one helped to bubble up the Right Way of doing Python packaging and
>> distribution.
>>
>> Thanks,
>>
>> Juan.
>>
>>
>>
>>
>>
>> On Wed, Oct 28, 2015 at 9:16 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Oct 27, 2015 at 8:19 AM, Ralf Gommers <ralf.gomm...@gmail.com>
>>> wrote:
>>>
>>> Updating this list for comments made after I sent it and now that I've
>>> looked in more detail at what the less common commands do:
>>>
>>>
>>>> So if/when we accept the proposal in this thread, I'm thinking we should
>>>> make a bunch of changes at once:
>>>> - always use setuptools (this is a new dependency)
>>>> - error on ``python setup.py install``
>>>>
>>>
>>> (removed the item about setup_requires, relevant for scipy but not numpy)
>>>
>>>
>>>> - error on ``python setup.py clean`` (saying "use `git clean -xdf` (or
>>>> -Xdf ...) instead")
>>>> - change ``python setup.py --help`` to first show numpy-specific stuff
>>>> before setuptools help info
>>>> - update all our install docs
>>>>
>>>
>>> - error on ``python setup.py upload`` (saying "use `twine upload -s`
>>> instead")
>>> - error on ``python setup.py upload_docs``
>>> - error on ``python setup.py easy_install`` (I'm not joking, that exists)
>>> - error on ``python setup.py test`` (saying "use `python runtests.py`
>>> instead")
>>> - remove setupegg.py
>>>
>>> Ralf
>>>
>>> And when "pip upgrade" is released (should be soon, see
>>>> https://github.com/pypa/pip/pull/3194), officially change our mind and
>>>> recommend the use of install_requires/setup_requires to packages depending
>>>> on numpy.
>>>>
>>>>
>>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nansum function behavior

2015-10-24 Thread Juan Nunez-Iglesias
Hi Charles,


Just providing an outsider's perspective...

Your specific use-case doesn't address the general definition of nansum:
perform a sum while ignoring nans. As others have pointed out (especially in
the linked thread), the sum of nothing is 0. Although the current behaviour of
nansum doesn't quite match your use-case, there is no doubt at all that it
follows a consistent convention. "Wrong" is certainly not the correct way to
describe it.

You can easily cater to your use case as follows (nanmean is nansum divided by
the number of non-NaN values, so multiplying the mean by that count recovers
the sum while keeping all-NaN slices NaN):

def rilhac_nansum(ar, axis=None):
    # count the non-NaN entries; nanmean * n_valid equals nansum, and for
    # all-NaN slices nanmean is nan, so the nan propagates as desired
    n_valid = np.sum(~np.isnan(ar), axis=axis)
    return np.nanmean(ar, axis=axis) * n_valid
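
Applied to your example this gives the values you wanted (nanmean will emit a
RuntimeWarning on the all-NaN row, which you can suppress if it bothers you):

>>> a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])
>>> rilhac_nansum(a, axis=1)
array([  6.,  nan])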

nanmean _consistently_ returns nan when encountering nan-only values because
the mean of nothing is nan (the sum of nothing divided by the length of
nothing, i.e. 0/0).

Hope this helps...

Juan.

On Sat, Oct 24, 2015 at 12:44 PM, Charles Rilhac 
wrote:

> I saw this thread and I totally disagree with thouis' argument…
> Of course, you can get NaN if there are only NaNs. Thank goodness, there
> are a lot of ways to do that.
> But it's not convenient or consistent and, above all, it is logically wrong
> to do that. NaN does not mean zero, and an operation over only NaNs cannot
> return a number…
> You lose information about your array. It is easier to fill the result of
> nansum with zeros than to keep a mask of your original array or whatever
> you do.
> Why is it misleading?
> For example, say you want to sum the rows of an array and then take the
> mean of the result:
> a = np.array([[2, np.nan, 4], [np.nan, np.nan, np.nan]])
> b = np.nansum(a, axis=1)  # array([ 6.,  0.])
> m = np.nanmean(b)  # 3.0 WRONG because you wanted to get 6
>> On 24 Oct 2015, at 09:28, Stephan Hoyer  wrote:
>> 
>> Hi Charles,
>> 
>> You should read the previous discussion about this issue on GitHub:
>> https://github.com/numpy/numpy/issues/1721
>> 
>> For what it's worth, I do think the new definition of nansum is more 
>> consistent.
>> 
>> If you want to preserve NaN if there are no non-NaN values, you can often 
>> calculate this desired quantity from nanmean, which does return NaN if there 
>> are only NaNs.
>> 
>> Stephan
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Numpy for data manipulation

2015-10-01 Thread Juan Nunez-Iglesias
It will still have to be a nice png, but you get an interactive figure when it
is live.

You just blew my mind. =D

+1 to Python 3 and aliasing numpy as np.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Change default order to Fortran order

2015-08-02 Thread Juan Nunez-Iglesias
Hi Kang,

Feel free to come chat about your application on the scikit-image list [1]!
I'll note that we've been through the array order discussion many times
there and even have a doc page about it [2].

The short version is that you'll save yourself a lot of pain by starting to
think of your images as (plane, row, column) instead of (x, y, z). The
syntax actually becomes friendlier too. For example, to do something to
each slice of data, you do:

for plane in image:
    plane += foo

instead of

for z in range(image.shape[2]):
    image[:, :, z] += foo

for example.
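
And if your data arrives as (x, y, z), a transpose gives you the
(plane, row, column) view without copying anything:

>>> xyz = np.empty((64, 64, 4))      # an (x, y, z)-ordered volume
>>> image = xyz.transpose(2, 1, 0)   # (plane, row, column); a view, not a copy
>>> image.shape
(4, 64, 64)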

Juan.

[1] scikit-im...@googlegroups.com
[2]
http://scikit-image.org/docs/dev/user_guide/numpy_images.html#coordinate-conventions

PS: As to the renamed Fortran-ordered numpy, may I suggest funpy. The F
is for Fortran and the fun is for all the fun you'll have maintaining it. =P

On Mon, 3 Aug 2015 at 6:28 am Daniel Sank sank.dan...@gmail.com wrote:

 Kang,

 Thank you for explaining your motivation. It's clear from your last note,
 as you said, that your desire for column-first indexing has nothing to do
 with in-memory data layout. That being the case, I strongly urge you to
 just use bare numpy and do not use the fortran_zeros function I
 recommended before. Changing the in-memory layout via the order keyword
 in numpy.zeros will not change the way indexing works at all. You gain
 absolutely nothing by changing the in-memory order unless you are writing
 some C or Fortran code which will interact with the data in memory.

 To see what I mean, consider the following examples:

 x = np.array([[1, 2, 3], [4, 5, 6]])
 x.shape
  (2, 3)

 and

 x = np.array([[1, 2, 3], [4, 5, 6]], order='F')
 x.shape
  (2, 3)

 You see that changing the in-memory order has nothing whatsoever to do
 with the array's shape or how you access it.

  You will see a run time error. Depending on the environment, you may get a
  useful error message (e.g. index out of range), but sometimes you just get
  bad image results.

 Could you give a very simple example of what you mean? I can't think of
 how this could ever happen and your fear here makes me think there's a
 fundamental misunderstanding about how array operations in numpy and other
 programming languages work. As an example, iteration in numpy goes through
 the first index:

 x = np.array([[1, 2, 3], [4, 5, 6]])
 for foo in x:
     ...

 Inside the for loop, foo takes on the values [1, 2, 3] on the first
 iteration and [4, 5, 6] on the second. If you want to iterate through the
 columns just do this instead

 x = np.array([[1, 2, 3], [4, 5, 6]])
 for foo in x.T:
     ...

 If your complaint is that you want np.array([[1, 2, 3], [4, 5, 6]]) to
 produce an array with shape (3, 2) then you should own up to the fact that
 the array constructor expects it the other way around and do this

 x = np.array([[1, 2, 3], [4, 5, 6]]).T

 instead. This is infinity times better than trying to write a shim
 function or patch numpy because with .T you're using (fast) built-in
 functionality which other people reading your code will understand.

 The real message here is that whether the first index runs over rows or
 columns is actually meaningless. The only places the row versus column
 issue has any meaning is when doing input/output (in which case you should
 use the transpose if you actually need it), or when doing iteration. One
 thing that would make sense if you're reading from a binary file format
 which uses column-major format would be to write your own reader function:

 def read_fortran_style_binary_file(file, shape):
     # fromfile returns flat data; reshape to the reversed (column-major)
     # shape, then transpose to get the expected indexing
     return np.fromfile(file).reshape(shape[::-1]).T

 Note that if you do this then you already have a column major array in
 numpy and you don't have to worry about any other transposes (except,
 again, when doing more I/O or passing to something like a plotting
 function).




 On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang kwan...@wisc.edu wrote:

 Thank you all for replying and providing useful insights and suggestions.

 The reasons I really want to use column-major are:

 - I am an image-oriented user (not matrix-oriented, as explained in
   http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues)
 - I am so used to reading/writing I(x, y, z) in textbooks and code that it
   is very likely that if the environment (a row-major environment) forces
   me to write I(z, y, x), I will write a bug if I am not 100% focused. When
   this happens, it is difficult to debug, because everything compiles and
   builds fine. You will see a run time error. Depending on the environment,
   you may get a useful error message (e.g. index out of range), but
   sometimes you just get bad image results.
 - It actually has not too much to do with the actual data layout in
   memory. In image processing, especially medical imaging where I am
   working, if you have a 3D image, everyone will agree that in memory,
   the X index is the fastest changing index, and the Z dimension (we often
   call it the slice