Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 9:41 AM, Nadav Har'El <n...@scylladb.com> wrote:
>
> On Mon, Jan 23, 2017 at 4:52 PM, aleba...@gmail.com <aleba...@gmail.com>
wrote:
>>
>> 2017-01-23 15:33 GMT+01:00 Robert Kern <robert.k...@gmail.com>:
>>>
>>> I don't object to some Notes, but I would probably phrase it more like
we are providing the standard definition of the jargon term "sampling
without replacement" in the case of non-uniform probabilities. To my mind
(or more accurately, with my background), "replace=False" obviously picks
out the implemented procedure, and I would have been incredibly surprised
if it did anything else. If the option were named "unique=True", then I
would have needed some more documentation to let me know exactly how it was
implemented.
>>>
>> FWIW, I totally agree with Robert
>
> With my own background (MSc. in Mathematics), I agree that this algorithm
is indeed the most natural one. And as I said, when I wanted to implement
something myself to choose random combinations (k out of n
items), I wrote exactly the same one. But when it didn't produce the
desired probabilities (even in cases where I knew that doing this was
possible), I wrongly assumed numpy would do things differently - only to
realize it uses exactly the same algorithm. So clearly, the documentation
didn't quite explain what it does or doesn't do.

In my experience, I have seen "without replacement" mean only one thing. If
the docstring had said "returns unique items", I'd agree that it doesn't
explain what it does or doesn't do. The only issue is that "without
replacement" is jargon, and it is good to recapitulate the definitions of
such terms for those who aren't familiar with them.

> Also, Robert, I'm curious: beyond explaining why the existing algorithm
is reasonable (which I agree with), could you give me an example of where it is
actually *useful* for sampling?

The references I previously quoted list a few. One is called "multistage
sampling proportional to size". The idea is that you draw (without
replacement) from larger units (say, congressional districts) before
sampling within them. It is similar to the situation you outline, but it is
probably more useful at a different scale, like lots of larger units (where
your algorithm is likely to provide no solution) rather than a handful.

It is probably less useful in terms of survey design, where you are trying
to *design* a process to get a result, than it is in queueing theory and
related fields, where you are trying to *describe* and simulate a process
that is pre-defined.
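For a concrete toy version of that kind of two-stage draw (the sizes and counts below are invented purely for illustration, not taken from the references):

import numpy as np

rng = np.random.RandomState(12345)

# Hypothetical sizes for six large units (say, districts).
sizes = np.array([150., 300., 50., 500., 200., 100.])

# Stage 1: draw three distinct districts, with inclusion driven by size.
districts = rng.choice(len(sizes), size=3, replace=False, p=sizes / sizes.sum())

# Stage 2: sample individuals uniformly within each selected district.
within = {d: rng.randint(0, int(sizes[d]), size=5) for d in districts}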

--
Robert Kern


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 9:22 AM, Anne Archibald <peridot.face...@gmail.com>
wrote:
>
>
> On Mon, Jan 23, 2017 at 3:34 PM Robert Kern <robert.k...@gmail.com> wrote:
>>
>> I don't object to some Notes, but I would probably phrase it more like
we are providing the standard definition of the jargon term "sampling
without replacement" in the case of non-uniform probabilities. To my mind
(or more accurately, with my background), "replace=False" obviously picks
out the implemented procedure, and I would have been incredibly surprised
if it did anything else. If the option were named "unique=True", then I
would have needed some more documentation to let me know exactly how it was
implemented.
>
>
> It is what I would have expected too, but we have a concrete example of a
user who expected otherwise; where one user speaks up, there are probably
more who didn't (some of whom probably have code that's not doing what they
think it does). So for the cost of adding a Note, why not help some of them?

That's why I said I'm fine with adding a Note. I'm just suggesting a
re-wording so that the cautious language doesn't lead anyone who is
familiar with the jargon to think we're doing something ad hoc while still
providing the details for those who aren't so familiar.

> As for the standardness of the definition: I don't know, have you a
reference where it is defined? More natural to me would be to have a list
of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time).
I'm hesitant to claim ours is a standard definition unless it's in a
textbook somewhere. But I don't insist on my phrasing.

Textbook, I'm not so sure, but it is the *only* definition I've ever
encountered in the literature:

http://epubs.siam.org/doi/abs/10.1137/0209009
http://www.sciencedirect.com/science/article/pii/S002001900500298X

--
Robert Kern


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald <peridot.face...@gmail.com>
wrote:
>
> On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El <n...@scylladb.com> wrote:
>>
>> On Wed, Jan 18, 2017 at 4:30 PM, <josef.p...@gmail.com> wrote:
>>>
>>>> Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.
>>>
>>> BTW: sampling 3 out of 3 without replacement is even worse
>>>
>>> No matter what sampling scheme and what selection probabilities we use,
we always have every element with probability 1 in the sample.
>>
>> I agree. The random-sample function of the type I envisioned will be
able to reproduce the desired probabilities in some cases (like the example
I gave) but not in others. Because doing this correctly involves a set of n
linear equations in comb(n,k) variables, it can have no solution, or many
solutions, depending on the n and k, and the desired probabilities. A
function of this sort could return an error if it can't achieve the desired
probabilities.
>
> It seems to me that the basic problem here is that the
numpy.random.choice docstring fails to explain what the function actually
does when called with weights and without replacement. Clearly there are
different expectations; I think numpy.random.choice chose one that is easy
to explain and implement but not necessarily what everyone expects. So the
docstring should be clarified. Perhaps a Notes section:
>
> When numpy.random.choice is called with replace=False and non-uniform
probabilities, the resulting distribution of samples is not obvious.
numpy.random.choice effectively follows the procedure: when choosing the
kth element in a set, the probability of element i occurring is p[i]
divided by the total probability of all not-yet-chosen (and therefore
eligible) elements. This approach is always possible as long as the sample
size is no larger than the population, but it means that the probability
that element i occurs in the sample is not exactly p[i].

I don't object to some Notes, but I would probably phrase it more like we
are providing the standard definition of the jargon term "sampling without
replacement" in the case of non-uniform probabilities. To my mind (or more
accurately, with my background), "replace=False" obviously picks out the
implemented procedure, and I would have been incredibly surprised if it did
anything else. If the option were named "unique=True", then I would have
needed some more documentation to let me know exactly how it was
implemented.
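For readers who don't know the jargon, here is a minimal sketch of the sequential procedure that the proposed Note describes (an illustration of the definition only, not numpy's internal implementation):

import numpy as np

def sample_without_replacement(n, k, p, rng=None):
    """Draw k distinct indices from range(n); at each step the chance of
    picking index i is its weight divided by the total weight of the
    not-yet-chosen indices."""
    rng = np.random.RandomState() if rng is None else rng
    p = np.asarray(p, dtype=float).copy()
    chosen = []
    for _ in range(k):
        p /= p.sum()                 # renormalize over the remaining items
        i = rng.choice(n, p=p)
        chosen.append(i)
        p[i] = 0.0                   # remove i from the eligible pool
    return chosen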

--
Robert Kern


Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread Robert Kern
On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat <ilhanpo...@gmail.com> wrote:
>
> Yes, that's precisely the case but when we know the structure we can just
choose the appropriate solver anyhow with a little bit of overhead. What I
mean is that, to my knowledge, FORTRAN routines for checking for
triangularness etc. are absent.

I'm responding to that. The reason that they don't have those FORTRAN
routines for testing for structure inside of a generic dense matrix is that
in FORTRAN it's more natural (and efficient) to just use the explicit
packed structure and associated routines instead. You would only use a
generic dense matrix if you know that there isn't structure in the matrix.
So there are no routines for detecting that structure in generic dense
matrices.
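For illustration only, here is the kind of cheap numpy-side structure check one could do before dispatching to a specialized solver; this is a sketch of the polyalgorithm idea, not a routine that LAPACK or scipy provides:

import numpy as np
from scipy.linalg import solve, solve_triangular

def structured_solve(A, b):
    # O(n**2) checks on a generic dense matrix before picking a solver.
    if not np.any(np.triu(A, k=1)):       # nothing above the diagonal
        return solve_triangular(A, b, lower=True)
    if not np.any(np.tril(A, k=-1)):      # nothing below the diagonal
        return solve_triangular(A, b, lower=False)
    return solve(A, b)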

--
Robert Kern


Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread Robert Kern
On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat <ilhanpo...@gmail.com> wrote:

> So every test in the polyalgorithm is cheaper than the next one. I'm not
exactly sure what might be the best strategy yet hence the question. It's
really interesting that LAPACK doesn't have this type of fast checks.

In Fortran LAPACK, if you have a special structured matrix, you usually
explicitly use packed storage and call the appropriate function type on it.
It's only when you go to a system that only has a generic, unstructured
dense matrix data type that it makes sense to do those kinds of checks.

-- 
Robert Kern


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Robert Kern
On Fri, Nov 4, 2016 at 6:36 AM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker <ndbeck...@gmail.com>:
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of
the
> > intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?

No.
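To expand on that a little: np.fromiter() only ever builds a 1-D array (and requires a dtype), so the closest you can get for more dimensions is to reshape afterwards. A sketch:

import numpy as np

n, m = 4, 3
# The generator is consumed without building an intermediate list, but the
# result is 1-D; reshape it to get the 2-D array.
a = np.fromiter((i * j for i in range(n) for j in range(m)),
                dtype=float, count=n * m).reshape(n, m)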

--
Robert Kern


Re: [Numpy-discussion] missing from contributor list?

2016-11-02 Thread Robert Kern
Because Github (or maybe git) doesn't track the history of the file through
all of the renames. It is only reporting the contributors of changes to the
file at its current location. If you go back to the time just prior to the
commit that renamed the file, you do show up in the list:

https://github.com/numpy/numpy/blob/f179ec92d8ddb0dc5f7445255022be5c4765a704/numpy/build_utils/src/apple_sgemv_fix.c
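For what it's worth, on the command line git itself will follow the file across renames if asked:

git log --follow -- numpy/_build_utils/src/apple_sgemv_fix.c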

On Wed, Nov 2, 2016 at 3:38 PM, Sturla Molden <sturla.mol...@gmail.com>
wrote:

> Why am I missing from the contributor list here?
>
> https://github.com/numpy/numpy/blob/master/numpy/_build_utils/src/apple_sgemv_fix.c
>
>
> Sturla
>



-- 
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-27 Thread Robert Kern
On Thu, Oct 27, 2016 at 10:45 AM, Todd <toddr...@gmail.com> wrote:
>
> On Thu, Oct 27, 2016 at 12:12 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Ever notice how Anaconda doesn't provide pyfftw? They can't legally ship
both MKL and pyfftw, and they picked MKL.
>
> Anaconda does ship GPL code [1].  They even ship GPL code that depends on
numpy, such as cvxcanon and pystan, and there doesn't seem to be anything
that prevents me from installing them alongside the MKL version of numpy.
So I don't see how it would be any different for pyfftw.

I think we've exhausted the relevance of this tangent to Oleksandr's
contributions.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser <
warren.weckes...@gmail.com> wrote:
>
> On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith <n...@pobox.com> wrote:

>> The patch also adds ~10,000 lines of code; here's an example of what
>> some of it looks like:
>>
>>
https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>>
>> I don't see how we can realistically commit to maintaining this.
>
> FYI:  numpy already maintains code exactly like that:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397
>
> Perhaps the point should be that the numpy devs won't want to maintain
two nearly identical versions of that code.

Indeed. That's how the algorithm was published. The /* sigh ... */ is my
own. ;-)

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:36 AM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote:
> > On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
> > >
> > > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> >
> > >> I prefer for the full functionality of numpy to stay available
> > with a
> > >> stack of community owned software, even if it may be less powerful
> > that
> > >> way.
> > >
> > > But then if this is really just the same random numbers numpy
> > already provides just faster, it is probably acceptable in principle.
> > I haven't actually looked at the PR yet.
> >
> > I think the stream is different in some places, at least. And it's
> > not a silent backend drop-in like np.linalg being built against an
> > optimized BLAS, just a separate module that is inoperative without
> > MKL.
>
> I might be swayed, but my gut feeling would be that a backend change
> (if the default stream changes, an explicit one, though maybe one could
> make a "fastest") would be the only reasonable way to provide such a
> thing in numpy itself.

That mostly argues for distributing it as a separate package, not part of
numpy at all.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
>
> On 10/26/2016 06:00 PM, Julian Taylor wrote:

>> I prefer for the full functionality of numpy to stay available with a
>> stack of community owned software, even if it may be less powerful that
>> way.
>
> But then if this is really just the same random numbers numpy already
provides just faster, it is probably acceptable in principle. I haven't
actually looked at the PR yet.

I think the stream is different in some places, at least. And it's not a
silent backend drop-in like np.linalg being built against an optimized
BLAS, just a separate module that is inoperative without MKL.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 10:22 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>
> On Tue, Oct 25, 2016 at 10:41 PM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > There is a proposed random number package PR now up on github:
https://github.com/numpy/numpy/pull/8209. It is from
oleksandr-pavlyk and implements the random number package using
MKL for increased speed. I think we are definitely interested in the
improved speed, but I'm not sure numpy is the best place to put the
package. I'd welcome any comments on the PR itself, as well as any thoughts
on the best way to organize or use this work. Maybe scikit-random
>>
>> This is what ng-numpy-randomstate is for.
>>
>> https://github.com/bashtage/ng-numpy-randomstate
>
> Interesting, despite old fashioned original ziggurat implementation of
the normal and gnu c style... Does that project seek to preserve all the
bytestreams or is it still in flux?

I would assume some flux for now, but you can ask the author by submitting
a corrected ziggurat PR as a trial balloon. ;-)

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> Hi All,
>
> There is a proposed random number package PR now up on github:
https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk and implements the random number package using
MKL for increased speed. I think we are definitely interested in the
improved speed, but I'm not sure numpy is the best place to put the
package. I'd welcome any comments on the PR itself, as well as any thoughts
on the best way to organize or use this work. Maybe scikit-random

This is what ng-numpy-randomstate is for.

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 7:05 PM, Feng Yu <rainwood...@gmail.com> wrote:
>
> Hi,
>
> Just another perspective. 'base' and 'data' in PyArrayObject are two
> separate variables.
>
> base can point to any PyObject, but it is `data` that defines where
> data is accessed in memory.
>
> 1. There is no clear way to pickle a pointer (`data`) in a meaningful
> way. In order for `data` member to make sense we still need to
> 'readout' the values stored at `data` pointer in the pickle.
>
> 2. By definition base is not necessarily a numpy array but it is just
> some other object for managing the memory.

In general, yes, but most often it's another ndarray, and the child is
related to the parent by a slice operation that could be computed by
comparing the `data` tuples. The exercise here isn't to always represent
the general case in this way, but to see what can be done opportunistically
and if that actually helps solve a practical problem.
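As a rough sketch of what comparing those data pointers could look like (purely illustrative, not a proposal for what the pickle should actually store):

import numpy as np

base = np.arange(20.0)
view = base[3:15:2]

# The offset of the view's first element inside its base, together with the
# view's own strides and shape, is enough to describe this slice relationship.
offset = (view.__array_interface__['data'][0]
          - base.__array_interface__['data'][0]) // base.itemsize
print(offset, view.strides)   # 3 (16,)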

> 3. One can surely pickle the `base` object as a reference, but it is
> useless if the data memory has been reconstructed independently during
> unpickling.
>
> 4. Unless there is clear way to notify the referencing numpy array of
> the new data pointer. There probably isn't.
>
> BTW, is the stride information lost during pickling, too? The
> behavior shall probably be documented if not yet.

The stride information may be lost, yes. We reserve the right to retain it,
though (for example, if .T is contiguous then we might well serialize the
transposed data linearly and return a view on that data upon
deserialization). I don't believe that we guarantee that the unpickled
result is contiguous.

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan <
harrigan.matt...@gmail.com> wrote:
>
> It seems pickle keeps track of references for basic python types.
>
> x = [1]
> y = [x]
> x,y = pickle.loads(pickle.dumps((x,y)))
> x.append(2)
> print(y)
> >>> [[1,2]]
>
> Numpy arrays are different but references are forgotten after
pickle/unpickle.  Shared objects do not remain shared.  Based on the quote
below it could be considered bug with numpy/pickle.

Not a bug, but an explicit design decision on numpy's part.
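The numpy counterpart of the quoted list example, illustrating that decision:

import pickle
import numpy as np

x = np.zeros(3)
y = x[:2]            # a view on x
x[0] = 1.0
print(y[0])          # 1.0 -- the memory is shared before pickling
x2, y2 = pickle.loads(pickle.dumps((x, y)))
x2[0] = 99.0
print(y2[0])         # still 1.0 -- after unpickling, the sharing is gone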

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Concretely, what would you suggest should happen with:
>>
>> base = np.zeros(1)
>> view = base[:10]
>>
>> # case 1
>> pickle.dump(view, file)
>>
>> # case 2
>> pickle.dump(base, file)
>> pickle.dump(view, file)
>>
>> # case 3
>> pickle.dump(view, file)
>> pickle.dump(base, file)
>>
>> ?
>
> I see what you're getting at here. We would need a rule for when to
include the base in the pickle and when not to. Otherwise,
pickle.dump(view, file) always contains data from the base pickle, even
when view is much smaller than base.
>
> The safe answer is "only use views in the pickle when base is already
being pickled", but that isn't possible to check unless all the arrays are
together in a custom container. So, this isn't really feasible for NumPy.

It would be possible with a custom Pickler/Unpickler since they already
keep track of objects previously (un)pickled. That would handle [base,
view] okay but not [view, base], so it's probably not going to be all that
useful outside of special situations. It would make a neat recipe, but I
probably would not provide it in numpy itself.

--
Robert Kern


Re: [Numpy-discussion] Using library-specific headers

2016-09-29 Thread Robert Kern
On Thu, Sep 29, 2016 at 6:27 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:

> Related to numpy.random_intel, I noticed that the randomstate package,
which extends numpy.random was
> not being made a part of numpy, but rather published on PyPI as a
stand-alone module. Does that mean that
> the community decided against  including it in numpy's codebase? If so, I
would appreciate if someone could
> elaborate on or point me to the reasoning behind that decision.

No, we are just working out the API and the extensibility machinery in a
separate package before committing to backwards compatibility.

--
Robert Kern


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Robert Kern
On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > >
> > > Actually, on those names: an alternative to your proposal would be
> > > to
> > > introduce only one new method which can do all types of indexing,
> > > depending on a keyword argument, i.e., something like
> > > ```
> > > def getitem(self, item, mode='outer'):
> > > ...
> > > ```
> > Have I been overthinking this, eh? Just making it `__getitem__(self,
> > index, mode=...)` and then from `vindex` calling the subclasses
> > `__getitem__(self, index, mode="vector")` or so would already solve
> > the
> > issue almost fully? Only thing I am not quite sure about:
> >
> > 1. Is `__getitem__` in some way special to make this difficult (also
> > considering some new ideas like allowing object[a=4]?
>
> OK; I think the C-side slot cannot get the kwarg likely, but probably
> you can find a solution for that

Well, the solution is to use a different name, I think.

--
Robert Kern


Re: [Numpy-discussion] Reading in a mesh file

2016-09-01 Thread Robert Kern
On Thu, Sep 1, 2016 at 3:49 PM, Florian Lindner <mailingli...@xgm.de> wrote:
>
> Hello,
>
> thanks for your reply which was really helpful!
>
> My problem is that I discovered that the data I got is rather unordered.
>
> The documentation for reshape says: Read the elements of a using this
index order, and place the elements into the
> reshaped array using this index order. ‘C’ means to read / write the
elements using C-like index order, with the last
> axis index changing fastest, back to the first axis index changing
slowest. ‘F’ means to read / write the elements using
> Fortran-like index order, with the first index changing fastest, and the
last index changing slowest.
>
> With my data both dimensions change, so there is no specific ordering of
the points, just a bunch of arbitrarily mixed
> "x y z value" data.
>
> My idea is:
>
> out = np.loadtxt(...)
> x = np.unique(out[:,0])
> y = np.unique(out[:,1])
> xx, yy = np.meshgrid(x, y)
>
> values = lookup(xx, yy, out)
>
> lookup is a ufunc (I hope that term is correct here) that looks up the
value of every x and y in out, like
> x_filtered = out[ out[:,0] == x, :]
> y_filtered  = out[ out[:,1] == y, :]
> return y_filtered[2]
>
> (untested, just a sketch)
>
> Would this work? Any better way?

If the (x, y) values are actually drawn from a rectilinear grid, then you
can use np.lexsort() to sort the rows before reshaping.

[~/scratch]
|4> !cat random-mesh.txt
0.3 0.3 21
0   0   10
0   0.3 11
0.3 0.6 22
0   0.6 12
0.6 0.3 31
0.3 0   20
0.6 0.6 32
0.6 0   30


[~/scratch]
|5> scrambled_nodes = np.loadtxt('random-mesh.txt')

# Note! Put the "faster" column before the "slower" column!
[~/scratch]
|6> i = np.lexsort([scrambled_nodes[:, 1], scrambled_nodes[:, 0]])

[~/scratch]
|7> sorted_nodes = scrambled_nodes[i]

[~/scratch]
|8> sorted_nodes
array([[  0. ,   0. ,  10. ],
   [  0. ,   0.3,  11. ],
   [  0. ,   0.6,  12. ],
   [  0.3,   0. ,  20. ],
   [  0.3,   0.3,  21. ],
   [  0.3,   0.6,  22. ],
   [  0.6,   0. ,  30. ],
   [  0.6,   0.3,  31. ],
   [  0.6,   0.6,  32. ]])


Then carry on with the reshape()ing as before. If the grid points that
"ought to be the same" are not actually identical, then you may end up with
some problems, e.g. if you had "0.3001 0.0 20.0" as a row, but all
of the other "x=0.3" rows had "0.3", then that row would get sorted out of
order. You would have to clean up the grid coordinates a bit first.
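One simple way to do that clean-up, assuming you know how many decimal places the grid coordinates should agree to (3 here, chosen to suit the example above):

# Continuing from the scrambled_nodes array loaded above.
cleaned = scrambled_nodes.copy()
cleaned[:, :2] = np.round(cleaned[:, :2], decimals=3)   # snap x and y to the grid
i = np.lexsort([cleaned[:, 1], cleaned[:, 0]])
sorted_nodes = cleaned[i]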

--
Robert Kern


Re: [Numpy-discussion] Reading in a mesh file

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 4:00 PM, Florian Lindner <mailingli...@xgm.de>
wrote:
>
> Hello,
>
> I have mesh (more exactly: just a bunch of nodes) description with values
associated to the nodes in a file, e.g. for a
> 3x3 mesh:
>
> 0   0   10
> 0   0.3 11
> 0   0.6 12
> 0.3 0   20
> 0.3 0.3 21
> 0.3 0.6 22
> 0.6 0   30
> 0.6 0.3 31
> 0.6 0.6 32
>
> What is best way to read it in and get data structures like the ones I
get from np.meshgrid?
>
> Of course, I know about np.loadtxt, but I'm having trouble getting the
resulting arrays (x, y, values) in the right form
> and to retain association to the values.

For this particular case (known shape and ordering), this is what I would
do. Maybe throw in a .T or three depending on exactly how you want them to
be laid out.

[~/scratch]
|1> !cat mesh.txt

0   0   10
0   0.3 11
0   0.6 12
0.3 0   20
0.3 0.3 21
0.3 0.6 22
0.6 0   30
0.6 0.3 31
0.6 0.6 32

[~/scratch]
|2> nodes = np.loadtxt('mesh.txt')

[~/scratch]
|3> nodes
array([[  0. ,   0. ,  10. ],
   [  0. ,   0.3,  11. ],
   [  0. ,   0.6,  12. ],
   [  0.3,   0. ,  20. ],
   [  0.3,   0.3,  21. ],
   [  0.3,   0.6,  22. ],
   [  0.6,   0. ,  30. ],
   [  0.6,   0.3,  31. ],
   [  0.6,   0.6,  32. ]])

[~/scratch]
|4> reshaped = nodes.reshape((3, 3, -1))

[~/scratch]
|5> reshaped
array([[[  0. ,   0. ,  10. ],
[  0. ,   0.3,  11. ],
[  0. ,   0.6,  12. ]],

   [[  0.3,   0. ,  20. ],
[  0.3,   0.3,  21. ],
[  0.3,   0.6,  22. ]],

   [[  0.6,   0. ,  30. ],
[  0.6,   0.3,  31. ],
[  0.6,   0.6,  32. ]]])

[~/scratch]
|7> x = reshaped[..., 0]

[~/scratch]
|8> y = reshaped[..., 1]

[~/scratch]
|9> values = reshaped[..., 2]

[~/scratch]
|10> x
array([[ 0. ,  0. ,  0. ],
   [ 0.3,  0.3,  0.3],
   [ 0.6,  0.6,  0.6]])

[~/scratch]
|11> y
array([[ 0. ,  0.3,  0.6],
   [ 0. ,  0.3,  0.6],
   [ 0. ,  0.3,  0.6]])

[~/scratch]
|12> values
array([[ 10.,  11.,  12.],
   [ 20.,  21.,  22.],
   [ 30.,  31.,  32.]])

--
Robert Kern


Re: [Numpy-discussion] Include last element when subindexing numpy arrays?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 1:34 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> On 31 Aug 2016, at 15:22, Robert Kern <robert.k...@gmail.com> wrote:
>
> On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa <mvilja...@kapsi.fi>
wrote:
> >
> > Is there a clean way to include the last element when subindexing numpy
arrays?
> > Since the default behaviour of numpy arrays is to omit the “stop index”.
> >
> > So for,
> >
> > >>> A
> > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> > >>> A[0:5]
> > array([0, 1, 2, 3, 4])
>
> A[5:]
>
> --
> Robert Kern
>
> No, that returns the subarray starting from index 5 to the end.
>
> What I want to be able to return
>
> array([0, 1, 2, 3, 4, 5])
>
> (i.e. last element 5 included)
>
> but without the funky A[0:6] syntax, which looks like it should return
>
> array([0, 1, 2, 3, 4, 5, 6])
>
> but since numpy arrays omit the last index, returns
>
> array([0, 1, 2, 3, 4, 5])
>
> which syntactically would be more reasonable to be A[0:5].

Ah, I see what you are asking now.

The answer is "no"; this is just the way that slicing works in Python in
general. numpy merely follows suit. It is something that you will get used
to with practice. My sense of "funkiness" and "reasonableness" is the
opposite of yours, for instance.

--
Robert Kern


Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 12:28 PM, Michael Bieri <mibi...@gmail.com> wrote:
>
> Hi all
>
> There are several ways on how to use C/C++ code from Python with NumPy,
as given in http://docs.scipy.org/doc/numpy/user/c-info.html . Furthermore,
there's at least pybind11.
>
> I'm not quite sure which approach is state-of-the-art as of 2016. How
would you do it if you had to make a C/C++ library available in Python
right now?
>
> In my case, I have a C library with some scientific functions on matrices
and vectors. You will typically call a few functions to configure the
computation, then hand over some pointers to existing buffers containing
vector data, then start the computation, and finally read back the data.
The library also can use MPI to parallelize.

I usually reach for Cython:

http://cython.org/
http://docs.cython.org/en/latest/src/userguide/memoryviews.html

--
Robert Kern


Re: [Numpy-discussion] Why np.fft.rfftfreq only returns up to Nyqvist?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 1:14 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> What’s the reasonability of np.fft.rfftfreq returning frequencies only up
to Nyquist, rather than for the full sample rate?

The answer to the question that you asked is that np.fft.rfft() only
computes values for frequencies up to Nyquist, so np.fft.rfftfreq()
must give you the frequencies to match. I'm not sure if there is another
misunderstanding lurking that needs to be clarified.
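A quick illustration of that correspondence (the sample count and rate are arbitrary):

import numpy as np

n, fs = 16, 100.0                      # number of samples and sample rate
spectrum = np.fft.rfft(np.random.randn(n))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
print(len(spectrum), len(freqs))       # both are n//2 + 1 == 9
print(freqs[-1])                       # 50.0, i.e. the Nyquist frequency fs/2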

--
Robert Kern


Re: [Numpy-discussion] Include last element when subindexing numpy arrays?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> Is there a clean way to include the last element when subindexing numpy
arrays?
> Since the default behaviour of numpy arrays is to omit the “stop index”.
>
> So for,
>
> >>> A
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> >>> A[0:5]
> array([0, 1, 2, 3, 4])

A[5:]

--
Robert Kern


Re: [Numpy-discussion] coordinate bounds

2016-08-20 Thread Robert Kern
On Sat, Aug 20, 2016 at 9:16 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> Is there a numpy equivalent to Mma's CoordinateBounds command?
> http://reference.wolfram.com/language/ref/CoordinateBounds.html

The first signature can be computed like so:

  np.transpose([coords.min(axis=0), coords.max(axis=0)])
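For example, with some made-up points:

import numpy as np

coords = np.array([[0.0, 2.0, -1.0],
                   [3.0, 1.0,  5.0],
                   [1.0, 4.0,  0.0]])
print(np.transpose([coords.min(axis=0), coords.max(axis=0)]))
# [[ 0.  3.]
#  [ 1.  4.]
#  [-1.  5.]]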

--
Robert Kern


Re: [Numpy-discussion] Numpy set_printoptions, silent failure, bug?

2016-07-19 Thread Robert Kern
On Tue, Jul 19, 2016 at 10:41 PM, John Ladasky <jlada...@itu.edu> wrote:

> Should this be considered a Numpy bug, or is there some reason that
set_printoptions would legitimately need to accept a dictionary as a single
argument?

There is no such reason. One could certainly add more validation to the
arguments to np.set_printoptions().
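To make the reported pitfall concrete (this is the behaviour described in the thread; the details may vary across numpy versions):

import numpy as np

opts = {'precision': 3, 'suppress': True}

np.set_printoptions(opts)     # the misuse in question: the whole dict is taken
                              # as the `precision` argument without complaint
np.set_printoptions(**opts)   # the intended call: unpack the dict as keywords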

--
Robert Kern


Re: [Numpy-discussion] Design feedback solicitation

2016-07-14 Thread Robert Kern
On Fri, Jul 15, 2016 at 2:53 AM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:
>
> Hi Robert,
>
> Thank you for the pointers.
>
> I think numpy.random should have a mechanism to choose between methods
for generating the underlying randomness dynamically, at run-time, as
well as an extensible framework, where developers could add more methods.
The default would be MT19937 for backwards compatibility. It is important
to be able to do this at run-time, as it would allow one to use different
algorithms in different threads (like different members of the parallel
Mersenne twister family of generators, see MT2203).
>
> The framework should allow to define randomness as a bit stream, a stream
of fixed size integers, or a stream of uniform reals (32 or 64 bits). This
is a lot of like MKL’s abstract method for basic pseudo-random number
generation.
>
> Each method should provide routines to sample from uniform distributions
over reals (in floats and doubles), as well as over integers.
>
> All remaining non-uniform distributions build on top of these uniform
streams.

ng-numpy-randomstate does all of these.

> I think it is pretty important to refactor numpy.random to allow the
underlying generators to produce a given number of independent variates at
a time. There could be convenience wrapper functions to allow to get one
variate for backwards compatibility, but this change in design would allow
for better efficiency, as sampling a vector of random variates at once is
often faster than repeated sampling of one at a time due to set-up cost,
vectorization, etc.

The underlying C implementation is an implementation detail, so the
refactoring that you suggest has no backwards compatibility constraints.

> Finally, methods to sample particular distribution should uniformly
support method keyword argument. Because method names vary from
distribution to distribution, it should ideally be programmatically
discoverable which methods are supported for a given distribution. For
instance, the standard normal distribution could support
method=’Inversion’, method=’Box-Muller’, method=’Ziggurat’,
method=’Box-Muller-Marsaglia’ (the one used in numpy.random right now), as
well as bunch of non-named methods based on transformed rejection method
(see http://statistik.wu-wien.ac.at/anuran/ )

That is one of the items under discussion. I personally prefer that one
simply exposes named methods for each different scheme (e.g.
ziggurat_normal(), etc.).

> It would also be good if one could dynamically register a new method to
sample from a non-uniform distribution. This would allow, for instance, to
automatically add methods to sample certain non-uniform distribution by
directly calling into MKL (or other library), when available, instead of
building them from uniforms (which may remain a fall-through method).
>
> The linked project is a good start, but the choice of the underlying
algorithm needs to be made at run-time,

That's what happens. You instantiate the RandomState class that you want.

> as far as I understood, and the only provided interface to query random
variates is one at a time, just like it is currently the case
> in numpy.random.

--
Robert Kern


Re: [Numpy-discussion] Design feedback solicitation

2016-06-17 Thread Robert Kern
On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:
>
> Hi,
>
> I am new to this list, so I will start with an introduction. My name is
Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for
Python, and previously worked at Wolfram Research for 12 years. My latest
project was to write a mirror to numpy.random, named numpy.random_intel.
The module uses MKL to sample from different distributions for efficiency.
It provides support for different underlying algorithms for basic
pseudo-random number generation, i.e. in addition to MT19937, it also
provides SFMT19937, MT2203, etc.
>
> I recently published a blog about it:
>
>
https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
>
> I originally attempted to simply replace numpy.random in the Intel
Distribution for Python with the new module, but due to fixed seed
backwards incompatibility this results in numerous test failures in numpy,
scipy, pandas and other modules.
>
> Unlike numpy.random, the new module generates a vector of random numbers
at a time, which can be done faster than repeatedly generating the same
number of variates one at a time.
>
> The source code for the new module is not upstreamed yet, and this email
is meant to solicit early community feedback to allow for faster acceptance
of the proposed changes.

Cool! You can find pertinent discussion here:

  https://github.com/numpy/numpy/issues/6967

And the current effort for adding new core PRNGs here:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern


Re: [Numpy-discussion] Indexing with floats

2016-06-10 Thread Robert Kern
On Fri, Jun 10, 2016 at 12:15 PM, Fabien <fabien.mauss...@gmail.com> wrote:
>
> Hi,
>
> I really tried to do my homework before asking this here, but I just
couldn't find the relevant information anywhere...
>
> My question is about the rationale behind forbidding indexing with
floats, i.e.:
>
> >>> x[2.]
> __main__:1: VisibleDeprecationWarning: using a non-integer number instead
of an integer will result in an error in the future
>
> I don't find this very handy from a user's perspective, and I'd be
grateful for pointers on discussion threads and/or PRs where this has been
discussed, so that I can understand why it's important.

https://mail.scipy.org/pipermail/numpy-discussion/2012-December/064705.html
https://github.com/numpy/numpy/issues/2810
https://github.com/numpy/numpy/pull/2891
https://github.com/numpy/numpy/pull/3243
https://mail.scipy.org/pipermail/numpy-discussion/2015-July/073125.html

Note that the future is coming in the next numpy release:

https://github.com/numpy/numpy/pull/6271

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-23 Thread Robert Kern
On Mon, May 23, 2016 at 5:41 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
>
> On Sun, May 22, 2016 at 2:35 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> Well, I mean, engineers want lots of things. I suspect that most
engineers *really* just want to call `numpy.random.seed(8675309)` at the
start and never explicitly pass around separate streams. There's an upside
to that in terms of code simplicity. There are also significant limitations
and constraints. Ultimately, the upside against the alternative of passing
around RandomState objects is usually overweighed by the limitations, so
best practice is to pass around RandomState objects.
>
> Could we do something like the logging module, and have numpy.random
"manage" a bunch of stream objects for you -- so you could get the default
single stream easily, and also get access to specific streams without
needing to pass around the objects?

No, I don't think so. The logging module's namespacing doesn't really have
an equivalent use case for PRNGs. We would just be making a much more
complicated global state to manage.

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-22 Thread Robert Kern
On Wed, May 18, 2016 at 7:56 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Wed, May 18, 2016 at 5:07 AM, Robert Kern <robert.k...@gmail.com>
wrote:
> > On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith <n...@pobox.com> wrote:

> >> ...anyway, the real reason I'm a bit grumpy is because there are solid
> >> engineering reasons why users *want* this API,
> >
> > I remain unconvinced on this mark. Grumpily.
>
> Sorry for getting grumpy :-).

And my apologies for some unwarranted hyperbole. I think we're both
converging on a reasonable approach, though.

> The engineering reasons seem pretty
> obvious to me though?

Well, I mean, engineers want lots of things. I suspect that most engineers
*really* just want to call `numpy.random.seed(8675309)` at the start and
never explicitly pass around separate streams. There's an upside to that in
terms of code simplicity. There are also significant limitations and
constraints. Ultimately, the upside against the alternative of passing
around RandomState objects is usually overweighed by the limitations, so
best practice is to pass around RandomState objects.
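That pattern, sketched (the function and argument names here are arbitrary):

import numpy as np

def simulate(n, random_state=None):
    # Accept either a seed or an existing RandomState, and always draw from a
    # local stream rather than the module-level global one.
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)
    return random_state.normal(size=n)

rs = np.random.RandomState(8675309)
a = simulate(5, random_state=rs)
b = simulate(5, random_state=rs)   # continues the same stream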

I acknowledge that there exists an upside to the splitting API, but I don't
think it's a groundbreaking improvement over the alternative current best
practice. It's also unclear to me how often situations that really
demonstrate the upside come into play; in my experience a lot of these
situations are already structured such that preallocating N streams is the
natural thing to do. The limitations and constraints are currently
underexplored, IMO; and in this conservative field, pessimism is warranted.

> If you have any use case for independent streams
> at all, and you're writing code that's intended to live inside a
> library's abstraction barrier, then you need some way to choose your
> streams to avoid colliding with arbitrary other code that the end-user
> might assemble alongside yours as part of their final program. So
> AFAICT you have two options: either you need a "tree-style" API for
> allocating these streams, or else you need to add some explicit API to
> your library that lets the end-user control in detail which streams
> you use. Both are possible, but the latter is obviously undesireable
> if you can avoid it, since it breaks the abstraction barrier, making
> your library more complicated to use and harder to evolve.

ACK

> >> so whether or not it
> >> turns out to be possible I think we should at least be allowed to have
> >> a discussion about whether there's some way to give it to them.
> >
> > I'm not shutting down discussion of the option. I *implemented* the
option.
> > I think that discussing whether it should be part of the main API is
> > premature. There probably ought to be a paper or three out there
supporting
> > its safety and utility first. Let the utility function version flourish
> > first.
>
> OK -- I guess this particularly makes sense given how
> extra-tightly-constrained we currently are in fixing mistakes in
> np.random. But I feel like in the end the right place for this really
> is inside the RandomState interface, because the person implementing
> RandomState is the one best placed to understand (a) the gnarly
> technical details here, and (b) how those change depending on the
> particular PRNG in use. I don't want to end up with a bunch of
> subtly-buggy utility functions in non-specialist libraries like dask
> -- so we should be trying to help downstream users figure out how to
> actually get this into np.random?

I think this is an open research area. An enterprising grad student could
milk this for a couple of papers analyzing how to do this safely for a
variety of PRNGs. I don't think we can hash this out in an email thread or
PR. So yeah, eventually there might be an API on RandomState for this, but
it's way too premature to do so right now, IMO. Maybe start with a
specialized subclass of RandomState that adds this experimental API. In
ng-numpy-randomstate. ;-)

But if someone has spare time to work on numpy.random, for God's sake, use
it to review @gfyoung's PRs instead.

> >> It's
> >> not even 100% out of the question that we conclude that existing PRNGs
> >> are buggy because they don't take this use case into account -- it
> >> would be far from the first time that numpy found itself going beyond
> >> the limits of older numerical tools that weren't designed to build the
> >> kind of large composable systems that numpy gets used for.
> >>
> >> MT19937's state space is large enough that you could explicitly encode
> >> a "tree seed" into it, even if you don't trust the laws of probability
> >> -- e.g., you start with a RandomState with id [], then i

Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 6:20 PM, <josef.p...@gmail.com> wrote:
>
> On Wed, May 18, 2016 at 12:01 PM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Wed, May 18, 2016 at 4:50 PM, Chris Barker <chris.bar...@noaa.gov>
wrote:
>> >>
>> >> > ...anyway, the real reason I'm a bit grumpy is because there are
solid
>> >> > engineering reasons why users *want* this API,
>> >
>> > Honestly, I am lost in the math -- but like any good engineer, I want
to accomplish something anyway :-) I trust you guys to get this right -- or
at least document what's "wrong" with it.
>> >
>> > But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with multiple
independent "random" processes. And we want to be able to reproduce
simulations EXACTLY -- our users get confused when the results are
"different" even if in a statistically insignificant way.
>> >
>> > At the moment we are using one RNG, with one seed for everything. So
we get reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have each
process using its own RNG stream with it's own seed. However, it matters
not one whit if those seeds are independent -- the processes are different,
you'd never notice if they were using the same PRN stream -- because they
are used differently. So a "fairly low probability of a clash" would be
totally fine.
>>
>> Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se is
not contentious.
>
> I'm similar to Chris, and didn't try to figure out the details of what
you are talking about.
>
> However, if there are functions getting into numpy that help in using a
best practice even if it's not bullet proof, then it's still better than
home made approaches.
> If it gets in soon, then we can use it in a few years (given dependency
lag). At that point there should be more distributed, nested simulation
based algorithms where we don't know in advance how far we have to go to
get reliable numbers or convergence.
>
> (But I don't see anything like that right now.)

Current best practice is to use PRNGs with settable streams (or fixed
jumpahead for those PRNGs cursed to not have settable streams but blessed
to have super-long periods). The way to get those into numpy is to help
Kevin Sheppard finish:

  https://github.com/bashtage/ng-numpy-randomstate

He's done nearly all of the hard work already.

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 4:50 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
>>
>> > ...anyway, the real reason I'm a bit grumpy is because there are solid
>> > engineering reasons why users *want* this API,
>
> Honestly, I am lost in the math -- but like any good engineer, I want to
accomplish something anyway :-) I trust you guys to get this right -- or at
least document what's "wrong" with it.
>
> But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with multiple
independent "random" processes. And we want to be able to re-produce
EXACTLY simulations -- our users get confused when the results are
"different" even if in a statistically insignificant way.
>
> At the moment we are using one RNG, with one seed for everything. So we
get reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have each
process using its own RNG stream with it's own seed. However, it matters
not one whit if those seeds are independent -- the processes are different,
you'd never notice if they were using the same PRN stream -- because they
are used differently. So a "fairly low probability of a clash" would be
totally fine.

Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se is
not contentious.

--
Robert Kern


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Tue, May 17, 2016 at 10:41 AM, Robert Kern <robert.k...@gmail.com>
wrote:
> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >>
> >> On May 17, 2016 1:50 AM, "Robert Kern" <robert.k...@gmail.com> wrote:
> >> >
> >> [...]
> >> > What you want is a function that returns many RandomState objects
that
> >> > are hopefully spread around the MT19937 space enough that they are
> >> > essentially independent (in the absence of true jumpahead). The
better
> >> > implementation of such a function would look something like this:
> >> >
> >> > def spread_out_prngs(n, root_prng=None):
> >> >     if root_prng is None:
> >> >         root_prng = np.random
> >> >     elif not isinstance(root_prng, np.random.RandomState):
> >> >         root_prng = np.random.RandomState(root_prng)
> >> >     sprouted_prngs = []
> >> >     for i in range(n):
> >> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
> >> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >> >     return sprouted_prngs
> >>
> >> Maybe a nice way to encapsulate this in the RandomState interface
would be
> >> a method RandomState.random_state() that generates and returns a new
child
> >> RandomState.
> >
> > I disagree. This is a workaround in the absence of proper jumpahead or
> > guaranteed-independent streams. I would not encourage it.
> >
> >> > Internally, this generates seed arrays of about the size of the
MT19937
> >> > state so make sure that you can access more of the state space. That
will at
> >> > least make the chance of collision tiny. And it can be easily
rewritten to
> >> > take advantage of one of the newer PRNGs that have true independent
streams:
> >> >
> >> >   https://github.com/bashtage/ng-numpy-randomstate
> >>
> >> ... But unfortunately I'm not sure how to make my interface suggestion
> >> above work on top of one of these RNGs, because for
RandomState.random_state
> >> you really want a tree of independent RNGs and the fancy new PRNGs only
> >> provide a single flat namespace :-/. And even more annoyingly, the
tree API
> >> is actually a nicer API, because with a flat namespace you have to
know up
> >> front about all possible RNGs your code will use, which is an
unfortunate
> >> global coupling that makes it difficult to compose programs out of
> >> independent pieces, while the RandomState.random_state approach
composes
> >> beautifully. Maybe there's some clever way to allocate a 64-bit
namespace to
> >> make it look tree-like? I'm not sure 64 bits is really enough...
> >
> > MT19937 doesn't have a "tree" any more than the others. It's the same
flat
> > state space. You are just getting the illusion of a tree by hoping that
you
> > never collide. You ought to think about precisely the same global
coupling
> > issues with MT19937 as you do with guaranteed-independent streams.
> > Hope-and-prayer isn't really a substitute for properly engineering your
> > problem. It's just a moral hazard to promote this method to the main
API.
>
> Nonsense.
>
> If your definition of "hope and prayer" includes assuming that we
> won't encounter a random collision in a 2**19937 state space, then
> literally all engineering is hope-and-prayer. A collision could
> happen, but if it does it's overwhelmingly more likely to happen
> because of a flaw in the mathematical analysis, or a bug in the
> implementation, or because random quantum fluctuations caused you and
> your program to suddenly be transported to a parallel world where 1 +
> 1 = 1, than that you just got unlucky with your random state. And all
> of these hazards apply equally to both MT19937 and more modern PRNGs.

Granted.

> ...anyway, the real reason I'm a bit grumpy is because there are solid
> engineering reasons why users *want* this API,

I remain unconvinced on this mark. Grumpily.

> so whether or not it
> turns out to be possible I think we should at least be allowed to have
> a discussion about whether there's some way to give it to them.

I'm not shutting down discussion of the option. I *implemented* the option.
I think that discussing whether it should be part of the main API is
premature. There probably ought to be a paper or three out there supporting
its safety and utility first. Let the utility function version flourish first.

Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On May 17, 2016 1:50 AM, "Robert Kern" <robert.k...@gmail.com> wrote:
> >
> [...]
> > What you want is a function that returns many RandomState objects that
are hopefully spread around the MT19937 space enough that they are
essentially independent (in the absence of true jumpahead). The better
implementation of such a function would look something like this:
> >
> > def spread_out_prngs(n, root_prng=None):
> >     if root_prng is None:
> >         root_prng = np.random
> >     elif not isinstance(root_prng, np.random.RandomState):
> >         root_prng = np.random.RandomState(root_prng)
> >     sprouted_prngs = []
> >     for i in range(n):
> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >     return sprouted_prngs
>
> Maybe a nice way to encapsulate this in the RandomState interface would
be a method RandomState.random_state() that generates and returns a new
child RandomState.

I disagree. This is a workaround in the absence of proper jumpahead or
guaranteed-independent streams. I would not encourage it.

> > Internally, this generates seed arrays of about the size of the MT19937
state so make sure that you can access more of the state space. That will
at least make the chance of collision tiny. And it can be easily rewritten
to take advantage of one of the newer PRNGs that have true independent
streams:
> >
> >   https://github.com/bashtage/ng-numpy-randomstate
>
> ... But unfortunately I'm not sure how to make my interface suggestion
above work on top of one of these RNGs, because for
RandomState.random_state you really want a tree of independent RNGs and the
fancy new PRNGs only provide a single flat namespace :-/. And even more
annoyingly, the tree API is actually a nicer API, because with a flat
namespace you have to know up front about all possible RNGs your code will
use, which is an unfortunate global coupling that makes it difficult to
compose programs out of independent pieces, while the
RandomState.random_state approach composes beautifully. Maybe there's some
clever way to allocate a 64-bit namespace to make it look tree-like? I'm
not sure 64 bits is really enough...

MT19937 doesn't have a "tree" any more than the others. It's the same flat
state space. You are just getting the illusion of a tree by hoping that you
never collide. You ought to think about precisely the same global coupling
issues with MT19937 as you do with guaranteed-independent streams.
Hope-and-prayer isn't really a substitute for properly engineering your
problem. It's just a moral hazard to promote this method to the main API.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 2:40 PM, Sturla Molden <sturla.mol...@gmail.com>
wrote:
>
> Stephan Hoyer <sho...@gmail.com> wrote:
> > I have recently encountered several use cases for randomly generate
random
> > number seeds:
> >
> > 1. When writing a library of stochastic functions that take a seed as an
> > input argument, and some of these functions call multiple other such
> > stochastic functions. Dask is one such example [1].
> >
> > 2. When a library needs to produce results that are reproducible after
> > calling numpy.random.seed, but that do not want to use the functions in
> > numpy.random directly. This came up recently in a pandas pull request
[2],
> > because we want to allow using RandomState objects as an alternative to
> > global state in numpy.random. A major advantage of this approach is
that it
> > provides an obvious alternative to reusing the private
numpy.random._mtrand
> > [3].
>
> What about making numpy.random a finite state machine, and keeping a stack
> of RandomState seeds? That is, something similar to what OpenGL does for
> its matrices? Then we get two functions, numpy.random.push_seed and
> numpy.random.pop_seed.

I don't think that addresses the issues brought up here. It's just more
global state to worry about.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> On Tue, May 17, 2016 at 12:18 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>> > 1. When writing a library of stochastic functions that take a seed as
an input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].
>>
>> Can you clarify the use case here? I don't really know what you are
doing here, but I'm pretty sure this is not the right approach.
>
> Here's a contrived example. Suppose I've written a simulator for cars
that consists of a number of loosely connected components (e.g., an engine,
brakes, etc.). The behavior of each component of our simulator is
stochastic, but we want everything to be fully reproducible, so we need to
use seeds or RandomState objects.
>
> We might write our simulate_car function like the following:
>
> def simulate_car(engine_config, brakes_config, seed=None):
>     rs = np.random.RandomState(seed)
>     engine = simulate_engine(engine_config, seed=rs.random_seed())
>     brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
>     ...
>
> The problem with passing the same RandomState object (either explicitly
> or dropping the seed argument entirely and using the global state) to both
> simulate_engine and simulate_brakes is that it breaks encapsulation -- if I
> change what I do inside simulate_engine, it also affects the brakes.

That's a little too contrived, IMO. In most such simulations, the different
components interact with each other in the normal course of the simulation;
that's why they are both joined together in the same simulation instead of
being two separate runs. Unless the components are being run across a
process or thread boundary (a la dask below) where true nondeterminism
comes into play, then I don't think you want these semi-independent
streams. This seems to be the advice du jour from the agent-based modeling
community.

> The dask use case is actually pretty different -- the intent is to create
many random numbers in parallel using multiple threads or processes
(possibly in a distributed fashion). I know that skipping ahead is the
standard way to get independent number streams for parallel sampling, but
that isn't exposed in numpy.random, and setting distinct seeds seems like a
reasonable alternative for scientific computing use cases.

Forget about integer seeds. Those are for human convenience. If you're not
jotting them down in your lab notebook in pen, you don't want an integer
seed.

What you want is a function that returns many RandomState objects that are
hopefully spread around the MT19937 space enough that they are essentially
independent (in the absence of true jumpahead). The better implementation
of such a function would look something like this:

def spread_out_prngs(n, root_prng=None):
    if root_prng is None:
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for i in range(n):
        seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs

Internally, this generates seed arrays of about the size of the MT19937
state so make sure that you can access more of the state space. That will
at least make the chance of collision tiny. And it can be easily rewritten
to take advantage of one of the newer PRNGs that have true independent
streams:

  https://github.com/bashtage/ng-numpy-randomstate
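
With a modern NumPy (1.17 or later), the same tree-spawning idea is
available directly as SeedSequence.spawn(); a rough sketch of the
equivalent usage, shown here only for comparison:

from numpy.random import SeedSequence, default_rng

root = SeedSequence(12345)                 # any integer seed; None draws OS entropy
children = root.spawn(10)                  # ten child seeds designed to be independent
streams = [default_rng(child) for child in children]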

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> I have recently encountered several use cases for randomly generate
random number seeds:
>
> 1. When writing a library of stochastic functions that take a seed as an
input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].

Can you clarify the use case here? I don't really know what you are doing
here, but I'm pretty sure this is not the right approach.

> 2. When a library needs to produce results that are reproducible after
calling numpy.random.seed, but that do not want to use the functions in
numpy.random directly. This came up recently in a pandas pull request [2],
because we want to allow using RandomState objects as an alternative to
global state in numpy.random. A major advantage of this approach is that it
provides an obvious alternative to reusing the private numpy.random._mtrand
[3].

It's only pseudo-private. This is an authorized use of it.

However, for this case, I usually just pass around the numpy.random
module itself and let duck-typing take care of the rest.

> [3] On a side note, if there's no longer a good reason to keep this
object private, perhaps we should expose it in our public API. It would
certainly be useful -- scikit-learn is already using it (see links in the
pandas PR above).

Adding a public get_global_random_state() function might be in order.
Originally, I wanted there to be *some* barrier to entry, but just grabbing
it to use as a default RandomState object is definitely an intended use of
it. It's not going to disappear.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Floor division on int returns float

2016-04-13 Thread Robert Kern
On Wed, Apr 13, 2016 at 3:17 AM, Antony Lee <antony@berkeley.edu> wrote:
>
> This kind of issue (see also https://github.com/numpy/numpy/issues/3511)
has become more annoying now that indexing requires integers (indexing with
a float raises a VisibleDeprecationWarning).  The argument "dividing an
uint by an int may give a result that does not fit in an uint nor in an
int" does not sound very convincing to me,

It shouldn't because that's not the rule that numpy follows. The range of
the result is never considered. Both *inputs* are cast to the same type
that can represent the full range of either input type (for that matter,
the actual *values* of the inputs are also never considered). In the case
of uint64 and int64, there is no really good common type (the integer
hierarchy has to top out somewhere), but float64 merely loses resolution
rather than cutting off half of the range of uint64.
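
A short interactive illustration of that rule (dtypes shown assume a
64-bit platform and the casting behaviour of this era of numpy):

>>> import numpy as np
>>> (np.uint32(1) + np.int64(1)).dtype   # int64 holds the full range of both inputs
dtype('int64')
>>> (np.uint64(1) + np.int64(1)).dtype   # no integer type holds both, so float64
dtype('float64')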

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code

2016-04-06 Thread Robert Kern
On Wed, Apr 6, 2016 at 4:17 PM, Neal Becker <ndbeck...@gmail.com> wrote:

> I prefer to use a single instance of a RandomState so that there are
> guarantees about the independence of streams generated from python random
> functions, and from my c++ code.  True, there are simpler approaches - but
> I'm a purist.

Consider using PRNGs that actually expose truly independent streams instead
of a single shared stream:

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code

2016-04-06 Thread Robert Kern
On Wed, Apr 6, 2016 at 2:18 PM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> I have C++ code that tries to share the mtrand state.  It unfortunately
> depends on the layout of RandomState which used to be:
>
> struct __pyx_obj_6mtrand_RandomState {
>   PyObject_HEAD
>   rk_state *internal_state;
>   PyObject *lock;
> };
>
> But with 1.11 it's:
> struct __pyx_obj_6mtrand_RandomState {
>   PyObject_HEAD
>   struct __pyx_vtabstruct_6mtrand_RandomState *__pyx_vtab;
>   rk_state *internal_state;
>   PyObject *lock;
>   PyObject *state_address;
> };
>
> So
> 1. Why the change?
> 2. How can I write portable code?

There is no C API to RandomState at this time, stable, portable or
otherwise. It's all private implementation detail. If you would like a
stable and portable C API for RandomState, you will need to contribute one
using PyCapsules to expose the underlying rk_state* pointer.

https://docs.python.org/2.7/c-api/capsule.html

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-19 Thread Robert Kern
On Fri, Feb 19, 2016 at 12:10 PM, Andrew Nelson <andyf...@gmail.com> wrote:
>
> With respect to geomspace proposals: instead of specifying start and end
values and the number of points I'd like to have an option where I can set
the start and end points and the ratio. The function would then work out
the correct number of points to get closest to the end value.
>
> E.g. geomspace(start=1, finish=2, ratio=1.03)
>
> The first entries would be 1.0, 1.03, 1*1.03**2, etc.
>
> I have a requirement for the correct ratio between the points, and it's a
right bind having to calculate the exact number of points needed.

At the risk of extending the twisty little maze of names, all alike, I
would probably call a function with this signature geomrange() instead. It
is more akin to arange(start, stop, step) than linspace(start, stop,
num_steps).
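
A rough sketch of what such a geomrange() might look like (a hypothetical
helper, named here only for illustration; it inherits arange()'s usual
floating-point caveats at the endpoint):

import numpy as np

def geomrange(start, stop, ratio):
    # start, start*ratio, start*ratio**2, ..., staying below stop,
    # analogous to arange(start, stop, step) rather than linspace()
    n = int(np.floor(np.log(stop / start) / np.log(ratio))) + 1
    return start * ratio ** np.arange(n)

geomrange(1, 2, 1.03)   # array([1., 1.03, 1.0609, ..., ~1.97])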

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 10:19 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> On 2/18/2016 2:44 PM, Robert Kern wrote:
>>
>> In a new function not named `linspace()`, I think that might be fine. I
do occasionally want to swap between linear and logarithmic/geometric
spacing based on a parameter, so this
>> doesn't violate the van Rossum Rule of Function Signatures.
>
> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?
> (Or have I missed the point altogether?)

The last, I'm afraid. Different use cases, different conventions. Integer
ranges are half-open because that is the most useful convention in a
0-indexed ecosystem. Floating point ranges don't interface with indexing,
and the closed intervals are the most useful (or at least the most common).

> If this query is annoying, please ignore it.  It is not meant to be.

The same for my answer.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.

geomspace() is a perfectly cromulent name, IMO.

> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?

In a new function not named `linspace()`, I think that might be fine. I do
occasionally want to swap between linear and logarithmic/geometric spacing
based on a parameter, so this doesn't violate the van Rossum Rule of
Function Signatures.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
He was talking consistently about "random integers" not
"random_integers()". :-)

On Wednesday, 17 February 2016, G Young <gfyoun...@gmail.com> wrote:

> Your statement is a little self-contradictory, but in any case, you
> shouldn't worry about random_integers getting removed from the code-base.
> However, it has been deprecated in favor of randint.
>
> On Wed, Feb 17, 2016 at 11:48 PM, Juan Nunez-Iglesias <jni.s...@gmail.com
> <javascript:_e(%7B%7D,'cvml','jni.s...@gmail.com');>> wrote:
>
>> Also fwiw, I think the 0-based, half-open interval is one of the best
>> features of Python indexing and yes, I do use random integers to index into
>> my arrays and would not appreciate having to litter my code with "-1"
>> everywhere.
>>
>> On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac <alan.is...@gmail.com
>> <javascript:_e(%7B%7D,'cvml','alan.is...@gmail.com');>> wrote:
>>
>>> On 2/17/2016 3:42 PM, Robert Kern wrote:
>>>
>>>> random.randint() was the one big exception, and it was considered a
>>>> mistake for that very reason, soft-deprecated in favor of
>>>> random.randrange().
>>>>
>>>
>>>
>>> randrange also has its detractors:
>>> https://code.activestate.com/lists/python-dev/138358/
>>> and following.
>>>
>>> I think if we start citing persistant conventions, the
>>> persistent convention across *many* languages that the bounds
>>> provided for a random integer range are inclusive also counts for
>>> something, especially when the names are essentially shared.
>>>
>>> But again, I am just trying to be clear about what is at issue,
>>> not push for a change.  I think citing non-existent standards
>>> is not helpful.  I think the discrepancy between the Python
>>> standard library and numpy for a function going by a common
>>> name is harmful.  (But then, I teach.)
>>>
>>> fwiw,
>>>
>>> Alan
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> <javascript:_e(%7B%7D,'cvml','NumPy-Discussion@scipy.org');>
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> <javascript:_e(%7B%7D,'cvml','NumPy-Discussion@scipy.org');>
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoun...@gmail.com> wrote:

> Josef: I don't think we are making people think more.  They're all
keyword arguments, so if you don't want to think about them, then you leave
them as the defaults, and everyone is happy.

I believe that Josef has the code's reader in mind, not the code's writer.
As a reader of other people's code (and I count 6-months-ago-me as one such
"other people"), I am sure to eventually encounter all of the different
variants, so I will need to know all of them.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 8:30 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> On 2/17/2016 12:28 PM, G Young wrote:
>>
>> Perhaps, but we are not coding in Haskell.  We are coding in Python, and
>> the standard is that the endpoint is excluded, which renders your point
>> moot I'm afraid.
>
> I am not sure what "standard" you are talking about.
> I thought we were talking about the user interface.

It is a persistent and consistent convention (i.e. "standard") across
Python APIs that deal with integer ranges (range(), slice(),
random.randrange(), ...), particularly those that end up related to
indexing; e.g. `x[np.random.randint(0, len(x))]` to pull a random sample
from an array.

random.randint() was the one big exception, and it was considered a mistake
for that very reason, soft-deprecated in favor of random.randrange().
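
A quick illustration of the convention:

import random
import numpy as np

list(range(0, 3))         # [0, 1, 2] -- half-open, like slicing
random.randrange(0, 3)    # draws from {0, 1, 2} -- half-open
np.random.randint(0, 3)   # draws from {0, 1, 2} -- half-open, matches indexing
random.randint(0, 3)      # draws from {0, 1, 2, 3} -- the closed-interval exception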

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 4:40 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> Behavior of random integer generation:
> Python randint[a,b]
> MATLAB randi  [a,b]
> Mma RandomInteger [a,b]
> haskell randomR   [a,b]
> GAUSS rndi[a,b]
> Maple rand[a,b]
>
> In short, NumPy's `randint` is non-standard (and,
> I would add, non-intuitive).  Presumably was due
> due to relying on a float draw from [0,1) along
> with the use of floor.

No, never was. It is implemented so because Python uses semi-open integer
intervals by preference because it plays most nicely with 0-based indexing.
Not sure about all of those systems, but some at least are 1-based
indexing, so closed intervals do make sense.

The Python stdlib's random.randint() closed interval is considered a
mistake by python-dev leading to the implementation and preference for
random.randrange() instead.

> The divergence in behavior between the (later) Python
> function of the same name is particularly unfortunate.

Indeed, but unfortunately, this mistake dates way back to Numeric times,
and easing the migration to numpy was a priority in the heady days of numpy
1.0.

> So I suggest further work on this function is
> not called for, and use of `random_integers`
> should be encouraged.  Probably NumPy's `randint`
> should be deprecated.

Not while I'm here. Instead, `random_integers()` is discouraged and perhaps
might eventually be deprecated.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-15 Thread Robert Kern
On Mon, Feb 15, 2016 at 4:24 PM, Jeff Reback <jeffreb...@gmail.com> wrote:
>
> just an FYI.
>
> pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory
savings,
> see here, similar to how python range/xrange work.
>
> though there are substantial perf benefits, mainly with set operations,
see here
> though didn't officially benchmark these.

Since it is a numpy-aware object (unlike the builtins), you can (and have,
if I'm reading the code correctly) implement __array__() such that it does
the correctly performant thing and call np.arange(). RangeIndex won't be
adversely impacted by retaining the status quo.
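
A minimal sketch of the mechanism being described (a hypothetical
range-like class for illustration, not pandas' actual RangeIndex code):

import numpy as np

class RangeLike(object):
    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step

    def __array__(self, dtype=None):
        # numpy consults __array__() when coercing the object, so
        # materialization can be delegated to the fast np.arange()
        return np.arange(self.start, self.stop, self.step, dtype=dtype)

np.asarray(RangeLike(0, 5))   # array([0, 1, 2, 3, 4])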

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Hook in __init__.py to let distributors patch numpy

2016-02-12 Thread Robert Kern
I would add a numpy/_distributor_init.py module and unconditionally import
it in the __init__.py. It's contents in our upstream sources would just be
a docstring:

"""Distributors! Put your initialization code here!
"""

One important technical benefit is that the unconditional import won't hide
ImportErrors in the distributor's code.
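
For illustration only, a distributor's _distributor_init.py might then
contain something along these lines (hypothetical content sketching the
SSE2 check discussed below; the kernel32 feature probe is a standard
Windows API call):

"""Distributor-specific startup checks (illustrative sketch)."""
import ctypes
import sys

if sys.platform == "win32":
    PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10  # the SSE2 feature flag
    if not ctypes.windll.kernel32.IsProcessorFeaturePresent(
            PF_XMMI64_INSTRUCTIONS_AVAILABLE):
        raise RuntimeError("This numpy build requires SSE2, which this CPU "
                           "does not support.")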

On Fri, Feb 12, 2016 at 1:19 AM, Matthew Brett <matthew.br...@gmail.com>
wrote:

> Hi,
>
> Over at https://github.com/numpy/numpy/issues/5479 we're discussing
> Windows wheels.
>
> On thing that we would like to be able to ship Windows wheels, is to
> be able to put some custom checks into numpy when you build the
> wheels.
>
> Specifically, for Windows, we're building on top of ATLAS BLAS /
> LAPACK, and we need to check that the system on which the wheel is
> running, has SSE2 instructions, otherwise we know ATLAS will crash
> (almost everybody does have SSE2 these days).
>
> The way I propose we do that, is this patch here:
>
> https://github.com/numpy/numpy/pull/7231
>
> diff --git a/numpy/__init__.py b/numpy/__init__.py
> index 0fcd509..ba3ba16 100644
> --- a/numpy/__init__.py
> +++ b/numpy/__init__.py
> @@ -190,6 +190,12 @@ def pkgload(*packages, **options):
>  test = testing.nosetester._numpy_tester().test
>  bench = testing.nosetester._numpy_tester().bench
>
> +# Allow platform-specific build to intervene in numpy init
> +try:
> +    from . import _distributor_init
> +except ImportError:
> +    pass
> +
>  from . import core
>  from .core import *
>  from . import compat
>
> So, numpy __init__.py looks for a module `_distributor_init`, in which
> the distributor might have put custom code to do any checks and
> initialization needed for the particular platform.  We don't by
> default ship a `_distributor_init.py` but leave it up to packagers to
> generate this when building binaries.
>
> Does that sound like a sensible approach to y'all?
>
> Cheers,
>
> Matthew
> _______
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-21 Thread Robert Kern
On Thu, Jan 21, 2016 at 7:06 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:
>
> There doesn't seem to be much of a consensus on the way to go, so leaving
things as they are and have been seems the wisest choice for now, thanks
for all the feedback. I will work with Greg on documenting the status quo
properly.

Ugh. Be careful in documenting the way things currently work. No one
intended for it to work that way! No one should rely on high [...]
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-21 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:35 PM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2016-01-19 at 16:28 +, G Young wrote:
> > In rand range, it raises an exception if low >= high.
> >
> > I should also add that AFAIK enforcing low >= high with floats is a
> > lot trickier than it is for integers.  I have been knee-deep in
> > corner cases for some time with randint where numbers that are
> > visually different are cast as the same number by numpy due to
> > rounding and representation issues.  That situation only gets worse
> > with floats.
> >
>
> Well, actually random.uniform docstring says:
>
> Get a random number in the range [a, b) or [a, b] depending on
> rounding.

Which docstring are you looking at? The current one says [low, high)

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html#numpy.random.uniform

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:40 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> On Tue, Jan 19, 2016 at 10:36 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>> >
>>
>> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
>> >>
>> >> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
>> >
>> > randint is not closed on the high end. The now deprecated
random_integers is the function that does that.
>> >
>> > For floats, it's good to have various interval options. For instance,
in generating numbers that will be inverted or have their log taken it is
good to avoid zero. However, the names 'low' and 'high' are misleading...
>>
>> They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)
>>
>> With floating point and general intervals, there is not really a good
way to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.
>
> Well, it is possible to make that happen if one is careful or directly
sets the bits in ieee types...

For the unit interval, certainly. For general bounds, I am not so sure.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>

> On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
>>
>> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
>
> randint is not closed on the high end. The now deprecated random_integers
is the function that does that.
>
> For floats, it's good to have various interval options. For instance, in
generating numbers that will be inverted or have their log taken it is good
to avoid zero. However, the names 'low' and 'high' are misleading...

They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)

With floating point and general intervals, there is not really a good way
to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:36 PM, Robert Kern <robert.k...@gmail.com> wrote:
>
> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
> >
>
> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
> >>
> >> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
> >
> > randint is not closed on the high end. The now deprecated
random_integers is the function that does that.
> >
> > For floats, it's good to have various interval options. For instance,
in generating numbers that will be inverted or have their log taken it is
good to avoid zero. However, the names 'low' and 'high' are misleading...
>
> They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)
>
> With floating point and general intervals, there is not really a good way
to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.

There are special cases that *can* be implemented and are worth doing so as
they are building blocks for other distributions that do need to avoid 0 or
1 as you say. Full-featured RNG suites do offer these:

  [0, 1]
  [0, 1)
  (0, 1]
  (0, 1)
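
Given the [0, 1) stream that numpy provides, two of the other variants can
be derived along these lines (a sketch; the (0, 1) case simply rejects the
measure-zero exact-zero draw):

import numpy as np

rng = np.random.RandomState(12345)

u_co = rng.random_sample()         # [0, 1): what numpy gives you
u_oc = 1.0 - rng.random_sample()   # (0, 1]: reflect to move the open end
u_oo = rng.random_sample()         # (0, 1): reject the (very rare) exact 0.0
while u_oo == 0.0:
    u_oo = rng.random_sample()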

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Software Capabilities of NumPy in Our Tensor Survey Paper

2016-01-15 Thread Robert Kern
On Fri, Jan 15, 2016 at 5:30 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Jan 15, 2016 8:36 AM, "Li Jiajia" <jiaji...@gatech.edu> wrote:
> >
> > Hi all,
> > I’m a PhD student in Georgia Tech. Recently, we’re working on a survey
paper about tensor algorithms: basic tensor operations, tensor
decomposition and some tensor applications. We are making a table to
compare the capabilities of different software and planning to include
NumPy. We’d like to make sure these parameters are correct to make a fair
compare. Although we have looked into the related documents, please help us
to confirm these. Besides, if you think there are more features of your
software and a more preferred citation, please let us know. We’ll consider
to update them. We want to show NumPy supports tensors, and we also include
"scikit-tensor” in our survey, which is based on NumPy.
> > Please let me know any confusion or any advice!
> > Thanks a lot! :-)
> >
> > Notice:
> > 1. “YES/NO” to show whether or not the software supports the operation
or has the feature.
> > 2. “?” means we’re not sure of the feature, and please help us out.
> > 3. “Tensor order” means the maximum number of tensor dimensions that
users can do with this software.
> > 4. For computational cores,
> > 1) "Element-wise Tensor Operation (A * B)” includes element-wise
add/minus/multiply/divide, also Kronecker, outer and Khatri-Rao products. If
the software contains one of them, we mark “YES”.
> > 2) “TTM” means tensor-times-matrix multiplication. We distinguish TTM
from tensor contraction. If the software includes tensor contraction, it
can also support TTM.
> > 3) For “MTTKRP”, we know most software can realize it through the above
two operations. We mark it “YES” only if there is a specific optimization for the
whole operation.
>
> NumPy has support for working with multidimensional tensors, if you like,
but it doesn't really use the tensor language and notation (preferring
instead to think in terms of "arrays" as a somewhat more computationally
focused and less mathematically focused conceptual framework).
>
> Which is to say that I actually have no idea what all those jargon terms
you're asking about mean :-) I am suspicious that NumPy supports more of
those operations than you have marked, just under different names/notation,
but really can't tell either way for sure without knowing what exactly they
are.

In particular check if your operations can be expressed with einsum()

http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.einsum.html
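
For instance, a tensor-times-matrix (TTM) product along the last mode is a
one-liner in einsum notation (shapes here are arbitrary examples):

import numpy as np

A = np.random.rand(3, 4, 5)           # a third-order tensor
M = np.random.rand(5, 6)              # a matrix applied along the last mode

B = np.einsum('ijk,kl->ijl', A, M)    # contract over the shared index k
B.shape                               # (3, 4, 6)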

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Get rid of special scalar arithmetic

2016-01-13 Thread Robert Kern
On Wed, Jan 13, 2016 at 5:18 AM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> Hi All,
>
> I've opened issue #7002, reproduced below, for discussion.
>>
>> Numpy umath has a file scalarmath.c.src that implements scalar
arithmetic using special functions that are about 10x faster than the
equivalent ufuncs.
>>
>> In [1]: a = np.float64(1)
>>
>> In [2]: timeit a*a
>> 1000 loops, best of 3: 69.5 ns per loop
>>
>> In [3]: timeit np.multiply(a, a)
>> 100 loops, best of 3: 722 ns per loop
>>
>> I contend that in large programs this improvement in execution time is
not worth the complexity and maintenance overhead; it is unlikely that
scalar-scalar arithmetic is a significant part of their execution time.
Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic.
This would also bring the benefits of __numpy_ufunc__ to scalars with
minimal effort.
>
> Thoughts?

Not all important-to-optimize programs are large in our field; interactive
use is rampant. The scalar optimizations weren't added speculatively:
people noticed that their Numeric code ran much slower under numpy and were
reluctant to migrate. I was forever responding on comp.lang.python, "It's
because scalar arithmetic hasn't been optimized yet. We know how to do it,
we just need a volunteer to do the work. Contributions gratefully
accepted!" The most critical areas tended to be optimization where you are
often working with implicit scalars that pop out in the optimization loop.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deprecate random.random_integers

2016-01-04 Thread Robert Kern
On Sun, Jan 3, 2016 at 11:51 PM, G Young <gfyoun...@gmail.com> wrote:
>
> Hello all,
>
> In light of the discussion in #6910, I have gone ahead and deprecated
random_integers in my most recent PR here.  As this is an API change (sort
of), what are people's thoughts on this deprecation?

I'm reasonably in favor. random_integers() with its closed-interval
convention only exists because it existed in Numeric's RandomArray module.
The closed-interval convention has broadly been considered to be a mistake
introduced early in the stdlib random module and rectified with the
introduction and promotion of random.randrange() instead.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy funding update

2015-12-31 Thread Robert Kern
On Wed, Dec 30, 2015 at 10:54 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:
>
> Hi all,
>
> A quick good news message: OSDC has made a $5k contribution to NumFOCUS,
which is split between support for a women in technology workshop and
support for Numpy:
http://www.numfocus.org/blog/osdc-donates-5k-to-support-numpy-women-in-tech
> This was a very nice surprise to me, and a first sign that the FSA
(fiscal sponsorship agreement) we recently signed with NumFOCUS is going to
yield significant benefits for Numpy.
>
> NumFOCUS is also doing a special end-of-year fundraiser. Funds donated
(up to $5k) will be tripled by anonymous sponsors:
http://www.numfocus.org/blog/numfocus-end-of-year-fundraising-drive-5000-matching-gift-challenge
> So think of Numpy (or your other favorite NumFOCUS-sponsored project of
course) if you're considering a holiday season charitable gift!

That sounds great! Do we have any concrete plans for spending that money,
yet?

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-14 Thread Robert Kern
On Mon, Dec 14, 2015 at 3:56 PM, Benjamin Root <ben.v.r...@gmail.com> wrote:

> By the way, any reason why this works?
> >>> np.array(xrange(10))
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

It's not a generator. It's a true sequence that just happens to have a
special implementation rather than being a generic container.

>>> len(xrange(10))
10
>>> xrange(10)[5]
5

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-14 Thread Robert Kern
On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root <ben.v.r...@gmail.com> wrote:
>
> Heh, never noticed that. Was it implemented more like a
generator/iterator in older versions of Python?

No, it predates generators and iterators so it has always had to be
implemented like that.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reshaping array question

2015-11-17 Thread Robert Kern
On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> I have an array of shape
> (7, 24, 2, 1024)
>
> I'd like an array of
> (7, 24, 2048)
>
> such that the elements on the last dimension are interleaving the elements
> from the 3rd dimension
>
> [0,0,0,0] -> [0,0,0]
> [0,0,1,0] -> [0,0,1]
> [0,0,0,1] -> [0,0,2]
> [0,0,1,1] -> [0,0,3]
> ...
>
> What might be the simplest way to do this?

np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,))

> 
> A different question, suppose I just want to stack them
>
> [0,0,0,0] -> [0,0,0]
> [0,0,0,1] -> [0,0,1]
> [0,0,0,2] -> [0,0,2]
> ...
> [0,0,1,0] -> [0,0,1024]
> [0,0,1,1] -> [0,0,1025]
> [0,0,1,2] -> [0,0,1026]
> ...

A.reshape(A.shape[:-2] + (-1,))

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reshaping array question

2015-11-17 Thread Robert Kern
On Nov 17, 2015 6:53 PM, "Sebastian Berg" <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2015-11-17 at 13:49 -0500, Neal Becker wrote:
> > Robert Kern wrote:
> >
> > > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker <ndbeck...@gmail.com>
wrote:
> > >>
> > >> I have an array of shape
> > >> (7, 24, 2, 1024)
> > >>
> > >> I'd like an array of
> > >> (7, 24, 2048)
> > >>
> > >> such that the elements on the last dimension are interleaving the
> > >> elements from the 3rd dimension
> > >>
> > >> [0,0,0,0] -> [0,0,0]
> > >> [0,0,1,0] -> [0,0,1]
> > >> [0,0,0,1] -> [0,0,2]
> > >> [0,0,1,1] -> [0,0,3]
> > >> ...
> > >>
> > >> What might be the simplest way to do this?
> > >
> > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,))
> >
> > I get an error on that 1st transpose:
> >
>
> Transpose needs a slightly different input. If you look at the help, it
> should be clear. The help might also point to np.swapaxes, which may be
> a bit more straight forward for this exact case.

Sorry about that. Was in a rush and working from a faulty memory.
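
For completeness, a corrected sketch of both operations from the original
question -- np.transpose() wants a full axes permutation, so np.swapaxes()
is the simpler spelling here:

import numpy as np

A = np.zeros((7, 24, 2, 1024))

# interleaved: out[..., 2*k + j] takes A[..., j, k]
interleaved = np.swapaxes(A, -2, -1).reshape(A.shape[:-2] + (-1,))

# stacked: out[..., 1024*j + k] takes A[..., j, k]
stacked = A.reshape(A.shape[:-2] + (-1,))

interleaved.shape   # (7, 24, 2048)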
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nansum function behavior

2015-10-23 Thread Robert Kern
On Fri, Oct 23, 2015 at 5:45 PM, Charles Rilhac <webmastert...@gmail.com>
wrote:
>
> Hello,
>
> I noticed the change regarding nan function and especially nansum
function. I think this choice is a big mistake. I know that Matlab and R
have made this choice but it is illogical and counterintuitive.

What change are you referring to?

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Feature Suggestion]More comparison functions for floating point numbers

2015-10-19 Thread Robert Kern
On Mon, Oct 19, 2015 at 9:51 PM, cy18 <thec...@gmail.com> wrote:
>
> It would be useful when we need to subtracting a bit before comparing by
greater or less. By subtracting a bit, we only have an absolute error
tolerance and with the new functions, we can have both absolute and
relative error tolerance. This is how isclose(a, b) better than
abs(a-b)<=atol.

You just adjust the value by whichever tolerance is greatest in magnitude.
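
In other words, something along these lines (a sketch of the idea, not an
existing numpy function):

import numpy as np

def greater_with_tol(a, b, rtol=1e-9, atol=0.0):
    # a > b, allowing whichever of the absolute and relative tolerances
    # is larger in magnitude, in the spirit of isclose()'s combined test
    slack = np.maximum(atol, rtol * np.abs(b))
    return a > b - slack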

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

2015-09-30 Thread Robert Kern
On Wed, Sep 30, 2015 at 10:35 AM, Matthieu Brucher <
matthieu.bruc...@gmail.com> wrote:
>
> Yes, obviously, the code has NR parts, so it can't be licensed as BSD
> as it is...

It's not obvious to me, especially after Juha's further clarifications.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] interpretation of the draft governance document (was Re: Governance model request)

2015-09-24 Thread Robert Kern
On Wed, Sep 23, 2015 at 11:12 PM, Travis Oliphant <tra...@continuum.io>
wrote:

>> [We also have to decide on the initial membership for the Council. While
the above text makes pains to distinguish between "committer" and "Council
Member", in the past we've pretty much treated them as the same. So to keep
things simple and deterministic, I propose that we seed the Council with
everyone who has reviewed/merged a pull request since Jan 1, 2014, and move
those who haven't used their commit bit in >1.5 years to the emeritus list.
Based on the output of
>>
>>git log --grep="^Merge pull request" --since 2014-01-01 | grep
Author: | sort -u
>>
>> I believe this would give us an initial Steering Council of: Sebastian
Berg, Jaime Fernández del Río, Ralf Gommers, Alex Griffing, Charles Harris,
Nathaniel Smith, Julian Taylor, and Pauli Virtanen (assuming everyone on
that list is interested/willing to serve).]
>
> I would ask to be on this initial council by having the rule include
"original contributors of the code" which would basically include Robert
Kern and Pearu Peterson in addition to Chuck Harris (though Pearu's main
contribution was numpy.distutils and f2py and he may not be interested any
longer in the rest of it).

Were I to want such a thing, I would simply review and merge one of the
many languishing PRs. I have the power to add myself to this list without
asking for special exemptions to the general rule, merely a token bit of
effort on my part.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-19 Thread Robert Kern
On Sat, Sep 19, 2015 at 2:26 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
>
> On Mon, 14 Sep 2015 19:02:58 +0200
> Sturla Molden <sturla.mol...@gmail.com> wrote:
> > On 14/09/15 10:34, Antoine Pitrou wrote:
> >
> > > If Numpy wanted to switch to a different generator, and if Numba
wanted
> > > to remain compatible with Numpy, one of the PCG functions would be an
> > > excellent choice (also for CPU performance, incidentally).
> >
> > Is Apache license ok in NumPy?
>
> While I don't know Numpy's policy precisely, I have no idea why a
> modern non-copyleft license such as Apache wouldn't be ok.

GPLv2 incompatibility: http://www.apache.org/licenses/GPL-compatibility.html

> That said, PCG looks simple enough that you can reimplement it (or at
> least the variants that are of interest to you) independently without
> too much effort.

The author has agreed in principle to relicense the code as MIT, though has
not yet merged the PR that accomplishes this yet. That said, we'd probably
end up doing a significant amount of rewriting so that we will have a C
implementation of software-uint128 arithmetic.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Python-ideas] Should our default random number generator be secure?

2015-09-14 Thread Robert Kern
[Tim, ping me if you want to get dropped from the reply chain, as we are
liable to get more into numpy decision-making. I've dropped python-ideas.]

On Mon, Sep 14, 2015 at 4:34 AM, Tim Peters <tim.pet...@gmail.com> wrote:
>
> [Robert Kern <robert.k...@gmail.com>]
> >> ...
> >> I'll also recommend the PCG paper (and algorithm) as the author's
> >> cross-PRNGs comparisons have been bandied about in this thread
already. The
> >> paper lays out a lot of the relevant issues and balances the various
> >> qualities that are important: statistical quality, speed, and security
(of
> >> various flavors).
> >>
> >>   http://www.pcg-random.org/paper.html
> >>
> >> I'm actually not that impressed with Random123. The core idea is nice
and
> >> clean, but the implementation is hideously complex.
>
> [Nathaniel Smith <n...@pobox.com>]
> > I'm curious what you mean by this? In terms of the code, or...?

In terms of the code that would be necessary to implement this in a
simultaneously performant and cross-platform way in numpy.random.

> > I haven't looked at the code,

I recommend it! Approach it with the practical goal of "how do I stick this
in as a new core PRNG for numpy.random?"

> > but the simplest generator they
> > recommend in the paper is literally just
> >
> > def rng_stream(seed):
> >     counter = 0
> >     while True:
> >         # AES128 takes a 128 bit key and 128 bits of data and returns
> >         # 128 bits of encrypted data
> >         val = AES128(key=seed, data=counter)
> >         yield low_64_bits(val)
> >         yield high_64_bits(val)
> >         counter += 1
>
> I assume it's because if you expand what's required to _implement_
> AES128() in C, it does indeed look pretty hideously complex.  On HW
> implementing AES primitives, of course the code can be much simpler.
>
> But to be fair, if integer multiplication and/or addition weren't
> implemented in HW, and we had to write to C code emulating them via
> bit-level fiddling, the code for any of the PCG algorithms would look
> hideously complex too.
>
> But being fair isn't much fun ;-)

Actually, I meant all of the crap *around* it, the platform-compatibility
testing to see if you have such a hardware instruction or not, and C++
template shenanigans in the surrounding code. It's possible that the
complexity is only due to flexibility, but it was too complex for me to
begin understanding *why* it's so complex before I succumbed to ennui and
moved on to some other productive use of my free time. At least some of the
complexity is due to needing software implementations of reduced-round
crypto for decent performance in the absence of the hardware instruction.
Performing well in the absence of the hardware instruction is very
important to me as I do not seem to have the AES-NI instruction available
on my mid-2012 Macbook Pro. Exposing counter-mode AES128 as a core PRNG is
a nice idea, but it's just low on my wishlist. I want fast, multiple
independent streams on my current hardware first, and PCG gives that to me.

On the "if I had to implement all of the bit-fiddling myself" count, I'd
still prefer PCG. It has a similar problem with 128-bit integers that may
or may not be natively provided (should you want a 128-bit state), and I
have to say that the software implementation of that bit-fiddling is much
nicer to work with than reduced-round Threefry. It's reference
implementation is also overly complex for practical use as it is showing
off the whole parameterized PCG family to support the paper, even the
explicitly unrecommended members. But it's relatively straightforward to
disentangle the desirable members; their implementations, including the
supporting 128-bit arithmetic code, are small and clean.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] OK to upload patched 1.9.2 for Python 3.5?

2015-09-14 Thread Robert Kern
On Mon, Sep 14, 2015 at 9:32 AM, Matthew Brett <matthew.br...@gmail.com>
wrote:
>
> Hi,
>
> On Mon, Sep 14, 2015 at 1:22 AM, David Cournapeau <courn...@gmail.com>
wrote:
> >
> >
> > On Mon, Sep 14, 2015 at 9:18 AM, Matthew Brett <matthew.br...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I'm just building numpy 1.9.2 for Python 3.5 (just released).
> >>
> >> In order to get the tests to pass on Python 3.5, I need to cherry pick
> >> commit 7d6aa8c onto the 1.9.2 tag position.
> >>
> >> Does anyone object to me uploading a wheel built from this patched
> >> version to pypi as 1.9.2 for Python 3.5 on OSX?   It would help to get
> >> the ball rolling for Python 3.5 binary wheels.
> >
> >
> > Why not releasing this as 1.9.3 ? It does not need to be a full release
> > (with binaries and all), but having multiple sources for a given tag is
> > confusing.
>
> Generally OK with me, but it's quite a bit of extra work for very
> little gain.  We'd have to tag, release a source tarball and OSX
> wheels, at least.

I think it's highly desirable that we also have a *source* release that
builds on Python 3.5, irrespective of whether or not we have binary wheels
for a couple of platforms up for Python 3.5. So I would encourage a quick
1.9.3 release that incorporates this patch.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] strange casting rules

2015-07-29 Thread Robert Kern
On Wed, Jul 29, 2015 at 1:07 PM, Neal Becker ndbeck...@gmail.com wrote:

 In [36]: np.uint64(-1)+0
 Out[36]: 1.8446744073709552e+19

 I often work on signal processing requiring bit-exact integral arithmetic.
 Promoting to float is not helpful - I don't understand the logic of the
 above example.

See this thread:

http://mail.scipy.org/pipermail/numpy-discussion/2015-July/073196.html

Cast your 0 to a uint64 or other unsigned int type to avoid this.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] difference between dtypes

2015-07-24 Thread Robert Kern
On Wed, Jul 22, 2015 at 7:45 PM, josef.p...@gmail.com wrote:

 Is there an explanation somewhere of what different basic dtypes mean,
across platforms and python versions?

 >>> np.bool8
 <type 'numpy.bool_'>
 >>> np.bool_
 <type 'numpy.bool_'>
 >>> bool
 <type 'bool'>


 Are there any rules and recommendations or is it all folklore?

This may help a little:

http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing

Basically, we accept the builtin Python type objects as a dtype argument
and do something sensible with them. float -> np.float64 because Python
floats are C doubles. int -> np.int32 or np.int64 depending on whatever a C
long is (i.e. depending on the 64bitness of your CPU and how your OS
chooses to deal with that). We encode those precision choices as aliases to
the corresponding specific numpy scalar types (underscored as necessary to
avoid shadowing builtins of the same name): np.float_ is np.float64, for
example.

See here for why the aliases to Python builtin types, np.int, np.float,
etc. still exist:

https://github.com/numpy/numpy/pull/6103#issuecomment-123652497

If you just need to pass a dtype= argument and want the precision that
matches the native integer and float for your platform, then I prefer to
use the Python builtin types instead of the underscored aliases; they just
look cleaner. If you need a true numpy scalar type (e.g. to construct a
numpy scalar object), of course, you must use one of the numpy scalar
types, and the underscored aliases are convenient for that. Never use the
aliases to the Python builtin types.
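
Concretely, on a typical 64-bit Linux box of this era:

>>> import numpy as np
>>> np.dtype(float) == np.dtype(np.float64)   # Python float maps to the C double
True
>>> np.dtype(int)                             # whatever a C long is on this platform
dtype('int64')
>>> np.float_ is np.float64                   # the underscored alias is the scalar type itself
True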

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] difference between dtypes

2015-07-24 Thread Robert Kern
On Fri, Jul 24, 2015 at 10:05 AM, josef.p...@gmail.com wrote:

 On Fri, Jul 24, 2015 at 3:46 AM, Robert Kern robert.k...@gmail.com
wrote:

 On Wed, Jul 22, 2015 at 7:45 PM, josef.p...@gmail.com wrote:
 
  Is there an explanation somewhere of what different basic dtypes mean,
across platforms and python versions?
 
   >>> np.bool8
  <type 'numpy.bool_'>
   >>> np.bool_
  <type 'numpy.bool_'>
   >>> bool
  <type 'bool'>
 
 
  Are there any rules and recommendations or is it all folklore?

 This may help a little:


http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#arrays-dtypes-constructing

 Basically, we accept the builtin Python type objects as a dtype argument
and do something sensible with them. float -> np.float64 because Python
floats are C doubles. int -> np.int32 or np.int64 depending on whatever a C
long is (i.e. depending on the 64bitness of your CPU and how your OS
chooses to deal with that). We encode those precision choices as aliases to
the corresponding specific numpy scalar types (underscored as necessary to
avoid shadowing builtins of the same name): np.float_ is np.float64, for
example.

 See here for why the aliases to Python builtin types, np.int, np.float,
etc. still exist:

 https://github.com/numpy/numpy/pull/6103#issuecomment-123652497

 If you just need to pass a dtype= argument and want the precision that
matches the native integer and float for your platform, then I prefer to
use the Python builtin types instead of the underscored aliases; they just
look cleaner. If you need a true numpy scalar type (e.g. to construct a
numpy scalar object), of course, you must use one of the numpy scalar
types, and the underscored aliases are convenient for that. Never use the
aliases to the Python builtin types.

 (I don't have time to follow up on this for at least two weeks)

 my thinking was that, if there is no actual difference between bool,
np.bool and np.bool_, the np.bool could become an alias and a replacement
for np.bool_, so we can get rid of an ugly trailing underscore.
 If np.float is always float64 it could be mapped to that directly.

Well, I'll tell you why that's a bad idea in when you get back in two weeks.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Rationale for integer promotion rules

2015-07-16 Thread Robert Kern
On Thu, Jul 16, 2015 at 7:14 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jul 16, 2015 at 9:18 AM, Antoine Pitrou solip...@pitrou.net
wrote:

  while adding int8 and uint8 will give int16 as result
  (promoting to the smallest fitting type).

 I understand this to be a consequence of the previous rule (results
 should match inputs) combined with the need to find a common input
 type.

Specifically, when combining signed and unsigned ints, we need to find a
signed int type that can simultaneously hold both of the *inputs*. Whether
the *output* would fit or overflow is not what the rules are concerned with.
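
For example (the exact dtypes here are the documented numpy behaviour,
independent of platform):

>>> import numpy as np
>>> (np.int8(1) + np.uint8(1)).dtype            # int16: smallest signed type holding both input ranges
dtype('int16')
>>> (np.int16(30000) + np.int16(30000)).dtype   # stays int16 even though the value 60000 cannot fit
dtype('int16')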

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dimension independent copy of corner of array

2015-07-13 Thread Robert Kern
newarr[tuple(slice(0, i) for i in oldarray.shape)] = oldarray
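
For example, in two dimensions (but the same line works for any rank):

import numpy as np

oldarray = np.arange(6).reshape(2, 3)
newarr = np.zeros((5, 7), dtype=oldarray.dtype)
newarr[tuple(slice(0, i) for i in oldarray.shape)] = oldarray   # same as newarr[:2, :3] = oldarray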

On Mon, Jul 13, 2015 at 12:34 PM, Neal Becker ndbeck...@gmail.com wrote:

 I want to copy an array to the corner of a new array.  What is a dimension
 independent way to say:

 newarr[:i0,:i1,...] = oldarray

 where (i0,i1...) is oldarray.shape?

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Robert Kern
On Sun, May 24, 2015 at 7:56 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 24/05/15 20:04, Nathaniel Smith wrote:

  I'm not sure what you're envisioning as needing a deprecation cycle? The
  neat thing about random is that we already have a way for users to say
  that they want replicability -- the use of an explicit seed --

 No, this is not sufficient for random numbers. Random sampling and
 ziggurat generators are examples. If we introduce a change (e.g. a
 bugfix) that will affect the number of calls to the entropy source, just
 setting the seed will in general not be enough to ensure backwards
 compatibility. That is e.g. the case with using ziggurat samplers
 instead of the current transcendental transforms for normal, exponential
 and gamma distributions. While ziggurat is faster (and to my knowledge)
 more accurate, it will also make a different number of calls to the
 entropy source, and hence the whole sequence will be affected, even if
 you do set a random seed.

Please reread the proposal at the top of the thread.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

2015-05-24 Thread Robert Kern
On Sun, May 24, 2015 at 7:46 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 24/05/15 17:13, Anne Archibald wrote:
  Do we want a deprecation-like approach, so that eventually people who
  want replicability will specify versions, and everyone else gets bug
  fixes and improvements? This would presumably take several major
  versions, but it might avoid people getting unintentionally trapped on
  this version.
 
  Incidentally, bug fixes are complicated: if a bug fix uses more or fewer
  raw random numbers, it breaks repeatability not just for the call that
  got fixed but for all successive random number generations.

 If a function has a bug, changing it will change the output of the
 function. This is not special for random numbers. If not retaining the
 old erroneous output means we break-backwards compatibility, then no
 bugs can ever be fixed, anywhere in NumPy. I think we need to clarify
 what we mean by backwards compatibility for random numbers. What
 guarantees should we make from one version to another?

The policy thus far has been that we will fix bugs in the distributions and
make changes that allow a strictly wider domain of distribution parameters
(e.g. allowing b == 0 where before we only allowed b > 0), but we will not make
other enhancements that would change existing good output.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] binary wheels for numpy?

2015-05-17 Thread Robert Kern
On Sun, May 17, 2015 at 7:50 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 On Sun, May 17, 2015 at 8:22 AM, Sturla Molden sturla.mol...@gmail.com
wrote:
  Matthew Brett matthew.br...@gmail.com wrote:
 
  Yes, unfortunately we can't put MKL binaries on pypi because of the
  MKL license - see
 
  I believe we can, because we asked Intel for permission. From what I
heard
  the response was positive.

 We would need something formal from Intel saying that they do not
 require us to hold our users to their standard redistribution terms
 and that they waive the requirement that we be responsible for any
 damage to Intel that happens as a result of people using our binaries.

 I'm guessing we don't have this, but I'm happy to be corrected,

I don't think permission from Intel is the blocking issue for putting these
binaries up on PyPI. Even with Intel's permission, we would be putting up
proprietary binaries on a page that is explicitly claiming that the files
linked therein are BSD-licensed. The binaries could not be redistributed
with any GPLed module, say, pygsl.

We could host them on numpy.org on their own page that clearly explained
the license of those files, but I think PyPI is out.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance of numpy.array()

2015-04-29 Thread Robert Kern
On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista afy...@gmail.com wrote:

 I work on two distinct scientific clusters. I have run the same python
code on the two clusters and I have noticed that one is faster by an order
of magnitude than the other (1min vs 10min, this is important because I run
this function many times).

 I have investigated with a profiler and I have found that the cause of
this is that (same code and same data) is the function numpy.array that is
being called 10^5 times. On cluster A it takes 2 s in total, whereas on
cluster B it takes ~6 min.  For what regards the other functions, they are
generally faster on cluster A. I understand that the clusters are quite
different, both as hardware and installed libraries. It strikes me that on
this particular function the performance is so different. I would have
though that this is due to a difference in the available memory, but
actually by looking with `top` the memory seems to be used only at 0.1% on
cluster B. In theory numpy is compiled with atlas on cluster B, and on
cluster A it is not clear, because numpy.__config__.show() returns NOT
AVAILABLE for anything.

 Does anybody has any insight on that, and if I can improve the
performance on cluster B?

Check to see if you have the Transparent Hugepages (THP) Linux kernel
feature enabled on each cluster. You may want to try turning it off. I have
recently run into a problem with a large-memory multicore machine with THP
for programs that had many large numpy.array() memory allocations. Usually,
THP helps memory-hungry applications (you can Google for the reasons), but
it does require defragmenting the memory space to get contiguous hugepages.
The system can get into a state where the memory space is so fragmented
such that trying to get each new hugepage requires a lot of extra work to
create the contiguous memory regions. In my case, a perfectly
well-performing program would suddenly slow down immensely during its
memory-allocation-intensive actions. When I turned THP off, it started
working normally again.

If you have root, try using `perf top` to see what C functions in user
space and kernel space are taking up the most time in your process. If you
see anything like `do_page_fault()`, this, or a similar issue, is your
problem.
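
A small helper to check the THP setting on a Linux host (an illustrative
sketch, not from the original mail; the sysfs path is the usual location
but may differ between distributions):

def thp_setting(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    # Returns e.g. "[always] madvise never"; the bracketed entry is active.
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "unavailable"

print(thp_setting())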

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Non-meta indexing improvements discussion

2015-04-09 Thread Robert Kern
On Thu, Apr 9, 2015 at 8:07 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Do, 2015-04-09 at 08:50 +0200, Sebastian Berg wrote:

  3. I do not know if it is possible or useful, but I could imagine a module
  wide switch (similar to __future__ imports) to change the default
  indexing behaviour.

 OK, my suggestion. But actually I do not know if I like it all that
 much (nor if it can be done) since 1. and 2. seem to me like enough. But
 if someone feels strongly about fancy indexing being bad, I wanted to
 point out that there may be ways to go down the road to switch numpy
 without actually switching.

I can't think of a way to actually make that work (I can list all the ways
I thought of that *don't* work if anyone insists, but it's a tedious dead
end), but it also seems to me to be a step backward. Assuming that we have
both .ortho_ix and .fancy_ix to work with, it seems to me that the
explicitness is a good thing. Even if in this module you only want to
exploit one of those semantics, your module's readers live in a wider
context where both semantics play a role. Moving your marker of "this is
the non-default semantics I am using" to some module or function header
away from where the semantics are actually used makes the code harder to
read.

A newish user trying to read some nontrivial indexing code may come to the
list and ask "what exactly is this expression doing here?" and give us just
the line of code with the expression (anecdotally, this is usually how this
scenario goes down). If we have to answer "it depends; is there an
@ortho_indexing decorator at the top of the function?", that's probably a
cure worse than the disease. The properties are a good way to provide
googleable signposts right where the tricky semantics are being used.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Robert Kern
On Wed, Apr 8, 2015 at 2:06 AM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 5, 2015 7:04 AM, Robert Kern robert.k...@gmail.com wrote:
 
  On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith n...@pobox.com wrote:
  
   On Apr 4, 2015 4:12 AM, Todd toddr...@gmail.com wrote:
   
There was no break as large as this. In fact I would say this is
even a larger change than any individual change we saw in the python 2 to 3
switch.  The basic mechanics of indexing are just too fundamental and touch
on too many things to make this sort of change feasible.
  
   I'm afraid I'm not clever enough to know how large or feasible a
change is without even seeing the proposed change.
 
  It doesn't take any cleverness. The change in question was to make the
default indexing semantics to orthogonal indexing. No matter the details of
the ultimate proposal to achieve that end, it has known minimum
consequences, at least in the broad outline. Current documentation and
books become obsolete for a fundamental operation. Current code must be
modified by some step to continue working. These are consequences inherent
in the end, not just the means to the end; we don't need a concrete
proposal in front of us to know what they are. There are ways to mitigate
these consequences, but there are no silver bullets that eliminate them.
And we can compare those consequences to approaches like Jaime's that
achieve a majority of the benefits of such a change without any of the
negative consequences. That comparison does not bode well for any proposal.

 Ok, let me try to make my point another way.

 I don't actually care at this stage in the discussion whether the change
is ultimately viable. And I don't think you should either. (For values of
you that includes everyone in the discussion, not picking on Robert in
particular :-).)

 My point is that rational, effective discussion requires giving ideas
room to breath. Sometimes ideas turn out to be not as bad as they looked.
Sometimes it turns out that they are, but there's some clever tweak that
gives you 95% of the benefits for 5% of the cost. Sometimes you generate a
better understanding of the tradeoffs that subsequently informs later
design decisions. Sometimes working through the details makes both sides
realize that there's a third option that solves both their problems.
Sometimes you merely get a very specific understanding of why the whole
approach is unreasonable that you can then, say, take to the pandas and
netcdf developers as evidence of that you made a good faith effort and ask
them to meet you half way. And all these things require understanding the
specifics of what *exactly* works or doesn't work about an idea. IMHO, it's
extremely misleading at this stage to make any assertion about whether
Jaime's approach gives the majority of benefits of such a change: not
because it's wrong, but because it totally short-circuits the discussion
about what benefits we care about.
Jaime's patch certainly has merits, but does it help us make numpy and
pandas/netcdf's more compatible? Does it make it easier for Eric to teach?
Those are some specific merits that we might care about a lot, and for
which Jaime's patch may or may not help much. But that kind of nuance gets
lost when we jump straight to debating thumbs-up versus thumbs-down.

And we can get all of that discussion from discussing Jaime's proposal. I
would argue that we will get better, more focused discussion from it since
it is actually a concrete proposal and not just a wish that numpy's
indexing semantics were something else. I think that a full airing and
elaboration of Jaime's proposal (as the final PR should look quite
different than the initial one to incorporate the what is found in the
discussion) will give us a satisficing solution. I certainly think that
that is *more likely* to arrive at a satisficing solution than an attempt
to change the default indexing semantics. I can name specific improvements
that would specifically address the concerns you named if you would like.
Maybe it won't be *quite* as good (for some parties) than if Numeric chose
orthogonal indexing from the get-go, but it will likely be much better for
everyone than if numpy broke backward compatibility on this feature now.

 I cross-my-heart promise that under the current regime, no PR breaking
fancy indexing would ever get anywhere *near* numpy master without
*extensive* discussion and warnings on the list. The core devs just spent
weeks quibbling about whether a PR that adds a new field to the end of the
dtype struct would break ABI backcompat (we're now pretty sure it doesn't),
and the current standard we enforce is that every PR that touches public
API needs a list discussion, even minor extensions with no compatibility
issues at all. No one is going to sneak anything by anyone.

That is not the issue. Ralf asked you not to invite such PRs in the first
place. No one thinks that such a PR would get snuck in.

Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Robert Kern
On Wed, Apr 8, 2015 at 8:05 PM, Eric Firing efir...@hawaii.edu wrote:

 Now, can we please get back to consideration of reasonable options?

Sure, but I recommend going back to the actually topical thread (or a new
one), as this one is meta.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: fancy vs. orthogonal)

2015-04-08 Thread Robert Kern
On Wed, Apr 8, 2015 at 8:40 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:

 To address in detail the list of Matthew you mention above:

   * implement orthogonal indexing as a method arr.sensible_index[...]
 That's basically Jaime's PR.

   * implement the current non-boolean fancy indexing behavior as a method
- arr.crazy_index[...]
 Not that harmful, but only makes sense in combination with the next steps.

Well, since we got the peanut butter in our meta, I might as well join in
here.

I think this step is useful even without the deprecation steps. First,
allow me to rename things in a less judgy fashion:

* arr.ortho_ix is a property that allows for orthogonal indexing, a la
Jaime's PR.

* arr.fancy_ix is a property that implements the current numpy-standard
fancy indexing.

Even though arr.fancy_ix is only replicating the default semantics, having
it opens up some possibilities.

Other array-like objects can implement both of these with the same names.
Thus, to write generic code that exploits the particular behaviors of one
of these semantics, you just make sure to use the property that behaves the
way you want. Then you don't care what the default syntax does on that
object. You don't have to test if an object has arr.__orthogonal_indexing__
and have two different code paths; you just use the property that behaves
the way you want.

It also allows us to issue warnings when the default indexing syntax is
used for some of the behaviors that are weird corner cases, like [1, :,
array]. This is one of those corner cases where the behavior is probably
not what anyone actually *wants*; it was just the only thing we could do
that is consistent with the desired semantics of the rest of the cases. I
think it would be reasonable to issue a warning if the default indexing
syntax was used with it. It's probably a sign that the user thought that
indexing worked like orthogonal indexing. The warning would *not* be issued
if the arr.fancy_ix property was used, since that is an explicit signal
that the user is specifically requesting a particular set of behaviors. I
probably won't want to ever *deprecate* the behavior for the default
syntax, but a warning is easy to deal with even with old code that you
don't want to modify directly.

Lastly, I would appreciate having some signal to tell readers "pay
attention; this is nontrivial index manipulation; here is a googleable term
so you can look up what this means". I almost always use the default fancy
indexing, but I'd use the arr.fancy_ix property for the nontrivial cases
just for this alone.
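
For readers new to the distinction, a minimal sketch of the two semantics
(the .fancy_ix/.ortho_ix properties above are hypothetical; np.ix_ gives the
orthogonal behaviour today):

import numpy as np

a = np.arange(16).reshape(4, 4)
rows = [0, 2]
cols = [1, 3]

# Fancy (default) indexing: the index arrays are broadcast against each
# other, picking out the element pairs (0, 1) and (2, 3).
print(a[rows, cols])            # [ 1 11]

# Orthogonal indexing: the outer product of the index arrays, i.e. the
# 2x2 subgrid at rows 0,2 and columns 1,3.
print(a[np.ix_(rows, cols)])    # [[ 1  3]
                                #  [ 9 11]]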

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Robert Kern
On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 4, 2015 4:12 AM, Todd toddr...@gmail.com wrote:
 
 
  On Apr 4, 2015 10:54 AM, Nathaniel Smith n...@pobox.com wrote:
  
   Core python broke backcompat on a regular basis throughout the python
   2 series, and almost certainly will again -- the bar to doing so is
   *very* high, and they use elaborate mechanisms to ease the way
   (__future__, etc.), but they do it. A few months ago there was even
   some serious consideration given to changing py3 bytestring indexing
   to return bytestrings instead of integers. (Consensus was
   unsurprisingly that this was a bad idea, but there were core devs
   seriously exploring it, and no-one complained about the optics.)
 
  There was no break as large as this. In fact I would say this is even a
larger change than any individual change we saw in the python 2 to 3
switch.  The basic mechanics of indexing are just too fundamental and touch
on too many things to make this sort of change feasible.

 I'm afraid I'm not clever enough to know how large or feasible a change
is without even seeing the proposed change.

It doesn't take any cleverness. The change in question was to make the
default indexing semantics to orthogonal indexing. No matter the details of
the ultimate proposal to achieve that end, it has known minimum
consequences, at least in the broad outline. Current documentation and
books become obsolete for a fundamental operation. Current code must be
modified by some step to continue working. These are consequences inherent
in the end, not just the means to the end; we don't need a concrete
proposal in front of us to know what they are. There are ways to mitigate
these consequences, but there are no silver bullets that eliminate them.
And we can compare those consequences to approaches like Jaime's that
achieve a majority of the benefits of such a change without any of the
negative consequences. That comparison does not bode well for any proposal.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Robert Kern
On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com
wrote:
 
  On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:

  So I'd be very happy to see worked out proposals for any or
  all of these approaches. It strikes me as really premature to be
  issuing proclamations about what changes might be considered. There is
  really no danger to *considering* a proposal;
 
  Sorry, I have to disagree. Numpy is already seen by some as having a
poor
  track record on backwards compatibility. Having core developers say
propose
  some backcompat break to how indexing works and we'll consider it
makes our
  stance on that look even worse. Of course everyone is free to make any
  technical proposal they deem fit and we'll consider the merits of it.
  However I'd like us to be clear that we do care strongly about backwards
  compatibility and that the fundamentals of the core of Numpy (things
like
  indexing, broadcasting, dtypes and ufuncs) will not be changed in
  backwards-incompatible ways.
 
  Ralf
 
  P.S. also not for a possible numpy 2.0 (or have we learned nothing from
  Python3?).

 I agree 100% that we should and do care strongly about backwards
 compatibility. But you're saying in one sentence that we should tell
 people that we won't consider backcompat breaks, and then in the next
 sentence that of course we actually will consider them (even if we
 almost always reject them). Basically, I think saying one thing and
 doing another is not a good way to build people's trust.

There is a difference between politely considering what proposals people
send us uninvited and inviting people to work on specific proposals. That
is what Ralf was getting at.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Mathematical functions in Numpy

2015-03-17 Thread Robert Kern
On Tue, Mar 17, 2015 at 6:29 PM, Matthieu Brucher 
matthieu.bruc...@gmail.com wrote:

 Hi,

 These functions are defined in the C standard library!

I think he's asking how to define numpy ufuncs.

 2015-03-17 18:00 GMT+00:00 Shubhankar Mohapatra mshubhan...@yahoo.co.in:
  Hello all,
  I am an undergraduate and I am trying to do a project this time on numpy
  in GSoC. This project is about integrating vector math library classes of
  SLEEF and Yeppp into numpy to make the mathematical functions faster. I
  have already studied the new library classes but I am unable to find the
  sin, cos function definitions in the numpy source code. Can someone please
  help me find the functions in the source code so that I can implement the
  new library class into numpy.
  Thanking you,
  Shubhankar Mohapatra
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 



 --
 Information System Engineer, Ph.D.
 Blog: http://matt.eifelle.com
 LinkedIn: http://www.linkedin.com/in/matthieubrucher
 Music band: http://liliejay.com/
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] random.RandomState and deepcopy

2015-03-13 Thread Robert Kern
On Fri, Mar 13, 2015 at 5:59 PM, Neal Becker ndbeck...@gmail.com wrote:

 Robert Kern wrote:

  On Fri, Mar 13, 2015 at 5:34 PM, Neal Becker ndbeck...@gmail.com
wrote:
 
  It is common that to guarantee good statistical independence between
  various
  random generators, a singleton instance of an RNG is shared between
them.
 
  So I typically have various random generator objects, which (sometimes
  several levels objects deep) embed an instance of RandomState.
 
  Now I have a requirement to copy a generator object (without knowing
  exactly
  what that generator object is).
 
  Or rather, you want the generator object to *avoid* copies by returning
  itself when a copy is requested of it.
 
  My solution is to use deepcopy on the top-level object.  But I need to
  overload __deepcopy__ on the singleton RandomState object.
 
  Unfortunately, RandomState doesn't allow customization of __deepcopy__
  (or
  anything else).  And it has no __dict__.
 
  You can always subclass RandomState to override its __deepcopy__.
 
  --
  Robert Kern

 Yes, I think I prefer this:

 from numpy.random import RandomState

  class shared_random_state (RandomState):
      def __init__ (self, rs):
          RandomState.__init__(self, rs)

      def __deepcopy__ (self, memo):
          return self

 Although, that means I have to use it like this:

 rs = shared_random_state (0)

 where I really would prefer (for aesthetic reasons):

 rs = shared_random_state (RandomState(0))

 but I don't know how to do that if shared_random_state inherits from
 RandomState.

<shrug> If you insist:

class shared_random_state(RandomState):
    def __init__(self, rs):
        self.__setstate__(rs.__getstate__())

    def __deepcopy__(self, memo):
        return self
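
A quick demonstration of the intent (a sketch; shared_random_state is the
class defined just above):

import copy
from numpy.random import RandomState

rs = shared_random_state(RandomState(0))
rs2 = copy.deepcopy(rs)
# The singleton survives a deepcopy of any container that embeds it.
print(rs is rs2)   # True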

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] random.RandomState and deepcopy

2015-03-13 Thread Robert Kern
On Fri, Mar 13, 2015 at 5:34 PM, Neal Becker ndbeck...@gmail.com wrote:

 It is common that to guarantee good statistical independence between
various
 random generators, a singleton instance of an RNG is shared between them.

 So I typically have various random generator objects, which (sometimes
 several levels objects deep) embed an instance of RandomState.

 Now I have a requirement to copy a generator object (without knowing
exactly
 what that generator object is).

Or rather, you want the generator object to *avoid* copies by returning
itself when a copy is requested of it.

 My solution is to use deepcopy on the top-level object.  But I need to
 overload __deepcopy__ on the singleton RandomState object.

 Unfortunately, RandomState doesn't allow customization of __deepcopy__ (or
 anything else).  And it has no __dict__.

You can always subclass RandomState to override its __deepcopy__.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argument handling by uniform

2015-03-13 Thread Robert Kern
On Fri, Mar 13, 2015 at 3:57 PM, Alan G Isaac alan.is...@gmail.com wrote:

 Today I accidentally wrote `uni = np.random.uniform((-0.5,0.5),201)`,
 supplying a tuple instead of separate low and high values.  This gave
 me two draws (from [0..201] I think).  My question: how were the
 arguments interpreted?

Broadcast against each other. Roughly equivalent to:

uni = np.array([
  np.random.uniform(-0.5, 201),
  np.random.uniform(0.5, 201),
])
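
A quick check of the broadcasting (a minimal sketch, not from the original
mail):

import numpy as np

uni = np.random.uniform((-0.5, 0.5), 201)
print(uni.shape)   # (2,): one draw from [-0.5, 201), one from [0.5, 201)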

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] tie breaking for max, min, argmax, argmin

2015-03-12 Thread Robert Kern
On Thu, Mar 12, 2015 at 1:31 PM, Johannes Kulick 
johannes.kul...@ipvs.uni-stuttgart.de wrote:

 Hello,

 I wonder if it would be worth to enhance max, min, argmax and argmin
(more?)
 with a tie breaking parameter: If multiple entries have the same value
the first
 value is returned by now. It would be useful to have a parameter to alter
this
 behavior to an arbitrary tie-breaking. I would propose, that the
tie-breaking
 function gets a list with all indices of the max/mins.

 Example:
  >>> a = np.array([ 1, 2, 5, 5, 2, 1])
  >>> np.argmax(a, tie_breaking=random.choice)
  3

  >>> np.argmax(a, tie_breaking=random.choice)
  2

  >>> np.argmax(a, tie_breaking=random.choice)
  2

  >>> np.argmax(a, tie_breaking=random.choice)
  2

  >>> np.argmax(a, tie_breaking=random.choice)
  3

 Especially for some randomized experiments it is necessary that not
always the
 first maximum is returned, but a random optimum. Thus I end up writing
these
 things over and over again.

 I understand, that max and min are crucial functions, which shouldn't be
slowed
 down by the proposed changes. Adding new functions instead of altering the
 existing ones would be a good option.

 Are there any concerns against me implementing these things and sending a
pull
 request? Should such a function better be included in scipy for example?

On the whole, I think I would prefer new functions for this. I assume you
only need variants for argmin() and argmax() and not min() and max(), since
all of the tied values for the latter two would be identical, so returning
the first one is just as good as any other.
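
A separate helper along those lines might look like this (a sketch, not an
existing numpy function):

import numpy as np

def argmax_random_tie(a, rng=np.random):
    """Like argmax, but pick uniformly at random among tied maxima."""
    a = np.asarray(a)
    candidates = np.flatnonzero(a == a.max())
    return rng.choice(candidates)

a = np.array([1, 2, 5, 5, 2, 1])
print(argmax_random_tie(a))   # 2 or 3, chosen at random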

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy pickling problem - python 2 vs. python 3

2015-03-07 Thread Robert Kern
On Sat, Mar 7, 2015 at 9:54 AM, Pauli Virtanen p...@iki.fi wrote:

 How about an extra use_pickle=True kwarg that can be used to disable
 using pickle altogether in these routines?

If we do, I'd vastly prefer `forbid_pickle=False`. The use_pickle spelling
suggests that you are asking it to use pickle when it otherwise wouldn't,
which is not the intention.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] One-byte string dtype: third time's the charm?

2015-02-22 Thread Robert Kern
On Sun, Feb 22, 2015 at 7:29 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 On 22/02/15 19:21, Aldcroft, Thomas wrote:

  Problems like this are now showing up in the wild [3].  Workarounds are
  also showing up, like a way to easily convert from 'S' to 'U' within
  astropy Tables [4], but this is really not a desirable way to go.
  Gigabyte-sized string data arrays are not uncommon, so converting to
  UCS-4 is a real memory and performance hit.

  Why UCS-4? Python's internal flexible string representation will
  use ASCII for ASCII text.

numpy's 'U' dtype is UCS-4, and this is what Thomas is referring to, not
Python's string type. It cannot have a flexible representation as it *is*
the representation. Python 3's `str` type is opaque, so it can freely
choose how to represent the data in memory. numpy dtypes transparently
describe how the data is represented in memory.
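
The memory difference is easy to see (a minimal sketch, not from the
original mail):

import numpy as np

s = np.array([b'hello'], dtype='S5')   # one byte per character
u = np.array(['hello'], dtype='U5')    # four bytes (UCS-4) per character
print(s.itemsize, u.itemsize)          # 5 20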

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] unpacking data values into array of bits

2015-02-12 Thread Robert Kern
On Thu, Feb 12, 2015 at 2:21 PM, Neal Becker ndbeck...@gmail.com wrote:

 I need to transmit some data values.  These values will be float and long
 values.  I need them encoded into a string of bits.

 The only way I found so far to do this seems rather roundabout:


  np.unpackbits (np.array (memoryview(struct.pack ('d', pi))))
  Out[45]:
  array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
         0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
         0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0], dtype=uint8)

 (which I'm not certain is correct)

 Also, I don't know how to reverse this process

You already had your string ready for transmission with `struct.pack('d',
pi)`.
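
That is, a round trip needs no bit unpacking at all (a minimal sketch):

import struct
from math import pi
import numpy as np

payload = struct.pack('d', pi)          # 8 bytes, ready to transmit
value, = struct.unpack('d', payload)    # and back again
print(value == pi)                      # True

# If a bit array really is needed, packbits() reverses unpackbits():
bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
recovered = np.packbits(bits).tobytes()
print(recovered == payload)             # True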

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Why am I getting a FutureWarning with this code?

2014-12-22 Thread Robert Kern
On Mon, Dec 22, 2014 at 10:35 AM, Steve Spicklemire st...@spvi.com wrote:

 I’m working on visual python (http://vpython.org) which lists numpy among
its dependencies.

 I recently updated my numpy installation to 1.9.1 and I’m now
encountering this error:


/usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/VPython-6.10-py2.7-macosx-10.10-x86_64.egg/visual_common/materials.py:70:
FutureWarning: comparison to `None` will result in an elementwise object
comparison in the future.
   self.__setattr__(key, value)

 Oddly, the code in question looks like this:


  62 from . import cvisual
  63 from numpy import array, reshape, fromstring, ubyte, ndarray,
zeros, asarray
  64 import os.path, math
  65
  66 class raw_texture(cvisual.texture):
  67 def __init__(self, **kwargs):
  68 cvisual.texture.__init__(self)
  69 for key, value in kwargs.items():
  70 self.__setattr__(key, value)

The answer is in the implementation of cvisual.texture somewhere. This is
Boost.Python C++ code, so there's a fair bit of magic going on such that I
can't pinpoint the precise line that's causing this, but I suspect that it
might be this one:

https://github.com/BruceSherwood/vpython-wx/blob/master/src/python/numeric_texture.cpp#L258

When `data` is assigned, this line checks if the value is None so that it
can raise an error to tell you not to do that (I believe, from the context,
that `py::object()` is the Boost.Python idiom for getting the None
singleton). I *think*, but am not positive, that Boost.Python converts the
C++ == operator into a Python == operation, so that would trigger this
FutureWarning.

So VPython is the code that needs to be fixed to do an identity comparison
with None rather than an equality comparison.
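
The difference in user code (a minimal sketch, not from the original mail):

import numpy as np

value = np.arange(3)

# Identity comparison: a plain Python bool, never warns.
print(value is None)    # False

# Equality comparison: broadcast elementwise against the array; in
# numpy 1.9 this is what raises the FutureWarning shown above.
print(value == None)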

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks?

2014-12-11 Thread Robert Kern
On Thu, Dec 11, 2014 at 2:47 PM, Nathaniel Smith n...@pobox.com wrote:

 On 11 Dec 2014 14:31, Pierre Haessig pierre.haes...@crans.org wrote:
 
 
  Le 11/12/2014 01:00, Nathaniel Smith a écrit :
   Seems like a useful addition to me -- I've definitely wanted this in
   the past. I agree with Stephan that reshape() might not be the best
   place, though; I wouldn't think to look for it there.
  
   Two API ideas, which are not mutually exclusive:
  
   [...]
  
   2) Add a broadcast_to(arr, shape) function, which broadcasts the array
   to exactly the shape given, or else errors out if this is not
   possible.
  That's also possible. Then there could be a point in `reshape`
docstring.
 
  Could this function be named `broadcast` instead of `broadcast_to` ?
  (in coherence with `reshape`)

 It could, but then there wouldn't be much to distinguish it from
broadcast_arrays. Broadcasting is generally a symmetric operation - see
broadcast_arrays or arr1 + arr2. So the 'to' is there to give a clue that
this function is not symmetric, and rather has a specific goal in mind.

And we already have a numpy.broadcast() function.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast.html
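
For comparison, the symmetric and asymmetric forms side by side (a sketch;
broadcast_to() here is the function proposed in this thread, which was added
to numpy in a later release):

import numpy as np

a = np.arange(3)          # shape (3,)
b = np.zeros((4, 1))      # shape (4, 1)

# Symmetric: both inputs are broadcast to the common shape (4, 3).
x, y = np.broadcast_arrays(a, b)
print(x.shape, y.shape)

# Asymmetric: one array is broadcast to an explicitly requested shape.
print(np.broadcast_to(a, (4, 3)).shape)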

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions)

2014-12-11 Thread Robert Kern
On Thu, Dec 11, 2014 at 3:53 PM, Eric Moore e...@redtetrahedron.org wrote:

 On Thu, Dec 11, 2014 at 10:41 AM, Todd toddr...@gmail.com wrote:

 I recently became aware of another C-library for doing FFTs (and other
things):

 https://github.com/arrayfire/arrayfire

 They claim to have comparable FFT performance to MKL when run on a CPU
(they also support running on the GPU but that is probably outside the
scope of numpy or scipy).  It used to be proprietary but now it is under a
BSD-3-Clause license.  It seems it supports non-power-of-2 FFT operations
as well (although those are slower).  I don't know much beyond that, but it
is probably worth looking in

 AFAICT the cpu backend is a FFTW wrapper.

Indeed.
https://github.com/arrayfire/arrayfire/blob/devel/src/backend/cpu/fft.cpp#L16

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Add a function to broadcast arrays to a given shape to numpy's stride_tricks?

2014-12-11 Thread Robert Kern
On Thu, Dec 11, 2014 at 4:17 PM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Do, 2014-12-11 at 16:56 +0100, Pierre Haessig wrote:
  Le 11/12/2014 16:52, Robert Kern a écrit :
  
   And we already have a numpy.broadcast() function.
  
  
http://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast.html
  True, I once read the docstring of this function, but never used it
though.

 I am not sure it is really the right thing for most things since it
 returns an old style iterator.

Indeed. That's why I wrote broadcast_arrays().

 On the other hand arrays with 0-strides
 need a bit more care (if we add this top level, one might have copy=True
 as a default or so?).
 Also because of that it is currently limited to NPY_MAXARGS (32).
 Personally, I would like to see this type of functionality implemented
 in C, and may be willing to help with it. This kind of code exists in
 enough places in numpy so it can be stolen pretty readily. One option
 would also be to have something like:

 np.common_shape(*arrays)
 np.broadcast_to(array, shape)
 # (though I would like many arrays too)

  and then broadcast_arrays could be implemented in terms of these two.

Why? What benefit does broadcast_arrays() get from being reimplemented in C?

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should ndarray be a context manager?

2014-12-09 Thread Robert Kern
On Tue, Dec 9, 2014 at 5:57 PM, Julian Taylor jtaylor.deb...@googlemail.com
wrote:

 On 09.12.2014 18:55, Sturla Molden wrote:
  On 09/12/14 18:39, Julian Taylor wrote:
 
  A context manager will also not help you with reference cycles.
 
  If will because __exit__ is always executed. Even if the PyArrayObject
  struct lingers, the data buffer will be released.

 a exit function would not delete the buffer, only decrease the reference
 count of the array. If something else still holds a reference it stays
 valid.
 Otherwise you would end up with a crash when the other object holding a
 reference tries to access it.

I believe that Sturla is proposing that the buffer (the data pointer) will
indeed be free()ed and the ndarray object be modified in-place to have an
empty shape. Most references won't matter, because they are opaque; e.g. a
frame being held by a caught traceback somewhere or whatever. These are
frequently the references that keep alive large arrays when we don't them
to, and are hard to track down. The only place where you will get a crasher
is when other ndarray views on the original array are still around because
those are not opaque references.

The main problem I have is that this is much too likely to cause a segfault
to be part of the main API for ndarrays. I perhaps wouldn't mind a
non-public-API function hidden in numpy somewhere (but not in numpy.* or
even numpy.*.*) that did this. The user would put it into a finally:
clause, if that such things matters to them, instead of using it as a
context manager. Like as_strided(), which creates similar potential for
crashes, it should be not be casually available. This kind of action needs
to be an explicit statement
yes_i_dont_need_this_memory_anymore_references_be_damned() rather than
implicitly hidden behind generic syntax.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should ndarray be a context manager?

2014-12-09 Thread Robert Kern
On Tue, Dec 9, 2014 at 8:15 PM, Chris Barker chris.bar...@noaa.gov wrote:

 (although I'm still confused as to why it's so important (in cPython) to
 have a file context manager..)


Because you want the file to close when the exception is raised and not at
some indeterminate point thereafter when the traceback stack frames finally
get disposed of, which can be an indefinitely long time.
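
A minimal illustration of the point (a sketch, not from the original mail):

def parse(path):
    # The context manager guarantees the file is closed as soon as an
    # exception propagates out of this block, not whenever the traceback
    # (which still references this frame, and hence f) is finally collected.
    with open(path) as f:
        return int(f.readline())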

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Finding values in an array

2014-11-28 Thread Robert Kern
On Fri, Nov 28, 2014 at 8:22 AM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 28.11.2014 04:15, Alexander Belopolsky wrote:
  I probably miss something very basic, but how given two arrays a and b,
  can I find positions in a where elements of b are located?  If a were
  sorted, I could use searchsorted, but I don't want to get valid
  positions for elements that are not in a.  In my case, a has unique
  elements, but in the general case I would accept the first match.  In
  other words, I am looking for an array analog of list.index() method.

 np.where(np.in1d(a, b))

Only if the matching elements in `b` have the same order as they do in `a`.
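
One way to get the order right (a sketch, assuming every element of b occurs
in a and that a has unique elements, as in the original question):

import numpy as np

def index_of(a, b):
    """Positions in a of each element of b (array analog of list.index)."""
    order = np.argsort(a)
    return order[np.searchsorted(a, b, sorter=order)]

a = np.array([30, 10, 20, 40])
b = np.array([20, 40, 10])
print(index_of(a, b))   # [2 3 1]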

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

