Re: [Numpy-discussion] Move scipy.org docs to Github?

2017-03-16 Thread Robert T. McGibbon
I have always put my docs on Amazon S3 (examples: http://mdtraj.org/1.8.0/,
http://msmbuilder.org/3.7.0/). For static webpages, you can't beat the
cost, and there's a lot of tooling in the wild for uploading pages to S3.
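
A minimal sketch of one such workflow, pushing a built Sphinx tree to a
bucket with boto3 (bucket name and paths here are hypothetical; this is just
one possibility, not necessarily the tooling referred to above):

import mimetypes
import os
import boto3

s3 = boto3.client("s3")
build_dir = "doc/_build/html"          # hypothetical Sphinx output dir
bucket = "my-project-docs"             # hypothetical bucket name

for root, _, files in os.walk(build_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, build_dir)
        ctype = mimetypes.guess_type(path)[0] or "binary/octet-stream"
        # upload_file streams the file; setting Content-Type lets the
        # pages render as HTML instead of downloading
        s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": ctype})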

It might be an option to consider.

-Robert

On Thu, Mar 16, 2017 at 5:08 PM, Pauli Virtanen <p...@iki.fi> wrote:

> Thu, 16 Mar 2017 08:15:08 +0100, Didrik Pinte wrote:
> >> The advantage of something like github pages is that it's big enough
> >> that it *does* have dedicated ops support.
> >
> > Agreed. One issue is that we are working with a lot of legacy. Github
> > will more than likely be a great solution to host static web pages but
> > the evaluation for the shift needs to get into all the funky legacy
> > redirects/rewrites we have in place, etc. This is probably not a real
> > issue for docs.scipy.org but would be for other services.
>
> IIRC, there aren't that many of them, so in principle it could be possible
> to cobble them together with redirects.
>
> >> As long as we can fit under the 1 gig size limit then GH pages seems
> >> like the best option so far... it's reliable, widely understood, and
> >> all of the limits besides the 1 gig size are soft limits where they say
> >> they'll work with us to figure things out.
> >
> > Another option would be to just host the content under S3 with
> > Cloudfront.
> It will also be pretty simple as a setup, scale nicely, and won't have
> many restrictions on sizing.
>
> Some minor-ish disadvantages of this are that it brings a new set of
> credentials to manage, it will be somewhat less transparent, and the
> tooling will be less familiar to people (eg release managers) who have to
> deal with it.
>



-- 
-Robert


[Numpy-discussion] ANN: SfePy 2017.1

2017-02-28 Thread Robert Cimrman

I am pleased to announce release 2017.1 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems of
coupled partial differential equations by the finite element method or by
isogeometric analysis (limited support). It is distributed under the new BSD
license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- spline-box parametrization of an arbitrary field
- conda-forge recipe (thanks to Daniel Wheeler)
- fixes for Python 3.6

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Cheers,
Robert Cimrman

---

Contributors to this release in alphabetical order:

Siwei Chen
Robert Cimrman
Jan Heczko
Vladimir Lukes
Matyas Novak


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Robert McLeod
Just as a note, Appveyor supports uploading modules to "public websites":

https://packaging.python.org/appveyor/

The main issue I would see with this is that PyPI has my password stored on
my machine in a plain-text file.  I'm not sure whether there's a way to
provide Appveyor with an SSH key instead.

On Wed, Feb 22, 2017 at 4:23 PM, Alex Rogozhnikov <
alex.rogozhni...@yandex.ru> wrote:

> Hi Francesc,
> thanks a lot for your reply and for your impressive job on bcolz!
>
> Bcolz seems to put the stress on compression, which is not of much interest
> to me, but the *ctable* and chunked operations look very appropriate to
> me now. (Of course, I'll need to test it a lot before I can say this for
> sure; that's my current impression.)
>
> The strongest concern with bcolz so far is that it seems to be completely
> non-trivial to install on Windows systems, while pip provides numpy binaries
> for most (or all?) OSes.
> I didn't build pip binary wheels myself, but is it hard / impossible to
> cook pip-installable binaries?
>
> ​You can change shapes of numpy arrays, but that usually involves copies
> of the whole container.
>
> sure, but this is OK for me, as I plan to organize column editing in
> 'batches', so this should seldom require copying.
> It would be nice to see an example to understand how deep I need to go
> inside numpy.
>
> Cheers,
> Alex.
>
>
>
>
> On 22 Feb 2017, at 17:03, Francesc Alted <fal...@gmail.com> wrote:
>
> Hi Alex,
>
> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov <alex.rogozhni...@yandex.ru>:
>
>> Hi Nathaniel,
>>
>>
>> pandas
>>
>>
>> yup, the idea was to have a minimal pandas.DataFrame-like storage (which I
>> was using for a long time),
>> but without the irritating problems with its row indexing and some other
>> problems like interaction with matplotlib.
>>
>> A dict of arrays?
>>
>>
>> that's what I started from and implemented, but at some point I decided
>> that I was reinventing the wheel and that numpy already has something. In
>> principle, I can ignore this 'column-oriented' storage requirement, but
>> it may potentially turn out to be quite slow if the dtype's size is large.
>>
>> Suggestions are welcome.
>>
>
> ​You may want to try bcolz:
>
> https://github.com/Blosc/bcolz
>
> bcolz is columnar storage, basically as you require, but data is
> compressed by default even when stored in memory (although you can disable
> compression if you want to).
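
(A minimal bcolz.ctable sketch of the columnar, chunked usage described
above; based on the bcolz documentation, with made-up column names:)

import numpy as np
import bcolz

N = 1000000
a = np.arange(N)
b = np.random.rand(N)

# columnar, chunked, optionally compressed table
ct = bcolz.ctable(columns=[a, b], names=["a", "b"])

# add a column without copying the existing ones
ct.addcol(a * b, name="ab")

# chunked evaluation over the columns (numexpr-backed when available)
out = ct.eval("a + 2 * b")
print(len(ct), out[:3])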
>
>
>
>>
>> Another strange question:
>> in general, it is considered that once a numpy array is created, its shape
>> does not change.
>> But if I want to keep the same recarray and change its dtype and/or
>> shape, is there a way to do this?
>>
>
> You can change shapes of numpy arrays, but that usually involves copies
> of the whole container.  With bcolz you can change the length and add/delete
> columns without copies.  If your containers are large, it is better to
> inform bcolz of their final estimated size.  See:
>
> http://bcolz.blosc.org/en/latest/opt-tips.html
>
> ​Francesc​
>
>
>>
>> Thanks,
>> Alex.
>>
>>
>>
>> On 22 Feb 2017, at 3:53, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" <alex.rogozhni...@yandex.ru>
>> wrote:
>>
>> Ah, got it. Thanks, Chris!
>> I thought recarrays could only be one-dimensional (like tables with named
>> columns).
>>
>> Maybe it's better to ask directly what I was looking for:
>> something that works like a table with named columns (but no labelling
>> for rows), and keeps data (of different dtypes) in a column-by-column way
>> (and this is numpy, not pandas).
>>
>> Is there such a magic thing?
>>
>>
>> Well, that's what pandas is for...
>>
>> A dict of arrays?
>>
>> -n
>>
>>
>>
>>
>>
>
>
> --
> Francesc Alted
>
>
>
>
>


-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-19 Thread Robert McLeod
Hi Juan,

A guy on reddit suggested looking at SymPy for just such a thing. I know
that Dask also represents its process as a graph.

https://www.reddit.com/r/Python/comments/5um04m/numexpr3/

I'll think about it some more, but it seems a little abstract still. To a
certain extent the NE3 compiler already works this way.  The compiler has a
dictionary in which the keys are `ast.Node` types and each value is a function
pointer that knows how to handle that particular node. Providing an
external interface to this would be the most natural extension.
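
A toy illustration of that kind of dispatch table (purely illustrative, not
the actual NE3 compiler code):

import ast

def handle_binop(node):
    return "binop:" + type(node.op).__name__

def handle_call(node):
    return "call:" + getattr(node.func, "id", "?")

# keys are ast node types, values are the handlers that know how to
# deal with that particular node
HANDLERS = {
    ast.BinOp: handle_binop,
    ast.Call: handle_call,
}

def visit(node):
    handler = HANDLERS.get(type(node))
    if handler is not None:
        print(handler(node))
    for child in ast.iter_child_nodes(node):
        visit(child)

visit(ast.parse("sqrt(a**2 + b**2)"))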

There are quite a few things to do before I would think about a functional
interface, namely the things I mentioned in my original mail: pickling of the
C-object so that it can be used within modules like `multiprocessing`;
having a pre-allocated shared memory region shared among threads for
temporaries and parameters; etc.  If someone else wants to dabble in it
they are welcome to.

Robert

On Sun, Feb 19, 2017 at 4:19 AM, Juan Nunez-Iglesias <jni.s...@gmail.com>
wrote:

> Hi everyone,
>
> Thanks for this. It looks absolutely fantastic. I've been putting off
> using numexpr but it looks like I don't have a choice anymore. ;)
>
> Regarding feature requests, I've always found it off-putting that I have
> to wrap my expressions in a string to speed them up. Has anyone explored
> the possibility of using Python 3.6's frame evaluation API to do this? I
> remember a vague discussion on this list a while back but I don't know
> whether anything came of it.
>
> Thanks!
>
> Juan.
>
> On 18 Feb 2017, 3:42 AM +1100, Robert McLeod <robbmcl...@gmail.com>,
> wrote:
>
> Hi David,
>
> Thanks for your comments, reply below the fold.
>
> On Fri, Feb 17, 2017 at 4:34 PM, Daπid <davidmen...@gmail.com> wrote:
>
>> This is very nice indeed!
>>
>> On 17 February 2017 at 12:15, Robert McLeod <robbmcl...@gmail.com> wrote:
>> > * bytes and unicode support
>> > * reductions (mean, sum, prod, std)
>>
>> I use both a lot, maybe I can help you get them working.
>>
>> Also, regarding "Vectorization hasn't been done yet with cmath
>> functions for real numbers (such as sqrt(), exp(), etc.), only for
>> complex functions". What is the bottleneck? Is it in GCC or just
>> someone has to sit down and adapt it?
>
>
> I just haven't done it yet.  Basically I'm moving from Switzerland to
> Canada in a week so this was the gap to push something out that's usable if
> not perfect. Rather I just import cmath functions, which are inlined but I
> suspect what's needed is to break them down into their components. For
> example, the complex arccos function looks like this:
>
> static void
> nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r)
> {
>     npy_complex64 a;
>     for( npy_intp I = 0; I < n; I++ ) {
>         a = x[I];
>         _inline_mul( x[I], x[I], r[I] );
>         _inline_sub( Z_1, r[I], r[I] );
>         _inline_sqrt( r[I], r[I] );
>         _inline_muli( r[I], r[I] );
>         _inline_add( a, r[I], r[I] );
>         _inline_log( r[I] , r[I] );
>         _inline_muli( r[I], r[I] );
>         _inline_neg( r[I], r[I]);
>     }
> }
>
> I haven't sat down and inspected whether the cmath versions get
> vectorized, but there's not a huge speed difference between NE2 and 3 for
> such a function on float (but there is for complex), so my suspicion is
> they aren't.  Another option would be to add a library such as Yeppp! as
> LIB_YEPPP or some other library that's faster than glib.  For example the
> glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's
> not how it should be.  Yeppp is also built with Python generating C code,
> so it could either be very easy or very hard.
>
> On bytes and unicode, I haven't seen examples for how people use it, so
> I'm not sure where to start. Since there's practically no limitation on
> the number of operations now (the library is 1.3 MB now, compared to 1.2 MB
> for NE2 with gcc 5.4) the string functions could grow significantly from
> what we have in NE2.
>
> With regards to reductions, NumExpr never multi-threaded them, and could
> only do outer reductions, so in the end there was no speed advantage to be
> had compared to having NumPy do them on the result.  I suspect the primary
> value there was in PyTables and Pandas where the expression had to do
> everything.  One of the things I've moved away from in NE3 is doing output
> buffering (rather it pre-allocates the output array), so for reductions the
> understanding NumExpr has of broadcasting would have to be deeper.
>
> In any event contributions would certainly be welcome.
>
> Robert
>
> --
> Robert McLeod, Ph.D.
>

Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-17 Thread Robert McLeod
Hi David,

Thanks for your comments, reply below the fold.

On Fri, Feb 17, 2017 at 4:34 PM, Daπid <davidmen...@gmail.com> wrote:

> This is very nice indeed!
>
> On 17 February 2017 at 12:15, Robert McLeod <robbmcl...@gmail.com> wrote:
> > * bytes and unicode support
> > * reductions (mean, sum, prod, std)
>
> I use both a lot, maybe I can help you get them working.
>
> Also, regarding "Vectorization hasn't been done yet with cmath
> functions for real numbers (such as sqrt(), exp(), etc.), only for
> complex functions". What is the bottleneck? Is it in GCC or just
> someone has to sit down and adapt it?


I just haven't done it yet.  Basically I'm moving from Switzerland to
Canada in a week so this was the gap to push something out that's usable if
not perfect. Rather I just import cmath functions, which are inlined but I
suspect what's needed is to break them down into their components. For
example, the complex arccos function looks like this:

static void
nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r)
{
    npy_complex64 a;
    for( npy_intp I = 0; I < n; I++ ) {
        a = x[I];
        _inline_mul( x[I], x[I], r[I] );
        _inline_sub( Z_1, r[I], r[I] );
        _inline_sqrt( r[I], r[I] );
        _inline_muli( r[I], r[I] );
        _inline_add( a, r[I], r[I] );
        _inline_log( r[I] , r[I] );
        _inline_muli( r[I], r[I] );
        _inline_neg( r[I], r[I]);
    }
}

I haven't sat down and inspected whether the cmath versions get vectorized,
but there's not a huge speed difference between NE2 and 3 for such a
function on float (but there is for complex), so my suspicion is they
aren't.  Another option would be to add a library such as Yeppp! as
LIB_YEPPP or some other library that's faster than glib.  For example the
glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's
not how it should be.  Yeppp is also built with Python generating C code,
so it could either be very easy or very hard.

On bytes and unicode, I haven't seen examples for how people use it, so I'm
not sure where to start. Since there's practically no limitation on the
number of operations now (the library is 1.3 MB now, compared to 1.2 MB for
NE2 with gcc 5.4) the string functions could grow significantly from what
we have in NE2.

With regards to reductions, NumExpr never multi-threaded them, and could
only do outer reductions, so in the end there was no speed advantage to be
had compared to having NumPy do them on the result.  I suspect the primary
value there was in PyTables and Pandas where the expression had to do
everything.  One of the things I've moved away from in NE3 is doing output
buffering (rather it pre-allocates the output array), so for reductions the
understanding NumExpr has of broadcasting would have to be deeper.
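
For concreteness, a sketch (not from the original mail) of the two ways of
getting a reduction, using the NE2-style outer reduction syntax:

import numpy as np
import numexpr as ne

x = np.random.rand(1000, 1000)

# reduction inside the expression (outer reduction only)
s1 = ne.evaluate("sum(x**2, axis=1)")

# versus letting NumPy reduce the numexpr result
s2 = np.sum(ne.evaluate("x**2"), axis=1)

print(np.allclose(s1, s2))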

In any event contributions would certainly be welcome.

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


[Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-17 Thread Robert McLeod
-

* vectorize real functions (such as exp, sqrt, log) similar to the
complex_functions.hpp vectorization.
* Add a keyword (likely 'yield') to indicate that a token is intended to be
changed by a generator inside a loop with each call to NumExpr.run()

If you have any thoughts or find any issues please don't hesitate to open
an issue at the Github repo. Although unit tests have been run over the
operation space there are undoubtedly a number of bugs to squash.

Sincerely,

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


Re: [Numpy-discussion] composing Euler rotation matrices

2017-02-01 Thread Robert McLeod
Instead of trying to decipher what someone wrote on a Wikipedia page, why
don't you look at a working piece of source code?

e.g.

https://github.com/3dem/relion/blob/master/src/euler.cpp

Robert


On Wed, Feb 1, 2017 at 4:27 AM, Seb <splu...@gmail.com> wrote:

> On Tue, 31 Jan 2017 21:23:55 -0500,
> Joseph Fox-Rabinovitz <jfoxrabinov...@gmail.com> wrote:
>
> > Could you show what you are doing to get the statement "However, I
> > cannot reproduce this matrix via composition; i.e. by multiplying the
> > underlying rotation matrices.". I would guess something involving the
> > `*` operator instead of `@`, but guessing probably won't help you
> > solve your issue.
>
> Sure, although composition is not something I can take credit for, as
> it's a well-described operation for generating linear transformations.
> It is the matrix multiplication of two or more transformation matrices.
> In the case of Euler transformations, it's matrices specifying rotations
> around 3 orthogonal axes by 3 given angles.  I'm using `numpy.dot' to
> perform matrix multiplication on 2D arrays representing matrices.
>
> However, it's not obvious from the link I provided what particular
> rotation matrices are multiplied and in what order (i.e. what
> composition) is used to arrive at the Z1Y2X3 rotation matrix shown.
> Perhaps I'm not understanding the conventions used therein.  This is one
> of my attempts at reproducing that rotation matrix via composition:
>
> ---<cut here---start->---
> import numpy as np
>
> angles = np.radians(np.array([30, 20, 10]))
>
> def z1y2x3(alpha, beta, gamma):
>     """Z1Y2X3 rotation matrix given Euler angles"""
>     return np.array([[np.cos(alpha) * np.cos(beta),
>                       np.cos(alpha) * np.sin(beta) * np.sin(gamma) -
>                       np.cos(gamma) * np.sin(alpha),
>                       np.sin(alpha) * np.sin(gamma) +
>                       np.cos(alpha) * np.cos(gamma) * np.sin(beta)],
>                      [np.cos(beta) * np.sin(alpha),
>                       np.cos(alpha) * np.cos(gamma) +
>                       np.sin(alpha) * np.sin(beta) * np.sin(gamma),
>                       np.cos(gamma) * np.sin(alpha) * np.sin(beta) -
>                       np.cos(alpha) * np.sin(gamma)],
>                      [-np.sin(beta), np.cos(beta) * np.sin(gamma),
>                       np.cos(beta) * np.cos(gamma)]])
>
> euler_mat = z1y2x3(angles[0], angles[1], angles[2])
>
> ## Now via composition
>
> def rotation_matrix(theta, axis, active=False):
>     """Generate rotation matrix for a given axis
>
>     Parameters
>     ----------
>     theta: numeric, optional
>         The angle (degrees) by which to perform the rotation.  Default is
>         0, which means return the coordinates of the vector in the rotated
>         coordinate system, when rotate_vectors=False.
>     axis: int, optional
>         Axis around which to perform the rotation (x=0; y=1; z=2)
>     active: bool, optional
>         Whether to return active transformation matrix.
>
>     Returns
>     -------
>     numpy.ndarray
>         3x3 rotation matrix
>     """
>     theta = np.radians(theta)
>     if axis == 0:
>         R_theta = np.array([[1, 0, 0],
>                             [0, np.cos(theta), -np.sin(theta)],
>                             [0, np.sin(theta), np.cos(theta)]])
>     elif axis == 1:
>         R_theta = np.array([[np.cos(theta), 0, np.sin(theta)],
>                             [0, 1, 0],
>                             [-np.sin(theta), 0, np.cos(theta)]])
>     else:
>         R_theta = np.array([[np.cos(theta), -np.sin(theta), 0],
>                             [np.sin(theta), np.cos(theta), 0],
>                             [0, 0, 1]])
>     if active:
>         R_theta = np.transpose(R_theta)
>     return R_theta
>
> ## The rotations are given as active
> xmat = rotation_matrix(angles[2], 0, active=True)
> ymat = rotation_matrix(angles[1], 1, active=True)
> zmat = rotation_matrix(angles[0], 2, active=True)
> ## The operation seems to imply this composition
> euler_comp_mat = np.dot(xmat, np.dot(ymat, zmat))
> ---<cut here---end->---
>
> I believe the matrices `euler_mat' and `euler_comp_mat' should be the
> same, but they aren't, so it's unclear to me what particular composition
> is meant to produce the matrix specified by this Z1Y2X3 transformation.
> What am I missing?
>
> --
> Seb
>
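
For reference, a standalone numerical check (not part of the original
exchange): composing active elementary rotations in the order
Rz(alpha) @ Ry(beta) @ Rx(gamma), with the angles applied once in radians,
reproduces the closed-form Z1Y2X3 entries quoted above.

import numpy as np

alpha, beta, gamma = np.radians([30.0, 20.0, 10.0])

def rx(t):  # active rotation about x
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def ry(t):  # active rotation about y
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rz(t):  # active rotation about z
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

composed = rz(alpha) @ ry(beta) @ rx(gamma)

# spot-check one entry against the closed form in z1y2x3 above
expected_01 = (np.cos(alpha) * np.sin(beta) * np.sin(gamma)
               - np.cos(gamma) * np.sin(alpha))
print(np.allclose(composed[0, 1], expected_01))  # True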

Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 9:41 AM, Nadav Har'El <n...@scylladb.com> wrote:
>
> On Mon, Jan 23, 2017 at 4:52 PM, aleba...@gmail.com <aleba...@gmail.com>
wrote:
>>
>> 2017-01-23 15:33 GMT+01:00 Robert Kern <robert.k...@gmail.com>:
>>>
>>> I don't object to some Notes, but I would probably phrase it more like
we are providing the standard definition of the jargon term "sampling
without replacement" in the case of non-uniform probabilities. To my mind
(or more accurately, with my background), "replace=False" obviously picks
out the implemented procedure, and I would have been incredibly surprised
if it did anything else. If the option were named "unique=True", then I
would have needed some more documentation to let me know exactly how it was
implemented.
>>>
>> FWIW, I totally agree with Robert
>
> With my own background (MSc. in Mathematics), I agree that this algorithm
is indeed the most natural one. And as I said, when I wanted to implement
something myself to choose random combinations (k out of n
items), I wrote exactly the same one. But when it didn't produce the
desired probabilities (even in cases where I knew that doing this was
possible), I wrongly assumed numpy would do things differently - only to
realize it uses exactly the same algorithm. So clearly, the documentation
didn't quite explain what it does or doesn't do.

In my experience, I have seen "without replacement" mean only one thing. If
the docstring had said "returns unique items", I'd agree that it doesn't
explain what it does or doesn't do. The only issue is that "without
replacement" is jargon, and it is good to recapitulate the definitions of
such terms for those who aren't familiar with them.

> Also, Robert, I'm curious: beyond explaining why the existing algorithm
is reasonable (with which I agree), could you give me an example of where it
is actually *useful* for sampling?

The references I previously quoted list a few. One is called "multistage
sampling proportional to size". The idea being that you draw (without
replacement) from a larger units (say, congressional districts) before
sampling within them. It is similar to the situation you outline, but it is
probably more useful at a different scale, like lots of larger units (where
your algorithm is likely to provide no solution) rather than a handful.

It is probably less useful in terms of survey design, where you are trying
to *design* a process to get a result, than it is in queueing theory and
related fields, where you are trying to *describe* and simulate a process
that is pre-defined.

--
Robert Kern


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 9:22 AM, Anne Archibald <peridot.face...@gmail.com>
wrote:
>
>
> On Mon, Jan 23, 2017 at 3:34 PM Robert Kern <robert.k...@gmail.com> wrote:
>>
>> I don't object to some Notes, but I would probably phrase it more like
we are providing the standard definition of the jargon term "sampling
without replacement" in the case of non-uniform probabilities. To my mind
(or more accurately, with my background), "replace=False" obviously picks
out the implemented procedure, and I would have been incredibly surprised
if it did anything else. If the option were named "unique=True", then I
would have needed some more documentation to let me know exactly how it was
implemented.
>
>
> It is what I would have expected too, but we have a concrete example of a
user who expected otherwise; where one user speaks up, there are probably
more who didn't (some of whom probably have code that's not doing what they
think it does). So for the cost of adding a Note, why not help some of them?

That's why I said I'm fine with adding a Note. I'm just suggesting a
re-wording so that the cautious language doesn't lead anyone who is
familiar with the jargon to think we're doing something ad hoc while still
providing the details for those who aren't so familiar.

> As for the standardness of the definition: I don't know, have you a
reference where it is defined? More natural to me would be to have a list
of items with integer multiplicities (as in: "cat" 3 times, "dog" 1 time).
I'm hesitant to claim ours is a standard definition unless it's in a
textbook somewhere. But I don't insist on my phrasing.

Textbook, I'm not so sure, but it is the *only* definition I've ever
encountered in the literature:

http://epubs.siam.org/doi/abs/10.1137/0209009
http://www.sciencedirect.com/science/article/pii/S002001900500298X

--
Robert Kern


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-23 Thread Robert Kern
On Mon, Jan 23, 2017 at 6:27 AM, Anne Archibald <peridot.face...@gmail.com>
wrote:
>
> On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El <n...@scylladb.com> wrote:
>>
>> On Wed, Jan 18, 2017 at 4:30 PM, <josef.p...@gmail.com> wrote:
>>>
>>>> Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.
>>>
>>> BTW: sampling 3 out of 3 without replacement is even worse
>>>
>>> No matter what sampling scheme and what selection probabilities we use,
we always have every element with probability 1 in the sample.
>>
>> I agree. The random-sample function of the type I envisioned will be
able to reproduce the desired probabilities in some cases (like the example
I gave) but not in others. Because doing this correctly involves a set of n
linear equations in comb(n,k) variables, it can have no solution, or many
solutions, depending on the n and k, and the desired probabilities. A
function of this sort could return an error if it can't achieve the desired
probabilities.
>
> It seems to me that the basic problem here is that the
numpy.random.choice docstring fails to explain what the function actually
does when called with weights and without replacement. Clearly there are
different expectations; I think numpy.random.choice chose one that is easy
to explain and implement but not necessarily what everyone expects. So the
docstring should be clarified. Perhaps a Notes section:
>
> When numpy.random.choice is called with replace=False and non-uniform
probabilities, the resulting distribution of samples is not obvious.
numpy.random.choice effectively follows the procedure: when choosing the
kth element in a set, the probability of element i occurring is p[i]
divided by the total probability of all not-yet-chosen (and therefore
eligible) elements. This approach is always possible as long as the sample
size is no larger than the population, but it means that the probability
that element i occurs in the sample is not exactly p[i].
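
A direct transcription of that procedure as a sketch (the helper name is
made up; this is not the actual numpy implementation):

import numpy as np

def sequential_weighted_sample(items, k, p, rng=np.random):
    # at each draw, renormalize p over the not-yet-chosen items
    items = list(items)
    p = np.asarray(p, dtype=float).copy()
    out = []
    for _ in range(k):
        i = rng.choice(len(items), p=p / p.sum())
        out.append(items[i])
        items.pop(i)
        p = np.delete(p, i)
    return out

print(sequential_weighted_sample(["a", "b", "c"], 2, [0.2, 0.3, 0.5]))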

I don't object to some Notes, but I would probably phrase it more like we
are providing the standard definition of the jargon term "sampling without
replacement" in the case of non-uniform probabilities. To my mind (or more
accurately, with my background), "replace=False" obviously picks out the
implemented procedure, and I would have been incredibly surprised if it did
anything else. If the option were named "unique=True", then I would have
needed some more documentation to let me know exactly how it was
implemented.

--
Robert Kern


Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread Robert Kern
On Mon, Jan 9, 2017 at 7:10 PM, Ilhan Polat <ilhanpo...@gmail.com> wrote:
>
> Yes, that's precisely the case, but when we know the structure we can just
choose the appropriate solver anyhow with a little bit of overhead. What I
mean is that, to my knowledge, FORTRAN routines for checking for
triangularity etc. are absent.

I'm responding to that. The reason that they don't have those FORTRAN
routines for testing for structure inside of a generic dense matrix is that
in FORTRAN it's more natural (and efficient) to just use the explicit
packed structure and associated routines instead. You would only use a
generic dense matrix if you know that there isn't structure in the matrix.
So there are no routines for detecting that structure in generic dense
matrices.

--
Robert Kern


Re: [Numpy-discussion] Fwd: Backslash operator A\b and np/sp.linalg.solve

2017-01-09 Thread Robert Kern
On Mon, Jan 9, 2017 at 5:09 PM, Ilhan Polat <ilhanpo...@gmail.com> wrote:

> So every test in the polyalgorithm is cheaper than the next one. I'm not
exactly sure what might be the best strategy yet, hence the question. It's
really interesting that LAPACK doesn't have this type of fast check.

In Fortran LAPACK, if you have a special structured matrix, you usually
explicitly use packed storage and call the appropriate function type on it.
It's only when you go to a system that only has a generic, unstructured
dense matrix data type that it makes sense to do those kinds of checks.
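
As a Python-level sketch of such a check on a generic dense matrix (the
function name is hypothetical; the O(n**2) test is cheap next to the O(n**3)
factorization, and the use of scipy.linalg here is illustrative only):

import numpy as np
from scipy.linalg import solve, solve_triangular

def smart_solve(a, b):
    # dispatch to a cheaper solver if the matrix happens to be triangular
    if np.allclose(a, np.tril(a)):
        return solve_triangular(a, b, lower=True)
    if np.allclose(a, np.triu(a)):
        return solve_triangular(a, b, lower=False)
    return solve(a, b)

a = np.tril(np.random.rand(4, 4)) + 4 * np.eye(4)
b = np.random.rand(4)
print(np.allclose(a @ smart_solve(a, b), b))  # True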

-- 
Robert Kern


[Numpy-discussion] ANN: SfePy 2016.4

2016-12-07 Thread Robert Cimrman

I am pleased to announce release 2016.4 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems of
coupled partial differential equations by the finite element method or by
isogeometric analysis (limited support). It is distributed under the new BSD
license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- support tensor product element meshes with one-level hanging nodes
- improve homogenization support for large deformations
- parallel calculation of homogenized coefficients and related sub-problems
- evaluation of second derivatives of Lagrange basis functions

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Cheers,
Robert Cimrman

---

Contributors to this release in alphabetical order:

Robert Cimrman
Vladimir Lukes
Matyas Novak


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Robert Kern
On Fri, Nov 4, 2016 at 6:36 AM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker <ndbeck...@gmail.com>:
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of the
> > intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=np.int64)  # use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?

No.
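
(A common workaround, not part of the reply above: np.fromiter itself only
builds 1-D arrays, but you can give it a count and reshape the result.)

import numpy as np

nrow, ncol = 3, 4
flat = np.fromiter((i * j for i in range(nrow) for j in range(ncol)),
                   dtype=np.int64, count=nrow * ncol)
arr = flat.reshape(nrow, ncol)
print(arr.shape)  # (3, 4)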

--
Robert Kern


Re: [Numpy-discussion] missing from contributor list?

2016-11-02 Thread Robert Kern
Because Github (or maybe git) doesn't track the history of the file through
all of the renames. It is only reporting the contributors of changes to the
file at its current location. If you go back to the time just prior to the
commit that renamed the file, you do show up in the list:

https://github.com/numpy/numpy/blob/f179ec92d8ddb0dc5f7445255022be5c4765a704/numpy/build_utils/src/apple_sgemv_fix.c

On Wed, Nov 2, 2016 at 3:38 PM, Sturla Molden <sturla.mol...@gmail.com>
wrote:

> Why am I missing from the contributor list here?
>
> https://github.com/numpy/numpy/blob/master/numpy/_build_utils/src/apple_sgemv_fix.c
>
>
> Sturla
>
>



-- 
Robert Kern


Re: [Numpy-discussion] How to use user input as equation directly

2016-10-28 Thread Robert McLeod
On Thu, Oct 27, 2016 at 11:35 PM, Benjamin Root <ben.v.r...@gmail.com>
wrote:

> Perhaps the numexpr package might be safer? Not exactly meant for this
> situation (meant for optimizations), but the evaluator is pretty darn safe.
>
>
It would not be able to evaluate something like 'np.arange(50)', for
example, since it only has a limited subset of numpy functionality. In the
example provided, that or linspace is likely the natural input for the
variable 't'.
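
For illustration (the expression and variable names here are hypothetical),
evaluating a user-supplied formula over a domain you construct yourself
might look like:

import numpy as np
import numexpr as ne

t = np.linspace(0.0, 1.0, 200)           # supply the domain yourself
user_expr = "sin(2 * pi * t) + 0.1 * t"  # e.g. read from input()
y = ne.evaluate(user_expr, local_dict={"t": t, "pi": np.pi})
print(y.shape)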

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


Re: [Numpy-discussion] Intel random number package

2016-10-27 Thread Robert Kern
On Thu, Oct 27, 2016 at 10:45 AM, Todd <toddr...@gmail.com> wrote:
>
> On Thu, Oct 27, 2016 at 12:12 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Ever notice how Anaconda doesn't provide pyfftw? They can't legally ship
both MKL and pyfftw, and they picked MKL.
>
> Anaconda does ship GPL code [1].  They even ship GPL code that depends on
numpy, such as cvxcanon and pystan, and there doesn't seem to be anything
that prevents me from installing them alongside the MKL version of numpy.
So I don't see how it would be any different for pyfftw.

I think we've exhausted the relevance of this tangent to Oleksandr's
contributions.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-27 Thread Robert McLeod
Releasing NumPy under GPL would make it incompatible with SciPy, which may
be _slightly_ inconvenient to the scientific Python community:

https://scipy.github.io/old-wiki/pages/License_Compatibility.html

https://mail.scipy.org/pipermail/scipy-dev/2013-August/019149.html

Robert

On Thu, Oct 27, 2016 at 5:14 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:

> On 10/27/2016 04:52 PM, Todd wrote:
>
>> On Thu, Oct 27, 2016 at 10:43 AM, Julian Taylor
>> <jtaylor.deb...@googlemail.com <mailto:jtaylor.deb...@googlemail.com>>
>> wrote:
>>
>> On 10/27/2016 04:30 PM, Todd wrote:
>>
>> On Thu, Oct 27, 2016 at 4:25 AM, Ralf Gommers
>> <ralf.gomm...@gmail.com <mailto:ralf.gomm...@gmail.com>
>> <mailto:ralf.gomm...@gmail.com <mailto:ralf.gomm...@gmail.com>>>
>> wrote:
>>
>>
>> On Thu, Oct 27, 2016 at 10:25 AM, Pavlyk, Oleksandr
>> <oleksandr.pav...@intel.com
>> <mailto:oleksandr.pav...@intel.com>
>> <mailto:oleksandr.pav...@intel.com
>> <mailto:oleksandr.pav...@intel.com>>> wrote:
>>
>> Please see responses inline.
>>
>>
>>
>> *From:*NumPy-Discussion
>> [mailto:numpy-discussion-boun...@scipy.org
>> <mailto:numpy-discussion-boun...@scipy.org>
>> <mailto:numpy-discussion-boun...@scipy.org
>> <mailto:numpy-discussion-boun...@scipy.org>>] *On Behalf Of *Todd
>> *Sent:* Wednesday, October 26, 2016 4:04 PM
>> *To:* Discussion of Numerical Python
>> <numpy-discussion@scipy.org <mailto:numpy-discussion@scipy.org>
>> <mailto:numpy-discussion@scipy.org
>> <mailto:numpy-discussion@scipy.org>>>
>> *Subject:* Re: [Numpy-discussion] Intel random number
>> package
>>
>>
>>
>>
>> On Wed, Oct 26, 2016 at 4:30 PM, Pavlyk, Oleksandr
>> <oleksandr.pav...@intel.com
>> <mailto:oleksandr.pav...@intel.com>
>> <mailto:oleksandr.pav...@intel.com
>>
>> <mailto:oleksandr.pav...@intel.com>>>
>> wrote:
>>
>> Another point already raised by Nathaniel is that for
>> numpy's randomness ideally should provide a way to
>> override
>> default algorithm for sampling from a particular
>> distribution.  For example RandomState object that
>> implements PCG may rely on default
>> acceptance-rejection
>> algorithm for sampling from Gamma, while the
>> RandomState
>> object that provides interface to MKL might want to
>> call
>> into MKL directly.
>>
>>
>>
>> The approach that pyfftw uses at least for scipy, which
>> may also
>> work here, is that you can monkey-patch the
>> scipy.fftpack module
>> at runtime, replacing it with pyfftw's drop-in
>> replacement.
>> scipy then proceeds to use pyfftw instead of its built-in
>> fftpack implementation.  Might such an approach work here?
>> Users can either use this alternative randomstate
>> replacement
>> directly, or they can replace numpy's with it at runtime
>> and
>> numpy will then proceed to use the alternative.
>>
>>
>> The only reason that pyfftw uses monkeypatching is that the
>> better
>> approach is not possible due to license constraints with
>> FFTW (it's
>> GPL).
>>
>>
>> Yes, that is exactly why I brought it up.  Better approaches are
>> also
>> not possible with MKL due to license constraints.  It is a very
>> similar
>> situation overall.
>>
>>
>> Its not that similar, the better approach is certainly possible with
>> FFTW, the GPL is compatible with numpys license. It is only a
>> concern users of binary distributions. Nobody provided the code to
>> use fftw yet, but it would certainly be accepted.
>>
>>
>> Although it is technically compatible, it would make numpy effectively
>> GPL.  Suggestions for this have been explicitly 

Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser <
warren.weckes...@gmail.com> wrote:
>
> On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith <n...@pobox.com> wrote:

>> The patch also adds ~10,000 lines of code; here's an example of what
>> some of it looks like:
>>
>>
https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>>
>> I don't see how we can realistically commit to maintaining this.
>
> FYI:  numpy already maintains code exactly like that:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397
>
> Perhaps the point should be that the numpy devs won't want to maintain
two nearly identical versions of that code.

Indeed. That's how the algorithm was published. The /* sigh ... */ is my
own. ;-)

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:36 AM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Mi, 2016-10-26 at 09:29 -0700, Robert Kern wrote:
> > On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <jtaylor.debian@google
> > mail.com> wrote:
> > >
> > > On 10/26/2016 06:00 PM, Julian Taylor wrote:
> >
> > >> I prefer for the full functionality of numpy to stay available
> > with a
> > >> stack of community owned software, even if it may be less powerful
> > that
> > >> way.
> > >
> > > But then if this is really just the same random numbers numpy
> > already provides just faster, it is probably acceptable in principle.
> > I haven't actually looked at the PR yet.
> >
> > I think the stream is different in some places, at least. And it's
> > not a silent backend drop-in like np.linalg being built against an
> > optimized BLAS, just a separate module that is inoperative without
> > MKL.
>
> I might be swayed, but my gut feeling would be that a backend change
> (if the default stream changes, an explicit one, though maybe one could
> make a "fastest") would be the only reasonable way to provide such a
> thing in numpy itself.

That mostly argues for distributing it as a separate package, not part of
numpy at all.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-26 Thread Robert Kern
On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
>
> On 10/26/2016 06:00 PM, Julian Taylor wrote:

>> I prefer for the full functionality of numpy to stay available with a
>> stack of community owned software, even if it may be less powerful that
>> way.
>
> But then if this is really just the same random numbers numpy already
provides just faster, it is probably acceptable in principle. I haven't
actually looked at the PR yet.

I think the stream is different in some places, at least. And it's not a
silent backend drop-in like np.linalg being built against an optimized
BLAS, just a separate module that is inoperative without MKL.

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 10:22 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>
> On Tue, Oct 25, 2016 at 10:41 PM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > There is a proposed random number package PR now up on github:
https://github.com/numpy/numpy/pull/8209. It is from
> > oleksandr-pavlyk and implements the numpy random number package using
MKL for increased speed. I think we are definitely interested in the
improved speed, but I'm not sure numpy is the best place to put the
package. I'd welcome any comments on the PR itself, as well as any thoughts
on the best way organize or use of this work. Maybe scikit-random
>>
>> This is what ng-numpy-randomstate is for.
>>
>> https://github.com/bashtage/ng-numpy-randomstate
>
> Interesting, despite the old-fashioned original ziggurat implementation of
the normal and GNU C style... Does that project seek to preserve all the
bytestreams or is it still in flux?

I would assume some flux for now, but you can ask the author by submitting
a corrected ziggurat PR as a trial balloon. ;-)

--
Robert Kern


Re: [Numpy-discussion] Intel random number package

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 9:34 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> Hi All,
>
> There is a proposed random number package PR now up on github:
https://github.com/numpy/numpy/pull/8209. It is from
> oleksandr-pavlyk and implements the numpy random number package using
MKL for increased speed. I think we are definitely interested in the
improved speed, but I'm not sure numpy is the best place to put the
package. I'd welcome any comments on the PR itself, as well as any thoughts
on the best way organize or use of this work. Maybe scikit-random

This is what ng-numpy-randomstate is for.

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 7:05 PM, Feng Yu <rainwood...@gmail.com> wrote:
>
> Hi,
>
> Just another perspective. 'base' and 'data' in PyArrayObject are two
> separate variables.
>
> base can point to any PyObject, but it is `data` that defines where
> data is accessed in memory.
>
> 1. There is no clear way to pickle a pointer (`data`) in a meaningful
> way. In order for the `data` member to make sense we still need to
> 'read out' the values stored at the `data` pointer in the pickle.
>
> 2. By definition base is not necessarily a numpy array but it is just
> some other object for managing the memory.

In general, yes, but most often it's another ndarray, and the child is
related to the parent by a slice operation that could be computed by
comparing the `data` tuples. The exercise here isn't to always represent
the general case in this way, but to see what can be done opportunistically
and if that actually helps solve a practical problem.
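
A sketch of the kind of `data` comparison being described, using the public
__array_interface__ rather than the C-level fields:

import numpy as np

base = np.zeros(10)
view = base[3:7]

base_ptr = base.__array_interface__["data"][0]
view_ptr = view.__array_interface__["data"][0]
offset_items = (view_ptr - base_ptr) // base.itemsize
print(offset_items)  # 3 -- enough to reconstruct this simple slice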

> 3. One can surely pickle the `base` object as a reference, but it is
> useless if the data memory has been reconstructed independently during
> unpickling.
>
> 4. Unless there is a clear way to notify the referencing numpy array of
> the new data pointer. There probably isn't.
>
> BTW, is the stride information lost during pickling, too? The
> behavior should probably be documented if it isn't yet.

The stride information may be lost, yes. We reserve the right to retain it,
though (for example, if .T is contiguous then we might well serialize the
transposed data linearly and return a view on that data upon
deserialization). I don't believe that we guarantee that the unpickled
result is contiguous.

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 5:09 PM, Matthew Harrigan <
harrigan.matt...@gmail.com> wrote:
>
> It seems pickle keeps track of references for basic python types.
>
> x = [1]
> y = [x]
> x,y = pickle.loads(pickle.dumps((x,y)))
> x.append(2)
> print(y)
> >>> [[1,2]]
>
> Numpy arrays are different but references are forgotten after
pickle/unpickle.  Shared objects do not remain shared.  Based on the quote
below it could be considered a bug with numpy/pickle.
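
A minimal demonstration of that behaviour for arrays (a sketch, not from the
original mail):

import pickle
import numpy as np

base = np.zeros(4)
view = base[:2]
view[0] = 1.0
print(base[0])     # 1.0 -- view and base share memory

base2, view2 = pickle.loads(pickle.dumps((base, view)))
view2[1] = 2.0
print(base2[1])    # 0.0 -- after unpickling, each array owns its own copy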

Not a bug, but an explicit design decision on numpy's part.

--
Robert Kern


Re: [Numpy-discussion] Preserving NumPy views when pickling

2016-10-25 Thread Robert Kern
On Tue, Oct 25, 2016 at 3:07 PM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> On Tue, Oct 25, 2016 at 1:07 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Concretely, what do would you suggest should happen with:
>>
>> base = np.zeros(1)
>> view = base[:10]
>>
>> # case 1
>> pickle.dump(view, file)
>>
>> # case 2
>> pickle.dump(base, file)
>> pickle.dump(view, file)
>>
>> # case 3
>> pickle.dump(view, file)
>> pickle.dump(base, file)
>>
>> ?
>
> I see what you're getting at here. We would need a rule for when to
include the base in the pickle and when not to. Otherwise,
pickle.dump(view, file) always includes data from the base array in the
pickle, even when view is much smaller than base.
>
> The safe answer is "only use views in the pickle when base is already
being pickled", but that isn't possible to check unless all the arrays are
together in a custom container. So, this isn't really feasible for NumPy.

It would be possible with a custom Pickler/Unpickler since they already
keep track of objects previously (un)pickled. That would handle [base,
view] okay but not [view, base], so it's probably not going to be all that
useful outside of special situations. It would make a neat recipe, but I
probably would not provide it in numpy itself.

--
Robert Kern


Re: [Numpy-discussion] automatically avoiding temporary arrays

2016-10-05 Thread Robert McLeod
On Wed, Oct 5, 2016 at 1:11 PM, srean <srean.l...@gmail.com> wrote:

> Thanks Francesc, Robert for giving me a broader picture of where this fits
> in. I believe numexpr does not  handle slicing, so that might be another
> thing to look at.
>

Dereferencing would be relatively simple to add into numexpr, as it would
just be some getattr() calls.  Personally I will add that at some point
because it will clean up my code.

Slicing, maybe only for contiguous blocks in memory?

I.e.

imageStack[0,:,:]

would be possible, but

imageStack[:, ::2, ::2]

would not be trivial (I think...).  I seem to remember someone asked David
Cooke about slicing and he said something along the lines of, "that's what
Numba is for."  Perhaps NumPy back-ended by Numba is more what you are
looking for, as it hooks into the bytecode compiler? The main advantage of
numexpr is that a series of numpy operations can be enclosed in
ne.evaluate("...") and it provides a big acceleration for
little programmer effort, but it's not nearly as sophisticated as Numba or
PyPy.



> On Wed, Oct 5, 2016 at 4:26 PM, Robert McLeod <robbmcl...@gmail.com>
> wrote:
>
>>
>> As Francesc said, Numexpr is going to get most of its power through
>> grouping a series of operations so it can send blocks to the CPU cache and
>> run the entire series of operations on the cache before returning the block
>> to system memory.  If it was just used to back-end NumPy, it would only
>> gain from the multi-threading portion inside each function call.
>>
>
> Is that so ?
>
> I thought numexpr also cuts down on the number of temporary buffers that get
> filled (in other words, copy operations) compared to the same expression
> written as a series of operations. My understanding may be wrong, and I would
> appreciate correction.
>
> The 'out' parameter in ufuncs can eliminate extra temporaries but it's not
> composable. Right now I have to manually carry along the array where the
> in-place operations take place. I think the goal here is to eliminate that.
>

The numexpr virtual machine does create temporaries where needed when it
parses the abstract syntax tree for all the operations it has to do.  I
believe the main advantage is that the temporaries are created in the CPU
cache, not in system memory. It's certainly true that numexpr doesn't
create a lot of OP_COPY operations; rather, it's optimized to minimize them,
so it's probably fewer ops than naive successive calls to numpy within
Python, but I'm unsure whether there's any difference in operation count
between hand-optimized numpy with out= set and numexpr.  Numexpr just does
it for you.
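
As a concrete sketch of that comparison (illustrative only):

import numpy as np
import numexpr as ne

x = np.linspace(0.0, 1.0, 1000000)

# naive NumPy: every operation allocates a full-size temporary
y1 = x**3 + np.tanh(x**2) + 4

# hand-optimized NumPy: reuse buffers via out= to avoid most temporaries
tmp = np.empty_like(x)
y2 = np.empty_like(x)
np.multiply(x, x, out=tmp)      # x**2
np.tanh(tmp, out=tmp)           # tanh(x**2)
np.power(x, 3, out=y2)          # x**3
np.add(y2, tmp, out=y2)
np.add(y2, 4, out=y2)

# numexpr: one call, blocks streamed through the CPU cache
y3 = ne.evaluate("x**3 + tanh(x**2) + 4")

print(np.allclose(y1, y2) and np.allclose(y1, y3))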

This blog post from Tim Hochberg is useful for understanding the
performance advantages of blocking versus multithreading:

http://www.bitsofbits.com/2014/09/21/numpy-micro-optimization-and-numexpr/


Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


Re: [Numpy-discussion] automatically avoiding temporary arrays

2016-10-05 Thread Robert McLeod
All,

On Wed, Oct 5, 2016 at 11:46 AM, Francesc Alted <fal...@gmail.com> wrote:

> 2016-10-05 8:45 GMT+02:00 srean <srean.l...@gmail.com>:
>
>> Good discussion, but was surprised by the absence of numexpr in the
>> discussion., given how relevant it (numexpr) is to the topic.
>>
>> Is the goal to fold in the numexpr functionality (and beyond) into Numpy ?
>>
>
> Yes, the question about merging numexpr into numpy has been something that
> periodically shows up on this list.  I think mostly everyone agrees that it
> is a good idea, but things are not so easy, and so far nobody has provided a
> good patch for this.  Also, the fact that numexpr relies on grouping an
> expression by using a string (e.g. y = ne.evaluate("x**3 + tanh(x**2) +
> 4")) does not play well with the way that numpy evaluates expressions,
> so something should be suggested to cope with this too.
>

As Francesc said, Numexpr is going to get most of its power through
grouping a series of operations so it can send blocks to the CPU cache and
run the entire series of operations on the cache before returning the block
to system memory.  If it was just used to back-end NumPy, it would only
gain from the multi-threading portion inside each function call. I'm not
sure how one would go about grouping successive numpy expressions without
modifying the Python interpreter?

I put a bit of effort into extending numexpr to use 4-byte word opcodes
instead of 1-byte.  Progress has been very slow, however, due to time
constraints, but I have most of the numpy data types (u[1-4], i[1-4],
f[4,8], c[8,16], S[1-4], U[1-4]).  On Tuesday I finished writing a Python
generator script that writes all the C-side opcode macros for opcodes.hpp.
Now I have about 900 opcodes, and this could easily grow into thousands if
more functions are added, so I also built a reverse lookup tree (based on
collections.defaultdict) for the Python-side of numexpr.

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com


[Numpy-discussion] ANN: SfePy 2016.3

2016-09-30 Thread Robert Cimrman

I am pleased to announce release 2016.3 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is software for solving systems of
coupled partial differential equations by the finite element method or by
isogeometric analysis (limited support). It is distributed under the new BSD
license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker: http://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- Python 3 support
- testing with Travis CI
- new classes for homogenized coefficients
- using argparse instead of optparse

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Cheers,
Robert Cimrman

---

Contributors to this release in alphabetical order:

Robert Cimrman
Jan Heczko
Thomas Kluyver
Vladimir Lukes


Re: [Numpy-discussion] Using library-specific headers

2016-09-29 Thread Robert McLeod
Pavlyk,

NumExpr optionally includes MKL's VML at compile-time.  You may want to
look at its implementation.  From what I recall it relies on a function in
a bootstrapped __config__.py to determine if MKL is present.
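
A quick runtime check of the same thing (a sketch assuming the `use_vml`
attribute exposed by numexpr builds):

import numexpr

# True only if this numexpr build was compiled against MKL's VML
print(numexpr.use_vml)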

Robert

On Thu, Sep 29, 2016 at 7:27 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:

> Hi Julian,
>
> Thank you very much for the response. It appears to work.
>
> I work on "Intel Distribution for Python" at Intel Corp. This question was
> motivated by work needed to
> prepare pull requests with our changes/optimizations to numpy source code.
> In particular, the numpy.random_intel package
>
>https://mail.scipy.org/pipermail/numpy-discussion/2016-June/075693.html
>
> relies on MKL, but its potential inclusion in numpy should not break the
> build if MKL is unavailable.
>
> Also our benchmarking was pointing at Numpy's sequential memory copying as
> a bottleneck.
> I am working to open a pull request into the main trunk of numpy to take
> advantage of multithreaded
> MKL's BLAS dcopy function to do memory copying in parallel for
> sufficiently large sizes.
>
> Related to numpy.random_intel, I noticed that the randomstate package,
> which extends numpy.random, was
> not being made a part of numpy, but rather published on PyPI as a
> stand-alone module. Does that mean that
> the community decided against  including it in numpy's codebase? If so, I
> would appreciate if someone could
> elaborate on or point me to the reasoning behind that decision.
>
> Thank you,
> Oleksandr
>
>
>
> -Original Message-
> From: NumPy-Discussion [mailto:numpy-discussion-boun...@scipy.org] On
> Behalf Of Julian Taylor
> Sent: Thursday, September 29, 2016 8:10 AM
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] Using library-specific headers
>
> On 09/27/2016 11:09 PM, Pavlyk, Oleksandr wrote:
> > Suppose I would like to take advantage of some functions from MKL in
> > numpy C source code, which would require to use
> >
> >
> >
> > #include "mkl.h"
> >
> >
> >
> > Ideally this include line must not break the build of numpy when MKL
> > is not present, so my initial approach was to use
> >
> >
> >
> > #if defined(SCIPY_MKL_H)
> >
> > #include "mkl.h"
> >
> > #endif
> >
> >
> >
> > Unfortunately, this did not work when building with gcc on a machine
> > where MKL is present on default LD_LIBRARY_PATH, because then the
> > distutils code was setting SCIPY_MKL_H preprocessor variable, even
> > though mkl headers are not on the C_INCLUDE_PATH.
> >
> >
> >
> > What is the preferred solution to include an external library header
> > to ensure that code-base continues to build in most common cases?
> >
> >
> >
> > One approach I can think of is to set a preprocessor variable, say
> > HAVE_MKL_HEADERS in numpy/core/includes/numpy/config.h depending on an
> > outcome of building of a simple _configtest.c using
> > config.try_compile(), like it is done in numpy/core/setup.py
> >
> >
> > Is there a simpler, or a better way?
> >
>
> hi,
> you could put the header into OPTIONAL_HEADERS in
> numpy/core/setup_common.py. This will define HAVE_HEADERFILENAME_H for you
> but this will not check that the corresponding the library actually exists
> and can be linked.
> For that SCIPY_MKL_H is probably the right macro, though its name is
> confusing as it does not check for the header presence ...
>
> Can you tell us more about what from mkl you are attempting to add and for
> what purpose, e.g. is it something that should go into numpy proper or just
> for personal/internal use?
>
> cheers,
> Julian
>
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Using library-specific headers

2016-09-29 Thread Robert Kern
On Thu, Sep 29, 2016 at 6:27 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:

> Related to numpy.random_intel, I noticed that the randomstate package,
which extends numpy.random was
> not being made a part of numpy, but rather published on PyPI as a
stand-alone module. Does that mean that
> the community decided against  including it in numpy's codebase? If so, I
would appreciate if someone could
> elaborate on or point me to the reasoning behind that decision.

No, we are just working out the API and the extensibility machinery in a
separate package before committing to backwards compatibility.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New Indexing Methods Revival #N (subclasses!)

2016-09-06 Thread Robert Kern
On Tue, Sep 6, 2016 at 8:46 AM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2016-09-06 at 09:37 +0200, Sebastian Berg wrote:
> > On Mo, 2016-09-05 at 18:31 -0400, Marten van Kerkwijk wrote:
> > >
> > > Actually, on those names: an alternative to your proposal would be
> > > to
> > > introduce only one new method which can do all types of indexing,
> > > depending on a keyword argument, i.e., something like
> > > ```
> > > def getitem(self, item, mode='outer'):
> > > ...
> > > ```
> > Have I been overthinking this, eh? Just making it `__getitem__(self,
> > index, mode=...)` and then from `vindex` calling the subclasses
> > `__getitem__(self, index, mode="vector")` or so would already solve
> > the
> > issue almost fully? Only thing I am not quite sure about:
> >
> > 1. Is `__getitem__` in some way special to make this difficult (also
> > considering some new ideas like allowing object[a=4]?
>
> OK; I think the C-side slot cannot get the kwarg likely, but probably
> you can find a solution for that

Well, the solution is to use a different name, I think.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reading in a mesh file

2016-09-01 Thread Robert Kern
On Thu, Sep 1, 2016 at 3:49 PM, Florian Lindner <mailingli...@xgm.de> wrote:
>
> Hello,
>
> thanks for your reply which was really helpful!
>
> My problem is that I discovered that the data I got is rather unordered.
>
> The documentation for reshape says: Read the elements of a using this
index order, and place the elements into the
> reshaped array using this index order. ‘C’ means to read / write the
elements using C-like index order, with the last
> axis index changing fastest, back to the first axis index changing
slowest. ‘F’ means to read / write the elements using
> Fortran-like index order, with the first index changing fastest, and the
last index changing slowest.
>
> With my data both dimensions change, so there is no specific ordering of
the points, just a bunch of arbitrarily mixed
> "x y z value" data.
>
> My idea is:
>
> out = np.loadtxt(...)
> x = np.unique(out[:,0])
> y = np.unique(out[:,1])
> xx, yy = np.meshgrid(x, y)
>
> values = lookup(xx, yy, out)
>
> lookup is ufunc (I hope that term is correct here) that looks up the
value of every x and y in out, like
> x_filtered = out[ out[:,0] == x, :]
> y_filtered  = out[ out[:,1] == y, :]
> return y_filtered[2]
>
> (untested, just a sketch)
>
> Would this work? Any better way?

If the (x, y) values are actually drawn from a rectilinear grid, then you
can use np.lexsort() to sort the rows before reshaping.

[~/scratch]
|4> !cat random-mesh.txt
0.3 0.3 21
0   0   10
0   0.3 11
0.3 0.6 22
0   0.6 12
0.6 0.3 31
0.3 0   20
0.6 0.6 32
0.6 0   30


[~/scratch]
|5> scrambled_nodes = np.loadtxt('random-mesh.txt')

# Note! Put the "faster" column before the "slower" column!
[~/scratch]
|6> i = np.lexsort([scrambled_nodes[:, 1], scrambled_nodes[:, 0]])

[~/scratch]
|7> sorted_nodes = scrambled_nodes[i]

[~/scratch]
|8> sorted_nodes
array([[  0. ,   0. ,  10. ],
   [  0. ,   0.3,  11. ],
   [  0. ,   0.6,  12. ],
   [  0.3,   0. ,  20. ],
   [  0.3,   0.3,  21. ],
   [  0.3,   0.6,  22. ],
   [  0.6,   0. ,  30. ],
   [  0.6,   0.3,  31. ],
   [  0.6,   0.6,  32. ]])


Then carry on with the reshape()ing as before. If the grid points that
"ought to be the same" are not actually identical, then you may end up with
some problems, e.g. if you had "0.3001 0.0 20.0" as a row, but all
of the other "x=0.3" rows had "0.3", then that row would get sorted out of
order. You would have to clean up the grid coordinates a bit first.
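
If that happens, one minimal clean-up sketch (reusing the arrays above) is to
snap the coordinates to a common precision before sorting:

# round the coordinate columns so nearly-equal grid values compare equal,
# then lexsort and reshape exactly as above
cleaned = scrambled_nodes.copy()
cleaned[:, :2] = np.round(cleaned[:, :2], decimals=6)
i = np.lexsort([cleaned[:, 1], cleaned[:, 0]])
sorted_nodes = cleaned[i]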

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reading in a mesh file

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 4:00 PM, Florian Lindner <mailingli...@xgm.de>
wrote:
>
> Hello,
>
> I have mesh (more exactly: just a bunch of nodes) description with values
associated to the nodes in a file, e.g. for a
> 3x3 mesh:
>
> 0   0   10
> 0   0.3 11
> 0   0.6 12
> 0.3 0   20
> 0.3 0.3 21
> 0.3 0.6 22
> 0.6 0   30
> 0.6 0.3 31
> 0.6 0.6 32
>
> What is best way to read it in and get data structures like the ones I
get from np.meshgrid?
>
> Of course, I know about np.loadtxt, but I'm having trouble getting the
resulting arrays (x, y, values) in the right form
> and to retain association to the values.

For this particular case (known shape and ordering), this is what I would
do. Maybe throw in a .T or three depending on exactly how you want them to
be laid out.

[~/scratch]
|1> !cat mesh.txt

0   0   10
0   0.3 11
0   0.6 12
0.3 0   20
0.3 0.3 21
0.3 0.6 22
0.6 0   30
0.6 0.3 31
0.6 0.6 32

[~/scratch]
|2> nodes = np.loadtxt('mesh.txt')

[~/scratch]
|3> nodes
array([[  0. ,   0. ,  10. ],
   [  0. ,   0.3,  11. ],
   [  0. ,   0.6,  12. ],
   [  0.3,   0. ,  20. ],
   [  0.3,   0.3,  21. ],
   [  0.3,   0.6,  22. ],
   [  0.6,   0. ,  30. ],
   [  0.6,   0.3,  31. ],
   [  0.6,   0.6,  32. ]])

[~/scratch]
|4> reshaped = nodes.reshape((3, 3, -1))

[~/scratch]
|5> reshaped
array([[[  0. ,   0. ,  10. ],
[  0. ,   0.3,  11. ],
[  0. ,   0.6,  12. ]],

   [[  0.3,   0. ,  20. ],
[  0.3,   0.3,  21. ],
[  0.3,   0.6,  22. ]],

   [[  0.6,   0. ,  30. ],
[  0.6,   0.3,  31. ],
[  0.6,   0.6,  32. ]]])

[~/scratch]
|7> x = reshaped[..., 0]

[~/scratch]
|8> y = reshaped[..., 1]

[~/scratch]
|9> values = reshaped[..., 2]

[~/scratch]
|10> x
array([[ 0. ,  0. ,  0. ],
   [ 0.3,  0.3,  0.3],
   [ 0.6,  0.6,  0.6]])

[~/scratch]
|11> y
array([[ 0. ,  0.3,  0.6],
   [ 0. ,  0.3,  0.6],
   [ 0. ,  0.3,  0.6]])

[~/scratch]
|12> values
array([[ 10.,  11.,  12.],
   [ 20.,  21.,  22.],
   [ 30.,  31.,  32.]])

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Include last element when subindexing numpy arrays?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 1:34 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> On 31 Aug 2016, at 15:22, Robert Kern <robert.k...@gmail.com> wrote:
>
> On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa <mvilja...@kapsi.fi>
wrote:
> >
> > Is there a clean way to include the last element when subindexing numpy
arrays?
> > Since the default behaviour of numpy arrays is to omit the “stop index”.
> >
> > So for,
> >
> > >>> A
> > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> > >>> A[0:5]
> > array([0, 1, 2, 3, 4])
>
> A[5:]
>
> --
> Robert Kern
>
> No that returns the subarray starting from index 5 to the end.
>
> What I want to be able to return
>
> array([0, 1, 2, 3, 4, 5])
>
> (i.e. last element 5 included)
>
> but without the funky A[0:6] syntax, which looks like it should return
>
> array([0, 1, 2, 3, 4, 5, 6])
>
> but since numpy arrays omit the last index, returns
>
> array([0, 1, 2, 3, 4, 5])
>
> which syntactically would be more reasonable to be A[0:5].

Ah, I see what you are asking now.

The answer is "no"; this is just the way that slicing works in Python in
general. numpy merely follows suit. It is something that you will get used
to with practice. My sense of "funkiness" and "reasonableness" is the
opposite of yours, for instance.
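
A tiny illustration of why the half-open convention composes nicely:

A = list(range(10))
# len(A[i:j]) == j - i, and adjacent slices tile the sequence exactly
assert len(A[0:5]) == 5
assert A[:5] + A[5:] == A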

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] State-of-the-art to use a C/C++ library from Python

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 12:28 PM, Michael Bieri <mibi...@gmail.com> wrote:
>
> Hi all
>
> There are several ways on how to use C/C++ code from Python with NumPy,
as given in http://docs.scipy.org/doc/numpy/user/c-info.html . Furthermore,
there's at least pybind11.
>
> I'm not quite sure which approach is state-of-the-art as of 2016. How
would you do it if you had to make a C/C++ library available in Python
right now?
>
> In my case, I have a C library with some scientific functions on matrices
and vectors. You will typically call a few functions to configure the
computation, then hand over some pointers to existing buffers containing
vector data, then start the computation, and finally read back the data.
The library also can use MPI to parallelize.

I usually reach for Cython:

http://cython.org/
http://docs.cython.org/en/latest/src/userguide/memoryviews.html

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Why np.fft.rfftfreq only returns up to Nyqvist?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 1:14 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> What’s the reasonability of np.fft.rfftfreq returning frequencies only up
to Nyquist, rather than for the full sample rate?

The answer to the question that you asked is that np.fft.rfft() only
computes values for frequencies only up to Nyquist, so np.fft.rfftfreq()
must give you the frequencies to match. I'm not sure if there is another
misunderstanding lurking that needs to be clarified.
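
A quick sanity check of the matching lengths:

import numpy as np

x = np.random.rand(16)                  # n = 16 real samples
spec = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1.0)  # d is the sample spacing
print(spec.shape, freqs.shape)          # both (9,), i.e. n//2 + 1 bins from 0 up to Nyquist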

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Include last element when subindexing numpy arrays?

2016-08-31 Thread Robert Kern
On Wed, Aug 31, 2016 at 12:28 PM, Matti Viljamaa <mvilja...@kapsi.fi> wrote:
>
> Is there a clean way to include the last element when subindexing numpy
arrays?
> Since the default behaviour of numpy arrays is to omit the “stop index”.
>
> So for,
>
> >>> A
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
> >>> A[0:5]
> array([0, 1, 2, 3, 4])

A[5:]

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] coordinate bounds

2016-08-20 Thread Robert Kern
On Sat, Aug 20, 2016 at 9:16 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> Is there a numpy equivalent to Mma's CoordinateBounds command?
> http://reference.wolfram.com/language/ref/CoordinateBounds.html

The first signature can be computed like so:

  np.transpose([coords.min(axis=0), coords.max(axis=0)])
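
For example:

import numpy as np

coords = np.array([[0.0,  1.0],
                   [2.0, -1.0],
                   [1.0,  3.0]])
np.transpose([coords.min(axis=0), coords.max(axis=0)])
# array([[ 0.,  2.],
#        [-1.,  3.]])   # one (min, max) pair per coordinate axis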

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy set_printoptions, silent failure, bug?

2016-07-19 Thread Robert Kern
On Tue, Jul 19, 2016 at 10:41 PM, John Ladasky <jlada...@itu.edu> wrote:

> Should this be considered a Numpy bug, or is there some reason that
set_printoptions would legitimately need to accept a dictionary as a single
argument?

There is no such reason. One could certainly add more validation to the
arguments to np.set_printoptions().
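
A sketch of what such validation could look like as a thin wrapper (a purely
hypothetical helper, not part of numpy):

import numpy as np

def set_printoptions_checked(**kwargs):
    # keyword-only wrapper: a dict can't be passed positionally here, and
    # unknown option names are rejected loudly instead of silently ignored
    allowed = {'precision', 'threshold', 'edgeitems', 'linewidth',
               'suppress', 'nanstr', 'infstr', 'formatter'}
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError("unknown print options: %s" % sorted(unknown))
    np.set_printoptions(**kwargs)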

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Design feedback solicitation

2016-07-14 Thread Robert Kern
On Fri, Jul 15, 2016 at 2:53 AM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:
>
> Hi Robert,
>
> Thank you for the pointers.
>
> I think numpy.random should have a mechanism to choose between methods
for generating the underlying randomness dynamically, at a run-time, as
well as an extensible framework, where developers could add more methods.
The default would be MT19937 for backwards compatibility. It is important
to be able to do this at a run-time, as it would allow one to use different
algorithms in different threads (like different members of the parallel
Mersenne twister family of generators, see MT2203).
>
> The framework should allow to define randomness as a bit stream, a stream
of fixed size integers, or a stream of uniform reals (32 or 64 bits). This
is a lot of like MKL’s abstract method for basic pseudo-random number
generation.
>
> Each method should provide routines to sample from uniform distributions
over reals (in floats and doubles), as well as over integers.
>
> All remaining non-uniform distributions build on top of these uniform
streams.

ng-numpy-randomstate does all of these.

> I think it is pretty important to refactor numpy.random to allow the
underlying generators to produce a given number of independent variates at
a time. There could be convenience wrapper functions to allow to get one
variate for backwards compatibility, but this change in design would allow
for better efficiency, as sampling a vector of random variates at once is
often faster than repeated sampling of one at a time due to set-up cost,
vectorization, etc.

The underlying C implementation is an implementation detail, so the
refactoring that you suggest has no backwards compatibility constraints.

> Finally, methods to sample particular distribution should uniformly
support method keyword argument. Because method names vary from
distribution to distribution, it should ideally be programmatically
discoverable which methods are supported for a given distribution. For
instance, the standard normal distribution could support
method=’Inversion’, method=’Box-Muller’, method=’Ziggurat’,
method=’Box-Muller-Marsaglia’ (the one used in numpy.random right now), as
well as bunch of non-named methods based on transformed rejection method
(see http://statistik.wu-wien.ac.at/anuran/ )

That is one of the items under discussion. I personally prefer that one
simply exposes named methods for each different scheme (e.g.
ziggurat_normal(), etc.).

> It would also be good if one could dynamically register a new method to
sample from a non-uniform distribution. This would allow, for instance, to
automatically add methods to sample certain non-uniform distribution by
directly calling into MKL (or other library), when available, instead of
building them from uniforms (which may remain a fall-through method).
>
> The linked project is a good start, but the choice of the underlying
algorithm needs to be made at a run-time,

That's what happens. You instantiate the RandomState class that you want.

> as far as I understood, and the only provided interface to query random
variates is one at a time, just like it is currently the case
> in numpy.random.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Design feedback solicitation

2016-06-17 Thread Robert Kern
On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr <
oleksandr.pav...@intel.com> wrote:
>
> Hi,
>
> I am new to this list, so I will start with an introduction. My name is
Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for
Python, and previously worked at Wolfram Research for 12 years. My latest
project was to write a mirror to numpy.random, named numpy.random_intel.
The module uses MKL to sample from different distributions for efficiency.
It provides support for different underlying algorithms for basic
pseudo-random number generation, i.e. in addition to MT19937, it also
provides SFMT19937, MT2203, etc.
>
> I recently published a blog about it:
>
>
https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
>
> I originally attempted to simply replace numpy.random in the Intel
Distribution for Python with the new module, but due to fixed seed
backwards incompatibility this results in numerous test failures in numpy,
scipy, pandas and other modules.
>
> Unlike numpy.random, the new module generates a vector of random numbers
at a time, which can be done faster than repeatedly generating the same
number of variates one at a time.
>
> The source code for the new module is not upstreamed yet, and this email
is meant to solicit early community feedback to allow for faster acceptance
of the proposed changes.

Cool! You can find pertinent discussion here:

  https://github.com/numpy/numpy/issues/6967

And the current effort for adding new core PRNGs here:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Indexing with floats

2016-06-10 Thread Robert Kern
On Fri, Jun 10, 2016 at 12:15 PM, Fabien <fabien.mauss...@gmail.com> wrote:
>
> Hi,
>
> I really tried to do my homework before asking this here, but I just
couldn't find the relevant information anywhere...
>
> My question is about the rationale behind forbidding indexing with
floats, i.e.:
>
> >>> x[2.]
> __main__:1: VisibleDeprecationWarning: using a non-integer number instead
of an integer will result in an error in the future
>
> I don't find this very handy from a user's perspective, and I'd be
grateful for pointers on discussion threads and/or PRs where this has been
discussed, so that I can understand why it's important.

https://mail.scipy.org/pipermail/numpy-discussion/2012-December/064705.html
https://github.com/numpy/numpy/issues/2810
https://github.com/numpy/numpy/pull/2891
https://github.com/numpy/numpy/pull/3243
https://mail.scipy.org/pipermail/numpy-discussion/2015-July/073125.html

Note that the future is coming in the next numpy release:

https://github.com/numpy/numpy/pull/6271

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: ifft padding

2016-05-26 Thread Robert McLeod
Allen,

Probably it needs to work in n-dimensions, like the existing
np.fft.fftshift function does, with an optional axis=tuple parameter. I
recall that fftshift is just an array indexing trick?  It would be helpful
to see what's faster, two fftshifts and an edge padding or your
inter-padding.  Probably it's faster to make a new zeros array of the
appropriate padded size and do 2*ndim copies?
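
For concreteness, a rough sketch of the fftshift-plus-edge-padding alternative
(1-D only; the split of the padding between the two edges and the Nyquist-bin
handling for even lengths would need more care in a real implementation):

import numpy as np

def ifftpad_via_shift(spectrum, n):
    # shift zero frequency to the centre, pad zeros at both edges (i.e. at
    # the highest frequencies), then shift back and rescale
    m = len(spectrum)
    pad = n - m
    shifted = np.fft.fftshift(spectrum)
    padded = np.pad(shifted, (pad - pad // 2, pad // 2), mode='constant')
    return np.fft.ifftshift(padded) * (n / m)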

Robert

On Wed, May 25, 2016 at 9:35 PM, Allen Welkie <allen.wel...@gmail.com>
wrote:

> I'd like to get some feedback on my [pull request](
> https://github.com/numpy/numpy/pull/7593).
>
> This pull request adds a function `ifftpad` which pads a spectrum by
> inserting zeros where the highest frequencies would be. This is necessary
> because the padding that `ifft` does simply inserts zeros at the end of the
> array. But because of the way the spectrum is laid out, this changes which
> bins represent which frequencies and in general messes up the result of
> `ifft`. If you pad with the proposed `ifftpad` function, the zeros will be
> inserted in the middle of the spectrum and the time signal that results
> from `ifft` will be an interpolated version of the unpadded time signal.
> See the discussion in [issue #1346](
> https://github.com/numpy/numpy/issues/1346).
>
> The following is a script to demonstrate what I mean:
>
> ```
> import numpy
> from numpy import concatenate, zeros
> from matplotlib import pyplot
>
> def correct_padding(a, n, scale=True):
> """ A copy of the proposed `ifftpad` function. """
> spectrum = concatenate((a[:len(a) // 2],
> zeros(n - len(a)),
> a[len(a) // 2:]))
> if scale:
> spectrum *= (n / len(a))
> return spectrum
>
> def plot_real(signal, label):
> time = numpy.linspace(0, 1, len(signal) + 1)[:-1]
> pyplot.plot(time, signal.real, label=label)
>
> def main():
> spectrum = numpy.zeros(10, dtype=complex)
> spectrum[-1] = 1 + 1j
>
> signal = numpy.fft.ifft(spectrum)
> signal_bad_padding = numpy.fft.ifft(10 * spectrum, 100)
> signal_good_padding = numpy.fft.ifft(correct_padding(spectrum, 100))
>
> plot_real(signal, 'No padding')
> plot_real(signal_bad_padding, 'Bad padding')
> plot_real(signal_good_padding, 'Good padding')
>
> pyplot.legend()
> pyplot.show()
>
>
> if __name__ == '__main__':
> main()
> ```
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-23 Thread Robert Kern
On Mon, May 23, 2016 at 5:41 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
>
> On Sun, May 22, 2016 at 2:35 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> Well, I mean, engineers want lots of things. I suspect that most
engineers *really* just want to call `numpy.random.seed(8675309)` at the
start and never explicitly pass around separate streams. There's an upside
to that in terms of code simplicity. There are also significant limitations
and constraints. Ultimately, the upside against the alternative of passing
around RandomState objects is usually overweighed by the limitations, so
best practice is to pass around RandomState objects.
>
> Could we do something like the logging module, and have numpy.random
"manage" a bunch of stream objects for you -- so you could get the default
single stream easily, and also get access to specific streams without
needing to pass around the objects?

No, I don't think so. The logging module's namespacing doesn't really have
an equivalent use case for PRNGs. We would just be making a much more
complicated global state to manage.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-22 Thread Robert Kern
On Wed, May 18, 2016 at 7:56 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Wed, May 18, 2016 at 5:07 AM, Robert Kern <robert.k...@gmail.com>
wrote:
> > On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith <n...@pobox.com> wrote:

> >> ...anyway, the real reason I'm a bit grumpy is because there are solid
> >> engineering reasons why users *want* this API,
> >
> > I remain unconvinced on this mark. Grumpily.
>
> Sorry for getting grumpy :-).

And my apologies for some unwarranted hyperbole. I think we're both
converging on a reasonable approach, though.

> The engineering reasons seem pretty
> obvious to me though?

Well, I mean, engineers want lots of things. I suspect that most engineers
*really* just want to call `numpy.random.seed(8675309)` at the start and
never explicitly pass around separate streams. There's an upside to that in
terms of code simplicity. There are also significant limitations and
constraints. Ultimately, the upside against the alternative of passing
around RandomState objects is usually overweighed by the limitations, so
best practice is to pass around RandomState objects.
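
A minimal sketch of that practice (names are illustrative only):

import numpy as np

def run_component(n, prng):
    # each component draws only from the stream it is handed,
    # never from the global numpy.random state
    return prng.normal(size=n)

root = np.random.RandomState(8675309)   # the one seed you write down
a = run_component(4, prng=root)
b = run_component(4, prng=np.random.RandomState(1234))  # or a separate stream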

I acknowledge that there exists an upside to the splitting API, but I don't
think it's a groundbreaking improvement over the alternative current best
practice. It's also unclear to me how often situations that really
demonstrate the upside come into play; in my experience a lot of these
situations are already structured such that preallocating N streams is the
natural thing to do. The limitations and constraints are currently
underexplored, IMO; and in this conservative field, pessimism is warranted.

> If you have any use case for independent streams
> at all, and you're writing code that's intended to live inside a
> library's abstraction barrier, then you need some way to choose your
> streams to avoid colliding with arbitrary other code that the end-user
> might assemble alongside yours as part of their final program. So
> AFAICT you have two options: either you need a "tree-style" API for
> allocating these streams, or else you need to add some explicit API to
> your library that lets the end-user control in detail which streams
> you use. Both are possible, but the latter is obviously undesireable
> if you can avoid it, since it breaks the abstraction barrier, making
> your library more complicated to use and harder to evolve.

ACK

> >> so whether or not it
> >> turns out to be possible I think we should at least be allowed to have
> >> a discussion about whether there's some way to give it to them.
> >
> > I'm not shutting down discussion of the option. I *implemented* the
option.
> > I think that discussing whether it should be part of the main API is
> > premature. There probably ought to be a paper or three out there
supporting
> > its safety and utility first. Let the utility function version flourish
> > first.
>
> OK -- I guess this particularly makes sense given how
> extra-tightly-constrained we currently are in fixing mistakes in
> np.random. But I feel like in the end the right place for this really
> is inside the RandomState interface, because the person implementing
> RandomState is the one best placed to understand (a) the gnarly
> technical details here, and (b) how those change depending on the
> particular PRNG in use. I don't want to end up with a bunch of
> subtly-buggy utility functions in non-specialist libraries like dask
> -- so we should be trying to help downstream users figure out how to
> actually get this into np.random?

I think this is an open research area. An enterprising grad student could
milk this for a couple of papers analyzing how to do this safely for a
variety of PRNGs. I don't think we can hash this out in an email thread or
PR. So yeah, eventually there might be an API on RandomState for this, but
it's way too premature to do so right now, IMO. Maybe start with a
specialized subclass of RandomState that adds this experimental API. In
ng-numpy-randomstate. ;-)

But if someone has spare time to work on numpy.random, for God's sake, use
it to review @gfyoung's PRs instead.

> >> It's
> >> not even 100% out of the question that we conclude that existing PRNGs
> >> are buggy because they don't take this use case into account -- it
> >> would be far from the first time that numpy found itself going beyond
> >> the limits of older numerical tools that weren't designed to build the
> >> kind of large composable systems that numpy gets used for.
> >>
> >> MT19937's state space is large enough that you could explicitly encode
> >> a "tree seed" into it, even if you don't trust the laws of probability
> >> -- e.g., you start with a RandomState with id [], then i

Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 6:20 PM, <josef.p...@gmail.com> wrote:
>
> On Wed, May 18, 2016 at 12:01 PM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Wed, May 18, 2016 at 4:50 PM, Chris Barker <chris.bar...@noaa.gov>
wrote:
>> >>
>> >> > ...anyway, the real reason I'm a bit grumpy is because there are
solid
>> >> > engineering reasons why users *want* this API,
>> >
>> > Honestly, I am lost in the math -- but like any good engineer, I want
to accomplish something anyway :-) I trust you guys to get this right -- or
at least document what's "wrong" with it.
>> >
>> > But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with multiple
independent "random" processes. And we want to be able to re-produce
EXACTLY simulations -- our users get confused when the results are
"different" even if in a statistically insignificant way.
>> >
>> > At the moment we are using one RNG, with one seed for everything. So
we get reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have each
process using its own RNG stream with it's own seed. However, it matters
not one whit if those seeds are independent -- the processes are different,
you'd never notice if they were using the same PRN stream -- because they
are used differently. So a "fairly low probability of a clash" would be
totally fine.
>>
>> Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se is
not contentious.
>
> I'm similar to Chris, and didn't try to figure out the details of what
you are talking about.
>
> However, if there are functions getting into numpy that help in using a
best practice even if it's not bullet proof, then it's still better than
home made approaches.
> If it gets in soon, then we can use it in a few years (given dependency
lag). At that point there should be more distributed, nested simulation
based algorithms where we don't know in advance how far we have to go to
get reliable numbers or convergence.
>
> (But I don't see anything like that right now.)

Current best practice is to use PRNGs with settable streams (or fixed
jumpahead for those PRNGs cursed to not have settable streams but blessed
to have super-long periods). The way to get those into numpy is to help
Kevin Sheppard finish:

  https://github.com/bashtage/ng-numpy-randomstate

He's done nearly all of the hard work already.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 4:50 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
>>
>> > ...anyway, the real reason I'm a bit grumpy is because there are solid
>> > engineering reasons why users *want* this API,
>
> Honestly, I am lost in the math -- but like any good engineer, I want to
accomplish something anyway :-) I trust you guys to get this right -- or at
least document what's "wrong" with it.
>
> But, if I'm reading the use case that started all this correctly, it
closely matches my use-case. That is, I have a complex model with multiple
independent "random" processes. And we want to be able to re-produce
EXACTLY simulations -- our users get confused when the results are
"different" even if in a statistically insignificant way.
>
> At the moment we are using one RNG, with one seed for everything. So we
get reproducible results, but if one thing is changed, then the entire
simulation is different -- which is OK, but it would be nicer to have each
process using its own RNG stream with it's own seed. However, it matters
not one whit if those seeds are independent -- the processes are different,
you'd never notice if they were using the same PRN stream -- because they
are used differently. So a "fairly low probability of a clash" would be
totally fine.

Well, the main question is: do you need to be able to spawn dependent
streams at arbitrary points to an arbitrary depth without coordination
between processes? The necessity for multiple independent streams per se is
not contentious.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-18 Thread Robert Kern
On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Tue, May 17, 2016 at 10:41 AM, Robert Kern <robert.k...@gmail.com>
wrote:
> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >>
> >> On May 17, 2016 1:50 AM, "Robert Kern" <robert.k...@gmail.com> wrote:
> >> >
> >> [...]
> >> > What you want is a function that returns many RandomState objects
that
> >> > are hopefully spread around the MT19937 space enough that they are
> >> > essentially independent (in the absence of true jumpahead). The
better
> >> > implementation of such a function would look something like this:
> >> >
> >> > def spread_out_prngs(n, root_prng=None):
> >> > if root_prng is None:
> >> > root_prng = np.random
> >> > elif not isinstance(root_prng, np.random.RandomState):
> >> > root_prng = np.random.RandomState(root_prng)
> >> > sprouted_prngs = []
> >> > for i in range(n):
> >> > seed_array = root_prng.randint(1<<32, size=624)  #
> >> > dtype=np.uint32 under 1.11
> >> > sprouted_prngs.append(np.random.RandomState(seed_array))
> >> > return sprouted_prngs
> >>
> >> Maybe a nice way to encapsulate this in the RandomState interface
would be
> >> a method RandomState.random_state() that generates and returns a new
child
> >> RandomState.
> >
> > I disagree. This is a workaround in the absence of proper jumpahead or
> > guaranteed-independent streams. I would not encourage it.
> >
> >> > Internally, this generates seed arrays of about the size of the
MT19937
> >> > state so make sure that you can access more of the state space. That
will at
> >> > least make the chance of collision tiny. And it can be easily
rewritten to
> >> > take advantage of one of the newer PRNGs that have true independent
streams:
> >> >
> >> >   https://github.com/bashtage/ng-numpy-randomstate
> >>
> >> ... But unfortunately I'm not sure how to make my interface suggestion
> >> above work on top of one of these RNGs, because for
RandomState.random_state
> >> you really want a tree of independent RNGs and the fancy new PRNGs only
> >> provide a single flat namespace :-/. And even more annoyingly, the
tree API
> >> is actually a nicer API, because with a flat namespace you have to
know up
> >> front about all possible RNGs your code will use, which is an
unfortunate
> >> global coupling that makes it difficult to compose programs out of
> >> independent pieces, while the RandomState.random_state approach
composes
> >> beautifully. Maybe there's some clever way to allocate a 64-bit
namespace to
> >> make it look tree-like? I'm not sure 64 bits is really enough...
> >
> > MT19937 doesn't have a "tree" any more than the others. It's the same
flat
> > state space. You are just getting the illusion of a tree by hoping that
you
> > never collide. You ought to think about precisely the same global
coupling
> > issues with MT19937 as you do with guaranteed-independent streams.
> > Hope-and-prayer isn't really a substitute for properly engineering your
> > problem. It's just a moral hazard to promote this method to the main
API.
>
> Nonsense.
>
> If your definition of "hope and prayer" includes assuming that we
> won't encounter a random collision in a 2**19937 state space, then
> literally all engineering is hope-and-prayer. A collision could
> happen, but if it does it's overwhelmingly more likely to happen
> because of a flaw in the mathematical analysis, or a bug in the
> implementation, or because random quantum fluctuations caused you and
> your program to suddenly be transported to a parallel world where 1 +
> 1 = 1, than that you just got unlucky with your random state. And all
> of these hazards apply equally to both MT19937 and more modern PRNGs.

Granted.

> ...anyway, the real reason I'm a bit grumpy is because there are solid
> engineering reasons why users *want* this API,

I remain unconvinced on this mark. Grumpily.

> so whether or not it
> turns out to be possible I think we should at least be allowed to have
> a discussion about whether there's some way to give it to them.

I'm not shutting down discussion of the option. I *implemented* the option.
I think that discussing whether it should be part of the main API is
premature. There probably ought to be a paper or three out there supporting
its safety and utility first. Let the utility function version flourish first.

Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On May 17, 2016 1:50 AM, "Robert Kern" <robert.k...@gmail.com> wrote:
> >
> [...]
> > What you want is a function that returns many RandomState objects that
are hopefully spread around the MT19937 space enough that they are
essentially independent (in the absence of true jumpahead). The better
implementation of such a function would look something like this:
> >
> > def spread_out_prngs(n, root_prng=None):
> >     if root_prng is None:
> >         root_prng = np.random
> >     elif not isinstance(root_prng, np.random.RandomState):
> >         root_prng = np.random.RandomState(root_prng)
> >     sprouted_prngs = []
> >     for i in range(n):
> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
> >         sprouted_prngs.append(np.random.RandomState(seed_array))
> >     return sprouted_prngs
>
> Maybe a nice way to encapsulate this in the RandomState interface would
be a method RandomState.random_state() that generates and returns a new
child RandomState.

I disagree. This is a workaround in the absence of proper jumpahead or
guaranteed-independent streams. I would not encourage it.

> > Internally, this generates seed arrays of about the size of the MT19937
state so make sure that you can access more of the state space. That will
at least make the chance of collision tiny. And it can be easily rewritten
to take advantage of one of the newer PRNGs that have true independent
streams:
> >
> >   https://github.com/bashtage/ng-numpy-randomstate
>
> ... But unfortunately I'm not sure how to make my interface suggestion
above work on top of one of these RNGs, because for
RandomState.random_state you really want a tree of independent RNGs and the
fancy new PRNGs only provide a single flat namespace :-/. And even more
annoyingly, the tree API is actually a nicer API, because with a flat
namespace you have to know up front about all possible RNGs your code will
use, which is an unfortunate global coupling that makes it difficult to
compose programs out of independent pieces, while the
RandomState.random_state approach composes beautifully. Maybe there's some
clever way to allocate a 64-bit namespace to make it look tree-like? I'm
not sure 64 bits is really enough...

MT19937 doesn't have a "tree" any more than the others. It's the same flat
state space. You are just getting the illusion of a tree by hoping that you
never collide. You ought to think about precisely the same global coupling
issues with MT19937 as you do with guaranteed-independent streams.
Hope-and-prayer isn't really a substitute for properly engineering your
problem. It's just a moral hazard to promote this method to the main API.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 2:40 PM, Sturla Molden <sturla.mol...@gmail.com>
wrote:
>
> Stephan Hoyer <sho...@gmail.com> wrote:
> > I have recently encountered several use cases for randomly generate
random
> > number seeds:
> >
> > 1. When writing a library of stochastic functions that take a seed as an
> > input argument, and some of these functions call multiple other such
> > stochastic functions. Dask is one such example [1].
> >
> > 2. When a library needs to produce results that are reproducible after
> > calling numpy.random.seed, but that do not want to use the functions in
> > numpy.random directly. This came up recently in a pandas pull request
[2],
> > because we want to allow using RandomState objects as an alternative to
> > global state in numpy.random. A major advantage of this approach is
that it
> > provides an obvious alternative to reusing the private
numpy.random._mtrand
> > [3].
>
> What about making numpy.random a finite state machine, and keeping a stack
> of RandomState seeds? That is, something similar to what OpenGL does for
> its matrices? Then we get two functions, numpy.random.push_seed and
> numpy.random.pop_seed.

I don't think that addresses the issues brought up here. It's just more
global state to worry about.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> On Tue, May 17, 2016 at 12:18 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>> > 1. When writing a library of stochastic functions that take a seed as
an input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].
>>
>> Can you clarify the use case here? I don't really know what you are
doing here, but I'm pretty sure this is not the right approach.
>
> Here's a contrived example. Suppose I've written a simulator for cars
that consists of a number of loosely connected components (e.g., an engine,
brakes, etc.). The behavior of each component of our simulator is
stochastic, but we want everything to be fully reproducible, so we need to
use seeds or RandomState objects.
>
> We might write our simulate_car function like the following:
>
> def simulate_car(engine_config, brakes_config, seed=None):
> rs = np.random.RandomState(seed)
> engine = simulate_engine(engine_config, seed=rs.random_seed())
> brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
> ...
>
> The problem with passing the same RandomState object (either explicitly
or dropping the seed argument entirely and using the global state) to both
simulate_engine and simulate_brakes is that it breaks encapsulation -- if I
change what I do inside simulate_engine, it also affects the brakes.

That's a little too contrived, IMO. In most such simulations, the different
components interact with each other in the normal course of the simulation;
that's why they are both joined together in the same simulation instead of
being two separate runs. Unless if the components are being run across a
process or thread boundary (a la dask below) where true nondeterminism
comes into play, then I don't think you want these semi-independent
streams. This seems to be the advice du jour from the agent-based modeling
community.

> The dask use case is actually pretty different -- the intent is to create
many random numbers in parallel using multiple threads or processes
(possibly in a distributed fashion). I know that skipping ahead is the
standard way to get independent number streams for parallel sampling, but
that isn't exposed in numpy.random, and setting distinct seeds seems like a
reasonable alternative for scientific computing use cases.

Forget about integer seeds. Those are for human convenience. If you're not
jotting them down in your lab notebook in pen, you don't want an integer
seed.

What you want is a function that returns many RandomState objects that are
hopefully spread around the MT19937 space enough that they are essentially
independent (in the absence of true jumpahead). The better implementation
of such a function would look something like this:

def spread_out_prngs(n, root_prng=None):
    if root_prng is None:
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for i in range(n):
        seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs
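
For example, to sprout four streams from a fixed root seed and draw from each:

streams = spread_out_prngs(4, root_prng=12345)
draws = [prng.standard_normal(3) for prng in streams]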

Internally, this generates seed arrays of about the size of the MT19937
state so make sure that you can access more of the state space. That will
at least make the chance of collision tiny. And it can be easily rewritten
to take advantage of one of the newer PRNGs that have true independent
streams:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: numpy.random.random_seed

2016-05-17 Thread Robert Kern
On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer <sho...@gmail.com> wrote:
>
> I have recently encountered several use cases for randomly generate
random number seeds:
>
> 1. When writing a library of stochastic functions that take a seed as an
input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].

Can you clarify the use case here? I don't really know what you are doing
here, but I'm pretty sure this is not the right approach.

> 2. When a library needs to produce results that are reproducible after
calling numpy.random.seed, but that do not want to use the functions in
numpy.random directly. This came up recently in a pandas pull request [2],
because we want to allow using RandomState objects as an alternative to
global state in numpy.random. A major advantage of this approach is that it
provides an obvious alternative to reusing the private numpy.random._mtrand
[3].

It's only pseudo-private. This is an authorized use of it.

However, for this case, I usually just pass around the numpy.random
module itself and let duck-typing take care of the rest.
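
For example (a minimal sketch):

import numpy as np

def jitter(x, random_state=np.random):
    # duck-typing: `random_state` may be the numpy.random module itself or a
    # np.random.RandomState instance -- both expose .normal()
    return x + random_state.normal(scale=0.1, size=np.shape(x))

jitter(np.arange(3.0))                             # module-level global stream
jitter(np.arange(3.0), np.random.RandomState(42))  # explicit RandomState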

> [3] On a side note, if there's no longer a good reason to keep this
object private, perhaps we should expose it in our public API. It would
certainly be useful -- scikit-learn is already using it (see links in the
pandas PR above).

Adding a public get_global_random_state() function might be in order.
Originally, I wanted there to be *some* barrier to entry, but just grabbing
it to use as a default RandomState object is definitely an intended use of
it. It's not going to disappear.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: SfePy 2016.2

2016-05-12 Thread Robert Cimrman

I am pleased to announce release 2016.2 of SfePy.

Description
---

SfePy (simple finite elements in Python) is a software for solving systems of
coupled partial differential equations by the finite element method or by the
isogeometric analysis (preliminary support). It is distributed under the new
BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker, wiki: http://github.com/sfepy

Highlights of this release
--

- partial shell10x element implementation
- parallel computation of homogenized coefficients
- clean up of elastic terms
- read support for msh file mesh format of gmsh

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Best regards,
Robert Cimrman on behalf of the SfePy development team

---

Contributors to this release in alphabetical order:

Robert Cimrman
Vladimir Lukes
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Floor divison on int returns float

2016-04-13 Thread Robert Kern
On Wed, Apr 13, 2016 at 3:17 AM, Antony Lee <antony@berkeley.edu> wrote:
>
> This kind of issue (see also https://github.com/numpy/numpy/issues/3511)
has become more annoying now that indexing requires integers (indexing with
a float raises a VisibleDeprecationWarning).  The argument "dividing an
uint by an int may give a result that does not fit in an uint nor in an
int" does not sound very convincing to me,

It shouldn't because that's not the rule that numpy follows. The range of
the result is never considered. Both *inputs* are cast to the same type
that can represent the full range of either input type (for that matter,
the actual *values* of the inputs are also never considered). In the case
of uint64 and int64, there is no really good common type (the integer
hierarchy has to top out somewhere), but float64 merely loses resolution
rather than cutting off half of the range of uint64.
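
For example:

import numpy as np

print(np.result_type(np.uint64, np.int64))   # float64
print((np.uint64(10) // np.int64(3)).dtype)  # float64, not an integer type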

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code

2016-04-06 Thread Robert Kern
On Wed, Apr 6, 2016 at 4:17 PM, Neal Becker <ndbeck...@gmail.com> wrote:

> I prefer to use a single instance of a RandomState so that there are
> guarantees about the independence of streams generated from python random
> functions, and from my c++ code.  True, there are simpler approaches - but
> I'm a purist.

Consider using PRNGs that actually expose truly independent streams instead
of a single shared stream:

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] mtrand.c update 1.11 breaks my crappy code

2016-04-06 Thread Robert Kern
On Wed, Apr 6, 2016 at 2:18 PM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> I have C++ code that tries to share the mtrand state.  It unfortunately
> depends on the layout of RandomState which used to be:
>
> struct __pyx_obj_6mtrand_RandomState {
>   PyObject_HEAD
>   rk_state *internal_state;
>   PyObject *lock;
> };
>
> But with 1.11 it's:
> struct __pyx_obj_6mtrand_RandomState {
>   PyObject_HEAD
>   struct __pyx_vtabstruct_6mtrand_RandomState *__pyx_vtab;
>   rk_state *internal_state;
>   PyObject *lock;
>   PyObject *state_address;
> };
>
> So
> 1. Why the change?
> 2. How can I write portable code?

There is no C API to RandomState at this time, stable, portable or
otherwise. It's all private implementation detail. If you would like a
stable and portable C API for RandomState, you will need to contribute one
using PyCapsules to expose the underlying rk_state* pointer.

https://docs.python.org/2.7/c-api/capsule.html

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] linux wheels coming soon

2016-03-24 Thread Robert T. McGibbon
I suspect that many of the maintainers of major scipy-ecosystem projects
are aware of these (or other similar) travis wheel caches, but would guess
that the pool of travis-ci python users who weren't aware of these wheel
caches is much much larger. So there will still be a lot of travis-ci clock
cycles saved by manylinux wheels.

-Robert

On Thu, Mar 24, 2016 at 10:46 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Thu, Mar 24, 2016 at 11:44 AM, Peter Cock <p.j.a.c...@googlemail.com>
> wrote:
> > On Thu, Mar 24, 2016 at 6:37 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >> On Mar 24, 2016 8:04 AM, "Peter Cock" <p.j.a.c...@googlemail.com>
> wrote:
> >>>
> >>> Hi Nathaniel,
> >>>
> >>> Will you be providing portable Linux wheels aka manylinux1?
> >>> https://www.python.org/dev/peps/pep-0513/
> >>
> >> Matthew Brett will (probably) do the actual work, but yeah, that's the
> idea
> >> exactly. Note the author list on that PEP ;-)
> >>
> >> -n
> >
> > Yep - I was partly double checking, but also aware many folk
> > skim the NumPy list and might not be aware of PEP-513 and
> > the standardisation efforts going on.
> >
> > Also in addition to http://travis-dev-wheels.scipy.org/ and
> > http://travis-wheels.scikit-image.org/ mentioned by Ralf there
> > is http://wheels.scipy.org/ which I presume will get the new
> > Linux wheels once they go live.
>
> The new wheels will go up on pypi, and I guess once everyone has
> wheels on pypi then these ad-hoc wheel servers that existed only as a
> way to distribute Linux wheels will become obsolete.
>
> (travis-dev-wheels will remain useful, though, because its purpose is
> to hold up-to-the-minute builds of project master branches to allow
> downstream projects to get early warning of breaking changes -- we
> don't plan to upload to pypi after every commit :-).)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
-Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: SfePy 2016.1

2016-02-24 Thread Robert Cimrman

I am pleased to announce release 2016.1 of SfePy.

Description
---

SfePy (simple finite elements in Python) is a software for solving systems of
coupled partial differential equations by the finite element method or by the
isogeometric analysis (preliminary support). It is distributed under the new
BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker, wiki: http://github.com/sfepy

Highlights of this release
--

- major simplification of finite element field code
- automatic checking of shapes of term arguments
- improved mesh parametrization code and documentation
- support for fieldsplit preconditioners of PETSc

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Best regards,
Robert Cimrman on behalf of the SfePy development team

---

Contributors to this release in alphabetical order:

Robert Cimrman
Vladimir Lukes
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-19 Thread Robert Kern
On Fri, Feb 19, 2016 at 12:10 PM, Andrew Nelson <andyf...@gmail.com> wrote:
>
> With respect to geomspace proposals: instead of specifying start and end
values and the number of points I'd like to have an option where I can set
the start and end points and the ratio. The function would then work out
the correct number of points to get closest to the end value.
>
> E.g. geomspace(start=1, finish=2, ratio=1.03)
>
> The first entries would be 1.0, 1.03, 1*1.03**2, etc.
>
> I have a requirement for the correct ratio between the points, and it's a
right bind having to calculate the exact number of points needed.

At the risk of extending the twisty little maze of names, all alike, I
would probably call a function with this signature geomrange() instead. It
is more akin to arange(start, stop, step) than linspace(start, stop,
num_steps).
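
For concreteness, a minimal sketch of what such a ratio-based, half-open function
could look like on top of arange(); the name geomrange() and its exact behavior are
hypothetical, following the suggestion above, not an existing NumPy function:

```python
import numpy as np

def geomrange(start, stop, ratio):
    # Hypothetical sketch only: values start, start*ratio, start*ratio**2, ...
    # kept strictly below stop, mirroring arange's half-open convention.
    if start <= 0 or stop <= start or ratio <= 1:
        raise ValueError("needs 0 < start < stop and ratio > 1")
    # number of terms: exponents 0..n-1 keep start * ratio**k strictly below stop
    n = int(np.ceil(np.log(stop / start) / np.log(ratio)))
    return start * ratio ** np.arange(n)

print(geomrange(1.0, 2.0, 1.03))  # 1.0, 1.03, 1.0609, ..., all below 2.0
```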

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 10:19 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> On 2/18/2016 2:44 PM, Robert Kern wrote:
>>
>> In a new function not named `linspace()`, I think that might be fine. I
do occasionally want to swap between linear and logarithmic/geometric
spacing based on a parameter, so this
>> doesn't violate the van Rossum Rule of Function Signatures.
>
> Would such a new function correct the apparent mistake (?) of
> `linspace` including the endpoint by default?
> Or is the current API justified by its Matlab origins?
> (Or have I missed the point altogether?)

The last, I'm afraid. Different use cases, different conventions. Integer
ranges are half-open because that is the most useful convention in a
0-indexed ecosystem. Floating point ranges don't interface with indexing,
and the closed intervals are the most useful (or at least the most common).

> If this query is annoying, please ignore it.  It is not meant to be.

The same for my answer.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] proposal: new logspace without the log in the argument

2016-02-18 Thread Robert Kern
On Thu, Feb 18, 2016 at 7:38 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> Some questions it'd be good to get feedback on:
>
> - any better ideas for naming it than "geomspace"? It's really too bad
> that the 'logspace' name is already taken.

geomspace() is a perfectly cromulent name, IMO.

> - I guess the alternative interface might be something like
>
> np.linspace(start, stop, steps, spacing="log")
>
> what do people think?

In a new function not named `linspace()`, I think that might be fine. I do
occasionally want to swap between linear and logarithmic/geometric spacing
based on a parameter, so this doesn't violate the van Rossum Rule of
Function Signatures.
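
As a rough illustration of the relationship between the two spacings (only a sketch
of the idea, not the eventual implementation), geometric spacing is just linear
spacing in log space:

```python
import numpy as np

def geometric_spacing(start, stop, num=50, endpoint=True):
    # Sketch only: assumes start and stop are both positive.
    return np.exp(np.linspace(np.log(start), np.log(stop), num, endpoint=endpoint))

print(geometric_spacing(1.0, 1000.0, num=4))  # approximately [1., 10., 100., 1000.]
```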

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
He was talking consistently about "random integers" not
"random_integers()". :-)

On Wednesday, 17 February 2016, G Young <gfyoun...@gmail.com> wrote:

> Your statement is a little self-contradictory, but in any case, you
> shouldn't worry about random_integers getting removed from the code-base.
> However, it has been deprecated in favor of randint.
>
> On Wed, Feb 17, 2016 at 11:48 PM, Juan Nunez-Iglesias <jni.s...@gmail.com
> <javascript:_e(%7B%7D,'cvml','jni.s...@gmail.com');>> wrote:
>
>> Also fwiw, I think the 0-based, half-open interval is one of the best
>> features of Python indexing and yes, I do use random integers to index into
>> my arrays and would not appreciate having to litter my code with "-1"
>> everywhere.
>>
>> On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac <alan.is...@gmail.com
>> <javascript:_e(%7B%7D,'cvml','alan.is...@gmail.com');>> wrote:
>>
>>> On 2/17/2016 3:42 PM, Robert Kern wrote:
>>>
>>>> random.randint() was the one big exception, and it was considered a
>>>> mistake for that very reason, soft-deprecated in favor of
>>>> random.randrange().
>>>>
>>>
>>>
>>> randrange also has its detractors:
>>> https://code.activestate.com/lists/python-dev/138358/
>>> and following.
>>>
>>> I think if we start citing persistant conventions, the
>>> persistent convention across *many* languages that the bounds
>>> provided for a random integer range are inclusive also counts for
>>> something, especially when the names are essentially shared.
>>>
>>> But again, I am just trying to be clear about what is at issue,
>>> not push for a change.  I think citing non-existent standards
>>> is not helpful.  I think the discrepancy between the Python
>>> standard library and numpy for a function going by a common
>>> name is harmful.  (But then, I teach.)
>>>
>>> fwiw,
>>>
>>> Alan
>>>
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> <javascript:_e(%7B%7D,'cvml','NumPy-Discussion@scipy.org');>
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> <javascript:_e(%7B%7D,'cvml','NumPy-Discussion@scipy.org');>
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>

-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoun...@gmail.com> wrote:

> Josef: I don't think we are making people think more.  They're all
keyword arguments, so if you don't want to think about them, then you leave
them as the defaults, and everyone is happy.

I believe that Josef has the code's reader in mind, not the code's writer.
As a reader of other people's code (and I count 6-months-ago-me as one such
"other people"), I am sure to eventually encounter all of the different
variants, so I will need to know all of them.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 8:30 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> On 2/17/2016 12:28 PM, G Young wrote:
>>
>> Perhaps, but we are not coding in Haskell.  We are coding in Python, and
>> the standard is that the endpoint is excluded, which renders your point
>> moot I'm afraid.
>
> I am not sure what "standard" you are talking about.
> I thought we were talking about the user interface.

It is a persistent and consistent convention (i.e. "standard") across
Python APIs that deal with integer ranges (range(), slice(),
random.randrange(), ...), particularly those that end up related to
indexing; e.g. `x[np.random.randint(0, len(x))]` to pull a random sample
from an array.

random.randint() was the one big exception, and it was considered a mistake
for that very reason, soft-deprecated in favor of random.randrange().

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] making "low" optional in numpy.randint

2016-02-17 Thread Robert Kern
On Wed, Feb 17, 2016 at 4:40 PM, Alan Isaac <alan.is...@gmail.com> wrote:
>
> Behavior of random integer generation:
> Python randint[a,b]
> MATLAB randi  [a,b]
> Mma RandomInteger [a,b]
> haskell randomR   [a,b]
> GAUSS rndi[a,b]
> Maple rand[a,b]
>
> In short, NumPy's `randint` is non-standard (and,
> I would add, non-intuitive).  Presumably was due
> to relying on a float draw from [0,1) along
> with the use of floor.

No, never was. It is implemented so because Python uses semi-open integer
intervals by preference because it plays most nicely with 0-based indexing.
Not sure about all of those systems, but some at least are 1-based
indexing, so closed intervals do make sense.

The Python stdlib's random.randint() closed interval is considered a
mistake by python-dev leading to the implementation and preference for
random.randrange() instead.

> The divergence in behavior between the (later) Python
> function of the same name is particularly unfortunate.

Indeed, but unfortunately, this mistake dates way back to Numeric times,
and easing the migration to numpy was a priority in the heady days of numpy
1.0.

> So I suggest further work on this function is
> not called for, and use of `random_integers`
> should be encouraged.  Probably NumPy's `randint`
> should be deprecated.

Not while I'm here. Instead, `random_integers()` is discouraged and perhaps
might eventually be deprecated.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Fwd: Numexpr-3.0 proposal

2016-02-16 Thread Robert McLeod
On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
gregor.thalham...@gmail.com> wrote:

>
> Dear Robert,
>
> thanks for your effort on improving numexpr. Indeed, vectorized math
> libraries (VML) can give a large boost in performance (~5x), except for a
> couple of basic operations (add, mul, div), which current compilers are
> able to vectorize automatically. With recent gcc even more functions are
> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
> special flags depending on the platform (SSE, AVX present?), runtime
> detection of processor capabilities would be nice for distributing
> binaries. Some time ago, since I lost access to Intels MKL, I patched
> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
>
> As you increased the opcode size, I could imagine providing a bit to
> switch (during runtime) between internal functions and vectorized ones,
> that would be handy for tests and benchmarks.
>

Dear Gregor,

Your suggestion to separate the opcode signature from the library used to
execute it is very clever. Based on your suggestion, I think that the
natural evolution of the opcodes is to specify them by function signature
and library, using a two-level dict, i.e.

numexpr.interpreter.opcodes['exp_f8f8f8'][gnu] = some_enum
numexpr.interpreter.opcodes['exp_f8f8f8'][msvc] = some_enum +1
numexpr.interpreter.opcodes['exp_f8f8f8'][vml] = some_enum + 2
numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum +3

I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If I
do it the way you suggested funccodes.hpp and all the many #define's
regarding function codes in the interpreter can hopefully be removed and
hence simplify the overall codebase. One could potentially take it a step
further and plan (optimize) each expression, similar to what FFTW does with
regards to matrix shape. That is, the basic way to control the library
would be with a singleton library argument, i.e.:

result = ne.evaluate( "A*log(foo**2 / bar**2", lib=vml )

However, we could also permit a tuple to be passed in, where each element
of the tuple reflects the library to use for each operation in the AST tree:

result = ne.evaluate( "A*log(foo**2 / bar**2", lib=(gnu,gnu,gnu,yeppp,gnu) )

In this case the ops are (mul,mul,div,log,mul).  The op-code picking is
done by the Python side, and this tuple could be potentially optimized by
numexpr rather than hand-optimized, by trying various permutations of the
linked C math libraries. The wisdom from the planning could be pickled and
saved in a wisdom file.  Currently Numexpr has cacheDict in util.py but
there's no reason this can't be pickled and saved to disk. I've done a
similar thing by creating wrappers for PyFFTW already.
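
To make the idea a little more concrete, here is a toy sketch of the two-level
opcode lookup and the per-expression "plan" described above; the library names,
signatures and enum values are purely illustrative and are not the actual numexpr
internals:

```python
# Toy sketch, not numexpr code: opcodes keyed first by function signature,
# then by the library that provides the kernel.
GNU, MSVC, VML, YEPPP = "gnu", "msvc", "vml", "yeppp"

opcodes = {
    "mul_f8f8f8": {GNU: 40},
    "div_f8f8f8": {GNU: 41, VML: 42},
    "log_f8f8f8": {GNU: 43, VML: 44, YEPPP: 45},
}

def pick_opcodes(ops, libs):
    # One opcode per AST operation; fall back to the GNU libm kernel when the
    # requested library does not provide that particular function.
    return tuple(opcodes[op].get(lib, opcodes[op][GNU]) for op, lib in zip(ops, libs))

# ops for "A*log(foo**2 / bar**2)" as listed above, one library choice per op
plan = pick_opcodes(
    ["mul_f8f8f8", "mul_f8f8f8", "div_f8f8f8", "log_f8f8f8", "mul_f8f8f8"],
    [GNU, GNU, GNU, YEPPP, GNU],
)
print(plan)  # (40, 40, 41, 45, 40)
```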

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numexpr-3.0 proposal

2016-02-16 Thread Robert McLeod
On Mon, Feb 15, 2016 at 7:28 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:

>
>
> On Sun, Feb 14, 2016 at 11:19 PM, Robert McLeod <robbmcl...@gmail.com>
> wrote:
>
>>
>> 4.) I took a stab at converting from distutils to setuptools but this
>> seems challenging with numpy as a dependency. I wonder if anyone has tried
>> monkey-patching so that setup.py build_ext uses distutils and then pass the
>> interpreter.pyd/so as a data file, or some other such chicanery?
>>
>
> Not sure what you mean, since numexpr already uses setuptools:
> https://github.com/pydata/numexpr/blob/master/setup.py#L22. What is the
> real goal you're trying to achieve?
>
> This monkeypatching is a bad idea:
> https://github.com/robbmcleod/numexpr/blob/numexpr-3.0/setup.py#L19. Both
> setuptools and numpy.distutils already do that, and that's already one too
> many. So you definitely don't want to add a third place. You can use the
> -j (--parallel) flag to numpy.distutils instead, see
> http://docs.scipy.org/doc/numpy-dev/user/building.html#parallel-builds
>
> Ralf
>

Dear Ralf,

Yes, this appears to be a bad idea.  I was just trying to think about if I
could use the more object-oriented approach that I am familiar with in
setuptools to easily build wheels for Pypi.  Thanks for the comments and
links; I didn't know I could parallelize the numpy build.

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestion: special-case np.array(range(...)) to be faster

2016-02-15 Thread Robert Kern
On Mon, Feb 15, 2016 at 4:24 PM, Jeff Reback <jeffreb...@gmail.com> wrote:
>
> just an FYI.
>
> pandas implemented a RangeIndex in upcoming 0.18.0, mainly for memory
savings,
> see here, similar to how python range/xrange work.
>
> though there are substantial perf benefits, mainly with set operations,
see here
> though didn't officially benchmark these.

Since it is a numpy-aware object (unlike the builtins), you can (and have,
if I'm reading the code correctly) implement __array__() such that it does
the correctly performant thing and call np.arange(). RangeIndex won't be
adversely impacted by retaining the status quo.
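
For anyone unfamiliar with the hook being referred to, a minimal illustration (not
pandas' actual RangeIndex) of how a lazy range-like object can defer to np.arange()
whenever numpy needs a concrete array:

```python
import numpy as np

class LazyRange:
    # Minimal illustration only: store start/stop/step and materialize through
    # np.arange() whenever numpy asks for a concrete array.
    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step

    def __array__(self, dtype=None):
        return np.arange(self.start, self.stop, self.step, dtype=dtype)

r = LazyRange(0, 10)
print(np.asarray(r))      # [0 1 2 3 4 5 6 7 8 9], built by np.arange()
print(np.add(r, 1)[:3])   # ufuncs also trigger the conversion: [1 2 3]
```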

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Numexpr-3.0 proposal

2016-02-14 Thread Robert McLeod
Hello everyone,

I've done some work on making a new version of Numexpr that would fix some
of the limitations of the original virtual machine with regards to data
types and operation/function count. Basically I re-wrote the Python and C
sides to use 4-byte words, instead of null-terminated strings, for
operations and passing types.  This means the number of operations and
types isn't significantly limited anymore.

Francesc Alted suggested I should come here and get some advice from the
community. I wrote a short proposal on the Wiki here:

https://github.com/pydata/numexpr/wiki/Numexpr-3.0-Branch-Overview

One can see my branch here:

https://github.com/robbmcleod/numexpr/tree/numexpr-3.0

If anyone has any comments they'd be welcome. Questions from my side for
the group:

1.) Numpy casting: I downloaded the Numpy source and after browsing it
seems the best approach is probably to just use
numpy.core.numerictypes.find_common_type?

2.) Can anyone foresee any issues with casting built-in Python types (i.e.
float and integer) to their OS dependent numpy equivalents? Numpy already
seems to do this.

3.) Is anyone enabling the Intel VML library? There are a number of
comments in the code that suggest it's not accelerating the code. It also
seems to cause problems with bundling numexpr with cx_freeze.

4.) I took a stab at converting from distutils to setuptools but this seems
challenging with numpy as a dependency. I wonder if anyone has tried
monkey-patching so that setup.py build_ext uses distutils and then pass the
interpreter.pyd/so as a data file, or some other such chicanery?

(I was going to ask about attaching a debugger, but I just noticed:
https://wiki.python.org/moin/DebuggingWithGdb   )

Ciao,

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcl...@unibas.ch
robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
robbmcl...@gmail.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Hook in __init__.py to let distributors patch numpy

2016-02-12 Thread Robert Kern
I would add a numpy/_distributor_init.py module and unconditionally import
it in the __init__.py. It's contents in our upstream sources would just be
a docstring:

"""Distributors! Put your initialization code here!
"""

One important technical benefit is that the unconditional import won't hide
ImportErrors in the distributor's code.
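
As a purely hypothetical example of the kind of thing a distributor might place in
that module (here, the Windows SSE2 check that motivated the thread; the flag value
and message are illustrative, not an actual shipped file):

```python
# Hypothetical contents a wheel builder might put in numpy/_distributor_init.py;
# upstream NumPy would ship only a docstring here.
import sys

if sys.platform == "win32":
    import ctypes
    # PF_XMMI64_INSTRUCTIONS_AVAILABLE == 10 asks Windows whether SSE2 is present.
    if not ctypes.windll.kernel32.IsProcessorFeaturePresent(10):
        raise ImportError("This build of numpy requires a CPU with SSE2 support.")
```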

On Fri, Feb 12, 2016 at 1:19 AM, Matthew Brett <matthew.br...@gmail.com>
wrote:

> Hi,
>
> Over at https://github.com/numpy/numpy/issues/5479 we're discussing
> Windows wheels.
>
> On thing that we would like to be able to ship Windows wheels, is to
> be able to put some custom checks into numpy when you build the
> wheels.
>
> Specifically, for Windows, we're building on top of ATLAS BLAS /
> LAPACK, and we need to check that the system on which the wheel is
> running, has SSE2 instructions, otherwise we know ATLAS will crash
> (almost everybody does have SSE2 these days).
>
> The way I propose we do that, is this patch here:
>
> https://github.com/numpy/numpy/pull/7231
>
> diff --git a/numpy/__init__.py b/numpy/__init__.py
> index 0fcd509..ba3ba16 100644
> --- a/numpy/__init__.py
> +++ b/numpy/__init__.py
> @@ -190,6 +190,12 @@ def pkgload(*packages, **options):
>  test = testing.nosetester._numpy_tester().test
>  bench = testing.nosetester._numpy_tester().bench
>
> +# Allow platform-specific build to intervene in numpy init
> +try:
> +from . import _distributor_init
> +except ImportError:
> +pass
> +
>  from . import core
>  from .core import *
>  from . import compat
>
> So, numpy __init__.py looks for a module `_distributor_init`, in which
> the distributor might have put custom code to do any checks and
> initialization needed for the particular platform.  We don't by
> default ship a `_distributor_init.py` but leave it up to packagers to
> generate this when building binaries.
>
> Does that sound like a sensible approach to y'all?
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy 1.11.0b2 released

2016-02-06 Thread Robert T. McGibbon
> (we've had a few recent issues with libgfortran accidentally missing as a
requirement of scipy).

On this topic, you may be able to get some milage out of adapting
pypa/auditwheel, which can load
up extension module `.so` files inside a wheel (or conda package) and walk
the shared library dependency
tree like the runtime linker (using pyelftools), and check whether things
are going to resolve properly and
where shared libraries are loaded from.

Something like that should be able to, with minimal adaptation to use the
conda dependency resolver,
check that a conda package properly declares all of the shared library
dependencies it actually needs.
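
For a flavour of the first step such a tool takes, here is a rough sketch (assuming
pyelftools is installed; the filename is made up) of reading the DT_NEEDED entries
that a compiled extension expects the system, or its declared dependencies, to
provide:

```python
from elftools.elf.elffile import ELFFile  # pyelftools

def needed_libraries(path):
    # List the shared libraries a compiled extension names in its dynamic section.
    with open(path, "rb") as f:
        dynamic = ELFFile(f).get_section_by_name(".dynamic")
        if dynamic is None:
            return []
        return [tag.needed for tag in dynamic.iter_tags()
                if tag.entry.d_tag == "DT_NEEDED"]

print(needed_libraries("multiarray.cpython-35m-x86_64-linux-gnu.so"))
# e.g. ['libopenblas.so.0', 'libm.so.6', 'libpthread.so.0', 'libc.so.6']
```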

-Robert

On Sat, Feb 6, 2016 at 3:42 PM, Michael Sarahan <msara...@gmail.com> wrote:

> FWIW, we (Continuum) are working on a CI system that builds conda
> recipes.  Part of this is testing not only individual packages that change,
> but also any downstream packages that are also in the repository of
> recipes.  The configuration for this is in
> https://github.com/conda/conda-recipes/blob/master/.binstar.yml and the
> project doing the dependency detection is in
> https://github.com/ContinuumIO/ProtoCI/
>
> This is still being established (particularly, provisioning build
> workers), but please talk with us if you're interested.
>
> Chris, it may still be useful to use docker here (perhaps on the build
> worker, or elsewhere), also, as the distinction between build machines and
> user machines is important to make.  Docker would be great for making sure
> that all dependency requirements are met on end-user systems (we've had a
> few recent issues with libgfortran accidentally missing as a requirement of
> scipy).
>
> Best,
> Michael
>
> On Sat, Feb 6, 2016 at 5:22 PM Chris Barker <chris.bar...@noaa.gov> wrote:
>
>> On Fri, Feb 5, 2016 at 3:24 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> On Fri, Feb 5, 2016 at 1:16 PM, Chris Barker <chris.bar...@noaa.gov>
>>> wrote:
>>>
>>
>>
>>> >> > If we set up a numpy-testing conda channel, it could be used to
>>> cache
>>> >> > binary builds for all he versions of everything we want to test
>>> >> > against.
>>>
>>   Anaconda doesn't always have the
>>> > latest builds of everything.
>>
>>
>> OK, this may be more or less helpful, depending on what we want to built
>> against. But a conda environment (maybe tied to a custom channel) really
>> does make  a nice contained space for testing that can be set up fast on a
>> CI server.
>>
>> If whoever is setting up a test system/matrix thinks this would be
>> useful, I'd be glad to help set it up.
>>
>> -Chris
>>
>>
>>
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR(206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115   (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
-Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-21 Thread Robert Kern
On Thu, Jan 21, 2016 at 7:06 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:
>
> There doesn't seem to be much of a consensus on the way to go, so leaving
things as they are and have been seems the wisest choice for now, thanks
for all the feedback. I will work with Greg on documenting the status quo
properly.

Ugh. Be careful in documenting the way things currently work. No one
intended for it to work that way! No one should rely on high
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-21 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:35 PM, Sebastian Berg <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2016-01-19 at 16:28 +, G Young wrote:
> > In rand range, it raises an exception if low >= high.
> >
> > I should also add that AFAIK enforcing low >= high with floats is a
> > lot trickier than it is for integers.  I have been knee-deep in
> > corner cases for some time with randint where numbers that are
> > visually different are cast as the same number by numpy due to
> > rounding and representation issues.  That situation only gets worse
> > with floats.
> >
>
> Well, actually random.uniform docstring says:
>
> Get a random number in the range [a, b) or [a, b] depending on
> rounding.

Which docstring are you looking at? The current one says [low, high)

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.uniform.html#numpy.random.uniform

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-21 Thread Robert McGibbon
Hi all,

Just as a heads up: Nathaniel and I wrote a draft PEP on binary linux
wheels that is now being discussed on distutils-sig, so you can check that
out and participate in the conversation if you're interested.

- PEP on python.org: https://www.python.org/dev/peps/pep-0513/
- PEP on github with some typos fixed:
https://github.com/manylinux/manylinux/blob/master/pep-513.rst
- Email archive:
https://mail.python.org/pipermail/distutils-sig/2016-January/027997.html

-Robert

On Tue, Jan 19, 2016 at 10:05 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:

>
>
> On Tue, Jan 19, 2016 at 5:57 PM, Chris Barker - NOAA Federal <
> chris.bar...@noaa.gov> wrote:
>
>>
>> > 2) continue to support those users fairly poorly, and at substantial
>> > ongoing cost
>>
>> I'm curious what the cost is for this poor support -- throw the source
>> up on PyPi, and we're done. The cost comes in when trying to build
>> binaries...
>>
>
> I'm sure Nathaniel means the cost to users of failed installs and of numpy
> losing users because of that, not the cost of building binaries.
>
> > Option 1 would require overwhelming consensus of the community, which
>> > for better or worse is presumably not going to happen while
>> > substantial portions of that community are still using pip/PyPI.
>>
>> Are they? Which community are we talking about? The community I'd like
>> to target are web developers that aren't doing what they think of as
>> "scientific" applications, but could use a little of the SciPy stack.
>> These folks are committed to pip, and are very reluctant to introduce
>> a difficult dependency.  Binary wheels would help these folks, but
>> that is not a community that exists yet ( or it's small, anyway)
>>
>> All that being said, I'd be happy to see binary wheels for the core
>> SciPy stack on PyPi. It would be nice for people to be able to do a
>> bit with Numpy or pandas, it MPL, without having to jump ship to a
>> whole new way of doing things.
>>
>
> This is indeed exactly why we need binary wheels. Efforts to provide those
> will not change our strong recommendation to our users that they're better
> off using a scientific Python distribution.
>
> Ralf
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:40 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> On Tue, Jan 19, 2016 at 10:36 AM, Robert Kern <robert.k...@gmail.com>
wrote:
>>
>> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>> >
>>
>> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
>> >>
>> >> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
>> >
>> > randint is not closed on the high end. The now deprecated
random_integers is the function that does that.
>> >
>> > For floats, it's good to have various interval options. For instance,
in generating numbers that will be inverted or have their log taken it is
good to avoid zero. However, the names 'low' and 'high' are misleading...
>>
>> They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)
>>
>> With floating point and general intervals, there is not really a good
way to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.
>
> Well, it is possible to make that happen if one is careful or directly
sets the bits in ieee types...

For the unit interval, certainly. For general bounds, I am not so sure.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>

> On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
>>
>> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
>
> randint is not closed on the high end. The now deprecated random_integers
is the function that does that.
>
> For floats, it's good to have various interval options. For instance, in
generating numbers that will be inverted or have their log taken it is good
to avoid zero. However, the names 'low' and 'high' are misleading...

They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)

With floating point and general intervals, there is not really a good way
to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Behavior of np.random.uniform

2016-01-19 Thread Robert Kern
On Tue, Jan 19, 2016 at 5:36 PM, Robert Kern <robert.k...@gmail.com> wrote:
>
> On Tue, Jan 19, 2016 at 5:27 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
> >
>
> > On Tue, Jan 19, 2016 at 9:23 AM, Chris Barker - NOAA Federal <
chris.bar...@noaa.gov> wrote:
> >>
> >> What does the standard lib do for rand range? I see that randint Is
closed on both ends, so order doesn't matter, though if it raises for b<a,
then that's a precedent we could follow.
> >
> > randint is not closed on the high end. The now deprecated
random_integers is the function that does that.
> >
> > For floats, it's good to have various interval options. For instance,
in generating numbers that will be inverted or have their log taken it is
good to avoid zero. However, the names 'low' and 'high' are misleading...
>
> They are correctly leading the users to the manner in which the author
intended the function to be used. The *implementation* is misleading by
allowing users to do things contrary to the documented intent. ;-)
>
> With floating point and general intervals, there is not really a good way
to guarantee that the generated results avoid the "open" end of the
specified interval or even stay *within* that interval. This function is
definitely not intended to be used as `uniform(closed_end, open_end)`.

There are special cases that *can* be implemented and are worth doing so as
they are building blocks for other distributions that do need to avoid 0 or
1 as you say. Full-featured RNG suites do offer these:

  [0, 1]
  [0, 1)
  (0, 1]
  (0, 1)
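
As a tiny illustration of how two of the variants relate (not part of
numpy.random's API), the [0, 1) samples numpy provides can be flipped to exclude
zero:

```python
import numpy as np

u = np.random.random(5)   # samples drawn from [0, 1)
open_at_zero = 1.0 - u    # the same draws mapped into (0, 1]
```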

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Software Capabilities of NumPy in Our Tensor Survey Paper

2016-01-15 Thread Robert Kern
On Fri, Jan 15, 2016 at 5:30 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Jan 15, 2016 8:36 AM, "Li Jiajia" <jiaji...@gatech.edu> wrote:
> >
> > Hi all,
> > I’m a PhD student in Georgia Tech. Recently, we’re working on a survey
paper about tensor algorithms: basic tensor operations, tensor
decomposition and some tensor applications. We are making a table to
compare the capabilities of different software and planning to include
NumPy. We’d like to make sure these parameters are correct to make a fair
compare. Although we have looked into the related documents, please help us
to confirm these. Besides, if you think there are more features of your
software and a more preferred citation, please let us know. We’ll consider
to update them. We want to show NumPy supports tensors, and we also include
"scikit-tensor” in our survey, which is based on NumPy.
> > Please let me know any confusion or any advice!
> > Thanks a lot! :-)
> >
> > Notice:
> > 1. “YES/NO” to show whether or not the software supports the operation
or has the feature.
> > 2. “?” means we’re not sure of the feature, and please help us out.
> > 3. “Tensor order” means the maximum number of tensor dimensions that
users can do with this software.
> > 4. For computational cores,
> > 1) "Element-wise Tensor Operation (A * B)” includes element-wise
add/minus/multiply/divide, also Kronecker, outer and Katri-Rao products. If
the software contains one of them, we mark “YES”.
> > 2) “TTM” means tensor-times-matrix multiplication. We distinguish TTM
from tensor contraction. If the software includes tensor contraction, it
can also support TTM.
> > 3) For “MTTKRP”, we know most software can realize it through the above
two operations. We mark it “YES”, only if an specified optimization for the
whole operation.
>
> NumPy has support for working with multidimensional tensors, if you like,
but it doesn't really use the tensor language and notation (preferring
instead to think in terms of "arrays" as a somewhat more computationally
focused and less mathematically focused conceptual framework).
>
> Which is to say that I actually have no idea what all those jargon terms
you're asking about mean :-) I am suspicious that NumPy supports more of
those operations than you have marked, just under different names/notation,
but really can't tell either way for sure without knowing what exactly they
are.

In particular check if your operations can be expressed with einsum()

http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.einsum.html
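
For instance, a couple of the operations asked about above (tensor-times-matrix
along one mode, and an outer product) map directly onto einsum notation; this is
only an illustration, not part of the survey:

```python
import numpy as np

T = np.random.random((3, 4, 5))   # a third-order tensor
M = np.random.random((5, 6))

ttm = np.einsum('ijk,kl->ijl', T, M)            # tensor-times-matrix on the last mode
outer = np.einsum('i,j->ij', T[0, 0], M[:, 0])  # outer product of two fibers
print(ttm.shape, outer.shape)                   # (3, 4, 6) (5, 5)
```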

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-15 Thread Robert McGibbon
On Fri, Jan 15, 2016 at 11:56 AM, Travis Oliphant <tra...@continuum.io>
wrote:
>
>
> I still submit that this is not the best use of time.   Conda *already*
solves the problem.My sadness is that people keep working to create an
ultimately inferior solution rather than just help make a better solution
more accessible. People mistakenly believe that wheels and conda
packages are equivalent.  They are not.   If they were we would not have
created conda.   We could not do what was necessary with wheels and
contorting wheels to become conda packages was and still is a lot more
work.Now, obviously, it's just code and you can certainly spend effort
and time to migrate wheels so that they function equivalently to conda
packages --- but what is the point, really?
>
> Why don't we work together to make the open-source conda project and
open-source conda packages more universally accessible?

 The factors that motivate my interest in making wheels for Linux (i.e. the
proposed manylinux tag) work on PyPI are

- All (new) Python installations come with pip. As a package author writing
documentation, I count on users having pip installed, but I can't count on
conda.
- I would like to see Linux have feature parity with OS X and Windows with
respect to pip and PyPI.
- I want the PyPA tools like pip to be as good as possible.
- I'm confident that the manylinux proposal will work, and it's very
straightforward.

-Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Get rid of special scalar arithmetic

2016-01-13 Thread Robert Kern
On Wed, Jan 13, 2016 at 5:18 AM, Charles R Harris <charlesr.har...@gmail.com>
wrote:
>
> Hi All,
>
> I've opened issue #7002, reproduced below, for discussion.
>>
>> Numpy umath has a file scalarmath.c.src that implements scalar
arithmetic using special functions that are about 10x faster than the
equivalent ufuncs.
>>
>> In [1]: a = np.float64(1)
>>
>> In [2]: timeit a*a
>> 1000 loops, best of 3: 69.5 ns per loop
>>
>> In [3]: timeit np.multiply(a, a)
>> 100 loops, best of 3: 722 ns per loop
>>
>> I contend that in large programs this improvement in execution time is
not worth the complexity and maintenance overhead; it is unlikely that
scalar-scalar arithmetic is a significant part of their execution time.
Therefore I propose to use ufuncs for all of the scalar-scalar arithmetic.
This would also bring the benefits of __numpy_ufunc__ to scalars with
minimal effort.
>
> Thoughts?

Not all important-to-optimize programs are large in our field; interactive
use is rampant. The scalar optimizations weren't added speculatively:
people noticed that their Numeric code ran much slower under numpy and were
reluctant to migrate. I was forever responding on comp.lang.python, "It's
because scalar arithmetic hasn't been optimized yet. We know how to do it,
we just need a volunteer to do the work. Contributions gratefully
accepted!" The most critical areas tended to be optimization where you are
often working with implicit scalars that pop out in the optimization loop.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-11 Thread Robert McGibbon
> And in any case we have lots of users who don't use conda and are thus
doomed to support both ecosystems regardless, so we might as well make the
best of it :-).

Yes, this is the key. Conda is a great tool for a lot of users / use cases,
but it's not for everyone.

Anyways, I think I've made a pretty good start on the tooling for a wheel
ABI tag for an LSB-style base system that represents a common set of shared
libraries and symbols versions provided by "many" linuxes (previously
discussed by Nathaniel here:
https://code.activestate.com/lists/python-distutils-sig/26272/)

-Robert

On Mon, Jan 11, 2016 at 5:29 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Jan 11, 2016 3:54 PM, "Chris Barker" <chris.bar...@noaa.gov> wrote:
> >
> > On Mon, Jan 11, 2016 at 11:02 AM, David Cournapeau <courn...@gmail.com>
> wrote:
> >>>
> >>> If we get all that worked out, we still haven't made any progress
> toward the non-standard libs that aren't python. This is the big "scipy
> problem" -- fortran, BLAS, hdf, ad infinitum.
> >>>
> >>> I argued for years that we could build binary wheels that hold each of
> these, and other python packages could depend on them, but pypa never
> seemed to like that idea.
> >>
> >>
> >> I don't think that's an accurate statement. There are issues to solve
> around this, but I did not encounter push back, either on the ML or face to
> face w/ various pypa members at Pycon, etc... There may be push backs for a
> particular detail, but making "pip install scipy" or "pip install
> matplotlib" a reality on every platform is something everybody agrees o
> >
> >
> > sure, everyone wants that. But when it gets deeper, they don't want to
> have a bunc hof pip-installable binary wheels that are simply clibs
> re-packaged as a dependency. And, then you have the problelm of those being
> "binary wheel" dependencies, rather than "package" dependencies.
> >
> > e.g.:
> >
> > this particular build of pillow depends on the libpng and libjpeg
> wheels, but the Pillow package, in general, does not. And you would have
> different dependencies on Windows, and OS-X, and Linux.
> >
> > pip/wheel simply was not designed for that, and I didn't get any warm
> and fuzzy feelings from dist-utils sig that it ever would. And again,
> then you are re-designing conda.
>
> I agree that talking about such things on distutils-sig tends to elicit a
> certain amount of puzzled incomprehension, but I don't think it matters --
> wheels already have everything you need to support this. E.g. wheels for
> different platforms can trivially have different dependencies. (They even
> go to some lengths to make sure this is possible for pure python packages
> where the same wheel can be used on multiple platforms.) When distributing
> a library-in-a-wheel then you need a little bit of hackishness to make sure
> the runtime loader can find the library, which conda would otherwise handle
> for you, but AFAICT it's like 10 lines of code or something.
>
> And in any case we have lots of users who don't use conda and are thus
> doomed to support both ecosystems regardless, so we might as well make the
> best of it :-).
>
> -n
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-11 Thread Robert McGibbon
I started working on a tool for checking linux wheels for "manylinux"
compatibility, and fixing them up if possible, based on the same ideas as
Matthew Brett's delocate <https://github.com/matthew-brett/delocate> for OS
X. Current WIP code, if anyone wants to help / throw penuts, is here:
https://github.com/rmcgibbo/deloc8.

It's currently fairly modest and can only list non-whitelisted external
shared library dependencies, and verify that sufficiently old versioned
symbols for glibc and its ilk are used.
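
A rough sketch of what the versioned-symbol check amounts to (the cutoff and
filename below are only illustrative): scan a binary's GLIBC_x.y version references
and flag anything newer than the oldest baseline you intend to support:

```python
import re
import subprocess

def too_new_glibc(path, cutoff=(2, 5)):
    # Flag GLIBC_x.y version references newer than the chosen baseline (2.5 here,
    # roughly the CentOS 5 era); relies on binutils' objdump being on the PATH.
    out = subprocess.check_output(["objdump", "-T", path]).decode()
    versions = {(int(a), int(b)) for a, b in re.findall(r"GLIBC_(\d+)\.(\d+)", out)}
    return sorted(v for v in versions if v > cutoff)

print(too_new_glibc("_ext.cpython-35m-x86_64-linux-gnu.so"))
# [] means every glibc symbol reference is at or below the baseline
```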

-Robert

On Sun, Jan 10, 2016 at 1:19 AM, Robert McGibbon <rmcgi...@gmail.com> wrote:

> Hi all,
>
> I followed Nathaniel's advice and restricted the search down to the
> packages included in the Anaconda release (as opposed to all of the
> packages in their repositories), and fixed some technical issues with the
> way I was doing the analysis.
>
> The new list is much smaller. Here are the shared libraries that the
> components of Anaconda require that the system provides on Linux 64:
>
> libpanelw.so.5, libncursesw.so.5, libgcc_s.so.1, libstdc++.so.6,
> libm.so.6, libdl.so.2, librt.so.1, libcrypt.so.1, libc.so.6, libnsl.so.1,
> libutil.so.1, libpthread.so.0, libX11.so.6, libXext.so.6,
> libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0,
> libXrender.so.1, libICE.so.6, libSM.so.6, libGL.so.1.
>
> Many of these libraries are required simply for the interpreter. The
> remaining ones that aren't required by the interpreter are, but are
> required by some other package in Anaconda are:
>
> libgcc_s.so.1, libstdc++.so.6, libXext.so.6, libSM.so.6,
> libgthread-2.0.so.0, libgobject-2.0.so.0, libglib-2.0.so.0, libICE.so.6,
> libXrender.so.1, and libGL.so.1.
>
> Most of these are parts of X11 required by Qt (
> http://doc.qt.io/qt-5/linux-requirements.html).
>
> -Robert
>
>
>
> On Sat, Jan 9, 2016 at 4:42 PM, Robert McGibbon <rmcgi...@gmail.com>
> wrote:
>
>> > Maybe a better approach would be to look at what libraries are used on
>> by an up-to-date default Anaconda install (on the assumption that this
>> is the best tested configuration)
>>
>> That's not a bad idea. I also have a couple other ideas about how to
>> filter
>> this based on using debian popularity-contests and the package graph. I
>> will report back when I have more info.
>>
>> -Robert
>>
>> On Sat, Jan 9, 2016 at 3:04 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>>> On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon <rmcgi...@gmail.com>
>>> wrote:
>>> > Hi all,
>>> >
>>> > I went ahead and tried to collect a list of all of the libraries that
>>> could
>>> > be considered to constitute the "base" system for linux-64. The
>>> strategy I
>>> > used was to leverage off the work done by the folks at Continuum by
>>> > searching through their pre-compiled binaries from
>>> > https://repo.continuum.io/pkgs/free/linux-64/ to find shared
>>> libraries that
>>> > were dependened on (according to ldd)  that were not accounted for by
>>> the
>>> > declared dependencies that each package made known to the conda package
>>> > manager.
>>> >
>>> > The full list of these system libraries, sorted in from
>>> > most-commonly-depend-on to rarest, is below. There are 158 of them.
>>> [...]
>>> > So it's not perfect. But it might be a useful starting place.
>>>
>>> Unfortunately, yeah, it looks like there's a lot of false positives in
>>> here :-(. For example your list contains liblzma and libsqlite, but
>>> both of these are shipped as dependencies of python itself. So
>>> probably someone just forgot to declare the dependency explicitly, but
>>> got away with it because the libraries were pulled in anyway.
>>>
>>> Maybe a better approach would be to look at what libraries are used on
>>> by an up-to-date default Anaconda install (on the assumption that this
>>> is the best tested configuration), and then erase from the list all
>>> libraries that are shipped by this configuration (ignoring declared
>>> dependencies since those seem to be unreliable)? It's better to be
>>> conservative here, since the end goal is to come up with a list of
>>> external libraries that we're confident have actually been tested for
>>> compatibility by lots and lots of different users.
>>>
>>> -n
>>>
>>> --
>>> Nathaniel J. Smith -- http://vorpus.org
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-10 Thread Robert McGibbon
> > Right. There's a small problem which is that the base linux system
>> isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries
> > that you're allowed to link to: ...", where that list is empirically
> > chosen to include only stuff that really is installed on ~all linux
>> machines and for which the ABI really has been stable in practice over
> > multiple years and distros (so e.g. no OpenSSL).
> >
> > Does anyone know who maintains Anaconda's linux build environment?

> I strongly suspect it was originally set up by Aaron Meurer. Who
maintains it now that he is no longer at Continuum is a good question.

From looking at all of the external libraries referenced by binaries included
in Anaconda and the conda repos, I am not confident that they have a totally
strict policy here, or at least not one that is enforced by tooling. The
sonames I listed here
<https://mail.scipy.org/pipermail/numpy-discussion/2016-January/074602.html>
cover all of the external dependencies used by the latest Anaconda release,
but earlier releases and other conda-installable packages from the default
channel are not so strict.

-Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-10 Thread Robert McGibbon
Hi all,

I followed Nathaniel's advice and restricted the search down to the
packages included in the Anaconda release (as opposed to all of the
packages in their repositories), and fixed some technical issues with the
way I was doing the analysis.

The new list is much smaller. Here are the shared libraries that the
components of Anaconda require that the system provides on Linux 64:

libpanelw.so.5, libncursesw.so.5, libgcc_s.so.1, libstdc++.so.6, libm.so.6,
libdl.so.2, librt.so.1, libcrypt.so.1, libc.so.6, libnsl.so.1,
libutil.so.1, libpthread.so.0, libX11.so.6, libXext.so.6,
libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0,
libXrender.so.1, libICE.so.6, libSM.so.6, libGL.so.1.

Many of these libraries are required simply for the interpreter. The
remaining ones that aren't required by the interpreter are, but are
required by some other package in Anaconda are:

libgcc_s.so.1, libstdc++.so.6, libXext.so.6, libSM.so.6,
libgthread-2.0.so.0, libgobject-2.0.so.0, libglib-2.0.so.0, libICE.so.6,
libXrender.so.1, and libGL.so.1.

Most of these are parts of X11 required by Qt (
http://doc.qt.io/qt-5/linux-requirements.html).

-Robert



On Sat, Jan 9, 2016 at 4:42 PM, Robert McGibbon <rmcgi...@gmail.com> wrote:

> > Maybe a better approach would be to look at what libraries are used on
> by an up-to-date default Anaconda install (on the assumption that this
> is the best tested configuration)
>
> That's not a bad idea. I also have a couple other ideas about how to filter
> this based on using debian popularity-contests and the package graph. I
> will report back when I have more info.
>
> -Robert
>
> On Sat, Jan 9, 2016 at 3:04 PM, Nathaniel Smith <n...@pobox.com> wrote:
>
>> On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon <rmcgi...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > I went ahead and tried to collect a list of all of the libraries that
>> could
>> > be considered to constitute the "base" system for linux-64. The
>> strategy I
>> > used was to leverage off the work done by the folks at Continuum by
>> > searching through their pre-compiled binaries from
>> > https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries
>> that
>> > were dependened on (according to ldd)  that were not accounted for by
>> the
>> > declared dependencies that each package made known to the conda package
>> > manager.
>> >
>> > The full list of these system libraries, sorted in from
>> > most-commonly-depend-on to rarest, is below. There are 158 of them.
>> [...]
>> > So it's not perfect. But it might be a useful starting place.
>>
>> Unfortunately, yeah, it looks like there's a lot of false positives in
>> here :-(. For example your list contains liblzma and libsqlite, but
>> both of these are shipped as dependencies of python itself. So
>> probably someone just forgot to declare the dependency explicitly, but
>> got away with it because the libraries were pulled in anyway.
>>
>> Maybe a better approach would be to look at what libraries are used on
>> by an up-to-date default Anaconda install (on the assumption that this
>> is the best tested configuration), and then erase from the list all
>> libraries that are shipped by this configuration (ignoring declared
>> dependencies since those seem to be unreliable)? It's better to be
>> conservative here, since the end goal is to come up with a list of
>> external libraries that we're confident have actually been tested for
>> compatibility by lots and lots of different users.
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- http://vorpus.org
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
> do those packages use ld --as-needed for linking?

Is it possible to check this? I mean, there are over 7000 packages that I
check. I don't know how they were all built.

It's totally possible for many of them to be unused. A reasonably common
thing might be that packages use ctypes or dlopen to dynamically load
shared libraries that are actually just optional (and catch the error and
recover gracefully if the library can't be loaded).
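
For reference, that optional-dependency pattern usually looks something like this
(the library name here is just an example):

```python
import ctypes

try:
    libgl = ctypes.CDLL("libGL.so.1")   # optional: only needed for GL-backed features
    HAVE_GL = True
except OSError:                         # library absent on this system; degrade gracefully
    libgl, HAVE_GL = None, False
```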

-Robert

On Sat, Jan 9, 2016 at 4:20 AM, Julian Taylor <jtaylor.deb...@googlemail.com
> wrote:

> On 09.01.2016 12:52, Robert McGibbon wrote:
> > Hi all,
> >
> > I went ahead and tried to collect a list of all of the libraries that
> > could be considered to constitute the "base" system for linux-64. The
> > strategy I used was to leverage off the work done by the folks at
> > Continuum by searching through their pre-compiled binaries
> > from https://repo.continuum.io/pkgs/free/linux-64/ to find shared
> > libraries that were dependened on (according to ldd)  that were not
> > accounted for by the declared dependencies that each package made known
> > to the conda package manager.
> >
>
> do those packages use ld --as-needed for linking?
> there are a lot libraries in that list that I highly doubt are directly
> used by the packages.
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
",
"numpy-1.8.2-py34_0", "numpy-1.9.0-py27_0", "numpy-1.9.0-py34_0",
"numpy-1.9.1-py27_0", "numpy-1.9.1-py34_0", "numpy-1.9.2-py27_0",
"numpy-1.9.2-py34_0"].

Note that this list of numpy versions doesn't include the latest ones --
all of the numpy-1.10 binaries made by Continuum pick up libgfortan from a
conda package and don't depend on it being provided by the system. Also,
the final '_0' or '_1' segment of many of these package names is the build
number, which is to make a new release of the same release of a package,
usually because of a packaging problem. So many of these packages were
probably built incorrectly and superseded by new builds with a higher build
number.

So it's not perfect. But it might be a useful starting place.

-Robert
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Defining a base linux-64 environment [was: Should I use pip install numpy in linux?]

2016-01-09 Thread Robert McGibbon
> Maybe a better approach would be to look at what libraries are used on
by an up-to-date default Anaconda install (on the assumption that this
is the best tested configuration)

That's not a bad idea. I also have a couple other ideas about how to filter
this based on using debian popularity-contests and the package graph. I
will report back when I have more info.

-Robert

On Sat, Jan 9, 2016 at 3:04 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Sat, Jan 9, 2016 at 3:52 AM, Robert McGibbon <rmcgi...@gmail.com>
> wrote:
> > Hi all,
> >
> > I went ahead and tried to collect a list of all of the libraries that
> could
> > be considered to constitute the "base" system for linux-64. The strategy
> I
> > used was to leverage off the work done by the folks at Continuum by
> > searching through their pre-compiled binaries from
> > https://repo.continuum.io/pkgs/free/linux-64/ to find shared libraries
> that
> > were dependened on (according to ldd)  that were not accounted for by the
> > declared dependencies that each package made known to the conda package
> > manager.
> >
> > The full list of these system libraries, sorted in from
> > most-commonly-depend-on to rarest, is below. There are 158 of them.
> [...]
> > So it's not perfect. But it might be a useful starting place.
>
> Unfortunately, yeah, it looks like there's a lot of false positives in
> here :-(. For example your list contains liblzma and libsqlite, but
> both of these are shipped as dependencies of python itself. So
> probably someone just forgot to declare the dependency explicitly, but
> got away with it because the libraries were pulled in anyway.
>
> Maybe a better approach would be to look at what libraries are used on
> by an up-to-date default Anaconda install (on the assumption that this
> is the best tested configuration), and then erase from the list all
> libraries that are shipped by this configuration (ignoring declared
> dependencies since those seem to be unreliable)? It's better to be
> conservative here, since the end goal is to come up with a list of
> external libraries that we're confident have actually been tested for
> compatibility by lots and lots of different users.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-08 Thread Robert McGibbon
Does anyone know if there's been any movements with the PyPI folks on
allowing linux wheels to be uploaded?

I know you can never be certain what's provided by the distro, but it seems
like if Anaconda can solve the
cross-distro-binary-distribution-of-compiled-python-extensions problem, there
shouldn't be much that's technically different for Linux wheels.

-Robert

On Fri, Jan 8, 2016 at 9:12 AM, Matthew Brett <matthew.br...@gmail.com>
wrote:

> Hi,
>
> On Fri, Jan 8, 2016 at 4:28 PM, Yuxiang Wang <yw...@virginia.edu> wrote:
> > Dear Nathaniel,
> >
> > Gotcha. That's very helpful. Thank you so much!
> >
> > Shawn
> >
> > On Thu, Jan 7, 2016 at 10:01 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >> On Thu, Jan 7, 2016 at 6:18 PM, Yuxiang Wang <yw...@virginia.edu>
> wrote:
> >>> Dear all,
> >>>
> >>> I know that in Windows, we should use either Christoph's package or
> >>> Anaconda for MKL-optimized numpy. In Linux, the fortran compiler issue
> >>> is solved, so should I directly use pip install numpy to get numpy
> >>> with a reasonable BLAS library?
> >>
> >> pip install numpy should work fine; whether it gives you a reasonable
> >> BLAS library will depend on whether you have the development files for
> >> a reasonable BLAS library installed, and whether numpy's build system
> >> is able to automatically locate them. Generally this means that if
> >> you're on a regular distribution and remember to install a decent BLAS
> >> -dev or -devel package, then you'll be fine.
> >>
> >> On Debian/Ubuntu, 'apt install libopenblas-dev' is probably enough to
> >> ensure something reasonable happens.
> >>
> >> Anaconda is also an option on linux if you want MKL (or openblas).
>
> I wrote a page on using pip with Debian / Ubuntu here :
> https://matthew-brett.github.io/pydagogue/installing_on_debian.html
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-08 Thread Robert McGibbon
> Both Anaconda and Canopy build on a base default Linux system so that
> the built binaries will work on many Linux systems.

I think the base linux system is CentOS 5, and from my experience, this
approach seems to have worked very well. Those packages are compatible with
essentially all Linuxes that are more recent than CentOS 5 (which is
ancient). I have not heard of anyone complaining that the packages they
install through conda don't work on their CentOS 4 or Ubuntu 6.06 box. I
assume Python / pip is probably used on a wider diversity of linux flavors
than conda is, so I'm sure that binaries built on CentOS 5 won't work for
absolutely _every_ linux user, but it does seem to cover the substantial
majority of linux users.

Building redistributable linux binaries that work across a large number of
distros and distro
versions is definitely tricky. If you run ``python setup.py bdist_wheel``
on your Fedora Rawhide
box, you can't really expect the wheel to work for too many other linux
users. So given that, I
can see why PyPI would want to be careful about accepting Linux wheels.

But it seems like, if they made the upload command something like

```
twine upload numpy-1.9.2-cp27-none-linux_x86_64.whl \
--yes-yes-i-know-this-is-dangerous-but-i-know-what-i'm-doing
```

then this could potentially let packages like numpy serve their linux users
better without risking too much junk being uploaded to PyPI.

-Robert


On Fri, Jan 8, 2016 at 3:50 PM, Matthew Brett <matthew.br...@gmail.com>
wrote:

> Hi,
>
> On Fri, Jan 8, 2016 at 11:27 PM, Chris Barker <chris.bar...@noaa.gov>
> wrote:
> > On Fri, Jan 8, 2016 at 1:58 PM, Robert McGibbon <rmcgi...@gmail.com>
> wrote:
> >>
> >> I'm not sure if this is the right path for numpy or not,
> >
> >
> > probably not -- AFAICT, the PyPA folks aren't interested in solving the
> > problems we have in the scipy community -- we can tweak around the edges,
> > but we won't get there without a commitment to really solve the issues --
> > and if pip did that, it would essentially be conda -- no one wants to
> > re-implement conda.
>
> Well - as the OP was implying, it really should not be too difficult.
>
> We (here in Berkeley) have discussed how to do this for Linux,
> including (Nathaniel mainly) what would be sensible for pypi to do, in
> terms of platform labels.
>
> Both Anaconda and Canopy build on a base default Linux system so that
> the built binaries will work on many Linux systems.
>
> At the moment, Linux wheels have the platform tag of either linux_i686
> (32-bit) or linux_x86_64 - example filenames:
>
> numpy-1.9.2-cp27-none-linux_i686.whl
> numpy-1.9.2-cp27-none-linux_x86_64.whl
>
> Obviously these platform tags are rather useless, because they don't
> tell you very much about whether this wheel will work on your own
> system.
>
> If we started building Linux wheels on a base system like that of
> Anaconda or Canopy we might like another platform tag that tells you
> that this wheel is compatible with a wide range of systems.   So the
> job of negotiating with distutils-sig is trying to find a good name
> for this base system - we thought that 'manylinux' was a good one -
> and then put in a pull request to pip to recognize 'manylinux' as
> compatible when running pip install from a range of Linux systems.
>
> Cheers,
>
> Matthew
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-08 Thread Robert McGibbon
Well, it's always possible to copy the dependencies like libopenblas.so
into the wheel and fix up the RPATHs, similar to the way the Windows wheels
work.
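
As a minimal sketch of the idea (made-up paths; a real tool would also have
to rename the bundled library to avoid clashes and patch every extension
module in the wheel):

```
import shutil
import subprocess

# Hypothetical locations -- the real layout depends on the package.
shutil.copy("/usr/lib/libopenblas.so.0", "numpy/.libs/libopenblas.so.0")

# $ORIGIN makes the RPATH relative to the extension module that carries it,
# so the bundled copy is found no matter where the wheel gets installed.
subprocess.check_call(
    ["patchelf", "--set-rpath", "$ORIGIN/../.libs",
     "numpy/core/multiarray.so"]
)
```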

I'm not sure if this is the right path for numpy or not, but it seems like
something would be suitable for some projects with compiled extensions. But
it's categorically ruled out by the PyPI policy, IIUC.

Perhaps this is OT for this thread, and I should ask on distutils-sig.

-Robert

On Fri, Jan 8, 2016 at 12:12 PM, Oscar Benjamin <oscar.j.benja...@gmail.com>
wrote:

>
> On 8 Jan 2016 19:07, "Robert McGibbon" <rmcgi...@gmail.com> wrote:
> >
> > Does anyone know if there's been any movements with the PyPI folks on
> allowing linux wheels to be uploaded?
> >
> > I know you can never be certain what's provided by the distro, but it
> seems like if Anaconda can solve the
> cross-distro-binary-distribution-of-compiled-python-extensions problem,
> there shouldn't be much technically different for Linux wheels.
>
> Anaconda controls all of the dependent non-Python libraries which are
> outside of the pip/pypi ecosystem. Pip/wheel doesn't have that option until
> such libraries are packaged up for PyPI (e.g. pyopenblas).
>
> --
> Oscar
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-08 Thread Robert McGibbon
> Doesn't building on CentOS 5 also mean using a quite old version of gcc?

I have had pretty good luck using the (awesomely named) Holy Build Box
<http://phusion.github.io/holy-build-box/>, which is a CentOS 5 docker
image with a newer gcc version installed (but I guess the same old libc).
I'm not 100% sure how it works, but it's quite nice. For example, you can
use c++11 and still keep all the binary compatibility benefits of CentOS 5.

-Robert

On Fri, Jan 8, 2016 at 7:38 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Fri, Jan 8, 2016 at 7:17 PM, Nathan Goldbaum <nathan12...@gmail.com>
> wrote:
> > Doesn't building on CentOS 5 also mean using a quite old version of gcc?
>
> Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc
> 4.8 by using the Redhat Developer Toolset release (which is gcc +
> special backport libraries to let it generate RHEL5/CentOS5-compatible
> binaries). (I might have one or both of those version numbers slightly
> wrong.)
>
> > I've never tested this, but I've seen claims on the anaconda mailing
> list of
> > ~25% slowdowns compared to building from source or using system packages,
> > which was attributed to building using an older gcc that doesn't
> optimize as
> > well as newer versions.
>
> I'd be very surprised if that were a 25% slowdown in general, as
> opposed to a 25% slowdown on some particular inner loop that happened
> to neatly match some new feature in a new gcc (e.g. something where
> the new autovectorizer kicked in). But yeah, in general this is just
> an inevitable trade-off when it comes to distributing binaries: you're
> always going to pay some penalty for achieving broad compatibility as
> compared to artisanally hand-tuned binaries specialized for your
> machine's exact OS version, processor, etc. Not much to be done,
> really. At some point the baseline for compatibility will switch to
> "compile everything on CentOS 6", and that will be better but it will
> still be worse than binaries that target CentOS 7, and so on and so
> forth.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Should I use pip install numpy in linux?

2016-01-08 Thread Robert McGibbon
> Continuum and Enthought both have a whole list of packages beyond
> glibc that are safe enough to link to, including a bunch of ones that
> would be big pains to statically link everywhere (libX11, etc.).
> That's the useful piece of information that goes beyond just CentOS5 +
> RH devtools + static linking -- can't tell if the "Holy Build Box" has
> anything like that.

Probably-crazy Idea: One could reconstruct that list by downloading all of
https://repo.continuum.io/pkgs/free/linux-64/, untarring everything, and
running `ldd` on all of the binaries and .so files. Can't be that hard...
right?


-Robert

On Fri, Jan 8, 2016 at 8:03 PM, Nathaniel Smith <n...@pobox.com> wrote:

> On Fri, Jan 8, 2016 at 7:41 PM, Robert McGibbon <rmcgi...@gmail.com>
> wrote:
> >> Doesn't building on CentOS 5 also mean using a quite old version of gcc?
> >
> > I have had pretty good luck using the (awesomely named) Holy Build Box,
> > which is a CentOS 5 docker image with a newer gcc version installed (but
> I
> > guess the same old libc). I'm not 100% sure how it works, but it's quite
> > nice. For example, you can use c++11 and still keep all the binary
> > compatibility benefits of CentOS 5.
>
> They say they have gcc 4.8:
>
> https://github.com/phusion/holy-build-box#isolated-build-environment-based-on-docker-and-centos-5
> so I bet they're using RH's devtools gcc. This means that it works via
> the labor of some unsung programmers at RH who went through all the
> library changes between gcc 4.4 and 4.8, and put together a version of
> 4.8 that for every important symbol knows whether it's available in
> the old 4.4 libraries or not; for the ones that are, it dynamically
> links them; for the ones that aren't, it has a special static library
> that it pulls them out of. Like sewer cleaning, it's the kind of very
> impressive, incredibly valuable infrastructure work that I'm really
> glad someone does. Someone else who's not me...
>
> Continuum and Enthought both have a whole list of packages beyond
> glibc that are safe enough to link to, including a bunch of ones that
> would be big pains to statically link everywhere (libX11, etc.).
> That's the useful piece of information that goes beyond just CentOS5 +
> RH devtools + static linking -- can't tell if the "Holy Build Box" has
> anything like that.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] deprecate random.random_integers

2016-01-04 Thread Robert Kern
On Sun, Jan 3, 2016 at 11:51 PM, G Young <gfyoun...@gmail.com> wrote:
>
> Hello all,
>
> In light of the discussion in #6910, I have gone ahead and deprecated
random_integers in my most recent PR here.  As this is an API change (sort
of), what are people's thoughts on this deprecation?

I'm reasonably in favor. random_integers() with its closed-interval
convention only exists because it existed in Numeric's RandomArray module.
The closed-interval convention has broadly been considered a mistake
introduced early in the stdlib random module, one that was rectified with
the introduction and promotion of random.randrange() instead.
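
For anyone who hasn't run into the difference, a quick illustration of the
two conventions (a die roll; the output values are random, of course):

```
import numpy as np

# Closed interval [1, 6] -- the RandomArray-style convention being deprecated.
np.random.random_integers(1, 6, size=5)

# Half-open interval [1, 7), i.e. 1..6 -- the randrange-style convention.
np.random.randint(1, 7, size=5)
```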

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy funding update

2015-12-31 Thread Robert Kern
On Wed, Dec 30, 2015 at 10:54 AM, Ralf Gommers <ralf.gomm...@gmail.com>
wrote:
>
> Hi all,
>
> A quick good news message: OSDC has made a $5k contribution to NumFOCUS,
which is split between support for a women in technology workshop and
support for Numpy:
http://www.numfocus.org/blog/osdc-donates-5k-to-support-numpy-women-in-tech
> This was a very nice surprise to me, and a first sign that the FSA
(fiscal sponsorship agreement) we recently signed with NumFOCUS is going to
yield significant benefits for Numpy.
>
> NumFOCUS is also doing a special end-of-year fundraiser. Funds donated
(up to $5k) will be tripled by anonymous sponsors:
http://www.numfocus.org/blog/numfocus-end-of-year-fundraising-drive-5000-matching-gift-challenge
> So think of Numpy (or your other favorite NumFOCUS-sponsored project of
course) if you're considering a holiday season charitable gift!

That sounds great! Do we have any concrete plans for spending that money,
yet?

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-14 Thread Robert Kern
On Mon, Dec 14, 2015 at 3:56 PM, Benjamin Root <ben.v.r...@gmail.com> wrote:

> By the way, any reason why this works?
> >>> np.array(xrange(10))
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

It's not a generator. It's a true sequence that just happens to have a
special implementation rather than being a generic container.

>>> len(xrange(10))
10
>>> xrange(10)[5]
5
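
For an actual generator, the usual workaround is np.fromiter(), which does
consume iterators provided you give it an explicit dtype (and, ideally, a
count), e.g.:

```
import numpy as np

gen = (i * i for i in range(10))
a = np.fromiter(gen, dtype=np.intp, count=10)
# array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
```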

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] FeatureRequest: support for array construction from iterators

2015-12-14 Thread Robert Kern
On Mon, Dec 14, 2015 at 5:41 PM, Benjamin Root <ben.v.r...@gmail.com> wrote:
>
> Heh, never noticed that. Was it implemented more like a
generator/iterator in older versions of Python?

No, it predates generators and iterators so it has always had to be
implemented like that.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: SfePy 2015.4

2015-12-01 Thread Robert Cimrman

I am pleased to announce release 2015.4 of SfePy.

Description
---

SfePy (simple finite elements in Python) is software for solving systems of
coupled partial differential equations by the finite element method or by
isogeometric analysis (preliminary support). It is distributed under the new
BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker, wiki: http://github.com/sfepy

Highlights of this release
--

- basic support for restart files
- new type of linear combination boundary conditions
- balloon inflation example

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Best regards,
Robert Cimrman on behalf of the SfePy development team

---

Contributors to this release in alphabetical order:

Robert Cimrman
Grant Stephens
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reshaping array question

2015-11-17 Thread Robert Kern
On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker <ndbeck...@gmail.com> wrote:
>
> I have an array of shape
> (7, 24, 2, 1024)
>
> I'd like an array of
> (7, 24, 2048)
>
> such that the elements on the last dimension are interleaving the elements
> from the 3rd dimension
>
> [0,0,0,0] -> [0,0,0]
> [0,0,1,0] -> [0,0,1]
> [0,0,0,1] -> [0,0,2]
> [0,0,1,1] -> [0,0,3]
> ...
>
> What might be the simplest way to do this?

np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,))

> 
> A different question, suppose I just want to stack them
>
> [0,0,0,0] -> [0,0,0]
> [0,0,0,1] -> [0,0,1]
> [0,0,0,2] -> [0,0,2]
> ...
> [0,0,1,0] -> [0,0,1024]
> [0,0,1,1] -> [0,0,1025]
> [0,0,1,2] -> [0,0,1026]
> ...

A.reshape(A.shape[:-2] + (-1,))

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] reshaping array question

2015-11-17 Thread Robert Kern
On Nov 17, 2015 6:53 PM, "Sebastian Berg" <sebast...@sipsolutions.net>
wrote:
>
> On Di, 2015-11-17 at 13:49 -0500, Neal Becker wrote:
> > Robert Kern wrote:
> >
> > > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker <ndbeck...@gmail.com>
wrote:
> > >>
> > >> I have an array of shape
> > >> (7, 24, 2, 1024)
> > >>
> > >> I'd like an array of
> > >> (7, 24, 2048)
> > >>
> > >> such that the elements on the last dimension are interleaving the
> > >> elements from the 3rd dimension
> > >>
> > >> [0,0,0,0] -> [0,0,0]
> > >> [0,0,1,0] -> [0,0,1]
> > >> [0,0,0,1] -> [0,0,2]
> > >> [0,0,1,1] -> [0,0,3]
> > >> ...
> > >>
> > >> What might be the simplest way to do this?
> > >
> > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,))
> >
> > I get an error on that 1st transpose:
> >
>
> Transpose needs a slightly different input. If you look at the help, it
> should be clear. The help might also point to np.swapaxes, which may be
> a bit more straightforward for this exact case.

Sorry about that. Was in a rush and working from a faulty memory.
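
For the archive, I believe the corrected version is the swapaxes one
Sebastian points to (a sketch, following his suggestion):

```
import numpy as np

A = np.arange(7 * 24 * 2 * 1024).reshape(7, 24, 2, 1024)

# Interleaved: A[..., i, j] ends up at position 2*j + i on the last axis.
interleaved = np.swapaxes(A, -2, -1).reshape(A.shape[:-2] + (-1,))

# Stacked: A[..., i, j] ends up at position i*1024 + j on the last axis.
stacked = A.reshape(A.shape[:-2] + (-1,))

assert interleaved.shape == stacked.shape == (7, 24, 2048)
```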
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

